Hacker News

Anyone have tips on getting good stack traces in opt builds? I am really struggling with it at the moment. LLVM sanitizers all generate brilliant stack traces by forking llvm-symbolizer and feeding it the goods. But during runtime crashes on optimized binaries I don't seem to get good stack traces. One of the problems is that some library backtrace functions do not print the base address of the DSO mapping, which means they are printing a meaningless PC that can't be used to find file and line later.



Mozilla has a tool to fix up the bad dladdr-based printing methods in log files here: https://github.com/mozilla/fix-stacks/. Note that it relies on doing a little bit of post-processing on dladdr to get the base of the DSO it is in: https://searchfox.org/mozilla-central/source/mozglue/misc/St...
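
The base-address fix-up described above can be sketched roughly as follows. This is a minimal illustration, not Mozilla's actual code: `describe_pc` is a made-up helper name, and it assumes glibc's `dladdr` (on glibc older than 2.34 you would likely need to link with -ldl). It prints a PC as module path plus offset from the DSO's load base, which is the form a symbolizer can use later. For a non-PIE main executable, tools generally want the absolute address instead, so treat the offset there as illustrative only.

```cpp
#include <dlfcn.h>    // dladdr, Dl_info (glibc extension)
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a program counter as "module+0xoffset".
// The offset is relative to the load base of the DSO containing the PC,
// so it stays meaningful after the process has exited.
std::string describe_pc(const void* pc) {
    char buf[40];
    Dl_info info{};
    if (dladdr(pc, &info) == 0 || info.dli_fbase == nullptr) {
        // No loaded object contains this address: fall back to the raw PC.
        std::snprintf(buf, sizeof buf, "?? [%p]", pc);
        return buf;
    }
    const std::uintptr_t offset = reinterpret_cast<std::uintptr_t>(pc) -
                                  reinterpret_cast<std::uintptr_t>(info.dli_fbase);
    std::snprintf(buf, sizeof buf, "+0x%zx", static_cast<std::size_t>(offset));
    return std::string(info.dli_fname ? info.dli_fname : "??") + buf;
}
```

Note that `dladdr` is not async-signal-safe, so this sketch is meant for normal (non-handler) context or for post-processing.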

As for whether or not you can use this in a signal handler... well, I hate reading the POSIX standard with regard to signal safety because it's just not well-written, but as far as I can tell, a non-async-signal-safe function can be safely called from a signal handler for a synchronous signal (which most of the interesting signals for dumping stack traces are--it's only something like dump-stack-trace-on-SIGUSR1 that's actually going to be an asynchronous signal), so long as it is not interrupting a non-async-signal-safe function. So as long as you're not crashing in libc, it should be kosher.


Have you looked into using a library like Breakpad (https://chromium.googlesource.com/breakpad/breakpad/)? It's probably too much work to integrate for local debugging only though.


If you're on *NIX have you tried just invoking gstack or similar as an external process? https://linux.die.net/man/1/gstack

Or, indeed, getting a core dump and applying GDB to it. GDB seems generally pretty good at reconstructing stacks at arbitrary points in application runtime.

We've also used a combination of libunwind and https://linux.die.net/man/1/addr2line to produce good crash dumps when GDB is not necessarily available.
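
A rough sketch of the "collect addresses now, run addr2line later" approach, using glibc's `backtrace` and `backtrace_symbols_fd` from `<execinfo.h>` as a stand-in for libunwind (so this is not the setup described above, just the same idea with a different unwinder). `dump_backtrace` is a made-up name. On glibc the output lines typically include the module path and an offset in parentheses, which is what you would hand to addr2line.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc; not available on musl)
#include <unistd.h>    // STDERR_FILENO

// Hypothetical helper: write the current call stack to `fd`, one frame per
// line, and return the number of frames captured.
// backtrace_symbols_fd writes straight to the descriptor without calling
// malloc, unlike backtrace_symbols.
int dump_backtrace(int fd) {
    void* frames[64];
    const int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, fd);
    return count;
}
```

Usage would be something like `dump_backtrace(STDERR_FILENO)`, followed later by `addr2line -f -C -e <module> <offset>` for each frame. The backtrace(3) man page notes that libgcc may be loaded dynamically on the first call, so if you intend to use this from a crash handler it is probably worth calling `backtrace` once at startup.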


To which of the projects that are all named "libunwind" do you refer?


This one, I believe: https://github.com/libunwind/libunwind

ETA: Thinking about it, I'm not really sure what it'd do for C++ - I guess you'd end up with mangled names, so if you want sensible names you might need to demangle (either as a post-processing step or within the dumper) too.

I don't think you'll get any decoded argument values out of it either, so I guess it depends what backtrace info is needed.
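
For the demangling step mentioned above, a minimal sketch assuming GCC or Clang with the Itanium C++ ABI, where `abi::__cxa_demangle` is available from `<cxxabi.h>`. The helper name `demangle` is made up for illustration. It only attempts names starting with `_Z`, because `__cxa_demangle` can also interpret short strings as encoded type names.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <cstring>    // std::strncmp
#include <string>

// Hypothetical helper: return the demangled form of `mangled`, or the
// input unchanged if it is not a mangled C++ symbol or demangling fails.
std::string demangle(const char* mangled) {
    if (mangled == nullptr) return "";
    // Itanium-mangled symbols start with "_Z"; leave C names alone.
    if (std::strncmp(mangled, "_Z", 2) != 0) return mangled;
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates with malloc.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);
    return result;
}
```

Because this allocates, it belongs in a post-processing step rather than inside a signal handler.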


FWIW, on Windows, the ETW event instrumentation that captures dispatch (i.e. thread scheduling) and loader info (I think it's literally the DISPATCH+LOADER flags to xperf) solves this problem, which, inherently, is: at any arbitrary point in time, given an IP/PC, what module/function am I in?

If you have timestamped module load/unload info with base address + range, plus context switch times that allow you to figure out which specific thread & address space was running at any given CPU node ID + point in time, you can always answer that question. (Assuming the debug infrastructure is robust enough to map any given IP to one specific function, which it should be able to do, even if the optimizer has hoisted out cold paths into separate, non-contiguous areas.)

I realize this isn't very helpful to you on Linux (if it's any consolation I'm on Linux these days too), but, sometimes it's interesting to know how other platforms handle it.


Rule number one: never use Clang; its optimizers destroy too much information, unlike GCC's.

You can use `dl_iterate_phdr` at startup if you need DSO info.
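
A sketch of what that could look like, assuming glibc or musl on Linux. All names here (`ModuleRange`, `snapshot_modules`, `find_module`) are made up for illustration. The idea is to record each loaded object's load bias and address range into a fixed-size table at startup, so that a crash handler can later map a PC to a module and offset with a plain array scan instead of calling into the dynamic loader.

```cpp
#include <link.h>     // dl_iterate_phdr, dl_phdr_info, ElfW, PT_LOAD
#include <cstddef>
#include <cstdint>
#include <cstring>

struct ModuleRange {
    std::uintptr_t base;   // load bias (dlpi_addr); subtract from a PC to get the offset
    std::uintptr_t start;  // lowest mapped address of any PT_LOAD segment
    std::uintptr_t end;    // one past the highest mapped address
    char name[256];        // empty string for the main executable
};

constexpr std::size_t kMaxModules = 128;
static ModuleRange g_modules[kMaxModules];
static std::size_t g_module_count = 0;

static int record_module(dl_phdr_info* info, std::size_t, void*) {
    if (g_module_count == kMaxModules) return 1;  // nonzero stops the iteration
    ModuleRange& m = g_modules[g_module_count];
    m.base = info->dlpi_addr;
    m.start = UINTPTR_MAX;
    m.end = 0;
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;
        const std::uintptr_t lo = info->dlpi_addr + ph.p_vaddr;
        const std::uintptr_t hi = lo + ph.p_memsz;
        if (lo < m.start) m.start = lo;
        if (hi > m.end) m.end = hi;
    }
    if (m.end == 0) return 0;  // no loadable segments: skip this entry
    std::strncpy(m.name, info->dlpi_name ? info->dlpi_name : "", sizeof m.name - 1);
    m.name[sizeof m.name - 1] = '\0';
    ++g_module_count;
    return 0;
}

// Call once at startup, from normal (non-signal) context.
void snapshot_modules() {
    g_module_count = 0;
    dl_iterate_phdr(record_module, nullptr);
}

// Only reads the table, so it avoids the loader lock at crash time.
const ModuleRange* find_module(const void* pc) {
    const auto addr = reinterpret_cast<std::uintptr_t>(pc);
    for (std::size_t i = 0; i < g_module_count; ++i) {
        if (addr >= g_modules[i].start && addr < g_modules[i].end) return &g_modules[i];
    }
    return nullptr;
}
```

One limitation of this sketch: anything loaded with dlopen after the snapshot will be missing from the table unless you re-run `snapshot_modules`.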


I've had good luck with -fno-omit-frame-pointer. Omitting the frame pointer is an unfortunate default, and it makes stack traces horrible.



Looks worth investigating. Also making me wonder how many different backtrace implementations are out there on GitHub with Google copyrights!


Is calling dladdr on the addresses not enough for you?


It's not async signal safe, so I did not even try that.

I think there's a huge amount of complexity both inherent to the problem and caused by fifty years of accumulated bad habits, which is indicated by the thousands of lines of code in compiler-rt dedicated to handling this issue. I'd like to call their library functions but they are all in private-looking namespaces. I also tried to use the Abseil failure signal handler but it often fails to unwind and even when it does unwind has a habit of just printing unknown for the symbol name or file, and never prints the DSO base addresses.


libbacktrace does it all, and C++23 now has std::stacktrace in the standard library.



