Anyone have tips on getting good stack traces in opt builds? I am really struggl...

jcranmer · 2024-07-29T19:20:23 1722280823

Mozilla has a tool to fix up the bad dladdr-based printing methods in log files here: https://github.com/mozilla/fix-stacks/. Note that it relies on doing a little bit of post-processing on dladdr to get the base of the DSO it is in: https://searchfox.org/mozilla-central/source/mozglue/misc/St...

As for whether or not you can use this in a signal handler... well, I hate reading the POSIX standard with regard to signal safety because it's just not well-written, but as far as I can tell, a non-async-signal-safe function can be safely called from a signal handler for a synchronous signal (which most of the interesting signals for dumping stack traces are--it's only something like dump-stack-trace-on-SIGUSR1 that's actually going to be an asynchronous signal), so long as it is not interrupting a non-async-signal-safe function. So as long as you're not crashing in libc, it should be kosher.

bogwog · 2024-07-29T17:06:21 1722272781

Have you looked into using a library like Breakpad (https://chromium.googlesource.com/breakpad/breakpad/)? It's probably too much work to integrate for local debugging only though.

mark_undoio · 2024-07-29T17:23:47 1722273827

If you're on *NIX have you tried just invoking gstack or similar as an external process? https://linux.die.net/man/1/gstack

Or, indeed, getting a core dump and applying GDB to it. GDB seems generally pretty good at reconstructing stacks at arbitrary points in application runtime.

We've also used a combination of libunwind and https://linux.die.net/man/1/addr2line to produce good crash dumps when GDB is not necessarily available.

jeffbee · 2024-07-29T17:27:10 1722274030

To which of the projects that are all named "libunwind" do you refer?

mark_undoio · 2024-07-29T17:37:27 1722274647

This one, I believe: https://github.com/libunwind/libunwind

ETA: Thinking about it, I'm not really sure what it'd do for C++ - I guess you'd end up with mangled names, so if you want sensible names you might need to demangle (either as a post-processing step or within the dumper) too.

I don't think you'll get any decoded argument values out of it either, so I guess it depends what backtrace info is needed.

trentnelson · 2024-07-30T17:04:48 1722359088

FWIW, on Windows, the ETW event instrumentation that captures dispatch (i.e. thread scheduling) and loader info (I think it's literally the DISPATCH+LOADER flags to xperf) solves this problem, which, inherently is: at any arbitrary point in time, given an IP/PC, what module/function am I in?

If you have timestamped module load/unload info with base address + range, plus context switch times that allow you to figure out which specific thread & address space was running at any given CPU node ID + point in time, you can always answer that question. (Assuming the debug infrastructure is robust enough to map any given IP to one specific function, which it should be able to do, even if the optimizer has hoisted out cold paths into separate, non-contiguous areas.)

I realize this isn't very helpful to you on Linux (if it's any consolation I'm on Linux these days too), but, sometimes it's interesting to know how other platforms handle it.

o11c · 2024-07-29T17:49:02 1722275342

Rule number one: never use Clang; its optimizers destroy too much information unlike GCC.

You can use `dl_iterate_phdr` at startup if you need DSO info?

whatsakandr · 2024-08-01T20:06:24 1722542784

I've had good luck with -fno-omit-frame-pointer, omitting it is an unfortunate default, and makes stack traces horrible.

dllu · 2024-07-29T19:07:03 1722280023

I enjoy using backward: https://github.com/bombela/backward-cpp

jeffbee · 2024-07-29T19:12:41 1722280361

Looks worth investigating. Also making me wonder how many different backtrace implementations are out there on GitHub with Google copyrights!

saagarjha · 2024-07-29T16:39:26 1722271166

Is calling dladdr on the addresses not enough for you?

jeffbee · 2024-07-29T16:46:34 1722271594

It's not async signal safe, so I did not even try that.

I think there's a huge amount of complexity both inherent to the problem and caused by fifty years of accumulated bad habits, which is indicated by the thousands of lines of code in compiler-rt dedicated to handling this issue. I'd like to call their library functions but they are all in private-looking namespaces. I also tried to use the Abseil failure signal handler but it often fails to unwind and even when it does unwind has a habit of just printing unknown for the symbol name or file, and never prints the DSO base addresses.

mgaunard · 2024-07-30T08:56:28 1722329788

libbacktrace does it all, and there is also that feature in the C++ standard library now.