Anyone have tips on getting good stack traces in opt builds? I'm really struggling with it at the moment. The LLVM sanitizers all generate brilliant stack traces by forking llvm-symbolizer and feeding it the goods, but for runtime crashes in optimized binaries I don't seem to get anything comparable. One of the problems is that some library backtrace functions don't print the base address of the DSO mapping, so with ASLR they're printing a meaningless PC that can't be used to find the file and line later.
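The workaround I've ended up with is to resolve each PC myself and print module + offset instead of the raw address. A minimal sketch, assuming glibc on Linux (the function name is mine; link with -ldl, and note that backtrace() may allocate on first use):

```cpp
// Sketch: turn raw PCs into DSO-relative offsets that survive ASLR.
#include <dlfcn.h>
#include <execinfo.h>
#include <cstdio>

void dump_relocatable_trace() {
  void* pcs[64];
  int n = backtrace(pcs, 64);
  for (int i = 0; i < n; ++i) {
    Dl_info info;
    if (dladdr(pcs[i], &info) && info.dli_fname) {
      // For a PIE or shared object, the offset from the load base is
      // what addr2line/llvm-symbolizer wants; the raw PC is useless
      // once the process is gone.
      size_t off = (char*)pcs[i] - (char*)info.dli_fbase;
      std::printf("#%d %p (%s+0x%zx)\n", i, pcs[i], info.dli_fname, off);
    } else {
      std::printf("#%d %p (unknown)\n", i, pcs[i]);
    }
  }
}
```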
As for whether or not you can use this in a signal handler... I hate reading the POSIX standard with regard to signal safety because it's just not well-written, but as far as I can tell, a non-async-signal-safe function can be safely called from a handler for a synchronous signal, so long as it is not interrupting a non-async-signal-safe function. (Most of the interesting signals for dumping stack traces are synchronous; it's only something like dump-stack-trace-on-SIGUSR1 that's actually going to be asynchronous.) So as long as you're not crashing in libc, it should be kosher.
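In practice the handler itself can also stay mostly on the safe side. A sketch of what I have in mind (my own names; backtrace_symbols_fd writes straight to an fd and avoids malloc, though backtrace() can lazily load libgcc on first use, so it's common to warm it up at startup):

```cpp
// Sketch: crash handler for synchronous signals, per the reasoning above.
#include <csignal>
#include <execinfo.h>
#include <initializer_list>
#include <unistd.h>

static void crash_handler(int sig) {
  void* pcs[64];
  int n = backtrace(pcs, 64);
  backtrace_symbols_fd(pcs, n, STDERR_FILENO);  // no malloc, unlike backtrace_symbols
  _exit(128 + sig);  // async-signal-safe exit, skips atexit handlers
}

void install_crash_handler() {
  void* warm[1];
  backtrace(warm, 1);  // force the lazy libgcc load now, not inside the handler
  struct sigaction sa = {};
  sa.sa_handler = crash_handler;
  sa.sa_flags = SA_RESETHAND;  // a crash inside the handler falls through to default
  for (int sig : {SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT})
    sigaction(sig, &sa, nullptr);
}
```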
Or, indeed, getting a core dump and applying GDB to it. GDB seems generally pretty good at reconstructing stacks at arbitrary points in application runtime.
We've also used a combination of libunwind and https://linux.die.net/man/1/addr2line to produce good crash dumps when GDB is not necessarily available.
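Roughly the shape of the libunwind half, for what it's worth (an untested sketch using the local-unwinding API; link with -lunwind, and feed the output lines to addr2line afterwards):

```cpp
// Sketch: walk the current stack and print one line per frame,
// suitable for later symbolization with addr2line -f -e <module>.
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <cstdio>

void print_backtrace() {
  unw_context_t ctx;
  unw_cursor_t cursor;
  unw_getcontext(&ctx);
  unw_init_local(&cursor, &ctx);
  while (unw_step(&cursor) > 0) {  // starts at our caller's frame
    unw_word_t ip = 0, off = 0;
    char name[256] = "?";
    unw_get_reg(&cursor, UNW_REG_IP, &ip);
    unw_get_proc_name(&cursor, name, sizeof(name), &off);  // mangled for C++
    std::printf("0x%lx %s+0x%lx\n", (unsigned long)ip, name, (unsigned long)off);
  }
}
```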
ETA: Thinking about it, I'm not really sure what it'd do for C++ - I guess you'd end up with mangled names, so if you want sensible names you'd need to demangle too, either as a post-processing step or within the dumper (a sketch of the latter follows below).
I don't think you'll get any decoded argument values out of it either, so I guess it depends on what backtrace info is needed.
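The in-dumper demangling step is small, at least. A sketch using the Itanium ABI helper (note __cxa_demangle mallocs, so it belongs in post-processing rather than in a signal handler):

```cpp
// Sketch: demangle a name from unw_get_proc_name / backtrace output.
#include <cxxabi.h>
#include <cstdlib>
#include <string>

std::string demangle(const char* mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && out) ? out : mangled;
  std::free(out);  // __cxa_demangle mallocs its result buffer
  return result;
}
```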
FWIW, on Windows, the ETW event instrumentation that captures dispatch (i.e. thread scheduling) and loader info (I think it's literally the DISPATCH+LOADER flags to xperf) solves this problem, which is, inherently: at any arbitrary point in time, given an IP/PC, what module/function am I in?
If you have timestamped module load/unload info with base address + range, plus context switch times that allow you to figure out which specific thread & address space was running at any given CPU node ID + point in time, you can always answer that question. (Assuming the debug infrastructure is robust enough to map any given IP to one specific function, which it should be able to do, even if the optimizer has hoisted out cold paths into separate, non-contiguous areas.)
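The lookup itself is just an interval search over the load events. A toy sketch of the idea (nothing ETW-specific, names are mine; real tooling also keys on timestamps to handle unload/reload, which I've omitted):

```cpp
// Toy sketch: map an arbitrary PC back to its module, given load
// events with base address + size.
#include <cstdint>
#include <map>
#include <string>

struct ModuleMap {
  // key: base address; value: (end address, module name)
  std::map<uint64_t, std::pair<uint64_t, std::string>> loads;

  void on_load(uint64_t base, uint64_t size, std::string name) {
    loads[base] = {base + size, std::move(name)};
  }
  void on_unload(uint64_t base) { loads.erase(base); }

  // Which module contains this PC, if any?
  const std::string* find(uint64_t pc) const {
    auto it = loads.upper_bound(pc);  // first entry with base > pc
    if (it == loads.begin()) return nullptr;
    --it;                             // candidate with base <= pc
    return pc < it->second.first ? &it->second.second : nullptr;
  }
};
```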
I realize this isn't very helpful to you on Linux (if it's any consolation I'm on Linux these days too), but, sometimes it's interesting to know how other platforms handle it.
It's not async signal safe, so I did not even try that.
I think there's a huge amount of complexity here, both inherent to the problem and caused by fifty years of accumulated bad habits - the thousands of lines of code in compiler-rt dedicated to this issue are an indication. I'd like to call its library functions, but they're all in private-looking namespaces. I also tried the Abseil failure signal handler, but it often fails to unwind, and even when it does unwind it has a habit of printing "unknown" for the symbol name or file, and it never prints the DSO base addresses.
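For reference, the setup I've been using is roughly the standard one below. Per the Abseil docs, absl::InitializeSymbolizer has to run first or every frame comes out as "unknown", so that's one thing worth double-checking, though it doesn't explain the missing base addresses:

```cpp
// Sketch: typical Abseil failure-signal-handler installation.
#include "absl/debugging/failure_signal_handler.h"
#include "absl/debugging/symbolize.h"

int main(int argc, char** argv) {
  absl::InitializeSymbolizer(argv[0]);  // without this, symbols print as unknown
  absl::FailureSignalHandlerOptions options;
  absl::InstallFailureSignalHandler(options);
  // ... rest of the program ...
}
```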