"Unstripping" binaries: Restoring debugging information in GDB with Pwndbg (trailofbits.com)
165 points by aa_is_op 4 months ago | 23 comments



GDB loses significant functionality when debugging binaries that lack debugging symbols

IMHO from experience with other debuggers, GDB is actually hostile to debugging at the Asm level, due to many perplexing design choices which may or may not be deliberate. Things like needing to add a superfluous asterisk when breakpointing on an address, the "disassemble" command not being able to do what it says and instead complaining about a lack of functions, etc.
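To make the asterisk point concrete, here is a minimal Python sketch that formats a GDB `break` command for either a symbol or a raw address. The helper name `breakpoint_command` is made up for illustration; only the `break NAME` and `break *ADDRESS` forms come from GDB itself.

```python
def breakpoint_command(location):
    """Return a GDB `break` command string for a symbol name or a raw address.

    `breakpoint_command` is a hypothetical helper, not part of GDB or Pwndbg.
    """
    if isinstance(location, int):
        # GDB treats a bare number as a line number, so a raw address
        # needs the leading '*' ("break *0x401136").
        return f"break *{location:#x}"
    # Anything else is passed through as a symbol or other location string.
    return f"break {location}"
```

Without the asterisk, GDB reads the number as a source line, which is why the command fails on a binary that has no line information.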


I have definitely gotten to the point of "Fine, I'll just use IDA Pro then".

Of course, visual debuggers are another story entirely, but I'm not really thrilled with them either. For example I don't recall there being a good way to say "Decode the address at rax as a WNDCLASSA" or something like that in IDA. (I'm crossing my fingers for a Cunningham's Law here.)


You can jump to the address and then declare the data as a struct.

It is annoying though that you can't make the register display show a particular type, it only shows unsigned hex integers. If I'm tracking a 32-bit float it is very frustrating, it won't even show you alternate representations on hover...

For as good and expensive as IDA is, the UX sure is lacking.


Man, I can not believe it never occurred to me to do that. It's certainly what I would do when using IDA outside of debugging, but I guess it's just hard to internalize that you can do all of the same things to live memory in the debugger.


I don't recall what it was called in the menu, but it was definitely possible to assume a struct on a particular address. Muscle memory tells me the button is U, even though actual memory fails me.


They were deliberate.

I added support for a lot of DWARF2 way back in the dark ages of time, and hit most of these design choices.

It was basically built to support STABS (or was it DBX, I always forget).

Everything else was an afterthought.

That doesn't make it the wrong choice mind you, and lots of things have been made better or redesigned in the past 2 decades.

But to your point, it wasn't really meant for the use case of assembly debugging.

In part because assembly debuggers already existed on most OSen.

Also, because if you were using GDB, you were supposed to have both the source and the debug info (because it was the FSF debugger).


+1 there are many pain points, probably for historic reasons. *nix almost always comes with the source, so binary only debugging is never a priority.


True up to UNIX V6, and the FOSS clones, not so much for all big iron UNIXes.


> the "disassemble" command not being able to do what it says and instead complaining about a lack of functions, etc.

For reference (for other readers), while I totally agree with you, disassembling at a particular address can be done via something like "x/20i $pc" ("display 20 instructions starting from content at $pc").
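As a rough sketch of scripting this, the Python helper below builds a non-interactive GDB invocation that runs `x/Ni` at a fixed address. The function name `disassemble_argv` and the example address are assumptions for illustration; `-batch` and `-ex` are standard GDB command-line options.

```python
def disassemble_argv(binary, address, count=20):
    """Build an argv list that asks GDB to print `count` instructions at `address`.

    `disassemble_argv` is a hypothetical helper; it only assembles the command
    line and does not run GDB itself.
    """
    # "x/20i 0x401000" means: examine memory, 20 units, instruction format.
    examine = f"x/{count}i {address:#x}"
    # -batch exits after running the commands; -ex supplies one command.
    return ["gdb", "-batch", "-ex", examine, binary]
```

The result can be handed to `subprocess.run(disassemble_argv("./a.out", 0x401000))`. The two-argument form `disassemble START,+LENGTH` is another way to get output without function symbols.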

Something quite annoying as well is that you have to patch GDB in order to be able to use software single-step on custom OS/baremetal targets (IDA has no such issue; it is an option in their UI). GDB doesn't even honor its own remote stub protocol.


I read “pwndgb” as Welsh for a good 5 seconds before realising which site I was on.



Beautiful


I’m curious how this compares with https://github.com/mahaloz/decomp2dbg


`decomp2dbg` mostly just provides pc to source mapping. It also makes function arguments and local stack variables available through gdb convenience variables (function arguments are broken since it assumes they stay in the registers and are never clobbered). This plugin is a bit nicer because of the syncing it does with binja. Coincidentally, I am working on a binja plugin which live imports types, symbols, and function data using dwarf.


maybe a hated take but: debugging with symbols is like playing a shooter with aimbot. i don't feel GDB has many issues. you can trace program execution, put breaks, disassemble. what more do you need?

I don't think needing symbols is a debugger problem. a lot of code that needs reversing / debugging doesn't come with any debug information. Is that really the problem of a (simple) debugger? GDB is a 'simple' debugger, and it does what it says on the box. It doesn't try to interpret stuff for you.

The difficulty with tools like IDA and Binary Ninja is that a lot of the heuristics they use to pull debugging info for you are basically guesstimates. They usually do not come with guarantees of correctness. What GDB provides you is mostly correct, albeit much more limited. You really need to dig into the sources of these other tools to understand how they do their guesswork, and whether that's what you trust and want. Do you know all the ways your tools use to guess what they're looking at? Do you want to reverse your tools to find out? In GDB, you can (and must) use your own wits (and feverish note taking) to do this work.

The smarter your tool seems, the more 'interpretation' it does, which might not be what you need. If you run on a common OS target, with a sanely built binary, it might help a lot. But if you want to look at weird binaries or targets, it's imho better to do this work yourself. (I do like me a good headache though - that might be different for other ppl.)

all in all, of course, there are good use-cases for each tool, and of course a big part is personal taste - if you do debugging / reversing for a living I totally get you might want a tool that does more for you out of the box. GDB does little. It doesn't claim to do more.

As for this work done integrating Binary Ninja with pwndbg, that's pretty epic still regardless of my sad opinions ;)! Great job! Know a lot of ppl are going to love it.


sounds like an interesting direction, but I don't understand why it should be coupled to a specific tool (pwndbg). Why not implement a Binary Ninja plugin that dumps all user-defined names (function names, stack variables), together with the original (stripped) binary, into a new ELF/.exe file with a symbol table and presumably a DWARF section?


I've developed a Ghidra extension that exports object files. I've considered generating debugging symbols in order to improve the debugging experience when reusing these object files in new programs, but I keep postponing that feature for various reasons.

Executable formats have at least one and often multiple debugging data formats which are very different from each other: ELF has STABS and DWARF version 1 to 5, MSVC has at least COFF symbols and PDB (which isn't documented)... Even discarding the old or obsolete stuff, there's no universal solution here. gdb+pwndbg seems to side-step this issue by integrating the debugger with Binary Ninja.

Projecting reverse-engineered information into a debugging data format would also be a technical challenge once you go past global variables and type definitions. Debuggers already have a terrible user experience when stepping through functions in an optimized executable; I doubt that reverse-engineered debugging data would be any better.

Toolchains also don't do a lot of validation or diagnostics on their inputs and I can tell from experience that writing correct object files from scratch is already quite tricky. I expect that serializing correct and meaningful debugging data would be much harder than that.

Doing this at the native executable level has the obvious advantage of working out of the box with standard tooling, but it would be a lot of work. I've already taken 2 1/2 years to make an object file exporter that's good enough for my needs and I'm still balking at generating DWARF debugging data every time I've considered it. I'm resigned to a terrible debugging experience and so far I've managed to muddle through it.


> Debuggers already have a terrible user experience when stepping through functions in an optimized executable ; I doubt that reverse-engineered debugging data would be any better.

Actually, I suspect it could be a world of difference. The main failures of optimized debugging are that code motion makes line number tables more of a suggestion, the source code values may disappear in favor of other related values (e.g., SROA or changing A[i] to p++), live ranges of variables may shrink, and debugging information may only support variables in stack slots or other specific locations. If you generate debugging information based on decompiled code, you can control the output code to make the first two problems more or less go away, and so you only really have to worry about narrow live ranges (can't do much about that) or debuggers not supporting the features you need (which, given DWARF, is a very real possibility).


> PDB (which isn't documented)...

What about https://github.com/Microsoft/microsoft-pdb ? Not technically documentation, but the closest thing to it.


I was mostly aware of https://llvm.org/docs/PDB/index.html, which paints a rather complex high-level picture.

At any rate, while my extension has exporters for both ELF and COFF I'm personally using only the former. The latter is user-contributed and I don't know much about the Microsoft toolchain and file formats.


that's already available in binaryninja out of the box, via the (seemingly undocumented) 'plugins' -> 'export as dwarf' menu item.


Not really, export as dwarf only gives types and symbols, it does not generate dwarf pc -> source line information, which this plugin provides via a custom context pane. Manually running the export every time a pc step occurs would also be quite painful.


This is really cool. Anyone know what license level of binja is required? Does it work with the free edition?



