I tried reversing a medium-complexity program, but after a long time of digging through seemingly nonsensical data structures, it seemed very clear to me that it was a C++ program and the pointery mess I was seeing was likely a result of classes and inheritance.
Has Ghidra gotten better at dealing with those, and is there a good tutorial on how best to handle them?
I have been reversing for almost 25-30 years, and a lot has changed in that time, especially compilers and optimizations. It is hard to give advice, as I learned gradually.
But I think the best advice is to aggressively label functions with their intent.
It is very similar to a jigsaw puzzle: the more pieces you place, the easier it is to place the next one.
Where "vtable" is a compiler-generated struct of function pointers. So here's your "pointery mess", neatly abstracted away by the syntax in the first example.
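A minimal sketch of that lowering, assuming a single-inheritance class and a typical ABI where the vtable pointer is the hidden first member. All names here (`Shape`, `Square`, `RawVtable`, `RawSquare`, `call_area`) are hypothetical, and real layouts depend on the compiler:

```cpp
// What you write in C++: the call syntax hides the pointer chasing.
struct Shape {
    virtual int area() const { return 0; }
    virtual ~Shape() = default;
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

// Roughly what the compiler builds behind the scenes, written by hand.
// This is an illustration, not the exact layout of any particular compiler.
struct RawSquare;

struct RawVtable {
    int (*area)(const RawSquare* self);  // one slot per virtual function
};

struct RawSquare {
    const RawVtable* vptr;  // hidden pointer to the class's vtable
    int side;
};

int RawSquare_area(const RawSquare* self) { return self->side * self->side; }

const RawVtable kRawSquareVtable = {&RawSquare_area};

// obj->area() becomes: load vptr, load the slot, call it with obj as "this".
int call_area(const RawSquare* obj) { return obj->vptr->area(obj); }
```

In a decompiler, a virtual call tends to show up in the shape of `call_area`: a double dereference followed by an indirect call whose first argument is the object itself. Defining a struct for the vtable and applying it to the first field of the object is usually what makes the listing readable.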
With multiple inheritance, it gets a bit more hairy than this, because the compiler has to apply an offset adjustment to the cast pointer to get at the right vtable.
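A small sketch of that adjustment, using hypothetical classes `A`, `B`, and `C` (not from the thread). The exact offset is compiler- and ABI-dependent, so the code only measures it rather than assuming a value:

```cpp
#include <cstddef>

struct A { virtual ~A() = default; int a = 1; };
struct B { virtual ~B() = default; int b = 2; };
struct C : A, B { int c = 3; };  // C contains an A subobject and a B subobject

// Byte distance from the start of a C object to its B subobject.
std::ptrdiff_t b_offset(C* obj) {
    B* as_b = obj;  // implicit upcast: the compiler adds the subobject offset here
    return reinterpret_cast<char*>(as_b) - reinterpret_cast<char*>(obj);
}

// Downcasting subtracts the same offset again.
C* back_to_c(B* as_b) { return static_cast<C*>(as_b); }
```

Calling `b_offset` on a `C` object gives a nonzero value, because the `B` subobject (with its own vtable pointer) cannot start at the same address as the non-empty `A` subobject. In a disassembly, this shows up as a small constant added to or subtracted from the object pointer around casts and virtual calls through the second base.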
Although rare, virtual inheritance (used in multiple inheritance) makes things even worse because even "this" needs to be found through a vtable! So you get another level of indirection.
I was going to illustrate with an example like yours but I couldn't figure my way out through all the bits (including the virtual-base offsets in the vtable, or the VTT which is a table of vtables!).
The most recent version of Ghidra has a plugin that tries to reconstruct classes from RTTI. This only helps if your binary has RTTI (e.g. there's a dynamic_cast anywhere), and Ghidra still doesn't handle OOP that well even with it. There are other community plugins on GitHub for the same type of thing.
For heavy C++ binaries, IDA and Binary Ninja handle them a lot better, with the decompiled code looking like normal C++. You can still use Ghidra; it just means you have to do more manual recognition of `this` pointers and virtual calls.
If you haven’t tried it already, Compiler Explorer (https://godbolt.org/) is a great way to “cheat” at learning reverse engineering. Why? Because you can control what the C code looks like, and it’ll show you the assembly to compare. Eventually you’ll get a feel for how compilers generate code for various operations.
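As a starting point, here is a small snippet that could be pasted into Compiler Explorer. The function names are made up for illustration, and what the compiler emits depends on the compiler, target, and optimization level, so treat the comments as things to look for rather than guaranteed output:

```cpp
#include <cstdint>

// Division by a constant: with optimizations on, compilers commonly replace
// the divide with a multiply by a "magic" constant plus a shift.
std::uint32_t div_by_10(std::uint32_t x) { return x / 10; }

// Multiplication by a power of two: often becomes a shift or an lea.
std::uint32_t times_8(std::uint32_t x) { return x * 8; }

// A virtual call: look for a load of the vtable pointer from the object,
// then an indirect call or jump through a slot in that table.
struct Base {
    virtual int f() const = 0;
    virtual ~Base() = default;
};

int call_f(const Base& b) { return b.f(); }
```

Comparing the output at `-O0` and `-O2` is a quick way to see how much of the "weirdness" in a real binary comes from the optimizer rather than from the original source.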
For the life of me, I just don't think I'd ever be able to understand the x86 instruction set and truly get into reverse engineering. Most reverse engineers I've known have seemingly had immeasurably deep knowledge about everything from the x86 platform to OS APIs and their weird quirks (looking at you, Windows).
You sort of just have to dive into it; it gets better with actual practice. The first time I did some practical RE, I was trying to, ahem, crack something and spent a few weeks on it. Then one night I couldn't get to sleep and just thought, fk it, I'll give it another stab. I made myself a mug of coffee and voila, everything just clicked together.
All those people were exactly in the situation you are in right now at some point in their lives. You don’t have to learn the whole thing in one go: start by understanding how your C code works and you’ll slowly come to understand more and more low-level code as a side effect.
The x86 instruction set is insanely complicated due to decades of history, but that’s exactly what drew me in, personally. Over a few years of diving more and more into the architecture, I’ve become somewhat fascinated by it. Despite all that history, there are still some interesting patterns that can be uncovered in the bit representations.
A nice exercise that’s helped me a lot is to write a very simple program in assembly, look at the output, and see how it all pieces together. This works even with random instructions that make no sense, purely to see how they are turned into machine code.
Could someone add some references to more advanced reverse engineering resources? I was sold a closed-source bot, ran it in a VM, and then decompiled it using Ghidra. It turned out to be malware: it tried to execute a function called sendToEmail(), which basically just sent decrypted wallet info to the malware writer's email address (hardcoded, of course).
I didn't arrive at this conclusion by reverse engineering the code, though. The bot just happened to crash when it started to execute the sendToEmail() function. (Lucky me!)
I post reverse-engineering write-ups at https://re.kv.io, not as regularly as I'd like but some of these exercises were pretty interesting (or so I thought). They sort of go in order of increasing difficulty, although it's hard to tell when I start working on a binary how long it'll eventually take.
These are almost exclusively crackme/keygen-me exercises found on crackmes.one or similar websites, and came from my realization that the notes I was taking for myself while disassembling these programs might be useful to others trying to learn how to do the same, or to those who might be stuck on a particular binary that I managed to understand.
I was also thinking of exploring malware, although the articles would necessarily have a different structure since there isn't a clear challenge like with a crackme asking you for a password, and there could always be more to say. It still seems like it would be interesting, especially if they use advanced obfuscation techniques.
This is an excellent article with great simple tips, like highlighting most Calculator apps' hex mode and the mile-counter analogy. Great work by the author.