If you're ever wondering what the load address of a piece of code is, my favourite trick is to:
1) identify relative addresses of all C-style strings from the start of the binary
2) identify all absolute read addresses
3) find the load address where the most read addresses correspond to the most strings
This solves the problem for me in the vast majority of cases. If you don't have strings, function prologues work as well.
I don't think there's any tooling for this yet. I've had to write the same algorithm over and over. It works regardless of any header information at the start of the binary, though it will fail if there is data between chunks (but you may see multiple valid alignments that can hint at something like this happening).
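For anyone who wants to try the three steps above, here is a rough Python sketch of the idea, not the exact algorithm I keep rewriting. It makes some loud assumptions: 32-bit pointers, a known endianness, a load address aligned to `align`, and it treats every aligned 32-bit word in the file as a candidate "absolute read address" instead of pulling real operands out of a disassembly. The names `string_offsets` and `guess_load_address` are made up for this example.

```python
import re
import struct
from collections import Counter, defaultdict


def string_offsets(blob, min_len=4):
    """Return file offsets where a printable, NUL-terminated string starts."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}\x00" % min_len)
    return {m.start() for m in pattern.finditer(blob)}


def guess_load_address(blob, endian="<", align=0x1000, min_len=4):
    """Vote for load addresses that make pointers land on string starts.

    Assumption: every aligned 32-bit word is a candidate pointer, and the
    real load address is a multiple of `align`.
    """
    # Group string offsets by residue so each pointer is only compared
    # against the strings that could give an aligned base.
    by_residue = defaultdict(list)
    for s in string_offsets(blob, min_len):
        by_residue[s % align].append(s)

    votes = Counter()
    usable = len(blob) - len(blob) % 4  # iter_unpack needs whole words
    for (ptr,) in struct.iter_unpack(endian + "I", blob[:usable]):
        for s in by_residue.get(ptr % align, ()):
            if ptr >= s:
                votes[ptr - s] += 1  # base = absolute address - file offset

    # Several candidates with similar counts can hint at split chunks.
    return votes.most_common(3)
```

Usage would be something like `guess_load_address(open("firmware.bin", "rb").read(), endian=">")` for a big-endian target (the filename is a placeholder). If the top candidate only has a handful of votes, try a smaller `align` or feed in addresses from a real disassembly instead of raw words.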
Yep! But before doing all of this, check if any .so/.dll for emulating the raw piece of code you are studying is available, so you can do this:
1. check if in the .so/.dll you can spot any function referring to any recognizable (occurring only once) kind of binary blob - like images, or strings too
2. find the same function in the raw image, Ghidra's decompiler helps so much in this
3. find the address supposed to be pointing to the same binary blob (assuming that this blob is the same in the emulated and the real version)
4. scan the raw image for that same blob you have in the emulated version
5. do the maths to find the base address of the raw image.
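The maths in steps 4 and 5 is just a subtraction, roughly like the Python sketch below. It assumes the blob is byte-for-byte identical in both versions and occurs exactly once in the raw image; `base_from_blob` and `find_all` are names invented for this example, and `referenced_address` is the absolute address you read out of the matching function in the raw image.

```python
def find_all(haystack, needle):
    """Return every file offset at which `needle` occurs in `haystack`."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


def base_from_blob(raw_image, blob, referenced_address):
    """Base address = address the code uses - file offset of the blob."""
    hits = find_all(raw_image, blob)
    if len(hits) != 1:
        # Zero hits: the blob differs between versions. Several hits: the
        # blob is not unique enough to pin down a single base.
        raise ValueError("expected exactly one match, found %d" % len(hits))
    base = referenced_address - hits[0]
    if base < 0:
        raise ValueError("address is lower than the blob's file offset")
    return base
```

If the result is not a round-looking number (page or sector aligned), that is usually a sign that the wrong function or the wrong blob was matched.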
Obviously all of this only works if everything in the raw binary image refers to the same flat memory space, and the bootloader/program accepting the image isn't doing any partial relocation.
Don't ask me about that time I painfully spent so much time without knowing I had one file with the odd name of "emulator.dll" staring at me all that time...
I wish I had seen this article sooner. It’s a really nice and succinct primer to some of the basics of using Ghidra for firmware analysis.
I have an expensive robotic cat toy that the manufacturer stopped supporting and I’ve been working on reverse engineering the firmware in my free time to better understand how it works. I have no background in reverse engineering so I’ve been learning a lot about identifying hardware, reading data sheets, and of course fumbling through Ghidra to disassemble the firmware.
After getting your bearings it becomes pretty easy to recognize the patterns of various standard C library functions like strlen, memcpy, etc. Others can be more challenging. Of course, with bespoke embedded hardware it can be much more difficult.
Ghidra really is an amazingly useful tool (clunky, yes — but very powerful and of course it’s free.) It makes me curious about commercial offerings and if they are worth buying as a hobbyist.
It really feels like magic sometimes. When using IDA I always knew that I was working with assembly. With ghidra, once everything has decent names it really feels like I have a copy of the original source code.
I used it for the Disney bb8 robot firmware which was a fun hobby project.
Another interesting project along these lines: a couple of hackers managed to reverse engineer the firmware for the Elektron Machinedrum (a classic early 00’s digital drum machine) and add a bunch of new functionality, such as new effects, a melodic mode and more.
Shout out to Hex Fiend! My favorite feature is the template system[0]. It makes it much easier to figure out file formats for which you have no documentation. You write a little Tcl code to describe the parts of the format you understand as you go.
Very cool! This is really great work! It's awesome that there's been so many synthesiser related topics on Hacker News lately. I did a similar project myself to disassemble, and fully annotate the firmware for the Yamaha DX7: https://github.com/ajxs/yamaha_dx7_rom_disassembly
The biggest hint I could give anyone looking to disassemble a synthesiser operating system is to direct your attention towards the code processing individual MIDI messages. The code is invariably a huge mess; however, you'll be able to very quickly identify the operating system's core functions, since the corresponding SysEx parameter numbers clearly identify what functionality you're looking at.
Just this afternoon I was working on reversing a closed source library that wasn’t working on M1 under Rosetta, using Ghidra. If you get the chance, you should do a post on how you actually modified the program to get it to do what you want (as long as the fix isn’t trivial, like changing a constant).
My exercise today made me realize just how much more difficult the modification of the binary is than simply understanding it, as well as how much I hate the x86 architecture (and CISC in general).
I'll give you a hint - the post was originally twice as long and _did_ go into detail about why I wanted to patch something, but a reviewer of the post pointed out that I could potentially have run afoul of one or more laws in one or more countries if I had succeeded in doing so.
Thank you very much for the follow up as well. At the risk of generalizing, it's frustrating, to say the least, that 26-year-old devices can be treated like safeguarded property or an industry secret while at the same time being happily cast into the dustbin of a product cycle, or lost as insignificant pieces of company mergers.
This product line is not in the dustbin at all. Kurzweil just introduced the K2700, which still includes the VAST engine introduced in the K2000. It’s pretty amazing how long and continuous the Kurzweil technology timeline has been.
The K2000 was ahead of its time, and is still an amazing instrument today. I'm intrigued to discover that MAME is adding support for it, which sounds like an awesome project, I wonder if I can help in any way (K2000R is still here, but not been booted for a while).
As the other comment notes, I’m not sure if MAME can actually properly emulate any music hardware. There’s a skeleton Elektron Machinedrum/Monomachine in there for example but it can’t really do anything interesting (although this thread talks a bit about how some hackers used it to help reverse engineer and ultimately create their own firmware for the Machinedrum, see my other comment for more details: https://github.com/jmamma/MIDICtrl20_MegaCommand/issues/88)
Another project which may be of interest is this emulation of the Motorola DSP563xx DSP, which was used in a lot of classic late 90s hardware synths: https://dsp56300.wordpress.com. It can actually run Access Virus ROMs pretty much perfectly, and apparently on an M1 the performance is pretty good; it’s usable but crackly on my i9 Mac. They’re hoping to be able to emulate many more synths which used the same DSP, but for now the Virus is the focus.
MAME seems to have had "skeleton driver" (i.e.: very rough and unfinished) support for a number of 80s- and 90s-era synths, given that they're very similar to video game hardware from the same era. In my reverse engineering for this project, I realized that the MAME driver might now boot, but there's an extremely long way to go before audio comes out; the K2 series used custom ASICs for waveform generation that have no public documentation. I've heard that some folks much smarter than me know how to de-cap chips and reverse engineer them from the hardware, but I'll be extremely impressed if someone can make that happen and emulate those chips efficiently.
Just speculating but anyway, it could be related to allowing the digital dump of the various waveform samples, so that they could be used to create for example a software plugin that replicates some or all the functions of that instrument.
Modern hardware would allow the recreation of many old instruments at a fraction of their cost, either in hardware and/or software, so it's understandable that manufacturers are fiercely protecting their IP.
Another approach for reverse engineering the format of the KOS file might have been to run the bootloader code through a 68k emulator, step through it with a debugger (and resolve any undefined syscalls or built-in functions etc.), and observe what it's actually doing.
If you're into this I highly recommend checking out Bob Grieb's work[1]. He has reversed MCUs on many vintage synths and has a website dedicated to his explorations. Really impressive stuff...
But I am almost existentially disappointed by the opening sentence - ‘For reasons I won’t get into’…
It’s like - why?
The reasons we drive ourselves to do these frankly insane hacking experiments are almost always as interesting as the process itself for me.
The reason for that is - in this case - there are literally thousands of sampler synths with frankly a shitton more features than even what OP has implemented - why use this particular one?
The turntablist community - for instance - vastly prefers a frankly ancient and specific model of turntable (Technics SL-1200 or equivalent) - therefore the community to mod and update this decades-old hardware is dedicated and does similarly amazing things.
The SL-1200 is preferred for its build quality, its amazing motor, its weight, and its reliability.
It however lacks some frankly essential features from newer turntables - reverse, ultrapitch, pitch lock, USB support - but it’s still the most highly sought after and standard unit for the craft.
What makes this particular synthesizer the same - so valuable, so irreplaceable to OP’s craft - that drove them to such an insane level of deconstructing and reflashing the software? I, like - must know…XD
It's not necessarily that this keyboard has any particular irreplaceable mojo, but possibly because it feels like it _should_ be able to do something but doesn't. So many great devices feel unfinished because of glaring functionality holes that appear as you invest time in using them. That sensation of a machine being unjustly crippled can be a powerful driving force for that kind of project.
It might be some ease of programming things. I remember owning a few K1000 family synths (which are close relatives) and while they sounded great, found it very hard to step very far outside the presets with the built-in UI.