Patching an embedded synthesiser OS from 1996 with Ghidra

mmastrac · on April 24, 2022

If you're ever wondering what the load address of a piece of code is, my favourite trick is to:

1) identify relative addresses of all C-style strings from the start of the binary

2) identify all absolute read addresses

3) find the load address where the most read addresses correspond to the most strings

This solves the problem for me in the vast majority of cases. If you don't have strings, function prologues work as well.

I don't think there's any tooling for this yet. I've had to write the same algorithm over and over. It works regardless of any header information at the start of the binary, though it will fail if there is data between chunks (but you may see multiple valid alignments, which can hint at something like this happening).
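A rough sketch of the three steps above, in Python. This is not the commenter's actual code. It assumes 32-bit big-endian pointers (as on a 68k), approximates "absolute read addresses" by treating every aligned 32-bit word in the image as a candidate pointer (a real tool would take them from a disassembly), and assumes the load address is aligned to `align`. The names `string_offsets` and `guess_load_address` are made up for illustration.

```python
import re
import struct
from collections import Counter


def string_offsets(image: bytes, min_len: int = 4) -> set[int]:
    """Offsets, relative to the start of the image, of printable NUL-terminated strings."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}\x00" % min_len)
    return {m.start() for m in pattern.finditer(image)}


def guess_load_address(image: bytes, endian: str = ">", align: int = 0x100, top: int = 3):
    """Return the `top` candidate load addresses as (address, votes) pairs.

    Assumption: every aligned 32-bit word is a possible absolute address.
    Each (pointer, string) pair votes for the base that would make the
    pointer land exactly on the start of that string.
    """
    strings = string_offsets(image)
    count = len(image) // 4
    pointers = set(struct.unpack_from(f"{endian}{count}I", image))

    votes = Counter()
    for pointer in pointers:          # quadratic; fine for a sketch, slow on big images
        for offset in strings:
            base = pointer - offset
            if base >= 0 and base % align == 0:
                votes[base] += 1
    return votes.most_common(top)
```

The correct load address should stand out with far more votes than any other candidate. Several strong candidates at once may be the "data between chunks" situation mentioned above.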

CaciaraAsAServi · on April 25, 2022

Yep! But before doing all of this, check if any .so/.dll for emulating the raw piece of code you are studying is available, so you can do this:

1. check if in the .so/.dll you can spot any function referring to any recognizable (occurring only once) kind of binary blob - like images, or strings too

2. find the same function in the raw image, Ghidra's decompiler helps so much in this

3. find the address supposed to be pointing to the same binary blob (assuming that this blob is the same in the emulated and the real version)

4. scan the raw image for that same blob you have in the emulated version

5. do the maths to find the base address of the raw image.

Obviously all of this only works if everything in the raw binary image refers to the same flat memory space and the bootloader/program accepting the image isn't doing any partial relocation.
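Step 5 ("do the maths") is just a subtraction once the blob has been located. A minimal sketch, assuming a flat memory space as described above. `base_from_known_blob` is a hypothetical helper name, and `pointer_value` is the absolute address you read out of the matching function in the raw image.

```python
def base_from_known_blob(raw_image: bytes, blob: bytes, pointer_value: int) -> int:
    """Base address = absolute address of the blob minus its offset in the raw image."""
    offset = raw_image.find(blob)
    if offset < 0:
        raise ValueError("blob not found in raw image")
    if raw_image.find(blob, offset + 1) >= 0:
        # The trick relies on the blob occurring only once.
        raise ValueError("blob occurs more than once; pick a more distinctive one")
    return pointer_value - offset
```

For example, if the blob sits at offset 0x40 in the file and the code refers to it at 0x08000040, the image is loaded at 0x08000000.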

Don't ask me about that time I painfully spent so much time without knowing I had one file with the odd name of "emulator.dll" staring at me all that time...

logbiscuitswave · on April 24, 2022

I wish I had seen this article sooner. It’s a really nice and succinct primer to some of the basics of using Ghidra for firmware analysis.

I have an expensive robotic cat toy that the manufacturer stopped supporting and I’ve been working on reverse engineering the firmware in my free time to better understand how it works. I have no background in reverse engineering so I’ve been learning a lot about identifying hardware, reading data sheets, and of course fumbling through Ghidra to disassemble the firmware.

After getting your bearings it becomes pretty easy to recognize the patterns of various standard C library functions like strlen, memcpy, etc. Others can be more challenging. Of course, with bespoke embedded hardware it can be much more difficult.

Ghidra really is an amazingly useful tool (clunky, yes — but very powerful and of course it’s free.) It makes me curious about commercial offerings and if they are worth buying as a hobbyist.

psobot · on April 24, 2022

I couldn't think of a better response to a blog post than "I wish I had seen this article sooner." Thank you!

russdill · on April 24, 2022

It really feels like magic sometimes. When using IDA I always knew that I was working with assembly. With Ghidra, once everything has decent names it really feels like I have a copy of the original source code.

I used it for the Disney BB-8 robot firmware, which was a fun hobby project.

tomduncalf · on April 24, 2022

Another interesting project along these lines: a couple of hackers managed to reverse engineer the firmware for the Elektron Machinedrum (a classic early 00’s digital drum machine) and add a bunch of new functionality, such as new effects, a melodic mode and more.

You can download it from https://www.elektronauts.com/t/machinedrum-sps1-uw-x-06-rele.... They’ve unfortunately kept the details of how they did it somewhat private (I believe at the request of Elektron), though some notes on their initial reverse engineering can be found here: https://github.com/jmamma/MIDICtrl20_MegaCommand/issues/88

thewebcount · on April 24, 2022

Shout out to Hex Fiend! My favorite feature is the template system[0]. It makes it much easier to figure out file formats for which you have no documentation. You write a little Tcl code to describe the parts of the format you understand as you go.

[0] https://github.com/HexFiend/HexFiend/tree/master/templates

jonhohle · on April 25, 2022

That’s awesome. I use Hex Fiend frequently and never knew about templates.

ajxs · on April 24, 2022

Very cool! This is really great work! It's awesome that there's been so many synthesiser related topics on Hacker News lately. I did a similar project myself to disassemble, and fully annotate the firmware for the Yamaha DX7: https://github.com/ajxs/yamaha_dx7_rom_disassembly

The biggest hint I could give anyone looking to disassemble a synthesiser operating system is to direct your attention towards the code processing individual MIDI messages. The code is invariably a huge mess; however, you'll be able to very quickly identify the operating system's core functions, since the corresponding SysEx parameter numbers clearly identify what functionality you're looking at.

skobovm · on April 24, 2022

Just this afternoon I was working on reversing a closed source library that wasn’t working on M1 under Rosetta, using Ghidra. If you get the chance, you should do a post on how you actually modified the program to get it to do what you want (as long as the fix isn’t trivial, like changing a constant).

My exercise today made me realize just how much more difficult the modification of the binary is than simply understanding it, as well as how much I hate the x86 architecture (and CISC in general).

underdeserver · on April 24, 2022

Cool!

It's unfortunate that he wouldn't share what the goal was - what he wanted to patch and why.

psobot · on April 24, 2022

I'll give you a hint - the post was originally twice as long and _did_ go into detail about why I wanted to patch something, but a reviewer of the post pointed out that I could potentially have run afoul of one or more laws in one or more countries if I had succeeded in doing so.

ILMostro7 · on April 24, 2022

Thank you very much for the follow-up as well. At the risk of generalizing, it's frustrating, to say the least, that a 26-year-old device can be treated like safeguarded property or an industry secret while such devices are happily cast into the dustbin of a product cycle, or lost as insignificant pieces of company mergers.

wrs · on April 24, 2022

This product line is not in the dustbin at all. Kurzweil just introduced the K2700, which still includes the VAST engine introduced in the K2000. It’s pretty amazing how long and continuous the Kurzweil technology timeline has been.

cesaref · on April 24, 2022

The K2000 was ahead of its time, and is still an amazing instrument today. I'm intrigued to discover that MAME is adding support for it, which sounds like an awesome project, I wonder if I can help in any way (K2000R is still here, but not been booted for a while).

tomduncalf · on April 24, 2022

Hey Cesare, fancy seeing you here ;)

As the other comment notes, I’m not sure if MAME can actually properly emulate any music hardware. There’s a skeleton Elektron Machinedrum/Monomachine in there for example but it can’t really do anything interesting (although this thread talks a bit about how some hackers used it to help reverse engineer and ultimately create their own firmware for the Machinedrum, see my other comment for more details: https://github.com/jmamma/MIDICtrl20_MegaCommand/issues/88)

Another project which may be of interest is this emulation of the Motorola DSP563xx DSP, which was used in a lot of classic late 90s hardware synths: https://dsp56300.wordpress.com. It can actually run Access Virus ROMs pretty much perfectly. Apparently the performance is pretty good on an M1; it’s usable but crackly on my i9 Mac. They’re hoping to be able to emulate many more synths which used the same DSP, but for now the Virus is the focus.

psobot · on April 24, 2022

MAME seems to have had "skeleton driver" (i.e.: very rough and unfinished) support for a number of 80s- and 90s-era synths, given that they're very similar to video game hardware from the same era. In my reverse engineering for this project, I realized that the MAME driver might now boot, but there's an extremely long way to go before audio comes out; the K2 series used custom ASICs for waveform generation that have no public documentation. I've heard that some folks much smarter than me know how to de-cap chips and reverse engineer them from the hardware, but I'll be extremely impressed if someone can make that happen and emulate those chips efficiently.

underdeserver · on April 24, 2022

Oh wow... I'd be interested to hear what laws those might be before I publish something similar myself.

squarefoot · on April 24, 2022

Just speculating, but it could be related to allowing a digital dump of the various waveform samples, so that they could be used to create, for example, a software plugin that replicates some or all of the functions of that instrument. Modern hardware would allow the recreation of many old instruments at a fraction of their cost, in hardware and/or software, so it's understandable that manufacturers are fiercely protecting their IP.

shxdow · on April 24, 2022

Very interesting work indeed.

I second the sentiment: knowledge of the main goal would add context about the reverse engineering process.

cmrdporcupine · on April 24, 2022

Another approach for reverse engineering the format of the KOS file might have been to run the bootloader code through a 68k emulator, step through it with a debugger (and resolve any undefined syscalls or built-in functions etc.), and observe what it's actually doing.

davtbaum · on April 24, 2022

If you're into this I highly recommend checking out Bob Grieb's work[1]. He has reversed MCUs on many vintage synths and has a website dedicated to his explorations. Really impressive stuff...

[1] http://tauntek.com/synthesizerinfo.htm

aDfbrtVt · on April 24, 2022

Interesting, the checksum looks like a standard LFSR but uses addition instead of XOR.

1) Seed the checksum with initial_value

2) Add a checksum-sized unit of data to the checksum

3) Shift up by 1 bit while feeding back the MSB to the LSB.

4) GOTO 2 until you have consumed all the data.

Not sure why you would process with usual addition instead of GF2 addition.
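The four steps above can be sketched as follows. This is only an illustration of the scheme as described in this comment, not the K2000's actual routine: the 32-bit width, the big-endian word order and the default seed of 0 are all assumptions, and `rotate_add_checksum` is a made-up name.

```python
def rotate_add_checksum(data: bytes, initial_value: int = 0, width_bits: int = 32) -> int:
    """Add each word to the checksum, then rotate the checksum left by one bit."""
    mask = (1 << width_bits) - 1
    step = width_bits // 8
    if len(data) % step:
        raise ValueError("data length must be a multiple of the checksum width")

    checksum = initial_value & mask                      # 1) seed
    for i in range(0, len(data), step):
        word = int.from_bytes(data[i:i + step], "big")   # assumed big-endian
        checksum = (checksum + word) & mask              # 2) ordinary addition, not XOR
        # 3) shift up by one bit, feeding the old MSB back into the LSB
        checksum = ((checksum << 1) | (checksum >> (width_bits - 1))) & mask
    return checksum                                      # 4) loop until data is consumed
```

Swapping the `+` for `^` in step 2 gives the XOR (GF(2)) variant the comment compares it to.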

lostgame · on April 24, 2022

This is very, very impressive.

But I am almost existentially disappointed by the opening sentence - ‘For reasons I won’t get into’…

It’s like - why?

The reasons we drive ourselves to do these frankly insane hacking experiments are almost always as interesting as the process itself for me.

The reason for that is - in this case - there are literally thousands of sampler synths with frankly a shitton more features than even what OP has implemented - why use this particular one?

The turntablist community - for instance - vastly prefers a frankly ancient and specific model of turntable (Technics SL-1200 or equivalent) - therefore the community that mods and updates this decades-old hardware is dedicated and does similarly amazing things.

The SL-1200 is preferred for its build quality, its amazing motor, its weight, and its reliability.

It however lacks some frankly essential features from newer turntables - reverse, ultrapitch, pitch lock, USB support - but it’s still the most highly sought after and standard unit for the craft.

What makes this particular synthesizer the same - so valuable, so irreplaceable to OP’s craft - that drove them to such an insane level of deconstructing and reflashing the software? I, like - must know…XD

I mean - cool flex - but why?

psobot · on April 24, 2022

(also posted in response to another comment)

I'll give you a hint - the post was originally twice as long and _did_ go into detail about why I wanted to patch something, but a reviewer of the post pointed out that I could potentially have run afoul of one or more laws in one or more countries if I had succeeded in doing so.

TedDoesntTalk · on April 25, 2022

Free legal advice from non-lawyers is worth the same as juggling advice from cats.

speed_spread · on April 24, 2022

It's not necessarily that this keyboard has any particular irreplaceable mojo, but possibly because it feels like it _should_ be able to do something but doesn't. So many great devices feel unfinished because of glaring functionality holes that appear as you invest time in using them. That sensation of a machine being unjustly crippled can be a powerful driving force for that kind of project.

pinewurst · on April 24, 2022

It might be some ease of programming things. I remember owning a few K1000 family synths (which are close relatives) and while they sounded great, I found it very hard to step very far outside the presets with the built-in UI.

unwind · on April 24, 2022

Very cool!

My observations:

- The renaming into *source = *destination was very confusing.

- I expected (sweet, sweet) 68k assembly, but it's all decompiled to C!

- For some reason MAME supports this musical instrument, and OP's work helped that project boot their emulation, sweet!

TAForObvReasons · on April 24, 2022

The screenshot is mis-labeled, but the body text is correct.

> It looks like we’re copying a bunch of data from ROM into RAM - specifically from 0x0001860a to 0x02130000.

The variable on line 33 should have been named `src` and the one on line 34 should have been named `dst`.

harel · on April 24, 2022

That synth was the Rolls Royce of synths in the 90s. Price wise at least. It seemed like it could do anything at the time.

jakedata · on April 25, 2022

If someone digs into MOTU synth firmware, there is an easter egg picture of a couple of dogs. No idea how to access it, but I've seen it.