> When I started pokegb, I spent most of the first few days implementing the CPU. Every time I hit an instruction that wasn't implemented, I'd implement it and see if it would run any further.
If you've ever written a drop-in replacement for some existing piece of software and filled the implementation with a bunch of `notimplemented()` (example)[1], you know exactly how satisfying this is. I love it.
This x1000. I started implementing a JVM for shits and giggles a while ago, and once I had laid out ~220 panics for each opcode, it was incredibly satisfying to implement them one at a time to work towards bootstrapping the standard library.
Granted, most opcodes are implemented incorrectly, as minor things like method and field resolution aren’t quite there yet. Once I’ve implemented enough opcodes to actually run something and return/exit/throw, the correctness will be a bugger to fix :^)
JVM is a great choice! You might also want to check out WebAssembly too (full disclosure: I used to work on wasm). There aren't too many opcodes, and a lot of them are relatively simple math operations.
Given your experience at this level of abstraction, do you have any thoughts on what an ideal set of opcodes looks like? E.g. something like Knuth's MMIX. Do you prefer RISC or CISC? What about RISC-V? How much do the designers of real-world micro-architectures talk to folks like you?
A bit of a tangent, I know, but it feels important.
I'm probably the wrong person to ask, to be honest. I was around for a lot of the design of Wasm, but wasn't really involved at that level.
That said, in my opinion the ideal set of opcodes really depends on your goal. There were many goals for Wasm, but I think there was a focus on keeping it small and simple. Originally it was AST-based, but it was changed to a stack machine to reduce size. It was also designed to be AoT or JIT compiled, so the opcode layout is not particularly friendly to hardware decoding or interpreters (although people have made some very high quality Wasm interpreters).
And of course there were a lot of discussions and disagreements about the best way forward: AST vs. register VM vs. stack machine. Structured control flow vs. goto. How to handle unreachable code. How to store integer literals (LEB vs. prefix byte). What the text format should look like (sexprs vs. ...?) etc.
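For anyone who hasn't met the "LEB" option mentioned above, here is a minimal Python sketch of unsigned LEB128, the variable-length integer encoding. The function names are my own, and this is an illustration rather than code from any Wasm implementation.

```python
from typing import Tuple


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, offset
        shift += 7
```

Small values take one byte and larger ones grow as needed, which is the size argument for it; the cost is a loop in the decoder instead of a fixed-width read.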
However, a lot of the discussions were not in those meetings; they happened in smaller groups or in GitHub issues and PRs. Most of these were in https://github.com/webassembly/design.
You're right about wasm not being too complicated. I learned this watching David Beazley code a WebAssembly interpreter in under an hour. And it doesn't require any knowledge about it beforehand.
I've just started the same, it's definitely fun. One thing that surprised me about field resolution is that a class can have multiple fields with the same name (if they have different types).
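To make that concrete, here is a rough Python sketch of why a field lookup has to be keyed on the name and the descriptor together. `ClassInfo` and `resolve_field` are hypothetical names, and the lookup is deliberately simplified: it only walks the superclass chain and ignores superinterfaces, which the real resolution rules also search.

```python
class ClassInfo:
    """Hypothetical, stripped-down stand-in for a loaded class."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.fields = {}  # (field name, descriptor) -> initial value

    def declare_field(self, name, descriptor, value=None):
        key = (name, descriptor)
        if key in self.fields:
            # Same name AND same descriptor is the actual conflict.
            raise ValueError(f"duplicate field {name}:{descriptor} in {self.name}")
        self.fields[key] = value


def resolve_field(cls, name, descriptor):
    """Return the class that declares the field (simplified lookup)."""
    while cls is not None:
        if (name, descriptor) in cls.fields:
            return cls
        cls = cls.superclass
    raise LookupError(f"no field {name}:{descriptor}")
```

With this, `x` with descriptor `I` and `x` with descriptor `Ljava/lang/String;` can sit side by side in one class, and a lookup keyed only on `"x"` would pick the wrong one half the time.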
The gigantic spec has so many surprises hidden in it - the use of a bespoke “modified utf8” for native strings was a fun discovery.
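For the curious, the two differences from standard UTF-8 as I understand them: U+0000 is written as the two bytes `C0 80` instead of a single zero byte, and characters above U+FFFF are written as a surrogate pair with each half encoded as three bytes. A minimal Python sketch of an encoder under those assumptions (`encode_modified_utf8` is my own name, not an API from anywhere):

```python
def encode_modified_utf8(text: str) -> bytes:
    """Encode text the way the JVM's modified UTF-8 does (sketch)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Supplementary character: split into a UTF-16 surrogate pair.
            cp -= 0x10000
            units = [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]
        else:
            units = [cp]
        for u in units:
            if 0x0001 <= u <= 0x007F:
                out.append(u)                         # plain ASCII, one byte
            elif u <= 0x07FF:
                # Two bytes; this branch also handles U+0000 -> C0 80.
                out.append(0xC0 | (u >> 6))
                out.append(0x80 | (u & 0x3F))
            else:
                # Three bytes, including each surrogate half.
                out.append(0xE0 | (u >> 12))
                out.append(0x80 | ((u >> 6) & 0x3F))
                out.append(0x80 | (u & 0x3F))
    return bytes(out)
```

The upshot is that an encoded string never contains a raw zero byte, and anything outside the BMP takes six bytes instead of four.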
Good luck with the mammoth project! I think the fun will really start when it comes to running mini programs to test against a reference implementation and finding out we implemented core parts like method/field/(super)interface resolution totally wrong.
Oh, and optimisation too. It’d be cool to implement a JIT but that’s a long way away
It really is one of the best things about emulation development. It's surprising how quickly you go from "nothing works" to "I'm playing Super Mario Bros!"
Exactly this. I wrote an Apple ][ emulator, so I started by writing a 6502 emulator, and this is the only approach. Start with a giant switch, implement a few opcodes, run a program, wait until it reaches the not implemented case, implement it, repeat.
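The loop described above can be sketched in a few lines of Python. This is an illustration, not anyone's actual emulator: `TinyCPU` is a made-up name, only four 6502 opcodes are handled, status flags are left out, and BRK is treated as "halt" for simplicity.

```python
class TinyCPU:
    """Toy 6502-flavoured core: A and X registers only, no flags."""

    def __init__(self, program):
        self.mem = bytearray(program)
        self.pc = 0
        self.a = 0
        self.x = 0
        self.halted = False

    def fetch(self):
        byte = self.mem[self.pc]
        self.pc += 1
        return byte

    def step(self):
        opcode = self.fetch()
        if opcode == 0xA9:      # LDA #imm
            self.a = self.fetch()
        elif opcode == 0xAA:    # TAX
            self.x = self.a
        elif opcode == 0xE8:    # INX
            self.x = (self.x + 1) & 0xFF
        elif opcode == 0x00:    # BRK, treated as halt in this sketch
            self.halted = True
        else:
            # The "not implemented" case: this tells you what to write next.
            raise NotImplementedError(
                f"opcode ${opcode:02X} at ${self.pc - 1:04X}"
            )

    def run(self):
        while not self.halted:
            self.step()
```

Feed it a program, read the exception, add a branch, repeat.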
On topic, I had tremendous fun writing the CPU emulation for my maybe 1/3rd done NES emulator (that I likely won't finish, but who knows).
Off topic, I wouldn't do a 6502 as a giant switch, you'll likely miss out on a lot of the shared logic. If you decode the instructions differently, you can easily reuse logic for addressing modes and what not.
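A rough sketch of what that kind of decoding can look like, in Python: a table maps each opcode to an (operation, addressing mode) pair, the mode computes an effective address, and the operation only ever sees that address. `Machine` and `DECODE` are hypothetical names, only a handful of opcodes are listed, and flags are not modelled.

```python
class Machine:
    def __init__(self, program):
        self.mem = bytearray(0x10000)
        self.mem[:len(program)] = program
        self.pc = 0
        self.a = 0
        self.x = 0

    def fetch(self):
        byte = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return byte

    # Addressing modes: each returns an effective address.
    def immediate(self):
        addr = self.pc                  # the operand byte itself
        self.pc = (self.pc + 1) & 0xFFFF
        return addr

    def zero_page(self):
        return self.fetch()

    def zero_page_x(self):
        return (self.fetch() + self.x) & 0xFF   # wraps within page zero

    def absolute(self):
        lo = self.fetch()
        hi = self.fetch()
        return lo | (hi << 8)

    # Operations: written once, reused for every addressing mode.
    def lda(self, addr):
        self.a = self.mem[addr]

    def ldx(self, addr):
        self.x = self.mem[addr]

    def sta(self, addr):
        self.mem[addr] = self.a

    def step(self):
        opcode = self.fetch()
        entry = DECODE.get(opcode)
        if entry is None:
            raise NotImplementedError(f"opcode ${opcode:02X}")
        operation, mode = entry
        operation(self, mode(self))


# opcode -> (operation, addressing mode); a small subset of the 6502 table
DECODE = {
    0xA9: (Machine.lda, Machine.immediate),
    0xA5: (Machine.lda, Machine.zero_page),
    0xB5: (Machine.lda, Machine.zero_page_x),
    0xAD: (Machine.lda, Machine.absolute),
    0xA2: (Machine.ldx, Machine.immediate),
    0xA6: (Machine.ldx, Machine.zero_page),
    0x85: (Machine.sta, Machine.zero_page),
    0x8D: (Machine.sta, Machine.absolute),
}
```

Adding another LDA variant is then one table row rather than another copy of the load logic.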
I never did get very far with my NES emulator after getting an actual job (it was plan B for if no one liked my resume), but I do remember that being oddly satisfying [1]
It's one of my favorite things about programming. For example, I made a Neo Geo tile viewer: decoding the format, figuring out just what needs to be done to get it working, then suddenly BAM! Tiles, tiles everywhere! It's so satisfying.
I've done things along these lines for things like the Advent of Code "int-code" assembly language implementations. It's satisfying to go from crashing because an opcode isn't implemented to something that produces output.
“This is super impressive Binji we would love to have you on our team, now that you’ve got our attention pass this leetcode challenge with a random assortment of unrelated skills instead, and faster than the people that only study this specific kind of test”
I can't speak for Binji, but in my experience those who know low-level details well enough to write emulators like this will also do very well on leetcode-like challenges.
I may have missed it in the post, but how is blue different than red? I realise that they have slightly different Pokemon, but is that enough to make the emulator run only blue?
Not sure, I think it should! Would be really cool if someone tried it out. I'm a little worried I didn't implement some necessary instructions/hardware features so the game isn't actually winnable :-}
This reminds me of the GB emulator that ran inside of Pokémon Stadium. Folks found some tricks to make it work with other cartridges, but it turned out the emulator was not complete. https://n64today.com/2019/06/02/play-game-boy-games-on-n64/
I wrote a GB emulator; the main issue was debugging it. Subtly wrong behavior tends to lead to a game crash very quickly.
The trick I used was to take an existing GB emulator, add a printf debugging statement to print out each current instruction and the registers. Add the same printf statements on mine. Then use diff on the outputs of both emulators.
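If the traces get long, a tiny script that stops at the first mismatch is handier than a full diff. A minimal Python sketch, assuming both emulators log one line per instruction in the same format (the helper name and the trace format in the comment are made up for illustration):

```python
from itertools import zip_longest


def first_divergence(reference_trace, candidate_trace):
    """Compare two traces line by line.

    Each argument is an iterable of strings, e.g. an open log file or a
    list of lines such as "PC=0150 OP=C3 A=01 ..." (format is up to you).
    Returns (line_number, reference_line, candidate_line) for the first
    mismatch, or None if the traces are identical. If one trace is
    shorter, its missing line is reported as None.
    """
    pairs = zip_longest(reference_trace, candidate_trace)
    for line_number, (ref, cand) in enumerate(pairs, start=1):
        if ref != cand:
            return line_number, ref, cand
    return None
```

The instruction just before the reported line is usually the one that is implemented wrong, since that is where the register state first went off.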
It does play Pokemon. The emulator is called FireGB.
Comparing to an existing, known-good emulator is probably the most convenient way. This can be printf debugging like you describe, or even trying to compare framebuffer contents etc.
However I feel like that "spoils" a bit of the fun. A more adventurous approach is to test with test ROMs [1] - they are simpler to follow by hand and discover why they don't work than real games.
Of course if some game ROM is relying on some more obscure hardware quirks this might not be enough. Just wanted to bring it to attention for anyone interested in writing and debugging their emulators.
I did something similar with an X86, with enough opcodes implemented so it could run Goody (an ancient CGA game): https://gabrielgambetta.com/remakes.html Super fun exercise!
That's really incredible! Still, for a total beginner like I am, 589 lines of C++ don't seem enough to cover it all, so what's the fuss? Where are the assets? What should I look for? Agreed, feel free to post on Show HN to level up the debate.
You might want to add (obfuscated) to the title. The actual source code, while small, is much larger [1]. (edit: at 589 lines it's still impressive!)
For people who like this kind of thing and/or for people who want to be able to understand what is going on at all, you might like the course NAND2Tetris [2]. It's a good prerequisite to understanding this blog post if computer systems isn't a topic that one has explored before.
I think instead of line count, either final binary size or source code size in bytes should be more common in claims of minimalism; the demoscene, for example, always uses binary size.
That metric is game-able, too: you just implement a VM, and only count the instructions to the VM. The rest is one big binary blob of "data", occupying a single node in the AST of the host language.
Now, of course, that's basically cooking the books, but there's a wide gulf between a full VM & smaller, domain specific ones that might not look like VMs. (All or most of data-driven programming, really.)
[1]: https://github.com/Planimeter/lgf/blob/master/lua/framework/...