Pokegb: A gameboy emulator that only plays Pokémon Blue, in 68 lines of C++

andrewmcwatters · on June 4, 2021

> When I started pokegb, I spent most of the first few days implementing the CPU. Every time I hit an instruction that wasn't implemented, I'd implement it and see if it would run any further.

If you've ever written a drop-in replacement for some existing piece of software and filled the implementation with a bunch of `notimplemented()` (example)[1], you know exactly how satisfying this is. I love it.

[1]: https://github.com/Planimeter/lgf/blob/master/lua/framework/...

tiddles · on June 4, 2021

This x1000. I started implementing a JVM for shits and giggles a while ago, and once I had laid out ~220 panics for each opcode, it was incredibly satisfying to implement them one at a time to work towards bootstrapping the standard library.

Granted, most opcodes are implemented incorrectly as minor things like method and field resolution aren’t quite there yet.. as soon as I’ve implemented enough opcodes to actually run something and return/exit/throw, the correctness will be a bugger to fix :^)

binji · on June 4, 2021

JVM is a great choice! You might also want to check out WebAssembly too (full disclosure: I used to work on wasm). There aren't too many opcodes, and a lot of them are relatively simple math operations.

Once you get it working, you could run things like JSLinux! https://bellard.org/jslinux/vm.html?url=win2k.cfg&mem=192&gr...

javajosh · on June 4, 2021

Given your experience at this level of abstraction, do you have any thoughts on what an ideal set of opcodes looks like? E.g. something like Knuth's MMIX. Do you prefer RISC or CISC? What about RISC-V? How much do the designers of real-world micro-architectures talk to folks like you?

A bit of a tangent, I know, but it feels important.

binji · on June 4, 2021

I'm probably the wrong person to ask, to be honest. I was around for a lot of the design of Wasm, but wasn't really involved at that level.

That said, in my opinion the ideal set of opcodes really depends on your goal. There were many goals for Wasm, but I think there was a focus on keeping it small and simple. Originally it was AST-based, but it was changed to a stack machine to reduce size. It was also designed to be AoT or JIT compiled, so the opcode layout is not particularly friendly to hardware decoding or interpreters (although people have made some very high quality Wasm interpreters).

And of course there were a lot of discussions and disagreements about the best way forward: AST vs. register VM vs. stack machine. Structured control flow vs. goto. How to handle unreachable code. How to store integer literals (LEB vs. prefix byte). What the text format should look like (sexprs vs. ...?) etc.
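For anyone unfamiliar with the "LEB" option mentioned above: it refers to LEB128, the variable-length integer encoding. The following is a minimal sketch of the unsigned variant, not code from any Wasm implementation; the names `encode_uleb128` and `decode_uleb128` are made up for illustration, and the decoder assumes well-formed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode an unsigned value as LEB128: 7 payload bits per byte, lowest bits
// first, with the top bit set on every byte except the last one.
std::vector<uint8_t> encode_uleb128(uint64_t value) {
  std::vector<uint8_t> out;
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0) byte |= 0x80;  // continuation bit: more bytes follow
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// Decode a value starting at bytes[pos] and advance pos past it.
// Assumption: the input is well formed (no bounds or overflow checks here).
uint64_t decode_uleb128(const std::vector<uint8_t>& bytes, std::size_t& pos) {
  uint64_t result = 0;
  int shift = 0;
  while (true) {
    uint8_t byte = bytes[pos++];
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) break;  // last byte of this value
    shift += 7;
  }
  return result;
}
```

Values below 128 take a single byte, which is the appeal for a size-focused format; the cost is that a decoder cannot know an integer's length without reading it.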

ywei3410 · on June 5, 2021

Are there any meeting notes/chat logs for those discussions? It would be fascinating to get some insight into the design process for a modern VM.

binji · on June 5, 2021

Yes, you can find all the meeting notes here: https://github.com/webassembly/meetings

However there were a lot of discussions that were not in those meetings and were in smaller groups or held in GitHub issues or PRs. Most of these were in https://github.com/webassembly/design.

tiddles · on June 5, 2021

That’s a great idea, I’m not that familiar with wasm and the best way to learn is of course to implement a VM :)

Cheers for the link to the meeting notes in the other comment chain, sounds like a great rabbit hole

kfrane · on June 6, 2021

You're right about wasm not being too complicated. I learned this from watching David Beazley code a WebAssembly interpreter in under an hour. And it doesn't require any knowledge about it beforehand.

https://www.youtube.com/watch?v=r-A78RgMhZU

skitter · on June 4, 2021

I've just started the same, it's definitely fun. One thing that surprised me about field resolution is that a class can have multiple fields with the same name (if they have different types).

tiddles · on June 5, 2021

The gigantic spec has so many surprises hidden in it - the use of a bespoke “modified utf8” for native strings was a fun discovery.
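For readers who haven't run into it, here is a rough sketch of how the JVM's "modified UTF-8" differs from the standard encoding. It is illustrative only: the function name `encode_modified_utf8` is hypothetical, and the input is assumed to be a sequence of UTF-16 code units, which is how the JVM models strings.

```cpp
#include <string>

// Encode UTF-16 code units as JVM-style "modified UTF-8".
// Two differences from standard UTF-8:
//  - U+0000 becomes the two bytes C0 80, so the output never contains a
//    zero byte;
//  - each surrogate is encoded on its own as three bytes, so a
//    supplementary character takes six bytes instead of four.
std::string encode_modified_utf8(const std::u16string& in) {
  std::string out;
  for (char16_t c : in) {
    if (c >= 0x0001 && c <= 0x007f) {
      out.push_back(static_cast<char>(c));  // plain ASCII, one byte
    } else if (c <= 0x07ff) {               // two bytes; also covers U+0000
      out.push_back(static_cast<char>(0xc0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3f)));
    } else {                                // three bytes, incl. surrogates
      out.push_back(static_cast<char>(0xe0 | (c >> 12)));
      out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3f)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3f)));
    }
  }
  return out;
}
```

A decoder has to accept the same two quirks in reverse, which is why a stock UTF-8 library is not quite a drop-in fit.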

Good luck with the mammoth project! I think the fun will really start when it comes to running mini programs to test against a reference implementation and finding out we implemented core parts like method/field/(super)interface resolution totally wrong.

Oh, and optimisation too. It’d be cool to implement a JIT but that’s a long way away

binji · on June 4, 2021

It really is one of the best things about emulation development. It's surprising how quickly you go from "nothing works" to "I'm playing Super Mario Bros!"

hota_mazi · on June 5, 2021

Exactly this. I wrote an Apple ][ emulator, so I started by writing a 6502 emulator, and this is the only approach. Start with a giant switch, implement a few opcodes, run a program, wait until it reaches the not implemented case, implement it, repeat.

It is a surprisingly satisfying activity.
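The loop described above can be sketched roughly as follows. This is an assumption-laden toy, not code from any emulator mentioned in the thread: the `Cpu` struct and `step` function are hypothetical, only four real 6502 opcodes are filled in, and everything else falls through to the "not implemented" case.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical, heavily trimmed 6502 state: just enough for the cases below.
struct Cpu {
  uint8_t a = 0, x = 0;
  bool zero = false, negative = false;
  uint16_t pc = 0;
};

// LDA, TAX and INX all update the zero and negative flags the same way.
static void set_zn(Cpu& cpu, uint8_t value) {
  cpu.zero = (value == 0);
  cpu.negative = (value & 0x80) != 0;
}

// Execute one instruction. Returns false at the first opcode that has no
// case yet, which is the cue to go and implement it.
// Assumption: the program and its operands lie within mem.
bool step(Cpu& cpu, const std::vector<uint8_t>& mem) {
  uint8_t opcode = mem[cpu.pc++];
  switch (opcode) {
    case 0xA9:  // LDA #imm
      cpu.a = mem[cpu.pc++];
      set_zn(cpu, cpu.a);
      return true;
    case 0xAA:  // TAX
      cpu.x = cpu.a;
      set_zn(cpu, cpu.x);
      return true;
    case 0xE8:  // INX
      cpu.x = static_cast<uint8_t>(cpu.x + 1);
      set_zn(cpu, cpu.x);
      return true;
    case 0xEA:  // NOP
      return true;
    default:
      std::fprintf(stderr, "unimplemented opcode $%02X at $%04X\n",
                   static_cast<unsigned>(opcode),
                   static_cast<unsigned>(cpu.pc - 1));
      return false;
  }
}
```

Calling `step` in a loop until it returns false gives the workflow from the comment: run, read the opcode in the error message, add a case, repeat.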

toast0 · on June 5, 2021

On topic, I had tremendous fun writing the CPU emulation for my maybe 1/3rd done NES emulator (that I likely won't finish, but who knows).

Off topic, I wouldn't do a 6502 as a giant switch, you'll likely miss out on a lot of the shared logic. If you decode the instructions differently, you can easily reuse logic for addressing modes and what not.

I really like how the opcodes are laid out in the table on this wiki http://wiki.nesdev.com/w/index.php/CPU_unofficial_opcodes
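As a rough illustration of decoding by bit pattern (one possible reading of the suggestion above, not toast0's actual code): 6502 opcodes follow an `aaabbbcc` layout, and for the `cc == 01` group `aaa` picks the operation while `bbb` picks the addressing mode. The type and function names below are made up for this sketch.

```cpp
#include <cstdint>

// Operations of the cc == 01 group, in aaa order (000 through 111).
enum class AluOp { Ora, And, Eor, Adc, Sta, Lda, Cmp, Sbc };

// Addressing modes of the cc == 01 group, in bbb order (000 through 111).
enum class Mode {
  IndirectX, ZeroPage, Immediate, Absolute,
  IndirectY, ZeroPageX, AbsoluteY, AbsoluteX
};

struct Decoded {
  bool ok;  // false if the opcode is not an official cc == 01 instruction
  AluOp op;
  Mode mode;
};

// Split an opcode into its aaa/bbb/cc fields. Only the cc == 01 group is
// handled here; other groups use different mode tables.
Decoded decode_group1(uint8_t opcode) {
  if ((opcode & 0x03) != 0x01) return {false, AluOp::Ora, Mode::IndirectX};
  AluOp op = static_cast<AluOp>((opcode >> 5) & 0x07);
  Mode mode = static_cast<Mode>((opcode >> 2) & 0x07);
  // "STA #imm" (0x89) fits the pattern but is not an official instruction.
  if (op == AluOp::Sta && mode == Mode::Immediate) return {false, op, mode};
  return {true, op, mode};
}
```

With this split, one routine per addressing mode computes the operand address and one routine per operation consumes it, so eight operations and eight modes need sixteen small functions rather than dozens of separate switch cases.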

hota_mazi · on June 9, 2021

My giant switch leverages the bit patterns :)

My emulator passes the Klaus2m functional suite and my Apple ][ emulator is able to run quite a few protected games.

missblit · on June 4, 2021

I never did get very far with my NES emulator after getting an actual job (it was plan B for if no one liked my resume), but I do remember that being oddly satisfying [1]

[1] https://github.com/missblit/nesnes/blob/master/instructions_...

city41 · on June 4, 2021

It's one of my favorite things about programming. For example I made a Neo Geo tile viewer and decoding the format and figuring out just what needs to be done to get it working then suddenly BAM! tiles tiles everywhere! It's so satisfying.

makapuf · on June 4, 2021

I made a game console on an embedded system. The magic when you actually bitbang pixels to a screen and you see that first image is breathtaking.

seligman99 · on June 5, 2021

I've done things along these lines for things like the Advent of Code "int-code" assembly language implementations. It's satisfying to go from crashing because an opcode isn't implemented to something that produces output.

vmception · on June 4, 2021

“This is super impressive Binji we would love to have you on our team, now that you’ve got our attention pass this leetcode challenge with a random assortment of unrelated skills instead, and faster than the people that only study this specific kind of test”

userbinator · on June 5, 2021

I can't speak for Binji but in my experience those who know the low-level enough to be able to write emulators like this will also do very well on leetcode-like challenges.

mosselman · on June 5, 2021

“It’s nothing personal, everybody has to do these, it wouldn’t be fair to the rest otherwise.”

binji · on June 4, 2021

Ha, I guess we'll see when I go for my next gig :-)

tkiolp4 · on June 4, 2021

Ha. This made my day.

binji · on June 4, 2021

Hi all, I'm the author of this post! Happy to answer any questions. :-)

crowf · on June 4, 2021

I may have missed it in the post, but how is blue different than red? I realise that they have slightly different Pokemon, but is that enough to make the emulator run only blue?

binji · on June 4, 2021

Oops you're right, I should have mentioned that. Red works too, I just didn't want to write Blue/Red everywhere :)

glhaynes · on June 4, 2021

In the spirit of the project I'm surprised you didn't pick "Red" because it's one fewer characters. :)

northern-lights · on June 5, 2021

What about Yellow? If yes, you could shorten it as RBY which is more familiar to most old genners.

Y_Y · on June 4, 2021

It probably needs less memory, since you only need 8 bits to store blue but red needs 3.

Andrex · on June 4, 2021

I hope it's OK I blogged about your project!

https://gameboy.blog/2021/06/04/good-read-pokegb-a-gameboy-e...

shawntan · on June 4, 2021

Does the MissingNo glitch work?

binji · on June 4, 2021

Not sure, I think it should! Would be really cool if someone tried it out. I'm a little worried I didn't implement some necessary instructions/hardware features so the game isn't actually winnable :-}

shawntan · on June 5, 2021

Love the project. nostalgia++

belthesar · on June 4, 2021

This reminds me of the GB emulator that ran inside of Pokémon Stadium. Folks found some tricks to make it work with other cartridges, but it turned out the emulator was not complete. https://n64today.com/2019/06/02/play-game-boy-games-on-n64/

ReflectedImage · on June 4, 2021

I wrote a GB emulator, the main issue was debugging it. Anything subtly wrong tends to lead to a game crash very quickly.

The trick I used was to take an existing GB emulator, add a printf debugging statement to print out each current instruction and the registers. Add the same printf statements on mine. Then use diff on the outputs of both emulators.
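A minimal sketch of that trace-and-diff idea, under stated assumptions: the trace format and the names `trace_line` and `first_divergence` are invented here, and in practice plain `diff` on two log files does the same job. The only hard requirement is that both emulators print exactly the same format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Format one trace line. Hypothetical format: program counter, opcode,
// and the A and F registers; a real trace would list every register.
std::string trace_line(uint16_t pc, uint8_t opcode, uint8_t a, uint8_t f) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "PC=%04X OP=%02X A=%02X F=%02X",
                static_cast<unsigned>(pc), static_cast<unsigned>(opcode),
                static_cast<unsigned>(a), static_cast<unsigned>(f));
  return buf;
}

// Returns the 1-based number of the first line where the traces differ,
// or 0 if they are identical. A trace that ends early counts as differing
// at the line where it stops.
std::size_t first_divergence(std::istream& mine, std::istream& reference) {
  std::string a, b;
  std::size_t line = 0;
  while (true) {
    bool has_a = static_cast<bool>(std::getline(mine, a));
    bool has_b = static_cast<bool>(std::getline(reference, b));
    ++line;
    if (!has_a && !has_b) return 0;
    if (has_a != has_b || a != b) return line;
  }
}

// Convenience overload for traces that are already in memory.
std::size_t first_divergence(const std::string& mine,
                             const std::string& reference) {
  std::istringstream a(mine), b(reference);
  return first_divergence(a, b);
}
```

The first differing line is usually the one that matters: everything after it tends to be noise caused by the original mistake.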

It does play Pokemon. The emulator is called FireGB.

progbits · on June 4, 2021

Comparing to an existing, known-good emulator is probably the most convenient way. This can be printf debugging like you describe, or even trying to compare framebuffer contents etc.

However I feel like that "spoils" a bit of the fun. A more adventurous approach is to test with test ROMs [1] - they are simpler to follow by hand and discover why they don't work than real games.

Of course if some game ROM is relying on some more obscure hardware quirks this might not be enough. Just wanted to bring it to attention for anyone interested in writing and debugging their emulators.

[1]: https://gbdev.gg8.se/wiki/articles/Test_ROMs

ReflectedImage · on June 5, 2021

Ahh but it already passed the test ROMs at the point I turned to printf debugging.

ggambetta · on June 5, 2021

I did something similar with an X86, with enough opcodes implemented so it could run Goody (an ancient CGA game): https://gabrielgambetta.com/remakes.html Super fun exercise!

deepfriedrice · on June 5, 2021

Love the explanation of the DAA instruction. I wrote a gbc emulator in college, but couldn't figure out what that instruction was supposed to do.

I ended up just copying a lookup table from an open source emulator at the time, which of course didn't help my understanding: https://github.com/visualboyadvance-m/visualboyadvance-m/blo...

Funny to finally realize what this does, and now see how the lookup table works.
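For comparison with the lookup-table approach, here is a sketch of DAA written out as logic. It follows the commonly used formulation for the Game Boy CPU; it is not taken from the pokegb source or from the linked table, and the `Flags` struct and `daa` signature are assumptions made for this example.

```cpp
#include <cstdint>

// Game Boy flag bits, unpacked into bools for readability.
struct Flags {
  bool z;  // zero
  bool n;  // last operation was a subtraction
  bool h;  // half carry (carry or borrow across bit 3)
  bool c;  // carry
};

// Adjust A so that it holds a valid two-digit BCD value after an 8-bit add
// or subtract of two BCD operands. Updates z, h and c; n is left unchanged.
uint8_t daa(uint8_t a, Flags& f) {
  if (!f.n) {
    // After an addition: fix whichever digit overflowed past 9.
    if (f.c || a > 0x99) {
      a = static_cast<uint8_t>(a + 0x60);
      f.c = true;
    }
    if (f.h || (a & 0x0f) > 0x09) {
      a = static_cast<uint8_t>(a + 0x06);
    }
  } else {
    // After a subtraction: undo the borrow recorded in the flags.
    if (f.c) a = static_cast<uint8_t>(a - 0x60);
    if (f.h) a = static_cast<uint8_t>(a - 0x06);
  }
  f.z = (a == 0);
  f.h = false;
  return a;
}
```

For example, the BCD sum 15 + 27 produces 0x3C from the binary adder; the low digit is above 9, so DAA adds 0x06 and yields 0x42.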

binji · on June 5, 2021

I kind of dreaded writing that one, to be honest! I didn't quite cover all the details but I hope it made a little more sense. :)

greatNespresso · on June 5, 2021

That's really incredible! Still, for a total beginner like I am, 589 lines of C++ don't seem enough to cover it all, so what's the fuss? Where are the assets? What should I look for? Agreed, feel free to post on show HN to level up the debate

hoten · on June 5, 2021

It's an emulator, not the game data/ROM.

anonytrary · on June 5, 2021

  #include <SDL2/SDL.h>

How big is the final binary? It would be interesting to compare the final size to the total size of the GBC game. Although apparently

> Many features are not implemented!

So I'm not sure it's ready for that comparison.

mettamage · on June 4, 2021

You might want to add (obfuscated) to the title. The actual source code, while small, is much larger [1]. (edit: at 589 lines it's still impressive!)

For people who like this kind of thing and/or for people who want to be able to understand what is going on at all, you might like the course NAND2Tetris [2]. It's a good prerequisite to understanding this blog post if computer systems isn't a topic that one has explored before.

[1] https://gist.github.com/binji/395669d45e9005950232043ab4378a... -- the author notes this in the article.

[2] https://www.nand2tetris.org/

nighthawk454 · on June 4, 2021

Still, the unobfuscated one is only 589 lines including formatting, a copyright header, and what looks to be an 80-char line limit.

Seems sufficiently interesting either way

Aachen · on June 4, 2021

See, the truth is just as impressive to me, but knowing it's not a lie and a more than fair line length makes me appreciate it.

ngokevin · on June 4, 2021

The obfuscated one is awesome though, the code is in the shape of pokeballs!

userbinator · on June 4, 2021

I think instead of line count, either final binary size or source code size in bytes should be more common in claims of minimalism; the demoscene, for example, always uses binary size.

binji · on June 4, 2021

I got kinda dragged about this on reddit/twitter. Afterwards I posted on r/tinycode with the size in bytes:

https://www.reddit.com/r/tinycode/comments/nn5djb/pokegb_a_g...

iampims · on June 4, 2021

Don’t let any of these comments take away from how great this is, _especially_ with the in-depth blog post.

kzrdude · on June 5, 2021

would it be possible to push it down to IOCCC size? Would be fun to see it there, of course.

binji · on June 5, 2021

I think so, when I was doing the write-up I noticed a few things I could have improved (and a few bugs!)

IgorPartola · on June 4, 2021

Number of nodes in the AST or number of machine instructions might be more meaningful.

deathanatos · on June 4, 2021

That metric is game-able, too: you just implement a VM, and only count the instructions to the VM. The rest is one big binary blob of "data", occupying a single node in the AST of the host language.

Now, of course, that's basically cooking the books, but there's a wide gulf between a full VM & smaller, domain specific ones that might not look like VMs. (All or most of data-driven programming, really.)

Ekaros · on June 4, 2021

Final binary size including all external dependencies is probably the fairest metric. With heavy use of libraries things get messy.

account42 · on June 7, 2021

> including all external dependencies

The OS too?

Blikkentrekker · on June 5, 2021

Source code size can still be manipulated by using very small identifiers or reusing them.

There should rather be a normalization metric where all identifiers are treated as eight characters long.

The resulting binary will differ depending on the compiler and type of optimization, and could simply be manipulated by optimizing heavily for size.

smithza · on June 4, 2021

68 lines of code is pretty much arbitrary. It could have been far fewer; he just laid it all out to fit his poke ball pattern.

anonymousiam · on June 5, 2021

The author's tweet claimed 62 lines.

binji · on June 5, 2021

Yep, I made a mistake; it's actually 68.

dheera · on June 5, 2021

That, and obfuscation is also relative. You could call the 589 lines obfuscated as well.

anthk · on June 5, 2021

Some Forth wacko would reduce that to 40 lines...