Pokegb: A gameboy emulator that only plays Pokémon Blue, in 68 lines of C++ (binji.github.io)
551 points by brundolf on June 4, 2021 | 63 comments



> When I started pokegb, I spent most of the first few days implementing the CPU. Every time I hit an instruction that wasn't implemented, I'd implement it and see if it would run any further.

If you've ever written a drop-in replacement for some existing piece of software and filled the implementation with a bunch of `notimplemented()` (example)[1], you know exactly how satisfying this is. I love it.

[1]: https://github.com/Planimeter/lgf/blob/master/lua/framework/...


This x1000. I started implementing a JVM for shits and giggles a while ago, and once I had laid out ~220 panics for each opcode, it was incredibly satisfying to implement them one at a time to work towards bootstrapping the standard library.

Granted, most opcodes are implemented incorrectly as minor things like method and field resolution aren’t quite there yet.. as soon as I’ve implemented enough opcodes to actually run something and return/exit/throw, the correctness will be a bugger to fix :^)


JVM is a great choice! You might also want to check out WebAssembly too (full disclosure: I used to work on wasm). There aren't too many opcodes, and a lot of them are relatively simple math operations.

Once you get it working, you could run things like JSLinux! https://bellard.org/jslinux/vm.html?url=win2k.cfg&mem=192&gr...
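
To make the "relatively simple math operations" point concrete, here is a minimal sketch of a stack-machine evaluator. The opcode names, the `Instr` struct, and `Run` are hypothetical and do not follow the real WebAssembly binary encoding; values wrap as unsigned 32-bit integers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy opcodes; NOT the real WebAssembly encoding.
enum class Op : uint8_t { Const, Add, Sub, Mul };

struct Instr {
  Op op;
  uint32_t imm;  // Only meaningful for Op::Const.
};

// Runs a straight-line program and returns the value on top of the stack.
uint32_t Run(const std::vector<Instr>& program) {
  std::vector<uint32_t> stack;
  for (const Instr& in : program) {
    if (in.op == Op::Const) {
      stack.push_back(in.imm);
      continue;
    }
    // Every binary operator pops two operands and pushes one result.
    assert(stack.size() >= 2);
    uint32_t rhs = stack.back(); stack.pop_back();
    uint32_t lhs = stack.back(); stack.pop_back();
    switch (in.op) {
      case Op::Add: stack.push_back(lhs + rhs); break;
      case Op::Sub: stack.push_back(lhs - rhs); break;
      case Op::Mul: stack.push_back(lhs * rhs); break;
      case Op::Const: break;  // Handled above.
    }
  }
  assert(!stack.empty());
  return stack.back();
}
```

Running the program `Const 2, Const 3, Add, Const 4, Mul` evaluates (2 + 3) * 4. A real interpreter would also need control flow, locals, memory, and validation.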


Given your experience at this level of abstraction, do you have any thoughts on what an ideal set of opcodes looks like? E.g. something like Knuth's MMIX. Do you prefer RISC or CISC? What about RISC-V? How much do the designers of real-world micro-architectures talk to folks like you?

A bit of a tangent, I know, but it feels important.


I'm probably the wrong person to ask, to be honest. I was around for a lot of the design of Wasm, but wasn't really involved at that level.

That said, in my opinion the ideal set of opcodes really depends on your goal. There were many goals for Wasm, but I think there was a focus on keeping it small and simple. Originally it was AST-based, but it was changed to a stack machine to reduce size. It was also designed to be AoT or JIT compiled, so the opcode layout is not particularly friendly to hardware decoding or interpreters (although people have made some very high quality Wasm interpreters).

And of course there were a lot of discussions and disagreements about the best way forward: AST vs. register VM vs. stack machine. Structured control flow vs. goto. How to handle unreachable code. How to store integer literals (LEB vs. prefix byte). What the text format should look like (sexprs vs. ...?) etc.
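
For readers unfamiliar with the LEB option mentioned above, here is a minimal sketch of decoding an unsigned LEB128 integer. The function name `DecodeUleb128` is hypothetical, and the sketch uses an assertion in place of real error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes an unsigned LEB128 integer starting at bytes[pos].
// Each byte carries 7 payload bits, least significant group first. The high
// bit is set on every byte except the last. Advances pos past the value.
uint32_t DecodeUleb128(const std::vector<uint8_t>& bytes, std::size_t& pos) {
  uint32_t result = 0;
  for (int shift = 0; shift < 32; shift += 7) {
    assert(pos < bytes.size());  // Sketch only: no real error handling.
    uint8_t byte = bytes[pos++];
    result |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;  // Continuation bit clear: last byte.
  }
  return result;
}
```

For example, the bytes `E5 8E 26` decode to 624485. Small values fit in a single byte, which is why this kind of encoding helps keep a binary format compact.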


Are there any meeting notes/chat logs for those discussions? It would be fascinating to get some insight into the design process for a modern VM.


Yes, you can find all the meeting notes here: https://github.com/webassembly/meetings

However there were a lot of discussions that were not in those meetings and were in smaller groups or held in GitHub issues or PRs. Most of these were in https://github.com/webassembly/design.


That’s a great idea, I’m not that familiar with wasm and the best way to learn is of course to implement a VM :)

Cheers for the link to the meeting notes in the other comment chain, sounds like a great rabbit hole


You're right about wasm not being too complicated. I learned this while watching David Beazley code a WebAssembly interpreter in under an hour. And it doesn't require any prior knowledge of it.

https://www.youtube.com/watch?v=r-A78RgMhZU


I've just started the same, it's definitely fun. One thing that surprised me about field resolution is that a class can have multiple fields with the same name (if they have different types).
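
One way to account for this is to key field lookup on both the name and the type descriptor. The sketch below assumes a hypothetical `FieldTable` with made-up slot numbers; it covers lookup within a single class only, not the walk through superclasses and superinterfaces that full resolution requires.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// A field is identified by its name AND its type descriptor, so both go into
// the lookup key. FieldTable and the slot numbers are hypothetical.
using FieldKey = std::pair<std::string, std::string>;  // {name, descriptor}

struct FieldTable {
  std::map<FieldKey, int> slots;  // Maps a field to its storage slot.

  void Add(const std::string& name, const std::string& descriptor, int slot) {
    FieldKey key{name, descriptor};
    // The same name with the same descriptor twice would be a duplicate.
    assert(slots.count(key) == 0);
    slots[key] = slot;
  }

  // Returns the slot index, or -1 if no field matches both name and type.
  int Find(const std::string& name, const std::string& descriptor) const {
    auto it = slots.find(FieldKey{name, descriptor});
    return it == slots.end() ? -1 : it->second;
  }
};
```

With this layout, a field `x` with descriptor `I` (int) and another `x` with descriptor `Ljava/lang/String;` can coexist, and a lookup by name alone would be ambiguous.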


The gigantic spec has so many surprises hidden in it - the use of a bespoke “modified utf8” for native strings was a fun discovery.

Good luck with the mammoth project! I think the fun will really start when it comes to running mini programs to test against a reference implementation and finding out we implemented core parts like method/field/(super)interface resolution totally wrong.

Oh, and optimisation too. It’d be cool to implement a JIT but that’s a long way away
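
For anyone curious about the "modified utf8" mentioned above, here is a minimal sketch of an encoder. As I understand the class file format, it differs from standard UTF-8 in two ways: U+0000 is written as the two bytes `C0 80`, and supplementary characters are written as a UTF-16 surrogate pair with each surrogate encoded in three bytes. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends one UTF-16 code unit in the 1-, 2-, or 3-byte form.
static void AppendUnit(std::string& out, uint16_t u) {
  if (u != 0 && u <= 0x7F) {
    out += static_cast<char>(u);
  } else if (u <= 0x7FF) {  // Includes U+0000, which becomes C0 80.
    out += static_cast<char>(0xC0 | (u >> 6));
    out += static_cast<char>(0x80 | (u & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (u >> 12));
    out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (u & 0x3F));
  }
}

// Encodes a single Unicode code point as modified UTF-8.
std::string EncodeModifiedUtf8(uint32_t cp) {
  assert(cp <= 0x10FFFF);
  std::string out;
  if (cp >= 0x10000) {
    // Supplementary characters are split into a surrogate pair, and each
    // surrogate is encoded separately (six bytes in total).
    uint32_t v = cp - 0x10000;
    AppendUnit(out, static_cast<uint16_t>(0xD800 | (v >> 10)));
    AppendUnit(out, static_cast<uint16_t>(0xDC00 | (v & 0x3FF)));
  } else {
    AppendUnit(out, static_cast<uint16_t>(cp));
  }
  return out;
}
```

A consequence is that an encoded string never contains an embedded zero byte, and that a character outside the Basic Multilingual Plane takes six bytes instead of the four that standard UTF-8 would use.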


It really is one of the best things about emulation development. It's surprising how quickly you go from "nothing works" to "I'm playing Super Mario Bros!"


Exactly this. I wrote an Apple ][ emulator, so I started by writing a 6502 emulator, and this is the only approach. Start with a giant switch, implement a few opcodes, run a program, wait until it reaches the not implemented case, implement it, repeat.

It is a surprisingly satisfying activity.
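
The loop described above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual emulator: the `Cpu` struct is hypothetical, memory is a plain byte vector, and only three 6502 opcodes are filled in.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily reduced CPU state.
struct Cpu {
  uint16_t pc = 0;
  uint8_t a = 0, x = 0;
  bool zero = false, negative = false;
  std::vector<uint8_t> mem;
};

// Loads and increments set the zero and negative flags from the result.
static void SetZN(Cpu& cpu, uint8_t v) {
  cpu.zero = (v == 0);
  cpu.negative = (v & 0x80) != 0;
}

// Executes one instruction. Anything not handled yet throws and names the
// opcode, so you know what to implement next.
void Step(Cpu& cpu) {
  assert(cpu.pc < cpu.mem.size());
  uint8_t opcode = cpu.mem[cpu.pc++];
  switch (opcode) {
    case 0xEA:  // NOP
      break;
    case 0xA9:  // LDA #imm
      cpu.a = cpu.mem[cpu.pc++];
      SetZN(cpu, cpu.a);
      break;
    case 0xE8:  // INX
      cpu.x++;
      SetZN(cpu, cpu.x);
      break;
    default:
      throw std::runtime_error("opcode not implemented: " +
                               std::to_string(opcode));
  }
}
```

Calling `Step` in a loop on a real program runs until the first unhandled opcode, at which point the exception message says which case to add.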


On topic, I had tremendous fun writing the CPU emulation for my maybe 1/3rd done NES emulator (that I likely won't finish, but who knows).

Off topic, I wouldn't do a 6502 as a giant switch, you'll likely miss out on a lot of the shared logic. If you decode the instructions differently, you can easily reuse logic for addressing modes and what not.

I really like how the opcodes are laid out in the table on this wiki http://wiki.nesdev.com/w/index.php/CPU_unofficial_opcodes
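
As a rough sketch of what decoding by bit pattern can look like: many documented 6502 opcodes follow an `aaabbbcc` layout, where `cc` selects an instruction group, `aaa` the operation, and `bbb` the addressing mode. The code below only covers the `cc == 01` group, and the names `Decoded`, `Decode`, and `DescribeGroup1` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Decoded {
  uint8_t aaa;  // Operation selector within the group.
  uint8_t bbb;  // Addressing mode selector.
  uint8_t cc;   // Instruction group.
};

// Splits an opcode byte into its aaabbbcc bit fields.
Decoded Decode(uint8_t opcode) {
  return Decoded{static_cast<uint8_t>(opcode >> 5),
                 static_cast<uint8_t>((opcode >> 2) & 7),
                 static_cast<uint8_t>(opcode & 3)};
}

// Names for the cc == 01 group, indexed by aaa and bbb respectively.
const char* const kGroup1Ops[8] = {"ORA", "AND", "EOR", "ADC",
                                   "STA", "LDA", "CMP", "SBC"};
const char* const kGroup1Modes[8] = {"(zp,X)", "zp",   "#imm",  "abs",
                                     "(zp),Y", "zp,X", "abs,Y", "abs,X"};

// Describes a cc == 01 opcode as "<operation> <addressing mode>".
std::string DescribeGroup1(uint8_t opcode) {
  Decoded d = Decode(opcode);
  assert(d.cc == 1);
  return std::string(kGroup1Ops[d.aaa]) + " " + kGroup1Modes[d.bbb];
}
```

In an emulator, the two tables would hold an operation function and an address-computation function instead of strings, so eight operations and eight addressing modes share the same code. Not every bit combination is a documented instruction (there is no `STA #imm`, for instance), so some special cases remain.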


My giant switch leverages the bit patterns :)

My emulator passes the Klaus2m functional suite and my Apple ][ emulator is able to run quite a few protected games.


I never did get very far with my NES emulator after getting an actual job (it was plan B for if no one liked my resume), but I do remember that being oddly satisfying [1]

[1] https://github.com/missblit/nesnes/blob/master/instructions_...


It's one of my favorite things about programming. For example I made a Neo Geo tile viewer and decoding the format and figuring out just what needs to be done to get it working then suddenly BAM! tiles tiles everywhere! It's so satisfying.


I made a game console on an embedded system. The magic when you actually bitbang pixels to a screen and you see that first image is breathtaking.


I've done things along these lines for things like the Advent of Code "int-code" assembly language implementations. It's satisfying to go from crashing because an opcode isn't implemented to something that produces output.


“This is super impressive Binji we would love to have you on our team, now that you’ve got our attention pass this leetcode challenge with a random assortment of unrelated skills instead, and faster than the people that only study this specific kind of test”


I can't speak for Binji but in my experience those who know the low-level enough to be able to write emulators like this will also do very well on leetcode-like challenges.


“It’s nothing personal, everybody has to do these, it wouldn’t be fair to the rest otherwise.”


Ha, I guess we'll see when I go for my next gig :-)


Ha. This made my day.


Hi all, I'm the author of this post! Happy to answer any questions. :-)


I may have missed it in the post, but how is blue different than red? I realise that they have slightly different Pokemon, but is that enough to make the emulator run only blue?


Oops you're right, I should have mentioned that. Red works too, I just didn't want to write Blue/Red everywhere :)


In the spirit of the project I'm surprised you didn't pick "Red" because it's one fewer characters. :)


What about Yellow? If yes, you could shorten it to RBY, which is more familiar to most old genners.


It probably needs less memory, since you only need 8 bits to store blue but red needs 3.


I hope it's OK I blogged about your project!

https://gameboy.blog/2021/06/04/good-read-pokegb-a-gameboy-e...


Does the MissingNo glitch work?


Not sure, I think it should! Would be really cool if someone tried it out. I'm a little worried I didn't implement some necessary instructions/hardware features so the game isn't actually winnable :-}


Love the project. nostalgia++


This reminds me of the GB emulator that ran inside of Pokémon Stadium. Folks found some tricks to make it work with other cartridges, but it turned out the emulator was not complete. https://n64today.com/2019/06/02/play-game-boy-games-on-n64/


I wrote a GB emulator, and the main issue was debugging it. Subtly wrong behavior tends to crash the game very quickly.

The trick I used was to take an existing GB emulator, add a printf debugging statement to print out each current instruction and the registers. Add the same printf statements on mine. Then use diff on the outputs of both emulators.

It does play Pokemon. The emulator is called FireGB.
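
A minimal sketch of the trace-and-diff trick described above. The `Regs` struct, the line format, and both function names are hypothetical; the only real requirement is that both emulators print exactly the same format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical register snapshot; a real emulator would log more state.
struct Regs {
  uint16_t pc, sp;
  uint8_t a, f;
};

// Formats one trace line for the instruction about to execute.
std::string TraceLine(const Regs& r, uint8_t opcode) {
  char buf[64];
  int n = std::snprintf(buf, sizeof buf,
                        "PC=%04X OP=%02X A=%02X F=%02X SP=%04X",
                        static_cast<unsigned>(r.pc),
                        static_cast<unsigned>(opcode),
                        static_cast<unsigned>(r.a),
                        static_cast<unsigned>(r.f),
                        static_cast<unsigned>(r.sp));
  assert(n > 0 && n < static_cast<int>(sizeof buf));
  return buf;
}

// Returns the index of the first line where the traces disagree. If one
// trace is a strict prefix of the other, returns the shorter length.
// Returns -1 if the traces are identical.
long FirstDivergence(const std::vector<std::string>& ref,
                     const std::vector<std::string>& mine) {
  std::size_t n = std::min(ref.size(), mine.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (ref[i] != mine[i]) return static_cast<long>(i);
  }
  return ref.size() == mine.size() ? -1 : static_cast<long>(n);
}
```

In practice it is usually simpler to write the lines to two files and run `diff` on them, as the comment describes. The first differing line points at the instruction just before it as the likely culprit.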


Comparing to an existing, known-good emulator is probably the most convenient way. This can be printf debugging like you describe, or even trying to compare framebuffer contents etc.

However I feel like that "spoils" a bit of the fun. A more adventurous approach is to test with test ROMs [1] - they are simpler to follow by hand and discover why they don't work than real games.

Of course if some game ROM is relying on some more obscure hardware quirks this might not be enough. Just wanted to bring it to attention for anyone interested in writing and debugging their emulators.

[1]: https://gbdev.gg8.se/wiki/articles/Test_ROMs


Ahh but it already passed the test ROMs at the point I turned to printf debugging.


I did something similar with an X86, with enough opcodes implemented so it could run Goody (an ancient CGA game): https://gabrielgambetta.com/remakes.html Super fun exercise!


Love the explanation of the DAA instruction. I wrote a gbc emulator in college, but couldn't figure out what that instruction was supposed to do.

I ended up just copying a lookup table from an open source emulator at the time, which of course didn't help my understanding: https://github.com/visualboyadvance-m/visualboyadvance-m/blo...

Funny to finally realize what this does, and now see how the lookup table works.
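
For comparison with the lookup-table approach, here is a sketch of DAA written as plain logic. This is a commonly cited formulation of the Game Boy behavior, not the code from the blog post or from any particular emulator; the `Flags` struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag set: zero, subtract, half-carry, carry.
struct Flags {
  bool z, n, h, c;
};

// Adjusts A to packed BCD after an add (n clear) or a subtract (n set).
uint8_t Daa(uint8_t a, Flags& f) {
  if (!f.n) {
    // After an addition: fix a digit if it carried or went past 9.
    if (f.c || a > 0x99) { a += 0x60; f.c = true; }
    if (f.h || (a & 0x0F) > 0x09) { a += 0x06; }
  } else {
    // After a subtraction: only fix digits that borrowed.
    if (f.c) { a -= 0x60; }
    if (f.h) { a -= 0x06; }
  }
  f.z = (a == 0);
  f.h = false;  // Half-carry is always cleared.
  return a;
}

// Worked example: 0x15 + 0x27 gives 0x3C in binary with no carries, and DAA
// turns it into 0x42, matching 15 + 27 = 42 in decimal.
void DaaExample() {
  Flags f{false, false, false, false};
  assert(Daa(0x3C, f) == 0x42);
}
```

The idea is that each hex digit holds one decimal digit, so any digit that overflowed past 9 needs 6 added to skip the unused values A through F.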


I kind of dreaded writing that one, to be honest! I didn't quite cover all the details but I hope it made a little more sense. :)


That's really incredible! Still, for a total beginner like I am, 589 lines of C++ don't seem enough to cover it all, so what's the fuss? Where are the assets? What should I look for? Agreed, feel free to post on Show HN to level up the debate


It's an emulator, not the game data/ROM.


  #include <SDL2/SDL.h>

How big is the final binary? It would be interesting to compare the final size to the total size of the GBC game. Although apparently

> Many features are not implemented!

So I'm not sure it's ready for that comparison.


You might want to add (obfuscated) to the title. The actual source code, while small, is much larger [1]. (edit: at 589 lines it's still impressive!)

For people who like this kind of thing and/or for people who want to be able to understand what is going on at all, you might like the course NAND2Tetris [2]. It's a good prerequisite to understanding this blog post if computer systems isn't a topic that one has explored before.

[1] https://gist.github.com/binji/395669d45e9005950232043ab4378a... -- the author notes this in the article.

[2] https://www.nand2tetris.org/


Still, the unobfuscated one is only 589 lines including formatting, a copyright header, and what looks to be an 80-char line limit.

Seems sufficiently interesting either way


See, the truth is just as impressive to me, but knowing it's not a lie and a more than fair line length makes me appreciate it.


The obfuscated one is awesome though, the code is in the shape of pokeballs!


I think instead of line count, either final binary size or source code size in bytes should be more common in claims of minimalism; the demoscene, for example, always uses binary size.


I got kinda dragged about this on reddit/twitter. Afterwards I posted on r/tinycode with the size in bytes:

https://www.reddit.com/r/tinycode/comments/nn5djb/pokegb_a_g...


Don’t let any of these comments take away from how great this is, _especially_ with the in-depth blog post.


would it be possible to push it down to IOCCC size? Would be fun to see it there, of course.


I think so, when I was doing the write-up I noticed a few things I could have improved (and a few bugs!)


Number of nodes in the AST or number of machine instructions might be more meaningful.


That metric is game-able, too: you just implement a VM, and only count the instructions to the VM. The rest is one big binary blob of "data", occupying a single node in the AST of the host language.

Now, of course, that's basically cooking the books, but there's a wide gulf between a full VM & smaller, domain specific ones that might not look like VMs. (All or most of data-driven programming, really.)


Final binary size including all external dependencies is probably the fairest metric. With heavy use of libraries things get messy.


> including all external dependencies

The OS too?


source code size can still be manipulated by using very small identifiers or reusing them.

There should rather be a normalization metric where all identifiers are treated as eight characters long

The resulting binary size will differ depending on the compiler and optimization settings, and could simply be manipulated by optimizing heavily for size.
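
A rough sketch of what the identifier-normalized metric proposed above could look like. This is a deliberate simplification under stated assumptions: it does not tell keywords apart from identifiers, does not skip comments or string literals, and counts whitespace as one character each. The function name is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Counts source characters, but charges every identifier-like token a flat
// cost of eight regardless of its real length.
std::size_t NormalizedSize(const std::string& src) {
  auto is_word = [](char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
  };
  std::size_t total = 0, i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isalpha(c) || c == '_') {
      // Consume the whole identifier, then count it as eight characters.
      while (i < src.size() && is_word(src[i])) ++i;
      total += 8;
    } else if (std::isdigit(c)) {
      // Numeric literal such as 0x1F: count its real length.
      while (i < src.size() && is_word(src[i])) { ++i; ++total; }
    } else {
      ++total;
      ++i;
    }
  }
  assert(i == src.size());
  return total;
}
```

Under this metric `a=b+1;` and `counter=b+1;` score the same, so shortening names no longer helps. Reusing a single identifier for several purposes still would.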


68 lines of code is pretty much arbitrary. It could have been far less, he just put it all to fit his poke ball pattern.


The author's tweet claimed 62 lines.


Yep, I made a mistake; it's actually 68.


That, and obfuscation is also relative. You could call the 589 lines obfuscated as well.


Some Forth wacko would reduce that to 40 lines...



