LaiNES – Cycle-accurate NES emulator in around 1000 lines of code (github.com/andreaorru)
231 points by mmphosis on Nov 28, 2016 | 68 comments



This makes me wonder. The 6502 in the NES ran at 1.79MHz.

According to https://en.wikipedia.org/wiki/Instructions_per_second, a modern i7 processor handles north of 100,000 MIPS.

According to https://en.wikipedia.org/wiki/Transistor_count, the 6502 had 3510 transistors.

At 100,000 MIPS, a modern CPU would have a budget of ~56k instructions to process one cycle of that CPU, or about 16 instructions per transistor.
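
For anyone who wants to redo the arithmetic with other numbers, here is a rough C++ sketch. The function names are made up for illustration, and it assumes the host really sustains its quoted MIPS figure, which real workloads rarely do.

```cpp
// Host instructions available per emulated clock cycle.
// host_mips is in millions of instructions per second, target_hz in hertz.
constexpr double host_instructions_per_cycle(double host_mips, double target_hz) {
    return host_mips * 1e6 / target_hz;
}

// The same budget spread evenly across every transistor of the target chip.
constexpr double host_instructions_per_transistor(double host_mips, double target_hz,
                                                  double transistors) {
    return host_instructions_per_cycle(host_mips, target_hz) / transistors;
}
```

With 100,000 MIPS, 1.79 MHz and 3510 transistors this gives a little under 56,000 instructions per cycle and just under 16 per transistor.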

So it would seem it might now be possible to simulate those old processors, at the transistor level, in real time. Is anyone aware of experiments in this domain? Even if it's not really useful, that sounds like a fun side project.


http://www.visual6502.org/ does it in the browser & flashes transistor states at you!

And if you liked that you'll probably also enjoy http://www.megaprocessor.com/progress.html


Also: Introducing the MOnSter 6502 http://www.evilmadscientist.com/2016/6502/


This is incredible! But it's also nowhere near realtime. The max clock rate of a 6502 is 1 MHz to 2 MHz, while the simulation runs at around 7.3 Hz in my browser.


It turns out that for several reasons, creating an exact emulation of even much older gaming technology can be very computationally intensive.

http://arstechnica.com/gaming/2011/08/accuracy-takes-power-o...


When people link to this, it's a good idea to mention that 65816 emulation runs as one enormous switch statement. Yes, it's accurate, but at a rather large performance penalty.

I'm a daily/weekly higan user, but I make no excuses for how it's written. Take a peek into the code.


I'm aware that hardware simulation of processors is something which is done, particularly by the people who design those processors.

It takes a lot of grunt to simulate a modern GPU or CPU, and of course getting near realtime is impossible, but it's handy if you want to run a load of code through it to see if it actually works.

So... yes, you can unit test the design of your processor. If you have a big enough server farm.


Verilog does exactly this. You design the logic, then simulate all the gates. I'm sure there are some open cores out there ready for simulation if someone wants to fiddle with it.


Simulations of complex modern chips don't run in real time.


You might find ladder logic for PLCs interesting. It simulates old hardware relay systems.


What does "cycle-accurate" mean? The README assumes the reader already knows; Wikipedia via Google is totally unhelpful: "A cycle-accurate simulator is a computer program that simulates a microarchitecture on a cycle-by-cycle basis."


The traditional way of coding a console emulator was to figure out the time interval to the next interrupt in units of CPU cycles, emulate enough instructions to cross that threshold, then emulate the interrupt and hook other routines (redrawing the screen, filling sound output buffers, reading input, etc.) off of those events (see e.g. Marat Fayzullin's classic Emulator HOWTO [1]). This approach runs a lot of stuff just fine because it does synchronize to the most important events, but can cause problems. For example, "well-behaved" code generally only writes to graphics registers or sprite tables during blanking periods, as writing during active display is usually undefined behavior. Some code breaks the rules. Sometimes this is done intentionally to do cool effects with the hardware. Other times it's a side effect of a bug that wasn't caught because there are coincidentally no symptoms with the timing of the actual hardware. But then you plug it into an emulator with only roughly accurate timing and it blows up.
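
A minimal sketch of that traditional loop, in C++. `ToyCpu` and `run_until_interrupt` are hypothetical names invented for this example, and the instruction costs are fake; the point is only the shape of the loop.

```cpp
#include <cstdint>

// Hypothetical stand-in for a CPU core: step() executes one whole
// instruction and returns how many cycles it cost (fake costs of 2..4).
struct ToyCpu {
    std::uint64_t instructions = 0;
    int interrupts = 0;
    int step() { ++instructions; return 2 + static_cast<int>(instructions % 3); }
    void raise_interrupt() { ++interrupts; }
};

// Run whole instructions until the deadline for the next interrupt is
// crossed, then deliver the interrupt. Returns the overshoot in cycles,
// which the caller carries into the next slice.
template <typename Cpu>
int run_until_interrupt(Cpu& cpu, int cycles_to_interrupt, int carried_overshoot) {
    int remaining = cycles_to_interrupt - carried_overshoot;
    while (remaining > 0) {
        remaining -= cpu.step();  // instructions are never split
    }
    cpu.raise_interrupt();        // hook point for video, sound and input
    return -remaining;            // zero or more cycles past the deadline
}
```

Nothing else in the machine advances while the inner loop runs, which is exactly why a register write in the middle of active display lands at the wrong time in this style of emulator.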

In reality, the clocks for the various components don't necessarily run at the same rate, or even at integer multiples of the CPU clock. You can have a situation where, for example, there are 3.5 clock cycles on the graphics hardware for every one CPU clock cycle. For a lot of the classic systems, this happened because a single higher master clock is divided down for each component.

A "cycle-accurate" emulator is one that operates as if the emulated state of all hardware were updated on every tick of the master clock. This wasn't generally done in the past because it was far too slow ~20 years ago when emulation of classic consoles and computers really took off.
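
As a rough illustration of ticking everything off one master clock, here is a C++ sketch. The struct names are hypothetical, and the dividers of 12 (CPU) and 4 (graphics) are my assumption for an NTSC NES rather than something stated in this thread.

```cpp
#include <cstdint>

// A component clocked by dividing the master clock by a fixed integer.
struct DividedClock {
    int divider;               // master ticks per component cycle
    int phase = 0;             // master ticks since the last component cycle
    std::uint64_t cycles = 0;  // component cycles elapsed so far

    explicit DividedClock(int d) : divider(d) {}

    // Advance by one master tick; returns true when the component steps.
    bool tick() {
        if (++phase < divider) return false;
        phase = 0;
        ++cycles;
        return true;
    }
};

// Two components driven in lockstep from the same master clock.
struct Machine {
    DividedClock cpu{12};  // assumed NTSC NES CPU divider
    DividedClock ppu{4};   // assumed NTSC NES graphics divider

    void run_master_ticks(std::uint64_t n) {
        for (std::uint64_t i = 0; i < n; ++i) {
            if (ppu.tick()) { /* advance the graphics state by one step */ }
            if (cpu.tick()) { /* advance the CPU by one cycle */ }
        }
    }
};
```

With these dividers the graphics hardware gets three steps per CPU cycle. A non-integer ratio like the 3.5 in the example above falls out of dividers such as 7 and 2.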

More sophisticated hardware doesn't necessarily have any single master clock in this sense, so it doesn't make much sense to talk about a "cycle-accurate" emulator of a modern PC, for example.

[1] http://fms.komkon.org/EMUL8/HOWTO.html


I played around with this in my own NES emulator, which works roughly the way you describe. I found that, at least for major-brand titles (Mario/Zelda/Metroid/Kirby), cycle accuracy actually doesn't matter. Everything's based off PPU/APU/mapper interrupts.

In fact, doubling (or more) the CPU clock enhanced some of the games I tried. Animations in Kirby's Adventure became smoother (e.g. the Spark ability). Screens full of enemies in Metroid ran with no slowdown. The glitchy first scanline of status overlays in various games cleared up. I haven't yet observed negative effects in major-brand titles.

I had designed my emulator as an experiment: treat the NES as an abstract specification, rather than a concrete implementation. Turns out that a lot of games seem to have actually been designed following the same principle. It was a wonderful feeling to see these games as I imagine they were intended to be experienced, as if I opened a letter from the game designers left unopened for thirty years.


Bear in mind that sometimes, people want to do tricks that depend on the slowdowns and glitches that happen on real hardware.


I'm sure. That wasn't the point of my experiment.


And in fact some games won't work without them. http://arstechnica.com/gaming/2011/08/accuracy-takes-power-o...


Yeah, every emulator is pretty much going to be fine with Mario, Zelda, and Kirby, but that isn't really the point.


In my case it was.


I guess what I mean to say is that someone endeavoring to write a cycle-accurate emulator wants every game to work exactly like it did originally regardless of what bizarre hardware-specific hacks it relies on. For someone who just wants to play any of the top 20 most popular games on a system and isn't too concerned with minor deviations pretty much any emulator is going to suit their needs (in fact it's a common model to have hacks in the emulator just to make a particular popular game work right).


> It was a wonderful feeling to see these games as I imagine they were intended to be experienced

Did you mean the way they were not intended to be experienced? Otherwise it doesn't make sense. The way they were intended to be experienced is on a real console connected to a CRT TV.


It really depends on the game and what the designers and programmers considered ideal or non-ideal at that time. There might not even be unanimous opinion among the creators of what the ideal behavior would be; that's often seen in film, where an actor, director, and screenwriter all have somewhat different interpretations of a character. On one hand you have stuff like Cave shooters where they've tried to reproduce the slowdown in ports because it's expected in the genre and affects difficulty. On the other hand you have stuff like Shadow of the Colossus, which was almost certainly not intended to have extreme framerate drops.

We can objectively talk about various metrics of accuracy compared to the original hardware. Authorial intent is, pretty much by definition, a matter of opinion (and authors themselves, over the years, often change their ideas of what their intent was).


Well-said. For my experiment I took "authorial intent" to include:

* no framerate drops

* audio synthesized with band-limited step functions (including the "triangle" channel)

* video composed of band-limited scanlines

* video interrupt running at NTSC rate

* audio interrupt running at 240 Hz

* free-running audio synthesis

Like you said, it is entirely a matter of opinion what authorial intent is. For my purposes this definition provided an interesting experiment, and the result was aesthetically pleasing.


There's also the part of "authorial intent" where, particularly at that time, many of the games were designed and developed on much bigger hardware (corporate mainframes, for instance) targeting the consoles, and only later QAed on development consoles (which might not even be the same hardware as production consoles). For some games there are certainly questions about whether the authors intended something much better that they could get their development hardware to perform. It's not that many console hardware generations ago that development consoles still varied in hardware from production consoles (at least as recently as the early history of the PS3/Xbox 360).

All told, it's all part of gaming's version of "the artwork is never quite finished/realized, it's just eventually published".


You still have the source?


> More sophisticated hardware doesn't necessarily have any single master clock in this sense, so it doesn't make much sense to talk about a "cycle-accurate" emulator of a modern PC, for example.

Even in a modern PC, many of the clocks are divided down using PLLs which maintain a fixed frequency and phase ratio, so it is theoretically possible to make such an accurate emulator, but it would be difficult to write and many orders of magnitude slower than the real hardware.

For an example of PC software which doesn't run correctly on anything but real hardware or really accurate cycle-emulation, see this amazing demo:

https://trixter.oldskool.org/2015/04/07/8088-mph-we-break-al...


It seems like the "cycle-accurate" emulator would be a lot easier to write. Is this true?


I'm the author.

I disagree with this statement. It takes a considerable amount of effort and research to reach cycle-accuracy, especially on the PPU side. Understanding how the PPU pipeline works will take you more time than everything else.

If you settle for frame-based emulation you can write the functions to emulate the opcodes without worrying about the order and duration of the internal operations. Once the opcode is emulated, you just add cycles to a global counter based on a table that contains the number of cycles per instruction.

After 29781 cycles (you may end up being a little off each time, if you are not cycle-accurate), you call a function to update the state of the PPU. There you can use familiar iterative constructs to perform the rendering.
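
The frame-based approach described above can be sketched roughly like this in C++. This is not code from LaiNES; `FrameBasedNes` and its members are hypothetical, and the table only holds a few 6502 base timings for illustration.

```cpp
#include <array>
#include <cstdint>

constexpr int kCyclesPerFrame = 29781;  // figure quoted above

struct FrameBasedNes {
    std::array<std::uint8_t, 256> cycle_table{};  // base cycles per opcode
    int cycle_counter = 0;
    int frames_rendered = 0;

    FrameBasedNes() {
        // Only a handful of entries are filled in for this sketch.
        cycle_table[0xEA] = 2;  // NOP
        cycle_table[0xA9] = 2;  // LDA immediate
        cycle_table[0x4C] = 3;  // JMP absolute
        cycle_table[0xAD] = 4;  // LDA absolute
    }

    // Placeholder: a real emulator would change registers and memory here.
    void execute(std::uint8_t opcode) { (void)opcode; }

    // Placeholder: a real emulator would draw the whole frame in one go.
    void render_frame() { ++frames_rendered; }

    // Emulate one instruction, then charge its cycles to the global counter.
    void step(std::uint8_t opcode) {
        execute(opcode);
        cycle_counter += cycle_table[opcode];
        if (cycle_counter >= kCyclesPerFrame) {
            cycle_counter -= kCyclesPerFrame;  // keep the overshoot
            render_frame();
        }
    }
};
```

Because the counter only moves in whole-instruction jumps, the frame boundary is usually crossed a cycle or two late, which is the "a little off each time" drift mentioned above.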

Compare this with the state machine approach in my code (ppu.cpp) and how much more careful I have to be composing microinstructions to form opcodes (cpu.cpp). I came up with a (I think) clean design but it wasn't trivial.


I've only actually written one in the classic style. I doubt that cycle-accurate is easier in general. It definitely isn't if you count the effort that goes into discovering the timing information by running experiments on the original hardware. If you start with everything documented, I guess it might be the least effort needed to bridge the gap between 99.5% compatibility and 100%.


TL;DR: each CPU instruction takes exactly as long as it does on the real CPU.

You generally only care about cycles in real-time work. In simpler CPUs, like those used in microwaves, each instruction takes a fixed, known number of cycles, so you can add up the instructions in your assembly loop and know how long the loop will take. Not every instruction costs the same: a fast one can take 1 cycle, others 3 or more. A CPU cycle lasts 1/clockspeed.

Usually there are multiple CPU instructions for GOTO and for loading from memory: a fast set for things "close by" and a slow set for far-away code or data. Loops and conditional instructions, even on simple CPUs, can take different amounts of time depending on the outcome, so timing things can get pretty complicated.

So when writing really low-level, timing-critical code, you can use a very accurate signal generator to drive your CPU clock, then count your instructions to time things. Common uses include generating audio by bit-banging.
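
To make the counting concrete, here is a small C++ sketch that times a classic 6502 delay loop (`LDX #n`, then `DEX` / `BNE loop`). The 6502 is my choice of example, not something this comment specifies, and the cycle costs assume the branch does not cross a page boundary.

```cpp
// 6502 cycle costs for the instructions in the loop.
constexpr int kLdxImmediate = 2;  // LDX #n
constexpr int kDex = 2;           // DEX
constexpr int kBneTaken = 3;      // BNE when the branch is taken
constexpr int kBneNotTaken = 2;   // BNE when it falls through

// Total cycles for: LDX #n / loop: DEX / BNE loop, with n in 1..255.
// DEX runs n times; BNE is taken n-1 times and falls through once.
constexpr int delay_loop_cycles(int n) {
    return kLdxImmediate + n * kDex + (n - 1) * kBneTaken + kBneNotTaken;
}

// Convert a cycle count to microseconds for a given clock frequency.
constexpr double cycles_to_microseconds(int cycles, double clock_hz) {
    return cycles * 1e6 / clock_hz;
}
```

For n = 255 that is 1276 cycles, or roughly 713 microseconds at the 1.79 MHz quoted earlier in the thread. Note how the branch costs a different amount on the last pass, which is the outcome-dependent timing described above.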

In more complex CPUs, a number of things make it difficult or impossible to figure out exactly how long an instruction will take. Software for these CPUs is generally not written to depend on exact cycle timings, and because it doesn't depend on cycle accuracy, you don't have to worry about the timings when emulating these kinds of CPUs.


For extra speed, a lot of emulators take shortcuts. So, you might have a "frame-accurate" emulator that doesn't match the state of a "real" Nintendo on a cycle-by-cycle basis, but does by the end of the frame. There are a limited number of things triggered in the hardware (interrupts, set register flags, etc), and sometimes, like when the CPU is in an idle loop, the emulator can just skip ahead by a bunch of instructions until the next thing that needs to be handled.

Or in the graphics processor, maybe you'll blit out whole sprites/tiles at once, instead of rendering them pixel-by-pixel, the way that the hardware does.

In a cycle-accurate emulator, you're going to run every opcode, process the graphics pixel-by-pixel, etc.


Some understanding of context is expected to grasp implications of software. I'm not sure a github readme is the place to provide a complete education about the domain of the repository.

In this case, a 'cycle' is a processor cycle. Most emulators are imperfect, usually due to speed considerations, or to lack of access/understanding of the specifications of the original hardware. A short codebase that is entirely accurate is an achievement.


NES instructions are broken down into a series of smaller, simpler steps by the CPU. Each of these steps executes in a fixed amount of time - a cycle. For example, the "load value from memory into register A" instruction might take two steps: 1) fetch memory on the bus 2) move the fetched value into the register.

Cycle-accurate means the emulator is emulating all of the little steps that make up an individual instruction. Instruction-accurate emulators ignore the smaller steps and treat instructions as indivisible.
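
Here is a rough C++ sketch of what "emulating the little steps" can look like. On a real 6502, the absolute-addressed form of LDA takes four cycles, one per bus access, so the sketch ticks the rest of the machine on every read. `TinyCpu` and its members are hypothetical names, and flag updates are left out to keep it short.

```cpp
#include <array>
#include <cstdint>

struct TinyCpu {
    std::array<std::uint8_t, 0x10000> mem{};  // flat 64 KiB address space
    std::uint16_t pc = 0;                     // program counter
    std::uint8_t a = 0;                       // accumulator
    std::uint64_t cycles = 0;                 // CPU cycles elapsed
    std::uint64_t other_hardware_ticks = 0;   // stand-in for video/audio state

    // Every CPU cycle also advances the rest of the machine.
    void tick() { ++cycles; ++other_hardware_ticks; }

    // Each bus read costs exactly one cycle.
    std::uint8_t read(std::uint16_t addr) { tick(); return mem[addr]; }

    // LDA absolute (opcode 0xAD): four bus reads, hence four cycles.
    // The N and Z flag updates are omitted in this sketch.
    void lda_absolute() {
        read(pc++);                                          // cycle 1: opcode
        std::uint16_t lo = read(pc++);                       // cycle 2: address low
        std::uint16_t hi = read(pc++);                       // cycle 3: address high
        a = read(static_cast<std::uint16_t>(hi << 8 | lo));  // cycle 4: operand
    }
};
```

An instruction-accurate emulator would instead load the value in one go and add 4 to a counter afterwards, so the other hardware never sees the intermediate cycles.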

Cycle-accurate NES emulators only really matter for emulating certain graphical and sound effects.


I guess it's about emulation accuracy. Some emulators fail to emulate correctly for the purpose of speed, which causes visual and sound artifacts.

This comes down to host CPU power: the more accurate the emulator, the more power it needs from the host.

UPD: Found a wiki article: http://emulation-general.wikia.com/wiki/Emulation_Accuracy


I take that to mean that the timing of instruction execution is true to the original, hence no unintended glitches.


People always say that terse code is hard to read and understand. I'd say I just learned a lot from a quick skim of the concise source here. If it had been structured with tons of white space and split over dozens of files/folders/modules, I'd have no chance of understanding it at a glance.


Terse code is easy to read. Gratuitously compact code isn't.

The rule of thumb is that if it's short but in no way obfuscated, it's terse. If it's short, and impossible to read because of it, it's gratuitously compact.


It is in multiple files, but still very terse, e.g. using single-letter variables.

   /* CPU state */
   u8 ram[0x800];
   u8 A, X, Y, S;
   u16 PC;
   Flags P;
   bool nmi, irq;

I like it.


I'm the author.

The reason for the single letters is that those are the actual names of the 6502 registers.

Glad you like it overall. :)


It's really nice that you keep it so close to the actual hardware (not just the variable names, but also the way the logic works in for example PPU).

I also love the way you use C++ templates to simplify things!


accumulator xchange ... y, something stacky perhaps?

program counter

flagPort

non maskable interrupt, interrupt request

I'm sure you knew that though, it's common microprocessor nomenclature.


Specifically on 6502-derived processors, you have the following registers: Accumulator, X and Y indexes, Stack Pointer, Program Counter, Flags. Of these, only the Program Counter is double-width.

Additionally, the zeroth "page" of memory (the lowest 256 bytes) is accessible with a dedicated addressing mode which saves space and time in the program, and can be used as a form of cache.
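
To show where the savings come from, here is a small C++ sketch of how an emulator might compute effective addresses. The function names are made up for this example. On the 6502, a zero-page LDA is two bytes and three cycles, against three bytes and four cycles for the absolute form.

```cpp
#include <cstdint>

// Zero page: a single operand byte is the whole address (high byte is 0x00).
constexpr std::uint16_t zero_page(std::uint8_t operand) {
    return operand;
}

// Zero page,X: the index is added with 8-bit wraparound, so the
// effective address never leaves page zero.
constexpr std::uint16_t zero_page_x(std::uint8_t operand, std::uint8_t x) {
    return static_cast<std::uint8_t>(operand + x);
}

// Absolute: two operand bytes, stored low byte first.
constexpr std::uint16_t absolute(std::uint8_t lo, std::uint8_t hi) {
    return static_cast<std::uint16_t>(lo | (hi << 8));
}
```

For example, `zero_page_x(0xF0, 0x20)` wraps around to address 0x0010 instead of reaching 0x0110.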


The variables are named for the registers they represent; they aren't single-letter variables for the sake of compactness.


OK, that was an unfair pick; other variables have longer names. I just picked the example because I thought it was cute how short the CPU status is, and because I couldn't remember what nmi meant at first. Trying to talk about it brought it back to mind, so that worked for me, but I blamed the project in vain, because the decision was made elsewhere. Those electronic engineers!! Let's argue about better names for x and y. Is anyone even reading this? I'm prolly not gonna come back to this either.


Absolutely agree. I'm writing an NES emulator myself and just glancing at this code has given me a few line-saving ideas.


That is very cool.

I'm always kind of in awe of this sort of thing. Maybe I should try to do it. Should take some of the awe away.

But I should probably focus on sucking less, first.


I'm the author.

Everything you need is on the NESdev wiki or it's linked there. A couple of particularly helpful resources are linked in my README. The PPU diagram and the 6502 reference were especially useful.

I definitely encourage you to do it; it's a great learning experience and very rewarding. When games start running, it's pure programming ecstasy :)


Hey, thanks.


Do it. Emulation has always fascinated me, so I wrote a (kind of crappy) NES emulator a few years ago. It demystifies things, there's a ton of documentation out there, and you'll get the joy of increasing numbers of games working as you make progress.


Try making a small virtual machine first - https://github.com/tekknolagi/carp/blob/master/README.md


Whoa, that's my project. Wouldn't recommend looking at that one, but definitely take a look at its successor (linked on the README).


I recommend this reference quite highly: http://wiki.nesdev.com/w/index.php/NES_reference_guide


Another interesting project to take away some magic: https://github.com/ssloy/tinyrenderer/wiki


I did it precisely in order to not suck. The courses at the university seemed too toy-like, so I decided to write something bigger and with a real goal. 7500 lines of 386 and then some Modula 2, and a bit of AWOL from university.

In my case the Sinclair Spectrum, which I did cycle-accurately on about a 25MHz 386 and fast enough to play Jetpac on the slowest 386 ever sold. Cycle accuracy really only helped with pitch-perfect sound.

It was fun, and pushing something from a blank sheet to a working program is... shall we call it a practical exercise in not sucking?


Starting with a chip8 emulator is also a good exercise. We made one in JS in less than 1kb for this 13kb games compilation: http://js13kgames.com/entries/26-games-in-1 (unminified version here: http://xem.github.io/chip8/c8.html)


I agree! I started a chip8 emulator in Rust. Once I get video and input finished, I'm considering writing a Gameboy emulator.

https://github.com/TrevorS/rustychip8


As someone currently writing an NES emulator, do it! There are a ton of references to help you figure everything out, and it's been a pretty interesting experience.


There's a lot of docs out there, and some test roms to help. A CPU emulator is actually pretty straightforward. Adding the PPU is more complex, of course.


First I'd like to say this project represents a fantastically impressive achievement regardless.

However, putting this repo through `cloc` reveals the HN title to be rather misleading.

EDIT: I had initially read through {cpu,apu,ppu,gui,joypad,mapper}.{cpp,h} and noticed that the mental tally I was taking had run well over 1000LOC. In my haste I quickly cloned the repository and ran cloc against the current dir which massively inflated the result (Doh!). See author's comment below for a more sensible figure.


Sorry, but that's really not fair. You are running that on all the folders inside the src/ directory (and the README?), including blargg's libraries that I'm using. I'm not counting that. Go ahead and include those if you want. It's code I didn't write, and bigger than the rest of the emulator! I don't think that's representative.

CPU and PPU implementations tend to be in the order of the thousands of lines -- they are around 200 and 300 lines respectively in LaiNES. In fact, most of the code is in the GUI that I could easily strip away if this was a competition. And this wasn't written to be small - it was written to be simple. It also came out small, but that's incidental.

Here's how I counted the lines and how I decided on the description for the repository, which by the way has been catapulted from totally unknown to worldwide attention overnight, and is now the object of unexpected, ruthless scrutiny that I couldn't foresee.

  [andrea@manhattan src]$ rm -rf boost nes_apu Sound_Queue.*
  [andrea@manhattan src]$ cloc .
        24 text files.
        24 unique files.                              
         1 file ignored.

  github.com/AlDanial/cloc v 1.70  T=0.03 s (780.3 files/s, 63170.2 lines/s)
  -------------------------------------------------------------------------------
  Language                     files          blank        comment           code
  -------------------------------------------------------------------------------
  C++                             11            210            110           1163
  C/C++ Header                    12             87              7            285
  -------------------------------------------------------------------------------
  SUM:                            23            297            117           1448
  -------------------------------------------------------------------------------


I agree with this figure, please see my edited comment above.

Still though, my gripe with the HN title still stands: to say this is ~1000 LOC is a bit rich. It's still bloody small so why try to shoehorn it into this category?

> And this wasn't written to be small - it was written to be simple. It also came out small, but that's incidental.

I think that's why this is pure gold! Because it's simple, it's easy to understand. I would have been hopping with glee if this had been available to me as a teenager, instead of having to read tons of articles/textfiles of varying quality with lots of trial, error and head-scratching. Its size is beside the point, which is why the title is still, in my opinion, misleading. I can't help but feel the very people that would benefit the most from this might be put off, thinking that they're going to be presented with some indecipherable demo comp entry.

Either way keep up the good work.


No problem, we are cool. I see your point.

I believe there is still a lot of room for improvement in terms of accuracy, clarity and code size. Note that this repository is more than 3 years old. Maybe this will motivate me to improve it even further.


I think that'd be pretty cool to see. I'd be especially interested in how quickly the returns in size diminish given your starting point. Likewise, how much it would have to change architecturally as it gets smaller. Sounds like a slightly more extreme form of the game Shenzhen I/O.

While reading through the repo, one thing I kept thinking was that it would be nice, IMO, to decouple everything from the GUI so that it was a little 'flatter'. So `main` would call `NES::run()` (or something), and `NES` would leverage GUI. GUI would be just drawing stuff. (In my head at least,) it feels like that way it might be easier to mentally partition things, for those using it as a learning project, as `NES` would be responsible for the ownership and interop that GUI is doing now. I'll add it to my todo list and perhaps in god-knows-when I'll fork it and do this if you haven't gotten round to it :) Having said that, it's inspired me to finish my first Go project, which was a GameBoy emulator. It was started mainly to deep-dive the language, but I think it could be useful in a similar way if cleaned up and documented for folks.
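
Something like the following C++ sketch is the shape I have in mind. To be clear, none of this is code from the repository: `Display`, `HeadlessDisplay` and the layout of `NES` are hypothetical, and only the `NES::run()` name comes from the paragraph above. The 256x240 frame size is the NES output resolution.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical display interface: the GUI only draws what it is handed.
struct Display {
    virtual ~Display() = default;
    virtual void present(const std::vector<std::uint32_t>& pixels) = 0;
};

// A display with no window at all, handy for tests or benchmarks.
struct HeadlessDisplay : Display {
    int frames = 0;
    std::size_t last_size = 0;
    void present(const std::vector<std::uint32_t>& pixels) override {
        ++frames;
        last_size = pixels.size();
    }
};

// Hypothetical console object: owns the emulation state and drives the display.
class NES {
public:
    explicit NES(Display& display) : display_(display), frame_(256 * 240, 0) {}

    void run(int frames) {
        for (int i = 0; i < frames; ++i) {
            // One frame of CPU, PPU and APU emulation would go here.
            display_.present(frame_);
        }
    }

private:
    Display& display_;
    std::vector<std::uint32_t> frame_;  // one 256x240 frame of pixels
};
```

With that split, `main` would construct a display, hand it to `NES`, and call `run()`, and swapping the real GUI for the headless one would not touch the emulation code.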


I'm guessing that that figure is just for the emulation code. The UI and sound interface code eat up quite a few lines of code. Doesn't hurt to actually dig instead of karma farming...


I had read the repo first and just from counting in my head it was deffo over 1000 lines. Turns out the author agrees, that it's almost 1500 LOC. I was a bit hasty (read: dumb) in generating the table is all :)

Maybe calm down with the "karma farming" accusations, my problem was with the HN title, not the repository.


Also I'm not the author of the HN submission... ;)


Any plans for a libretro port?


Maybe if you inline entire functions...


Functions which are only "called" once, which I think is a perfectly reasonable thing to do.



