Hacker News new | past | comments | ask | show | jobs | submit login
An 8086 PC emulator in 4043 bytes (ioccc.org)
196 points by epsylon on Jan 4, 2014 | hide | past | favorite | 38 comments



This is interesting, though it might be a bit of a cheat. He's put some data in the BIOS image to assist in decoding instructions (not to take away from how amazing this is!):

"CPU supports the full 8086/186 instruction set. Due to the complexities of the 8086’s arbitrary-length instruction decoding and flags, 8086 instructions are first converted to a simpler intermediate format before being executed. This conversion, along with instruction lengths and how each instruction modifies the flags, is assisted by some lookup tables which form part of the BIOS binary."

With that in mind, the BIOS is _hand-written_ and about 12kB (gzips to <4kB), so that's also a fantastic achievement. All-in-all, this is a wonderful entry.


Abuse of the rules, if done well, is a time-honored IOCCC tradition.


It's not the only abuse of the rules. Looking at the hint, I think there's some flaw in the size-testing tool which they've exploited.


The hint mentions "the importance of comments". The only comment I can spot in the actual code is here (with a couple of characters of context):

    I/**/n
The code up to that point (2100 bytes total) is roughly as long as the listed output of the size tool (1979). The difference is probably due to the fact that I just counted raw bytes rather than accounting for whitespace and such. My guess, then, is that /* */ without a space hits a bug in the size tool where it probably counts that as the beginning of a comment but misses the end, causing it to count the entire remainder of the program as a comment. The compiler, meanwhile, will see that as an empty comment and continue reading code.

This was a fairly predictable consequence of providing an official size tool. Used to be, they just laid out the rules for figuring out sizes, and you had to either count manually, or build your own tool to count. Now, bugs in the size tool end up being bugs in the rules, and fun like this ensues. Whether this is good or bad is a matter of opinion. I would say it certainly makes things interesting!

Edit: HN formatting makes the inline form of that comment look like a C++/C99-style // comment. Fun!


Clearly the official tool runs the C source code through a markdown interpreter first!


I went through and reformatted the code, after expanding all of the macros that contained unbalanced parens: http://eigenstate.org/cable3-formatted.c

I still haven't puzzled out exactly what all the variables are, but I think that this makes the structure much clearer.


The source itself: http://ioccc.org/2013/cable3/cable3.c

Or after running through preprocessing (not a huge help I found, sadly): https://gist.github.com/peterc/8259713


Strangely, visiting that Gist, GitHub asks me to log in. When I visit in a private browsing window, the Gist is displayed. Wasn't Google doing something like this, preventing registered users with an old cookie from seeing public content without signing in?


Did you do this by hand? Line 38 looks weird: main(BX,nE)charnE; {


That's an old-school C function declaration. Note that the function return type are parameter types are implicitly int. nE is then explicitly declared to be a char pointer pointer.


But the char is outside of the declaration and before the opening brace - is this legal? And especially, why isn't it written like main(BX,charnE)? Doesn't save any bytes,it rather adds one semicolon...


Yes, it's very old fashioned (K&R era).

I think the reason for doing it this way rather than putting the type declaration alongside the parameter is then it would also demand also adding an int declaration for the BX parameter which would then cause the code to be longer. That is to say with:

    main(BX,char**nE){
.. it would throw an expected identifier error on the char. So you'd be putting the original:

    main(BX,nE)n**nE;{
Against this error-free alternative:

    main(int BX,n**nE){
.. but that's one byte longer.

If I got any of this wrong, I hope someone will correct me though :)


Sounds about right. The author mentions using K&R style function declarations as a way of shaving bytes in his writeup, too.


Ah, thanks for the explanation. Looks like the author did a deep, deep dive in the C language specs!


K&R predates the specs, actually. C89 codified the current C parameter convention. Compilers typically continue to support K&R style for very old C code and because it doesn't break modern C support.


Because it's not entirely about size, it's an obfuscation contest. Anything strange or unusual, especially something most people haven't seen used, will help obfuscate the code.


No, just passed through gcc -E which essentially preprocesses the file only (in this case in order to unpack all the defines) without fully compiling.


For those that don't know C, basically preprocessing (cpp, gcc -E, clang -E) is one of the first steps to occurs before the C compiler is invoked. It's easiest to think of the preprocessor as a templating language that can be used (or abused) to DRY up code and make it easier to maintain and customize. So the only thing that ever really gets compiled is the amalgamated processed .c files altered and configured by .h files and preprocessor options.


In this case, even with the preprocessor and using "indent", the code is non-understandable. It requires a huge effort of deobfuscation (operator precedence mess, re-"macrofy" the code -as after expansion is very redundant, etc.). From the 4KB it goes to 30KB. After re-"macrofying" the code it drops to 15KB of somehow understandable code. It is a beautiful piece of code, no doubt.


> Instruction codings are complex and irregular in size and structure, with multiple addressing modes and no consistent memory placement for operands, very often multiple possible encodings for the same instruction

It's only complex and irregular if you look at it wrong. ;)

The x86 (like the 808{5,6} and the Z80) instructions look organised in an octal format: ftp://mipt.cc/Opcode.txt

I haven't inspected the code in any detail to see if it takes advantage of this.


I thought "Wow, that's a neat trick."

Then I saw AutoCAD, Flight Simulator, and Windows.

Mind = blown.


Here's a link to a fully functional Windows binary (built with Visual Studio 2013):

http://sdrv.ms/1hodqjP

Just add the bios and image files elsewhere.


Missing MSVCR120D.dll. Can you compile in release mode or include that DLL too? Installing the Visual C++ Redistributable Packages for Visual Studio 2013 didn't help.


Updated with Release-build that should use the DLL in VS2013 redist package.


Note that you can link with MSVCRT.DLL to get an even smaller binary. Now you have a tiny emulator binary built from a tiny emulator source.


/snip This entry weighs in at a magical 4043 bytes (8086 nibbles, 28,301 bits). It manages to implement most of the hardware in a 1980’s era IBM-PC using a few hundred fewer bits than the total number of transistors used to implement the original 8086 CPU. /snip

Just a small reminder how far we have come - and what Moore's law (RIP) has done for us. Yes the original PC had few enough transistors one could actually count them. NowadAys ...

An awesome piece of work


Neat, the linked hd.img file contains windows, Simcity, AutoCAD, etc. Has anyone compiled this for Windows? If so, would you kindly post the binary? It'd be fun to play with this old software again.


The BIOS is part of the PC, yet it's not included in the 4043 bytes.


The BIOS is replaceable software. Standard on the IBM PC, but not the "bare metal" itself. One can only emulate so much within 4096 bytes.


"Due to the complexities of the 8086’s arbitrary-length instruction decoding and flags, 8086 instructions are first converted to a simpler intermediate format before being executed. This conversion, along with instruction lengths and how each instruction modifies the flags, is assisted by some lookup tables which form part of the BIOS binary."

I agree regarding the BIOS, but this sounds like the BIOS image contains "microcode" necessary for the emulator to run, which would seem to be cheating just a little.


If you're on Linux and don't want to bother installing a VGA font, there's no need. Do Ctrl+Alt+F6 (or lower) to switch to a terminal and run it there, then do Ctrl+Alt+F7 to switch back later.

Also, if someone can tell me how to make Windows not make the emulator segfault on my machine, that'd be great.

EDIT: Oh, it works fine in a graphical terminal. I'm guessing SDL failing to init on a text terminal broke everything.

EDIT 2: ALL THIS, PLUS REVERSI! You can even play Reversi!


MS Visual C++ 2013 chokes on this with "'KB' undeclared identifier" error.


Oh it's defined in the makefile... never mind.


Has someone already unobfuscated the source code?


I did. Do you know the software license of the original code?

edit: P.S. working deobfuscation, i.e. you can build the executable. Including some minor corrections to the orgininal (e.g. 1MB of RAM allocation instead 2MB, BIOS file size limited to 65536-256 bytes, etc.). It a nice code, minimalistic, e.g. the "rep" prefixed instructions are actually repeated, not unrolled. Really optimized for code size. Even deobfuscated it takes 12-13KB and 22KB of executable (x86-64 Linux, stripped binary). Very cool.


All IOCCC winners — http://ioccc.org/years.html#2013


Is licensed/copyrighted? Do the IOCCC have any implicit license?


Fantastic. Finally some "hacker news" on Hacker News ;P




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: