Doom engine code review (fabiensanglard.net)
265 points by franze on April 22, 2011 | 28 comments



It's amazing that this would run halfway well on a 33 MHz 486. Doom had a 35 fps cap, and ran at 320x240 (square pixels):

2.7 million pixels per second at 35 fps (the cap).

1.4 million pixels per second at 18 fps (~50% of cap).

At the more realistic target of 18 fps, you have 24 clock cycles per pixel. A 486 averaged about 0.8 instructions per clock, so you're looking at 19 instructions per pixel. With a 33 MHz memory bus and the DRAM of the day, you're looking at about 5 clocks for memory latency. That looks like an upper bound of no more than 4 memory operations per pixel.

A convincing 3d renderer averaging 19 instructions and 4 memory operations per pixel. And we're not even counting blit/video delays here. Good lord is that savage optimization work. Carmack is famous for a reason.
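The arithmetic above can be reproduced with a short sketch. Every input (33 MHz clock, 320x240, 0.8 instructions per clock, about 5 clocks of memory latency) is the estimate quoted in this comment, not a measured value, and `pixel_budget` is just a name made up for the illustration.

```python
def pixel_budget(clock_hz, width, height, fps, ipc, mem_latency_clocks):
    """Return (pixels/s, clocks/pixel, instructions/pixel, max memory ops/pixel)."""
    pixels_per_second = width * height * fps
    clocks_per_pixel = clock_hz / pixels_per_second
    instructions_per_pixel = clocks_per_pixel * ipc
    # Upper bound: how many full memory latencies fit into one pixel's clocks.
    memory_ops_per_pixel = int(clocks_per_pixel // mem_latency_clocks)
    return (pixels_per_second, clocks_per_pixel,
            instructions_per_pixel, memory_ops_per_pixel)


if __name__ == "__main__":
    for fps in (35, 18):
        pps, clocks, instrs, mem_ops = pixel_budget(33_000_000, 320, 240, fps, 0.8, 5)
        print(f"{fps} fps: {pps / 1e6:.1f} Mpixel/s, {clocks:.1f} clocks, "
              f"{instrs:.1f} instructions, {mem_ops} memory ops per pixel")
```

Changing `width`, `height`, or `fps` shows how quickly the budget grows once the rendered area shrinks.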

P.S. The really scary thought is that Doom would hypothetically run on any 386 machine -- can you imagine painting e.g. 160x120 on a cacheless 20 MHz 386 laptop?


IIRC, the 486 was not pipelined, so you didn't get 19 instructions, 4 of which could be memory instructions. You got 19 arithmetic instructions or 4 memory instructions, or something in between (without checking your math, but as I recall, that's roughly right).

There are basically two inner loop types in a game like Doom: the wall loop and the floor loop. The wall loop renders a vertical strip of pixels on a wall and the floor loop renders a horizontal strip of floor. Each of these loops is actually rather trivial to write: it just walks over the pixels, reading an input texel, modulating it by lighting, and then writing it out to the screen buffer. (Actually, Doom had transparent textures for some walls, so that's a third type of loop.) The wall loop is slightly simpler because it can be an axis-aligned walk through the input texture.

There really isn't very much room to optimize these loops. You can unroll them. You can play with how you do the adds and carries and maybe shave off another instruction. For walls, you can organize your texture data so that vertical neighbors are consecutive in memory. You can be very choosy about your lighting function. The hard work was setting everything up for your inner loop.
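As a rough illustration of the two loop shapes described above (not Doom's actual code, which used fixed-point math and hand-tuned assembly), here is a minimal sketch. The names `draw_column`, `draw_span`, and `light_table` are hypothetical, and doing the lighting as a table lookup is an assumption made for the example.

```python
def draw_column(screen, screen_width, x, y_top, y_bottom,
                column, v, v_step, light_table):
    """Draw one vertical wall strip at screen column x.

    `column` is a single texture column, so the walk through the texture is
    axis aligned: only one texture coordinate (v) advances per pixel.
    """
    dest = y_top * screen_width + x
    for _ in range(y_top, y_bottom + 1):
        texel = column[int(v) % len(column)]  # read an input texel
        screen[dest] = light_table[texel]     # apply lighting, write out
        dest += screen_width                  # next pixel is one row down
        v += v_step


def draw_span(screen, screen_width, y, x_left, x_right,
              flat, flat_size, u, v, u_step, v_step, light_table):
    """Draw one horizontal floor strip on screen row y.

    Both texture coordinates advance per pixel, so the walk through the
    square `flat` texture is generally diagonal, not axis aligned.
    """
    dest = y * screen_width + x_left
    for _ in range(x_left, x_right + 1):
        texel = flat[(int(v) % flat_size) * flat_size + (int(u) % flat_size)]
        screen[dest] = light_table[texel]
        dest += 1                             # next pixel is one column right
        u += u_step
        v += v_step
```

All the per-strip setup (which column, where it starts, the step sizes, which lighting table) happens outside these loops, which is where the comment says the hard work was.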


The 486 was pipelined but not superscalar. You could approach 1 instruction per clock cycle but you could never go over 1.

---

286: protected memory extensions for x86

386: useful protected memory extensions for x86. (Hardly anything used 286 protected memory.)

486: first RISC-like pipelined x86

586: first superscalar x86


You're right. What I really meant was that instructions were processed 'in order'. An uncached memory access slowed things down a particular amount and there was no way to cover the latency with other instructions.


The figure I found suggested that a 486 averages ~0.8 instructions per clock running full-tilt. That seems impossible unless one hardly ever hits memory.

I'm not 100% certain, but I think the pipelining would allow you to execute (some) register-only instructions in the x86 equivalent of a delay slot.


According to the specs, the original 80486 could read or write 16 bits per clock. That's probably on a DX - on a DX2 it's probably 2 clocks. And that's with in order execution - the bus was running at the same frequency as the CPU.


If I recall correctly, you had to shrink the view port on lower-end 486s for playable framerates. Also, the status bar shrank the area to be rendered to some extent.

Your point still stands, just pointing out that the target area for rendering was often smaller than 320x240.


I remember playing multiplayer Doom over a modem back in the day. I had to shrink the view port down to the size of a postage stamp to maintain reasonable performance; I felt like a T-Rex playing like that (if you didn't move, I probably wouldn't notice you [especially if you were indigo]).


I remember it was very playable on the 33 MHz, 8 MB 486DX machines I had at my high school. A friend had a 50 MHz, 12 MB 486SX-2 which was wonderful for Doom. I sadly only had a 16 MHz 386SX with 2 MB of RAM. I amazingly did get Doom to "run" on this machine through some virtual memory program for Win 3.1. I had to shrink the screen size down to the smallest size for it to run at about 1 fps.

I also remember a couple of other people that had "486" upgrades for their 386 based systems and Doom was perfectly playable on those systems as well. RAM was the big limiting factor that I remember.


Why'd you run it through Windows? I had a 33Mhz 386 and 2 (4?) MB of ram. I made a boot disk, and it worked fine.


I also did that on a friend's 386SX 25 MHz machine that had only 2 MB of RAM. Doom required at least 4 MB (sounds ridiculous as I am typing) and although it used the DOS4GW DOS extender, which was capable of swapping, it never worked.

On the other hand, Win 3.1 was also capable of swapping and ran Doom. It was absolutely unplayable though :)


You know, I still have that 386 in my mom's attic. I'm tempted to break it out and see what it can do.

Ultima VII required a 33MHz 386DX w/ 4Mb of ram, and my computer ran it fine with a boot disk. If memory serves (and I was 7 at the time, so it probably doesn't), a DOS 6.22 bootdisk contained fields for page size, which leads me to believe it had support for virtual memory. That said, I do remember the massive stink that was made about virtual memory when Windows 95 came out, so I could be wrong.

I'm highly tempted to break out that old box and play around with it. This conversation tickles my nostalgia bone.


Doom needed 4mb of free memory. If you had 4mb total, you had to create a boot disk. 8mb machines worked fine without a boot disk.

I also, amazingly, remember running some early version of Pagemaker on that machine. When I went to link text, I would click the mouse to start the process, go get a sandwich, watch some TV and come back 10 minutes later when it finished.


"The really scary thought is that Doom would hypothetically run on any 386 machine -- can you imagine painting e.g. 160x120 on a cacheless 20 MHz 386 laptop?"

I used to play Doom on an AMD386sx/33 with a ULSI coprocessor (an i387 clone).


Doom actually ran at 320x200

Good points regardless though.


The first (and currently only) comment took me back years! A major performance trick back then was to ensure the code and data would remain in the (very small) cache, as well as preferring structures that would be read in order.

====================

"Because walls were rendered as columns, wall textures were stored in memory rotated 90 degrees to the left. This was done to reduce the amount of computation required for texture coordinates"

The real reason is faster memory access when reading linearly on old machines, with fewer CPU cache evictions. It's an old trick that was used for smooth rotozoomer effects in the demo scene years ago.
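To make the linear-read argument concrete, here is a small sketch comparing the memory offsets touched while walking down one texture column in the two layouts. The function names and the 64x128 texture size used in the checks are made up for the illustration and are not taken from the Doom source.

```python
def row_major_offset(x, y, width, height):
    """Conventional layout: horizontal neighbors are adjacent in memory."""
    return y * width + x


def column_major_offset(x, y, width, height):
    """Rotated layout: vertical neighbors are adjacent in memory."""
    return x * height + y


def column_walk_offsets(offset_fn, x, width, height):
    """Offsets read, in order, when drawing texture column x top to bottom."""
    return [offset_fn(x, y, width, height) for y in range(height)]


def strides(offsets):
    """Distance in bytes between each consecutive pair of reads."""
    return {b - a for a, b in zip(offsets, offsets[1:])}
```

For a 64x128 texture, the row-major walk jumps 64 bytes per pixel while the column-major walk advances 1 byte per pixel, so one cache line serves many consecutive texels.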


Actually, rotating a texture 90 degrees in memory when making a rotozoomer still results in a large number of cache misses, depending on the current rotation angle. In order to get a smooth rotozoomer effect you have to use a technique called block rendering. This means you divide the image into square cells and render these one by one. Because the pixels in a cell are always near each other, you can reduce cache misses. There is an old snippet from Niklas Beisert floating around the web that explains exactly how this works.
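A minimal sketch of the traversal order being described, assuming a square texture and an arbitrary 8x8 cell size. This is not Niklas Beisert's code, and in Python it only illustrates the order of the reads, not the cache benefit; all names are hypothetical.

```python
import math


def sample(texture, tex_size, x, y, du, dv):
    """Texel seen at screen pixel (x, y) under a rotate-and-zoom mapping."""
    u = math.floor(x * du - y * dv) % tex_size
    v = math.floor(x * dv + y * du) % tex_size
    return texture[v * tex_size + u]


def render_scanline_order(texture, tex_size, width, height, angle, zoom):
    """Plain rotozoomer: one full screen row after another."""
    du, dv = math.cos(angle) * zoom, math.sin(angle) * zoom
    out = bytearray(width * height)
    for y in range(height):
        for x in range(width):
            out[y * width + x] = sample(texture, tex_size, x, y, du, dv)
    return out


def render_block_order(texture, tex_size, width, height, angle, zoom, block=8):
    """Same image, rendered one square cell at a time.

    Pixels inside a cell map to a small patch of the texture at any angle,
    so consecutive reads stay close together in memory.
    """
    du, dv = math.cos(angle) * zoom, math.sin(angle) * zoom
    out = bytearray(width * height)
    for by in range(0, height, block):
        for bx in range(0, width, block):
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    out[y * width + x] = sample(texture, tex_size, x, y, du, dv)
    return out
```

Both functions produce the same picture; only the order in which the texture is touched differs.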


Yup :) Here's the code if someone is interested:

ftp://latvia.tucows.com/pub/mirror/x2ftp/msdos/programming/demosrc/pasroto.zip

(I managed to find it again thanks to hornet.org)

Edit: you can also check out the list at ftp://latvia.tucows.com/pub/mirror/x2ftp/msdos/programming/demosrc/00index.txt


Thanks for digging this up. Much appreciated.


Actually, the real reason was that Doom ran in "Mode X", where it was actually faster to rasterize vertical strips rather than horizontal ones. This worked out especially well in Wolf3D and Doom, since perspective correction in the vertical direction was never needed.


Actually, it didn't. It ran in the bog-standard 320x200 256-color mode, mostly because "Mode X" wasn't as widely supported on low-end machines back then.

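  mov ax, 13h   ; AH = 00h (set video mode), AL = 13h (320x200, 256 colors)
  int 10h       ; call the BIOS video services interrupt
Ah, those were the days.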



Looking at the Doom source, it seems that the actual 3D game view was rendered into a linear buffer, and subsequently blitted onto the mode X surface. Some UI elements went straight to the screen.
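For readers unfamiliar with the layout, here is a sketch of what copying a linear buffer onto a Mode X style planar surface involves. It uses the standard Mode X addressing (pixel x lives in plane x & 3, at byte offset y * width / 4 + x >> 2) and is not Doom's blit routine; the function names are made up, and the width is assumed to be a multiple of 4.

```python
def linear_to_planes(linear, width, height):
    """Split a linear width*height byte buffer into four Mode X style planes."""
    bytes_per_row = width // 4
    planes = [bytearray(bytes_per_row * height) for _ in range(4)]
    for y in range(height):
        row = linear[y * width:(y + 1) * width]
        for plane in range(4):
            # Every fourth pixel of the row, starting at this plane's index.
            start = y * bytes_per_row
            planes[plane][start:start + bytes_per_row] = row[plane::4]
    return planes


def planar_pixel(planes, width, x, y):
    """Read pixel (x, y) back out of the planar layout."""
    return planes[x & 3][y * (width // 4) + (x >> 2)]
```

In this layout, vertically adjacent pixels sit in the same plane, 80 bytes apart at 320 pixels wide, while horizontally adjacent pixels sit in different planes.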

(The DOS Doom source was never publicly released, just the Linux port, but there are remnants of some DOS code in the archive. I'm specifically looking at R_DrawSpan in README.asm and V_DrawPatchDirect in v_video.c)


That may be where my misunderstanding came from, then. I didn't realize the original DOS version wasn't what was released.



Michael Abrash's "Zen of Graphics Programming" has a good overview of many of the tricks used during that era: http://www.amazon.com/Zen-Graphics-Programming-Ultimate-Writ...

(now, of course, mainly to be read for nostalgic reasons).


... and to avoid similar architectural mistakes in the future.

History is useful. We're not at the point where we want CS students to memorize what happened at the Battle of Algol in 1968, but it's darned close.


Any qed users? I liked it for map editing most.



