It's amazing that this would run halfway well on a 33 MHz 486. Doom had a 35 fps cap, and ran at 320x240 (square pixels):
2.7 million pixels per second at 35 fps (the cap).
1.4 million pixels per second at 18 fps (~50% of cap).
At the more realistic target of 18 fps, you have 24 clock cycles per pixel. A 486 averaged about 0.8 instructions per clock, so you're looking at 19 instructions per pixel. With a 33 MHz memory bus and the DRAM of the day, you're looking at about 5 clocks for memory latency. That looks like an upper bound of no more than 4 memory operations per pixel.
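The arithmetic above can be sanity-checked mechanically. A minimal sketch in C, using only the figures quoted in the comment (33 MHz clock, 320x240 frame, ~0.8 instructions per clock, ~5 clocks per uncached memory access); the function names are made up:

```c
/* Back-of-the-envelope check of the figures above. All inputs come
 * straight from the comment; nothing here is measured. */
static double clocks_per_pixel(double clock_hz, int w, int h, double fps)
{
    return clock_hz / ((double)w * h * fps);
}

static double insns_per_pixel(double clocks, double ipc)
{
    return clocks * ipc;        /* ~19 at 18 fps */
}

static double mem_ops_per_pixel(double clocks, double mem_clocks)
{
    return clocks / mem_clocks; /* ~4.8, i.e. at most 4 full accesses */
}
```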
A convincing 3d renderer averaging 19 instructions and 4 memory operations per pixel. And we're not even counting blit/video delays here. Good lord is that savage optimization work. Carmack is famous for a reason.
P.S. The really scary thought is that Doom would hypothetically run on any 386 machine -- can you imagine painting e.g. 160x120 on a cacheless 20 MHz 386 laptop?
IIRC, the 486 was not pipelined, so you didn't get 19 instructions, 4 of which could be memory instructions. You got 19 arithmetic instructions or 4 memory instructions, or something in between (without checking your math, but as I recall, that's roughly right).
There are basically two inner-loop types in a game like Doom: the wall loop and the floor loop. The wall loop renders a vertical strip of pixels on a wall and the floor loop renders a horizontal strip of floor. Each of these loops is actually rather trivial to write and just walks over the pixels, reading an input texel, modulating it by lighting, and then writing it out to the screen buffer. (Actually, Doom had transparent textures for some walls, so that's a third type of loop.) The wall loop is slightly simpler because it can be an axis-aligned walk through the input texture.
There really isn't very much room to optimize these loops. You can unroll them. You can play with how you do the adds and carries and maybe shave off another instruction. For walls, you can organize your texture data so that vertical neighbors are consecutive in memory. You can be very choosy about your lighting function. The hard work was setting everything up for your inner loop.
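For a sense of how little room there is, here's a hypothetical sketch of such a wall-column inner loop in C, assuming 16.16 fixed-point texture stepping, 64-texel-high columns stored contiguously, and a colormap lookup for lighting. None of these names are from the released source:

```c
#include <stdint.h>

/* Hypothetical sketch of a wall-column inner loop: 16.16 fixed-point
 * stepping down one texture column, with a 256-entry colormap folding
 * the light level into a palette-index lookup. */
void draw_wall_column(uint8_t *dest,           /* top pixel of the strip   */
                      int dest_stride,         /* screen width in bytes    */
                      const uint8_t *texcol,   /* one 64-texel column      */
                      const uint8_t *colormap, /* light level -> palette   */
                      uint32_t frac,           /* 16.16 texture coordinate */
                      uint32_t fracstep,       /* 16.16 step per pixel     */
                      int count)               /* pixels in the strip      */
{
    while (count-- > 0) {
        /* read texel, modulate by lighting, write to the frame buffer */
        *dest = colormap[texcol[(frac >> 16) & 63]];
        dest += dest_stride;
        frac += fracstep;
    }
}
```

Per pixel that's one texel read, one colormap read, one framebuffer write, and a couple of adds -- there's genuinely almost nothing left to remove.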
You're right. What I really meant was that instructions were processed 'in order'. An uncached memory access slowed things down a particular amount and there was no way to cover the latency with other instructions.
The figure I found suggested that a 486 averages ~0.8 instructions per clock running full-tilt. That seems impossible unless one hardly ever hits memory.
I'm not 100% certain, but I think the pipelining would allow you to execute (some) register-only instructions in the x86 equivalent of a delay slot.
According to the specs, the original 80486 could read or write 16 bits per clock. That's probably on a DX - on a DX2 it's probably 2 clocks. And that's with in order execution - the bus was running at the same frequency as the CPU.
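For scale, that claim works out to a peak of about 66 MB/s on a 33 MHz bus moving 16 bits per clock (a quick check using only the numbers above):

```c
/* Peak bus bandwidth implied above: bytes per clock times frequency. */
static double peak_bus_bytes_per_sec(double clock_hz, int bytes_per_clock)
{
    return clock_hz * (double)bytes_per_clock; /* 66e6 at 33 MHz, 2 bytes */
}
```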
If I recall correctly, you had to shrink the view port on lower-end 486s for playable framerates. Also, the status bar shrank the area to be rendered to some extent.
Your point still stands, just pointing out that the target area for rendering was often smaller than 320x240.
I remember playing multiplayer Doom over a modem back in the day. I had to shrink the view port down to the size of a postage stamp to maintain reasonable performance; I felt like a T-Rex playing like that (if you didn't move, I probably wouldn't notice you [especially if you were indigo]).
I remember it was very playable on the 33 MHz, 8 MB 486DX machines I had at my high school. A friend had a 50 MHz, 12 MB 486SX-2, which was wonderful for Doom. I sadly only had a 16 MHz 386SX with 2 MB of RAM. I amazingly did get Doom to "run" on this machine through some virtual memory program for Win 3.1. I had to shrink the screen size down to the smallest setting for it to run at about 1 fps.
I also remember a couple of other people that had "486" upgrades for their 386 based systems and Doom was perfectly playable on those systems as well. RAM was the big limiting factor that I remember.
I also did that on a friend's 25 MHz 386SX machine that had only 2 MB of RAM. Doom required at least 4 MB (sounds ridiculous as I type this), and although it used the DOS/4GW DOS extender, which was capable of swapping, it never worked.
On the other hand, Win 3.1 was also capable of swapping and ran Doom. It was absolutely unplayable though :)
You know, I still have that 386 in my mom's attic. I'm tempted to break it out and see what it can do.
Ultima VII required a 33 MHz 386DX w/ 4 MB of RAM, and my computer ran it fine with a boot disk. If memory serves (and I was 7 at the time, so it probably doesn't), a DOS 6.22 boot disk contained fields for page size, which leads me to believe it had support for virtual memory. That said, I do remember the massive stink that was made about virtual memory when Windows 95 came out, so I could be wrong.
I'm highly tempted to break out that old box and play around with it. This conversation tickles my nostalgia bone.
Doom needed 4 MB of free memory. If you had 4 MB total, you had to create a boot disk. 8 MB machines worked fine without a boot disk.
I also, amazingly, remember running some early version of Pagemaker on that machine. When I went to link text, I would click the mouse to start the process, go get a sandwich, watch some TV and come back 10 minutes later when it finished.
"The really scary thought is that Doom would hypothetically run on any 386 machine -- can you imagine painting e.g. 160x120 on a cacheless 20 MHz 386 laptop?"
I used to play Doom on an AMD 386SX/33 with a ULSI coprocessor (an i387 clone).
The first (and currently only) comment brought me back years! A major performance trick back then was to ensure that code and data stayed in the (very small) cache, and to prefer structures that would be read in order.
====================
"Because walls were rendered as columns, wall textures were stored in memory rotated 90 degrees to the left. This was done to reduce the amount of computation required for texture coordinates"
The real reason is faster memory access when reading linearly on old machines: fewer CPU cache misses. It's an old trick, used for smooth rotozoomer effects in the demo scene years ago.
Actually, rotating a texture 90 degrees in memory when making a rotozoomer still results in large amounts of cache misses depending on the current rotation angle. In order to get a smooth rotozoomer effect you have to use a technique called block rendering. This means you divide the image into square cells and render these one by one. Because the pixels in a cell are always near each other, you can reduce cache misses. There is an old snippet from Niklas Beisert floating around the web that explains exactly how this works.
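A minimal sketch of that block-rendering idea, assuming a 256x256 wrapping source texture and 16.16 fixed-point stepping. The tile size and all names are illustrative, not from Beisert's snippet:

```c
#include <stdint.h>

/* Tile-based rotozoomer sketch: the screen is walked in small square
 * tiles so that each tile's source-texture reads stay close together
 * in memory, reducing cache misses at awkward rotation angles. */
void rotozoom_blocks(uint8_t *dst, int dst_w, int dst_h,
                     const uint8_t *src,           /* 256x256 texture, wraps */
                     int32_t du_dx, int32_t dv_dx, /* texel step per screen x */
                     int32_t du_dy, int32_t dv_dy, /* texel step per screen y */
                     int tile)                     /* tile edge in pixels */
{
    for (int by = 0; by < dst_h; by += tile) {
        for (int bx = 0; bx < dst_w; bx += tile) {
            int h = (by + tile < dst_h) ? tile : dst_h - by;
            int w = (bx + tile < dst_w) ? tile : dst_w - bx;
            for (int y = 0; y < h; y++) {
                int32_t u = (int32_t)bx * du_dx + (int32_t)(by + y) * du_dy;
                int32_t v = (int32_t)bx * dv_dx + (int32_t)(by + y) * dv_dy;
                uint8_t *p = dst + (by + y) * dst_w + bx;
                for (int x = 0; x < w; x++) {
                    /* 256x256 wrap: high byte of v is the row, of u the column */
                    *p++ = src[((v >> 8) & 0xFF00) | ((u >> 16) & 0xFF)];
                    u += du_dx;
                    v += dv_dx;
                }
            }
        }
    }
}
```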
Actually, the real reason was that Doom ran in "Mode X", where it was actually faster to rasterize vertical strips rather than horizontal ones. This worked out especially well in Wolf3D and Doom, since perspective correction in the vertical direction was never needed.
Actually, it didn't; it ran in the bog-standard 320x200 256-color mode, mostly because "Mode X" back then wasn't as widely supported on lots of low-end machines.
Looking at the Doom source, it seems that the actual 3D game view was rendered into a linear buffer, and subsequently blitted onto the mode X surface. Some UI elements went straight to the screen.
(The DOS Doom source was never publicly released, just the Linux port, but there are remnants of some DOS code in the archive. I'm specifically looking at R_DrawSpan in README.asm and V_DrawPatchDirect in v_video.c)
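For anyone wondering why that blit is nontrivial: Mode X is planar, so a linear buffer can't be copied byte-for-byte. A sketch of just the address arithmetic (the actual plane select goes through the VGA sequencer's map-mask register, port 0x3C4 index 2, which is omitted here):

```c
/* In Mode X, pixel (x, y) lives on plane x mod 4, at byte offset
 * y * 80 + x / 4 within that plane (320 pixels / 4 planes = 80 bytes
 * per scanline per plane). A blit from a linear buffer selects each
 * plane in turn and copies every 4th pixel. */
typedef struct { int plane; int offset; } modex_addr;

static modex_addr modex_map(int x, int y)
{
    modex_addr a;
    a.plane  = x & 3;
    a.offset = y * 80 + (x >> 2);
    return a;
}
```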