Hacker News
Atari 2600 hardware design: Making something out of almost nothing (bigmessowires.com)
98 points by zdw on Jan 14, 2023 | 41 comments



I can recommend „Racing the Beam“ to anyone curious about the Atari 2600. It’s a great book! https://en.m.wikipedia.org/wiki/Racing_the_Beam


I am currently reading it. It’s not as dense technically as this article, but it does a great job of showing the progression from the original two games, Pong and Combat, to more technically complex and innovative ones like Adventure, Yars' Revenge and Pitfall.


+1, a super interesting read


I have actually recently picked up Atari 2600 homebrew development as a hobby.

My reasons are:

1. The hardware is simple enough that you can completely understand everything that is going on. And you absolutely need to understand it, if you want to make good games. Just a great feeling of power and control.

2. You need to use assembly. Even with 8-bit era computers you don't really need to use assembly. Sure, for performance, but for many tasks even the dog-slow BASIC of the C64 could get you far. There aren't many excuses to write substantial amounts of assembly these days, so I really enjoy the challenge.

3. You are forced towards radical simplicity but the hardware is still strong enough that you can make games people might want to play. Even in 2023.

There are great fantasy consoles like the PICO-8 these days, but they don't scratch the same itch for me. I enjoy the historicity of programming for a real and very influential console.

Plus, the 6502 instruction set you learn used to be ubiquitous, giving you a head start if you want to program for the C64, NES, Apple ][ and many more systems.

I recommend using https://8bitworkshop.com/ as you can get an instant feedback loop when developing your game.

I haven't yet finished making a game but I am having great fun!


> What’s worse, this tiny amount of RAM had to serve as both the scratchpad/heap and as the stack!

Heap? Lol you are not doing ANY dynamic allocation of any kind on the Atari 2600.

Also you're not using the stack to pass parameters to routines here either. You might not even use subroutines.

> This pin-reduced 6507 eliminated the 6502’s NMI and IRQ pins, so there was no hardware interrupt capability at all. Everything had to be accomplished with software timing and polling.

STA WSYNC will make the CPU halt until the TIA is at the next scan line. This is the chief video synchronization tool on the Atari 2600. You do have to keep track of the number of scanlines you've generated. Horizontal timing takes care of itself but you need to track that too for various effects.
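To put numbers on that software timing: on NTSC each scanline is 228 TIA color clocks, the 6507's clock is the color clock divided by three, and a frame is 262 lines. A back-of-the-envelope sketch in Python (standard figures from the usual 2600 programming references, not from this thread):

```python
# Standard NTSC Atari 2600 timing figures (a back-of-the-envelope sketch).
COLOR_CLOCKS_PER_LINE = 228          # TIA color clocks per scanline
CPU_DIVIDER = 3                      # the 6507 runs at color clock / 3
CYCLES_PER_LINE = COLOR_CLOCKS_PER_LINE // CPU_DIVIDER   # 76 CPU cycles

VSYNC_LINES, VBLANK_LINES, VISIBLE_LINES, OVERSCAN_LINES = 3, 37, 192, 30
TOTAL_LINES = VSYNC_LINES + VBLANK_LINES + VISIBLE_LINES + OVERSCAN_LINES

print(CYCLES_PER_LINE)                    # 76 cycles between WSYNCs
print(TOTAL_LINES)                        # 262 lines per NTSC frame
print(TOTAL_LINES * CYCLES_PER_LINE)      # 19912 CPU cycles per frame
```

So the "track the scanline count yourself" bookkeeping is happening against a budget of only 76 CPU cycles per line.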


> Also you're not using the stack to pass parameters to routines here either. You might not even use subroutines.

Well, that 128 bytes sits in the upper half of page 0, and the 6502's stack is in page 1, but at least one source says the upper half of page 1 is also mapped to that same RAM.

I'd guess that at least some programs would push to the stack - if nothing else you could save a byte when storing a byte to memory (one-byte PHA vs two-byte STA $FF).


Yes, if the 6532's 128 bytes are also visible in page 1, then one would use ldx, txs, pha, pla, etc., for any possible optimization. The S register is a precious resource! Anything to squeeze between those hsyncs.


I worked on a system in which the address decoding for the RAM mapped $0-$1FF to 256 bytes of RAM. Fortunately it was easier to do graphics on that than on the 2600.


I'd have to look at the "fan disassemblies" floating around -- you're 100% right, and I think Combat did this.


Thanks! Interesting to have this confirmed. I had a think about how this might be used in practice. Probably not huge wins but every byte counted in those days!


Thanks, I almost had to stop reading after seeing multiple fundamental misunderstandings like that. I respect the author and have BMOW stuff for my Mac, but he should throw a big “I’m new to this” caveat around this article.


Geez, it wasn't that bad, but the article is literally wrapped in disclaimers.

"Recently over the holiday break, I became interested in the 2600’s hardware architecture"

"Did I butcher some technical explanation here, or omit important details? Please let me know! I’m just a beginner on this Atari hardware journey, with much still to learn."


Every time you do a JSR $XXXX it's doing dynamic allocation of sorts. The return address is kept on the stack. I honestly have never seen a game of any complexity that doesn't use subroutines (it saves ROM).
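For readers unfamiliar with the 6502 convention: JSR pushes the address of its own last byte (the return address minus one) onto the page-1 stack, high byte first, and RTS adds one when pulling it back. A toy Python model of that stack discipline (names are mine, purely for illustration):

```python
class Stack6502:
    """Minimal model of the 6502's page-1 descending stack (illustrative)."""
    def __init__(self):
        self.ram = [0] * 256     # page 1 ($0100-$01FF)
        self.sp = 0xFF           # S register starts at the top

    def push(self, byte):
        self.ram[self.sp] = byte & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.ram[self.sp]

def jsr(stack, jsr_addr):
    # JSR pushes the address of its own third byte (return address - 1),
    # high byte first: two bytes of stack per call.
    ret = (jsr_addr + 2) & 0xFFFF
    stack.push(ret >> 8)
    stack.push(ret & 0xFF)

def rts(stack):
    lo = stack.pull()
    hi = stack.pull()
    return (((hi << 8) | lo) + 1) & 0xFFFF   # RTS adds 1 on the way out

s = Stack6502()
jsr(s, 0xF000)           # a JSR instruction located at $F000
print(hex(rts(s)))       # 0xf003 -- execution resumes after the 3-byte JSR
```

Two bytes per call nesting level is why, with only 128 bytes shared between variables and stack, deep subroutine nesting is a real cost.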


No idea whether it meets your personal complexity bar, but 2600 Centipede seemingly doesn't contain a single JSR or RTS: https://github.com/DNSDEBRO/Disassemblies/blob/master/Atari2...


Sure it meets my bar. That's interesting!


> Look at this image of Pitfall, and notice how the tree canopy, tree trunks, ground, pit, and underground cave all show left-right symmetry. These are all built from the playfield (plus additional tricks to be described later). The only sprites are Pitfall Harry, the vine he’s swinging from, the rolling log, and the scorpion.

I read in a similar article that the branches of the trees are too finely detailed to be drawn with the playfield layer. So, they are added on with sprites.


[citation needed] for that assertion. Atari 2600 is extremely limited in the number of sprites available on-screen at the same time. If you've played Pac-Man then you know that 5 is too many, because all of the so-called "ghosts" are actually flickering. They're doing that not for a spooky effect, but because they can't be simultaneously displayed. That sort of sprite flicker doesn't happen in Pitfall!


The sprites can be replicated by adjusting NUSIZn for the player of interest: https://archive.org/details/Atari_2600_TIA_Technical_Manual/...

Looks like the tree branches could be two copies on the left, then two more on the right. Not much going on in that region so plenty of time.

EDIT: presumably this bit: https://github.com/johnidm/asm-atari-2600/blob/8b613f3c4bc80...


Yes, but the limitation is per-scanline, not per frame. Thus, “racing the beam”. The tree branches are above all the other sprites besides the swinging vine.


> If anyone has an idea why the TIA’s designers used LFSRs for this stuff, I’d love to hear about it.

I'm guessing because a counter would have ripple carry (i.e. five extra gate delays when rolling over from 111111 to 000000) or need extra gates for carry lookahead and an LFSR is constant-delay.


The latency notwithstanding, you can implement a counter with an LFSR that hits every unique value in a 2^n-1 sequence with fewer transistors than a standard counter.


This is the reason. LFSRs are much simpler to implement, and in those days, every transistor saved was very important.

Some early microprocessors used LFSRs instead of a regular counter for their instruction pointer register for this reason: https://news.ycombinator.com/item?id=8375577
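A quick way to see why an LFSR makes a cheap counter: a maximal-length n-bit LFSR visits all 2^n - 1 nonzero states using only a shift and one XOR, with no carry chain at all. A Python sketch of a 6-bit example (taps taken from a standard maximal-length table; not necessarily the TIA's actual polynomial):

```python
def lfsr6_sequence(seed=0b000001):
    """All states of a 6-bit maximal-length LFSR (taps 6,5 -> x^6 + x^5 + 1)."""
    state = seed
    states = []
    while True:
        states.append(state)
        fb = ((state >> 5) ^ (state >> 4)) & 1   # XOR of the two tap bits
        state = ((state << 1) | fb) & 0x3F       # shift left, feed back into bit 0
        if state == seed:
            break
    return states

states = lfsr6_sequence()
print(len(states))   # 63 distinct nonzero states, i.e. 2**6 - 1
```

The states come out in a scrambled order, which doesn't matter for timing: hardware just compares against the (scrambled) state value that corresponds to the count it cares about.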


I haven't read _Racing the Beam_, but the impression I get is that while the 2600 looks crazy, it actually wouldn't be that difficult or insane to write a graphics kernel to generate the screen layout you need. It's just a matter of keeping track of what line you're on and feeding in the correct data. It seems tedious, but that's what computers are for.

Once the graphics kernel is written, it's just a matter of playing with the data that's fed into it.

The real challenge, it seems, is simply that there are so few cycles left over to implement the actual game logic.


> The real challenge, it seems, is simply that there are so few cycles left over to implement the actual game logic.

I haven’t programmed the 2600, but I expect that in just about every 2600 program, running out of time for your game logic is unavoidable, and will send you back to your universal, (relatively) easy-to-use kernel to tweak it and make it just a little bit faster for your particular use case.

Given that that kernel will be written in tight assembly, I expect that more or less amounts to a rewrite. You’ll use your knowledge of the kernel to write a new one.


Even on the NES, and I'm sure on more modern consoles, and maybe in any game: you often round-robin game logic.

Here's a fun example from Castlevania on the NES: by timing your movement precisely, you can make it skip loading certain blocks, leaving out a staircase or even a door.

It works because alternating even and odd frames are responsible for loading different y-coordinates of level data in blocks.

https://m.youtube.com/watch?v=Dw7NkOzp8tE

It might seem like there's no point now... but even in a 120fps game, do you need absolutely everything to update at 120Hz?


Believe it or not we still amortize work and have game architecture where the sim updates at a much lower frequency than the game presentation.


You don't really have that many spare cycles per line (unless you have double-height pixels), and since there aren't many registers either, you can't get much done with the spare cycles you do have.

You really do need to get almost all your game loop to fit into vblank, and spend most of the screen on time driving the graphics. NES and other systems with fancier graphics systems kind of inverted the script; on those you run the game loop during screen on, and mostly transfer data to the graphics unit during vblank.
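To quantify "fit into vblank": at 76 CPU cycles per NTSC line, the roughly 37 vblank lines plus 30 overscan lines leave on the order of 5,000 off-screen cycles per frame for game logic. An illustrative calculation (typical NTSC layout, not measured from any particular game):

```python
# Off-screen CPU budget per NTSC frame (typical layout; illustrative only).
CYCLES_PER_LINE = 76                  # 228 color clocks / 3
VBLANK_LINES = 37
OVERSCAN_LINES = 30
OFFSCREEN_CYCLES = (VBLANK_LINES + OVERSCAN_LINES) * CYCLES_PER_LINE
print(OFFSCREEN_CYCLES)               # 5092 cycles to run the whole game loop
```

At 2 to 6 cycles per 6502 instruction, that's maybe a thousand-odd instructions per frame for everything that isn't drawing.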


The other real challenge is that you have to build your entire game using just 128 bytes of memory. Not kilobytes, bytes. You don't have the luxury of holding even a single scanline of the screen in memory at once. There is no caching of anything. You have just enough memory for a handful of current state variables and that's it.

Luckily your program is running out of ROM so you don't have to load it before you execute, but even with that it's an incredibly tight fit.


Here’s a detailed video by retro game mechanics that shows how the logo graphics of the infamous E.T. game is rendered: https://youtu.be/sJFnWZH5FXc

It seems incredible, very difficult. And remember, back then cartridges only had 8 KB of ROM.

This means even more hacks are required if your game needs more than a few levels. There's a recurring HN post about how Pitfall encoded each level in a single byte (a bitfield), then used a sequence generator to step back and forth among the 255 levels.
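The trick alluded to here is that an LFSR sequence is invertible: you can step the room number forward or backward without storing a map. A sketch in Python using a standard maximal-length 8-bit polynomial (illustrative taps; not necessarily Pitfall's actual generator):

```python
def step(s):
    """Advance an 8-bit LFSR one state (taps 8,6,5,4; maximal length)."""
    fb = ((s >> 7) ^ (s >> 5) ^ (s >> 4) ^ (s >> 3)) & 1
    return ((s << 1) | fb) & 0xFF

def step_back(s):
    """Invert step(): recover the previous state."""
    low = s >> 1                 # previous bits 0..6
    fb = s & 1                   # the feedback bit that was shifted in
    b7 = (fb ^ (low >> 5) ^ (low >> 4) ^ (low >> 3)) & 1
    return low | (b7 << 7)

# Walking right advances the room; walking left steps back. No room table needed.
room = 0xC4
assert step_back(step(room)) == room
```

Because the register never holds zero and the taps are maximal, the walk cycles through all 255 nonzero states, which matches the "255 levels" figure.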


Many new games are made with an ARM processor built into the cart[0]. I'm not super familiar with how it works, but they do some really fancy graphics kernels. They're still limited by how the graphics/color registers work, and there aren't many cycles to change them all at the right time.

[0] https://forums.atariage.com/topic/330778-part-1-cdfj-overvie...


Despite the ARM processor in the cart, it barely helps with the kernel and game code in most games. Typically they are older, not-so-powerful Cortex-M0 parts, largely tied up emulating the cartridge hardware. It can give modest gains when doing the kernel, but most of the fancy graphics kernels are designed as they always have been: with a lot of deep human thought.


Easier said than done. Implementing the game logic during the vertical blank is actually more straightforward.


There are new games being made for the 2600 all the time. A few years ago I got really into it, but never made anything worth sharing. However, I found a Twitch stream called ZeroPageHomebrew [0] that I've been watching religiously ever since. They used to cover just 2600 homebrews, but they've recently branched into homebrews for other early Atari systems as well. The developers of the featured games are often in chat, and sometimes they do call-in interviews. It's on twice a week and they never run out of new things to show.

[0] https://www.twitch.tv/zeropagehomebrew


I look at stuff like this, amazed at the efficiency, and then realize I scoff at a new 32-bit Cortex-M0 chip for shipping with only 64K of RAM and 12K of flash at 48 MHz.


The rules change as you add bits. The binary code size gets bigger and the instruction set natively deals with larger types, so you end up with some inflation by default.

The other thing that starts happening is that more buffers and caches appear everywhere as you add I/O, raising your minimum spec. What every system after the 2600 did was add some form of graphics buffer: on the Atari 400/800, with a base config of 16 kB, it was a mix of a display list program (which automates the scanline tricks of the 2600) and various character and raster modes to give software trade-offs between RAM usage and display fidelity. But even the smallest character mode assumes you have a few kilobytes available.

8-bit programs, even with larger RAM, are necessarily "right to the point" in terms of what information they are working on. The luxury of larger spec is just in being flexible and allowing more non-essential data to pass through.


Can't remember if I read it here recently or on Reddit, but this conference session with David Crane on Pitfall/Atari is pretty good. He doesn't get as technical as some folks may want, but there's some coolness in just hearing him explain this in person. Couple of interesting Q&A bits towards the end too.

https://www.youtube.com/watch?v=MBT1OK6VAIU


One of the best resources to learn Atari 2600 programming is Gustavo's course at pikuma.com.

I can't recommend it enough!

Atari 2600 Programming: https://pikuma.com/courses/learn-assembly-language-programmi...


OK, with all this in mind, reconsider Howard Scott Warshaw's notorious E.T. the Extra-Terrestrial, which he had five weeks to write. All the challenges of writing anything at all for the Atari platform, then doing it in five weeks.

Looking at it from some strange angle, it's a tremendous accomplishment.


Is this the game where something like 200k copies got trashed?


Yup. Howard is quite proud of it. It was a big deal, until folks actually got their copy. Then the criticism started.

Anyway, he got his bonus, so from his point of view all was well.


> These LFSRs also use two separate clocks, 180 degrees out of phase

Does that just mean they take turns ticking with equal interval between ticks?



