Or consider Air Strike Patrol, where a shadow is drawn under your aircraft. This is done using mid-scanline raster effects, which are extraordinarily resource intensive to emulate.
So what's really going on here is that the emulator must emulate not only the SNES hardware, but also the television. Video game emulators have had to deal with this for a long time, to varying and increasing levels of accuracy. Televisions (especially analog CRTs) have quite a bit of emergent behavior in processing the display input that is not easily captured and replicated by your typical frame buffer. Interlacing is a major such phenomenon; most emulators still simply treat the 60 fields per second as 60 distinct frames rather than interlacing them. (And younger players are used to seeing the games that way, never having played on original console and TV hardware.)
The ultimate example of this effect occurs in emulating games that originally used a vector CRT. An emulator writing to a raster frame buffer simply can't replicate the bright, sharp display of a real Asteroids or Star Castle or Battlezone machine.
TV behavior even goes beyond electronics. Consider the characteristics of the phosphor coating and the persistence time between refreshes. Some games made use of effects where that characteristic mattered, so if you want to emulate that with high fidelity, yes that will take a lot of CPU cycles.
That particular problem isn't a case of emulating the television, but rather of accurately emulating the console's video hardware and its interactions with the rest of the system. If you were simply interested in emulating the television's behaviour, you could construct a frame buffer from the visible sprites and post-process it (possibly in conjunction with several preceding fields).
If the console allowed sneaky things to be done on each raster line (like changing the colours) then constructing that frame buffer becomes considerably more resource intensive, as it must now probably be done line by line with the correct timing wrt. the rest of the emulation.
If you could pull tricks mid-scanline (presumably through careful timing after an interrupt) then the problem becomes a whole lot worse, though I'd guess it can be reduced by recording changes to the relevant hardware registers along with timestamps in the emulation so that the timing of your scanlines' construction becomes less of an issue.
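As a rough illustration of that timestamp idea (hypothetical names, not taken from any real emulator), the CPU core could log each video register write with a cycle stamp, and the renderer could replay the log as it walks across each line:

```c
/* Minimal sketch: the CPU core logs video register writes with a cycle
 * timestamp; the scanline renderer replays them in order so each pixel
 * sees the register state it would have seen on real hardware. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t cycle;   /* master-clock timestamp of the write */
    uint16_t reg;     /* which video register was written */
    uint8_t  value;
} RegWrite;

#define MAX_WRITES 4096
static RegWrite write_log[MAX_WRITES];
static size_t   write_count;

/* Hypothetical helpers assumed to exist elsewhere in the emulator. */
void apply_register(uint16_t reg, uint8_t value); /* update latched video state */
void emit_pixel(int x);                           /* output one pixel with current state */

/* Called by the CPU core instead of poking video state directly. */
void log_video_write(uint32_t cycle, uint16_t reg, uint8_t value) {
    if (write_count < MAX_WRITES)
        write_log[write_count++] = (RegWrite){ cycle, reg, value };
}

/* Called once per scanline: apply every logged write whose timestamp falls
 * before the pixel currently being rendered. */
void render_scanline(uint32_t line_start_cycle, uint32_t cycles_per_pixel,
                     int pixels, size_t *next_write) {
    for (int x = 0; x < pixels; x++) {
        uint32_t now = line_start_cycle + (uint32_t)x * cycles_per_pixel;
        while (*next_write < write_count && write_log[*next_write].cycle <= now) {
            apply_register(write_log[*next_write].reg, write_log[*next_write].value);
            (*next_write)++;
        }
        emit_pixel(x);
    }
}
```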
You're correct: this particular problem can be handled with sufficiently sophisticated frame buffer logic. I was generalizing from that to other cases where emulating the television, or its signal processing, would be required.
I'll give you another example. On the Atari 2600 game console, the vertical sync is software controlled: the software is responsible for enabling the vertical sync pulse. This can be done 60 times per second as standard, or you can play tricks with it. Suppose you strobe it at a different or even irregular rate. On an analog TV, the picture starts rolling vertically. That breaks way outside the sandbox of a framebuffer, with signal being displayed in the overscan areas and during the normally blank retrace interval, resulting in ghosting effects. (No commercial game did that, but it's been done in tech demos, and conceivably a horror game could do it intentionally for mood.) To produce that same behavior on framebuffer-based hardware, you need to emulate, or at least approximate, the workings of a TV's vertical sync logic, none of which appears in the console itself.
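Purely as an illustration of what "approximating the TV's vertical sync logic" might mean (this is not how any particular 2600 emulator does it), an emulator could count scanlines between VSYNC strobes and let the picture drift when the count is off:

```c
/* Illustrative only: approximate a CRT losing vertical lock when the
 * software strobes VSYNC at a nonstandard rate. */
#define NOMINAL_LINES 262   /* nominal NTSC lines per field */

static int lines_since_vsync;
static int roll_offset;     /* vertical offset applied when presenting the frame */

void on_scanline_end(void) {
    lines_since_vsync++;
}

void on_vsync_strobe(void) {
    int error = lines_since_vsync - NOMINAL_LINES;
    if (error != 0) {
        /* frame was too short or too long: let the picture roll by the error */
        roll_offset = (roll_offset + error) % NOMINAL_LINES;
        if (roll_offset < 0)
            roll_offset += NOMINAL_LINES;
    }
    lines_since_vsync = 0;
}

/* When blitting, output row y would be taken from source row
 * (y + roll_offset) % NOMINAL_LINES; a crude stand-in for the TV's
 * vertical-hold circuit slipping out of lock. */
```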
> I'd guess it can be reduced by recording changes to the relevant hardware registers along with timestamps in the emulation so that the timing of your scanlines' construction becomes less of an issue.
This would be possible in most cases, but the SNES throws another problem at you: the video renderer can set flags that affect the operation of the CPU. Range/time-over sprite flags, H/V-blank signals, etc.
In my model, I chose to forgo timestamps because the subtle details are very tricky to get right. Instead, I render one pixel at a time, but I use a cooperative threading model. Whenever the CPU reads something from the PPU, it checks whether the PPU has caught up with the CPU. If not, it switches and runs the PPU until it has. The PPU does the same with respect to the CPU.
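A very stripped-down way to picture that catch-up rule, with plain cycle counters standing in for real cothreads (names are illustrative, not the actual code):

```c
/* Illustrative sketch of the catch-up rule; a real cooperative-threading
 * implementation would switch coroutine contexts here instead of looping. */
#include <stdint.h>

int64_t cpu_clock;   /* cycles the CPU core has executed so far */
int64_t ppu_clock;   /* cycles the PPU core has executed so far */

/* Hypothetical helpers assumed to exist elsewhere. */
void ppu_step_one_dot(void);              /* advances the PPU by one pixel, bumps ppu_clock */
uint8_t ppu_read_register(uint16_t addr); /* value of a PPU register at the current moment */

/* Called when the CPU reads a PPU register: the PPU is first brought up to
 * the CPU's point in time, so flags like range/time-over and H/V-blank
 * reflect the exact moment of the read. */
uint8_t cpu_read_from_ppu(uint16_t addr) {
    while (ppu_clock < cpu_clock)
        ppu_step_one_dot();   /* "switch and run the PPU" until it catches up */
    return ppu_read_register(addr);
}
```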
Even with that, all the extra overhead of being -able- to process one pixel at a time knocks the framerate from ~240fps to ~100fps. And it fixes maybe a half-dozen minor issues in games for all that overhead.
This is because scanline-based renderers are notoriously good at working around these issues. There are lots of games that do a mid-scanline write in error, but only a few that do more than one on the same scanline. So all you have to do is make sure you render your line on the correct 'side' of the write. We actually took every game we could find with this issue, and averaged out the best possible position within a line to run the highest number of games correctly. Other emulators take that further and can make changes to that timing on a per-game basis to fix even more issues.
Air Strike Patrol's shadow is actually the only known effect where two writes occur on the same line, and there is no one point that would render the line correctly.
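For everything other than that one case, the split-point compromise can be pictured roughly like this (numbers and helper names are purely illustrative, not any emulator's real values):

```c
/* Rough sketch of the single split-point approach: run the CPU up to a tuned
 * dot position, render the whole line from the video state at that instant,
 * then run the CPU for the rest of the line. Writes before the split affect
 * this line; writes after it only affect the next one. */
#define DOTS_PER_LINE 341   /* illustrative figure */
#define RENDER_DOT    128   /* the empirically "averaged" split position */

/* Hypothetical helpers assumed to exist elsewhere. */
void run_cpu_for_dots(int dots);
void render_entire_line(void);

void run_one_scanline(void) {
    run_cpu_for_dots(RENDER_DOT);
    render_entire_line();
    run_cpu_for_dots(DOTS_PER_LINE - RENDER_DOT);
}
```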
Is anyone else reminded of the "copper" effect people would do back in the day, where they would cycle the colors of the screen in sync with the horizontal refresh of the monitor to create bars of color that oscillate up and down in really cool patterns?
Indeed. I did copper too, in DOS x86 assembly. Some programs used it to practical effect: you can exceed 256 colors in an 8-bit framebuffer by swapping palette values mid-screen or mid-scanline.
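For anyone curious what that looks like in code, here is a hedged C rendition of the palette-split idea (the real versions were assembly with much tighter timing; the VGA ports are standard, but outp()/inp() names and headers vary by DOS compiler):

```c
#include <conio.h>   /* outp()/inp() on DOS-era compilers; names vary by toolchain */

#define VGA_STATUS   0x3DA   /* input status #1: bit 0 set while blanking */
#define VGA_DAC_IDX  0x3C8   /* DAC write index */
#define VGA_DAC_DATA 0x3C9   /* DAC data: R, G, B (6 bits each) per index */

/* Crude line counter: count horizontal blanking pulses. */
static void wait_scanlines(int lines) {
    while (lines--) {
        while (inp(VGA_STATUS) & 0x01) ;    /* wait for active display */
        while (!(inp(VGA_STATUS) & 0x01)) ; /* wait for blanking to begin */
    }
}

/* Reload part of the 256-entry DAC palette. */
static void load_palette(const unsigned char *rgb, int first, int count) {
    outp(VGA_DAC_IDX, first);
    for (int i = 0; i < count * 3; i++)
        outp(VGA_DAC_DATA, rgb[i]);
}

/* Per frame: draw with palette A, wait until the beam passes the split line,
 * then load palette B so the lower part of the screen reuses the same 256
 * indices with entirely different colours. */
void split_palette_frame(const unsigned char *palette_b, int split_line) {
    wait_scanlines(split_line);
    load_palette(palette_b, 0, 256);
}
```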
In fact, every Atari 2600 game is a copper effect. The 2600's graphics chip is one-dimensional, working with only one scanline at a time. To display a picture, the software must run in lockstep as the electron beam traces down the screen, changing sprite bitmaps and colors and positions each scanline as appropriate. In other words, the 2600 literally uses the phosphor on the physical TV screen as the frame buffer. No surprise that this was tricky to emulate, and why 2600 emulators took longer to reach usable compatibility levels than emulators for the later more powerful Nintendo systems.
> Indeed. I did copper too, in DOS x86 assembly. Some programs used it to practical effect: you can exceed 256 colors in an 8-bit framebuffer by swapping palette values mid-screen or mid-scanline.
This was used on the BBC Master enhanced version of Elite (and some other games of the era) to get a best-of-both-worlds choice of the Beeb's display modes. The bottom third of the screen was in mode 2 (low res, 4-bit colour depth; well, 3-bit plus flash-or-not) to get the higher colour variation for the control displays, and the top two thirds were in mode 1 (twice the resolution but only 2-bit colour depth) to get the higher resolution for the wireframe graphics.
I made copper bars once, just to have done them. I guess by the time I got into democoding (96/97 or so), they didn't impress as much anymore. It was cool that they were technically full-colour in a 256c screenmode, but apart from that they were just horizontal coloured bars to me :)
However, there was another very useful trick involving changing colours relative to the sync. Basically, you wanted to have all the gfx drawing done before the vertical retrace (which is quite a bit longer than the horizontal one), then flip the buffer during the retrace so you'd get a flickerless display at full framerate. Now, if you changed palette colour 0 (the background, including the screen edges) to red right after the flip, and back to black again once your drawing routines were done and you began waiting for the vsync again, you'd see the top of your screen's background in red, down to some percentage of the screen height.
This was basically your performance meter. Code more complex routines and the red area gets bigger. Add even more calculations and it reaches the bottom of the screen; when it goes too far, you won't be done calculating before the next vsync and your framerate drops to half.
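In rough C-flavoured DOS terms (the original would have been assembly; flip_buffers and update_and_draw_frame are placeholders for your own per-frame work), the loop is basically:

```c
#include <conio.h>   /* outp()/inp() on DOS-era compilers */

#define VGA_STATUS   0x3DA   /* input status #1: bit 3 set during vertical retrace */
#define VGA_DAC_IDX  0x3C8
#define VGA_DAC_DATA 0x3C9

/* Hypothetical per-frame work, assumed to exist elsewhere. */
void flip_buffers(void);
void update_and_draw_frame(void);

static void set_colour0(unsigned char r, unsigned char g, unsigned char b) {
    outp(VGA_DAC_IDX, 0);      /* palette index 0: background and screen edges */
    outp(VGA_DAC_DATA, r);
    outp(VGA_DAC_DATA, g);
    outp(VGA_DAC_DATA, b);
}

static void wait_vsync(void) {
    while (inp(VGA_STATUS) & 0x08) ;    /* wait until any current retrace ends */
    while (!(inp(VGA_STATUS) & 0x08)) ; /* wait for the next retrace to begin */
}

void main_loop(void) {
    for (;;) {
        wait_vsync();
        flip_buffers();
        set_colour0(63, 0, 0);   /* red from here on: "work in progress" */
        update_and_draw_frame();
        set_colour0(0, 0, 0);    /* back to black: the beam's position right now
                                    is your per-frame CPU usage meter */
    }
}
```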
Sometimes I even micro-optimized bits of assembly code by marking the position with pencil on a post-it on the side of the monitor, to see if switching around some instructions would make it go up or down a few millimeters :) It really was that stable (given you did exactly the same calculations every frame, which is often the case for demos, but probably not for games). That is, until Windows came along: multitasking meant you were going to miss that vsync every once in a while, and the red bar would jump up and down like crazy.
The proper term for this effect is "raster bars", from how you traditionally did it: wait for the raster line register to hit a certain vertical position on the screen, change e.g. the background color for that scanline, wait for the next line, change the color again, and so on. The name "copper bars" came out of the Amiga scene, because on the Amiga you could do this effect (and much more) easily, without involving the CPU, using one of its co-processors, nicknamed the "Copper".
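For completeness, the busy-wait version on a VIC-II machine looks roughly like this (cc65-style C standing in for the 6502 assembly people actually wrote; the colour table is just an example gradient):

```c
#include <stdint.h>

/* Documented VIC-II registers on the C64. */
#define RASTER (*(volatile uint8_t *)0xD012)  /* current raster line (low 8 bits) */
#define BORDER (*(volatile uint8_t *)0xD020)  /* border colour */
#define BACKGR (*(volatile uint8_t *)0xD021)  /* background colour 0 */

/* Example gradient: blue, light blue, cyan, white and back again. */
static const uint8_t bar[] = { 6, 14, 3, 1, 3, 14, 6 };

void draw_raster_bar(uint8_t start_line) {
    for (uint8_t i = 0; i < sizeof bar; i++) {
        while (RASTER != start_line + i) ;  /* busy-wait for the target line */
        BORDER = bar[i];
        BACKGR = bar[i];
    }
}
```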