This is just an inner loop optimization, really. The thing to realize about the ...

Sniffnoy · on Aug 9, 2019

> Conditionals inside the loop would blow your performance

But why do you need conditionals? It's just bit operations...

mschaef · on Aug 9, 2019

To pick the bit operation being used. They also have varying numbers of operands (ranging from just filling a target region with solid black to three-operand operations that merge images, etc.)

Sniffnoy · on Aug 9, 2019

No, I mean... look, you've got four inputs: p, s, d, and f. p, s, and d are one-bit values. f is an 8-bit value that encodes the input->output mappings here. The output is 1-bit.

The output is found simply as (f & (1<<((p<<2)|(s<<1)|d))) != 0.

Yeah, if you use the index to map to a bunch of ands and ors and whatnot to execute, you have a conditional. But if you just use the index directly and do bit operations, there's no conditional. What am I missing?
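For concreteness, here is a minimal C sketch of that lookup. The function names `rop3_bit` and `rop3_byte` are made up for illustration, and the packing (eight 1-bit pixels per byte) is an assumption. It is not how GDI actually did it. Note that the whole expression needs its own parentheses in C, since `!=` binds tighter than `&`; shifting the table right and masking with 1 avoids the comparison entirely.

```c
#include <assert.h>
#include <stdint.h>

/* One pixel. p, s, d are single bits; f is the 8-bit truth table.
   The three inputs form a 3-bit index that selects one bit of f.
   No branch depends on f: it is used only as data. */
unsigned rop3_bit(uint8_t f, unsigned p, unsigned s, unsigned d)
{
    assert(p <= 1 && s <= 1 && d <= 1);
    return (f >> ((p << 2) | (s << 1) | d)) & 1u;
}

/* Eight 1-bit pixels packed per byte (an assumption for this sketch),
   handled one bit position at a time with the same table lookup. */
uint8_t rop3_byte(uint8_t f, uint8_t p, uint8_t s, uint8_t d)
{
    uint8_t out = 0;
    for (unsigned bit = 0; bit < 8; bit++) {
        unsigned r = rop3_bit(f, (p >> bit) & 1u, (s >> bit) & 1u, (d >> bit) & 1u);
        out |= (uint8_t)(r << bit);
    }
    return out;
}
```

With this indexing, a table of 0xCC returns the source unchanged, 0x88 returns source AND destination, and 0x00 always returns 0. As far as I know those match the table bytes Windows uses for SRCCOPY, SRCAND, and BLACKNESS. The catch is visible in `rop3_byte`: it always reads all three operands and does a shift-and-mask per pixel, whatever the table says.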

mschaef · on Aug 12, 2019

Different ROPs do different amounts of work (both logical operations and memory accesses). An ROP like BLACKNESS just fills the target with black, whereas an ROP like SRCAND does logical operations on multiple source values. The dynamic compilation let them truly reduce the inner loop of the BitBlt operation to about as simple as it could be with the given ROP.
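To make the difference in work concrete, here is a rough C sketch of what per-ROP inner loops look like once they are specialized. These are hand-written stand-ins, not dynamically generated code, and the function names are hypothetical. The sketch also assumes 1-bit pixels packed eight to a byte, with 0 bits meaning black.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BLACKNESS: writes only. Neither the source nor the old destination
   is ever read. (Assumes 0 bits are black.) */
void blt_blackness(uint8_t *dst, size_t n)
{
    memset(dst, 0x00, n);
}

/* SRCCOPY: one read and one write per byte, no logic at all. */
void blt_srccopy(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
}

/* SRCAND: two reads (source and destination), one AND, one write. */
void blt_srcand(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] &= src[i];
}
```

A single generic loop would have to fetch pattern, source, and destination for every byte even when the ROP ignores two of them. Specializing per ROP removes those memory accesses as well as the logic.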

There are also format issues. Circa Windows 1.0, you were likely to have a CGA, Hercules graphics board, or EGA. All three have different memory formats and architectures. (CGA was fairly ordinary, Hercules used an odd line ordering, and EGA had multiplexing hardware on board that let you operate on 32 bits at a time over an 8-bit interface.)
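As an illustration of the odd line ordering, here is a small C sketch of the address calculation for Hercules graphics mode as I remember it: 720x348 at 1 bit per pixel, 90 bytes per scanline, with scanlines interleaved across four 8 KB banks. Treat the constants as my recollection of the hardware, and the function names as hypothetical.

```c
#include <stdint.h>

/* Byte offset of pixel (x, y) from the start of the Hercules frame buffer.
   Scanline y lives in bank (y % 4); each bank is 0x2000 bytes and holds
   every fourth scanline at 90 bytes per line. */
uint32_t herc_byte_offset(unsigned x, unsigned y)
{
    return 0x2000u * (y % 4u) + 90u * (y / 4u) + x / 8u;
}

/* Mask for pixel x within its byte; the leftmost pixel is the high bit. */
uint8_t herc_bit_mask(unsigned x)
{
    return (uint8_t)(0x80u >> (x % 8u));
}
```

Stepping down one scanline is not a fixed stride here, so a blit loop written for a linear layout cannot simply be reused. That is one more reason to generate code per format.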

Any approach based on lookup tables would've both taken a lot of memory that wasn't available and introduced extra memory references in the inner loop of what was probably the most performance-critical code in the OS from a GUI PoV. (Keep in mind that unlike the Mac, these machines were not originally designed to run a GUI.)

This story is set ten or more years after the design of BitBlt was done, but I think it does nicely illustrate how people were thinking about memory back then. The reason Windows 95 didn't update the seconds on the taskbar clock was to allow it to page out font rendering code on small machines. If you're rendering text every second, you can't really do that.

https://devblogs.microsoft.com/oldnewthing/20031010-00/?p=42...

(And Windows 1.0 was written to run on hardware with 1/16th the memory of Windows 95.)