
This is just an inner loop optimization, really. The thing to realize about the machines Windows 1.0 was originally written to run on is that they were very short on both memory and speed. Almost unimaginably so, by modern standards.

So here, where there's a need to perform operations in bulk as quickly as possible, this sort of extreme optimization makes a bunch of sense. Conditionals inside the loop would blow your performance, and the other options buy back performance with memory that was itself in short supply. So... dynamic compilation threads the needle between these two constraints at the cost of some up front development complexity. (Probably ongoing complexity too, as additional types of graphics boards were released with their own sorts of peculiar optimization strategies.)
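Here is a rough Python analogy for the difference between branching on the ROP for every byte and generating a specialized loop once per call. It is only an analogy: the real code generator emitted x86 machine code, not Python. The function names and the three-ROP subset are hypothetical. The hex values are the 8-bit truth-table codes of the standard Windows ROPs.

```python
# Python analogy only: the idea of "decide once, then run a branch-free loop".
# Codes are the 8-bit truth-table bytes of the standard Windows ROPs.
BLACKNESS, SRCAND, SRCCOPY = 0x00, 0x88, 0xCC

def blit_branchy(rop, src, dst):
    """Re-decides which operation to perform on every byte."""
    for i in range(len(dst)):
        if rop == BLACKNESS:
            dst[i] = 0x00
        elif rop == SRCAND:
            dst[i] = src[i] & dst[i]
        elif rop == SRCCOPY:
            dst[i] = src[i]
        else:
            raise ValueError(f"unsupported ROP {rop:#04x}")

# Hypothetical per-ROP expression for one destination byte.
_EXPR = {BLACKNESS: "0x00", SRCAND: "src[i] & dst[i]", SRCCOPY: "src[i]"}

def compile_blitter(rop):
    """Decides once, then returns a loop with no per-byte branch on the ROP."""
    source = (
        "def blit(src, dst):\n"
        "    for i in range(len(dst)):\n"
        f"        dst[i] = {_EXPR[rop]}\n"
    )
    namespace = {}
    exec(compile(source, f"<rop {rop:#04x}>", "exec"), namespace)
    return namespace["blit"]
```

A caller would compile once per BitBlt call (`blit = compile_blitter(SRCAND)`) and then run `blit(src, dst)` over every scan line. Keeping a precompiled loop for every possible ROP would avoid the generation step, but it spends memory on every variant. That is the other side of the trade-off described above.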




> Conditionals inside the loop would blow your performance

But why do you need conditionals? It's just bit operations...


To pick the bit operation being used. They also take a varying number of operands (ranging from just filling a target region with solid black to three-operand stuff that merges images, etc.)


No, I mean... look, you've got four inputs: p, s, d, and f. p, s, and d are one-bit values. f is an 8-bit value that encodes the input->output mappings here. The output is 1-bit.

The output is found simply as (f & (1 << ((p<<2)|(s<<1)|d))) != 0.
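In C, `!=` binds more tightly than `&`, so this expression needs explicit parentheses to mean what is intended. Python happens to parse the unparenthesized form the intended way. Shifting `f` right and masking with 1 avoids the question entirely. Below is a minimal Python sketch. The checks use the standard Windows codes, which follow the same ordering: P in the high bit of the index, D in the low bit.

```python
import itertools

def rop_bit(f, p, s, d):
    """Output bit of 8-bit ROP code f for one-bit inputs p, s, d."""
    index = (p << 2) | (s << 1) | d   # which of the 8 truth-table entries applies
    return (f >> index) & 1           # shift that entry down and mask it off

# With this bit ordering, the single-operand ROPs reproduce the familiar codes.
for p, s, d in itertools.product((0, 1), repeat=3):
    assert rop_bit(0xF0, p, s, d) == p        # PATCOPY
    assert rop_bit(0xCC, p, s, d) == s        # SRCCOPY
    assert rop_bit(0xAA, p, s, d) == d        # leaves the destination unchanged
    assert rop_bit(0x55, p, s, d) == 1 - d    # DSTINVERT
```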

Yeah, if you use the index to map to a bunch of ands and ors and whatnot to execute, you have a conditional. But if you just use the index directly and do bit operations, there's no conditional. What am I missing?
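The branch-free idea does work a byte (8 monochrome pixels) at a time. Each of the 8 truth-table entries becomes a "minterm" mask, and `f` selects which minterms end up in the output. Below is a sketch in Python. The helper names are hypothetical, and the fixed 8-iteration loop stands in for code that would be fully unrolled.

```python
def select(x, bit):
    """x when bit == 1, the 8-bit complement of x when bit == 0 (no branch)."""
    return (x ^ (bit - 1)) & 0xFF

def rop_byte(f, p, s, d):
    """Applies 8-bit ROP code f to 8 pixels at once, without data-dependent branches."""
    out = 0
    for i in range(8):  # fixed trip count; would be fully unrolled in real code
        keep = -((f >> i) & 1) & 0xFF       # 0xFF if entry i is set in f, else 0x00
        minterm = (select(p, (i >> 2) & 1)
                   & select(s, (i >> 1) & 1)
                   & select(d, i & 1))      # pixels whose (p, s, d) match entry i
        out |= minterm & keep
    return out
```

This gives the right answer for all 256 codes. However, it always pays for all eight minterms and always reads p, s and d, even for a ROP like BLACKNESS that needs none of them. That fixed cost is what the following reply is getting at.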


Different ROPs do different amounts of work (both logical operations and memory accesses). A ROP like BLACKNESS just fills the target with black, whereas a ROP like SRCAND combines source and destination pixels with a logical AND, so it has to read both. The dynamic compilation let them reduce the inner loop of the BitBlt operation to about as simple as it could be for the given ROP.
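A code generator can read directly off the 8-bit truth table which operands a ROP actually depends on, and then skip the memory reads for the rest. Below is a sketch that assumes the P/S/D bit ordering from the formula earlier in the thread. The thread doesn't say whether the Windows generator used this exact test; the function name is hypothetical.

```python
def rop_operands(f):
    """Returns which of P, S, D the 8-bit ROP code f actually depends on."""
    used = ""
    # Each test pairs truth-table entries that differ only in one operand's bit
    # and checks whether any such pair produces different outputs.
    if ((f >> 4) ^ f) & 0x0F:   # entries 4-7 (P=1) vs 0-3 (P=0)
        used += "P"
    if ((f >> 2) ^ f) & 0x33:   # entries 2,3,6,7 (S=1) vs 0,1,4,5 (S=0)
        used += "S"
    if ((f >> 1) ^ f) & 0x55:   # odd entries (D=1) vs even entries (D=0)
        used += "D"
    return used
```

For example, `rop_operands(0x00)` returns an empty string, because BLACKNESS needs no reads at all. `rop_operands(0xCC)` (SRCCOPY) returns `"S"`, so the destination never has to be read. `rop_operands(0x88)` (SRCAND) returns `"SD"`.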

There are also format issues. Circa Windows 1.0, you were likely to have a CGA, a Hercules graphics board, or an EGA. All three have different memory formats and architectures. (CGA was fairly ordinary, Hercules used an odd line ordering, and EGA had multiplexing hardware on board that let you operate on 32 bits at a time over an 8-bit interface.)
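To make the line-ordering point concrete, here is a sketch of where a pixel lives in two monochrome layouts. It assumes the commonly documented CGA 640x200 two-color mode and the Hercules 720x348 graphics mode. Offsets are relative to the start of video memory, segment addresses are omitted, and the function names are hypothetical. CGA's high-resolution mode also interleaves lines, but only two ways; Hercules splits them four ways.

```python
def cga_hires_offset(x, y):
    """Offset of the byte holding pixel (x, y), CGA 640x200 two-color mode."""
    # Even lines in the first 8 KB bank, odd lines in the second; 80 bytes per line.
    return 0x2000 * (y & 1) + 80 * (y >> 1) + (x >> 3)

def hercules_offset(x, y):
    """Offset of the byte holding pixel (x, y), Hercules 720x348 graphics mode."""
    # Lines are spread across four 8 KB banks by y mod 4; 90 bytes per line.
    return 0x2000 * (y & 3) + 90 * (y >> 2) + (x >> 3)

def pixel_mask(x):
    """Bit within the byte for pixel x (leftmost pixel in the high bit)."""
    return 0x80 >> (x & 7)
```

Moving from line y to line y + 1 therefore jumps between banks instead of adding a fixed pitch. A per-card code generator can bake that pattern into the loop rather than testing for it at run time.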

Any approach based on lookup tables would've both taken a lot of memory that wasn't available and introduced extra memory references in the inner loop of what was probably the most performance critical code in the OS from a GUI PoV. (Keep in mind that unlike the Mac, these machines were not originally designed to run a GUI.)
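For a sense of scale, here is the arithmetic for a naive byte-at-a-time lookup table, meaning one that maps whole input bytes (8 monochrome pixels each) directly to an output byte. The table layout is hypothetical. The 256 KB figure is an assumption: it is the commonly cited Windows 1.0 minimum, and it matches the 1/16th ratio at the end of this comment if Windows 95 is taken at 4 MB.

```python
KB = 1024
MACHINE_RAM = 256 * KB   # assumed Windows 1.0 minimum RAM

def table_bytes(operands, rops=1):
    """Bytes for a table mapping `operands` input bytes to one output byte."""
    return rops * 256 ** operands

three_operand = table_bytes(3)              # 16,777,216 bytes (16 MB) for one ROP
two_operand = table_bytes(2)                # 65,536 bytes (64 KB) for one ROP
share_of_ram = two_operand / MACHINE_RAM    # 0.25: a quarter of the machine
```

The one-byte-per-ROP truth table from the formula upthread is tiny by comparison. However, it works one pixel at a time, which is the speed side of the same trade-off.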

This story takes place ten or more years after BitBlt was designed, but I think it nicely illustrates how people were thinking about memory back then. The reason Windows 95 didn't show seconds on the taskbar clock was so that it could page out the font rendering code on small machines. If you're rendering text every second, you can't really do that.

https://devblogs.microsoft.com/oldnewthing/20031010-00/?p=42...

(And Windows 1.0 was written to run on hardware with 1/16th the memory of Windows 95.)



