This turns out to be a lot of assembly macros to help write one x86 assembly. ht...

anonymoushn · 2024-03-26T14:01:33 1711461693

So far, intrinsics supplemented with functions that wrap single instructions cover my needs. The thing I occasionally wish were easier is manually configurable unrolling. For example, if you have a kernel that processes the input in 3 streams, or that loads 5 registers' worth of input from a single stream and processes all of them the same way, the resulting code duplication serves no purpose and is a place where typos can give you a bad time. A macro system slightly different from the one C actually has could let you get rid of the duplication and control the unrolling by changing a single constant.
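To make the idea concrete, here is a minimal sketch of the closest thing plain C offers today. It assumes a scalar stand-in for the SIMD kernel, and the names `UNROLL` and `sum_unrolled` are hypothetical. The unroll factor is a single constant and the per-stream work is written once in a fixed-trip-count inner loop. Optimizing compilers usually unroll that loop fully, but nothing guarantees it, which is the gap a better macro system would close.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical knob: how many independent accumulators ("registers") to use.
   Change this one constant to change the unrolling. */
#ifndef UNROLL
#define UNROLL 5
#endif

/* Sums n values using UNROLL independent accumulators.
   The body of the kernel appears once; no copy-pasted lanes. */
uint64_t sum_unrolled(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint64_t acc[UNROLL] = {0};
    size_t i = 0;

    /* Main loop: consume UNROLL elements per iteration. The inner loop has a
       compile-time trip count, so the compiler can unroll it completely
       (typical at -O2/-O3, but not guaranteed). */
    for (; i + UNROLL <= n; i += UNROLL) {
        for (size_t k = 0; k < UNROLL; k++) {
            acc[k] += data[i + k];
        }
    }

    /* Tail: handle the leftover elements that do not fill a full group. */
    uint64_t total = 0;
    for (; i < n; i++) {
        total += data[i];
    }

    /* Combine the independent accumulators. */
    for (size_t k = 0; k < UNROLL; k++) {
        total += acc[k];
    }
    return total;
}
```

Building with `-DUNROLL=3` switches to three accumulators without touching the kernel. With real intrinsics the same shape applies, with `acc` becoming an array of vector registers, though whether the compiler keeps them all in registers is something you would need to check in the generated assembly.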