> This code only makes people's lives better if many languages and frameworks th...

TinkersW · on Aug 21, 2023

It is written with intrinsics not ASM.

Compilers understand intrinsics and can optimize around them, and CPUs evolve improved SIMD instruction sets at a snail's pace.
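A minimal sketch of what "optimize around them" can look like, assuming x86 with SSE2 and a mainstream compiler at `-O2`. The function names here are made up for illustration. Because GCC and Clang treat these intrinsics as ordinary vector operations, they are free to drop the redundant add in the first function and to constant-fold the second. Whether they actually do depends on compiler and flags, so check the output on godbolt rather than taking this on faith.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical example: broadcast x, add a zero vector, read lane 0 back.
   The add is redundant, so an optimizer is free to drop it. */
int32_t add_zero_then_extract(int32_t x) {
#if defined(__SSE2__)
    __m128i v    = _mm_set1_epi32(x);
    __m128i zero = _mm_setzero_si128();
    __m128i sum  = _mm_add_epi32(v, zero);
    return _mm_cvtsi128_si32(sum);
#else
    return x;  /* scalar fallback so the file builds on non-x86 targets */
#endif
}

/* Hypothetical example: both operands are compile-time constants, so the
   whole expression is a candidate for constant folding. */
int32_t constant_sum(void) {
#if defined(__SSE2__)
    __m128i a = _mm_set1_epi32(1);
    __m128i b = _mm_set1_epi32(2);
    return _mm_cvtsi128_si32(_mm_add_epi32(a, b));
#else
    return 1 + 2;
#endif
}

/* The results are the same whether or not the compiler simplifies anything. */
void check_examples(void) {
    assert(add_zero_then_extract(42) == 42);
    assert(constant_sum() == 3);
}
```

With hand-written ASM the assembler emits exactly what you typed, so neither simplification is available.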

Intel doesn't even really support AVX512 on consumer hardware yet, and maybe never will, so this code is mostly only useful on very recent AMD CPUs.

magicalhippo · on Aug 21, 2023

I'm talking about which instructions and idioms are optimal. AFAIK, with intrinsics the compiler won't completely change what you've written.

Back in the day, REP MOVSB was the fastest way to copy bytes; then the Pentium came along and rolling your own loop was better. Then CPUs improved and REP MOVSB was suddenly better again[1], for those CPUs. And then it changed again...

Similar story for other idioms where implementation details on CPUs change. Compilers can respond and target your exact CPU.

[1]: https://github.com/golang/go/issues/14630 (notice how one commenter reports that the same patch that gives OP a 1.6x boost gives them a 5x degradation)
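To make the REP MOVSB point concrete, here is a rough sketch of the three ways of copying bytes being compared. The function names are hypothetical, and the inline-asm path assumes GCC or Clang on x86 (it falls back to `memcpy` elsewhere). The first version pins the idiom in the source. The other two leave the choice to the compiler and the C library, which can pick a different strategy per target CPU. No timings are given here, since which one wins depends on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the idiom is fixed in the source as REP MOVSB. */
void copy_rep_movsb(void *dst, const void *src, size_t n) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* RDI/EDI = destination, RSI/ESI = source, RCX/ECX = byte count. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);  /* fallback for other compilers and targets */
#endif
}

/* Hypothetical helper: a hand-rolled loop. The compiler may unroll it,
   vectorize it, or replace it with a memcpy call, depending on target. */
void copy_byte_loop(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
}

/* Hypothetical helper: defer entirely to the library implementation. */
void copy_memcpy(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}

/* All three must produce identical bytes; only their speed differs. */
void check_copies(void) {
    const char src[16] = "hello, world";
    char a[16] = {0}, b[16] = {0}, c[16] = {0};
    copy_rep_movsb(a, src, sizeof src);
    copy_byte_loop(b, src, sizeof src);
    copy_memcpy(c, src, sizeof src);
    assert(memcmp(a, src, sizeof src) == 0);
    assert(memcmp(b, src, sizeof src) == 0);
    assert(memcmp(c, src, sizeof src) == 0);
}
```

Building the second and third with something like `-march=native` is what lets the toolchain "respond" to the exact CPU; the first stays REP MOVSB no matter what.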

bruce343434 · on Aug 21, 2023

What do you mean "optimize around them"? Do you have a godbolt/codegen example of suboptimal intrinsic calls being optimized?