
> This code only makes people's lives better if many languages and frameworks that translate latin-1 to utf8 are updated to have this new faster implementation.

Except CPUs evolve, and what was once a fast way of doing things may no longer be very fast. And with ASM you have no compiler to generate better-targeted instructions.

I've seen many instances where significant performance was gained by swapping out an old hand-written ASM routine for a plain language version.

If you ever add some optimized ASM to your code, do a performance check at startup or similar, and have the plain language version as a fallback.
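A rough sketch of what I mean, in Python just to show the shape of it. Every name here is made up for illustration, and `latin1_to_utf8_tuned` is only a stand-in for whatever hand-optimized routine you actually ship. At startup it checks that the tuned routine agrees with the plain one on a sample, times both, and keeps the plain version unless the tuned one is correct and faster:

```python
import time


def latin1_to_utf8_plain(data: bytes) -> bytes:
    """Plain fallback: each Latin-1 byte is the code point of the same value."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII passes through unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead byte (0xC2 or 0xC3)
            out.append(0x80 | (b & 0x3F))  # continuation byte
    return bytes(out)


def latin1_to_utf8_tuned(data: bytes) -> bytes:
    """Hypothetical stand-in for the hand-optimized routine."""
    return data.decode("latin-1").encode("utf-8")


def best_time(func, sample: bytes, repeats: int = 5) -> float:
    """Best-of-N wall-clock time for one call, to damp scheduling noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(sample)
        best = min(best, time.perf_counter() - start)
    return best


def pick_implementation(tuned, plain, sample: bytes):
    """Return `tuned` only if it matches `plain` on the sample and is faster."""
    try:
        if tuned(sample) != plain(sample):
            return plain                   # wrong output: never use it
    except Exception:
        return plain                       # unusable on this machine
    if best_time(tuned, sample) < best_time(plain, sample):
        return tuned
    return plain


# Run once at startup; the sample size is an arbitrary assumption.
SAMPLE = bytes(range(256)) * 16
latin1_to_utf8 = pick_implementation(
    latin1_to_utf8_tuned, latin1_to_utf8_plain, SAMPLE
)
```

The rest of the program only ever calls `latin1_to_utf8`, so a regression on some future CPU costs you nothing worse than the plain version. A real check would want a sample that resembles production input, since the winner can depend on size and alignment.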



It is written with intrinsics not ASM.

Compilers understand intrinsics and can optimize around them, and CPUs evolve improved SIMD instruction sets at a snail's pace.

Intel doesn't even really support AVX512 yet for consumer hardware, and maybe never will, so this code is mostly only good for very modern AMD.


I'm talking about which instructions and idioms are optimal. AFAIK, with intrinsics the compiler won't completely change what you've written.

Back in the day REP MOVSB was the fastest way to copy bytes, then the Pentium came and rolling your own loop was better. Then CPUs improved and REP MOVSB was suddenly better again[1], for those CPUs. And then it changed again...

Similar story for other idioms where implementation details on CPUs change. Compilers can respond and target your exact CPU.

[1]: https://github.com/golang/go/issues/14630 (notice how one commenter reports that the same patch that gives OP a 1.6x boost gives them a 5x degradation)


What do you mean "optimize around them"? Do you have a godbolt/codegen example of suboptimal intrinsic calls being optimized?



