
It is not really my speciality, but there are equivalent things in OoO land. Although CPUs are very good at running reasonable code efficiently, peak FP performance is still the realm of hand-written ASM or, at the very least, copious use of intrinsics.

It can also be very microarchitecture-specific. Because FP code often needs significant unrolling, the number of architectural registers needed to store partial results can be a bottleneck, especially if the compiler doesn't do a perfect job.
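
To make the unrolling point concrete, here is a rough sketch in plain C (no intrinsics) of a reduction with one accumulator versus four independent partial sums. The function names and the unroll factor of 4 are my own picks for illustration. The right factor depends on the FP add latency and throughput of the specific core, and every extra partial sum occupies another architectural register.

```c
#include <stddef.h>

/* Baseline: a single accumulator. Every add depends on the previous one,
   so the loop is bound by FP add latency rather than throughput. */
double sum_naive(const double *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Unrolled by 4 (illustrative factor) with independent partial sums.
   The four dependency chains can overlap in an out-of-order core,
   at the cost of four live registers instead of one. */
double sum_unrolled4(const double *x, size_t n) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    /* Combine the partial sums, then handle the leftover elements. */
    double acc = (a0 + a1) + (a2 + a3);
    for (; i < n; i++)
        acc += x[i];
    return acc;
}
```

Note that the two versions add in a different order, so the results can differ in the last bits for general inputs. That is also why a compiler will normally not do this transformation on its own unless you relax FP semantics (e.g. -ffast-math on GCC/Clang).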



