Hacker News new | past | comments | ask | show | jobs | submit login

I think you got it all wrong, memcpy comparison is given as a baseline.

It just shows even if you do not do anything (just copy the memory from place A to place B), you have some CPU usage baseline per GB. So this kind of saying: with this algorithm, you have very minimal CPU usage.

This algorithm is not doing excessive memory usage and Idling CPU because of memory bandwith, it is just doing very minimal CPU usage.




avx is extremely power hungry to a point the cpu downclocks itself. it might be worse than regular instructions even.

memcpy with wide instructions has been an issue for libc.


From the same author, https://lemire.me/blog/2018/08/15/the-dangers-of-avx-512-thr... tl;dr Isn't that extreme.


the test is on server grade cpu (read low clocks, great vrms) and the conclusion is: you have to use multiple cores to be in trouble.

current intel desktop processors have 'avx offset' which would suck if kicks on for mild base64.

overall wide instructions are nice and dandy and obliterate microbenchmarks but they may incur hidden costs, especially on low power cpus (laptops)




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: