I think you got it all wrong: the memcpy comparison is given as a baseline. It just ...

xxs · on Nov 6, 2019

AVX is extremely power hungry, to the point that the CPU downclocks itself. It might even end up slower than regular instructions.

memcpy with wide instructions has been an issue for libc.

sitkack · on Nov 6, 2019

xxs · on Nov 6, 2019

The test is on a server-grade CPU (read: low clocks, great VRMs), and the conclusion is that you have to use multiple cores to be in trouble.

Current Intel desktop processors have an 'AVX offset', which would suck if it kicks in for mild base64 work.

Overall, wide instructions are nice and dandy and obliterate microbenchmarks, but they may incur hidden costs, especially on low-power CPUs (laptops).
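
To put rough numbers on that hidden cost, here is a back-of-the-envelope sketch. It is a toy model, not a measurement: it assumes the whole core runs at a reduced clock whenever the wide code path is in use, and the function name, the 4x speedup and the 0.85 clock ratio are all made-up illustrative values.

```python
def net_speedup(simd_speedup: float, simd_fraction: float, clock_ratio: float) -> float:
    """Estimate overall speedup from using wide instructions (toy model).

    simd_speedup  -- how much faster the vectorized part runs at equal clock
    simd_fraction -- share of the original runtime that gets vectorized (0..1)
    clock_ratio   -- reduced clock / nominal clock while wide code is in play
                     (assumed here to apply to the whole workload)
    """
    # Amdahl-style runtime at nominal clock, relative to a baseline of 1.0.
    time_at_nominal_clock = (1.0 - simd_fraction) + simd_fraction / simd_speedup
    # A lower clock stretches the whole runtime by 1 / clock_ratio.
    time_with_downclock = time_at_nominal_clock / clock_ratio
    return 1.0 / time_with_downclock


if __name__ == "__main__":
    # Hypothetical numbers: 4x faster kernel, 15% clock penalty.
    for fraction in (0.05, 0.25, 0.90):
        print(f"vectorized share {fraction:.0%}: net speedup "
              f"{net_speedup(4.0, fraction, 0.85):.2f}x")
```

Under these assumptions the vectorized path only pays off once it covers a large enough share of the runtime; with a small share (the "mild base64" case) the clock penalty on everything else outweighs the gain.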