Hacker News new | past | comments | ask | show | jobs | submit login

If you're using AVX instructions make sure to insert a vzeroupper after you're done, otherwise you take a big speed hit transitioning between SSE and AVX. The compiler will generate these automatically for AVX instructions it is emitting:

https://godbolt.org/g/jNXJyA

Here's an intrinsic for it:

http://technion.ac.il/doc/intel/compiler_c/main_cls/intref_c...




Little known fact about AVX: every time you use an AVX instruction Intel processors reduce the core frequency by a few percents for about a millisecond to avoid overheating (I guess because they need to turn on an otherwise unused functional unit).

As a result, if you sprinkle some AVX instructions in your code but not enough to cause a significant efficiency win you might actually end up reducing your capacity (and affecting latency).


Can you provide a citation for that or a link to more reading? I'm not asking because I don't believe you, but because I would like to learn more. :-)


Sure, see this doc from Intel: https://computing.llnl.gov/tutorials/linux_clusters/intelAVX...

"Intel AVX instructions require more power to run. When executing these instructions, the processor may run at less than the marked frequency to maintain thermal design power (TDP) limits."

...

"When the processor detects Intel AVX instructions, additional voltage is applied to the core. With the additional voltage applied, the processor could run hotter, requiring the operating frequency to be reduced to maintain operations within the TDP limits. The higher voltage is maintained for 1 millisecond after the last Intel AVX instruction completes, and then the voltage returns to the nominal TDP voltage level."




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: