
You're right, I messed that up (though I'll leave it for posterity). I went into it with a bias thinking BMI was slow on Zen, since PDEP is 18 cycles vs 1 on Skylake, much to my disappointment back in the day.
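For context, PDEP (parallel bit deposit, part of BMI2) takes the low-order bits of a source value and scatters them into the positions marked by a mask. A minimal portable sketch of that behavior is below; `pdep_soft` is a hypothetical name used only for illustration, and the loop is a reference for the semantics rather than how any CPU implements it. The hardware instruction is normally reached through the `_pdep_u64` intrinsic in `<immintrin.h>`.

```c
#include <stdint.h>

/* Software reference for 64-bit PDEP semantics (hypothetical helper name).
 * Each set bit in `mask`, from lowest to highest, receives the next
 * low-order bit of `src`. Bits outside the mask are zero in the result. */
uint64_t pdep_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;

    /* `bit` walks the source bits from least significant upward,
     * advancing once per set bit consumed from the mask. */
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (0 - mask); /* isolate lowest set mask bit */
        if (src & bit) {
            result |= lowest;                /* deposit source bit there */
        }
        mask &= mask - 1;                    /* clear that mask bit */
    }
    return result;
}
```

For example, `pdep_soft(0x5, 0xF0)` deposits the source bits `101` into mask positions 4 through 6 and yields `0x50`. The loop runs once per set mask bit, which gives a rough feel for why a slow PDEP hurts in bit-manipulation-heavy code.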

After reviewing the example again, there's no obvious reason why Zen 2 is slower, although it's likely a rare edge case. Too bad there's nothing decent like VTune on AMD platforms.

I remember one session where my choice of temporary register significantly impacted throughput while implementing an unrolled int[] hash fn on my Kaby Lake processor. I never figured out exactly why, but sharp edges do exist even on Intel chips.




This benchmark heavily stresses branch misprediction recovery, so that could be worse on Zen.

Also, I could not reproduce Daniel's results: I got an IPC of 1.77 (SKX) or 2.00 (SKL) compared to Daniel's reported 2.80 (SKL, I think), so Intel is still better, but by a smaller margin. Waiting for clarification on that one.



