You're right, I messed that up (though I'll leave it for posterity). I went into... | Hacker News

reitzensteinm on Dec 6, 2019 | on: Instructions per cycle: AMD Zen 2 versus Intel

You're right, I messed that up (though I'll leave it for posterity). I went in biased, assuming BMI was slow on Zen, since PDEP takes 18 cycles there vs 1 on Skylake, much to my disappointment back in the day.
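
For anyone who hasn't used PDEP, here's a rough Python sketch of what the instruction computes: it takes the low-order bits of the source and deposits them, in order, at the positions of the set bits in the mask. This is only a bit-by-bit software model (the name `pdep` is just illustrative), not a claim about how either CPU implements it in hardware.

```python
def pdep(src: int, mask: int) -> int:
    """Software model of parallel bit deposit (illustrative only)."""
    result = 0
    bit = 1  # selects the next low-order bit of src
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of mask
        if src & bit:
            result |= lowest   # deposit the src bit at that position
        mask ^= lowest         # clear the mask bit we just handled
        bit <<= 1
    return result


if __name__ == "__main__":
    # Mask bits sit at positions 1, 3, 4; src bits 1, 0, 1 land there in order.
    print(bin(pdep(0b101, 0b11010)))
```

A loop like this does one iteration per set mask bit, which gives a feel for why a slow hardware PDEP hurts code that leans on it.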

After reviewing the example again, there's no obvious reason why Zen 2 is slower, although it's likely a rare edge case. Too bad there's nothing decent like VTune on AMD platforms.

I remember one session implementing an unrolled int[] hash function on my Kaby Lake processor where my choice of temporary register significantly impacted throughput. I never figured out exactly why, but sharp edges do exist even on Intel chips.

BeeOnRope on Dec 6, 2019

This benchmark heavily stresses branch misprediction recovery, so that could be worse on Zen.

Also, I could not reproduce Daniel's results: I got an IPC of 1.77 (SKX) or 2.00 (SKL) compared to Daniel's reported 2.80 (SKL, I think), so Intel is still better, but by a smaller margin. Waiting for clarification on that one.
