It’s depressing how many comments here are quick to dismiss the benchmarking/article. Yes, yes, memory bandwidth, I/O, and cache hierarchies are all important, but Daniel Lemire is one of the top people in the world when it comes to optimizing algorithms for modern CPUs. Do you like search engines? Lemire has made them significantly faster. He is often able to take code/algorithms that already seem fast, and make them much faster. He’s recently branched out beyond search engine core algorithms into some aspects of string processing (base64, UTF-8 validation, JSON parsing).
In this blog post, he’s paying attention to IPC because he’s typically working with inner loops where the data’s being delivered from RAM to L1 as efficiently as possible.
I have plenty of respect for Daniel (and you can even find me below in this discussion defending some aspects of this test), but I too find some fault with this article.
The main problem I have is that the claim in dispute seems to be that Zen 2 has comparable (perhaps slightly higher) IPC to Skylake, and then Daniel picks out two benchmarks and shows that Skylake has higher IPC than Zen 2... proving what exactly?
Contradicting people who said that Zen 2 had a higher IPC on every benchmark? Yes, those people were wrong, but it's easy to prove a point if you pick an argument almost no one was making in the first place.
The second benchmark, from which he selected the "basic_decoder" sub-benchmark, also contains a sub-benchmark called "bogus" that measures the time to call an empty function, and in that case I measure the reverse: Intel at an IPC of 2.25 and AMD at 3.43. So should we now say that Intel's IPC is "quite poor"?
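To make the comparison concrete: IPC is just retired instructions divided by core cycles over the measured region. A minimal sketch, assuming made-up counter readings picked only so that they reproduce the two IPC figures I quoted (the function name and the raw counts are illustrative, not taken from Daniel's harness):

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle, given two hardware counter readings."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical counter readings, chosen only to match the quoted IPC values.
intel_ipc = ipc(instructions=2_250_000, cycles=1_000_000)
amd_ipc = ipc(instructions=3_430_000, cycles=1_000_000)

print(f"Intel IPC: {intel_ipc:.2f}")
print(f"AMD IPC:   {amd_ipc:.2f}")
print(f"AMD/Intel: {amd_ipc / intel_ipc:.2f}")
```

The ratio flips depending on which sub-benchmark you feed in, which is the whole point: a single IPC number only describes the code it was measured on.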
Ha, I’m not referencing _your_ comments here, and I am curious about how you couldn’t reproduce his results; he’s quick to publish and seems happy to correct so we’ll see. I’m referencing more the other comments here saying things about the benchmarks not being realistic because good benchmarks need to have a mix of tasks, like memory and I/O—this ain’t a Phoronix post, folks.
I started reading through your blog last night. I’m slowly trying to learn how to go from being a programmer who doesn’t write slow code to one who writes fast code, so I’m absorbing a lot about vectorization, ILP, etc.
He's corrected the results (possibly even before I wrote the post you responded to this AM): they originally showed Intel at 2.8 IPC in the second table, they now show 2.1.
I measured 2.0, but I guess Daniel is using docker w/ a slightly different compiler version, so I think the gap is sufficiently small that we can declare "close enough". I also measured quite different numbers for SKL (2.0) vs SKX (1.7), which is quite odd given the non-memory-intensive behavior of the test: in that scenario, I'd expect SKL and SKX to perform identically.
>The main problem I have is that the claim in dispute seems to be that Zen 2 has comparable (perhaps slightly higher) IPC to Skylake, and then Daniel picks out two benchmarks and shows that Skylake has higher IPC than Zen 2.
Exactly my thoughts.
I casually hang out on sites that are biased towards AMD, and even those never claimed Zen 2 has the same or higher IPC than Intel.