Ha, I’m not referencing _your_ comments here, and I’m curious why you couldn’t reproduce his results; he’s quick to publish and seems happy to correct, so we’ll see. I’m referring more to the other comments here claiming the benchmarks aren’t realistic because good benchmarks need a mix of tasks, like memory and I/O. This ain’t a Phoronix post, folks.
I started reading through your blog last night. I’m slowly trying to learn how to go from being a programmer who doesn’t write slow code to one who writes fast code, so I’m absorbing a lot about vectorization, ILP, etc.
He's corrected the results (possibly even before I wrote the post you responded to this AM): they originally showed Intel at 2.8 IPC in the second table, and now they show 2.1.
I measured 2.0, but I guess Daniel is using Docker with a slightly different compiler version, so I think the gap is small enough that we can declare it "close enough". I also measured quite different numbers for SKL (2.0) vs. SKX (1.7), which is odd given that the test isn't memory-intensive: in that scenario, I'd expect SKL and SKX to perform identically.