Cross-socket NUMA bandwidth is like a tiny straw. If the OS or application fails to pin memory, CPU, and I/O together on the correct NUMA node, you're going to have a major scalability problem.
So, two things can really throw this off: a physical->VM topology mismatch, or a garbage-collected language that won't let the programmer control memory/node placement.
I've literally gotten 10x perf improvements by fixing these kinds of problems in the past.
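For reference, a minimal sketch of what that pinning looks like on Linux with libnuma (the node number and buffer size here are just placeholders; numactl --cpunodebind/--membind does the coarse-grained version from the shell):

    /* Pin the current thread and its working set to one NUMA node.
     * Build with: gcc -O2 pin.c -lnuma   (Linux + libnuma assumed) */
    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }

        int node = 0;              /* example: keep everything on node 0 */
        size_t sz = 1UL << 30;     /* 1 GiB working set */

        /* Run this thread only on CPUs belonging to the chosen node... */
        numa_run_on_node(node);

        /* ...and take memory from the same node, so no access crosses the socket. */
        double *buf = numa_alloc_onnode(sz, node);
        if (!buf) { perror("numa_alloc_onnode"); return 1; }

        memset(buf, 0, sz);        /* touch the pages so placement actually happens */

        /* ... bandwidth-heavy work goes here ... */

        numa_free(buf, sz);
        return 0;
    }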
Combine that with comparing cores to threads and the results are somewhat predictable. The simpler Graviton topology is really going to shine for applications that don't scale well, at least until Intel builds an actual high-core-count single-socket machine.
That really looks like it's just sorting by memory bandwidth, particularly the Cascade Lake vs. Skylake results at around the 10-core mark, since there's not really any other major difference between those two.
Being that memory bandwidth constrained, and then not being NUMA aware, is also quite likely why the dual-socket results are a disaster.
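The classic way that happens: one thread initializes the whole array, first-touch puts every page on that thread's node, and then the second socket spends the entire run pulling data over the interconnect. A sketch of the NUMA-aware version, assuming Linux first-touch placement and OpenMP (the array size is arbitrary):

    /* First-touch aware initialization: each thread touches the pages it will
     * later stream over, so the kernel places them on that thread's node.
     * Build with: gcc -O2 -fopenmp first_touch.c   (Linux first-touch policy assumed) */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1UL << 27)   /* ~128M doubles, ~1 GiB */

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        if (!a) return 1;

        /* The NUMA-unaware version initializes serially: every page then lands
         * on the initializing thread's node and the other socket streams
         * everything across the inter-socket link. */

        #pragma omp parallel for schedule(static)
        for (size_t i = 0; i < N; i++)
            a[i] = 0.0;              /* first touch: page lands on this thread's node */

        double sum = 0.0;
        #pragma omp parallel for schedule(static) reduction(+:sum)
        for (size_t i = 0; i < N; i++)   /* same static schedule -> mostly local accesses */
            sum += a[i];

        printf("%f\n", sum);
        free(a);
        return 0;
    }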
The unidirectional bandwidth across a two-socket system is roughly equal to a single DDR channel. Which is why you see postings on the Intel forums about people complaining about their perf falling off a cliff when the hardware decides to flag the directory as requiring a cross-socket snoop (or whatever it's actually doing). So even if you've done a reasonable job keeping your thread+data local, doing a remote reference, or accidentally running the thread for a short while on the wrong socket, can permanently lower the performance. See https://software.intel.com/en-us/forums/software-tuning-perf... for an example involving a contended cache line.
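If you want to catch the "data quietly ended up on the other node" case, you can ask the kernel where the pages actually landed. A rough sketch with libnuma on Linux (buffer size is arbitrary; numa_move_pages with a NULL node list only queries placement, it doesn't migrate anything):

    /* Spot-check which NUMA node a buffer's pages actually live on.
     * Build with: gcc -O2 where.c -lnuma   (Linux + libnuma assumed) */
    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        if (numa_available() < 0) return 1;

        long page = sysconf(_SC_PAGESIZE);
        size_t sz = 64UL << 20;                  /* 64 MiB example buffer */
        char *buf = malloc(sz);
        if (!buf) return 1;
        memset(buf, 0, sz);                      /* fault the pages in first */

        unsigned long count = sz / page;
        void **pages  = malloc(count * sizeof *pages);
        int   *status = malloc(count * sizeof *status);
        if (!pages || !status) return 1;
        for (unsigned long i = 0; i < count; i++)
            pages[i] = buf + i * page;

        /* nodes == NULL: don't move anything, just report each page's node in status[] */
        if (numa_move_pages(0, count, pages, NULL, status, 0) == 0) {
            long per_node[64] = {0};
            for (unsigned long i = 0; i < count; i++)
                if (status[i] >= 0 && status[i] < 64)
                    per_node[status[i]]++;
            for (int n = 0; n <= numa_max_node(); n++)
                printf("node %d: %ld pages\n", n, per_node[n]);
        }

        free(pages); free(status); free(buf);
        return 0;
    }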
Thankfully the latest products are mostly 10.4 GT/s (Gold and above), as it appears Intel isn't doing as much product segmentation based on UPI link speed anymore (Silver/Bronze are still slower, though).
> The unidirectional bandwidth across a two-socket system is roughly equal to a single DDR channel
Depends on the implementation. EPYC Rome's 2P design runs 64 PCIe 4.0 lanes between the two CPUs. That's about 128 GB/s of bandwidth (64 lanes x ~2 GB/s per lane per direction), which is roughly equivalent to six channels of DDR4-2666 (6 x ~21.3 GB/s).