There are plenty of applications where single-threaded clock speed matters, and ...

AaronFriel · on June 22, 2020

The just announced Intel Xeon Cooper Lake top end processor has about 38.5MB of cache. The AMD Rome top end has 256MB of cache.

https://ark.intel.com/content/www/us/en/ark/products/205684/...

https://www.amd.com/en/products/cpu/amd-epyc-7h12

rckoepke · on June 22, 2020

I'm not sure this is the whole story, Intel has twice the L2 cache as AMD but I'm not sure that's enough to make a huge difference.

Epyc 7H12[1]:

- L1: two 32KiB L1 cache per core

- L2: 512KiB L2 cache per core

- L3: 16MiB L3 cache per core, but shared across all cores.

The L1/L2 cache aren't yet publicly available for any Cooper Lake processors, however the previous Cascade Lake architecture provided:

All Xeon Cascade Lakes[2]:

- L1: two 32 KiB L1 cache per core

- L2: 1 MiB L2 cache per core

- L3: 1.375 MiB L3 cache per core (shared across all cores)

Normally I'd expect the upcoming Cooper Lake to surpass AMD in L1, and lead further in L2 cache. However it looks like they're keeping the 1.375MiB L3 cache per core in Cooper Lake, so maybe L1/L2 are also unchanged.

0: https://www.hardwaretimes.com/cpu-cache-difference-between-l...

1: https://en.wikichip.org/wiki/amd/epyc/7h12

2: https://en.wikichip.org/wiki/intel/xeon_platinum/9282

Edit: Previously I showed EPYC having twice the L1 as Cascade Lake, this was a typo on my part, they're the same L1 per core.

deagle50 · on June 22, 2020

Zen 2 has 4MiB L3 per core, 16 MiB shared in one 4-core CCX.

rckoepke · on June 22, 2020

Thanks, I wrote that while burning the midnight oil and didn't double-check the sanity of those numbers. It's too late to edit mine but I hugely appreciate the clarification.

deagle50 · on June 23, 2020

NP. It's still a huge amount of LLC compared to the status quo. Says something about how expensive it really is to ship all that data between the CCXs/CCDs.

ju-st · on June 22, 2020

Intel L3 does not equal AMD L3 cache regarding latencies. Depending on the application this can matter a lot. https://pics.computerbase.de/7/9/1/0/2/13-1080.348625475.png

kllrnohj · on June 22, 2020

You'd need that latency to be significant enough that AMD's >2x core count doesn't still result in it winning by a landslide anyway, and you need L3 usage low enough that it still fits in Intel's relatively tiny L3 size.

There's been very few cloud benchmarks where 1P Epyc Rome hasn't beaten any offering from Intel, including 2P configurations. The L3 cache latency hasn't been a significant enough difference to make up the raw CPU count difference, and where L3 does matter the massive amount of it in Rome tends to still be more significant.

Which is kinda why Intel is just desperately pointing at a latency measurement slide instead of an application benchmark.

AdrianB1 · on June 22, 2020

Cache per tier matters a lot, total cache does not tell much. L1 is always per core and small, L2 is larger and slower, L3 is shared across many cores and access is really slow compared to L1 and L2. In the end performance per watt for a specific app is what matters, that is the end goal.

lend000 · on June 22, 2020

Interesting, that's news to me. Guess Intel just has clock speed then. That's why I still pay a premium to run certain jobs on z1d or c5 instances.

As another commenter pointed out, though, not all caches are equal. Unfortunately, I was not able to easily find access speeds for specific processors, so single-threaded benchmarks are the primary quantitative differentiator.

tobz1000 · on June 22, 2020

Given the IPC gains of Zen 2, the single-threaded gap is closing, and even reversed in some workloads.

And I think Xeon L3 cache tops out at about 40MB, whereas Threadripper & Epyc go up to 256MB.

plantain · on June 22, 2020

Really? 'Entry-level' EPYC's (7F52) have 256MB of L3 cache for 16 cores.

I don't think there's any Intel CPU's with more than 36MB L3?

robin_reala · on June 22, 2020

Xeon 9282s have 77MB: https://ark.intel.com/content/www/us/en/ark/products/194146/...

zrm · on June 22, 2020

77MB for 56 cores. That's that ploy where they basically glued two sockets together so they could claim a performance per socket advantage even though it draws 400W and that "socket" doesn't even exist (the package has to be soldered to a motherboard).

IIRC the only people who buy those vs. the equivalent dual socket system are people with expensive software which is licensed per socket.

ashtonkem · on June 22, 2020

Those applications exist, but not enough to justify Intel’s market cap.

btian · on June 22, 2020

Do you have a source for any of the things you said?

AdrianB1 · on June 22, 2020

Anecdotal: back then when SETI@home was a thing, I was running it on some servers; a 700 MHz Xeon was a lot faster (>50%, IIRC) than a 933 MHz Pentium 3, Xeon had a lot lower frequency and slower bus (100 vs 133MHz), but the cache was 4 times larger and probably the dataset or most of it was running in cache.

tolien · on June 22, 2020

The same happened with the mobile Pentium-M with 1 (Banias) and 2MB (Dothan) cache - you could get the whole lot in cache and it just flew, despite the (relatively) low clock speed. There were people building farms of machines with bare boards on Ikea shelving.