A bit off topic, but I bet someone here knows. When running an EC2 instance and ...

nisten · on June 28, 2020

Yeah, that is true, you are sharing the L3 cache. To mitigate some of the recent Intel issues, I think AWS actually has its own chip on newer motherboards to handle the hypervisor duties securely.

Otherwise, they'd do it in software patches for older CPUs and take the performance hit of the patch.

I'm not sure how much of the L3 the hypervisor would reserve. It is likely a free-for-all; however, you'd still have quite a bit of dedicated L2 and L1 on most Xeons. With AMD's first-gen EPYC it's a little bit different because clusters of cores share a cache, and you can get weirdly high latencies depending on which cores you're using (e.g. cores 8 and 9 being too far apart).
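If you want to check which cores share an L3 on a given Linux box, here is a rough sketch that reads the cache topology from sysfs. It assumes the standard `/sys/devices/system/cpu/cpuN/cache/indexM/` layout with `level` and `shared_cpu_list` files. The helper names `parse_cpu_list` and `l3_groups` are made up for this example. Inside a VM the hypervisor may hide or virtualize this topology, so treat the result as a hint only.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct sets of CPUs that share an L3 cache (may be empty)."""
    groups = set()
    for index_dir in Path(sysfs_root).glob("cpu[0-9]*/cache/index*"):
        try:
            level = (index_dir / "level").read_text().strip()
            shared = (index_dir / "shared_cpu_list").read_text()
        except OSError:
            # Entry missing or unreadable (common in containers and some VMs).
            continue
        if level == "3":
            groups.add(tuple(parse_cpu_list(shared)))
    return sorted(groups)


if __name__ == "__main__":
    for group in l3_groups():
        print("L3 shared by CPUs:", list(group))
```

On a part where clusters of cores share a cache you would expect several small groups, and on a monolithic L3 one big group per socket.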

Also, according to this AnandTech article, the average total CPU load for physical AWS machines is ~60% and is actively balanced out by them. And yes, running benchmarks on a machine without noisy neighbors yields very significant improvements, up to 2x better benchmark scores. They measured this by comparing renting all the cores of a machine vs. renting only the 4 (or however many) they needed.

https://www.anandtech.com/print/15578/cloud-clash-amazon-gra...

I'm assuming they'd put a CPU into sleep/hibernate mode in order to save power instead of having it only run at 5% utilization.

lend000 · on June 28, 2020

Without any dedicated hypervisor tricks, can't a typical L3 cache eviction algorithm also evict memory that is assigned to another core and currently residing in its L2 or L1 cache? (Thereby flushing even the higher level caches if another core is really noisy.)

nisten · on June 28, 2020

I'm not entirely sure. It looks like for Skylake-X CPUs (per the article below) the L3 cache is no longer inclusive but instead acts as an extension of the per-core L2 cache.

https://www.anandtech.com/show/11550/the-intel-skylakex-revi...
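To make the inclusive vs. non-inclusive difference concrete, here is a toy model, not how any real CPU works. It makes three assumptions: private caches are unbounded sets, the L3 is a plain LRU, and "non-inclusive" just means the L3 skips back-invalidation (real designs fill the L3 differently). `ToyCacheSystem` and `hot_line_survives` are names invented for this sketch.

```python
from collections import OrderedDict


class ToyCacheSystem:
    """Toy model: per-core private caches in front of one shared LRU L3."""

    def __init__(self, n_cores, l3_lines, inclusive):
        self.inclusive = inclusive
        self.l3_lines = l3_lines
        self.l3 = OrderedDict()  # LRU order: oldest entry first
        self.private = [set() for _ in range(n_cores)]  # unbounded (assumption)

    def access(self, core, line):
        if line in self.private[core]:
            # Private hit: the L3 never sees it, so its LRU state is not refreshed.
            return
        self.private[core].add(line)
        if line in self.l3:
            self.l3.move_to_end(line)
            return
        self.l3[line] = True
        if len(self.l3) > self.l3_lines:
            victim, _ = self.l3.popitem(last=False)
            if self.inclusive:
                # Inclusive L3: an evicted line must leave every private cache too.
                for cache in self.private:
                    cache.discard(victim)


def hot_line_survives(inclusive, l3_lines=4, stream_len=8):
    """Core 0 loads one line, then core 1 streams through many other lines."""
    system = ToyCacheSystem(n_cores=2, l3_lines=l3_lines, inclusive=inclusive)
    system.access(0, "hot")
    for i in range(stream_len):
        system.access(1, f"stream-{i}")
    return "hot" in system.private[0]


if __name__ == "__main__":
    print("inclusive L3, hot line still private:", hot_line_survives(True))       # False
    print("non-inclusive L3, hot line still private:", hot_line_survives(False))  # True
```

In the inclusive case the noisy core pushes the hot line out of the L3, which also removes it from core 0's private cache. That is the effect asked about above. In the non-inclusive case core 0 keeps its copy.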

I remember reading a while ago about storing your encryption keys in L2 instead of RAM, and about deliberate "abuse" of the L3 cache on VPS hosts. However, I can't find that article and haven't kept up with the news on it.

Given that the Intel CPU patches have reduced CPU performance by ~15% (again, sorry, I don't have the exact source), I'd say there have been quite significant changes in cache management in the name of security.

lend000 · on June 28, 2020

Thanks for the info. It's a dilemma when allocating instances because I want the full per-core performance but I don't need a full socket's worth of cores, so I just have to hope my neighbors aren't running huge jobs all the time.

Keyframe · on June 28, 2020

Sounds like a textbook example of where a side-channel attack would happen and what it would look like.