Well, I disagree with pretty much everything in the claims.
First, most real unoptimised code runs into many other issues before memory bandwidth. During my PhD, the optimisation folks working on spiral.net sat next door, and they produced beautiful plots of what limits performance for a range of tasks: each optimisation they applied removed one upper-bound line, until at the very end they hit some bandwidth limitation. Real code will likely suffer from false dependencies limiting IPC, memory latency from pointer chasing, or branch mispredictions well before it ever saturates memory bandwidth.
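To make the latency-vs-bandwidth point concrete, here is a minimal C sketch (mine, not from the article or the Spiral work; sizes and names are arbitrary) that times a dependent pointer chase against a plain sequential sweep over the same buffer. On most machines the chase is far slower per element even though both loops touch the same data, which is exactly the "latency bites long before bandwidth" situation. Compile with something like gcc -O2 chase.c.

    /* Toy microbenchmark: latency-bound pointer chase vs bandwidth-bound sweep. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)   /* 16M elements, ~128 MB of size_t on 64-bit */

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        size_t *next = malloc(N * sizeof *next);
        if (!next) return 1;

        /* Build a random single-cycle permutation (Sattolo's algorithm) so each
         * load depends on the previous one: no prefetching helps, pure latency. */
        for (size_t i = 0; i < N; i++) next[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;   /* j in [0, i) */
            size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        double t0 = now_sec();
        size_t p = 0;
        for (size_t i = 0; i < N; i++) p = next[p];    /* dependent loads */
        double chase = now_sec() - t0;

        t0 = now_sec();
        size_t sum = 0;
        for (size_t i = 0; i < N; i++) sum += next[i]; /* streaming loads */
        double sweep = now_sec() - t0;

        printf("pointer chase: %.1f ns/elem (p=%zu)\n", chase / N * 1e9, p);
        printf("linear sweep:  %.1f ns/elem (sum=%zu)\n", sweep / N * 1e9, sum);
        free(next);
        return 0;
    }

The printed p and sum just keep the compiler from optimising the loops away; the interesting output is the per-element cost of the two loops.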
Then the database workload is something I would consider insanely optimised. Most engines are in fierce performance competition, and they do normally end up hitting memory bandwidth. That probably explains why the author is not comparing against EPYC instances, which have the memory bandwidth to compete with Graviton.
Then there are the claims that they "chose" not to implement SMT and to use DDR5: both of those decisions really come from their upstream providers.
Wouldn't SMT be a feature you are free to add when designing your own cores? I'm assuming Amazon has an architectural licence (the Annapurna acquisition probably came with one, and that team is likely the Graviton design team at AWS). So who is the upstream provider here? ARM?
And if they designed the CPU, wouldn't they also decide which memory controller is appropriate? Seems like AWS should get as much credit for their CPUs as Apple gets for theirs.
Bottom line for Graviton is that a lot of AWS customers rely on open source software that already works well on ARM, and those customers often write their own code in a language that will run just as well on ARM. So AWS can offer its customers tremendous value with minimal transition pain. But sure, if you have a CPU-bound workload, it'll do better on EPYC or Xeon than on Graviton.