
> There is no Apple secret sauce.

Except for secret sauce like super wide instruction decode and enough registers to keep all their execution units filled[0], sure I guess there's no secret sauce.

Caches are only useful when they're serving execution units and Apple packed their chips with them. That's special sauce. If it wasn't special then every ARM chip would have the same levels of performance. It's not like the M1 was Apple's first chip. The A-series have been kicking the shit out of other ARM chips for almost a decade. If Apple didn't have any special sauce in their chip designs this wouldn't have been the case. It's not like Qualcomm doesn't have good chip designers and hasn't tried to compete with Apple's chips.

[0] https://news.ycombinator.com/item?id=25257932




Super wide instruction decode won't help you much unless you're able to feed and retire those instructions at a consistent pace. That means keeping your ALUs busy. There are plenty of problems to solve to make that happen, but two major bottlenecks in CPU design are (1) branch prediction in the CPU frontend and (2) hiding memory latency in the CPU backend. Both are tightly coupled to the instruction- and data-cache design.

Not coincidentally, both of those caches in the Apple M design are unusually large: 192KB of L1 instruction cache and 128KB of L1 data cache, per core (!). The same goes for the L2 cache: 3MB per (performance) core.

When compared to bleeding edge _server_ CPUs from AMD and Intel, it's crazy to see that those figures are several _times_ larger in the Apple M design (not orders of magnitude, but still a lot for per-core caches). E.g. Zen3 Epyc: 32KB of L1 instruction cache, 32KB of L1 data cache and 512KB of L2. Intel Xeon Gold: 32KB of L1 instruction cache, 64KB of L1 data cache and 1.25MB of L2.
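To put numbers on that comparison, here is a quick sketch that computes how much larger each Apple M cache level is than the Zen3 and Xeon figures. The sizes are just the ones quoted in this comment, not independently verified, and the `CACHES_KB` table and `ratios` helper are names made up for illustration.

```python
# Per-core cache sizes in KB, exactly as quoted in the comment above.
# These are the commenter's figures, not independently verified specs.
CACHES_KB = {
    "Apple M (perf core)": {"L1I": 192, "L1D": 128, "L2": 3 * 1024},
    "Zen3 Epyc": {"L1I": 32, "L1D": 32, "L2": 512},
    "Intel Xeon Gold": {"L1I": 32, "L1D": 64, "L2": 1280},  # 1.25MB = 1280KB
}

REFERENCE = "Apple M (perf core)"


def ratios(baseline, reference=REFERENCE):
    """Return how many times larger each cache level is in `reference`
    than in `baseline` (hypothetical helper for this sketch)."""
    ref = CACHES_KB[reference]
    base = CACHES_KB[baseline]
    return {level: ref[level] / base[level] for level in ref}


if __name__ == "__main__":
    for name in CACHES_KB:
        if name == REFERENCE:
            continue
        summary = ", ".join(f"{lvl} {r:.1f}x" for lvl, r in ratios(name).items())
        print(f"{REFERENCE} vs {name}: {summary}")
```

With the quoted figures, every ratio lands between 2x and 6x, so "several times larger" is the right way to describe the gap.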



