It’s a good introduction, but it’s a bit disappointing that it ends that way. I’d love to read more about what’s behind the figure and more technical info about how it might work.
This isn’t specific to the M1, but I talk about cache lines in my last QCon presentation (where I also suggested that a 128-byte cache line wasn’t far away):
However, most of the speed benefit comes from a much larger L1 cache and from the RAM sitting in the same package as the processor, which reduces memory latency.
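To make the cache-line point concrete, here is a minimal Python sketch that counts how many cache lines a byte range spans, comparing 64-byte lines (common on x86) with 128-byte lines. The function name `lines_touched` and the example offsets are illustrative assumptions and do not come from the talk. It shows that a sequential scan needs half as many line fetches with 128-byte lines, and that a small object straddling a 64-byte boundary fits in a single 128-byte line.

```python
def lines_touched(start: int, length: int, line_size: int) -> int:
    """Count the distinct cache lines covered by bytes [start, start + length)."""
    if length <= 0:
        return 0
    first_line = start // line_size
    last_line = (start + length - 1) // line_size
    return last_line - first_line + 1


# A 1 KiB sequential scan, and an 8-byte object at offset 60
# (which straddles a 64-byte boundary).
for line_size in (64, 128):
    print(line_size, lines_touched(0, 1024, line_size), lines_touched(60, 8, line_size))
# prints "64 16 2", then "128 8 1"
```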
The instruction cache is also a lot bigger, and because the ISA uses fixed-size instructions, the core can decode and execute much wider than an x86 core. That’s unlikely to be of much benefit here, though, other than perhaps slightly in terms of queuing up multiple outstanding loads.
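The fixed-size ISA point can be illustrated with a rough Python sketch. With 4-byte instructions (as in AArch64), every instruction’s start offset is known up front, so a wide decoder can work on many at once. With a variable-length encoding, each start depends on the length of the previous instruction. The `toy_length` encoding below is a hypothetical stand-in, not real x86 (which is far more complex, and whose front ends use various tricks to soften this), so treat it as intuition only.

```python
INSN_WIDTH = 4  # fixed-width encoding, e.g. 4-byte instructions as in AArch64


def fixed_boundaries(code: bytes) -> list[int]:
    # Every start offset is computable independently, so many
    # instructions can be handed to decoders in parallel.
    return list(range(0, len(code) - INSN_WIDTH + 1, INSN_WIDTH))


def toy_length(first_byte: int) -> int:
    # Hypothetical variable-length encoding (NOT real x86): the low two
    # bits of the first byte give the instruction length, 1 to 4 bytes.
    return (first_byte & 0x3) + 1


def variable_boundaries(code: bytes) -> list[int]:
    # Each start offset depends on the previous instruction's length,
    # so boundaries have to be found one after another.
    offsets, pos = [], 0
    while pos < len(code):
        offsets.append(pos)
        pos += toy_length(code[pos])
    return offsets


print(fixed_boundaries(bytes(12)))
print(variable_boundaries(bytes([0x01, 0xAA, 0x00, 0x03, 0x11, 0x22, 0x33])))
```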
Yup, my presentation was back in March 2020 and the M1 came out later in the year — sooner than I was expecting, TBH; I thought that it was a couple of years out when I said it :-)