From what I have read, the cores we typically run today use snooping: each core constantly listens for interesting changes to the other cores' caches. Clearly this doesn't scale; you don't want to listen to the cache traffic of 200,000 cores. So instead they use a centralised 'directory' to manage cache coherency. The directory scheme has higher latency, since presumably every load/store miss has to send a message to a potentially distant directory controller.
Does anyone really know this stuff? I would be interested to hear a more knowledgeable take on it :)
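To make the snooping-vs-directory distinction concrete, here is a toy sketch of the directory idea (names and the simplified MSI-style states are my own, not any real chip's protocol): the directory tracks, per cache line, which cores hold a copy, so a write only sends invalidations to those sharers instead of broadcasting to every core.

```python
from enum import Enum

class State(Enum):
    INVALID = 0
    SHARED = 1
    MODIFIED = 2

class Directory:
    """Toy directory: one entry per cache line, tracking state and sharers."""

    def __init__(self):
        # line address -> (state, set of sharer core ids)
        self.entries = {}

    def read(self, core, line):
        state, sharers = self.entries.get(line, (State.INVALID, set()))
        if state == State.MODIFIED:
            # in a real protocol the single owner would write back here
            pass
        # record the new reader and downgrade the line to SHARED
        self.entries[line] = (State.SHARED, sharers | {core})

    def write(self, core, line):
        state, sharers = self.entries.get(line, (State.INVALID, set()))
        # point-to-point invalidations go only to the known sharers,
        # not to all 200,000 cores -- that's the scalability win
        invalidations = sharers - {core}
        self.entries[line] = (State.MODIFIED, {core})
        return invalidations

d = Directory()
d.read(0, 0x100)
d.read(1, 0x100)
inv = d.write(2, 0x100)   # only cores 0 and 1 need invalidation messages
```

The latency cost mentioned above shows up in this sketch as the extra round trip: every miss has to consult `d` before the data can move, whereas a snooping bus resolves it with a single broadcast.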
I find myself thinking that the pattern of computing is to first implement an idea in discrete boxes, and then translate that into regions on an IC.
Just observe the path from mainframe, via minicomputer, to today's SoCs.
And as I was reading your comment I found myself thinking about how to get multiple servers to talk to a common datastore, and how that looks eerily similar to this cache coherence issue.
To wander into the weird for a bit, the phrase "as above, so below" keeps going through my mind while contemplating all this.
IIRC the L3 cache of a multicore CPU effectively works as a directory for cache coherence, at least on Intel machines with fully inclusive caches.
https://en.wikipedia.org/wiki/Cache_coherence
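A rough sketch of why inclusion gives you a directory for free (hypothetical names, very simplified): because every line held in any core's private cache must also be present in the L3, the L3 tags can carry per-core "valid" bits, and an L3 tag miss proves that no core holds the line, so no snooping is needed at all.

```python
class InclusiveL3:
    """Toy inclusive L3 acting as a snoop filter / directory."""

    def __init__(self):
        # line address -> set of core ids whose private caches may hold it
        self.lines = {}

    def core_fill(self, core, line):
        # a core pulling a line into its private cache also installs
        # it in the L3 (the inclusion property) and sets its valid bit
        self.lines.setdefault(line, set()).add(core)

    def cores_to_snoop(self, line):
        # only the cores whose valid bit is set need a snoop message;
        # an absent tag means nobody has the line
        return self.lines.get(line, set())

    def evict(self, line):
        # inclusion cuts the other way too: evicting an L3 line forces
        # back-invalidation of the returned cores' private copies
        return self.lines.pop(line, set())

l3 = InclusiveL3()
l3.core_fill(0, 0xABC)
```

With this structure `l3.cores_to_snoop(0xDEF)` returns an empty set for any line the L3 has never seen, which is the filtering effect the comment above describes.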