"If you want a flat memory space you have to guarantee an access to any memory address in less than X cycles otherwise you have a NUMA architecture[1]"
There is nothing wrong with NUMA (well, ccNUMA, but today that's a given). Even a simple modern two socket server is a NUMA machine.
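For concreteness, here is a minimal sketch of what that looks like from software, assuming Linux with libnuma installed (build with something like `gcc probe.c -lnuma`; the program name is just illustrative):

```c
/* Minimal sketch: enumerate NUMA nodes and their memory with libnuma. */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int max_node = numa_max_node();        /* highest node id, e.g. 1 on a two-socket box */
    printf("NUMA nodes: %d\n", max_node + 1);

    for (int node = 0; node <= max_node; node++) {
        long long free_bytes = 0;
        long long total = numa_node_size64(node, &free_bytes);
        printf("node %d: %lld MiB total, %lld MiB free\n",
               node, total >> 20, free_bytes >> 20);
    }
    return 0;
}
```

On a typical dual-socket server this reports two nodes, each owning roughly half the RAM; memory on the remote socket is still in the flat address space, just slower to reach.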
Anyways, as I've commented elsewhere, I'm not arguing that shared memory is practical today on a large HPC cluster.
> Anyways, as I've commented elsewhere, I'm not arguing that shared memory is practical today on a large HPC cluster.
I think the point being made was that it'll never be practical, purely for physical reasons. Any physical separation means that light takes a certain amount of time to travel, and no known law of physics will let you circumvent that. A distance of a foot will always incur a latency of ~1ns (at best), so our models must account for latency. (It's not obvious that we've reached the limit of how compact a computer can be, but at some point you just end up with a tiny black hole instead of a computer.)
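A quick back-of-envelope calculation makes the bound concrete (the distances are arbitrary examples, not measurements of any real machine):

```c
/* Lower bound on signalling latency from the speed of light alone. */
#include <stdio.h>

int main(void)
{
    const double c_m_per_ns = 0.299792458;   /* speed of light: ~0.3 m per nanosecond */

    double distances_m[] = { 0.3048 /* one foot */, 1.0, 10.0, 100.0 };
    for (int i = 0; i < 4; i++) {
        double one_way_ns    = distances_m[i] / c_m_per_ns;
        double round_trip_ns = 2.0 * one_way_ns;
        printf("%7.2f m: one-way >= %6.2f ns, round trip >= %6.2f ns\n",
               distances_m[i], one_way_ns, round_trip_ns);
    }
    return 0;
}
```

One foot already costs about a nanosecond one way, and a round trip across even a small machine room is hundreds of nanoseconds before any switching, serialization, or protocol overhead is added.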
I don't get it; our models have been accounting for latency for at least the last 30 years. We routinely use three levels of caches and highly out-of-order memory accesses to try to make latency manageable.
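As a rough illustration of why those mechanisms exist: a pointer-chasing loop defeats both the prefetcher and out-of-order execution, exposing something close to raw memory latency. This is only a sketch, assuming a POSIX system; the array size and iteration count are arbitrary choices for illustration.

```c
/* Sketch: measure the latency of a chain of dependent loads (pointer chasing). */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     ((size_t)1 << 24)   /* 16M entries * 8 bytes = 128 MiB, bigger than typical LLC */
#define ITERS ((long)1 << 24)

int main(void)
{
    size_t *next  = malloc(N * sizeof *next);
    size_t *order = malloc(N * sizeof *order);
    if (!next || !order) return 1;

    /* Build a single random cycle so the hardware prefetcher cannot guess
       the next address: shuffle the indices, then link them in a ring. */
    for (size_t i = 0; i < N; i++) order[i] = i;
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
    for (size_t i = 0; i < N; i++)
        next[order[i]] = order[(i + 1) % N];
    free(order);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    size_t p = 0;
    for (long i = 0; i < ITERS; i++)
        p = next[p];                   /* each load depends on the previous one */

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg latency per dependent load: %.1f ns (p=%zu)\n", ns / ITERS, p);

    free(next);
    return 0;
}
```

On typical hardware the dependent-load latency lands somewhere around 100 ns once the working set spills out of the last-level cache, versus a few nanoseconds when it fits in L1; that gap is exactly what the cache hierarchy and out-of-order machinery are there to hide.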
Now it is possible that our best coherency protocols simply aren't effective at high latencies, but that doesn't mean we can't come up with something workable in the future. Is there any no-go theorem in the field?