Let's say that I can get something from RAM in 100 cycles. But if I have 60 threads all trying to do something with RAM, I can't do 60 RAM accesses in that 100 cycles, can I? Somebody's going to have to wait, aren't they?
this would work really well with rambus style async memory if it ever got out from under the giant pile of patents
the 'plus' side here is that that condition gets handled gracefully, but yes, certainly you can end up in a situation where memory transactions per second is the bottleneck.
it's likely more advantageous to have a lot of memory controllers and ddr interfaces here than a lot of banks on the same bus, but that's a real cost and pin-count issue.
the mta 'solved' this by fully dissociating the memory from the cpu with a fabric
I’m not exactly sure what you mean. RAM allows multiple reads to be in flight at once but I guess won’t be clocked as fast as the cpu. So you’ll have to do some computation in some threads instead of reads. Peak performance will have a mix of some threads waiting on ram and others doing actual work.
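To put rough numbers on the 100-cycle / 60-thread question, here is a minimal sketch using Little's law (throughput = requests in flight / latency). The `max_in_flight` parameter and the value 16 are made-up assumptions for illustration, not figures for any real memory system, and the model assumes each thread has at most one load outstanding and does no compute between loads.

```python
def sustained_loads_per_cycle(threads, latency_cycles, max_in_flight):
    """Little's law: throughput = requests in flight / latency.

    max_in_flight is a hypothetical cap on how many reads the memory
    system can overlap (banks, controllers, queue depth, ...).
    """
    # Assumption: each thread has at most one load outstanding.
    in_flight = min(threads, max_in_flight)
    return in_flight / latency_cycles


def effective_latency(threads, latency_cycles, max_in_flight):
    """Average cycles a thread waits per load, queueing included."""
    in_flight = min(threads, max_in_flight)
    # Threads that cannot get a slot wait their turn, so the average
    # wait stretches by threads / in_flight.
    return threads * latency_cycles / in_flight


if __name__ == "__main__":
    # Memory can overlap all 60 reads: nobody waits longer than 100 cycles.
    print(sustained_loads_per_cycle(60, 100, 60), effective_latency(60, 100, 60))
    # Memory can only overlap 16 reads (made-up number): somebody waits.
    print(sustained_loads_per_cycle(60, 100, 16), effective_latency(60, 100, 16))
```

Under these assumptions, if the memory can keep all 60 reads in flight, every thread still sees 100 cycles. If it can only overlap 16, the average wait per load grows to 375 cycles, which is the "somebody's going to have to wait" case.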