I think generally the threads are spending a lot of time waiting on memory. It can take >100 cycles to get something from RAM, so you could have every thread issue a read and still have compute to spare before the first one comes back from memory.
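A minimal sketch of what that looks like from software (the names and the stream count here are invented for illustration): chase several independent linked lists at once, so many cache misses are outstanding at any moment instead of one at a time.

```c
#include <stddef.h>

typedef struct node { struct node *next; } node_t;

#define NSTREAMS 16  /* independent chains ~ threads each with a load in flight */

size_t chase_all(node_t *heads[NSTREAMS]) {
    node_t *cur[NSTREAMS];
    size_t hops = 0;
    int live = NSTREAMS;
    for (int i = 0; i < NSTREAMS; i++) cur[i] = heads[i];
    while (live > 0) {
        live = 0;
        for (int i = 0; i < NSTREAMS; i++) {
            if (cur[i]) {
                cur[i] = cur[i]->next;  /* each deref is an independent miss;
                                           the 16 loads can overlap in flight */
                hops++;
                live++;
            }
        }
    }
    return hops;
}
```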
It could be that eg 97% of your threads are looking things up in big hashtables (eg computing a big join for a database query) or binary-searching big arrays, rather than ‘some I/O task’
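In that setting you don't even need hardware threads to get the overlap; here's a hedged sketch of the batched-probe trick databases use (the hash function, table layout, and batch size are all invented for illustration):

```c
#include <stdint.h>
#include <stddef.h>

#define BATCH 8  /* assumed batch size; enough loads to cover ~100 cycles */

/* Toy open table that stores the key itself in each bucket. */
uint64_t probe_batch(const uint64_t *table, size_t buckets,
                     const uint64_t keys[BATCH]) {
    size_t idx[BATCH];
    uint64_t hits = 0;

    /* Pass 1: compute all bucket indices and start the misses early. */
    for (int i = 0; i < BATCH; i++) {
        idx[i] = keys[i] % buckets;          /* stand-in hash function */
        __builtin_prefetch(&table[idx[i]]);  /* all 8 loads now in flight */
    }
    /* Pass 2: by the time we loop back, most lines have arrived. */
    for (int i = 0; i < BATCH; i++)
        hits += (table[idx[i]] == keys[i]);
    return hits;
}
```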
Let's say that I can get something from RAM in 100 cycles. But if I have 60 threads all trying to do something with RAM, I can't do 60 RAM accesses in that 100 cycles, can I? Somebody's going to have to wait, aren't they?
this would work really well with Rambus-style async memory if it ever got out from under the giant pile of patents
the 'plus' side here is that that condition gets handled gracefully, but yes, certainly you can end up in a situation where memory transactions per second is the bottleneck.
it's likely more advantageous to have a lot of memory controllers and DDR interfaces here than a lot of banks on the same bus. but that's a real cost and pin-count issue.
the Tera MTA 'solved' this by fully decoupling the memory from the CPU with a fabric
I’m not exactly sure what you mean. RAM allows multiple reads to be in flight at once, but I guess it won’t be clocked as fast as the CPU. So some threads will have to do computation instead of reads; peak performance will have a mix of some threads waiting on RAM and others doing actual work.
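To put rough numbers on the 60-threads question upthread (all figures below are assumed round numbers, not measurements): by Little's Law, in-flight requests = bandwidth × latency, so a single channel can only keep so many misses outstanding.

```c
#include <stdio.h>

int main(void) {
    double latency_ns  = 100.0;  /* assumed DRAM load latency          */
    double bw_bytes_ns = 25.6;   /* ~25.6 GB/s, one assumed DDR channel */
    double line_bytes  = 64.0;   /* cache-line transfer size            */

    /* Concurrency the channel can actually sustain (Little's Law): */
    double in_flight = bw_bytes_ns * latency_ns / line_bytes;  /* = 40 */

    printf("~%.0f misses can be in flight per channel;\n", in_flight);
    printf("with 60 threads each waiting on one miss, ~%.0f of them queue.\n",
           60.0 - in_flight);
    return 0;
}
```

So yes, somebody waits; adding memory controllers and channels (as mentioned above) raises that in-flight ceiling.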