Hacker News

Assuming "stacking memory chips on top of the processor" means local stores for each processor, it's pretty much a given at this point. That's how GPUs work (~16 KB local), and that's how the PS3's SPUs work (256 KB local).

It's extraordinarily painful to code for, but hey, all performance optimization is an exercise in caching. So it goes.



Yeah, essentially the processor cores start looking like ccNUMA boxes. The late 1990s called, they want their architecture back :-)

IMHO, it looks like we'll need some smarter memory bus management. If we're looking at the 1990s, anyone remember the crossbar switches SGI used to put in their short-lived x86 boxes? Thoughts on effectiveness?


That's what I was going to post. Use NUMA - each core gets its own memory. Doesn't Linux support NUMA?

It seems to me that each thread already has its own memory space; just make sure the memory space for the thread is on the same CPU the thread runs on.

It isn't really necessary for each core to be able to access all memory (or at least it'll be way slower to access memory outside its own area).
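The "keep a thread's memory on the CPU it runs on" idea can be sketched on Linux, where the default first-touch policy places pages on the NUMA node of the CPU that first writes them. A minimal sketch, assuming a Linux box and using only the standard library (`os.sched_setaffinity` is Linux-only); the sizes are arbitrary:

```python
import os

# Pin this process to CPU 0 *before* touching its working set.
# Linux's default "first touch" policy then allocates the pages
# on CPU 0's NUMA node, so later accesses from this CPU stay local.
os.sched_setaffinity(0, {0})

# Fault the pages in after pinning, so they land on the local node.
working_set = bytearray(16 * 1024 * 1024)  # 16 MB
for i in range(0, len(working_set), 4096):  # one write per 4 KB page
    working_set[i] = 1

print(os.sched_getaffinity(0))  # -> {0}
```

The same effect is what the kernel's NUMA-aware scheduler tries to approximate automatically; pinning just makes the placement explicit.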


Difficult to program for in what language? It sounds like a perfect fit for the Actor model and languages, like Erlang, that directly support the idea of partitioned memory and asynchronous communication.
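The actor fit described here can be sketched in Python standing in for Erlang: each actor owns its state privately and is reachable only through an asynchronous mailbox. The `counter_actor` name and message shapes are made up for illustration:

```python
import threading
import queue

def counter_actor(mailbox, replies):
    """Owns its count privately; the only way in is a message."""
    count = 0  # partitioned state: no other thread touches this
    while True:
        msg = mailbox.get()  # asynchronous receive
        if msg == "incr":
            count += 1
        elif msg == "get":
            replies.put(count)
        elif msg == "stop":
            break

mailbox, replies = queue.Queue(), queue.Queue()
actor = threading.Thread(target=counter_actor, args=(mailbox, replies))
actor.start()

for _ in range(3):
    mailbox.put("incr")  # fire-and-forget sends
mailbox.put("get")
result = replies.get()
print(result)  # -> 3
mailbox.put("stop")
actor.join()
```

Because no memory is shared between actors, each one's state could sit in a core-local store without the program changing shape, which is the point being made about Erlang-style languages.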


Local store vs. ccNUMA is orthogonal to stacking vs. non-stacking. The local store architecture looks like an evolutionary dead end at this point.



