It's easy to buy memory, but hard to buy L2/L3 cache. The whole point of the exercise is to scale more easily on multicore architectures, but it's no good if you blow out the cache thousands of times per second and bottleneck the system on memory accesses.
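To get a feel for what "bottlenecked on memory accesses" looks like, here's a minimal Rust sketch (the buffer sizes and the 500 ms measurement window are my assumptions, not tuned to any particular CPU). It streams over a buffer that fits in L2 versus one far bigger than the last-level cache; on most machines the large one reports a fraction of the effective bandwidth:

```rust
use std::time::Instant;

// Measure effective bandwidth of repeated passes over a buffer.
// The small buffer should stay resident in L2; the large one is
// re-fetched from DRAM on every pass.
fn gib_per_sec(buf: &[u64]) -> f64 {
    let start = Instant::now();
    let mut sum = 0u64;
    let mut passes = 0u64;
    while start.elapsed().as_millis() < 500 {
        for &x in buf {
            sum = sum.wrapping_add(x);
        }
        passes += 1;
    }
    std::hint::black_box(sum); // keep the compiler from deleting the loop
    let bytes = (passes * buf.len() as u64 * 8) as f64;
    bytes / start.elapsed().as_secs_f64() / (1u64 << 30) as f64
}

fn main() {
    let small = vec![1u64; 32 * 1024];        // 256 KiB: fits in L2
    let large = vec![1u64; 16 * 1024 * 1024]; // 128 MiB: blows out the LLC
    println!("cache-resident: {:.1} GiB/s", gib_per_sec(&small));
    println!("DRAM-bound:     {:.1} GiB/s", gib_per_sec(&large));
}
```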
Additionally, DRAM and VRAM bandwidth are always at a premium. Whenever you make a copy (which you do a lot when objects are immutable), you burn memory bandwidth.
This matters especially on mobile, on FPGAs, and wherever the sheer volume of data is huge (GPUs, big data).
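To put a number on that, a quick sketch (the 64 MiB size is an arbitrary assumption) contrasting a deep copy with the structural sharing that persistent/immutable structures are built on. The deep copy drags every byte through the memory system; the `Arc` clone is a single atomic increment:

```rust
use std::sync::Arc;
use std::time::Instant;

fn main() {
    let data: Arc<Vec<u64>> = Arc::new(vec![0u64; 8 * 1024 * 1024]); // 64 MiB

    let t = Instant::now();
    let deep: Vec<u64> = (*data).clone(); // copies all 64 MiB
    println!("deep copy: {:?}", t.elapsed());
    std::hint::black_box(deep);

    let t = Instant::now();
    let shared = Arc::clone(&data); // bumps a refcount, copies nothing
    println!("arc clone: {:?}", t.elapsed());
    std::hint::black_box(shared);
}
```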
That said, do you agree it's easier to solve a memory problem (by buying new hardware) than to deal with a race condition? haha
Then again, maybe we need to think about optimizing our everyday data structures. I've been following a few papers on this, e.g.:
http://dl.acm.org/citation.cfm?id=2658763
http://dl.acm.org/citation.cfm?id=2993251