Related to this is the concept of arrays of structures (AoS) vs. structures of arrays (SoA). Do you store all of the data that makes up one "object" in a single data structure, or do you store all of each "member" of each object in separate flat arrays?
If you need to use all of the "members" of an "object" at once, AoS is more efficient. But if you only need to use 1 or 2 members for a certain task, you will waste lots of memory bandwidth on data that isn't needed for the current operation. In this case, an SoA layout can improve your performance by an order of magnitude, by ensuring that only the data that is actually necessary for your task is loaded into cache.
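To make that concrete, here's a rough C++ sketch (the particle struct and its members are invented purely for illustration):

    #include <vector>

    // Array of Structures: each "particle" is one contiguous struct, so a loop
    // that only reads x still drags mass, charge, etc. through the cache.
    struct ParticleAoS {
        float x, y, z;
        float mass, charge, radius;   // 24 bytes per element
    };

    // Structure of Arrays: each member lives in its own flat array, so a loop
    // that only reads x streams 4 bytes per element instead of 24.
    struct ParticlesSoA {
        std::vector<float> x, y, z;
        std::vector<float> mass, charge, radius;
    };

    float sum_x(const ParticlesSoA& p) {
        float s = 0.0f;
        for (float xi : p.x) s += xi;   // touches only the x array
        return s;
    }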
The same principle applies to row vs. column-oriented databases, which obviously are orders of magnitude more sensitive to IO bandwidth usage.
One of the most damning criticisms of OOP, from this perspective, is that it encourages you to wrap up lots of potentially unnecessary data into monolithic "objects" without concern for memory layout or usage patterns, giving you AoS and bloated data structures.
Good points. However, object oriented programming doesn't necessarily mean arrays of objects. I'd disagree that it encourages it in any way.
I realize that's a common paradigm in Java and C++, but it's an architectural decision that you make, not a part of a programming paradigm. The array-of-structs prevalence really stems from C, not object-oriented approaches.
You can do both in an object-oriented manner. For example:
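Here's one rough C++ sketch (class and member names are invented): an object-style interface over data that's stored SoA underneath.

    #include <cstddef>
    #include <vector>

    // An object-oriented facade over an SoA layout: callers still talk about
    // "a particle", but the members live in separate flat arrays.
    class ParticleSystem {
    public:
        // A lightweight handle that behaves like an object.
        class ParticleRef {
        public:
            float& x()    { return sys_.x_[i_]; }
            float& mass() { return sys_.mass_[i_]; }
        private:
            friend class ParticleSystem;
            ParticleRef(ParticleSystem& s, std::size_t i) : sys_(s), i_(i) {}
            ParticleSystem& sys_;
            std::size_t i_;
        };

        ParticleRef operator[](std::size_t i) { return ParticleRef(*this, i); }

        void add(float x, float mass) {
            x_.push_back(x);
            mass_.push_back(mass);
        }

        // Bulk operations can stream a single member array.
        float total_mass() const {
            float s = 0.0f;
            for (float m : mass_) s += m;
            return s;
        }

    private:
        std::vector<float> x_, mass_;
    };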
I've always wished for a language that would just let me describe my data logically as a SQL-esque relation, while specifying separately its physical layout (incl. AoS vs. SoA). Tying logical flow to physical storage layout always seemed kinda clunky to me.
Layout of N-dimensional arrays becomes even more important when you start considering on-disk formats, where seek times are relatively slow.
People often talk about out-of-core calculations without fully considering how their data is laid out on disk. If your access pattern doesn't match the on-disk ordering of your large, N-dimensional array, you're in for a big surprise. Accessing a portion of an array that takes milliseconds in RAM can easily take hours if the array was naively written to disk in an ordering that doesn't match how you read it.
Of course, this is why chunking exists, but surprisingly few people are as familiar with it as they should be.
For example, at work I often need to use a specific legacy format that's basically a large 3D Fortran-ordered array stored on disk. I often get asked why I'm loading the entire thing into memory instead of taking an out-of-core approach. The problem is that our access pattern consists entirely of reading a single "trace" (z-direction) at a time. For a Fortran-ordered array stored on disk, that's the worst possible access pattern: consecutive elements of a trace are scattered far apart across the entire file. Loading a single trace takes far longer than loading the entire file into memory on most (non-SSD) systems.
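A quick back-of-the-envelope sketch of why that hurts (the array dimensions here are invented, not the real format's):

    #include <cstddef>
    #include <cstdio>

    int main() {
        // Hypothetical volume: nx * ny * nz float32 samples, Fortran (column-major)
        // order on disk, so the x index varies fastest and z varies slowest.
        const std::size_t nx = 2000, ny = 2000, nz = 1500;

        // Byte offset of element (i, j, k) in the Fortran-ordered file.
        auto offset = [&](std::size_t i, std::size_t j, std::size_t k) {
            return (i + nx * (j + ny * k)) * sizeof(float);
        };

        // Reading one trace (fixed i, j, all k) means consecutive samples are
        // nx * ny * 4 bytes apart, i.e. a seek across roughly 16 MB per sample.
        std::printf("stride between trace samples: %zu bytes\n",
                    offset(0, 0, 1) - offset(0, 0, 0));
        // That's ~1500 scattered reads per trace, versus one big sequential
        // read for the whole file, which is why loading everything wins.
    }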
I've recently been on a bit of an evangelical kick for trying to persuade more people to think about the implications of on-disk layout. (For example: http://geology.beer/2015/02/10/hdf-for-large-arrays/ While that's specifically about HDF5, most of it is explaining memory/disk layout and its implications.)
The linear storage of these arrays also helps with distributed systems and with reading blocks of bytes.
Lots of interesting little things in here to consider.
Without going off on another tangent, this also relates closely to BLAS. There is a lot more to this in practice. This is a wonderful introduction to the topic, though.
You can also use space-filling curves to produce nice cache behavior under certain assumptions, for instance if you only need to view a small, contiguous portion of a very large array at a time.
In case anyone is interested, Michael Bader has a book called "Space-Filling Curves: An Introduction with Applications in Scientific Computing" that talks about this:
Was hoping this touched on Z-ordering. If you have a processor that can efficiently interleave bits (e.g. combine 0110 and 1011 to get 01101101), you can lay out 2D and 4D arrays in Z-order, which will improve spatial locality (nearby elements in both the X and Y dimensions are also nearby in memory/cache/TLBs).
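For reference, here's a rough sketch of the usual software fallback for the interleave, using shift-and-mask (assuming 32-bit coordinates packed into a 64-bit index):

    #include <cstdint>

    // Spread the bits of x so they occupy the even bit positions.
    static std::uint64_t spread(std::uint64_t x) {
        x &= 0xFFFFFFFFULL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
        x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
        x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
        x = (x | (x << 2))  & 0x3333333333333333ULL;
        x = (x | (x << 1))  & 0x5555555555555555ULL;
        return x;
    }

    // Z-order (Morton) index for a 2D array: x in the even bits, y in the odd bits.
    std::uint64_t morton2d(std::uint32_t x, std::uint32_t y) {
        return spread(x) | (spread(y) << 1);
    }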
You can actually efficiently achieve this even on processors without an interleave instruction if you're only walking stepwise -- basically, mask out the odd (resp. even) bits separately, setting the even (resp. odd) bits to 1 before adding 1, or to 0 before subtracting 1, then recombine.
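In code, that stepwise walk looks roughly like this (a sketch, keeping x in the even bits and y in the odd bits as above; swap the masks to step in y):

    #include <cstdint>

    const std::uint64_t EVEN = 0x5555555555555555ULL;  // x bit positions
    const std::uint64_t ODD  = 0xAAAAAAAAAAAAAAAAULL;  // y bit positions

    // Step one element forward in x: fill the y positions with 1s so the carry
    // ripples straight through them, then put the original y bits back.
    std::uint64_t step_x(std::uint64_t m) {
        return (((m | ODD) + 1) & EVEN) | (m & ODD);
    }

    // Step one element backward in x: fill the y positions with 0s so the
    // borrow ripples through them instead.
    std::uint64_t step_x_back(std::uint64_t m) {
        return (((m & EVEN) - 1) & EVEN) | (m & ODD);
    }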