Or, as it was cleverly put before the author did: "Primary rays cache; secondary rays thrash."
Some interesting things to note, at least from my point of view:
* GPUs should not be treated as next-generation CPUs at this stage.
* Taking advantage of CUDA/OpenCL not only requires redesigning your algorithm and altering your data structures; it often amounts to the folly of reimplementing in software what is already available in hardware.
* Thread shared memory on nVidia cards isn't really the same thing as a cache: the kernel has to stage data into it explicitly (see the sketch after this list). There have also been a lot of papers in the last two years that speak of little else than restructuring algorithms to make better use of CUDA, because for anything but the simplest of raytracers on a few objects, the situation gets really bad.
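To make that last bullet concrete, here's a minimal CUDA sketch of the difference: nothing ends up in `__shared__` memory unless the kernel explicitly copies it there, whereas a cache fills itself as a side effect of ordinary loads. The `Node` layout and `NODES_PER_BLOCK` are made-up values for illustration, not anyone's actual renderer.

```cuda
// Minimal sketch: shared memory must be filled explicitly by the kernel;
// unlike a CPU cache, the hardware never populates it for you.
// The Node struct and NODES_PER_BLOCK are illustrative assumptions.
struct Node { float bounds[6]; int left, right; };

#define NODES_PER_BLOCK 64

__global__ void traverse(const Node* tree, int nodeCount) {
    __shared__ Node cachedNodes[NODES_PER_BLOCK];

    // Cooperative staging of the top of the tree: each thread copies
    // some nodes. A cache would do this transparently; here it is an
    // explicit part of the algorithm.
    for (int i = threadIdx.x; i < NODES_PER_BLOCK && i < nodeCount; i += blockDim.x)
        cachedNodes[i] = tree[i];
    __syncthreads();  // no thread reads before the staging is complete

    // ... traversal would read cachedNodes[] for the upper levels and
    // fall back to global memory (tree[]) for everything deeper ...
}
```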
Very nice post, though. Neatly organized and well written, I really enjoyed it.
Nothing. In fact, everyone does this, at least to first order: you need to trace bundles of primary rays at a time to take advantage of their spatial (and thus memory-access-pattern) coherence. The problem is that the secondary rays don't form a uniform grid the way the primary rays do; they can point any which way, depending on scene geometry.
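For concreteness, a minimal sketch of the "bundles of primary rays" idea, tracing an 8x8 screen tile back to back. `Ray`, `makePrimaryRay`, and `traceRay` are hypothetical stand-ins, not any particular renderer's API:

```cuda
struct Ray { float ox, oy, oz, dx, dy, dz; };

// Illustrative stand-ins for a real camera and traversal; the point
// here is only the tiled loop structure below.
Ray  makePrimaryRay(int x, int y) { return {0, 0, 0, (float)x, (float)y, 1}; }
void traceRay(const Ray&)         { /* BVH traversal would go here */ }

const int TILE = 8;  // 8x8 = 64 rays per bundle

// Primary rays bundled by screen tile: neighbouring rays walk the same
// acceleration-structure nodes, so their accesses cache instead of thrash.
void renderTiled(int width, int height) {
    for (int ty = 0; ty < height; ty += TILE)
        for (int tx = 0; tx < width; tx += TILE)
            for (int y = ty; y < ty + TILE && y < height; ++y)
                for (int x = tx; x < tx + TILE && x < width; ++x)
                    traceRay(makePrimaryRay(x, y));
}
```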
What you can do is group up a bunch of secondary rays that appear to point in roughly the same direction (e.g., because their primary rays all reflected off the same flat, specular object) and trace them as a batch, exploiting their spatial coherence. Whether finding coherent secondary rays costs less than simply processing them in the same order as their primary rays again depends on scene geometry.
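And a sketch of that grouping step under the same assumptions (it reuses the hypothetical `Ray` and `traceRay` from the tile sketch above): quantize each secondary ray's direction to a coarse key, bucket rays by key, then trace each bucket back to back. The 8 bins per axis are an arbitrary illustrative choice, not a tuned value.

```cuda
#include <unordered_map>
#include <vector>

// Regroup secondary rays by quantized direction so roughly parallel
// rays are traced together, recovering some of the coherence that the
// primary-ray grid gave for free.
static int directionKey(const Ray& r) {
    // Map each direction component from [-1,1] into one of 8 bins and
    // pack the three bin indices into a single integer key (0..511).
    auto bin = [](float d) { return (int)((d + 1.0f) * 3.999f); };
    return (bin(r.dx) << 6) | (bin(r.dy) << 3) | bin(r.dz);
}

void traceGrouped(const std::vector<Ray>& secondaryRays) {
    std::unordered_map<int, std::vector<Ray>> buckets;
    for (const Ray& r : secondaryRays)
        buckets[directionKey(r)].push_back(r);

    // Each bucket holds roughly coherent rays; tracing them back to
    // back reuses the same nodes, much as a primary-ray tile does.
    for (const auto& [key, bucket] : buckets)
        for (const Ray& r : bucket)
            traceRay(r);
}
```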