I was just re-reading this yesterday! I think a lot of the Monte Carlo rendering people know about this family of algorithms, and the idea is pretty simple. I'm not sure why it's described from the start as picking k samples, though; I find it a lot easier to understand and see how it works when you start with k=1.
Btw, it’s fun to work out the probabilities of picking a sample to prove how this algorithm works, and that it’s uniform. Big piles of factorials that all cancel out.
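For concreteness, here's a minimal sketch of the k=1 case (the naming is mine, not from the article), with the telescoping-product argument for uniformity in the comments:

```python
import random

def reservoir_sample_one(stream):
    """Keep a single candidate; replace it with the i-th item
    with probability 1/i.

    Why this is uniform: item i is kept at the end iff it was
    picked (prob 1/i) and then survived every later step
    (prob (1 - 1/j) = (j-1)/j for each j > i). The product
    (1/i) * (i/(i+1)) * ((i+1)/(i+2)) * ... * ((n-1)/n)
    telescopes to 1/n -- the "piles of factorials" cancelling.
    """
    chosen = None
    for i, item in enumerate(stream, start=1):
        if random.randrange(i) == 0:  # true with probability 1/i
            chosen = item
    return chosen
```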
Reservoir sampling requires you to maintain a reservoir of n items that will be your sample. This can be prohibitive if the sample size itself is huge. The algorithm described here decides (commits) whether an item is to be sampled as it sees it (i.e., online), while consuming a constant amount of memory. It differs from the usual reservoir sampling approach in that it cannot go back and "unsample" an item it already selected (whereas the crux of a reservoir sampling approach is evicting current candidate items and replacing them with new items).
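I can't speak to the article's exact algorithm, but a classic scheme with this commit-as-you-go, O(1)-extra-memory property (when the stream length n is known up front) is Knuth's selection sampling, Algorithm S; a sketch:

```python
import random

def selection_sample(stream, n, k):
    """Knuth's Algorithm S: select exactly k of n stream items,
    committing to each decision as the item arrives; no eviction,
    O(1) extra state. Yields the chosen items in stream order."""
    seen = 0
    needed = k
    for item in stream:
        # Select this item with probability needed / (n - seen).
        # When the remaining items equal the remaining quota,
        # this probability is 1, so we always end with exactly k.
        if random.randrange(n - seen) < needed:
            yield item
            needed -= 1
            if needed == 0:
                return
        seen += 1
```

Each item's marginal probability of selection works out to k/n, and the sample of any fixed size k is uniform over all subsets.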
PPS: the other super rad algorithms that aren't well known are the alias method for simulating a discrete distribution https://en.wikipedia.org/wiki/Alias_method
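Vose's variant of the alias method is the usual way to build it: O(n) setup, then O(1) per draw. A sketch (my own naming):

```python
import random

def build_alias_table(probs):
    """Vose's alias method. Each slot i holds a biased coin prob[i]
    and a fallback outcome alias[i]; drawing is one uniform index
    plus one coin flip."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        # Donate just enough of l's mass to top s's slot up to 1.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:  # leftovers are (numerically) exactly 1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    """O(1) sample from the prepared table."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```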
...and the suffix array, which is crazily, surprisingly useful https://en.wikipedia.org/wiki/Suffix_array