Hacker News new | past | comments | ask | show | jobs | submit login

This is a good way to pick a few 64 bit numbers without duplicates, if 64 bit numbers is what you need.

I didn't see any replies to you mention this part, but Vitter's algorithm works on an unknown number of samples. You can pick a 64-bit number uniformly when you know in advance you have exactly 2^64 choices. But how would you pick samples from a stream with equal probability, without knowing how many items are in the stream? Maybe it's 3, maybe it's 2.493e85, you won't know until you run out, and when you do, you want to have picked some items from the stream with uniform probability.




Incorrect, you do know this number beforehand. The parameter `N` in the code is the upper bound on the number of values, and the parameter `n` is the number of samples.


I don't know which code you mean, but I was referring to Vitter's basic "Algorithm R", which does not depend on knowing the number of samples or an upper bound in advance. https://en.wikipedia.org/wiki/Reservoir_sampling#Algorithm_R




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: