Hacker News new | past | comments | ask | show | jobs | submit login

We can compute a random k-size subset of {0..n-1} as rndsubset(0,k,n), where

rndsubset(i,0,n) = {}

rndsubset(i,k,n) = {i} union rndsubset(i+1,k-1,n) with probability k/(n-i), rndsubset(i+1,k,n) otherwise

This however takes on average n/k time per element, which is prohibitive when n >> k.

The probability of {h} being the next chosen element in rndsubset(i,k,n) is k/(n-h) * Prod_{i<=j<h} 1-k/(n-j), which is sampled directly in constant time by Vitter.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: