"(Stated more formally: given non-negative integers k and n with k <= n, generat...

LolWolf · on Sept 13, 2019

Probably the hardest things are the space and time requirements. If you relax e.g. the O(k) time requirement to O(k log k), the problem essentially becomes trivial.[1]

More generally, though, what solution are you thinking of? (The problem, as stated, is actually fairly difficult! As far as I can tell...)

---

[1] See: https://en.wikipedia.org/wiki/Coupon_collector's_problem . The solution is then to draw repeatedly and then check if this card has been drawn.

fergie · on Sept 13, 2019

I see- in that case I need to read up on O(k) vs O(k log k)

tromp · on Sept 13, 2019

We can compute a random k-size subset of {0..n-1} as rndsubset(0,k,n), where

rndsubset(i,0,n) = {}

rndsubset(i,k,n) = {i} union rndsubset(i+1,k-1,n) with probability k/(n-i), rndsubset(i+1,k,n) otherwise

This however takes on average n/k time per element, which is prohibitive when n >> k.

The probability of {h} being the next chosen element in rndsubset(i,k,n) is k/(n-h) * Prod_{i<=j<h} 1-k/(n-j), which is sampled directly in constant time by Vitter.

rini17 · on Sept 13, 2019

Trivial algorithms (like one implemented in Python) slow down when k is close to n. Or when n is large, then it is not possible to use bitmap to mark dealt cards.

avip · on Sept 13, 2019

The python algo does not get slow when n is large. The k~n case obviously calls for inverse selection.

https://github.com/python/cpython/blob/master/Lib/random.py#...

LolWolf · on Sept 13, 2019

NOTE: This comment I wrote is totally wrong. I haven't had my morning coffee yet. H_n - H_(cn) ≈ constant, not diverging (for 0 < c ≤ 1). :)

Yeah, but the k ~ n/2 case (in which inverse selection and normal selection have the same runtime) is still Ω(n log n) (equiv Ω(k log k)), which is still "slower" than the presented algorithm.

yorwba · on Sept 13, 2019

Actually, this comment is correct despite making a similar argument to your incorrect one elsewhere. Using a set to accumulate intermediate results requires k inserts at O(log k) each.

LolWolf · on Sept 13, 2019

I'm still not sure about this—the amortized time of insertion should be O(1) on any one of the usual hash table-type-structures. We're not inserting a single item to an already-built hash-table, but rather k of them into an empty hash-table, with elements being uniformly drawn and bucketed using a good hash function. All of this should give O(1) average time for insertions.

airstrike · on Sept 13, 2019

What's preventing someone from implementing a non-trivial algorithm in Python?

rini17 · on Sept 13, 2019

That was not at all what I meant.