NOTE: This comment I wrote is totally wrong. I haven't had my morning coffee yet. H_n - H_(cn) ≈ constant, not diverging (for 0 < c ≤ 1). :)
Yeah, but the k ~ n/2 case (in which inverse selection and normal selection have the same runtime) is still Ω(n log n) (equiv Ω(k log k)), which is still "slower" than the presented algorithm.
Actually, this comment is correct despite making a similar argument to your incorrect one elsewhere. Using a set to accumulate intermediate results requires k inserts at O(log k) each.
I'm still not sure about this—the amortized time of insertion should be O(1) on any one of the usual hash table-type-structures. We're not inserting a single item to an already-built hash-table, but rather k of them into an empty hash-table, with elements being uniformly drawn and bucketed using a good hash function. All of this should give O(1) average time for insertions.
https://github.com/python/cpython/blob/master/Lib/random.py#...