Hacker News new | past | comments | ask | show | jobs | submit login

I see. In particular the log(k) doesn't come from repeat attempts, it comes from the time needed to search through your list of drawn elements to see if you've seen the chosen element before.



See my edit! Oops.

Anyways, if you were to keep it in some hash-based structure, you could check membership! (But I'm sure you already knew that :)


No structure can check membership in O(1) time if n is big enough.


In general, assuming that we have constant-time operations on arbitrarily large numbers (which we almost universally assume when doing these analyses, except in very specific fields), then hash structures do check membership and add items in O(1) time (since the time spent increasing the structure size is amortized fairly quickly).

Additionally, the only problem with this approach is the extra O(k) space taken up by the structure, which doesn't meet the condition if the result is to be reported in a streaming fashion (but this is not the case in most practical applications)! For comparison, Vitter's is able to do this streaming output in O(1) space.


> then hash structures do check membership and add items in O(1) time (since the time spent increasing the structure size is amortized fairly quickly).

Hashtables can never guarantee constant time operations. It is expected O(1) time, but it is completely possible that everything hashes to the same bucket and you get O(n) operations, no matter how much you rehash.


The algorithm is a randomized algorithm and we're assuming we're picking a good hash. The probability of that event is exponentially vanishing, so, in expectation (which is the runtime we're looking for, as we are analyzing a randomized algorithm based on rejection sampling) it is O(1).

In the worst case analysis of this rejection-sampling algorithm, the time for table look up and insertion doesn't even factor in since you'll have infinite run-time.


> Hashtables can never guarantee constant time operations.

A bit mask is a form of hash table and is guaranteed O(1) lookup for O(lg(Universe)) space.

When you have fixed universes, all kinds of options open up. I personally love vEB Trees, which can do set operations in O(lg(lg(U)))


Yes, in cases where direct addressing is possible you get guaranteed constant time. That's really more of an array than a hashtable IMO.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: