Hacker News new | past | comments | ask | show | jobs | submit login

What issues do you have with cuckoo hashing? Unlike bloom filters it seems like a completely reasonable default / baseline no?



Cuckoo hashes are great if you really understand why you're using them, but 99% of people would be fine with chaining for most applications or a simpler set-associative scheme if open addressing is not negotiable.

Their locality is problematic, they are fantastically hard to get right in a multi-reader/multi-writer case (if you don't want to just lock the whole damn table) and they're just all-round complex with surprising sharp edges. The fact that you have to go wandering off with a bunch of unpredictable data-dependent reads if you don't find what you're looking for is ... wild. Alternately you can pay the costs of multiple parallel accesses preemptively, but now you've smashed throughput in case the table is big enough to be bandwidth limited (L3, memory).

They look pretty great when the load factor is low, but everything looks great with a light load factor.

As far as open-addressing schemes go, they're OK, but I see people reaching for them for reasons that don't seem to make any sense.


Biggest problem with Cuckoo tables is that if your hash function doesn't have the right properties insertions do not converge. They literally go into an infinite loop. This makes them not viable as a generic primitive for hash tables.

Besides, as you mention, that they're in fact slower than modern engineered hash table implementation that do only one random memory access in the vast majority of cases.


Yes, exactly. Linear probing is often easier and faster. Using buckets also helps (each bucket contains multiple slots), for both cuckoo hashing and linear probing. E.g. the cuckoo filter uses a bucket size of 4. See also https://gankra.github.io/blah/hashbrown-tldr/ for a fast hash table implementation.


Using hashbrown as evidence in a comment defending linear probing is odd when it specifically notes that it uses quadratic probing.


I'm sorry! I thought this implementation uses linear probing. Yes this is not an example of linear probing. But it is much closer to linear probing than to cuckoo hashing (I hope we can agree on this). I also found this blog entry interesting: https://probablydance.com/2017/02/26/i-wrote-the-fastest-has... (of course "fastest" is relative, when it comes to benchmarks)




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: