1) The linearly-probed robin hood variant is just a sorted array. That's it. It's just a sorted array. I haven't seen an article explain it so succinctly though.
2) The linearly-probed robin hood variant is only faster than regular linear probing for searches that fail. Successful searches have the same average probe count. Too many articles bullshit about how robin hood reduces variance (which is irrelevant 99% of the time) and then go on to ignore the fact that they cannot beat the performance of linear probing when it comes to successful searches. One guy I saw spent hours trying to optimize his hash table for the K-nucleotide benchmark without realizing that a trivial 5-minute implementation of linear probing would beat it because searches in that benchmark never fail.
3) There's a simple 40 year old bitwise trick by Knuth which allows one to combine the probe distance and the hash into a single value. This saves up to 4 bytes per table entry, but I haven't seen anyone use it.
There's also a few articles that get deletion straight-up wrong, but meh. That doesn't bother me too much.
Also, for double hashing I'm pretty sure a SSE/AVX variant is optimal. I haven't tried implementing one though.
> 1) The linearly-probed robin hood variant is just a sorted array. That's it. It's just a sorted array. I haven't seen an article explain it so succinctly though.
It's not. Importantly, it leaves spaces so that you avoid what would be ~sqrt(n) probe lengths had you tightly packed everything. Also, it is a bit out of order as you wrap around the limits of the array (less of a big deal).
The sortedness property is really useful, btw. A RHH lets you get the perf of a good hash map for random access, and the throughput of a sorted list for merges / bulk processing.
> 3) There's a simple 40 year old bitwise trick by Knuth which allows one to combine the probe distance and the hash into a single value. This saves up to 4 bytes per table entry, but I haven't seen anyone use it.
When the values are sorted (modulo empties and wrapping) you don't need to store the probe length. You just maintain the invariant that things are sorted. Maybe I'm missing some detail of your implementation, but mine sure don't bother storing it.
>When the values are sorted (modulo empties and wrapping) you don't need to store the probe length. You just maintain the invariant that things are sorted. Maybe I'm missing some detail of your implementation, but mine sure don't bother storing it.
Storing the probe count allows the lookup function to have 2 branches per loop as opposed to 3. The probe count also contains enough information to distinguish occupied buckets from unoccupied ones. I'm convinced that combining the probe count with the hash is objectively the best way to implement such tables.
Can I convince you to downgrade to being subjectively convinced?
Storing hashes or probes wasn't faster for the use cases I evaluated. I can see reasons why you might like it, for your uses and perhaps "generally", but not in my case at least (small keys where storage bloat was worse than `fnv` recompute, and values which had invalid states which can be used to represent empties)
Edit: I'm re-reading my first response and it reads like "you never need/want the probe length", and I totally retract that position; it wasn't intended.
1) I don't see why a linear probing robin hood is a sorted array? Can you explain more?
Robin hood hashing doesn't limit which probing scheme you use. I end up with quadratic probing which gives me both good cache locality and good probe distributions. The probing schemes I tried was omitted in this post because it would bring too much noise. But I can give you some quick summary here:
1. linear probing: probing distribution has high medium and high variance. Performance is not that great either because of the high probing numbers.
2. quadratic probing: probing distribution is not the best, but the medium sicks to 2 probes. Since the first two probes are very close in quadratic probing, its overall performance wins.
3. When probing, hash the key with the probe added to the seed. This gives very good hash distribution, but hash on each probe is very slow. Also you cannot do deletion using this scheme.
4. rotate(key, probe). This is somewhat like rehash, but way faster. The probe distribution is also very good, but items goes too far away so we lost the cache locality.
5. Combination of different schemes listed above. Still, the quadratic probing gives me best performance.
I also tried to use gcc/clang vector extension to speed up probing, but it actually slows down for 30%! I guess I have to hand tune the SSE intrinsics and measure it carefully with IACA to get the optimal performance.
Deletion itself is quite complicated and deserves its own post.
I was actually working on a blog post about this a few months ago, but never published it because I wasn't happy with it. Here it is, anyway: https://goo.gl/W1KZ2t
What I found in testing was that linear probing beat everything when it had a sufficiently good hash function and a sufficiently low load factor. The load factor was pretty important. Even when factoring in cache misses, page faults, and the cost of allocations, a linearly probed table with a low load factor always beat more space-efficient designs.
>I also tried to use gcc/clang vector extension to speed up probing, but it actually slows down for 30%! I guess I have to hand tune the SSE intrinsics and measure it carefully with IACA to get the optimal performance.
They're only advantageous when the hash is perfect and the data is in the cache. Otherwise the performance should be the same as regular code. To clarify what I meant by SSE with double hashing, I really meant search an entire cacheline before incrementing by the second hash value. SSE can be used for this in some cases but regular code works just as well.
>Deletion itself is quite complicated and deserves its own post.
That's why I prefer the linear variant; deletion is easy!
It's a sorted array because your sorting by 2 keys (A,B). A is the hash, normally a u32 or u64. B is the probe count, also normally the same sized b/c system ints are easy.
Strictly speaking the Robin Hood "sort" isn't purely lexiconally ordered. A larger A value maybe replaced a smaller A, with a much larger B.
But this relation nonetheless is just a weird solution to build a cmp function one. One that arguably doesn't work in a strict sense. But one that _nearly_ works.
So if A is the hashed-to index into your table, then B is a function of A. When you are doing your linear probe and you see a value along with its probe count, it either
1. Has a higher probe count, this means it's hash is further away than yours => less than yours
2. Has a lower probe count than yours. This means its hash is closer than yours => greater than yours
3. Is equal in probe count to yours. That means it's in the same bucket => equal to yours.
When I first see "sorted array", my first impression is it is sorted by the "original key". If we look for the hashed value in hash table, all hash tables are "sorted array" with this definition.
Also there's a finial mod in linear probing
h(k, i) = (k + i) mod N. With this mod you may have different key/probe combination that messed up the order.
> 1) The linearly-probed robin hood variant is just a sorted array. That's it. It's just a sorted array.
Right. I stumbled on linear probing robin hood hashing as a simplification of Bender, Demaine and Farach-Colton's Packed memory array (http://erikdemaine.org/papers/FOCS2000b/paper.pdf) when inserts are randomised.
> 2) The linearly-probed robin hood variant is only faster than regular linear probing for searches that fail.
The sortedness means Robin Hood should not only be faster in the worst case (that's what reducing variance is all about), even for successful searches, but can also be faster on average, at high enough loads: make sure empty cells compare as greater than any legitimate value, track the maximum displacement, and use binary search (or just use exponential search).
2) The linearly-probed robin hood variant is only faster than regular linear probing for searches that fail. Successful searches have the same average probe count. Too many articles bullshit about how robin hood reduces variance (which is irrelevant 99% of the time) and then go on to ignore the fact that they cannot beat the performance of linear probing when it comes to successful searches. One guy I saw spent hours trying to optimize his hash table for the K-nucleotide benchmark without realizing that a trivial 5-minute implementation of linear probing would beat it because searches in that benchmark never fail.
3) There's a simple 40 year old bitwise trick by Knuth which allows one to combine the probe distance and the hash into a single value. This saves up to 4 bytes per table entry, but I haven't seen anyone use it.
There's also a few articles that get deletion straight-up wrong, but meh. That doesn't bother me too much.
Also, for double hashing I'm pretty sure a SSE/AVX variant is optimal. I haven't tried implementing one though.