Rainbow is the fastest 128-bit hash, and the fastest 256-bit hash (non-crypto). The 8th fastest 64-bit hash (by family, or 9th fastest overall, and 13-th fatest overall if you include 32-bit hashes). The advantage of rainbow is its easily scalable output size (64, 128 and 256), its high quality (passes all the tests), and its utter simplicity: it is under 140 source lines of code, easily readable
The random constants are primes that were selected based on their avalanche quality under a multiply modulo operation. This part of the development was interesting different primes had very distinctly different avalanche qualities. The highest quality primes caused good avalanche (~ 50% bit flip probability) across the widest possible set of bits, on average. These are quite rare. A lot of large primes only avalanche across a narrow range of bits, even in a 128-bit space. The search program took a couple of days to discover all these primes running on a 2020s era MBP. Primes are chosen because they give a complete residue set under the modulus, ensuring a long cycle length at least regarding the nominal test operation.
The rest of the hash was developed by trial and error using my intuition on developing hashes arising from long experience of doing so, and using SMHasher3 to evaluate the results, by iterating to improve and re-testing, over a period of a couple of weeks in the holidays a few years ago. I started making hash functions in my teens as a fun hobby.
I am using fnv-1a to hash Objective-C method selectors, which are generally just identifier characters and zero or multiple ':'. At the time of my research, fnv-1a had the least collisions over my set of "real life" selectors. I think, it could be worthwhile some time, to try out other constants for maybe even less collisions. Is your list of good primes available ? (And maybe also those that are not quite perfect)
Everything is in the source code. I highly doubt any of the good hash functions listed in smhasher3 (ie all tests passed) would collide over identifiers.
So they should all have zero collisions, meaning there’s no ‘least’ among the good quality ones - they’re all equally collissionless (they differ in other tests).
Sounds like an interesting project. What’s its purpose?
Cool. I forgot to mention, that I am truncating the hash down to 32 bits to hopefully generate tighter CPU instructions. At these few bits, collisions are still rare enough, but they are a concern.
Now my understanding of the choice of prime is that, you are "weighing" the input bits and the computed bits, that will form the hash. So in case of identifiers its very likely that bit 7 of the input is always 0 and say maybe bit 4 is statistically more likely to be 1 by some margin. The other input bits would have some entropy as well. I would expect that certain (imperfect) primes would then aid me to get a better use of the 32 bit space and therefore less risk of a collision for my Objective-C runtime.
Ah, that's interesting. 32-bits yes you would get some collisions even from good hashes just statistically.
Also now I understand your constraints. Very interesting, so you are designing a custom hash function to use in this specific domain of keys with specific probabilistic properties, and you are thinking that there would be some way you could multiply by a certain prime that would ideally fan out these keys to be evenly distributed over the space?
mulle-objc looks fascinating: fast, portable Objective-C runtime written 100% in C11. I encourage you to post a Show HN I'm sure people here would like it.
Hahaha! :) Good on you. Nothing to stop you posting again :)
Truly I have much experience with Posting Show HN's. There's very little quality difference between something you post that gets 3 points and something you post that gets 300 points. A lot of it depends on time, current dynamic audience, and other posts at the time.
Repeat post to get a better idea of the true interest in your work. I encourage you to do it again!!! :)
Spooky also has some good results on common identifiers.
But fnv-1a is in a completely different league. It's recommended for hash tables with other security measures than hash function security. This hash is a typical hybrid, but not universal. umash would be the perfect hybrid: insecure, pretty fast, passes all tests, universal
Rainbow is the fastest 128-bit hash, and the fastest 256-bit hash (non-crypto). The 8th fastest 64-bit hash (by family, or 9th fastest overall, and 13-th fatest overall if you include 32-bit hashes). The advantage of rainbow is its easily scalable output size (64, 128 and 256), its high quality (passes all the tests), and its utter simplicity: it is under 140 source lines of code, easily readable
The random constants are primes that were selected based on their avalanche quality under a multiply modulo operation. This part of the development was interesting different primes had very distinctly different avalanche qualities. The highest quality primes caused good avalanche (~ 50% bit flip probability) across the widest possible set of bits, on average. These are quite rare. A lot of large primes only avalanche across a narrow range of bits, even in a 128-bit space. The search program took a couple of days to discover all these primes running on a 2020s era MBP. Primes are chosen because they give a complete residue set under the modulus, ensuring a long cycle length at least regarding the nominal test operation.
The rest of the hash was developed by trial and error using my intuition on developing hashes arising from long experience of doing so, and using SMHasher3 to evaluate the results, by iterating to improve and re-testing, over a period of a couple of weeks in the holidays a few years ago. I started making hash functions in my teens as a fun hobby.
There's also a fun little wasm powered dashboard you can play with, here: https://dosaygo-research.github.io/rain/