> This should _only_ help if you use a terrible hash function.
So in other words it helps. The point being that when you're writing a library where you have no control over what people will pass you, you can either punt on doing the best you can and leave it to be the users problem, or when - like in this case - there is a trivially simple option that makes things better you can make it better.
> I'm also not sure why they claim that integer modulo is slow when the* show how to make it fast by using bitwise AND.
They show that integer modulo is slow because a lot of code uses the actual modulo rather than bitwise and (granted often because they assume they need a prime sized hash table to do well).
EDIT: Note that the article specifically also addresses the bitwise and by pointing out that Dinkumware in using that 1) had to compensate by using a more expensive hash for integers, 2) created a situation where implementers providing a custom hash had to be aware that this specific implementation would just throw away the high bits, and pointed out they'd been bitten by that issue several times. It also mentions how Google's google::dense_hash_map did the same thing but without even providing it's own integer hash replacement to compensate.
> But we're designing the hash table! There's no reason for us to make it possible for the user to provide a custom hash function.
He's comparing it to the C++ stdlib. An implementation that is intended to be a drop-in replacement for that needs to abide by similar guarantees, so it is possible to provide a custom hash function for the set of contracts in the libraries he compares with, and it's fairly common to do so.
The logical conclusion is that there is a reason for him to make it possible for the user to provide a custom hash function, because it's what the clients he wants to support expects.
Sure when you have control over the API and you are free to refuse to let users of the API set custom hash functions, just provide a better hash function.
But that is explicitly not the case this article is addressing, so you're arguing against a strawman.
The argument that another solution to this issue is to just refuse to let users provide the hash function would be entirely reasonable, and maybe it'd have been nice if the article started by addressing that.
So in other words it helps. The point being that when you're writing a library where you have no control over what people will pass you, you can either punt on doing the best you can and leave it to be the users problem, or when - like in this case - there is a trivially simple option that makes things better you can make it better.
> I'm also not sure why they claim that integer modulo is slow when the* show how to make it fast by using bitwise AND.
They show that integer modulo is slow because a lot of code uses the actual modulo rather than bitwise and (granted often because they assume they need a prime sized hash table to do well).
EDIT: Note that the article specifically also addresses the bitwise and by pointing out that Dinkumware in using that 1) had to compensate by using a more expensive hash for integers, 2) created a situation where implementers providing a custom hash had to be aware that this specific implementation would just throw away the high bits, and pointed out they'd been bitten by that issue several times. It also mentions how Google's google::dense_hash_map did the same thing but without even providing it's own integer hash replacement to compensate.
> But we're designing the hash table! There's no reason for us to make it possible for the user to provide a custom hash function.
He's comparing it to the C++ stdlib. An implementation that is intended to be a drop-in replacement for that needs to abide by similar guarantees, so it is possible to provide a custom hash function for the set of contracts in the libraries he compares with, and it's fairly common to do so.
The logical conclusion is that there is a reason for him to make it possible for the user to provide a custom hash function, because it's what the clients he wants to support expects.
Sure when you have control over the API and you are free to refuse to let users of the API set custom hash functions, just provide a better hash function.
But that is explicitly not the case this article is addressing, so you're arguing against a strawman.
The argument that another solution to this issue is to just refuse to let users provide the hash function would be entirely reasonable, and maybe it'd have been nice if the article started by addressing that.