So it’s just some unfounded handwaving? You can just dismiss one of the greatest...

vidarh · on May 1, 2023

It's not just handwaving. It's knowing the context. As late as the mid 80's it was not unusual for multiplication to take tens of cycles if your CPU even had a built in multiplication instruction (e.g. on the M68000 in the Amiga, a MULU - unsigned 16 bit -> 32 bit multiplication - took from 38 to 70 cycles plus any cycles required for memory access; if you needed to multiply numbers larger than 16 bit to depend on the overflow like in the case of this article, you'd need multiple instructions anyway), far worse if it didn't, while memory loads were often cheap in comparison (on the M68000 a memory read indirect via an address register could be down to 8 cycles), and so until years after that it did make sense to do all kinds of things to reduce the need for multiplication. But it doesn't any more.

While it'd be worthwhile doing tests to confirm a specific case, the default assumptions have changed: Today memory is slow and multiplication fast (in terms of cycles; in absolute terms both are of course far faster).

You certainly should not today pick a more complex hashing scheme to try to avoid a multiplication without carefully measuring it just because it was discussed even by someone as smart as Knuth in a context where the relative instruction costs where entirely different.

If you're actually using the function as the primary hash function, then the distribution of the output might well make up for significant performance difference, so this is not to suggest that tabulation hashing isn't a worthwhile consideration.

tjoff · on April 30, 2023

How is it unfounded? How is it unreasonable to question the relevance of an optimization from a time where the relevant parts of computing were completely different?

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.", Donald Knuth

Maybe doing benchmarks for a comment isn't worthwhile. I guarantee you have to do different benchmarks for different contexts anyway so I can't blindly trust the benchmark anyway. Not to say it wouldn't be interesting.

I wouldn't mind someone plotting the cost of instructions over time and how that affects choice of algorithms. But to expect that from a comment?

tinus_hn · on April 30, 2023

So here’s a paper where the author has run the experiment:

https://arxiv.org/pdf/1011.5200.pdf

And the answer is that the speed is comparable to other functions that however produce worse results.

vidarh · on May 1, 2023

A couple of issues with that interpretation:

1. Dated hardware, so the hashing algo speed comparison is no longer relevant without redoing it, but even on hardware that old a 2.2x-2.8x speed advantage for mul+shift is substantial.

2. No tests with contention for the cache; no tests with different table sizes; no code given. As a result it's impossible to tell if the performance numbers are relevant and realistic.

3. If they could demonstrate substantially better distribution, it might still be very worthwhile despite how much slower it is, but they test the hashing algorithms with runs of 100 random constants. We don't know if any of those constants are any good because they've not given them, but odds are highly against 100 random constants even approaching good. As such the comparisons of tabulation hashing with the other hashing method is meaningless in terms of performance (but see below) - it's trivial to find constants for multiplication + shift that produces pathologically bad outcomes.

What the paper does appear to show is that tabulation hashing might have more predictable runtime given the result on the specific set of structured input they test with, and that might well be a good reason to use it for some applications.

But that is tainted by the lack of transparency in what they've actually compared against.

(This is also mostly relevant if you considering using a multiplication-shift based hash function, which is also not what the original article is advocating you use Fibonacci hash for)

tinus_hn · on May 1, 2023

The hardware is not that dated, they specifically target then modern hardware that has the fast multiplication everyone is on about. And while multiplication has gotten faster, of course the size of the l1 cache has also grown. It is per-core so there is not going to be a whole lot of contention going on. A modern budget cpu has about 32kb per core so it’s not going to be a squeeze.

I am no expert in math but the algorithm is claimed to be better than another one because it is in a class that is better than the class the other is in. It’s not because a run of 100 shows some distribution.

You have a lot of demands for exhaustive testing but when you are asked to provide the same, it’s all too much to ask. ‘I wouldn’t mind someone plotting graphs’ yeah thanks, I wouldn’t mind someone else doing the work.

Then again someone else probably has done the work more recently or more in line with what you want to see. I found this paper in a few minutes or websearching, I’m sure you can spare the time to find a better one.

vidarh · on May 1, 2023

> It is per-core so there is not going to be a whole lot of contention going on.

That's only true if nothing else happens between requests to the hash table. That might be the case, or it might not. Depending on your workload that might make no difference or totally ruin your performance characteristics.

> I am no expert in math but the algorithm is claimed to be better than another one because it is in a class that is better than the class the other is in. It’s not because a run of 100 shows some distribution.

The problem with this is that while it may well have better characteristics than multiply and shift on average, the quality of the distribution of multiply and shift based hashes can vary by many orders of magnitude depending on the choice of multiplication factors. Put another way: Multiplying by 1 and shifting is a perfectly valid multiply and shift hash function. It's a very stupid one. The performance characteristics for a hash table doing that is nothing like the performance characteristics of what is proposed in the original article. I have no doubt that the table based approach will beat a large proportion of the multiply and shift hashes. But so does other multiply and shift hashes, by large factors. As such, without actually comparing against a known set of some of the best multiply-shift hashes we learn very little about whether or not it'll do well against good multiply-shift hashes.

To me, the fact that they chose random factors is very suspicious. Nobody uses random factors. The effort spent on choosing good factors over the years has been very extensive, and even hacky, ad hoc attempts will tend to use large prime numbers.

> You have a lot of demands for exhaustive testing but when you are asked to provide the same, it’s all too much to ask. ‘I wouldn’t mind someone plotting graphs’ yeah thanks, I wouldn’t mind someone else doing the work.

I've not made demands for anything. I've pointed out that making a blanket claim that multiplication is bad when the performance characteristics has changed as much as they have is unreasonable, and a paper like this tells us pretty much nothing more. It's absolutely reasonable to consider table based approaches; it's quite possible, even likely they'll have desirable properties for various sets of inputs - there is no such thing as a perfect hash function for all inputs, and sometimes you care most about pathological worst case scenarios, some times you care about averages, some times you know what data you will or won't see. That it performs as well as they've shown it to means there is almost certainly situations where it will be a good choice. But because of the choices they made in that paper we can't really tell when and where that would be, and that's a shame.

What is not reasonable is just writing off the use of multiplication on the basis of hardware characteristics that are decades out of date. That doesn't mean you should blindly use that either.

If there's one thing people should know about working with hash tables it's that you should test and measure for the actual type of data you expect to see.

tinus_hn · on May 1, 2023

Sorry, all I can do is suggest you come back tomorrow and reread the comment you wrote.

vidarh · on May 2, 2023

I have and stand by every word, but seeing as you have no arguments and resort to implied insults we are done here.

tinus_hn · on May 2, 2023

Sorry, if you start making up giant arguments against claims I have not made there really is no response. But clearly you don’t see that.

NohatCoder · on May 2, 2023

It is funny, the article claims to test the 64 bit code on "Dual-core Intel Xeon 2.6 GHz 64-bit processor with 4096KB cache". That is a really poor description, as it does not tell us what architecture the processor is. But one can go through a list of all Intel Xeon processors to find the ones that match the description. Turns out that there are none.

If we broaden the search to 2.66 GHz processors there are 4: 5030, 5150, 3070 and 3075. All released in 2006 and 2007. This means it is either one of the last "NetBurst" CPUs or one of the first "Core" CPUs. Assuming "Core" the relevant operation has a 5 clock latency, as best I can tell. This is down to 3 clocks on pretty much all modern X86 CPUs. Modern CPUs also get an extra load port, so I doubt the relative difference is much different on modern CPUs.

Overall it looks like a pretty bad benchmark, thrown into a paper on collision likelihood, which itself looks like an academic exercise with no relevance for the real world.