
This depends entirely on being able to do the interpolation on your keys. He computes:

    offset = (key - min) / (max - min)
but not all data can have arithmetic done on it like that. What if the data in your array are, for example, names?

It's pretty trivial to beat binary search if your data are integers, or other keys on which you can do arithmetic. You can make the uniform distribution assumption, check it as you go, degrade to binary search if the assumption seems flawed, and then proceed from there.
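For concreteness, here's a minimal sketch of that hybrid in C (my own illustration, not the article's code): interpolate for a few probes, then degrade to plain binary-search midpoints in case the uniform-distribution assumption is wrong:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: interpolation search over a sorted int array, falling
   back to binary-search midpoints after a few probes if the uniform
   distribution assumption may be costing us. Returns the index of
   key, or -1 if it is absent. */
static long interp_search(const int *a, size_t n, int key)
{
    size_t lo, hi, probes = 0;

    if (n == 0)
        return -1;
    lo = 0;
    hi = n - 1;
    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        size_t mid;
        if (a[lo] == a[hi])                     /* avoid divide-by-zero */
            return a[lo] == key ? (long)lo : -1;
        if (probes++ < 4)                       /* offset = (key-min)/(max-min) */
            mid = lo + (size_t)((double)(key - a[lo])
                              / (double)(a[hi] - a[lo]) * (double)(hi - lo));
        else                                    /* assumption looks bad: bisect */
            mid = lo + (hi - lo) / 2;
        if (a[mid] == key)
            return (long)mid;
        if (a[mid] < key)
            lo = mid + 1;
        else if (mid == 0)
            break;                              /* size_t underflow guard */
        else
            hi = mid - 1;
    }
    return -1;
}
```

The probe budget of 4 is arbitrary; a real implementation would pick the cutoff based on how fast the range is shrinking.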

However, if you're given an array of unknown objects, a comparison function, and no way to do arithmetic on your objects, you can't beat binary search.




However, if you're given an array of unknown objects, a comparison function, and no way to do arithmetic on your objects, you can't beat binary search.

... if all you care about is operation count. If you're fetching pages from disk, a B+Tree will perform dramatically better than a binary search.

And if you're fetching pages from disk and have a large cache, a B+Tree can do far better than interpolation search, too. Interpolation search will thrash your cache, resulting in O(log log n) seeks, whereas a cache of size O(n/B) will allow a B+Tree to find data in O(1) seeks.


But now you're talking about different data structures - his assumption was that you had a simple sorted array. We both know that there's a lot that can be done, and when performance really matters you don't use a sorted array, you don't use binary search, and you do take into account things like access pattern, access speed, caches, etc.

But people don't, because they read simplistic articles like this and then believe they know everything. At least the author could hint that there's a lot more that can be done. As it is, it just leaves the naive reader with a sense that they've seen the light, and the experienced reader irritated.


True. Although I'd be interested in seeing actual benchmarks for interpolation vs. binary search too, since binary search has better cache locality...


What if the data in your array are, for example, names?

  offset = (key[0] - min) / (max - min)
doesn't seem very hard to me. Or you could even go by the first 2 characters:

  offset = ((key[0] << (8 * sizeof(key[0]))) + key[1] - min) / (max - min)
The issue, of course, is that names aren't evenly distributed, so this could degrade pretty badly... you'd probably be better off with a simple binary search.
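To make the first-character idea concrete, a hedged sketch (the function name is mine; it assumes non-empty ASCII strings) of picking the initial probe position:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: estimate a starting probe index into a sorted array of
   strings by interpolating on the first character, as suggested above.
   Assumes non-empty strings. Names are not uniformly distributed, so
   this is a heuristic starting point, not a guaranteed win. */
static size_t first_char_probe(const char **a, size_t n, const char *key)
{
    unsigned char min = (unsigned char)a[0][0];
    unsigned char max = (unsigned char)a[n - 1][0];
    unsigned char k   = (unsigned char)key[0];

    if (min == max || k <= min)       /* all one letter, or below range */
        return 0;
    if (k >= max)
        return n - 1;
    return (size_t)((double)(k - min) / (double)(max - min) * (double)(n - 1));
}
```

A full search would then interpolate or bisect from that probe; this only shows where the first guess lands.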


My apologies - the example I gave was a poor one, but I thought the point was clear. Obviously it wasn't.

Suppose instead I give you records of an unknown type, and a function that compares them. Now interpolation search doesn't work and you have to do something either simpler, or more complex.


But that's a different problem. Of course different problems may require different solutions.


But that's a different problem.

No, it's exactly the problem. Quoting:

    Quick, what is the fastest way
    to search a sorted array?
See? He didn't say I can do arithmetic on the keys, he just said the array is sorted. He then goes and solves a different problem to the one he originally gave.

In short, the real answer is: it depends.

Yes, if you can do arithmetic on the keys, and the keys are sorted in the obvious way according to your arithmetic, then interpolation search can be faster than binary search.

But not always, not in the worst case, and not if you have interesting issues about the cache, access speed, etc.

No, I didn't change the problem, I just raised issues with the overly simplistic analysis he gave.


I'm not saying his problem description was great, but it did become clear through the course of the article the additional constraints and assumptions that he was working under. He certainly never said "you're given an array of unknown objects, a comparison function, and no way to do arithmetic on your objects", and that first sentence you quoted doesn't contain enough information to translate directly into that interpretation (where did it say you get a comparison function, for instance?).


I wonder if we're simply in violent agreement - it certainly seems to me that you just aren't getting my point, and while much of what you say is true, it just doesn't seem relevant to me.

Yes, Interpolation Search has better average complexity in those circumstances when it works. However, for almost all cases of Binary Search that I use, Interpolation Search won't work, and isn't applicable. I was trying to make that point.

If your vector is sorted then it's reasonable to assume you have a comparison function between the objects in your vector. Binary Search then requires nothing more than this absolute minimum - just a comparison function. In this case Interpolation Search doesn't work at all.
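For what it's worth, that minimum looks like this in C (a sketch mirroring the standard library's bsearch contract; the names are mine):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch only: binary search needing nothing but a comparison
   function, in the style of the C standard library's bsearch.
   Returns a pointer to a matching element, or NULL. Interpolation
   search cannot be written against this interface at all. */
static const void *bin_search(const void *key, const void *base,
                              size_t n, size_t size,
                              int (*cmp)(const void *, const void *))
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const char *elem = (const char *)base + mid * size;
        int c = cmp(key, elem);
        if (c == 0)
            return elem;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}

/* Example comparator: both arguments point at a (const char *). */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char **)a, *(const char **)b);
}
```

Note that the comparator is the only thing the search knows about the element type, which is exactly the point.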

As Colin pointed out above, it actually gets worse for our author. In cases where the volumes are sufficient to make much of a difference, you are almost certainly going to want to take caching and access speeds into account. Now you need to move out of the realm of Binary or Interpolation Search and into other data structures anyway.

Yes, Interpolation Search is cool and works in a small niche of cases. But mostly, it's irrelevant.

And that's my summary. Cool, but mostly either inapplicable or irrelevant.


I think you're overstating how often you're only going to have a sorted vector of unknown type and an opaque comparison function, and understating how often you'll know your types and be able to make a reasonable guess at their distribution.

You're also ignoring how much simpler it is to roll a quick interpolation search than throw together a B+-Tree, to the point where it may make sense as something to try when you've got an hour to try to speed up some process just enough to get you through another week of growing load while you're rewriting your app to do it right.

But, yes, I'll agree with your summary.


One particular real-world case where it is useful is seeking (to a particular frame number or time code) within variable-bit-rate video streams.



