Hacker News new | past | comments | ask | show | jobs | submit login

I'm fascinated that so many of the commenters here seem to be focusing on the fact that the algorithm mentioned doesn't work in every case. My reaction was different -- I thought of a case where it would be less than ideal, and allowed it to inspire me to find an algorithm that worked better than naive binary search for that case, too.

In particular, I thought of the case of searching arrays of strings. If you are doing repeated searches against the same array, you can easily build a table breaking the array down into subarrays all sharing the same first character. (I'm assuming ASCII there!) Using that can quickly eliminate 5 or so comparisons that would be required with a naive binary search.

Okay, that's not brilliant, but might be useful in some circumstances. My bigger point is I thought this article was a very useful reminder that even a terrific general purpose algorithm like binary search can be beaten if you can apply specific domain knowledge to the search.




That sounds a bit like Burst Tries / burstsort.


You mean the Burst-colomon sort.


>In particular, I thought of the case of searching arrays of strings. If you are doing repeated searches against the same array, you can easily build a table breaking the array down into subarrays all sharing the same first character. (I'm assuming ASCII there!) Using that can quickly eliminate 5 or so comparisons that would be required with a naive binary search.

depending on the extent of your "fan-out" your memory budget might explode...


ASCII can be stored in a byte, so at most he is going to use

256*(index+length).

Assuming he doesn't have more than 2^64 byte of data, he will only need 24576 bytes to the index, and only 32768 if he needs a 64 bit, rather than a 32 bit length.

You might even not need the length argument as it is only the next array index subtracted from the current. In that case you need only 16384 bytes.

If you want two bit unicode, you have to use 49152, 65536 and 32768 bytes, respectively.

Remember he only wanted to index the first value.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: