Burstsort: Fastest known algorithm to sort large set of strings (rmit.edu.au)
90 points by Xichekolas on Jan 22, 2009 | 24 comments




Burstsort's design considers cache effects. (A cache miss can easily cost you 200 instructions.) In the grand old days of sorting algorithms it was instruction count that mattered.

Judging from their graphs, Burstsort is fastest by a wide margin for sufficiently large datasets.


It also has very consistent asymptotic performance, i.e. it doesn't degrade as badly as quicksort when the data is pathological in some way. (When a list is already in reverse order, naive quicksort with a first-element pivot degrades to O(n^2).)
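To make the degradation concrete, here's a toy sketch (my own illustration, not from the article) that counts comparisons for a quicksort with a naive first-element pivot:

```python
import random

def quicksort_naive(a):
    """First-element-pivot quicksort.
    Returns (sorted list, number of comparisons performed)."""
    if len(a) <= 1:
        return a, 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    ls, lc = quicksort_naive(left)
    rs, rc = quicksort_naive(right)
    return ls + [pivot] + rs, lc + rc + len(rest)

n = 200
_, c_reversed = quicksort_naive(list(range(n, 0, -1)))   # worst case
_, c_shuffled = quicksort_naive(random.sample(range(n), n))
print(c_reversed)  # n*(n-1)/2 = 19900: quadratic
print(c_shuffled)  # typically a few thousand: n log n-ish
```

On the reverse-ordered input every pivot is the maximum, so each recursive call only peels off one element and the comparison count is the full n(n-1)/2.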

Also oddly interesting: according to a page on quicksort I was reading before this, there exist adversarial algorithms that generate worst-case data sets for quicksort no matter what the partitioning scheme is.


How general can the partitioning scheme be? Can it generate worst cases for random selection of the pivot?

Also, one can always select the median in deterministic linear time, so quicksort never degenerates to O(n^2).
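A minimal sketch of that idea (my own illustrative code; `select` and `quicksort_median` are my names, not anything from the article), using the classic median-of-medians (BFPRT) selection to pick the pivot:

```python
def select(a, k):
    """k-th smallest (0-indexed) in worst-case O(n) time via
    median-of-medians (BFPRT)."""
    if len(a) <= 25:
        return sorted(a)[k]
    # Median of each group of 5, then recursively the median of those medians.
    groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
    medians = [g[len(g) // 2] for g in groups]
    pivot = select(medians, len(medians) // 2)
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    n_eq = len(a) - len(lo) - len(hi)
    if k < len(lo):
        return select(lo, k)
    if k < len(lo) + n_eq:
        return pivot
    return select(hi, k - len(lo) - n_eq)

def quicksort_median(a):
    """Quicksort that always pivots on the true median: O(n log n) worst case."""
    if len(a) <= 1:
        return a
    pivot = select(a, len(a) // 2)
    return (quicksort_median([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort_median([x for x in a if x > pivot]))
```

The median-of-medians pivot guarantees both sides of every partition shrink geometrically, which is what kills the O(n^2) case.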


I think Xichekolas had in mind adversarial techniques such as the one described in http://www.cs.dartmouth.edu/~doug/mdmspe.pdf. That paper assumes O(1) pivot selection.


I read the paper. It seems they cannot beat a median pivot (which takes O(k) time to find at each step, for a partition of size k, but keeps the quicksort runtime at O(n log n)).

Still, a cool program.


With random pivot selection, quicksort is expected O(n log n) time for all inputs. (The expectation is over the set of random numbers used to pick the random pivots, and doesn't involve the input at all.) Algorithms like this are called "Las Vegas".
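For illustration, a minimal Las Vegas quicksort sketch (my own code, not from the article):

```python
import random

def quicksort_random(a):
    """Las Vegas quicksort: the output is always correctly sorted; only the
    running time is a random variable. Expected O(n log n) on every input,
    because the expectation is over the pivot coin flips, not the data."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    return (quicksort_random([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + quicksort_random([x for x in a if x > pivot]))
```

Contrast with Monte Carlo algorithms, where the running time is bounded but the answer is only correct with some probability.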


Yes. And if your adversarial algorithm is good enough, it should work even in that case. (Of course any input is the worst case if all inputs have the same expected time.)


Wait if finding the median takes n time, and your sort takes n log n time then your best option is ~ n * log n * log n.


No, the O(n) for median selection is the same O(n) for set partition.


Ahh, OK: selection costs O(N log N) summed over the whole sort, because each individual selection is only O(l) in the length l of its own partition. I was conflating "the same n at each call" with "the same n summed across each depth". Err, nm, it's hard to put into words.

PS: It still roughly doubles the time, but that's not important in O notation.
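For the record, the recurrence makes the "only doubles the constant" point precise: with a linear-time median pick and a linear-time partition at each level,

```latex
T(n) = 2\,T(n/2) + \underbrace{O(n)}_{\text{median}} + \underbrace{O(n)}_{\text{partition}}
     = 2\,T(n/2) + O(n)
\;\Longrightarrow\; T(n) = O(n \log n).
```

The median step changes only the constant hidden in the per-level O(n) work, not the O(log n) number of levels.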


You can get away with less than the median. For example, a pivot guaranteed to lie between the 25th and 75th percentiles is good enough and can be found faster.


Personally, I really like merge sort. It makes few assumptions, it's stable, it works well with sorted inputs, it can be split across multiple CPUs, and it works well with sequential input streams. Radix sort tends to be faster, especially for strings, but it doesn't work for many data types without adapting the algorithm; e.g., radix sorting doubles requires bit manipulation of the data.
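The doubles case is the standard IEEE 754 key transform; a hedged sketch of it (my own code and naming, NaNs not handled):

```python
import struct

def double_radix_key(x):
    """Map a float to a 64-bit unsigned key whose integer order matches the
    float's numeric order: invert all bits of negative numbers, and just the
    sign bit of non-negative ones. The keys can then be radix sorted."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits & (1 << 63):
        return bits ^ 0xFFFFFFFFFFFFFFFF   # negative: reverse the order
    return bits | (1 << 63)                # non-negative: sort above negatives

vals = [3.5, -1.25, 0.0, -1e300, 2.0, 1e-10, -0.5]
assert sorted(vals) == sorted(vals, key=double_radix_key)
```

The trick works because IEEE 754 doubles with the same sign already compare correctly as raw bit patterns; only the sign handling needs fixing up.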


Merge sort can be thought of as a kind of binary counting.


Doh, right, k-select is the same problem. Median of medians is a good linear-time technique, though.


An important fact is that burstsort ("string" being the important keyword in the title) is not as general as quicksort, and that is why it can be very fast. Under the assumptions quicksort requires (only a comparison operator), sorting cannot be faster than O(N log N) on average.

If you are interested in a general sorting algorithm with good worst-case performance, try merge sort. Merge sort is O(N log N) independent of the input data, though with significantly larger constants than quicksort. In theory it also requires O(N) auxiliary space, but that is only the case for arrays; linked lists can be sorted in place.


Smoothsort is better than merge sort, offering an in-place sort that approaches O(n) as the data approaches sortedness (lower bound is O(n log n)). It is, however, very complex to implement.


Don't you mean upper bound?


Are the parameters (like container size) sensitive to the cache design? They picked 8192 elements by hand-tuning; this is what Judy Arrays tried to accomplish too, and they ran aground on code that got too complicated as it tried to accommodate different architectures.

This is basically Radix sort (radix sort : tries :: heap sort : heaps) with a cache optimization, which is neat.


That was my thought too when I saw the 8192 number, but they deferred it with "subject to further research".

I wonder if you could crudely determine a number based on inspecting the size of the cache of the machine at runtime.

I think the most impressive part about the whole thing is that the cache optimization flows naturally out of how the Trie is constructed rather than any voodoo on their part. (Since each string is handled once, rather than m times, where m is the string length, as in a pure radix sort.)

It makes me wonder if the cache optimization aspect was foreseen or accidental.
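To make the mechanics concrete, here's a toy sketch of the bursting idea as I understand it from the paper's description (the names and the tiny burst threshold are mine; the real implementation uses buckets on the order of 8192 elements and a cache-friendly bucket sort):

```python
BURST_LIMIT = 4  # toy threshold; the paper hand-tuned ~8192 for real caches

class Node:
    def __init__(self):
        self.buckets = {}   # next char -> list of strings, or a child Node
        self.end = []       # strings fully consumed at this depth

def insert(node, s, depth=0):
    if depth == len(s):
        node.end.append(s)
        return
    c = s[depth]
    slot = node.buckets.get(c)
    if isinstance(slot, Node):
        insert(slot, s, depth + 1)
        return
    bucket = node.buckets.setdefault(c, [])
    bucket.append(s)
    if len(bucket) > BURST_LIMIT:      # "burst": the bucket becomes a trie node
        child = Node()
        for t in bucket:
            insert(child, t, depth + 1)
        node.buckets[c] = child

def traverse(node):
    out = list(node.end)
    for c in sorted(node.buckets):
        slot = node.buckets[c]
        if isinstance(slot, Node):
            out.extend(traverse(slot))
        else:
            out.extend(sorted(slot))   # small bucket: sorted while hot in cache
    return out

def burstsort(strings):
    root = Node()
    for s in strings:
        insert(root, s)
    return traverse(root)
```

Each string is inserted once and each small bucket is sorted once, which is where the "handled once, not m times" property in the parent comment shows up.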


I haven't looked closely at this 'burstsort', but I have read previously about 'cache-oblivious' algorithms, which benefit consistently from memory caches whatever their size. (No tuning for different sizes required.) See discussion here:

http://blogs.msdn.com/devdev/archive/2007/06/12/cache-oblivi...


See also this lecture on cache oblivious algorithms by Erik Demaine from MIT 6.046 "Introduction to Algorithms" course:

"This is a particularly fun area, one dear to my heart because I've done a lot of research in this area. This is an area co-founded by Professor Leiserson. So, in fact, the first context in which I met Professor Leiserson was him giving a talk about cache oblivious algorithms at WADS '99 in Vancouver I think. Yeah, that has to be an odd year. So, I learned about cache oblivious algorithms then, started working in the area, and it's been a fun place to play. But this topic in some sense was also developed in the context of this class. I think there was one semester, probably also '98-'99 where all of the problem sets were about cache oblivious algorithms."

http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Compute...



I'll assume it's not terribly effective in a functional language, unless you've specifically allocated the elements in an array?





