It's only a small story: I was one of the group sent to Iceland for the Need for Speed Sprint on Python back in ~2006. Among the other participants was, of course, tim. I mostly worked on benchmarking to determine how much of a speedup we had achieved.
tim gave me some really eye-opening tips. One that's really stuck with me: take the best of many short runs of a benchmark rather than running it for a long time, because a long run is all but guaranteed to pick up interference from other parts of the OS (scheduling, etc.).
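The "best of many short runs" tip is easy to apply with the stdlib; here's a minimal sketch (the statement being timed is just an example):

```python
# Take the best of many short runs: timeit.repeat runs the statement
# several times, and the minimum is the timing least polluted by OS noise.
import timeit

timings = timeit.repeat(
    stmt="sorted(data)",
    setup="import random; data = [random.random() for _ in range(10_000)]",
    repeat=7,   # many short runs...
    number=20,  # ...each doing a modest amount of work
)
best = min(timings)  # keep the best, not the mean
print(f"best of {len(timings)} runs: {best:.4f}s for 20 sorts")
```

The mean mixes in whatever the scheduler did during the slow runs; the minimum is the closest you get to an uninterrupted measurement.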
tim is an insanely smart guy, kind, and funny.
One day for lunch we were ordering takeout from this little joint, I forget the name, it might have been Hamborgarabulla Tomasar, and I picked up the check for everyone (maybe 15 people). tim described the food as "shockingly good", which was absolutely accurate. And given how expensive food in Iceland was, it wasn't terribly bad price-wise either.
Interestingly, Timsort used to be broken [1]. The bug was found by researchers trying to formally prove its correctness. I recall Stijn de Gouw remarking that they wanted to have their proof artefact evaluated, but you'd need a seriously beefy machine to do so.
Timsort itself was not broken; the implementations were, which is a slightly (but crucially) different issue: the implementations didn't uphold the algorithm's run-stack invariants at the upper edges.
Not only that, but CPython's implementation's misbehaviour effectively couldn't be triggered, because you couldn't create an array large enough to reach it.
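For context, the invariant in question is on the stack of pending run lengths. A sketch of the whole-stack version that the fix enforces (naming is mine, and this is simplified; the buggy implementations only re-checked the topmost runs after a merge, so a violation could survive deeper in the stack):

```python
def invariant_holds(runs):
    """Check Timsort's run-stack invariant over the whole stack.

    `runs` is the stack of pending run lengths, bottom to top. For every
    adjacent triple, the lower run must be longer than the two above it
    combined, and each run must be longer than the one above it.
    """
    for i in range(len(runs)):
        if i >= 1 and not runs[i - 1] > runs[i]:
            return False
        if i >= 2 and not runs[i - 2] > runs[i - 1] + runs[i]:
            return False
    return True
```

The subtlety: `[50, 30, 28, 1]` satisfies the invariant for its top three runs (30 > 28 + 1, 28 > 1) but violates it one level down (50 < 30 + 28), which is exactly the kind of state the original merge logic could leave behind.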
Closely related is pattern-defeating quicksort ( https://github.com/orlp/pdqsort ), which adapts quicksort to take advantage of sorted runs. I've adapted a few quicksorts to pdqsort and seen good speedups (as people were often sorting partially sorted data).
Basically: Timsort is to mergesort as pdqsort is to quicksort
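One of pdqsort's adaptivity tricks can be sketched in isolation (this is a heavily simplified illustration of the idea, not pdqsort itself, which also does block partitioning and pattern-breaking; the function name and budget are my own choices): when a partition step does no swaps, optimistically try an insertion sort with a small budget of element moves, and bail out if the slice wasn't nearly sorted after all.

```python
def partial_insertion_sort(a, lo, hi, budget=8):
    """Insertion-sort a[lo:hi], giving up after `budget` element moves.

    Returns True if the slice ended up sorted, False if we bailed out.
    Even on bail-out the array remains a valid permutation.
    """
    moves = 0
    for i in range(lo + 1, hi):
        x = a[i]
        j = i
        while j > lo and a[j - 1] > x:
            a[j] = a[j - 1]  # shift larger element up
            j -= 1
            moves += 1
            if moves > budget:
                a[j] = x  # fill the hole so the array stays consistent
                return False
        a[j] = x
    return True
```

On almost-sorted input this finishes in a handful of moves and the quicksort recursion can stop early; on genuinely unsorted input it gives up after a constant amount of work, so the worst case isn't hurt.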
Hadn't heard of that one, and looking it up I stumbled upon this[1] delightful collection of esoteric sorting algorithms. Perfect for a Friday evening...
I thought it was widely known. IIRC a number of languages have adopted it as the stdlib default sorting algorithm for either their stable or unstable sort (can't remember which it is).
I'm not big into algorithms, but I believe pdqsort and Timsort were supposed to be the two best general-purpose sorts.
* <p>The implementation was adapted from Tim Peters's list sort for Python
* (<a href="http://svn.python.org/projects/python/trunk/Objects/listsort.txt">
* TimSort</a>). It uses techniques from Peter McIlroy's "Optimistic
* Sorting and Information Theoretic Complexity", in Proceedings of the
* Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474,
* January 1993.
Here [1] is a great (short, readable) paper from 2015 that explains Timsort and similar stack-merge sorting algorithms. It also gives a runtime analysis, and a simplified version of the algorithm that they call 'alpha-sort'.
Whenever there's a discussion about sorting, I like to put in a little reminder about the humble Radix Sort, which is often overlooked by academics because it's so simple. It deserves more attention, as it is super fast: O(kn). Academics disregard it because it won't sort floating point numbers (at least not without bit tricks), but the vast majority of real-world use cases deal with alphanumerics, so Radix Sort should get more press than it does.
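For the alphanumeric case, here's a minimal LSD radix sort sketch of my own (not a library implementation; it assumes byte-range characters): one stable counting sort per character position, from the last position to the first, which is where the O(kn) comes from.

```python
def radix_sort_strings(strings):
    """LSD radix sort for strings of characters with ord() < 256.

    O(k*n) for n strings of length <= k: one stable counting-sort pass
    per character position, rightmost first. Short strings are treated
    as padded with a sentinel that sorts before every real character.
    """
    if not strings:
        return []
    width = max(map(len, strings))
    # 0 is the pad sentinel; real characters map to ord(c) + 1.
    key = lambda s, pos: ord(s[pos]) + 1 if pos < len(s) else 0
    out = list(strings)
    for pos in range(width - 1, -1, -1):
        counts = [0] * 257  # 256 byte values + 1 pad bucket
        for s in out:
            counts[key(s, pos)] += 1
        total = 0
        for c in range(257):  # prefix sums -> bucket start offsets
            counts[c], total = total, total + counts[c]
        placed = [None] * len(out)
        for s in out:  # stable scatter into buckets
            k = key(s, pos)
            placed[counts[k]] = s
            counts[k] += 1
        out = placed
    return out
```

The stability of each pass is what makes the whole thing correct: ties on the current position preserve the order established by the later positions.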
The CPython source tree contains separate documentation [1] with a detailed description and discussion of the algorithm. Timsort optimizes for the number of comparisons (which are expensive in Python, since each comparison can call a Python function), so all the numbers there are given in those terms.
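You can watch those comparison counts yourself by wrapping elements in a class that counts its `__lt__` calls (Python's sort only uses `<`); the class and helper names here are my own:

```python
class Counted:
    """Wrapper that counts every < comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

def count_comparisons(data):
    """Return how many comparisons sorted() makes on this data."""
    Counted.comparisons = 0
    sorted(Counted(x) for x in data)
    return Counted.comparisons
```

On already-sorted input of n elements you should see exactly n - 1 comparisons (one run-detection scan and nothing else), the same for a strictly descending run (detected, then blindly reversed), and roughly n log n on random data.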
Number of comparisons is a particular sore point for me. Nobody is sorting integers. This is the goddamned 21st century, not 1980s-era video games. Just stop.
Show me a sort algorithm that is still efficient with five pointer indirections in compare() and I'm satisfied. Show me primitive data types and I start to wonder what you're hiding (read: lying about).
Sorting integers, floats, and strings are surely among the most widely used kinds of sorting? You have some table-like thing and want to sort it by a numeric column, or alphabetically by a string column. Sorting some generic structure is surely the edge case, and even then, the most common case is comparing one or two structure fields which are ints, floats, or strings.
In languages which try to be efficient, surely an expensive compare function is the outlier.
Creating a single benchmark seems tricky, since Timsort's key strength is that it performs particularly well on real-world data, where subsequences of the input are often already sorted (runs). With random data it should have O(n log n) time like any other comparison sort.
So with benchmarking the challenge is coming up with a set of real world-like test data that feels representative but isn’t just excessively playing to Timsort’s strengths. Not everyone’s “real world” data is the same.
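One hedged way to parameterize that (this scheme and the knob name are mine, not a standard): start from sorted data and randomly displace a chosen fraction of the elements, so "sortedness" becomes a dial you can sweep.

```python
import random

def partially_sorted(n, disorder, seed=0):
    """Generate n values; disorder=0.0 is fully sorted, 1.0 fully shuffled.

    Picks n*disorder positions and shuffles the values at those positions
    among themselves, leaving the rest in sorted order.
    """
    rng = random.Random(seed)
    data = list(range(n))
    idx = rng.sample(range(n), int(n * disorder))
    vals = [data[i] for i in idx]
    rng.shuffle(vals)
    for i, v in zip(idx, vals):
        data[i] = v
    return data
```

It's deliberately simple; "real world" data often has structure this doesn't capture (long sorted blocks, appended batches, reversed segments), which is exactly the commenter's point about representativeness.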
I'm trying to recall what the expected level of clustering is even for properly random inputs.
A few years before Timsort there was someone randomly shuffling the input in order to avoid the reverse-order corner case that kills so many sort algorithms.
For truly random inputs, about half of the entries should be pairs that are partially ordered, and 25% runs of 3, no? And if memory serves, Timsort also has a fast path for strictly descending runs, where it amortizes some of the comparisons done in the scan phase to do a blind reverse() call.
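This is easy to check with a quick simulation (the numbers below are mine, and "run" here means a maximal non-descending run): for a random permutation, the expected number of ascending runs is about (n+1)/2, so the average run length comes out right around 2.

```python
import random

def run_lengths(data):
    """Lengths of the maximal non-descending runs in `data`."""
    lengths = []
    run = 1
    for prev, cur in zip(data, data[1:]):
        if cur >= prev:
            run += 1
        else:
            lengths.append(run)
            run = 1
    lengths.append(run)
    return lengths

rng = random.Random(42)
data = rng.sample(range(100_000), 100_000)  # a random permutation
lengths = run_lengths(data)
avg = sum(lengths) / len(lengths)
print(f"{len(lengths)} runs, average length {avg:.3f}")
```

So random data gives Timsort essentially nothing to exploit beyond its minrun machinery, which is why the interesting benchmarks are the partially sorted ones.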
You don't have to agree on any one degree of sortedness. You can set up a giant matrix with lengths on one axis and sortedness on the other.
Each cell can have a color indicating if TimSort beats, say, some other hybrid of MergeSort; green shades suggest TimSort is winning, red shades suggest the contender is winning.
For each cell, do multiple runs with those sortedness/length parameters and pick the average.
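A rough sketch of that matrix (every choice here is mine, and it's not a serious benchmark: heapsort stands in for "some other sort" since the stdlib has no separate mergesort, and I take the minimum of a few repeats rather than the average):

```python
import heapq
import random
import timeit

def make_data(n, disorder, seed=0):
    """Sorted list of n ints with ~n*disorder elements swapped around."""
    rng = random.Random(seed)
    data = list(range(n))
    for i in rng.sample(range(n), int(n * disorder)):
        j = rng.randrange(n)
        data[i], data[j] = data[j], data[i]
    return data

def heap_sort(values):
    """Contender: heapsort via the stdlib heap (copies its input)."""
    heap = list(values)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

cells = {}
for n in (1_000, 10_000):
    for disorder in (0.0, 0.1, 0.5, 1.0):
        data = make_data(n, disorder)
        t_tim = min(timeit.repeat(lambda: sorted(data), number=5, repeat=3))
        t_heap = min(timeit.repeat(lambda: heap_sort(data), number=5, repeat=3))
        cells[n, disorder] = "tim" if t_tim < t_heap else "heap"
print(cells)
```

Render `cells` as a colored grid and you have the green/red picture described above; sweeping finer steps of `disorder` shows where the crossover sits (expect Timsort to win big at low disorder).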
> Why would lacking in evidence and rigor be considered a plus?
The point it’s making is about heuristics rather than theoretical complexity. Tim Peters provided plenty of evidence and rigorous work when he proposed it.
That you interpret it as “lack of evidence and rigor” is really about you.
I had the same thought at first. But I think the point is that this sort of heuristically cobbled-together sort wouldn't typically be the stuff of academic research, brilliant and practical though it is.
I actually found this valuable, without it being a "brag". The best type of sort for real-world problems is built with an understanding of real-world datasets. The types of questions and research that interest academia, while often valid, are distinct from those that concern practical engineers.
I get annoyed with titles like that. "Never heard of", "Didn't know", etc. I know people want clicks, but it sometimes feels like normal person-to-person courtesy isn't valued when the medium is the Internet.
It's especially "dangerous" in this case, because the people who would even care about sorting algorithms are going to include a subset of people who know a lot about sorting. Those people are likely to know about Timsort. It's like going to a heart surgery conference and giving a presentation titled "This one heart surgery method you never knew about." There's a good chance at least a few people in the audience would know it, and one of them may have invented it.
I had heard of that! I just don’t count a bug as a bad thing. Maybe the opposite. Software that has no bugs usually achieves this feat by not having any users. Timsort has tons of users, hence tons of attention, hence someone found a bug. That’s great!
I've done this too with blog posts, and it works okay, although many readers don't like it. I wanted a snappier way to say "something you didn't learn about in school," and I think this author did too.
Please don't link plagiarized content. This guy also linked to his own "Big O Notation" article on his page, where he says O(n) is polynomial:
>>>In our shopping list example, in the worst-case of our algorithm it prints out every item in the list sequentially. Since there are n items in the list, it takes O(n) polynomial time to complete the algorithm.
This may be the same author ... Anyway f(n)=n is definitely a polynomial and O(n) is a class of algorithms that run in polynomial time (i.e. are in P).
https://news.ycombinator.com/item?id=21196555
https://news.ycombinator.com/item?id=17436591