The link to the implementation is 404 and I can't find it in that directory either. A quick search for LLRB trees suggests that their only supposed advantage is being "easier to implement", but that doesn't appear to be the case in practice. In fact one of the links I found is http://www.read.seas.harvard.edu/~kohler/notes/llrb.html
It almost feels like LLRB trees are a bit of a scam.
This is a valuable article that I read thoroughly when I was re-reviewing search trees. Nobody wants to spend 60 hours learning a new tree only to have it fail to live up to its promise. I'll summarize what I know (or think I know) about trees here, since it may be of use to anyone who had enough of an interest to check out the above article.
Sedgewick claims that LLRB trees are simpler, but this is hotly contested. Most of the sources I read said that LLRB trees were more complicated, often a lot more, than regular RB trees. All of the timings I found suggested that LLRB trees were slower than RB trees. Looking at the code, and at the detailed timings, I tend to agree that LLRB trees are slower and more complicated. If there is an explanation/implementation which realizes Sedgewick's promise of simplicity I have not seen it, and it isn't in the published LLRB material.
If you need a tree that is slower and simpler than an RB tree, with a similar implementation, you could look at AA trees instead.
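To make the "similar but simpler" claim concrete, here is a minimal sketch of the AA-tree idea (names and structure are my own, not from any particular implementation): each node stores a "level" instead of a color bit, and all rebalancing reduces to two tiny helpers, skew and split, applied on the way back up from an insert.

```c
#include <stdlib.h>

/* Minimal AA-tree sketch: a level per node instead of a color bit. */
typedef struct aa_node {
    int key;
    int level;              /* the shared sentinel has level 0 */
    struct aa_node *left, *right;
} aa_node;

static aa_node nil = { 0, 0, &nil, &nil };   /* sentinel, never modified */

/* skew: remove a left horizontal link with a right rotation */
static aa_node *skew(aa_node *t) {
    if (t != &nil && t->left->level == t->level) {
        aa_node *l = t->left;
        t->left = l->right;
        l->right = t;
        return l;
    }
    return t;
}

/* split: break two consecutive right horizontal links with a left
 * rotation, promoting the middle node one level */
static aa_node *split(aa_node *t) {
    if (t != &nil && t->right->right->level == t->level) {
        aa_node *r = t->right;
        t->right = r->left;
        r->left = t;
        r->level++;
        return r;
    }
    return t;
}

/* insert is the usual BST descent, then skew+split on the way up */
static aa_node *aa_insert(aa_node *t, int key) {
    if (t == &nil) {
        aa_node *n = malloc(sizeof *n);
        n->key = key;
        n->level = 1;
        n->left = n->right = &nil;
        return n;
    }
    if (key < t->key)      t->left  = aa_insert(t->left, key);
    else if (key > t->key) t->right = aa_insert(t->right, key);
    return split(skew(t));
}
```

Inserting the sorted sequence 1..7 yields a perfectly balanced tree rooted at 4, which is the "sorted insert" case where unbalanced BSTs degenerate.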
My opinion however is that, in terms of practical applications, there is no reason to learn RB trees today instead of AVL trees. If you do research in data structures, RB trees may be worth knowing. I think the prevalence of RB trees is due to several factors:
1. RB trees are featured in the common Cormen algorithms book in lieu of AVL trees (a mistake). RB trees are also featured in the Sedgewick book.
2. The Cormen book shows how to implement delete for RB trees. Knuth's AVL description, and indeed most AVL descriptions, omit the delete method, which is a serious oversight. This has caused many tree implementations to avoid deletes or avoid AVL altogether. You can verify this via Google or GitHub. While Knuth initially had good reason, ultimately omitting delete was a mistake.
3. Plenty of theoretical conjecture predicts RB trees are faster. Actual benchmarks show that AVL trees perform better, or at worst, about the same. (If anyone has data that shows scenarios where this is not the case, I would be interested to see it.)
This is an unfortunate state of affairs, as the AVL tree appears to be a strictly better tree than the RB tree. The height is bounded at about 1.5x optimal instead of 2x optimal. You could argue the AVL tree is roughly as difficult to understand and code, though I would say it is simpler and more natural. The AVL tree performs generally better in actual benchmarks, and substantially better in certain important cases (sorted inserts).
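For reference, the standard worst-case bounds behind the "1.5x vs 2x" figures (the AVL constant is actually closer to 1.44):

```latex
h_{\mathrm{AVL}} \;\le\; 1.4405\,\log_2(n + 2) - 0.3277
\qquad
h_{\mathrm{RB}} \;\le\; 2\,\log_2(n + 1)
```

So an AVL tree of a given size can never be more than about 44% taller than a perfectly balanced tree, while a red-black tree can be up to twice as tall.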
If you're interested in this stuff, you might want to check out the draft paper linked below. It benchmarks performance and memory efficiency of various logarithmic time sorting data structures. It's a work in progress, and feedback is always welcome :)
This seems to confirm my hunch that there is no reason to use anything but b-trees on modern computers, at least for single threaded code. Storing more values in the nodes and increasing the branching factor is better because you do more of your work inside the cache and avoid random memory access.
I also find b-trees much simpler to implement than either RB or AVL trees.
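A rough sketch of why the fat nodes help (the layout and the ORDER constant here are illustrative, not from any particular library): the keys sit in one contiguous array, so the search within a node stays in a cache line or two, and there is only one pointer chase per level.

```c
/* Hypothetical B-tree node sized so its key array fits in a couple of
 * cache lines: with 8-byte keys, ORDER = 16 keeps them in 128 bytes. */
enum { ORDER = 16 };            /* max keys per node (illustrative) */

typedef struct btree_node {
    int  nkeys;
    long keys[ORDER];                        /* contiguous, cache-friendly */
    struct btree_node *child[ORDER + 1];     /* all NULL in a leaf */
} btree_node;

/* Index of the first key >= k; a linear scan is fine at this size. */
static int node_lower_bound(const btree_node *n, long k) {
    int i = 0;
    while (i < n->nkeys && n->keys[i] < k) i++;
    return i;
}

/* Each level costs one cache-resident scan plus one pointer chase,
 * versus one pointer chase per single key in a binary tree. */
static int btree_contains(const btree_node *n, long k) {
    while (n) {
        int i = node_lower_bound(n, k);
        if (i < n->nkeys && n->keys[i] == k) return 1;
        n = n->child[i];
    }
    return 0;
}
```

A tree of a million 8-byte keys needs only around five levels at this branching factor, versus roughly twenty pointer chases for a binary tree.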
I stop short of recommending B-trees as the only tree structure you need to know. I think if it matters at all what kind of tree you're using, you're deep enough in the guts that it pays to know the differences between B-trees and basic trees.
To me, this is especially relevant today, since if you need a "B-tree", you probably need a standard relational database (MySQL/PostgreSQL/SQLite), and if that doesn't work for you, you'd be advised to know what the tradeoffs are with a regular tree. There are scenarios where B-trees are suboptimal.
Cache-obliviousness, which is still new, is also changing how we look at trees, since in one sense a B-tree is a hack on the basic tree structure to exploit locality. Representation matters more today as well. This is a longer conversation. Suffice to say I think this is an area of active and useful research, and wouldn't rule out either class of tree.
Well, if you're storing integers or other small (8-byte) elements, then sure. But if you are storing pointers to objects, then you'll have the overhead of a load (in the comparison function). If you are embedding the objects, then the larger they are, the more copy overhead you'll incur as you move these large elements around when rebalancing and so on. But I don't have the numbers for this, so the overhead might be insignificant. It's on my todo list to benchmark embedded larger elements against pointers to those elements. Anyway, just something to consider.
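The two layouts being contrasted look roughly like this (hypothetical names, and 256 bytes is just a stand-in for "large"): the pointer layout pays a dependent load on every comparison, while the embedded layout makes comparisons cheap but turns any value move during rebalancing into a large memcpy.

```c
#include <string.h>

typedef struct { char payload[256]; long key; } big_value;

/* Layout A: node holds a pointer; every compare chases it. */
typedef struct ptr_node {
    big_value *val;                 /* extra cache miss per comparison */
    struct ptr_node *left, *right;
} ptr_node;

static long ptr_key(const ptr_node *n) { return n->val->key; }

/* Layout B: node embeds the value; the key sits next to the links. */
typedef struct emb_node {
    big_value val;
    struct emb_node *left, *right;
} emb_node;

/* Any rebalancing step that moves values instead of relinking nodes
 * now copies the whole element: */
static void emb_swap(emb_node *a, emb_node *b) {
    big_value tmp;
    memcpy(&tmp, &a->val, sizeof tmp);
    memcpy(&a->val, &b->val, sizeof a->val);
    memcpy(&b->val, &tmp, sizeof tmp);
}
```

Which layout wins depends on the read/write mix: a lookup-heavy workload favors embedding (no pointer chase), while heavy insert/delete churn on large elements favors pointers.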
Plenty of theoretical conjecture predicts RB trees are faster. Actual benchmarks show that AVL trees perform better, or at worst, about the same.
That's really odd - there's probably some assumption that isn't true. It reminds me of the saying, "In theory, there's no difference between theory and practice. In practice, there is."
The reason I mention that:
"There already is an algorithm for deletion available, which Sedgewick [2008] presented alongside his introduction to Left-Leaning Red-Black trees. This algorithm, however, while being pretty concise, follows a very imperative approach, containing a lot of nested if-else-blocks and breaking several invariants during its run by design, later fixing everything up using several auxiliary functions. [...]
Instead, we follow an approach which is closer to what Okasaki [1999] did with his seminal, purely functional implementation of insertion in Red-Black trees. That is, our algorithm relies purely on pattern matching and simple recursive descent. Moreover, of the three functions it consists of, the two which implement the algorithm's bulk are completely self-contained, i.e. do not rely on auxiliary functions."
It can be difficult to develop good intuition for the ideas behind the insert and delete algorithms for even standard red-black trees. Here is an interactive web site that presents the essence of these algorithms in what I hope is "as simple as possible but no simpler" pseudo-code, and supports interactive play. (Insert, delete, single-step forward and backward through the algorithms, etc.) I would be delighted to hear if people find that this site makes the algorithms more intuitive, and any thoughts or suggestions for improvement.
The author's argument against using C++ isn't true. New/delete can easily be redefined to use malloc/free [0]. You can compile (somewhat limited) C++ code without any special C++ dependencies.
The author's only argument against C++ is located in the article comments and is "Regarding the use of C++, that's a non-starter; malloc goes in libc!". As the author was replacing the malloc[0][1] implementation, that seems a reasonable argument.