Improving Go's 'container/list'

rawnlq · on March 19, 2015

Standard name for this type of linked list is the "intrusive linked list". There's an implementation for it in boost for c++ and here's another article about it: http://www.codeofhonor.com/blog/avoiding-game-crashes-relate...

deathanatos · on March 19, 2015

I don't know if Boost's intrustive containers are necessarily quite the same as an internal linked list. std::list<T> is internal but not intrustive; the difference between the three is something like…

external:

    class Node<T> {
        Node *next;
        T *item;
    }

internal:

    class Node<T> {
        Node *next;
        T item;
    }

intrusive:

    typedef T Node<T>;
    class <T> {
        Node *next;
        // members of T
    };

(Though I suppose you could consider intrusive a type of internal linked list, and the middle representation is then also a type of internal linked list ("non-intrusive"?).)

The intrusive one allows the linked list to avoid memory allocations, and hold objects that are not copy- or move-able. The Boost page has a good table on the differences[1].

I'm actually not sure how much I agree w/ the codeofhorror article. They present the difference between a std::list holding raw pointers and an intrusive list with the title "Avoiding Game Crashes…"; if you want to avoid crashes, std::list<T> or std::list<std::unique_ptr<T>> (or even a shared_ptr, if appropriate); Of course std::list<T*> is error prone!

[1]: http://www.boost.org/doc/libs/1_55_0/doc/html/intrusive/intr...

Dewie · on March 19, 2015

Is storing the elements in-place instead of with a pointer seriously all it takes to be intrusive? I'd have thought that would be par for the course in languages like C/++. Is there something else to it that I'm missing?

signa11 · on March 19, 2015

this article has a 28th-feb-2014 date, which suggests that the author used ver: 1.2 for measurements.

with a bunch of quite far-reaching changes f.e. runtime being re-written in go itself, fully precise gc (resulting in smaller heaps), elimination of segmented stacks, change in implementation of interface values etc. in version 1.4 of go, it would be instructive to compare how these observations /conclusions hold.

carterschonwald · on March 19, 2015

At least in my favorite problem domain, mathematical computation, reusable unboxed data structures are a huge win! That said, sometimes theres also a large use case to pointer formulations, namely preserving sharing for a complicate value like a proof!

Both are valuable tools, and having decent reusable support for both is a wonderful thing.

jlrubin · on March 19, 2015

Hmmmm. It's considered an anti-pattern to use list at all in Go, the typical style is that one should prefer using a slice with append. List is there as a relic from early go, and is around for compatibility. I guess improving the sitting code is good though.

patrickg · on March 19, 2015

Linked lists and arrays / slices are two totally different data structures with different applications: Removing objects from (anything but the end of) arrays / slices is expensive. If this is a common operation, linked list might be prefarable. This is not a language issue. At my day job, I manipulate lists a lot, this would not be faesable at all with arrays/slices.

Edit: this is a simplified, but typical example of my problem space: http://tex.stackexchange.com/a/233489/243 Traversing a list and manipulate the list entries by adding other items to the list.

jerf · on March 19, 2015

Computer architecture being what it is now, though, it has been observed recently that a lot of uses of linked lists are still faster with arrays/vectors, even if you do the "expensive" operations with them.

Point here not being "you're wrong", because you're still right in at least some cases, but that benchmarking can be called for if you're really choosing them for performance because the expense of cache misses and the way linked lists verge on the pathological case for cache misses (not quite, you do stand a chance the next element is in cache and possibly even the next element of RAM, but over time it's not a very good chance) means that arrays can win more often than you would naively think due to the fact linked lists take a rather staggeringly large constant factor nowadays, and it can take rather a lot of using their O(1) advantages to make up for it. This is totally a reachable bar, just, less than you might think.

http://www.codeproject.com/Articles/340797/Why-you-should-ne...

https://www.youtube.com/watch?v=YQs6IC-vgmo

(Note that for instance the second video computes a bit of a pathological case where you have to traverse to find where to insert. In some cases, you will for some reason or other already have that point in hand, which is one of the things that indicates linked lists may be useful; if you always have that case linked lists could still win. But the point should still be taken that vectors/arrays can be way faster than you'd think.)

Of course, if you're using a dynamic scripting language where everything's guaranteed boxed and there's no hope of keeping things local anyhow, well, you might as well take whatever makes the algorithms easier.

sebcat · on March 19, 2015

Another problem where double linked lists are generally better suited than arrays is caches with LRU eviction.

Example: https://github.com/sebcat/store/blob/master/store.go#L68-L15...

mratzloff · on March 19, 2015

Considered an anti pattern by whom?

Dewie · on March 19, 2015

Presumably by gophers.

wshuang · on March 19, 2015

Sorry, but I just wonder wouldn't this pointer to self structure prevent garbage collection?

davekeck · on March 19, 2015

Reference cycles aren't a problem for GCs because any objects that aren't reachable from a root set of nodes are considered garbage and get collected. (I only read part of the article, but hopefully that answers your question...)

lmm · on March 19, 2015

To clarify, they aren't a problem for mark/sweep GCs. Reference counting can be considered a form of GC.

wshuang · on March 19, 2015

Cool, thanks :)