Hacker News new | past | comments | ask | show | jobs | submit login
Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays [pdf] (2002) (trout.me.uk)
76 points by tosh on Nov 7, 2022 | hide | past | favorite | 10 comments



Phil Bagwells work is amazing. Especially the Hash Array Mapped Trie (HAMT) for me was an eye opener, as we had implemented a simple arrays based trie more or less for 64bit integers before and I always wondered how to compress the null pointers to children, which are non-existent. Nowadays I'm using ideas from the HAMT and the Adaptive Radix Trie (ART) to compress the trie nodes. Cool thing of course is also that each previous version is preserved through structure sharing and copy-on-write plus path copying (functional data structure, again).


> cdr is accomplished by simply decrementing the offset part of the pointer

Ah, the (c)ontent of the (d)ecrement part of the (r)egister which holds this value. :)

> Assume that a pointer references the head of a character VList and a new character element must be added. The sub-block base is recovered from the pointer by masking out the lower 4 bits. The data type size is recovered from the Element Type field, in this case a one byte character. The offset from the base to the pointer is calculated if this is less than 11 then the list pointer is incremented by one and the new element stored at the referenced location.

This is only possible if the pointer is not shared with anything else in the program.

After you've done the above operation, you still have the same pointer that references the head of the original VList, and the incremented pointer that references the extended list. If you hand something the original unextended VList pointer, and it tries to cons something onto the front using the same algorithm, it will overwrite that item.

You need some bits in there to indicate when a copy-on-write needs to be done to to avoid surprising effects through sharing.


I believe the paper handles this case with the LastUsed offset in the implementation of cons.

> During the consing of (9) the pointer offset is compared with the last used offset, LastUsed. If it is the same and less than the block size then it is simply incremented, the new entry made and LastUsed updated. This would have occurred as the integers (6), (7), (8) were added. If on the other-hand the pointer offset is less than the LastUsed a cons is being applied to the tail of a longer list, as is the case with the (9). In this case a new list block must be allocated and its Base-Offset pointer set to the tail contained in the original list.

If I'm understanding correctly, cells are only mutated in place if they are not the tails of other lists (meaning they aren't shared). If the cell is already the tail of another list, the consing is accomplished by creating a new cell to avoid modifying the other list.


If you're interested in this kind of thing I would heartily recommend Chris Okasaki's "Purely Functional Data Structures" https://www.cambridge.org/core/books/purely-functional-data-...


The very rough idea is that vlists are lists of arrays whose size decreases exponentially towards the tail of the list. So "hello, world" could be stored as

    "hello" -> ", wo" -> "rl" -> "d"
with space for 3 more characters to be consed in front of "hello".


Does anyone have experience with this vList data structure in real life applications? How well does it cache? Any production software that implemented this?

It's intriguing but wonder why it never gained traction.


It's slower than arrays, it's like a normal deque with some additional overhead. Array access is still pointer chasing, and potentially out of cache.


I need a refresher on DS&A


This needs [2002]


Thanks! Year added by a volunteer year editor. If anyone else would like to be a volunteer year editor, let us know at hn@ycombinator.com. (and sorry if I don't reply right away - I will!)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: