Hacker Newsnew | past | comments | ask | show | jobs | submit | starfreakclone's commentslogin

Thank you for the pointer! I was actually not aware of this paper at all, which is why it was not included here (not sure how I missed it). I'll be sure to push a revision to the blog which mentions this paper.


You may also want to take a look at “Deletion: The curse of the red-black tree” (Germane and Might 2014), which presents functional deletion for red-black trees.

https://matt.might.net/papers/germane2014deletion.pdf


I might have a useful suggestion. I've implemented a Piece Tree (VS Code style) and a Rope in a functional language (found the Rope much faster at everything but I made a few adjustments to the structure so it's not a straightforward Rope).

The major failing point of the Piece Tree from my benchmarks is the substring/querying time. An idea I want to try out to speed up my Piece Tree implementation is to distinguish between Concat (metadata) and Leaf (string) nodes, just as a Rope does, storing metadata in the internal nodes and Pieces at the leaves.

The reason (I hope) that will improve substring times is because, in a Piece Tree, the original string can be reconstructed through an in-order traversal of the tree's internal nodes.

So, if you specify a substring range that starts from one character before the root node and ends one character after the root node, you end up traversing to the rightmost node in the left subtree and the leftmost node in the right subtree (two O(log n) operations).

I'm hopeful the tree depth that would need to be traversed if the nodes were at the Leaves (like in a Rope) would be shorter (especially since adjacent pieces won't be O(log n) distance away) and want to try it out myself, but my intuition might be wrong. You can have a go trying that out yourself if the idea interests you.


Indeed, you could even use a piece tree just like in the blog but you store additional information in each piece which tell the renderer how to layout the associated text. My understanding is that most rich text editors represent the text as a node-based tree anyway.


Some early rich text editors never used a tree representation for formatting represented the formatting as "control character sequences". This was part of why some people loved WordPerfect so much because you could toggle a view of all the control characters and just delete/copy/paste them like any other text in the same document.

It's basically how the classic RTF format [1] works, and things like VT100/ANSI escape codes in terminals. It's kind of like the difference between imperative code and declarative code: "this character sequence means toggle the state of bold" versus "this node of characters is bold".

[1] https://docs.fileformat.com/word-processing/rtf/


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: