Surprised to see no discussion of other data structures like dicts/maps, or arrays of arbitrary type. Hopefully they'd be a straightforward extension. IME, apps need collaborative data structures more often than they need pure collaborative text editing.
The motivating examples (update validation, partial loading, higher-level operations) are interesting, but I don't see a strong case that the reason Yjs etc. lack these features is the underlying CRDT implementation, as opposed to these features being intrinsically difficult to build.
> Surprised to see no discussion of other data structures like dicts/maps, or arrays of arbitrary type. Hopefully they'd be a straightforward extension. IME, apps need collaborative data structures more often than they need pure collaborative text editing.
Totally agree. I guess an array of "atomic" objects, where the properties of the objects can't be changed can be done just by replacing the string with your own type. Changes inside of the object is probably trickier, but maybe it's just about efficiently storing/traversing the tree?
I've also always thoguth it should be possible to create something where the consumer of the helper library (per OP terminology) can hook in their own lightweight "semantic model" logic, to prevent/manage invalid states. A todo item can't both have isDone: true and state: inProgress at the same time. Similar to rich text formatting semantics mentioned in the linked article.
CRDTs essentially work by deterministically picking one side when a conflict arises. The issue is that in general this does not guarantee the lack of data loss nor data being valid (you can resolve the conflict between two pieces of valid data and get invalid data as a result).
Imagine if every git merge conflict you got was resolved automatically by picking one side. Most of the time it would do the wrong thing, sometimes even leading to code that fails to compile. Imagine then you were not there ready to fix the issue, it would lead to even more chaotic results!
This is why CRDTs are not more widespread, because they only fix the problem you think you have, not the problem you actually have, which is to fix conflicts in a way that preserves data, its validity and meaning.
And arguably they make this issue even worse because they restrict the ways you can solve these conflicts to only those that can be replicated deterministically.
> This is why CRDTs are not more widespread, because they only fix the problem you think you have, not the problem you actually have, which is to fix conflicts in a way that preserves data, its validity and meaning.
I’ve been saying this for years, but there’s no reason you couldn’t make a crdt which emitted conflict ranges like git does. CRDTs have strictly more information than git when merging branches. It should be pretty easy to make a crdt which has a “merge and emit conflicts” mode for merging branches. It’s just nobody has implemented it yet.
(At this point I’ve been saying this for about 5 years. Maybe I need to finally code this up if only to demonstrate it)
Because automatic merging isn’t always perfect. Especially when merging long lived changes to code in git, sometimes you need manual intervention to figure out the result. And we need to manually intervene sometimes.
The motivating examples (update validation, partial loading, higher-level operations) are interesting, but I don't see a strong case that the reason Yjs etc. lack these features is the underlying CRDT implementation, as opposed to these features being intrinsically difficult to build.