To the extent that CRDTs are conflict-free it only means that two edits can commute (be applied in any order) and every party that has applied the same set of edits will produce the same result.
At the low-level that the CRDT is operating on there will be no conflicts.
But that does not mean that the user never perceives there to be conflicts. No matter what the consensus system used be it CRDTs or OT at some point the converge operation has to impose a total ordering to pick a "winner" in the case of conflicting user edits.
If editing a block of text and two users try to replace the same word with another word there are a number of possible outcomes
1) One of the edits "wins"
2) The word is replaced by the concatenation of each user's replacement.
3) Nobody wins and the edit is reverted.
In all cases at least one party perceives to themselves to have "lost". But it generally doesn't matter because humans doing the editing will make repairs to nonsensical edits in real time.
There has to be a tie-breaker when multiple users try to make different edits to the same region of text. At least that's my current understanding.
There are several CRDT algorithms (LSEQ, Logoot, WOOT, Treedoc) that try to properly solve the merging solution (your outcome #2) while still retaining the intent of the edits of each user. Their implementations differ, but the general idea is that each character (or chunk of characters) is assigned a key that can be ordered. When new text is added, it's given a key that is derived from the key of some adjacent text. As a result, the merge is the best-possible approximation of the intent of the edits.
Yes, I think we are in agreement. The algorithm can use heuristics to produce a best-effort pleasing result that captures user intent but if two users want opposite things to happen it has to fall back to a tie breaker.
In an extreme example, if the CRDT state-space was 1-bit and user A wants to make it a 0 and user B wants to make it a 1 a choice must be made by the algorithm.
Right, exactly. The key question is whether the algorithm's resolution heuristic takes into account the intent of the edit, and I'm not sure the algorithm in the article makes an effort to do that. (Not saying it doesn't, I just don't understand how it does, if it does.)
It's main consideration appears to be to make changes in a way that makes sense for a tree consisting of maps and lists without ever losing user input. If you look at their diagram with examples you get the gist of their resulting heuristics, which basically boils down to "only delete if a user explicitly deleted, otherwise keep both values if users try to set the same value to two different things", "if two complex values (maps,list) have been updated, recursively merge them", pretty much.
It will cause stuff that users will perceive as conflicts, and that may even appear to be totally illogical (one of their examples leaves an object that appears to be in a broken state, because one side deleted it, and the other side updated a single attribute, leading the map to continue to exist, but with only the one updated attribute) so there's probably room for improvement, though the rules are simple enough that many of these could be resolved at application level by just carefully deciding what operation to provide.
There is no "extent" to which CRDTs are conflict free, CRDTs are by definition always 100% conflict free.
"No matter what the consensus system used be it CRDTs or OT"
CRDTs and OT are worlds apart. CRDTs are guaranteed to be conflict free, whereas OT is generally too complicated and unproven to offer that guarantee at all.
"If editing a block of text and two users try to replace the same word with another word there are a number of possible outcomes"
There are actually many more possible outcomes than those you listed, so that with CRDTs designed explicitly for collaborative string editing you can provide perfect intent-preserving merges to the user. There is an excellent paper on preserving intent, see "Replicated abstract data types: Building blocks for collaborative applications" (http://dl.acm.org/citation.cfm?id=1931272")
> whereas OT is generally too complicated and unproven to offer that guarantee at all.
Citation needed.
I've built several production-level OT-based systems on top of ShareJS's JSON OT[1] code. The set of operations supported is guaranteed to be conflict-free and correct. We don't have AGDA proofs but we've used fuzzers to ferret out correctness bugs and its been about 2 years since a bug was found in the transform code. All in all, I'm very happy with the implementation.
Meanwhile, I don't believe a more generic JSON OT / CRDT system can be made conflict-free. (Well it can be conflict-free, but you'll lose data if it is). If you support arbitrary tree-level moves, you have the User A moves x into y's children, user B moves y into x's children problem. There are simply no good ways to resolve these operations without user intervention, or a lot more knowledge of the data structures at play.
"Due to the need to consider complicated case coverage, formal proofs are very complicated and error-prone, even for OT algorithms that only treat two characterwise primitives (insert and delete)" - Du Li & Rui Li - "An Admissibility-Based Operational Transformation Framework for Collaborative Editing Systems"
There's also an interesting comment there from the author of ShareJS, I think that might be you?
The other critical difference between CRDTs and OT is that CRDTs work offline, in a distributed setting, whereas OT cannot. OT requires a central online server to coordinate, which as far as I understand is the cause of the classic UI freeze in Google Docs whenever the network goes.
Yes thats me! For what its worth, I no longer believe that wave would take 2 years to implement now - mostly because of advances in web frameworks and web browsers. When wave was written we didn't have websockets, IE9 was quite a new browser. We've come a really long way in the last few years.
Working offline and working in a distributed setting are two different properties.
OT can work very well offline - the client and server just buffer ops and you do reconciliation. Some OT algorithms experience O(n*m) performance when the client comes back online (n client ops, m server ops), though the constant is quite low, you can make the client do the work. But you can do better - a better designed text reconciliation system can do O(nlog(n) + mlog(m)) if I remember correctly - which is quite usable in real systems and very competitive with CRDTs.)
P2P is much harder. I've got a sketch for a system that would use OT in a P2P setting, but in many ways its a poor man's CRDT. That said, the downside of CRDTs is that they grow unbounded as you make more modifications. Which would work great for documents that are written then discarded (like a HN thread). But it would work much less well for long lived, frequently written documents (like a server status dashboard). You can fix that using fancy coordination algorithms, but if you're going down that path you're back to sketching out an overcomplicated OT system.
There's lots of fun space to explore there - but not enough actual useful systems being made for production use.
Designer & author of the JSON OT algorithm that underlies ShareJS/ShareDB here. I do not have a formal proof for any of the properties of the algorithm, but we have extensively fuzzed it and are confident in its correctness. I believe a proof of correctness is possible, merely tedious.
> There is no "extent" to which CRDTs are conflict free, CRDTs are by definition always 100% conflict free.
CRDTs are always conflict-free. I am not disagreeing with that. I was trying to draw attention to the fact that just because the CRDT always commutes does not mean that a user does not perceive there to have been a conflict - out of a multiple contradictory edits the commute operation will ensure only one succeeds. So yes, the CRDT itself is conflict free, but it wasn't my point.
They aren't magic. The deal with tie breakers when they have to. I've seen posts where suggestions of building a CRDT-based source control system are being bandied around with claims that there will be no conflicts - sure, there won't be. Will the code make sense or even compile - probably not.
A conflict to a CRDT is one thing and a conflict to a user is something else entirely.
> so that with CRDTs designed explicitly for collaborative string editing you can provide perfect intent-preserving merges to the user.
I don't for a second believe this is event theoretically possible, given that if two people write collaboratively, it is not uncommon for the intent to change "mid stream" as a result of seeing the other person edit.
At the low-level that the CRDT is operating on there will be no conflicts.
But that does not mean that the user never perceives there to be conflicts. No matter what the consensus system used be it CRDTs or OT at some point the converge operation has to impose a total ordering to pick a "winner" in the case of conflicting user edits.
If editing a block of text and two users try to replace the same word with another word there are a number of possible outcomes
1) One of the edits "wins" 2) The word is replaced by the concatenation of each user's replacement. 3) Nobody wins and the edit is reverted.
In all cases at least one party perceives to themselves to have "lost". But it generally doesn't matter because humans doing the editing will make repairs to nonsensical edits in real time.
There has to be a tie-breaker when multiple users try to make different edits to the same region of text. At least that's my current understanding.