CRDT: Conflict-free replicated data type

canadaduane · on Nov 6, 2019

Several interesting open source projects are working with CRDTs to make state synchronization in distributed systems an easier problem to deal with:

  - Braid HTTP (https://braid.news/)
  - Automerge (https://github.com/automerge/automerge)
  - Gun (https://gun.eco/)
  - Yjs (http://y-js.org/)
  - Noms (https://github.com/attic-labs/noms)
  - DAT (https://dat.foundation/)

Personally, I'm most excited for Braid's effort to bring state sync to HTTP through the IETF process, as well as Automerge's progress in P2P via Hypermerge (and it's star app, Pushpin). I'd also like to see Gun succeed, but have had a hard time getting started due to visually distracting quirks in its documentation.

lucidstack · on Nov 6, 2019

Horde [0] is a really interesting Elixir library for distributed, self-healing state using CRDT

[0]: https://github.com/derekkraan/horde

qohen · on Nov 7, 2019

Horde is indeed a really interesting Elixir library. It's based on the CRDT library, DeltaCrdt [0], also by Derek Kraan. He discusses DeltaCrdt here [1] and Horde here [2].

BTW, Horde is a distributed supervisor, i.e. analogous to what regular Elixir OTP supervisors do, except that it can restart processes on different cluster nodes. It also provides a distributed process registry.

Daniel Azuma of Google gave a great 39 minute talk at ElixirConf 2018, "Docker and OTP: Friends or Foes?" [3], where he shows Horde (and DeltaCrdt) in action to keep a multi-player tank video game going while the Docker container it's running in gets killed and and the Elixir processes get respawned on another container. The state is persisted across Elixir nodes in other containers in an instance of DeltaCrdt.

(Again, the talk is really good -- if the above sounds interesting, you'll want to check it out [3]).

[0] https://github.com/derekkraan/delta_crdt_ex

[1] https://moosecode.nl/blog/how_deltacrdt_can_help_write_distr...

[2] https://moosecode.nl/blog/introducing_horde

[3] https://www.youtube.com/watch?v=nLApFANtkHs

rch · on Nov 6, 2019

Akka also provides some CRDT based distributed data capabilities:

https://doc.akka.io/docs/akka/current/typed/distributed-data...

toolz · on Nov 6, 2019

https://lasp-lang.readme.io/docs is another very interesting project that makes heavy use of many different CRDTs

lewisl9029 · on Nov 7, 2019

The IPFS folks also have an excellent compilation of resources on CRDTs available here: https://github.com/ipfs/research-CRDT

api · on Nov 6, 2019

LF uses a giant all-encompassing DAG as a CRDT and implements subjective name conflict resolution by computing "views" of the DAG based on various weighting constraints: https://github.com/zerotier/lf

It's a bit like Riak or etcd but can be used across organizational and trust boundaries similar to e.g. a cryptocurrency. Anyone can run LF as part of the same network and both data privacy and security are maintained.

lichtenberger · on Nov 6, 2019

I think Automerge from Martin Kleppmann is super interesting :-)

However, as he told me a more sophisticated persistence layer is needed. Looking forward how it's emerging

fyp · on Nov 6, 2019

He gave several talks on CRDTs which are good intros:

https://www.youtube.com/watch?v=B5NULPSiOGw

https://www.youtube.com/watch?v=yCcWpzY8dIA

lichtenberger · on Nov 6, 2019

Yes, I saw them. Also his book is one of the best computer science books I've read and he gave me a really good advise for my own Open Source project, regarding the use of BookKeeper as a distributed log (instead of Kafka) :-)

brianzelip · on Nov 6, 2019

Not open source, but also Actual, a personal budgeting app made by James Long, the author of Prettier.

Cool stuff for sure.

Here’s an insightful podcast about Actual (hosted by the author of Tailwind.css), http://www.fullstackradio.com/126

rogerclark · on Nov 6, 2019

Gun is also very unstable, and all of the support projects (superpeers etc) are skeletal at best. it's not usable for production applications yet

marknadal · on Nov 6, 2019

Author here.

GUN is running in production with 8M monthly users.

On large sites like HackerNoon and with Internet Archive.

What makes you say it is unstable?

If there is anything, please report it and we'll get it fixed.

Edit: Re grandparent: Will try to add "dark mode" to docs, in meanwhile, check out the docs on plain-text GitHub wiki ( https://github.com/amark/gun/wiki/Graph-Guide ) which is where everything pull from anyways.

canadaduane · on Nov 6, 2019

Thanks for your gracious reply, Mark. Your humanity-first approach to everything, as well as the the Gun community are what make it so very promising to me.

woah · on Nov 6, 2019

Thanks for the great comment, Steve. Comments like this humble and uplift me as I think of all the things that the GUN community will accomplish together.

sagichmal · on Nov 6, 2019

[flagged]

woah · on Nov 6, 2019

Sorry, I was being irreverent about the tone of the earlier comment. I guess that sort of frivolity is not what HN is about so I may delete my sarcastic comment.

marknadal · on Nov 6, 2019

I wish my sockpuppets had 6K+ karma :O

These people aren't me. (Thanks to them for kind comments, tho!)

I have a couple throw away HN accounts but I do not use them to talk about gun.

rogerclark · on Nov 6, 2019

my problems involved superpeers being flaky with respect to storage and connections, which meant that data propagation and persistence was not reliable. this was in August when i was trying to use it for a production application. of course, this is anecdotal, and it's easy to refute something like this by just saying "works over here". so either i try it again and find specific complaints to report back... or not

gardnr · on Nov 7, 2019

I have been eyeing gun for a production app. It would be a better story if you could try it again and report back. Do you still have the code laying around?

marknadal · on Nov 7, 2019

Yes please.

I use a distributed testing tool to simulate any setup or edge cases we find, so we can fix it.

So this would be useful to replicate.

Also, a big fix was released early October.

lelf · on Nov 6, 2019

And https://riak.com

kedean · on Nov 6, 2019

Is riak still actively maintained or supported by anyone? Last I heard Basho just sort of vanished as a company a few years ago.

lelf · on Nov 6, 2019

Yes. NHS.uk, Bet365, TI Tokyo. Bet365 bought all the assets (and open sourced them).

Last release was several months ago.

bessarabov · on Nov 6, 2019

- ff — A distributed note taker and task manager — https://github.com/ff-notes/ff

Rapzid · on Nov 6, 2019

Does anyone know what CRDT/OT library the VSCODE LiveShare extension uses?

lewisl9029 · on Nov 7, 2019

Not sure about VSCode's implementation, but Atom's teletype uses this: https://github.com/atom/teletype-crdt

Cogitri · on Nov 6, 2019

Xi-editor also uses a CRDT to merge incoming editing events

AriaMinaei · on Nov 6, 2019

Xi experimented with using CRDTs to merge state changes produced by plugins running asynchronously on different threads, but they decided that it was a failed experiment.

raphlinus · on Nov 6, 2019

I think that's overstating it a bit. What I would say is that the CRDT added a lot of complexity, and we weren't able to encapsulate that complexity into an "engine" as was originally hoped.

Previous HN discussion on the mini-retrospective might be interesting: https://news.ycombinator.com/item?id=19886883

AriaMinaei · on Nov 6, 2019

Thanks for the correction! Is it more accurate to say that in Xi, CRDTs ended up leaking too much complexity for the purpose of synchronizing state between async components of the editor (like plugins) that run on the same machine, but not necessarily for the purpose of collaborative editing between users where there is significant latency?

raphlinus · on Nov 6, 2019

I think that's fair. We had an experimental collaborative editing prototype running on Fuchsia, but that was never officially part of the scope of the project. So this is all about whether technology developed for multiuser collaboration is helpful for the single user case.

A good example is inserting indentation in response to, say, the "Enter" key. In a synchronous model, this is pretty straightforward. You calculate the indentation as essentially a function of the contents of the buffer and the delta (the insert of the newline), then any other typing gets processed after that. In a CRDT model, the additional typing can be interleaved in any order, and the merge operation has to converge to the correct results. You can think about how bad that is: the user could type a hundred lines of code, complete with complex editing operations including cut and paste, deletions, undo, and then the auto-indent plugin wakes up, calculates the diff, and sends it to the core. Meanwhile, the user has typed another hundred lines, including more cut'n'pasting of the affected lines. This is perhaps unrealistic, but the CRDT model requires it. And I think it is possible to solve it (I wrote some "rope science" articles, then there was more discussion in several issues), but it's a hugely intricate puzzle compared with just applying it synchronously, and the benefit is not worth it. It's quite practical to compute indentation for any realistic source file in under a millisecond, if you've got a nice fast language.

agazso · on Nov 6, 2019

For anyone more deeply interested in this topic I recommend to read this blog post from Archagon. It describes the different alternatives (OT, CmRDT, CvRDT, diff sync) for writing a collaborative editor. And unlike academic papers it is written in a format how a programmer does research and thinks about a problem in real life, so it's very natural to follow, even if it's long and complex.

http://archagon.net/blog/2018/03/24/data-laced-with-history/

(I am not affiliated in any way, just enjoyed it very much)

dboreham · on Nov 6, 2019

Alexei's work on this subject is very good: the best practical introduction to the subject.

However, I feel that it is worthwhile (necessary?) to completely understand the academic basis for CRDT. A good place to start is Shapiro et al's second paper : https://hal.inria.fr/inria-00555588/document .

In order (sic) to understand the paper you need to have a grasp of Order Theory. This is not terribly hard to get your head around. This is a good place to start: http://jtfmumm.com/blog/2015/11/17/crdt-primer-1-defanging-o... also https://www.wikiwand.com/en/Order_theory and basically stop when you understand this : https://www.wikiwand.com/en/Lattice_(order)

The reason why the formal basis is important is: that's the whole point of CRDT -- previously (been there, got the t-shirt..) folks just made up replication mechanisms they thought would work. Then they build them and embarked on a process of fixing the bugs. Sometimes that took decades. CRDT is nothing new in terms of : software to perform eventually consistent replication. The new thing is that there's a way to formally prove that your bright idea for replication will in fact work (as in : it will converge and it will have the consistency properties you expect). So if you're not seeing that aspect, then you're really not with the program.

btw originally the C stood for Commutative or Convergent, not Conflict-Free. There are plenty of CRDTs that cope with conflicts consistently, rather than being conflict free (e.g. LWW Register).

aboodman · on Nov 6, 2019

FWIW, I disagree that it is necessary (or even desirable) to understand the academic basis first.

The core idea of CRDTs are intuitive and easy to understand from an engineering perspective. For me, the academic literature unnecessarily complicates and obscures what's going on, and I would also point newcomers to something like http://archagon.net/blog/2018/03/24/data-laced-with-history/.

It probably has to do with where you are coming from. If you're not a mathematician or theoretical computer scientist, I don't think that reading these documents is going to be a very helpful start.

dboreham · on Nov 7, 2019

I hear you because I thought the same way until recently. I have worked with eventually consistent replication since long before the CRDT papers and thought initially it was just some academic mumbo jumbo layered over intuition. However, based on my journey through the field I'd encourage you to try again to grokk the academic side of the work because it's really the key insight and (for me) it's very much not intuitive. In fact that's the tripping hazard in all this: the simple examples are intuitive which leads to a sense that everything is similarly intuitive. Not so, for me at least.

I'm neither Mathematician nor Computer Scientist (I'm an Electrical Engineer), but I have learned not to fear formal notation and advanced abstract concepts. They're really not that hard to understand once the terminology is decoded.

sophacles · on Nov 7, 2019

At one point I was strugging with the basics of CRDTs. Someone linked me to this gem: https://www.youtube.com/watch?v=pMMDVphop40

It made a lot of stuff click for me. HTH someone!

fizwhiz · on Nov 6, 2019

what t-shirt?

dboreham · on Nov 6, 2019

https://en.m.wiktionary.org/wiki/been_there,_done_that,_got_...

appwiz · on Nov 6, 2019

Most recent submission to HN about this paper: https://news.ycombinator.com/item?id=21405583

sbazerque · on Nov 6, 2019

An interesting problem is how much interface mismatch you get when you try to solve real-life problems using CRDTs.

Awhile back I implemented the "become a connection" logic for a project that has purely CRDTs datatypes, and I kind of went back and forth between composing well known CRDTs to model all the possible state, or just implementing the solution as a finite state machine whose CRDT-ability had to be reasoned for this particular purpose. Ending doing the second.

Got the intuition that this is a very nice way to think about distributed software but the right abstractions may not be in place just yet.

zzzcpan · on Nov 6, 2019

Could you elaborate on "become a connection" logic? What kind of problem is that? You may need to change user interface a bit to fit into CRDTs.

dboreham · on Nov 6, 2019

Like a LinkedIn connection, I assume.

sbazerque · on Nov 6, 2019

Yes, it was for a chat proof-of-concept app. But the end result ended up being so convoluted I thought no sane person would program like that, so I went back to the drawing board.

Look at _init() here to get a feel of it:

https://github.com/hyperhyperspace/hyperhyperspace-web/blob/...

dboreham · on Nov 11, 2019

Your code didn't look so bad to me :)

I think you may have met some of my colleagues working on similar problems at Dweb camp recently.

mbrock · on Nov 7, 2019

I'm under the impression that CRDTs are somewhat limited by the lack of a clear way to combine several structures along with application-specific logic.

In the 2012 paper "Logic and Lattices for Distributed Programming" (by Conway, Marczak, Alvaro, Hellerstein, and Maier) they write:

> CvRDTs present two main problems: (a) the programmer bears responsibility for ensuring lattice properties for their methods (commutativity, associativity, idempotence), and (b) CvRDTs only provide guarantees for individual values, not for application logic in general.

They give this example of the second point:

> A replicated, fault-tolerant courseware application assigns students into study teams. It uses two set CvRDTs: one for Students, another for Teams. The application reads a version of Students and inserts the derived element <Alice, Bob> into Teams. Concurrently, Bob is removed from Students by another application replica. The use of CvRDTs ensures that all replicas will eventually agree that Bob is absent from Students, but this is not enough: application-level state is inconsistent unless the derived values in Teams are updated consistently to reflect Bob's removal. This is outside the scope of CvRDT guarantees.

They continue:

> Taken together, the problems with [CvRDTs] present a scope dilemma: a small module (e.g. a set) makes lattice properties easy to inspect and test, but provides only simple semantic guarantees. Large CvRDTs (e.g., an eventually consistent shopping cart) provide higher-level application guarantees but require the programmer to ensure lattice properties hold for a complex module, resulting in software that is difficult to test, maintain, and trust.

This leads to a research project about monotonic logic as a distributed programming paradigm.

http://bloom-lang.net

http://boom.cs.berkeley.edu

arberavdullahu · on Nov 6, 2019

I found it useful the IPFS research repository in Github [1], there are a lot of resources there including background concepts.

[1] https://github.com/ipfs/research-CRDT

phamilton · on Nov 6, 2019

I find that we use CRDT's quite often in the form of append-only sets (formally a G-Set). Rather than choose the CRDT specifically, we've seen that an append-only set was the natural choice for a particular need and then we identified the commutative and idempotent properties of the operation. Once identified, we leverage them for scale.

One good example is processing items in an SQS queue, where (without enabling FIFO) there is no guarantee of ordering. With a G-Set you can read and merge batches of items (and retry naively) without affecting the outcome.

ericand · on Nov 7, 2019

I've been researching these recently. Some things I've found: - Macrometa is a new startup and stateful edge (distributed database) service. I just tried them this morning. User experience is very raw but it works great. You can shim it in front of existing DBs (Dynamo, Firebase, etc) for better performance. - Gun: where macrometa runs on edge servers, Gun runs on the client. Mark is a fantastic community and they are doing interesting things. I had trouble implementing it but that's because I'm lousy at javascript. - Redis uses CRDTs or something similar: https://redislabs.com/redis-enterprise/technology/active-act... - This presentation is fantastic: https://www.infoq.com/presentations/crdt-distributed-consist...

bratsche · on Nov 6, 2019

I wasn't familiar with CRDTs until Chris McCord started talking about them wrt Phoenix Channels. Really cool and interesting stuff.

robto · on Nov 6, 2019

I've been working my way through 'A comprehensive study of Convergent and Commutative Replicated Data Types'[0] and I've been finding them very approachable. I'm trying to implement them in Clojure as I go and that has been interesting. I've also starting looking at 'Mergeable persistent data structures'[1], and that one already has a repo with the code in Ocaml, but I have to admit that I don't understand that queue structure yet.

It does seem like a good way to dodge some of the tough problems that come up in distributed systems.

[0]https://hal.inria.fr/file/index/docid/555588/filename/techre... [1]http://gazagnaire.org/pub/FGM15.pdf

Karrot_Kream · on Nov 7, 2019

CRDTs are very interesting, powerful, and simple datatypes. I've experimented with a Go CRDT DAG, and it was simple to implement. The much more difficult part with a CRDT is message routing. Even with routing, managing network membership is a tough problem. There are multiple approaches, like Gun's superpeers, or Gossip protocols, but this remains a very challenging aspect of using CRDTs.

clarkmoody · on Nov 7, 2019

I recently went down this rabbit hole and sincerely hope there will be some sort of return to local-first software development, with an eye to collaboration. CRDTs post a unique opportunity to develop software that empowers users with their own data, yet retains the ability to collaborate and store it on a server.

shepardrtc · on Nov 6, 2019

I've been working with a forked version of ShareJS for a while and its worked well over the years, so OT is certainly a viable option for anyone interested in collaborative work. I would recommend anyone use ShareDB rather than ShareJS, though. The former is a cleaner, more modern version of the latter.

dustingetz · on Nov 6, 2019

Has anyone made a good sequential log CRDT yet? (It's a hard problem)

dboreham · on Nov 11, 2019

What's your definition of sequential in this context?

jively · on Nov 6, 2019

These are fascinating, I built a toy replicated K/V in go using CRDTs:

https://github.com/lonelycode/yzma

bob1029 · on Nov 7, 2019

I am trying to understand how CRDT compares to Lamport timestamps or vector clocks. Couldn't you achieve most of the same objectives using these abstractions as well?