Schism: A library of CRDT implementations of Clojure’s core data types

hardwaresofton · on Aug 17, 2018

Once you get past all the complicated math (honestly being into Haskell has helped with this), CRDTs are actually very easy to understand, at least their semantics are.

Here's are some decent primer talks:

John Mumm - A CRDT Primer: Defanging Order Theory -> https://www.youtube.com/watch?v=OOlnp2bZVRs

"CRDTs Illustrated" by Arnout Engelen -> https://www.youtube.com/watch?v=9xFfOhasiOE

If you want to see it in action:

"Practical data synchronization with CRDTs" by Dmitry Ivanov -> https://www.youtube.com/watch?v=veeWamWy8dk

Service Discovery using CRDTs by Mushtaq Ahmed and Umesh Joshi at FnConf17 -> https://www.youtube.com/watch?v=RKSbiFDb3dU

There's also a talk with a guy sitting in a field which is the best I think I've ever seen, but I can't find it... but I can't find it in my youtube history :(

As people reason more and more about distributed systems I think CRDTs will become (if they're not already) almost required knowledge, like paxos/raft.

macintux · on Aug 17, 2018

I was fortunate enough to work at Basho, where CRDTs were part of nearly every technical discussion about how to progress Riak.

One of my former co-workers, Chris Meiklejohn, continues to push forward with CRDTs via Lasp[0]. He has a good list of reading materials[1] that it doesn't look like he's maintained recently but should still be useful.

Guess I'll plug my meta-list of distributed systems reading lists[3] (definitely currently unmaintained) while I'm here.

[0]: https://github.com/lasp-lang/lasp

[1]: http://christophermeiklejohn.com/distributed/systems/2013/07...

[2]: https://gist.github.com/macintux/6227368

gritzko · on Aug 17, 2018

I am the author of the CT vector CRDT, also the author of the Replicated Object Notation http://github.com/gritzko/ron.

I tried to look into the code, and I am not sure which way they implemented list and vector. The explanation is a bit confusing, especially the mention of "insertion index".

  Definition and support for Schism's Convergent List type. The 
  convergent list is a simple timestamped log of entries with a vector
  clock. Convergence places entries into the resultant list In
  insertion order. The vector clock conveys that an item has been
  removed from the list on another node."

  Convergence places entries into the
  resultant vector in insertion order, with insertions occurring by
  replaying insertions operations in order.

CRDT is all about partial orders, so what does it all mean? Orders are different for different replicas.

(Disclaimer. I am not a Clojurist.)

dunham · on Aug 17, 2018

Yeah, I couldn't follow the descriptions either.

It looks like "list" only supports appending (conj), removing the first item (rest), and removing all items (empty).

The data consists of a list of items, a "clock" (map of nodeId to Date object), and a list of (nodeId,Date) corresponding to the values.

I think they're using the clock to determine whether to drop deleted items on merge. Code for the operations is [list.cljc:153-169](https://github.com/aredington/schism/blob/master/src/schism/...). The merge code is "synchronize" a little further down (I didn't read through it).

The "vector" object supports append (conj), pop, clearing the list (empty), and setting a value at a position (assoc). It does _not_ support insert/delete. It is structured similarly to "list" except using vectors instead of lists for the values and the timestamps. Code is at [vector.cljc:222-244](https://github.com/aredington/schism/blob/master/src/schism/...)

j-pb · on Aug 17, 2018

A thorough explanation of its semantics would be good. A set for which union is defined as the empty set is also a CRDT, but a pretty useless one. And the explanation sounds a bit like, it converges by loosing data consistently.

jwhitlark · on Aug 17, 2018

A thorough explanation for a 0.1 project seems a high bar. I'd like to see an example or two where it loses data in the manner he speaks of, though.

cellularmitosis · on Aug 17, 2018

Turning “CRDT” into a link would be great! Wasn’t familiar with that acronym.

qubex · on Aug 17, 2018

Conflict-free Replicated Data Type: https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...

jxub · on Aug 17, 2018

Also, https://www.youtube.com/watch?v=9xFfOhasiOE is a pretty good conference talk that offers a high-level overview of CRDTs.

cbcoutinho · on Aug 17, 2018

Wow small world! The speaker (Arnout Engelen) is also the author of the 'nethogs' tool, which I've used to diagnose networking issues.