Hacker News new | past | comments | ask | show | jobs | submit login
Schism: A library of CRDT implementations of Clojure’s core data types (github.com/aredington)
112 points by tosh on Aug 17, 2018 | hide | past | favorite | 10 comments



Once you get past all the complicated math (honestly being into Haskell has helped with this), CRDTs are actually very easy to understand, at least their semantics are.

Here's are some decent primer talks:

John Mumm - A CRDT Primer: Defanging Order Theory -> https://www.youtube.com/watch?v=OOlnp2bZVRs

"CRDTs Illustrated" by Arnout Engelen -> https://www.youtube.com/watch?v=9xFfOhasiOE

If you want to see it in action:

"Practical data synchronization with CRDTs" by Dmitry Ivanov -> https://www.youtube.com/watch?v=veeWamWy8dk

Service Discovery using CRDTs by Mushtaq Ahmed and Umesh Joshi at FnConf17 -> https://www.youtube.com/watch?v=RKSbiFDb3dU

There's also a talk with a guy sitting in a field which is the best I think I've ever seen, but I can't find it... but I can't find it in my youtube history :(

As people reason more and more about distributed systems I think CRDTs will become (if they're not already) almost required knowledge, like paxos/raft.


I was fortunate enough to work at Basho, where CRDTs were part of nearly every technical discussion about how to progress Riak.

One of my former co-workers, Chris Meiklejohn, continues to push forward with CRDTs via Lasp[0]. He has a good list of reading materials[1] that it doesn't look like he's maintained recently but should still be useful.

Guess I'll plug my meta-list of distributed systems reading lists[3] (definitely currently unmaintained) while I'm here.

[0]: https://github.com/lasp-lang/lasp

[1]: http://christophermeiklejohn.com/distributed/systems/2013/07...

[2]: https://gist.github.com/macintux/6227368


I am the author of the CT vector CRDT, also the author of the Replicated Object Notation http://github.com/gritzko/ron.

I tried to look into the code, and I am not sure which way they implemented list and vector. The explanation is a bit confusing, especially the mention of "insertion index".

  Definition and support for Schism's Convergent List type. The 
  convergent list is a simple timestamped log of entries with a vector
  clock. Convergence places entries into the resultant list In
  insertion order. The vector clock conveys that an item has been
  removed from the list on another node."

  Convergence places entries into the
  resultant vector in insertion order, with insertions occurring by
  replaying insertions operations in order.
CRDT is all about partial orders, so what does it all mean? Orders are different for different replicas.

(Disclaimer. I am not a Clojurist.)


Yeah, I couldn't follow the descriptions either.

It looks like "list" only supports appending (conj), removing the first item (rest), and removing all items (empty).

The data consists of a list of items, a "clock" (map of nodeId to Date object), and a list of (nodeId,Date) corresponding to the values.

I think they're using the clock to determine whether to drop deleted items on merge. Code for the operations is [list.cljc:153-169](https://github.com/aredington/schism/blob/master/src/schism/...). The merge code is "synchronize" a little further down (I didn't read through it).

The "vector" object supports append (conj), pop, clearing the list (empty), and setting a value at a position (assoc). It does _not_ support insert/delete. It is structured similarly to "list" except using vectors instead of lists for the values and the timestamps. Code is at [vector.cljc:222-244](https://github.com/aredington/schism/blob/master/src/schism/...)


A thorough explanation of its semantics would be good. A set for which union is defined as the empty set is also a CRDT, but a pretty useless one. And the explanation sounds a bit like, it converges by loosing data consistently.


A thorough explanation for a 0.1 project seems a high bar. I'd like to see an example or two where it loses data in the manner he speaks of, though.


Turning “CRDT” into a link would be great! Wasn’t familiar with that acronym.



Also, https://www.youtube.com/watch?v=9xFfOhasiOE is a pretty good conference talk that offers a high-level overview of CRDTs.


Wow small world! The speaker (Arnout Engelen) is also the author of the 'nethogs' tool, which I've used to diagnose networking issues.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: