Tango: Distributed Data Structures Over a Shared Log (2013)

mankurt · on Nov 13, 2016

Here is a nice summary of the Tango paper. http://muratbuffalo.blogspot.com/2014/09/paper-summary-tango...

andr · on Nov 13, 2016

Sounds like a low-level implementation of event sourcing/CQRS. Is there anything similar that is usable right now? Perhaps built on Kafka?

morsch · on Nov 13, 2016

We're using eventuate[0], which is an event-sourcing framework with deep support for cooperation via shared logs. It's based on the actor framework akka; akka itself has akka-persistence[1], which is similar, though the two differ in ways compared in [2]. All of these techs are usable right now.

Though it doesn't feature either implementation (the author does something similar on top of Samza), I like this article[3] on the topic: "turning the database inside out" really is what we're doing.

[0] http://rbmhtechnology.github.io/eventuate/

[1] http://doc.akka.io/docs/akka/snapshot/scala/persistence.html

[2] http://krasserm.github.io/2015/05/25/akka-persistence-eventu...

[3] https://www.confluent.io/blog/turning-the-database-inside-ou...

noahdesu · on Nov 13, 2016

We have built an implementation of CORFU [1] (the protocol Tango is based on) that runs on Ceph/RADOS, called ZLog [0]. We have a very simple prototype of Tango running on ZLog. ZLog could run on other storage systems like Kafka, but we have only focused on Ceph/RADOS as the underlying storage.

[0]: https://github.com/noahdesu/zlog

[1]: https://www.usenix.org/conference/nsdi12/technical-sessions/...
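For readers new to CORFU, the core idea is that a client first obtains a log position (a token) from a sequencer and then writes its entry directly to that position in a write-once address space. The following is a minimal single-process Python sketch under loudly stated assumptions: the class and function names are hypothetical, storage is an in-memory dict rather than replicated flash units or RADOS objects, and hole-filling, replication, and sequencer recovery are all omitted. It is not ZLog's API.

```python
from __future__ import annotations

import itertools
import threading


class WriteOnceError(Exception):
    """Raised when a client tries to overwrite an already-written log position."""


class Sequencer:
    """Hypothetical sequencer: hands out monotonically increasing log positions."""

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def next_position(self) -> int:
        with self._lock:
            return next(self._counter)


class WriteOnceLog:
    """In-memory stand-in for log storage: each position may be written at most once."""

    def __init__(self) -> None:
        self._entries: dict[int, bytes] = {}
        self._lock = threading.Lock()

    def write(self, position: int, data: bytes) -> None:
        with self._lock:
            if position in self._entries:
                raise WriteOnceError(f"position {position} already written")
            self._entries[position] = data

    def read(self, position: int) -> bytes | None:
        # None means the position is a "hole": not (yet) written by anyone.
        with self._lock:
            return self._entries.get(position)


def append(seq: Sequencer, log: WriteOnceLog, data: bytes) -> int:
    """Client-side append: take a token from the sequencer, then write to that slot."""
    pos = seq.next_position()
    log.write(pos, data)
    return pos
```

The write-once rule is what lets many clients append concurrently without coordinating with each other: the sequencer only hands out numbers, and any attempt to reuse a slot fails instead of silently overwriting it.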

aji · on Nov 13, 2016

Apache Samza, also from LinkedIn, is built on Kafka, and I think it could be used to do something like this.

rvenkatesh25 · on Nov 13, 2016

"the abstraction of a replicated, in-memory data structure (such as a map or a tree) backed by a shared log"

If I read just this piece of text anywhere, the word popping up in my mind would be ZooKeeper.
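The quoted abstraction fits in a few lines. In this minimal Python sketch (all class and method names are hypothetical, and the "shared log" is an in-process list rather than a CORFU log), each replica keeps an in-memory view; a write only appends an update record to the log, and a read first plays the log forward to the current tail, so every replica applies the same updates in the same log order. It follows the paper's description of Tango objects in spirit only and omits transactions, checkpoints, and streams.

```python
from __future__ import annotations

import json


class SharedLog:
    """Hypothetical stand-in for a shared log: an append-only list of records."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1

    def tail(self) -> int:
        """Position one past the last written entry."""
        return len(self._records)

    def read(self, position: int) -> str:
        return self._records[position]


class ReplicatedMap:
    """A map whose state is a view over the log: writes append, reads replay to the tail."""

    def __init__(self, log: SharedLog) -> None:
        self._log = log
        self._view: dict[str, str] = {}
        self._applied = 0  # number of log entries already applied to the local view

    def put(self, key: str, value: str) -> None:
        # The update is not applied locally; it only goes to the log.
        self._log.append(json.dumps({"op": "put", "key": key, "value": value}))

    def _sync(self) -> None:
        # Play the log forward from the last applied position up to the current tail.
        tail = self._log.tail()
        while self._applied < tail:
            entry = json.loads(self._log.read(self._applied))
            if entry["op"] == "put":
                self._view[entry["key"]] = entry["value"]
            self._applied += 1

    def get(self, key: str) -> str | None:
        self._sync()
        return self._view.get(key)
```

With two replicas `a` and `b` over the same `SharedLog`, calling `a.put("x", "1")` and then `b.put("x", "2")` makes both `a.get("x")` and `b.get("x")` return `"2"`, because both replay the same two records in log order.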

noahdesu · on Nov 13, 2016

Indeed, one of the prototype services built on Tango and evaluated in the paper was a ZooKeeper clone.

jasonwatkinspdx · on Nov 14, 2016

One of the more eye-opening aspects of the paper is just how little code it took them to duplicate the ZooKeeper API atop Tango. Granted, there are caveats about a research project vs. an industry-ready codebase, but I still interpret it as strong evidence that their approach is a good foundational abstraction.

wavewash · on Nov 14, 2016

A couple of my friends have been looking at this paper and created their own visualization implementation: https://github.com/derekelkins/tangohs

GordonS · on Nov 13, 2016

Maybe add '2013' to the title?

EGreg · on Nov 13, 2016

Why do you need a shared log? Remember the CAP theorem. There's no need for these bottlenecks. If you want to record that A happened after B, just have A store (a hash of) B.
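The suggestion amounts to a hash-linked graph of events, similar in spirit to how Git commits reference their parents. A minimal Python sketch with hypothetical names: each event records the hashes of the events it happened after, and checking whether A happened after B means walking parent links back from A. The check needs the stored history plus a graph traversal, and two events with no path between them in either direction are simply unordered.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    payload: str
    # Hashes of the events this one is recorded as happening after (hypothetical field name).
    parents: tuple[str, ...] = ()

    def digest(self) -> str:
        # Serialize as JSON before hashing so distinct events never hash the same input bytes.
        encoded = json.dumps([self.payload, list(self.parents)]).encode()
        return hashlib.sha256(encoded).hexdigest()


def record(store: dict[str, Event], event: Event) -> str:
    """Store an event under its own hash and return that hash."""
    h = event.digest()
    store[h] = event
    return h


def happened_after(store: dict[str, Event], a: str, b: str) -> bool:
    """True if b is reachable from a by following parent hashes."""
    stack = list(store[a].parents)
    seen: set[str] = set()
    while stack:
        h = stack.pop()
        if h == b:
            return True
        if h in seen:
            continue
        seen.add(h)
        stack.extend(store[h].parents)
    return False
```

For example, after `b = record(store, Event("B"))`, `a = record(store, Event("A", (b,)))` and `c = record(store, Event("C"))`, `happened_after(store, a, b)` is `True`, while `A` and `C` are unordered: the check returns `False` in both directions.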

jamesblonde · on Nov 13, 2016

That's a type of logical clock you're describing, though it orders just two events rather than defining a partial order over all events. Obviously, if you do that with all events, you will have a logical clock. But the hash of the previous event is not a good logical clock, because you cannot define higher-level operations over the values, such as asking whether one event is "newer" than another.
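For contrast, a standard vector clock (swapped in here for illustration; the comment does not name a specific scheme) turns the "is this event newer?" question into an element-wise comparison of two small maps, with no history traversal. A minimal Python sketch; the function names are hypothetical and the clocks are plain dicts from node name to counter.

```python
from __future__ import annotations


def increment(clock: dict[str, int], node: str) -> dict[str, int]:
    """Return a copy of clock with node's counter bumped (a local event on node)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Element-wise maximum, applied when a node receives another node's clock."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for clock a relative to b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

If node `p` records an event `x = increment({}, "p")`, node `q` receives it and records `y = increment(merge({}, x), "q")`, and node `r` independently records `z = increment({}, "r")`, then `compare(x, y)` is `"before"` and `compare(x, z)` is `"concurrent"`.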

ajkjk · on Nov 13, 2016

I'm pretty sure literally nothing in distributed software is as simple as you're trying to make this sound.

0xbadcafebee · on Nov 14, 2016

Read the linked paper; it explains how Tango compares with other approaches and what its use cases are.