Distributed systems for fun and profit (free ebook)

0xbadcafebee · on Jan 2, 2014

You can get more practical information by reading the wikipedia page on distributed computing. There's many different models of distributed systems that this ebook doesn't touch on. For example, several of the systems i'm familiar with don't require consensus, consistency or time ordering (or they're resolved based on certain assumptions about the model, data, application, etc)

mixu · on Jan 3, 2014

Any chance you could be more specific as to what you feel is missing in the book?

Granted, "distributed systems" is a enormous topic that no book can cover fully, but I have tried to cover things like:

- key papers (Lamport; Fischer, Lynch and Patterson; Chandra and Toueg etc.)

- topics relevant to highly successful commercial systems (e.g. 2PC => *SQL systems, Paxos => GFS/Chubby, ZAB => Zookeeper, Dynamo => Riak/Voldemort/Cassandra)

- and recent topics such as CRDTs and the CALM theorem.

Having a sense of how time, consistency and fault tolerance have been explained and handled is (I think) a prerequisite to more advanced topics, but I'd be interested in hearing what parts you'd feel need improvement because some day (~ some years from now) - I will revise the book and it would be nice to have a solid list of issues to revise.

chubot · on Jan 3, 2014

I'm not sure exactly what the grandparent comment meant, but I think I have an idea. I only skimmed the contents so take this with a grain of salt.

Your book is focusing on a pretty narrow part of distributed computing. I would rename it "Managing State in Distributed Systems", or "Distributed Storage Systems". Your examples are Bigtable and Dynamo, which fall in this category.

The book seems to be aimed at sort of a "beginning" audience. But the topics are inappropriate for a beginning audience, and skewed for an expert audience.

Real distributed systems try to be stateless wherever possible. You need "big computer science" to manage state in distributed systems, but most code in a distributed system should not manage state. These techniques should be confined to specialized storage systems.

Here are some examples of real world distributed systems that don't use the described techniques to manage state:

  - clusters of stateless web servers + single master database (99%+ of websites people use)
  - message queue / work queue.  A single machine can productively manage 1,000 - 10,000 stateless workers, depending on the workload.
  - MapReduce
  - Original GFS
  - Napster
  - BitTorrent (tracker and trackerless would be interesting to write about)
  - BitCoin

The title seems to imply a practical bent, but it seems more like a collection of ideas (which are important and interesting, but not really what engineers need to know. IMO the #1 skill for distributed computing is to be competent at BOTH programming a single computer and at system administration).

If I wanted to be harsh, I would say it looks like you read a bunch of stuff and didn't work with it or implement it? At the very least, the ideas don't seem to be put in the context of commonly deployed distributed systems.

People need to understand these simpler, more robust, and more performant techniques, and how to apply them to their specific problem domain, rather than blindly throwing consensus at every problem (which is a disturbing trend I've seen).

throwawaykf03 · on Jan 3, 2014

It goes even beyond that. A lot of other very important, fundamental topics belong under the umbrella of distributed systems, starting with routing. The Internet is, after all, a giant distributed routing system.

Another topic that's huge all by itself is peer-to-peer networks, and all their associated aspects, such as structured (DHTs like Chord, Cassandra, etc.) vs unstructured (Gnutella, Kazaa, etc.), P2P search, handling churn, handling peers with heterogenous capabilities, peer selection, topology organization, decentralized routing, file-sharing (torrents) vs streaming (PPLive, Spotify), etc.

Other topics (with several overlapping aspects) include:

- Security, such as Sybil attacks, group key management, etc;

- Overlay networks;

- CDNs;

- Ad hoc and mesh networks;

- MMOs and multiplayer games;

- SCADA and industrial control systems;

- Pub/Sub systems and application layer multicast;

- Distributed file systems;

- Load balancing and bandwidth management;

And that's just off the top of my head... I'm sure I'm missing other important topics.

strlen · on Jan 3, 2014

> It goes even beyond that. A lot of other very important, fundamental topics belong under the umbrella of distributed systems, starting with routing. The Internet is, after all, a giant distributed routing system.

Yes! DNS is also fascinating as far as distributed databases and consistency go:

http://pages.cs.wisc.edu/~akella/CS740/S08/740-Papers/MD88.p...

mixu · on Jan 3, 2014

Indeed, but ultimately covering all of those topics would require an incredible amount of time and effort. So I need to pick and choose my battles as some topics are more important or interesting to me than others. :)

throwawaykf03 · on Jan 3, 2014

Completely understand. But as chubot suggests, the topic of "Distributed Systems" is really broad and something narrower in the title, such as "Distributed Data Systems" may be more apt.

mixu · on Jan 3, 2014

> The title seems to imply a practical bent, but it seems more like a collection of ideas (which are important and interesting, but not really what engineers need to know. IMO the #1 skill for distributed computing is to be competent at BOTH programming a single computer and at system administration).

I think this is something where different authors will emphasize different aspects. My view is that understanding of how to deal with the evolution of state within a system is crucial. Even systems that are not databases per se still have a dependency on how state is managed because you want to be able to reason about how some specific answer to a computation was derived and what guarantees it comes with (from strong consistency to some alternative but hopefully precise definition). I figure there will be disagreement on whether this important, and that's fine. There are other books.

That does bring up an interesting question: which books on distributed systems do you feel exhibit your preferred approach (free or paid)?

Re: the suggested topics:

Clusters of stateless web servers + single master. This is definitely a common setup, but you need very little if any distributed systems research to implement it.

Queues: I find the larger scale implications of queuing to be rather interesting (specifically, how cascading failures can be caused by an inadequate understanding of interactions between queues) but haven't found a good discussion beyond Google's findings that doing duplicate work often pays off as reduced 95th percentile latency.

MapReduce: There are many good books covering this topic in much more depth and specificity, so I didn't feel like I had that much to add. MR does use the techniques described - beyond job assignment the whole system rests on the DFS which uses block-level replication and some coordination protocol to maintain metadata state.

I kind of assume people have had some exposure to the paradigm at this point and do address MapReduce a bit in the context of the CALM theorem, which notes that a much larger set of relational algebra operations can actually be executed safely without coordination. Another point might be that MapReduce is inefficient in that it provides too much fault tolerance for typical workloads and cluster sizes.

Original GFS: the design has been largely superseded both by newer version of HDFS (e.g. eliminating the single point of failure in the initial design) and Google's (unpublished?) internal equivalents. BTW, the original GFS relies on Chubby, which uses Paxos internally.

Napster, BitTorrent and BitCoin: peer-to-peer systems definitely deserve a more extensive treatment in a later version of the book. The issues here are different in that trust, efficiency and resiliency are more important and I didn't have the bandwidth to handle them in the book as it stands.

Thanks for your comment, and I hope this doesn't sound like a rebuttal - I just wanted to think through the topics you mentioned one by one.

strlen · on Jan 3, 2014

> Clusters of stateless web servers + single master. This is definitely a common setup, but you need very little if any distributed systems research to implement it.

First, define "stateless"? I would not characterized such a system as stateless at all. Even if you're not using sticky sessions (with cache servers/load balancers talking to each other for failover using fairly involved protocols), there's still state that's ephemeral (sessions) in your application server, as well as bulk of the persistent state that's provided with essentially "faith based consistency" (consider typical memcached cluster with client doing consistent hashing -- asynchronously replicated MySQL with failover, etc... -- in case of a failure, neither availability nor consistency are guaranteed).

On the level of protocols design, the whole idea of stateless protocols (REST) vs. stateful ones (sticky load balancers + SOAP, CORBA, RMI, etc...) is by itself a big distributed systems topic.

A web browser talking to a web server is by definition a distributed system. I am typing this as fast as I can, praying not to get the "your link is invalid" error from HN right now -- this a real example of distributed system and cache coherence/consistency/atomicity. Here's one paper that deals with just these sorts of question (in context of NFS): ftp://ftp.cs.berkeley.edu/ucb/sprite/papers/state.ps‎

> Re: Chubby and GFS

Original GFS doesn't rely on Chubby, BigTable does however (for metadata). I believe newer versions (Collosus) by extension rely on Chubby as they rely on BigTable.

F1/Spanner, however, use consensus and transactions far more than others and is very interesting in this sense.

[Edit: more elaboration on distributed systems issues in a "stateless" cluster of web servers].

strlen · on Jan 3, 2014

Couple of suggestions:

1) I'd avoid mentioning CAP and FLP impossibility result much further in, this is like starting a discussion of mathematical logic or Computer Science with Godel's Incompleteness Theorem.

I'd definitely cover vector clocks and causality before: you really need this background in order to understand CAP/FLP. You may want to take a look at the approach Vijay Garg takes in his book:

http://www.amazon.com/Elements-Distributed-Computing-Vijay-G...

(E.g., use of vector clocks for proofs)

2) Don't over-focus on "webscale" (using the term only semi-ironically here) projects (I say this having contributed to multiple such projects). NFS, AFS, and CodaFS all raised very interesting questions about distributed state and disconnected operations. You may also note how some of the authors on the Plan9 (9fs, 9p, fossil) papers overlap with authors of some of the Google papers. "Mobile sensor networks" have been essentially an excuse to build distributed systems. Chord, Pastry, and Kademelia (sp? The System that Bittorrent uses) -- are actually far more truly "distributed" than many webscale systems (consistent hashing is a very simple DHT). Finally, don't forget that the Internet itself is a distributed system: the most successful eventually consistent, highly available distributed database is DNS.

I'd actually cover the web-giant infrastructure paper stuff a bit later on. While the Dynamo paper did have some novel contributions (regarding Gossip and anti-entropy), it's not what it's notable for: what's far more interesting in that paper is the way these concepts fit together.

I'd even hold my nose and cover CORBA: it's an interesting example in that it handled everything by the textbook, but failed due to complexity costs it had to incur in order do that. Far simpler RPC protocols and Web Services prevailed at the end by promising less. JINI may also be interesting to cover -- excellent system, far too ahead of its time -- Java VM and ecosystem were immature (anyone else remember Blackdown JDK on Linux?), marketed incorrectly (driver architecture?) to a market that wasn't ready for it (this is when dot-com startups could suffice on buying a couple of E10Ks or high-end Alphas).

3) This is an enormous and highly ambitious project and I commend you from embarking on it (I just re-read http://paulgraham.com/hs.html -- see the part about hard problems and ambition). Nonetheless, if you're serious about this, be sure to get plenty of editors -- there's a lot of room to make subtle mistakes (which I've done all the time). A lot of this can be genuinely confusing -- e.g., partitions aren't just due to networks, but at the same time not every type of a failure or a stop is a partition (I used to site GC pauses an example of a partition, but now I'm not so sure).

I'd help, but I've neither the cycles nor the habit of academic rigor; if you're still in touch with any profs from your uni, they could either help themselves or point you to others.

mixu · on Jan 3, 2014

Thanks for your kind words. There are definitely some cases where the topics could be reordered for greater clarity and I'll revisit them in the next iteration of the book based on the things people have pointed out (again, after a hiatus).

One of the challenges is to find the right balance between rigorous exposition one the one hand and keeping a brisk pace on the other (after all, writing for the web is different from writing a textbook). So I am grateful for all the input I've received thus far but it has definitely been challenging to find editors and reviewers - in particular because this is an unpaid effort on my part.

zdrummond · on Jan 2, 2014

I disagree. I think it is clear that this "book" is aiming to be an introduction to Distributed systems. As such, focusing on some models and excluding others does not decrease its value.

The real question of value is; does it allow someone outside the field to come up to speed and have enough understanding, that they can venture out and learn about alternate models (such as ones that don't require consensus).

abofh · on Jan 3, 2014

As a 'bottom up' interviewer, I'm often criticized for the same approach you're taking. You might do well to start at the highest level (and pseudo code) and work your way down to the clock.

In fact, I'd start with bad examples of temporal messaging and work down from there: ("Debit 250", to "Debit account A 250", to "Debit account A for 250 at timestamp Y", to "Debit account A for 250 at signed timestamp Y") -- showing bad is often more effective than explaining good.

tantalor · on Jan 3, 2014

The ePub chapter titles are "The third chapter", etc. Would be better if they used the actual chapter titles, like "Time and order".

http://i.imgur.com/T6YtJ0F.png