In practice, recovery speed is typically bottlenecked by the incoming bandwidth of the recovering server, which is easily exceeded by the combined outgoing read bandwidth of the other servers, so this limitation is rarely a big deal.
If you're recovering to one server, you're going to have a bad time. With random distribution, you recover to every server, roughly equally, over a very short period of time. The tradeoff is churn: temporary failures cause a lot of data to be re-replicated, and the extra copies get deleted as the failed machines come back online. On the other hand, this helps balance your utilization and load.
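A rough simulation of that point, with entirely made-up numbers (100 servers, 3-way replication), shows why: with random placement, the surviving copies of a failed server's chunks are scattered across nearly every other machine, so recovery reads and re-replication writes fan out cluster-wide.

    import random

    # Illustrative only: random 3-way placement of chunks across 100 servers.
    N_SERVERS, CHUNKS_PER_SERVER, R = 100, 1000, 3
    placements = [random.sample(range(N_SERVERS), R)
                  for _ in range(N_SERVERS * CHUNKS_PER_SERVER // R)]

    # Fail server 0 and count how many distinct servers hold surviving replicas.
    failed = 0
    sources = {s for p in placements if failed in p for s in p if s != failed}
    print(f"recovery spreads across {len(sources)} of {N_SERVERS - 1} surviving servers")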
The actual insight is that you want failure domain anti-affinity. That is, if you have 1000 servers on 50 network switches, you want your replica selection algorithm to pick not three machines at random, but three different switches at random, and one server on each. If you have three AZs, put one replica of each copy in each AZ. Copysets can provide this, but as stated in the article, they're much more likely to give you an Achilles heel: a typical failure won't hurt and won't cause any unavailability, but the wrong one takes you down hard, with N% data loss rather than thousandths of a percent.
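A minimal sketch of that kind of domain-aware selection, assuming a made-up cluster layout keyed by switch (names and sizes are illustrative, not any particular system's API):

    import random

    def pick_replicas(servers_by_domain, r=3):
        # Choose r distinct failure domains (switches/AZs), then one server
        # from each, so no two replicas ever share a domain.
        domains = random.sample(list(servers_by_domain), r)
        return [random.choice(servers_by_domain[d]) for d in domains]

    # Example: 1000 servers spread across 50 switches.
    cluster = {f"switch-{s}": [f"server-{s}-{i}" for i in range(20)]
               for s in range(50)}
    print(pick_replicas(cluster))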
In short: failures happen. Recovering from them is what matters, not convincing yourself that they can't happen.
I think you're pointing out a good tradeoff here. The original Copysets work lets you explicitly tune the likelihood of data loss against the amount of work done for recovery. A cluster replicated to minimize the likelihood of losing a replica set under correlated failure will have a higher cost of recovery from failure. A cluster replicated to minimize recovery time (e.g., RAMCloud's random allocation) will likely lose entire replica sets when even a random set of failures occurs.
Chainsets were an attempt to add the properties of copysets to a system based upon chain replication.
Working with the original Copysets authors, we refined the chainsets algorithm into a tiered replication algorithm that can enforce independence assumptions on the replica set (what you've termed anti-affinity). Here's the paper on the subject: https://www.usenix.org/conference/atc15/technical-session/pr...