Scaling Redis at Twitter [video]

smcleod · on Sept 6, 2014

I would be most interested in how you manage failover, sentinels are probably the worst failover mechanism I've used in a long time.

antirez · on Sept 6, 2014

I'm very interested in understanding what are your main pain points with Sentinel, that would be a very useful feedback. Since the redesign and the well specified behavior, and the incremental fixes because of the inevitable bugs in the implementation, many Redis users are happy with it, but it is always a good idea to understand the other side of the user base.

smcleod · on Sept 14, 2014

Sorry for the very late reply here. We feel that Sentinels are clearly an afterthought, hacked on the side of Redis to tick the failover box. My main pain point is that they actively re-write their config files kept in /etc which is appalling behavior - especially if you want to manage your configuration file with some sort of automation such as puppet. The actual failover process is slow no matter how you tune it and because of this the window for error is high during this time. Data loss with Redis failover seems not only inevitable but also probable.

rhoml · on Sept 23, 2014

2cents about this:

1. If you are setting up Sentinel with puppet you can do audit of a file instead of modifying it on every run http://puppetlabs.com/blog/all-about-auditing-with-puppet this way you can install, configure and start sentinel and puppet will notice that the content changed. You can decide if you want to change the file or if you would like puppet to notice about it.

2. About sentinel, don't know how you are triggering the failover but we discovered a bug on versions prior to 2.8.12 on manual failovers, https://groups.google.com/forum/#!searchin/redis-db/rhommel/... so it was fixed in 2.8.12 and never seen redis taking more than a second.

antirez · on Sept 23, 2014

I'll try to reply to the different parts of your comment:

1) "Clearly afterthought". Sorry no actual argument.

2) Actively re-write the configuration: because of this (and the fact the configuration is fsynced) the basic Sentinel guarantees hold even in case the machine running the Sentinel processes crashes. Sentinel (and Redis) provide a full API for runtime reconfiguration, so I believe there are many ways to create automatic deployments. For people using puppet, it is possible to use the include files, it is not perfect but works most of the time. However this is a design decision, not a shortcoming IMHO.

3) Redis Sentinel has one of the faster failovers you can find around, I believe you tried a very old version. Example setting the down-after-milliseconds to 2000 (2 second timeout to consider the master failing, but note that this does not change anything, you can set it to 60 seconds, the point is, after the 60 seconds, how fast it is):

    $ date; redis-cli -p 9000 debug segfault
    Tue Sep 23 18:16:43 CEST 2014
    Error: Server closed the connection

    Slave logs: 2359:M 23 Sep 18:16:46.235 * MASTER MODE enabled (user request)

The slave was elected after 3 seconds, so the failover happened in 1 second.

    $ 2719:X 23 Sep 18:16:45.993 # +odown master mymaster 127.0.0.1 9000
    ... snip more logs here ...
    $ 2719:X 23 Sep 18:16:47.014 # +promoted-slave slave 127.0.0.1:9001

As you can see at ~46 the odown was reached, and at ~47 there was already the slave promoted to master (acknowledged via INFO output processing, not just sent).

4) Sentinel data loss has nothing to do with the speed of the failover (which is super fast), but mostly with partitions. The distributed system that you obtain summing Sentinel + Redis data stores is an eventually consistent system where the merge function is using the data set of the master with the greatest configEpoch (the latest promoted by the majority of Sentinels). This, and the fact that Redis uses asynchronous replication, means that isolated masters + clients, can process writes that will disappear. However Redis has configuration options (well documented in the Sentinel official doc) in order to bound the window of lost writes in a minority partition. The options I'm talking about allow isolated masters to stop accepting writes after some time no acknowledge is received by slaves.

janerik · on Sept 6, 2014

Could you expand on this? I want to use Sentinel in an upcoming project and would be glad to hear what downsides I'll face

devanti · on Sept 6, 2014

Just wondering, is there a reason why twitter doesn't use one of the many distributed in-memory database solutions? It seems like they had to write a lot of custom layering on top of it just to scale

NathanKP · on Sept 6, 2014

At a certain point of complexity and scale the in-house, custom distribution layer is almost always going to outperform a general purpose distribution system built into the database.

General purpose distributed database clusters are progressing, and if Riak, or one of the other in memory cluster focused system had been stable and production ready when Twitter was developing their cache layer it might have been a strong contender.

However, Riak is still much, much slower than Redis, especially when it comes to accepting writes. Overall, when you have the money that Twitter does and the team that it has you can come up with something better in house that it is more efficient for your use case. And that's what they've done here by building on top of Redis.

rch · on Sept 6, 2014

One could conceivably provide a Redis backend for Riak, if one were so inclined.