How We Partitioned Airbnb’s Main Database in Two Weeks (airbnb.com)
97 points by juanrossi on Oct 10, 2015 | 17 comments



This really shows the value of a few good engineers. I can easily see this taking a larger team much longer, but in this case a few engineers figured out the least painful solution and implemented it with minimal downtime.


RDS wouldn't be my first choice for a production environment like Airbnb's...


How come? Isn't that exactly what RDS is designed for?


RDS is just fine for production environments. Any tuning you need can be done through parameter groups; your only real limit might be the connection cap RDS imposes (based on instance size).
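
For instance, adjusting a couple of MySQL settings via boto3 looks roughly like this (a minimal sketch -- the parameter group name and values are made up for illustration, not anything specific to this setup):

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # Adjust settings on the DB parameter group attached to the instance.
    rds.modify_db_parameter_group(
        DBParameterGroupName="myapp-mysql56",  # hypothetical parameter group name
        Parameters=[
            {
                "ParameterName": "max_connections",
                "ParameterValue": "2000",
                "ApplyMethod": "pending-reboot",  # picked up on the next reboot
            },
            {
                "ParameterName": "innodb_flush_log_at_trx_commit",
                "ParameterValue": "2",
                "ApplyMethod": "immediate",       # dynamic parameter, applied right away
            },
        ],
    )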


In my experience (a few years ago, to be honest), RDS was absolutely terrible for high-performance and/or latency-sensitive write workloads, due to how replication was handled at the time: Amazon apparently did (at the time? still does?) synchronous writes to each AZ, and only completed the transaction when both returned. When one AZ/RDS instance was slow or dropping packets (which seemed oddly frequent for cross-AZ traffic -- again, about 3 years ago), our production stack would catch fire and come to a crashing halt. Never again!
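
A toy model of why that hurts (not AWS internals, just illustrating that a synchronous commit is gated by the slower AZ's acknowledgement):

    import random

    def az_ack_ms(flaky: bool) -> float:
        """Simulated time for one AZ to acknowledge a write."""
        base = random.uniform(1.0, 3.0)                 # normal ack time
        if flaky and random.random() < 0.02:            # occasional cross-AZ stall / packet loss
            base += random.uniform(200.0, 1000.0)
        return base

    def commit_latency_ms() -> float:
        primary = az_ack_ms(flaky=False)
        standby = az_ack_ms(flaky=True)                 # the misbehaving AZ
        return max(primary, standby)                    # commit returns only when both have acked

    samples = sorted(commit_latency_ms() for _ in range(10_000))
    print("p50:", round(samples[5_000], 1), "ms  p99:", round(samples[9_900], 1), "ms")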


Hi, one of the Airbnb engineers involved in this op here. Yeah... that does sound a lot like 3 years ago. The situation has gotten a lot better, especially with 5.6 and PIOPS. These days, things work pretty smoothly (even as the volume of traffic and data has scaled massively).


Ah. That is great to hear! It was a special kind of hell having to deal with it with such regularity, at any and all hours (4am?! of course!). ;_;


Is there any visibility into how this is actually implemented? I was considering switching from self-hosted Postgres on EC2 to RDS, but that delay would certainly be an issue.


No, there was zero visibility. Again, this was just over 3 years ago, so my memory isn't super fresh, but we managed to wring bits and pieces of info out of our paid AWS support after we kept getting such horrible intermittent write latency.

We later tried just running MySQL ourselves on big instances, RAIDing across a large number of EBS volumes... and we ended up running into other weird issues with that too. We would sometimes get terrible write latency spikes, which we were told were the result of "stuck blocks on the SAN". Apparently the backing SAN would sometimes have blocks that performed very poorly (maybe a disk under high contention in one SAN cabinet?), and this would cause our overall database performance to plummet, but only irregularly.

We would usually get on the horn, and after talking with someone it would either magically stop being slow ("we don't see any problems here!") or we would be told about some "stuck blocks" and they would do some kind of remapping or migration of those blocks. It was never very transparent to us what was really going on. Sometimes we would just spin up new instances and EBS volumes and do perf testing on them until we got a set that performed consistently, until something goofy happened again a week or so later. Pretty awful. We tried local instance storage, but it just wasn't fast enough (the primary reason), and it felt a bit dangerous -- even though we backed up to an EBS volume pretty regularly.
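
Rough idea of that kind of per-volume write-latency check (hypothetical mount points; a real test would more likely use fio with O_DIRECT and realistic block sizes / queue depths):

    import os, statistics, time

    CANDIDATE_MOUNTS = ["/mnt/ebs-a", "/mnt/ebs-b", "/mnt/ebs-c"]  # hypothetical volumes
    BLOCK = b"\0" * 4096

    def write_latencies_ms(mount, iters=1000):
        """Time small fsync'd writes so the device, not the page cache, is measured."""
        path = os.path.join(mount, "latency_probe.dat")
        lat = []
        with open(path, "wb", buffering=0) as f:
            for _ in range(iters):
                t0 = time.perf_counter()
                f.write(BLOCK)
                os.fsync(f.fileno())
                lat.append((time.perf_counter() - t0) * 1000.0)
        os.remove(path)
        return sorted(lat)

    for mount in CANDIDATE_MOUNTS:
        lat = write_latencies_ms(mount)
        print(f"{mount}: median {statistics.median(lat):.2f} ms, p99 {lat[int(len(lat) * 0.99)]:.2f} ms")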

We ended up bailing from AWS and saw huge performance improvements (reduction in latency, etc.) by using SSDs and real hardware. We actually ended up with some cost savings, too! Not long after we left, Amazon came out with local SSD storage (High I/O instances, I think they were called?), which might have been workable, but by then we had migrated away (we still used S3, though, and still used on-demand instances for developers).


Got it, thanks for clarifying. The transition-to-bare-metal story doesn't seem to be nearly as common nowadays, although it does still happen. We've been mostly lucky with AWS performance so far, but this certainly isn't super encouraging, especially given the apparent performance unpredictability here.


Though if one part of your stack is going to be on the metal, the DB is a good choice.


Failing over between AZs is a multi-minute process, and sometimes it doesn't work, at which point you have to get someone on the phone.


I would not consider 120 seconds to be excessive when you're failing over to a new master from the hot slave.


Ouch, isn't automatic failover one of the big selling points of RDS?


From what I've seen, that failover uses DNS. The endpoint stays the same, but it gets pointed at a new IP address... and the app may continue to use the cached IP from an earlier DNS query. I had to write a daemon that listens for RDS events and restarts our app if it detects a failover event :(
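
Roughly, something like this (a minimal boto3 polling sketch -- the instance identifier and restart command are placeholders, and an SNS event subscription would avoid the polling):

    import subprocess
    import time

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")
    INSTANCE_ID = "main-db"                            # placeholder RDS instance identifier
    RESTART_CMD = ["systemctl", "restart", "my-app"]   # placeholder app restart hook

    while True:
        resp = rds.describe_events(
            SourceIdentifier=INSTANCE_ID,
            SourceType="db-instance",
            Duration=5,                    # look back over the last 5 minutes
            EventCategories=["failover"],
        )
        if resp["Events"]:
            # Restart so the app re-resolves the endpoint and picks up the new primary's IP.
            subprocess.run(RESTART_CMD, check=False)
            time.sleep(300)                # skip past the event window we just handled
        else:
            time.sleep(30)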


Those who forget Oracle are doomed to reinvent it.


Those who forget open source databases are doomed to pay per-processor licensing fees.




