Scaling Riak at Kiip (basho.com)
91 points by oinksoft on May 24, 2012 | 39 comments



It's been a riak-y day for me, so I watched this avidly.

You're no doubt wondering what the issues were; they come 20 minutes in, and were (paraphrased):

* At scale in production, adding a new node took days to complete all the handoffs; they recommend adding new nodes as soon as it's looking like you need them, rather than waiting until you're redlining.

* 2i is slow, especially in EC2; a straight KV "get" is milliseconds-denominated; 2i index queries were taking multiple seconds. Use 2i, they say, but in background processes.

* Javascript MapReduce is slow; this is well known. They confirm Erlang MR was adequate.

* As the LevelDB keyspace grows, there's a step function in latency: 5ms, then 15ms, then 25ms; the solution is to add nodes. (LevelDB is Google's KV store, a new storage backend option for Riak, required if you're using secondary indexes.)

* Riak Control didn't work for them over high-latency connections.

* Once, a Chef misconfiguration left the whole cluster flapping on and off, which corrupted the cluster; they recovered with Basho support. Be careful about adding and removing nodes rapidly.

* Similarly, flapping a single node caused the cluster to get into a state where it wouldn't converge again; the cluster worked but no nodes could be added until they (presumably?) restarted it.
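The LevelDB latency steps are consistent with how its leveled layout works: the LevelDB docs describe level L as holding roughly 10^L MB, and a point read may have to consult one SSTable per populated level, so each level the dataset grows into can add another disk seek. A back-of-envelope sketch of that model (a simplification; level-0 behaves differently in real LevelDB):

```python
# Rough model of LevelDB's leveled layout: level L holds about 10**L MB
# (level 1 = 10 MB, level 2 = 100 MB, ...).  A point read may consult
# one SSTable per populated level, so worst-case seek count grows
# stepwise with total data size -- matching the 5ms/15ms/25ms steps.

def populated_levels(total_mb):
    """Number of levels needed to hold total_mb of data."""
    level, capacity = 1, 10
    while total_mb > capacity:
        level += 1
        capacity += 10 ** level
    return level

print(populated_levels(8))       # 1 level  -> one seek worst case
print(populated_levels(500))     # 3 levels -> latency has stepped up twice
print(populated_levels(50_000))  # 5 levels
```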


After watching the video, the takeaway for me was don't run $database on EC2 instances.

Last night I just finished replacing 6 maxed out medium instances with one $100 box from SoftLayer. :/


At that price point, it seems that you can get FAR better specs on Honelive. Kimsufi.ie is even cheaper, if you don't mind your server being hosted in the UK.

edit:

SoftLayer: Intel 4x 2.40 GHz, 2 GB ECC RAM, $160/mo

Honelive: Intel i7-2600, 16 GB RAM, $120/mo

Kimsufi.ie: Intel i7, 4x 2(HT)x 2.66+ GHz, 24 GB RAM, $60/mo


Have you heard any reviews of Honelive? I've been looking for a dedicated server provider in the US with comparable prices to Hetzner in Germany.


I get a lot of nice little extras from SoftLayer. For example I can get boxes in the 5 corners of the internet (west, central, east, euro, asia) with free bandwidth between them.


Has anyone used Kimsufi? Are they any good? I'm at Hetzner right now, but I never found any use for that box, so I'm just paying for it for nothing...


I just signed up for Kimsufi. They're the budget brand of OVH. WebHostingTalk gives them favorable reviews.


And if you do, for God's sake don't run them on EBS volumes. If you don't know by now that EBS performance is highly variable, you really haven't been paying attention.


Very useful data. Do you have anything more comprehensive about your experience?


I don't work at Kiip. :)


I thought you meant to say you had additional personal experience, but perhaps you've just been reading about Riak today.

Thanks anyway.


I've had a 3-node hardware Riak cluster lying around for about 4 months now, but only just today cut over from Postgres to it. I can talk to you about how shiny Riak is on day 1, but as this presentation shows, the day 1 behavior of a Riak cluster can be a bit of a trap.


Great to hear that you're cutting over to Riak. We highly recommend that your minimum cluster size be your N value (replication factor) + 2. In your case that likely means 5 nodes for your starting cluster, not 3. There are many reasons: http://basho.com/blog/technical/2012/04/27/Why-Your-Riak-Clu...


Obviously great to know. Thanks!


Thomas, I'm in the middle of evaluating document stores and am perhaps getting stars in my eyes from Riak's easy-to-scale story. I'd love to hear more about your experiences once your cluster has had time to marinate a bit.


Interestingly enough, this is a very similar list to what you'd see for Mongo, especially in terms of overall effort.

That you'd encounter all these things at a 25mm daily ops level is pretty odd, though.


The very same folks did write about their MongoDB experience: http://blog.engineering.kiip.me/post/20988881092/a-year-with...


I think the point he was making was that the company in question never tried to scale horizontally with MongoDB because, in their words, "we believe horizontally scaling shouldn’t be necessary for the relatively small amount of ops per second we were sending to MongoDB."

Yet, they went and scaled horizontally with Riak and experienced pain.

Their opinion that it did not make sense to horizontally scale the "relatively small" number of ops they were sending to MongoDB is certainly their own, but then they horizontally scaled with Riak anyway and boasted about their 25MM ops per day ... which, averaged out, is only about 290 ops per second.

In short, it was far from an apples to apples comparison.
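Checking the back-of-envelope number:

```python
# Averaging the quoted 25MM ops/day over a full day
ops_per_day = 25_000_000
seconds_per_day = 24 * 60 * 60            # 86,400
print(round(ops_per_day / seconds_per_day))  # 289
```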


It's probably significantly more than that daily average at peak times.


"At scale in production, adding a new node took days to complete all the handoffs..."

That's a bit of a headscratcher. What is happening during those 'days' and what is the primary limiting factor?

I keep meaning to get into Riak, but then stuff like this, where the system has crazy moments that are impossible to reason about coherently, keeps popping up.


It's not a crazy moment. They have a system running at real scale, and they found that, while keeping the system up constantly with immense amounts of data in it, they were able to dynamically add a new node to their cluster; it's just that balancing everything out took the system a lot of time.

The operational challenge I infer from this is that they had waited to add that node until they really needed it, because they expected adding the node to bring quick relief from their scaling issue. Instead, relief came a few days later, once the node was fully integrated.

Solution: don't wait to add nodes until the last minute.


Something they (understandably) didn't address in the video was fault tolerance on the Postgres side. What do people like these days for that?


I'll address that here: we use streaming replication with a hot standby for fault tolerance on the PostgreSQL side. For backups, in addition to the streaming replication, we ship WAL to S3 every 15 minutes or 16 MB (whichever comes first), and we take a full base backup to S3 daily during our low-traffic time of day. That uploads around 30 GB to S3 daily, LZO-compressed down to around 3 GB.
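For anyone unfamiliar with this setup, it maps onto roughly these 9.1-era primary-side settings. The archive_command is illustrative: WAL-E is one common tool for S3 WAL shipping with LZO compression, but the comment doesn't name the actual tooling used.

```ini
# postgresql.conf on the primary (PostgreSQL 9.1-era, illustrative)
wal_level = hot_standby          # WAL carries enough info for a hot standby
max_wal_senders = 3              # let the standby stream WAL
archive_mode = on
# Ship each 16 MB WAL segment to S3; WAL-E is one common choice
archive_command = 'wal-e wal-push %p'
archive_timeout = 900            # force a segment switch every 15 minutes
```

The standby then runs with standby_mode enabled and a primary_conninfo pointing back at the primary, and the daily base backup gives WAL replay a recent starting point.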


Thanks, Mitchell. Is your hot standby in a different zone (e.g. one in the West, one in the East) so you can handle one of their big outages?


Yes


In the last couple of months reading this site, I've seen more negative MongoDB reviews than for most other 'cool tech'. On one hand, MongoDB was pushed, by carrot or stick, into environments with high-write needs (OLTP-type systems). On the other hand, having true secondary indexes and semistructured data makes Mongo the 'closest to an RDBMS' choice. So things like the global write lock, indexes needing to fit in memory, auto-sharding questions, and single-threaded map-reduce are all pretty significant limitations for an OLTP data store. I hope MongoDB doesn't get discouraged, and instead steps back, reviews the academic foundations of the system, picks a couple of use cases, and optimizes builds for them (similar to data-warehouse vs. OLTP offerings).


The glossing over of Cassandra (because of the Digg history, etc.) was the point in the talk where I started raising my eyebrows. Engineers evaluating systems should let the systems speak for themselves.

We have a fairly small cassandra cluster in production, serving over 50x the volume they mentioned in their talk, with good latency (real-time bidding) and not-too-painful operational footprint.


Great talk! Looking at Riak vs MongoDB right now for a production system in fact. The data isn't K/V though and we need rich queries so I'm not sure what our solution will be unfortunately.


Why not just use postgresql?


Agreed, start with a known entity. Take the one special table that is killing you and move it to a special database (Riak/Cassandra etc) suited for that task only once you understand exactly what you need.


I think Kiip agrees with you completely too.


I find it a better course of action to work on properly understanding your data problem and then evaluate the options available instead of just sticking with the stand-by solution ("Well, we've always done x!").

Just like with anything in technology ... there is going to be pain as you escalate the level of complexity and what you are trying to accomplish.


I disagree, based on experience. On the surface, MongoDB did solve our data problem, and we thought we had researched the technology sufficiently to know it would work, even attending multiple MongoDB conferences and talking to 10gen employees.

The issue was that underlying architectural decisions in MongoDB ended up biting us and limiting us rather significantly. It could be argued that this is because MongoDB is a rather new piece of tech (I disagree, I think MongoDB is fundamentally flawed, but it doesn't matter in this argument).

Because of our experience, going with the standby IS the best choice, until you _need_ something else. Riak was a change necessitated by Kiip's fast growth, which made horizontal scaling necessary in an environment such as EC2. Ignoring MongoDB specifically, the IDEA of MongoDB simply isn't correct here; a Dynamo-style K/V store is the correct option, and Riak happens to be a fantastic one.


Because PostgreSQL is not a NoSQL database, despite the recent lipstick additions.

I love the database but it is not optimal if you have a fluid schema like many use cases have today.


I recently asked if anyone could point me to an (open source) app on github or wherever that has a 'fluid schema' on a NoSQL system and nobody was able to show me one. Do you have a concrete example I can look at please?


Nothing? For something as common as you say I'd have thought there would be many examples out there. I don't get how this can be touted so often when there's no concrete use cases for this 'feature'.


I think that you would find MongoDB to work nicely for you ... especially with the need for rich queries (which assumes, on my part, that you have more complex data needs, documents, etc.).

With all due respect to the Kiip engineering team, this wasn't a strong case for using Riak over MongoDB ... but rather for the general pain that an engineering team feels when horizontally scaling in the cloud.


Anything concrete about why cassandra is bad?..


One key difference is that Riak will automatically rebalance data across nodes as they are added or removed; Cassandra will not. You have to manually adjust the partitioning of data, balancing it by hand.
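For context on the "by hand" part: with the RandomPartitioner in the Cassandra of this era (before virtual nodes), operators typically computed each node's initial_token themselves by evenly dividing the 0..2^127 token ring, then set one token per node in cassandra.yaml; adding a node meant recomputing tokens and rebalancing with `nodetool move`. The standard formula is just:

```python
# Evenly spaced initial_token values for Cassandra's RandomPartitioner,
# whose token ring spans 0 .. 2**127.  Pre-vnodes, operators computed
# these by hand and assigned one per node in cassandra.yaml.
def initial_tokens(node_count):
    step = 2 ** 127 // node_count
    return [i * step for i in range(node_count)]

for token in initial_tokens(4):
    print(token)
```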




