MongoDB and Riak, In Context (and an apology) (seancribbs.com)
100 points by luigi on Nov 7, 2011 | 25 comments



"Shortly before JSConf, I had personally spent some time finding out ways to demonstrate that MongoDB will lose writes in the face of failure, to be used in a competitive comparison. Let’s just say that I was successful in doing so, despite recent improvements that 10gen has made. Unfortunately, I am not at liberty to share the results, nor do I think it would be constructive to this discussion. "

Why not? Did the author at least contact 10gen with the test case?


Disclaimer: I'm the competition.

It's pretty easy to demonstrate data loss with MongoDB if you're doing replication, but it's the "normal" behaviour because MongoDB uses asynchronous replication and W=1 writes by default.

Set up an n=3 cluster and a client that writes data continuously, like:

i = 0; while (true) { write(_id: i, data: i); getLastError(); /* block until the server acknowledges the write */ print(i); i++; }

Now kill the master. Another node will become the new master. Issue some more writes to make sure the logs of the new and the old master diverge. Now bring back the old master, which will become a secondary, and it will say something along the lines of "finding common oplog point", and it will discard the writes that it had that were not copied to the other nodes before it was killed.

You can verify all this by looking at the i's that were acknowledged by the old master and printed by the client. The last couple of them will be gone for good.
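The scenario above can be sketched as a minimal toy model in plain Python (my simplification, not MongoDB itself): the primary acknowledges each write as soon as it has it locally, a secondary copies the oplog with some lag, and on failover the old primary rolls back everything past the common oplog point.

```python
# Toy model of W=1 writes with asynchronous replication, and why
# acknowledged writes can be rolled back on failover.

class Node:
    def __init__(self):
        self.oplog = []          # ordered list of applied writes

def replicate(primary, secondary, up_to):
    """Asynchronously copy the primary's oplog; may lag behind."""
    secondary.oplog = primary.oplog[:up_to]

primary, secondary = Node(), Node()
acked = []

# Client writes i = 0..9; each is acknowledged by the primary alone (W=1).
for i in range(10):
    primary.oplog.append(i)
    acked.append(i)              # the client saw a successful getLastError

# Replication lags: only the first 7 writes reached the secondary.
replicate(primary, secondary, up_to=7)

# The primary dies and the secondary is elected. When the old primary
# returns, it finds the common oplog point and discards everything after.
common = len(secondary.oplog)
rolled_back = primary.oplog[common:]
primary.oplog = primary.oplog[:common]

print(rolled_back)   # the acknowledged writes that are gone for good
```

Running this shows writes 7, 8 and 9 were acknowledged to the client yet no longer exist anywhere in the replica set, which is exactly the "last couple of i's" effect described above.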

If this is unacceptable to you, you can run MongoDB in W=majority mode, but MongoDB's W>=2 modes (its so-called consistent replication modes) are very slow.


As a single master system, MongoDB doesn't allow the data on different nodes to become inconsistent or go into conflict. The idea is that avoiding this prevents developers from having to worry about (and clean up) conflicting data from different nodes.

The description above left out an important part of the process that occurs in this situation, but it is documented: http://www.mongodb.org/display/DOCS/Replica+Sets+-+Rollbacks . Note that the data that has been rolled back is saved to a file so that it can be applied again if so desired.

As mentioned here, if you don't like that behavior, you can use write concerns, and require W=2 (or more) and wait for the writes to be replicated. Of course, there's a performance cost to doing that, but you can choose.
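The guarantee you buy with W=2 (or majority) can be sketched with another small toy model (again my simplification, not MongoDB's implementation): a write is acknowledged only after a majority of nodes has it, so any surviving majority still holds every acknowledged write.

```python
# Toy model of majority-acknowledged writes in a 3-node replica set.

N = 3
MAJORITY = N // 2 + 1            # 2 of 3

oplogs = [[] for _ in range(N)]  # one oplog per node
acked = []

def write_majority(value, reachable):
    """Apply value on every reachable node; ack only if a majority got it."""
    for node in reachable:
        oplogs[node].append(value)
    ok = len(reachable) >= MAJORITY
    if ok:
        acked.append(value)
    return ok

write_majority("a", reachable=[0, 1, 2])  # all nodes up: acked
write_majority("b", reachable=[0, 1])     # bare majority: acked
write_majority("c", reachable=[0])        # minority: client gets no ack

# Even if node 0 (the primary) dies now, node 1 holds every acked write.
survivors_have_all = all(v in oplogs[1] for v in acked)
print(survivors_have_all)
```

The cost is the extra round trip per write before acknowledging, which is the performance trade-off both commenters point at.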


Yes, but from an application perspective your database will be left in an inconsistent state. The next morning the ops guys will have to call the dev guys to "repair" the database by hand.

The truth is, ScalienDB runs in W=3 mode at about the speed of MongoDB in W=1 mode, so this is not a trade-off that customers have to make.


Does ScalienDB support WAN replication?


Not yet. ScalienDB uses synchronous replication, which only works inside the datacenter. Cross-datacenter replication is a completely different use-case, coming in 2012.


I understand the difference. Just wanted to know whether to spend time evaluating it or not. :)


"If this is unacceptable to you, then you can run MongoDB with W=majority mode, but with MongoDB W>=2 modes (so-called consistent replication modes) are very slow."

Serious question: are the consensus modes slower than for any other system (e.g. HBase, Cassandra), or is this just a re-statement of the fact that writing to N > 1 machines is inherently slower than writing to a single machine?


You have to be careful here, as MongoDB and Cassandra use a different model of replication.

Cassandra does not perform replication/synchronization on a per command basis between the nodes. Roughly: the client writes to multiple nodes, which are mostly independent, so assuming client bandwidth is not the bottleneck, writing to W=2 nodes will not be much slower than W=1. In practice, since Cassandra's disk storage subsystem is also fast for writes, it's overall very fast at raw writes. (As in, fastest in my benchmarks.) The trade-off is that its replication model is eventual consistency, and reads are somewhat slowish. On the other hand, their model works well in a multi-datacenter environment (along with Riak).

MongoDB uses an asynchronous replication model. What seems to happen if you specify W=2 is that the master doesn't ACK the write to the client until one of the slaves has copied it off the master. In my measurements W=2 ran at a fixed ~30 writes/sec on EC2, which means this mode may as well not be there. (This W=2 performance problem was also verified by customers looking at MongoDB.)

If you look at my company's product, ScalienDB, it uses a highly optimized synchronous replication model (Paxos) and a storage engine designed for it. It's actually faster running in W=3 mode than certain other NoSQL stores in W=1 mode. My bet is that this is what enterprises are going to want if they're going to use a NoSQL database as a primary-copy database.

(Test for yourself: all the products are open source, and it'll cost you less than $20 on AWS.)


1) The explanation of Cassandra isn't quite correct. See http://www.datastax.com/docs/1.0/cluster_architecture/replic... for details.

2) Cassandra read performance is on par with writes now: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-...

3) Your explanation doesn't make sense to me. No matter the value of W, MongoDB should make best efforts to get it to all the replicas, no? So lower W should affect availability and perhaps latency but throughput should be unaffected, given a benchmark with sufficient client threads.
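The throughput-vs-latency distinction in point 3 can be made concrete with Little's law (throughput ≈ outstanding requests / per-request latency). The ~30 writes/sec figure quoted upthread is what a single synchronous client would see at roughly 33 ms ack latency, which I'm back-deriving here as an assumption; if the waits overlap across clients, throughput should scale far past that:

```python
# Back-of-envelope: if each W=2 write waits ~33 ms for a secondary to
# copy it, one synchronous client tops out near 30 writes/sec, but with
# many concurrent writers whose waits overlap, throughput should scale
# linearly -- unless the server serializes the replication wait somewhere.

latency_s = 0.033                      # assumed per-write ack latency

def throughput(concurrent_clients):
    """Little's law: in-flight requests divided by per-request latency."""
    return concurrent_clients / latency_s

print(round(throughput(1)))    # ~30 writes/sec, matching the benchmark
print(round(throughput(100)))
```

So if a fixed ~30 writes/sec was observed regardless of client concurrency, that points at a serialization bottleneck rather than an inherent cost of W=2.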


My Cassandra benchmarks were performed a couple of months ago.

Turns out MongoDB doesn't scale well with the number of connections due to software issues (e.g. one thread per connection instead of async I/O). At about 500 connections, Mongo starts to break on the platform we tested.


I appreciated the tone and contents of the article until this paragraph. Revealing the details of the flaw would allow MongoDB supporters to continue the discussion. I would like to know if it's related to a design choice or if it's a trivial bug.


That was what stuck out to me from this, too... sounded like "nyanyanyana, I found a critical bug in your product - but I won't tell you what it is"


Really? Maybe I'm just an optimist, but it sounded more like "Every product has its bugs, and there's no point in pointing fingers here when it's not the point of this blog post."

I'm sure that Riak has bugs too - there are very few software products that DON'T (the provably-correct C compiler CompCert[1] being one that shouldn't). But it seemed to me that the post's author was simply trying to avoid getting sidetracked.

[1] http://compcert.inria.fr/compcert-C.html


I read it with this tone also, but the "I'm not at liberty..." part is different from "I don't want to bash them here," and is a little troubling.


That sounded very wrong to me too. Rightfully or not, I immediately got the image of Basho using their "secret" bugs to demo a MongoDB fail to a big potential customer.


I will now take 30 seconds to continuously click my screen where the "upvote" button initially appeared.

A test case that confirms data loss would actually be one of the few things that COULD make this MongoDB discussion constructive.


WTF guys? It's not like it takes a genius to devise one, given how Mongo works. There's even one given above...


I have strong knowledge of and some good experience with MongoDB, and now, at my new job, we will use Riak as the primary server (it's a new project that's just starting). And while its user-friendliness is sometimes a bit horrifying in comparison to MongoDB (I hope I can help change that), Riak's real beauty is its architecture and main ideas.

And yes, these are just different databases. MongoDB is much more RDBMS-like (in terms of consistency, B-Tree indexes and queries), while Riak is what you think of when you think "auto-scalable". So now I love them both :-)


If you:

  1. evaluate the default settings with which MongoDB ships and
  2. evaluate MongoDB performance with more strict,
     consistent options and learn that it's quite slow and
  3. look at my comment above
... then you will quickly see that MongoDB is not RDBMS-like at all in terms of data safety and consistency goals.

Currently 10gen is shooting for light-weight web workloads to spread the product, and they're very successful.


Well, yes, it would be fair to say that when you need to use MongoDB instead of an RDBMS, you have to make safe=True commits all the time (otherwise you can really lose some of them). I didn't do any benchmarks, because for usage like that I am completely OK with an RDBMS, and I use MongoDB in more "warehouse"-like use cases (statistics, caching, long-running map/reduce on data, etc.)

So MongoDB gets its own niche: use it for "not 100% important data", and in exchange you get really high availability of the MongoDB cluster, really high speed, and document storage.

What you said about the speed of safe commits is really interesting to me. I mean, of course they're slow sequentially, but in parallel they should be just as fast. Or am I wrong here?


MongoDB has very nice tunability. The new version ships with journaling turned on by default.

You can specify `safe`, which means the data will be written to the journal. You can also specify `fsync`, which means the system will additionally issue an fsync on the journal. Then there's the question of the replication protocol, `W` in MongoDB. What I said is that if you turn on `W=2` it will be slow. You should also know that even with `safe` and `fsync` you may lose data when running in `W=1` mode if the master of your replica set goes down. See my comment above and the reply from a MongoDB guy.
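How these options stack can be summarized with a small sketch (my paraphrase of the comment above, not MongoDB's actual API): each flag adds one more condition a write must satisfy before the client is acknowledged.

```python
# Toy summary of MongoDB's durability ladder as described above:
# each option makes the server wait for strictly more before acking.

def ack_conditions(safe=False, fsync=False, w=1):
    """Return what must happen before the client's write is acknowledged."""
    conditions = ["received by the primary"]              # always
    if safe:
        conditions.append("written to the journal")
    if fsync:
        conditions.append("journal fsynced to disk")
    if w > 1:
        conditions.append(f"copied to {w - 1} secondary node(s)")
    return conditions

print(ack_conditions(safe=True, fsync=True, w=2))
```

Note that only the `w > 1` rung protects against the failover rollback discussed upthread; `safe` and `fsync` harden a single node, not the replica set.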


Yes, I understood what you said about W=2 and about safe. Maybe what I wrote wasn't too clear :-)


"Have you tried out Riak 1.0? It’s awesome."

There appears to be a bug in Riak 1.0, rendering it totally useless, at least with the NodeJS Driver:

https://github.com/frank06/riak-js/issues/97#issuecomment-25...

tl;dr: Executing a save, then a remove, then a getAll yields a 500 error.

...so we moved our app to MongoDB. Critical mass in the community, and the myriad of 3rd-party drivers and tools that depend on each other, mean I'd vouch that this kind of issue has a better chance of being fixed quickly if it occurred in the NodeJS MongoDB native driver.

In our case, we don't yet have any specific need for a NoSQL DB, just one that works, and this experience gave me no confidence in the Riak ecosystem. Sorry guys.


"Unfortunately, I am not at liberty to share the results, nor do I think it would be constructive to this discussion."

- I don't understand this mindset. Forgive me for being naive, but if we're to "toast to the challenges and all become stronger, more proficient, and more successful as a result," then why don't we share these edge cases and fix them?



