VoltDB is out, benchmarks against Cassandra

banjiewen · on May 26, 2010

This reads like marketing-blogspam to me. These two projects solve totally different problems, and the author acknowledges it, yet he still goes on to highlight benchmarks where VoltDB is 10-20x faster than Cassandra. Of course it's going to be! It's an in-memory store!

So why benchmark against Cassandra? It's got a lot of buzz around it, of course. What better way to shove your name into the "NoSQL" ring. Blech.

With regard to the numbers themselves, I would refer the author to the CouchDB boys' discussion of benchmarks (under "Good Benchmarks are Non-Trivial"): http://books.couchdb.org/relax/reference/high-performance

jefffoster · on May 26, 2010

According to their white paper, DBMSs apparently spend 35% of their time doing buffer management, 17% doing logging, another 19% doing latching and finally 21% doing locking. This leaves only 7% for "useful work". In comparison, VoltDB has 95% capacity for useful work.

Reads even more like marketing blogspam!
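For what it's worth, the quoted breakdown can be checked with a few lines of arithmetic. This is only a sketch: the dictionary keys are my own labels, the figures are the rounded ones quoted above, and the "ceiling" assumes (unrealistically) that all four overheads could be removed at zero cost.

```python
# Rounded overhead shares as quoted from the white paper (percent of time).
overhead_pct = {
    "buffer management": 35,
    "logging": 17,
    "latching": 19,
    "locking": 21,
}

total_overhead = sum(overhead_pct.values())  # share of time spent on overhead
useful_pct = 100 - total_overhead            # what is left for "useful work"

# Hypothetical ceiling: if every bit of overhead vanished, the same useful
# work would take useful_pct/100 of the time, so the speedup is at most:
max_speedup = 100 / useful_pct

print(f"overhead {total_overhead}%, useful {useful_pct}%, ceiling {max_speedup:.1f}x")
```

The rounded figures sum to 92%, which leaves 8% rather than the quoted 7%; presumably that is rounding in the source. Either way the ceiling lands somewhere around 12-14x.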

cx01 · on May 26, 2010

Why do you think it is blogspam? You can also read the H-Store papers if you're interested in the details.

hdiedrich · on May 27, 2010

http://cs-www.cs.yale.edu/homes/dna/papers/vldb07hstore.pdf

aweisberg · on May 26, 2010

"Cassandra writes to disk. VoltDB is an in-memory database. So I gave both systems plenty of RAM to hold the data set and turned Cassandra's consistency settings pretty low." So in memory in both cases. Cassandra has to write a log, but not synchronously.

skorgu · on May 26, 2010

I think Cassandra always writes to the Commit Log before returning success; it just doesn't update the SSTables yet:

> Commit logs receive every write made to a Cassandra node and have the potential to block client operations [1]

I'm no expert, so it may be possible to turn this off, but I couldn't find a reference to it.

[1] http://wiki.apache.org/cassandra/CassandraHardware

neilc · on May 26, 2010

> I think Cassandra always writes to the Commit Log before returning success

Sure, but the write is not the significant part (it is likely to be cached in memory); the question is how often the commit log is sync'ed to disk. I'm no Cassandra expert, but I believe the default is to fsync() the commit log periodically, but to allow operations to return successfully before an fsync() has occurred. There's also a mode to require fsync() before returning success for an operation.

http://wiki.apache.org/cassandra/Durability

aweisberg · on May 26, 2010

http://wiki.apache.org/cassandra/Durability

"Cassandra's example configuration shows CommitLogSync set to periodic, meaning that we sync the commitlog every CommitLogSyncPeriodInMS ms, so you can potentially lose up to that much data in a crash ... You can also select "batch" mode, where Cassandra will guarantee that it syncs before acknowledging writes, i.e., fully durable mode"

Cassandra has very fine grained control over just about everything to do with consistency and durability. I believe you can pick your desired level of consistency at access time.
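The periodic vs. batch distinction quoted above can be sketched with a toy append-only log. To be clear, this is not Cassandra's code: `CommitLog`, `append`, and `demo` are hypothetical names, and the "batch" branch here simply fsyncs on every write, whereas a real implementation would group several writes into one sync. The point is only where the acknowledgement happens relative to `fsync()`.

```python
import os
import tempfile
import time


class CommitLog:
    """Toy log illustrating the two sync modes; NOT Cassandra's implementation."""

    def __init__(self, path, mode="periodic", sync_period_ms=10_000):
        if mode not in ("periodic", "batch"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.sync_period = sync_period_ms / 1000.0
        self.fh = open(path, "ab")
        self.last_sync = time.monotonic()
        self.unsynced = 0  # writes acknowledged but not yet fsync'ed
        self.fsyncs = 0    # number of fsync() calls made so far

    def _sync(self):
        self.fh.flush()
        os.fsync(self.fh.fileno())
        self.fsyncs += 1
        self.unsynced = 0
        self.last_sync = time.monotonic()

    def append(self, record):
        self.fh.write(record + b"\n")
        self.unsynced += 1
        if self.mode == "batch":
            # Durable before the client hears back.
            self._sync()
        elif time.monotonic() - self.last_sync >= self.sync_period:
            # Periodic: sync only once the period has elapsed.
            self._sync()
        return True  # the acknowledgement sent to the client

    def close(self):
        self._sync()
        self.fh.close()


def demo():
    """Append three records in each mode; return (fsyncs, unsynced) per mode."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for mode in ("periodic", "batch"):
            log = CommitLog(os.path.join(tmp, mode + ".log"), mode=mode)
            for i in range(3):
                log.append(b"row-%d" % i)
            results[mode] = (log.fsyncs, log.unsynced)
            log.close()
    return results
```

In periodic mode the three acknowledged writes are still sitting unsynced, which is exactly the window you can lose in a crash; in batch mode nothing acknowledged is ever unsynced, at the cost of waiting on the disk.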

ergo98 · on May 26, 2010

>These two projects solve totally different problems

No they don't. This is the wavy-hands NoSQL defensive shield that reeks of insincerity. If you show Cassandra or Redis or some other solution replacing a MySQL install, well that's just awesome, but don't dare compare if it doesn't come out the winner.

A lot of people have workloads that could work in VoltDB, a classic RDBMS, or Cassandra, equally. There are workloads that only fit in specific silos, but they are less universal than you imply.

>So why benchmark against Cassandra? It's got a lot of buzz around it, of course. What a better way to shove your name into the "NoSQL" ring. Blech.

Okay this is just silly. Cassandra is the big name in the "next gen database" world -- of COURSE any new entrant is going to compare against it.

tlack · on May 26, 2010

That's like benchmarking Berkeley DB vs. MySQL. They are on totally different levels of complexity. You can't compare memory-only db performance against a disk based db, period.

ergo98 · on May 26, 2010

>You can't compare memory-only db performance against a disk based db, period.

But...you can. What do you mean you can't compare? Clearly you can, however mortified you might be at that prospect.

A reasonable motorcycle can go from 0-60 in about 4 seconds. A reasonable car can do it in about 9 seconds. But you need to carry two passengers so the car is your only option, and such a comparison doesn't matter to you, but to a lot of people it's interesting if ultimately they just want to get from A to B as quickly as possible. Then again if you want to transport goods maybe you need a truck, or a train.

This is so silly. Wait -- hand wavy -- that's right, nothing can be compared to Cassandra but pure love itself.

mileszs · on May 27, 2010

I don't think it's hand-wavy. I think you're upset about something else related to Cassandra that perhaps you read recently -- not tlack's comment. Suggesting that it would make more sense to compare an in-memory data store to another in-memory data store seems a perfectly valid suggestion.

"This is so silly. Wait -- hand wavy -- that's right, nothing can be compared to Cassandra but pure love itself."

C'mon, man. That doesn't further discussion. That sort of statement serves only to incite anger.

ergo98 · on May 27, 2010

> I think you're upset about something else related to Cassandra that perhaps you read recently

Huh? No, I love Cassandra. She's a beaut.

tlack didn't say "it would make more sense to compare an in-memory data store to another in-memory data store". They said "You can't compare memory-only db performance against a disk based db, period.". There's a pretty profound difference between those two statements.

tlack · on May 27, 2010

Of course you can compare them. You can compare the speed of Oracle on a huge RAC cluster vs. text files on an Amiga 500 floppy drive. But no one would, because it's stupid and worthless. I guess that's what I meant: this is a stupid and worthless article.

jhugg · on May 27, 2010

I didn't mean to claim anyone will be struggling to decide between VoltDB and Cassandra and then choose VoltDB based on the benchmarks we did. I think that's as ridiculous as you do.

Our point, which perhaps I made poorly, was twofold. 1. You can be both fast and SQL. Nothing about the language itself was ever the bottleneck. 2. VoltDB isn't just for big complicated transactions. You can use SQL for KV-type workloads and perform.

There's 100 other reasons to pick one data layer over another, and the best tool will be different for different problems.

ntoshev · on May 27, 2010

Why not compare it to Redis instead? They are both in-memory snapshotting stores; sure, Redis does less, but it is still a closer match. But Redis would probably be faster.

aweisberg · on May 27, 2010

Redis does not support partitioning, so it is even more apples to oranges than VoltDB vs. Cassandra. Both VoltDB and Cassandra rely on adding nodes to scale.

pquerna · on May 26, 2010

in memory vs not, will always have these kinds of results.

most people, most systems, aren't so hot about in memory datastores -- MySQL had MySQL-Cluster/NDB, and for lots of reasons it had trouble ever taking off.

For certain use cases, and definitely for benchmarking purposes, in memory datastores will always crush the competitors, but in the end, most people like reliable data storage.

jbellis · on May 26, 2010

Most use cases also don't have the entire data set equally active, and it's silly to pay to keep everything in RAM instead of just the hot parts.

hdiedrich · on May 27, 2010

That's part of the proposal: real world data requirements will soon completely fit into RAM available on a rather standard system.

cx01 · on May 26, 2010

In-memory doesn't imply "unreliable". If you need high-availability you'll want to use replication anyway.

jbellis · on May 26, 2010

In-memory with periodic snapshots sure implies "not durable," though. (That is, you will lose whatever data has not yet been snapshotted on a power failure or crash.) Replication doesn't fix that, unless you're willing to impose the latency of replication to different datacenters for each update. (I don't think Volt even supports this. Most systems don't.)

cx01 · on May 26, 2010

Why? If a single node fails, you still have the data on 2 other nodes. Nothing is lost. Of course, assuming you have a reliable UPS.

shpxnvz · on May 27, 2010

Because the data may not have replicated to the other nodes yet. As the parent comment noted, you could synchronously write to all replica nodes, but that's a big performance penalty and not partition tolerant.

hdiedrich · on May 27, 2010

They pay the penalty in latency, not in throughput. That's like no performance penalty at all if you work asynchronously. They make sure that the data cannot become inconsistent between nodes by ordering the transactions. The penalty that comes from this is that latency, i.e. the response time for individual transactions, can become high, while throughput stays maximized at millions of transactions a second. It's like an internal 'pumping rhythm' with tightly synched timestamps.
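The latency-vs-throughput point is basically Little's law: sustained throughput is the number of requests in flight divided by the time each one takes. A rough sketch with made-up numbers (the function name and the 5 ms / 10,000 in-flight figures are my assumptions, not measurements of VoltDB):

```python
def throughput_per_second(in_flight, latency_ms):
    """Little's law: throughput = concurrency / latency.

    in_flight  -- transactions outstanding at any moment (hypothetical)
    latency_ms -- time for one transaction to complete, in milliseconds
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return in_flight * 1000 / latency_ms


# A synchronous client with one request outstanding is bound by latency...
sync_client = throughput_per_second(in_flight=1, latency_ms=5)

# ...while asynchronous clients keeping many requests in flight are not.
async_clients = throughput_per_second(in_flight=10_000, latency_ms=5)

print(sync_client, async_clients)
```

Same 5 ms latency in both cases; only the amount of work kept in flight differs. That is why synchronous replication can hurt individual response times without capping aggregate throughput, provided clients do not block on each reply.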

aweisberg · on May 27, 2010

VoltDB replicates synchronously.

jhugg · on May 26, 2010

So if your entire data center loses power... If you have any notice (local UPS, etc), you snapshot to disk. If you have no notice, you lose some number of transactions depending on your data size, snapshot frequency and disk speed. Seconds or minutes, but probably not hours.

Future versions of VoltDB will do more to address this single-data-center catastrophe scenario.
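To put rough numbers on "depending on your data size, snapshot frequency and disk speed", here is a back-of-the-envelope sketch. The function names and every figure in it are hypothetical. It assumes a snapshot captures the state at the moment it starts, only counts once it is fully on disk, and finishes writing before the next one begins.

```python
def worst_case_loss_seconds(data_bytes, disk_bytes_per_s, snapshot_interval_s):
    """Longest stretch of transactions that a no-notice power loss could erase.

    Worst case: the crash hits just before the *next* snapshot finishes
    writing, so everything since the previous snapshot started is gone:
    one full interval plus one snapshot write time.
    """
    write_time_s = data_bytes / disk_bytes_per_s
    if write_time_s > snapshot_interval_s:
        raise ValueError("snapshot takes longer than the interval between snapshots")
    return snapshot_interval_s + write_time_s


def worst_case_lost_transactions(loss_seconds, tx_per_second):
    return loss_seconds * tx_per_second


# Hypothetical node: 16 GB of data, 200 MB/s disk, snapshot every 5 minutes.
window = worst_case_loss_seconds(16e9, 200e6, 300)
lost = worst_case_lost_transactions(window, 100_000)
print(f"up to {window:.0f} s, i.e. {lost:,.0f} transactions at 100K tx/s")
```

With these made-up inputs the window is a bit over six minutes, which fits the "seconds or minutes, but probably not hours" estimate. A larger data set or a slower disk pushes it up.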

aquark · on May 26, 2010

Can anyone suggest problem domains where it is useful to have the ability to do 100K writes per second, but not have durable transactions?

I can see the in memory story for read only queries, or transient data.

cx01 · on May 26, 2010

Transactions ARE durable if you use replication.

leej · on May 26, 2010

can you elaborate more on those reasons? (just asking)

seldo · on May 26, 2010

I'm at Gluecon, where Mike Stonebraker (CTO of VoltDB) gave a talk about Volt this morning. He's very clear that VoltDB is not a NoSQL product; it's a SQL product, but a next-generation SQL product. He says they've achieved enormous performance gains while keeping ACID compliance by throwing out tons of older tech still present in MySQL and Oracle.

kemiller · on May 26, 2010

What sorts of older things did they throw out? And is there not even eventual persistence with VoltDB? Is there a recovery log or some such? Otherwise it seems like one serious power event and you're toast.

beagle3 · on May 26, 2010

Well, a quick read of their white paper makes it look like they just copied the kdb design (http://cs.nyu.edu/shasha/papers/hpts.pdf - which they actually reference in one of their papers).

But perhaps they did something new as well?

edit: citeseer link was broken; replaced with link to shasha's website.

hdiedrich · on May 27, 2010

It's consistent, not only eventually consistent; it's k-safe, replicated, and it's written to disk consistently at intervals you set. http://community.voltdb.com/node/54 (may require login). It's really pretty neat.

aquark · on May 26, 2010

Isn't the D in ACID for 'durable'? The blog post implies that this is an in memory store only ... I feel I must be missing something.

KERMIT · on May 26, 2010

"Durability" in that sense has nothing to do with how the data is stored, or how long it's available for.

It simply means that after a transaction has been reported as having been committed successfully, it won't ever be rolled back.

KERMIT · on May 26, 2010

It makes sense why he's saying that. The people responsible for large, important data sets usually have lots of money to spend storing that data. Since they care about their data, they know better than to get involved with NoSQL databases, and actively shun anything that claims to be a NoSQL database. The real money will flow towards those providing real databases.

bradfordw · on May 26, 2010

If he wanted something a bit closer to the voltdb model, why not go with mongo since your writes aren't immediately put to disk? It's more of a measure of how quickly you're shuttling data over the wire.

When you agree that it's apples to oranges, why continue?

codavid · on May 27, 2010

My understanding is that mongodb is not durable (http://ivoras.sharanet.org/blog/tree/2010-02-20.mongodb-and-...) - I am not sure I understand how VoltDB is durable, though (but then I know nothing about db architecture).

aweisberg · on May 27, 2010

MongoDB partitioning support is still under development. Probably more apples to oranges than VoltDB vs. Cassandra.

aditya · on May 26, 2010

This is cool; eventually we'll have every no-sql use case covered, so there exists a data store which conforms to your application's usage with respect to the CAP theorem...

riffraff · on May 26, 2010

code and configurations or it didn't happen

codahale · on May 27, 2010

I see absolutely zero details about the actual configurations of either system. No replicability means it's just another opinion paper.

zipstudio · on May 26, 2010

I wonder if this DB will be available from any clouds?