MemcacheDB: A distributed key-value storage system designed for persistence (memcachedb.org)
24 points by chaostheory on Jan 5, 2009 | 17 comments



This seems very cool.

I could see this going a step further and merging with an actual cache: keep much of the frequently used data in memory while still writing it to disk so it's persistent. I'm definitely going to tinker around with this.

MemcacheQ (a message queuing system), linked on that page, seems like it will solve a problem I'd been thinking about recently for my system.

At first glance, this looks like some pretty useful stuff overall.


MemcacheQ seems especially interesting! That said, the 64k-1 limit on message body length is a bit frustrating...


I've used an earlier version of this, and unless things have changed since spring 2008, it isn't usable with a high volume of traffic: I benchmarked it by having multiple fork()'d processes load data from a "partitioned" MySQL database, and at one point memcachedb just gave up and quit reading from the UNIX domain socket.

A possible alternative is Tokyo Cabinet: it's a DBM implementation (BDB, on which memcachedb is based, is another) that memory-maps the entire hash bucket region. Tokyo Cabinet also has an RPC layer built on top of it (Tokyo Tyrant) which specifically has a memcached compatibility mode. I haven't tried it yet, but I'd be interested to see what results people have with it:

http://tokyocabinet.sourceforge.net/index.html

http://tokyocabinet.sourceforge.net/tyrantdoc/


For those who might not know how this speedup is possible:

Traditional databases use data structures like b-tree indexes. These are awesome! They look up data in log(n) time (n being the size of your dataset). That means that as your data grows, the time to execute a single query doesn't grow so much. So, what's the problem with databases? Well, the problem is that as your data set grows, you're likely getting more queries as well. So, you end up with m * log(n) where m is the number of queries that users are making. Boo!
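
To put rough, made-up numbers on that, just to show the shape of the growth:

    import math

    rows = 1000000                   # n, the size of the data set
    per_query = math.log2(rows)      # an index lookup costs on the order of log(n); base 2 just for illustration, ~20 here

    queries_per_sec = 5000           # m, which tends to grow along with the data
    total_work = queries_per_sec * per_query
    print(per_query, total_work)     # ~20 "steps" per query, ~100,000 per second overall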

What about throwing hardware at it? I hear that's all the rage! The problem with that is that it's really, really difficult to split data across more than a handful of machines. Replication is good for a few machines, but eventually you hit a wall with writes since writes must happen on every server in the cluster.

Or do they? There's a technique called sharding that you may have heard of. Basically, it's putting some articles on server 1 and some articles on server 2. Then, when you need article 450, you need to figure out what server it's on and get it. This is what MemcacheDB (and memcached) do. They hash the key you give them, and that tells them which server it will be on without having to actually ask each server whether it has it. That's a constant-time process, so it scales infinitely (unlike those hosting companies that claim they scale infinitely). All you have to do is keep throwing boxes at it as you grow.
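
A minimal sketch of that key-to-server mapping (the server addresses and key names are made up; real clients typically use fancier, often consistent, hashing):

    import hashlib

    servers = ["10.0.0.1:21201", "10.0.0.2:21201", "10.0.0.3:21201"]

    def server_for(key):
        # Hash the key and map it to a server in constant time;
        # nobody has to ask each server "do you have article 450?".
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return servers[h % len(servers)]

    print(server_for("article:450"))  # the same key always maps to the same server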

Awesome! That's so much better than MySQL! Well, no. I mean, it is in some ways, but you can't say something like "get me the latest article that someone has commented on". That might be useful to your site, might not. All you can do is look up by key.

It gets hard when you want to do something such as getting the comments for an article. You can't say, "get me the comments for article 450." You have to actually store a list of the comment ids in the article object, fetch the article, and then request the comments based on that list. Worse, let's say two people comment on an article at the same time. You'd better start a transaction, read the object, modify the list, and save it. If you don't, you'll hit race conditions where a comment's association gets lost (something you don't have to worry about with foreign-key associations).
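
To make that read-modify-write concrete, here's a rough sketch using the memcached protocol's gets/cas (check-and-set) commands; the `mc` client and key layout are illustrative, and whether your MemcacheDB build actually honours cas is worth checking:

    def add_comment(mc, article_key, comment_id, retries=10):
        # Optimistic locking: re-read and retry instead of holding a lock.
        for _ in range(retries):
            article = mc.gets(article_key)      # read the object plus its cas token
            article["comment_ids"].append(comment_id)
            if mc.cas(article_key, article):    # fails if someone else wrote in between
                return True
        return False                            # too many collisions; give up or fall back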

Plus, beyond that trouble, you still can't say "get me the article with the most comments".

To an extent, I'm wondering how this is much better than memcached. If you still need/want an RDBMS for query capabilities, do you really need another key-value store that saves to disk? memcached can always be reconstituted from the RDBMS if a server fails, but I guess MemcacheDB forgoes that warm-up process (similar to the Shop.com Cache), and that can be a bonus. However, it's a lot slower than memcached due to the disk access.

It looks really interesting, but I'm not sure how useful it would be (I should say, in my line of work). I won't give up queries, and their site says emphatically that it's not a cache. If you're doing something that needs a simple key-value store, this is going to be an amazing product.


"It gets hard when you want to do something such as getting the comments for an article. You can't say, get me the comments for article 450. You have to actually store a list of the comment ids in the article, get the article, and then request the comments based on that list in the article object. Worse, lets say two people comment on an article at the same time. You'd better make sure you start a transaction before reading the object, modify the list, and save it. If you don't, you'll hit race conditions that can see a comment's association lost (something you don't have to worry about with foreign-key associations)."

----------

I'm glad you brought this up. I've been working on a project that "solves" this issue. It's basically a distributed in-memory hash table like memcached, but the values are arrays that can be modified in a CRUD manner. I call it Alchemy (http://github.com/teej/alchemy/tree/master).

So in your example you would have:

    alchemy['article_450_comment_ids'] = [17, 39, 158]
Where you could then look up your three article comments out of Memcache, MySQL, or whatever else. At the moment, it's mostly just a proof of concept, though I am using it in a small live app. I feel like a true distributed, super-fast, relational, in-memory database is close; there are just a few more pieces of the puzzle to work out. For me, Alchemy is one of those pieces.


You can do it in an even easier way.

For example:

    article450 = {'title': 'This Awesome Thread!', 'text': 'Blargh!', 'comment_ids': [17, 18, 19]}

So, you can embed it right in the object just as easily as setting up another object in the store just for the ids.

That helps with foreign-key type situations, but it doesn't help for a lot of other things.

I guess my perspective on it is that the restrictions are fine for small apps with few features, but for small (size of data) apps you don't need 20,000 operations per second. MySQL handles it easily. Likewise, if you're dealing with a bigger app, you're going to want to be able to query data in more ways. MemcacheDB is great for what it is: a key-value store. It's not a replacement for a RDBMS.

In fact, you really can't create such a database as you propose. The issue is that there are problems in computer science that grow at certain rates. For example, sorting is an n * log(n) problem (at least for comparison-based sorts). You simply can't do better than that. That's how indexes work (very simplistically). If you limit yourself to operations that you can do in constant time, you have hash tables. You can't query a hash table except by key. Distributed hash tables are decently well-known systems, but they don't replace databases in most applications, and you can't make them more database-like unless you eliminate the property that makes them scale so well.


"So, you can embed it right in the object just as easily as setting up another object in the store just for the ids."

That doesn't solve any of the issues for which I built Alchemy. The point was to create an array store that could be reliably modified without transactions. I built it after having real-life issues with race conditions in Memcache where transactions weren't an acceptable alternative.

In the end, I really would like to see cache solutions that can be easily dropped in for the different bottleneck situations while keeping the standard RDBMS.


It looks similar to Google App Engine Datastore [1] or Amazon SimpleDB [2], so I guess it could have similar use cases.

If you've already architected your application to use these cloud services, it should be easier to switch to/from MemcacheDB on your own servers.

[1] http://code.google.com/appengine/docs/datastore

[2] http://aws.amazon.com/simpledb


It isn't similar.

With App Engine and SimpleDB, you can query on any field in the model. With MemcacheDB you can only query on the key.

So, with App Engine and SimpleDB you can say "give me all the people records where their age is > 18". You can't do that with MemcacheDB. With MemcacheDB, you can only say, get me the records with the IDs x, y, and z.

What App Engine and SimpleDB eliminate is joins. MemcacheDB would be more scalable than either App Engine or SimpleDB, but it doesn't allow you to do much. That's what makes memcached a good cache. You have a strict limitation in how you can get data, but it scales. SimpleDB and App Engine are better than most RDBMSs in terms of scalability, but they still use index lookups when you query them.

To put it in SQL terms:

App Engine/SimpleDB: "SELECT * FROM people where age > 18"; "SELECT * from articles where comment_count > 10"; "SELECT * FROM articles where author_id = 28" These are all valid. Most queries that don't involve joins are valid.

MemcacheDB: "SELECT * FROM people where id = 8"; "SELECT * FROM people where id IN (8, 16, 27)" Those are the only types of queries that you can do with MemcacheDB. None of the queries above that work with App Engine and SimpleDB work with MemcacheDB.


Thanks for the clarification. MemcacheDB does indeed look like App Engine Datastore/SimpleDB sans extra query machinery.

Which is still interesting, since that's their most efficient usage pattern anyway (in terms of resource consumption, not ease of development).

Attribute-based queries on App Engine can hit restrictive CPU quotas quite easily, while key-based queries are very fast.


No, it's nothing like them in implementation. MemcacheDB is like a giant hash table where your app server can figure out where something is based on what the key hashes to. An individual item is stored on one server (no redundancy) and which server it gets written to is a matter of the hash.

Google's BigTable is a single-master, column-based storage system with redundancy. That's a huge difference. In fact, there's very little that is similar.

While Amazon hasn't published as much about SimpleDB, it's most definitely not a giant hash table. It's likely either a column store like BigTable/HBase or a document store like CouchDB.

They're completely different tools. MemcacheDB isn't anything like the other two and for scalability purposes it's important to realize why so that you can choose the correct tool for the job.

If you're interested in a tool like the App Engine Datastore or SimpleDB, there are HBase, CouchDB, and HyperTable, which will all fit the bill.


You are right. I was under the impression that BigTable is a distributed hash table (like MemcacheDB), but it's not.

I revisited the Google I/O talk on the App Engine Datastore where they say it explicitly [1]. They call it a sharded, sorted array.

Here "sorted" is the key difference: it means you can do efficient prefix and range scans of contiguous areas in one sweep without extra disk seeks.

This wouldn't be the case for MemcacheDB even if you created some clever key naming scheme, as locality there is defined by the hash function.
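
To illustrate with a made-up key scheme and a four-node cluster: keys that are adjacent in sorted order scatter across servers, so a "range scan" degenerates into one point lookup per key:

    import hashlib

    def server_for(key, n_servers=4):
        # MemcacheDB-style placement: the hash decides, not the key order.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_servers

    for i in range(450, 455):
        key = "comment:%06d" % i
        print(key, "-> server", server_for(key))  # consecutive keys land on unrelated servers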

That, of course, is in addition to the many other features that the GAE Datastore has and MemcacheDB doesn't.

I mentioned the similarity based on how fundamental the "key -> value" aspect is to both MemcacheDB and the GAE Datastore, as opposed to traditional relational databases accessed via SQL.

[1] http://sites.google.com/site/io/under-the-covers-of-the-goog...


I've played with this and I like the idea a lot, but I found that BDB still sucks and gets corrupted too easily.

So instead I've been using Tokyo Tyrant. It supports the memcached protocol as well as HTTP and its native client protocol.

http://tokyocabinet.sourceforge.net/tyrantdoc/

It's much better than memcachedb and serves the same purpose.


Do you have any more info on the corruption? I have found BDB to be very reliable; I used it as the backend storage engine for Sun's LDAP server product.


Does this support a REST command interface?


No, it supports the memcached interface. Most languages have libraries for this, as memcached is fairly popular.


I've been working a lot on the performance aspects of Dynomite, and I think it is now a better solution than memcachedb. One big issue with memcachedb is that it leaves replication management completely to the user. So if you want to seamlessly handle machine failure on a high-traffic site, you need to do a lot of work to set up replicas and do clever things with DNS, and even then you won't get a guarantee of availability for that data. Dynomite handles all of that transparently, plus it isn't tied to a single storage engine, so you can choose a storage engine based on the type of load you plan on generating.

The code is here: http://github.com/cliffmoon/dynomite



