Hacker News

I feel somewhat responsible for this confusion, as the guy being quoted here... :-(

Riak will handle billions of keys just fine. We had, I dunno, half a billion in a six-node bitcask-backed cluster and were only at half capacity. Much, much bigger installs exist. The limit I was referring to is for a single mapreduce job: Riak MR just isn't designed for bulk processing over millions of keys at a time. It can do it, but I wouldn't be surprised to see it become unusably slow at that scale. You'll generally get better performance out of Hadoop for bulk analytics.
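To make the bulk-processing limit concrete, here is a sketch (in Python; bucket name and endpoint are illustrative) of the JSON body you'd POST to Riak's /mapred HTTP endpoint for a whole-bucket job. Because the inputs are a bare bucket name, Riak has to traverse every key in the cluster before the map phase even starts, which is exactly what gets slow over millions of keys:

```python
import json

def bulk_mapred_job(bucket):
    """Build the JSON body for a full-bucket MapReduce job.

    Using a bare bucket name as the inputs forces Riak to walk
    every key in the cluster, which is what makes this approach
    unsuitable for bulk processing at scale.
    """
    return json.dumps({
        "inputs": bucket,  # bare bucket name = full key traversal
        "query": [
            # Riak.mapValuesJson is one of Riak's built-in JS map functions
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson"}},
        ],
    })

# POSTing this body to http://<node>:8098/mapred would run the job:
body = bulk_mapred_job("events")
```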

The other tough point is key-listing. Listing buckets, listing keys, key filters, MR over buckets: all those features are essentially useless in production. Where the number of keys is large and unguessable, it can become a logistical nightmare to keep track of them. Secondary indexes (2i) can help, though.
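Unlike list-keys, a 2i lookup only touches keys tagged with the given index term. A small sketch of the HTTP query form (host, bucket, index name and value are made up; note that 2i requires the leveldb backend, not bitcask):

```python
def twoi_query_url(host, bucket, index, value):
    """Build the URL for an exact-match secondary-index (2i) query.

    This returns only the keys tagged with the given index term,
    so it stays usable in production, unlike full key-listing.
    Requires the eleveldb backend; bitcask doesn't support 2i.
    """
    return "http://{0}:8098/buckets/{1}/index/{2}/{3}".format(
        host, bucket, index, value)

url = twoi_query_url("127.0.0.1", "users", "group_bin", "admins")
```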




I have a use case, which I don't know if it's common or not.

I want to put millions of items in riak, play with them, and then throw them away.

I might want to do that because I'm testing something out, or because the items are the result of some periodic batch processing in production which I want to fetch by key later.

Unfortunately, riak doesn't seem to have the notion of a "db", "keyspace" or whatever you want to call it; i.e. something you could "drop" and that would simply delete a directory with a dozen files in it (which should be quite cheap).

The only thing I can do is to drop the whole riak db, which has the following problems:

1) I have to do it manually on all nodes (stopping the cluster, deleting the files, etc.)

2) I cannot share a riak cluster between several users/teams, so that each user/team can play with a portion of it while there is only one central installation of the whole cluster. Every application (whose db I want to be able to drop and recreate) has to run its own riak cluster.

Initially I thought that "buckets" were intended to solve this problem, but a bucket doesn't map to a separate storage location; it's just a way to group items. Even listing the buckets present in the db requires scanning all keys and, as the doc says, "Similar to the list keys operation, this requires traversing all keys stored in the cluster and should not be used in production."
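Absent a real drop operation, the only way to empty a bucket is exactly what the docs warn against: list every key, then delete them one by one. A sketch against a minimal client interface (the client here is a toy stand-in for illustration, not an actual Riak driver):

```python
def drop_bucket(client, bucket):
    """Delete every key in a bucket the hard way.

    client.list_keys(bucket) stands in for Riak's streaming
    list-keys, which traverses *all* keys in the cluster --
    precisely the production no-no quoted above. Returns the
    number of keys deleted.
    """
    keys = list(client.list_keys(bucket))  # full cluster traversal
    for key in keys:
        client.delete(bucket, key)         # one round-trip per key
    return len(keys)

# A toy in-memory client to show the shape of the interface:
class FakeClient(object):
    def __init__(self):
        self.data = {}  # (bucket, key) -> value
    def put(self, bucket, key, value):
        self.data[(bucket, key)] = value
    def list_keys(self, bucket):
        return [k for (b, k) in self.data if b == bucket]
    def delete(self, bucket, key):
        del self.data[(bucket, key)]
```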

Although I've been told that "riak is not designed to do this and that", I'm not sure whether these limitations are really technical, or just a consequence of development effort being targeted at other aspects, in which case these issues could be addressed at a later stage.

Any idea?


Tough call. If you did want to use Riak for fast bucket-drop, your best bet might be to:

a.) Run multiple clusters. Not too difficult: just give each one a different erlang cookie and run them on different ports.
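To sketch what (a) looks like in practice for a second node on the same box (port numbers, node names and cookie are illustrative; releases of this era configure these in vm.args and app.config):

```erlang
%% vm.args for the second node -- distinct node name and cookie
-name riak_b@127.0.0.1
-setcookie riak_b

%% app.config excerpts -- shift every listener off the first node's ports
{riak_core, [
  {http, [{"127.0.0.1", 8198}]},   %% first node's default is 8098
  {handoff_port, 8199}             %% first node's default is 8099
]},
{riak_kv, [
  {pb_port, 8187}                  %% first node's default is 8087
]}
```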

b.) Take bitcask_backend or leveldb_backend and add drop-bucket functionality. Custom backends are more difficult than running multiple clusters, but certainly not impossible. You could build it on top of fold, or split writes up into, say, one leveldb per bucket. I don't recall whether the vnode interface has drop-bucket, so you might have to write some plumbing alongside Riak. jrecursive has done this in Mecha.
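The "one leveldb per bucket" idea can be sketched abstractly: route each bucket's reads and writes to its own store, so dropping a bucket is just discarding that store rather than traversing keys. A toy Python sketch with dicts standing in for leveldb instances (a real backend would open one leveldb directory per bucket and implement drop as close-plus-delete):

```python
class PerBucketBackend(object):
    """Toy backend keeping one store per bucket, so drop is cheap.

    The dicts here stand in for leveldb instances; in a real
    implementation, drop_bucket would close the leveldb handle
    and remove its directory on disk.
    """
    def __init__(self):
        self.stores = {}  # bucket -> dict standing in for a leveldb

    def put(self, bucket, key, value):
        self.stores.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        return self.stores.get(bucket, {}).get(key)

    def drop_bucket(self, bucket):
        # Discard the whole store at once; no per-key traversal.
        return self.stores.pop(bucket, None) is not None
```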

If I were building something like this, I might look first at Cassandra or HBase, or possibly sharded master-slave postgres.


Thank you for your answer. The problems I see with (a), off the top of my head, are:

1. Even if it's easy, somebody has to do it.

2. Setting up all the monitoring etc. for each instance.

3. Running more than one riak daemon on the same machine means each daemon is unaware of the IO operations performed by the others, so IO throughput could suffer. In practice you would need to mount separate disk heads (and we are back to 1).

4. Each riak instance will require some RAM as well, so memory has to be allocated, with the risk of over-allocation.

5. Port allocation. I fear it would end up with something like: "just keep an internal wiki page where each 'db space' is mapped to a port number".

Well, the problem with (b) is of course that I don't have time to do it. For now we'll stick with Cassandra, but Riak is so nice in many aspects that I really hope that at some point, as the product matures, more resources can be invested in the aspects which aren't currently perceived as "selling points" for riak but are important for some scenarios and not technically impossible.


Yeah, if you're using Cassandra and the GC/rebalancing issues aren't affecting you, you're probably fine sticking with it. Both are Dynamo-structured, so your consistency/failover model advantages are similar.



