RethinkDB screencast - from queries to sharding under 15 minutes (rethinkdb.com)
96 points by coffeemug on Jan 17, 2013 | 14 comments



Would be really nice to see a demo of RethinkDB under real load. All of the RethinkDB slides and info I have seen generally have 1-4 servers and maybe 50MB of data at most. At this level, you might as well just be using text files...

Anybody know of any demos of RethinkDB handling, say, 100GB of data? And running decent queries on it?


slava @ rethink here. Let me explain the state of affairs on this.

The underlying storage engine was tested on commodity systems and super-duper enterprisey storage systems, and can do hundreds of thousands of ops/second on tens of terabytes of data (that required pretty beefy setups, though). When we added clustering on top of the storage engine, we avoided thinking about performance too much (in the interest of shipping), so everything slowed down significantly. Here's our (rough) roadmap:

  - New protocol buffer API and some more checklist features (1.4)
  - Secondary indexes, huge ReQL improvements (1.5)
  - Performance and scalability (1.6)
We'll be doing scalability and performance demos that I hope will be really impressive, but it'll take ~4 months to get there.


Does it mean that until these changes are done, you won't declare RethinkDB to be production ready? (You mentioned it would be at 2.0.)

Can you guys add a "rough roadmap overview" page to the docs, so we could have a general idea of what the status is?

I like the way RubyMine does it:

http://confluence.jetbrains.net/display/RUBYDEV/Development+...


I love the way you guys have stated the advantages and disadvantages of RethinkDB in your FAQs. Just wondering about one thing in there:

"RethinkDB is a great choice if you .... are planning to run anywhere from a single node to sixteen node clusters."

With a sharded master-slave setup with one slave each, this leaves us with a total of 8 shards. This is enough for most use cases, but is there a reason it is limited to 16 nodes?
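To spell out that arithmetic (an illustrative calculation of my own, not anything from the RethinkDB docs; the replication factor is the commenter's assumption of one slave per shard):

  # Illustration of the commenter's arithmetic (hypothetical setup, not a real
  # RethinkDB configuration): with one master and one slave per shard, every
  # shard consumes two nodes.
  MAX_NODES = 16          # the limit quoted in the FAQ
  NODES_PER_SHARD = 2     # one master + one slave, as in the comment

  max_shards = MAX_NODES // NODES_PER_SHARD
  print(max_shards)  # -> 8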


There is a bottleneck in the metadata propagation code that slows down the system after roughly 16 nodes (there is one place where we used an O(N^3) algorithm in the interest of shipping the product). This isn't an inherent limitation, just the state of affairs today. We'll resolve this in the next few releases, but we wanted to be up front about this limitation for the time being.
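As a rough illustration of why a cubic step starts to hurt around that size (toy numbers of my own, not RethinkDB's actual metadata code), the work grows eightfold every time the cluster doubles:

  # Toy illustration (not RethinkDB's actual metadata code): relative cost of
  # an O(N^3) propagation step as the cluster grows, normalized to a 4-node
  # cluster. Doubling the node count multiplies the work by 8.
  for nodes in (4, 8, 16, 32, 64):
      relative_cost = nodes ** 3 / 4 ** 3
      print(f"{nodes:>2} nodes -> {relative_cost:,.0f}x the 4-node cost")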


Short answer: it's a pretty arbitrary cutoff.

16 is the largest number of nodes we've done sufficiently rigorous tests on to be sure that things go smoothly. So that's the highest number we're comfortable citing on our site. It's a conservative estimate though so you should be fine straying past it.

We know of a few things which become scalability concerns with a large number of machines, but we're talking close to 100 machines. These will hopefully be addressed soon.


RethinkDB has been awesome to learn. I found it really easy to get up and running: install the Ruby driver and just start coding! Reminds me of starting with MongoDB.
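For anyone curious what "just start coding" looks like, here is a minimal sketch with the official Python driver (the comment above used the Ruby driver; this assumes a local server and a 1.x-era driver that exposes connect() at module level, and the table and document are made up for illustration):

  # Minimal getting-started sketch with the RethinkDB Python driver (assumes a
  # local server on the default client port; table and document are made up).
  import rethinkdb as r

  conn = r.connect(host="localhost", port=28015)

  r.db("test").table_create("tv_shows").run(conn)
  r.table("tv_shows").insert({"name": "Star Trek TNG", "episodes": 178}).run(conn)

  # Read it back with a simple filter -- roughly a SQL WHERE clause.
  for row in r.table("tv_shows").filter(r.row["episodes"] > 100).run(conn):
      print(row)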


What is the state of RethinkDB? Is it fit for production use?

On the website it's stated as 1.3.2 (which implies production ready), but I think I saw some comments from you a month ago that it's not fit for production use yet.

What about secondary indexes?

Are the machines in the screencast very weak? A simple query (getting 2 rows of the dota table) running in ~100ms is really slow. Is it because you're using the web interface?

RethinkDB seems cool and I really want to try it in my next pet project :)


> Is it fit for production use?

Not yet. We'll bump the release to 2.0 when it's ready for production.

> What about secondary indexes?

They're coming -- see https://github.com/rethinkdb/rethinkdb/issues/88

> Are the machines in the screencast very weak?

No, the 100ms roundtrip includes the HTTP request over our admittedly very unsophisticated WiFi network.

Hope this helps!


The fact that it's HTTP slows this down a bunch too. With a normal client you'd have the TCP connection already made.
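A quick way to see the difference for yourself (a hypothetical measurement of my own, assuming the Python driver, a local server, and an existing table whose name you substitute) is to open one connection and time only the query round trips:

  # Hypothetical latency check: with one long-lived driver connection, each
  # query round trip avoids the per-request HTTP and connection-setup overhead
  # that the web UI pays. Assumes a local server and a table named "tv_shows".
  import time
  import rethinkdb as r

  conn = r.connect(host="localhost", port=28015)  # TCP connection made once

  start = time.time()
  for _ in range(100):
      list(r.table("tv_shows").limit(2).run(conn))  # same shape as the two-row screencast query
  elapsed_ms = (time.time() - start) / 100 * 1000
  print(f"average round trip: {elapsed_ms:.1f} ms")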


Any reason you didn't go with the standard convention that 1.0 is the first non-beta release?


Yes -- we had an internal versioning scheme that crossed 1.0 very early on. Having different internal and external versioning schemes went against our intuition of having an open development process, so we decided to bite the bullet and keep the version post 1.0. It isn't ideal, but it's done :)


[edit] I realize there is a lot of buzz around being "NoSQL", but seeing how you support similar concepts (join, count, group, where), why not provide at least a partial SQL interface for the features you do support? Even MySQL originally did not provide all the features of SQL (e.g. no foreign key constraints), and multiple features are probably still unsupported in MySQL, but at least with a SQL interface it is easy for anyone to quickly pick up the DB and start integrating it. Plus SQL is actually a really nice query language, IMO, and perhaps many others think so too.

P.S. Love the demo video. It's awesome how easy it is to use, and the query language you created looks nice too.


Under the hood RethinkDB already supports multiple protocols, so from our perspective there is little difference between a SQL front-end and a ReQL front-end. Of course to users, that's the only thing that matters.

There are still many improvements we can make to the core system and an enormous number of people are already interested in it, so we decided to satisfy them first. We might add a SQL front-end at some point, but it's not very high on the priority list now.
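To make the front-end comparison concrete, here is a rough side-by-side (my own illustrative example; the ReQL is written against the Python driver as it later evolved, and the table and field names are made up):

  # Rough SQL-to-ReQL comparison (illustrative only; table and field names are
  # made up). The SQL:
  #
  #   SELECT author, COUNT(*) FROM posts WHERE published = true GROUP BY author;
  #
  # expressed through the RethinkDB Python driver:
  import rethinkdb as r

  conn = r.connect(host="localhost", port=28015)

  result = (
      r.table("posts")
       .filter({"published": True})   # WHERE published = true
       .group("author")               # GROUP BY author
       .count()                       # COUNT(*)
       .run(conn)
  )
  print(result)  # e.g. {"alice": 12, "bob": 7}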



