It's been a few weeks since I did a benchmark comparison, and I don't have a setup anymore that I can re-run to get real numbers.
RethinkDB is decent but not great at simple queries, like "select * where foo = 1". Unfortunately, and it's a rather big minus, Rethink doesn't have a query planner, so every index has to be specified explicitly in the query. This puts a burden on the application, which needs to be completely index-aware.
Where it really does break down is on aggregations. On a dataset of 800k semi-complex documents, the query "select x, count(*) from y group by x" (x here is a string with a cardinality of 89) took 4 seconds with Postgres and 84 seconds with Rethink on a test box.
In this scenario, Postgres was actually running in a slow VirtualBox VM, whereas Rethink was running natively; on our production servers, the same query takes about 500ms. Rethink also used about 6GB out of the 8GB of memory allocated to it, whereas Postgres just streams the data through the FS cache, so I think Rethink is just using a really bad query strategy.
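To make the memory point concrete: a group-by count over a column with 89 distinct values only needs 89 counters in principle, no matter how many rows pass through. The following is a minimal Python sketch of that streaming strategy, not a description of how Postgres or Rethink actually execute the query. The `fake_rows` generator and the `payload` field are hypothetical stand-ins for the real documents.

```python
from collections import Counter
from typing import Any, Iterable, Iterator, Mapping


def group_count(rows: Iterable[Mapping[str, Any]], key: str) -> Counter:
    """Equivalent of "select key, count(*) ... group by key" over a row stream.

    Each row is inspected once and then dropped, so memory grows with the
    number of distinct key values, not with the number of rows.
    """
    counts: Counter = Counter()
    for row in rows:
        counts[row[key]] += 1
    return counts


def fake_rows(n: int, cardinality: int) -> Iterator[dict]:
    """Hypothetical test data: n documents whose "x" takes `cardinality` values."""
    for i in range(n):
        yield {"x": f"value-{i % cardinality}", "payload": i}


# Same shape as the benchmark: 800k documents, 89 distinct values of x.
counts = group_count(fake_rows(800_000, 89), "x")
```

Because `fake_rows` is a generator, the full dataset is never held in memory at once; only the `counts` table is.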
Rethink's benefit, of course, is horizontal scaling, and it's supposed to be able to parallelize reads. But it has to work for the types of queries you give it. I strongly advise looking at your application's query needs, importing some test data, and seeing how your queries perform. If you don't need aggregations or complex joins, it might work for you.
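For the "import some test data and measure" step, a small timing harness is usually enough. This is a rough sketch under assumptions: `time_query` is a hypothetical helper, and `run_query` stands for whatever zero-argument callable executes one of your queries against the database under test and fully consumes the result.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run a query callable several times and summarize wall-clock seconds.

    Repeating the run helps separate cold-cache from warm-cache behaviour,
    which matters when comparing databases on the same box.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # must fully consume the result, not just create a cursor
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


# Usage with a placeholder workload; swap in a real query callable.
result = time_query(lambda: sum(range(100_000)), repeats=3)
```

Comparing the median across databases on the same hardware and the same data avoids the VM-versus-native mismatch described above.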
Thank you for coming back to this thread to share your experience, much appreciated!
I did not realize RethinkDB lacks a query planner/optimizer. That is a huge downside. The docs didn't seem to make much effort to point out this limitation.