
If your dataset fits comfortably on one postgres instance, and will continue to do so for your current architectural planning time horizon, then you have little need to use CockroachDB / Spanner.

These databases are designed for use-cases which require the consistency of a relational database, but cannot fit on a single instance.




Right now, not really. CockroachDB's current performance doesn't really allow for a big dataset.


You are misunderstanding this article. This is not a benchmark; it is a test of how correctly the database handles distributed transactions and data under the worst possible conditions. These are not real-world performance numbers in any sense.


You are misunderstanding these comments.

The problem is not just performance; it's that distribution has a huge cost in terms of servers and maintenance.

If you can write only 50 times a second, your data set won't get big enough to justify distributing it.

Put your millions of rows on one server and be done with it. Cheaper, faster, easier.

There is a tendency nowadays to make things distributed for the sake of it.

Distribution is a constraint, not a feature.


Why do you keep repeating 50 w/s when that's not an actual performance number? CDB will likely run with thousands of ops/sec per node.

Can you really not see why distributed databases are needed? High availability, (geo) replication, active/active, oversized data, concurrent users, and parallel queries are just a few of the reasons.

Distribution will always come with a cost, but the tradeoffs are for every application to make. We use a distributed SQL database and while it's faster than any single-node standard relational DB would be, speed isn't the reason we use it.


Can you elaborate on this point please? My reading of the article and docs is that perf is expected to scale linearly with the number of nodes (and therefore with dataset size).


Let's say you get 50 writes / sec per node. You need what, 1000 nodes to get to the playing field of Postgres, with its simpler and cheaper setup? Right now it's really not competitive, unless they improve the performance 1000 times. It makes no sense to buy and administer many more machines to get the power you can have with one machine on another, proven tech.
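
Back-of-envelope sketch of that comparison (the single-node Postgres figure is an assumption for illustration, not a measurement):

    # Back-of-envelope: nodes needed to match one Postgres instance,
    # assuming linear scaling. The Postgres figure is an assumption for
    # illustration, not a benchmark.
    postgres_writes_per_sec = 50_000       # assumed single-node Postgres rate
    cockroach_writes_per_node = 50         # worst-case figure quoted in this thread

    nodes_needed = postgres_writes_per_sec / cockroach_writes_per_node
    print(f"nodes needed at {cockroach_writes_per_node} w/s/node: {nodes_needed:.0f}")
    # -> 1000 nodes, which is the comparison being made here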


> Let's say you get 50 writes / sec per node.

If the DB can only handle 50 ops/sec then the point you are making here is valid.

But see https://news.ycombinator.com/item?id=13661349, that's a pathological worst-case number. You should find more accurate performance numbers for real-world workloads before drawing sweeping conclusions.

Your original comment was:

> Even if they manage to multiply this by 100 on the final release, it's still way weaker than a regular sql db

This is what my comment, and the sibling comments, are objecting to, and I don't think you've substantiated this claim. 100x perf is probably well in excess of 10k writes/sec/node, which is solid single-node performance (though you'd not run a one-node CockroachDB deployment). Even a 10x improvement would get the system to above 1k writes/sec/node, which would allow large clusters (O(100) nodes) to serve more data than a SQL instance could handle.

Obviously I'd prefer to be able to outperform a SQL instance on dataset size with 10 nodes, but for a large company, throwing 100 (or 1000) nodes at a business-critical dataset is not the end of the world.
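
To make that arithmetic concrete, a rough sketch (linear scaling and the per-node rates above are assumptions, not measurements):

    # Rough sketch: aggregate cluster write throughput at the per-node rates
    # named above, assuming roughly linear scaling across nodes.
    for per_node, nodes in [(1_000, 100), (10_000, 100)]:
        aggregate = per_node * nodes
        print(f"{per_node:,} writes/sec/node x {nodes} nodes ~ {aggregate:,} writes/sec")
    # 1,000 w/s/node  x 100 nodes ~   100,000 w/s
    # 10,000 w/s/node x 100 nodes ~ 1,000,000 w/s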


This statement makes absolutely no sense.

Performance is very loosely correlated with dataset size, and even less so in most distributed databases like CockroachDB.


At a given point in time, yes, but to get that data set you need to write it into the DB. A big dataset implies either that you have been receiving data for a very long time or that you ingested it quickly over a shorter period. It's usually the latter, which means you need fast writes. 50 writes / sec is terrible, even more so if improving it means buying more servers, while your 30-euros-a-month Postgres instance can handle much more than that.
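
Rough ingest math, using the figures discussed in this thread (assumptions, not measurements):

    # Rough sketch: dataset growth at a sustained write rate.
    # Write rates are the figures discussed in the thread, not measurements.
    SECONDS_PER_DAY = 86_400
    for writes_per_sec in (50, 5_000):
        rows_per_day = writes_per_sec * SECONDS_PER_DAY
        days_to_1b = 1_000_000_000 / rows_per_day
        print(f"{writes_per_sec:>5} w/s -> {rows_per_day:>11,} rows/day, "
              f"~{days_to_1b:,.1f} days to reach 1B rows")
    # 50    w/s ->   4,320,000 rows/day, ~231.5 days to 1B rows
    # 5,000 w/s -> 432,000,000 rows/day, ~2.3 days to 1B rows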



