
You generally have to partition your data horizontally and thus give up many of the features that SQL has to offer

There are plenty of databases that will partition data without giving up any SQL features, but they cost money.




> There are plenty of databases that will partition data without giving up any SQL features, but they cost money.

They also either rely on a single huge SAN for storage (single point of failure + expensive as hell) like Oracle RAC, or they require specialized gear like infiniband to reduce intra-node latency like Exadata (starting price: seven figures) or they're analytics databases that are designed for huge queries with latencies to match like Vertica, ParAccel, etc. (Think minutes between data being loaded and being available to query.)

I'll take NoSQL, thanks.


Afaik Exadata wasn't even originally meant for OLTP. It seems like another case of a high-latency analytics/warehousing system being marketed as a "distributed database". They're now claiming they can get OLTP-grade performance with SSDs on Exadata, but I don't buy it. The promotion of Exadata is ironic, given that not too long ago I remember one of their engineers claiming (on his personal weblog) that OLTP on top of shared-nothing was impossible.

As for the high-latency analytics databases (Vertica, Greenplum et al), I don't see much market for them either. Their big advantage over Hadoop was claimed to be the ability of non-programmer analysts to use them (via SQL), but Hive (which now even has JDBC drivers for it, allowing it to work with existing OLAP tools) solves that problem as well.


Why would the need for "specialized gear like infiniband to reduce intra-node latency" be limited to parallel databases? (I assume you mean "inter-node"...)


This whole discussion is about parallel databases since that's the only way to scale beyond the performance of one machine.


Well, replace "parallel databases" with your favorite term for the parallel databases that fall outside NoSQL (VoltDB, Exadata, sharded MySQL, etc.). My point is that the alleged need for high-speed interconnects is orthogonal to SQL vs. NoSQL.


But it's not. SQL databases (strictly speaking, any requiring strong consistency, which is mostly RDBMSes) are highly latency-sensitive, whereas NoSQL databases like Cassandra design around that by saying "hey, you might not see the most recent write for a few ms, unless you request a higher consistency level." And most apps are fine with that. As a bonus you get multi-datacenter replication with basically the same code, another area where most RDBMSes are weak.

It's a classic design hack -- redefining your goal as an easier problem.
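A toy model of the tradeoff being described (purely illustrative; these class names are not Cassandra's API): with N replicas, a write acknowledged by W of them and a read that polls R of them are guaranteed to overlap only when R + W > N, which is roughly what requesting "a higher consistency level" buys you.

```python
# Toy sketch of Dynamo-style tunable consistency (the idea behind
# Cassandra's consistency levels; names here are illustrative).
# With N replicas, a write acked by W nodes and a read polling R nodes
# are guaranteed to intersect (and so see the latest write) iff R + W > N.

class Replica:
    def __init__(self):
        self.value, self.version = None, 0

class TunableStore:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]

    def write(self, value, w):
        # Acknowledge once w replicas have the write; the rest lag behind.
        version = max(r.version for r in self.replicas) + 1
        for r in self.replicas[:w]:
            r.value, r.version = value, version

    def read(self, r):
        # Return the newest value among r polled replicas;
        # may be stale whenever r + w <= n.
        polled = self.replicas[-r:]
        return max(polled, key=lambda rep: rep.version).value

store = TunableStore(n=3)
store.write("v1", w=2)
print(store.read(r=2))  # quorum read (2 + 2 > 3): guaranteed "v1"
print(store.read(r=1))  # polls only the lagging replica: None (stale)
```

The point of the "design hack" above: the application chooses r and w per operation, paying latency only when it actually needs the overlap.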


Oh, I see what you're saying. Yes, the interconnect is orthogonal (although you could argue that strong consistency requires more complex protocols like 2PC so interconnect latency becomes critical).
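To make the 2PC point concrete, here's a minimal sketch (illustrative Python, not any vendor's implementation): every distributed commit pays two full message rounds to every participant, so per-message interconnect latency multiplies directly into transaction latency.

```python
# Minimal two-phase commit sketch (illustrative, not a real protocol stack).
# Each commit costs two rounds of messages to every participant, which is
# why interconnect latency becomes critical for strongly consistent commits.

class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes/no and hold locks while prepared.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply the coordinator's decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # network round 1
    decision = all(votes)                         # a single "no" aborts all
    for p in participants:
        p.finish(decision)                        # network round 2
    return decision

ps = [Participant(), Participant(), Participant(can_commit=False)]
print(two_phase_commit(ps))  # False: one dissenting vote aborts everyone
```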


Consistency, Availability, Partition tolerance. NoSQL stores usually sacrifice consistency, and instead settle for eventual consistency. SQL (i.e. RDBMS) stores, with their usual emphasis on transactions, must necessarily sacrifice something else. SQL that doesn't hew to a hard line on consistency and transactions doesn't really have all the features of SQL. This is the distinction that matters most, IMHO, in the NoSQL strand of thinking.


I admit that the SQL vendors (besides MySQL) made a mistake by putting ACID above scalability; that's clearly not always the right choice. However, CAP still allows a SQL database that is scalable, consistent, and available.


No, that's exactly what CAP doesn't allow. Unless by scalable you mean non-horizontal scaling. In which case yes, but we already knew that big machines make things fast.


> CAP still allows a SQL database that is scalable, consistent, and available

Name one please. It seems you are either fundamentally mistaken about what CAP implies or are constraining the "solution" to a clustered system that is effectively a single RDBMS hiding behind lots of tightly-coupled components.


A tightly-coupled (whatever that means) cluster sounds like a perfectly legitimate way to scale to me.


And what if that datacentre goes down? And what if you want reasonable (<50ms) latencies in different parts of the world?


Except for postgresql.


It certainly seems that the NoSQL movement is largely fueled by two facts: 1. MySQL sucks 2. Oracle is expensive

I'd love to see PostgreSQL get more attention, as I feel it has scaling up and scaling out handled fairly well, whereas MySQL/InnoDB has a hard time even scaling up (which is why the Drizzle project exists).


Does Postgres partition across a cluster? That's what we're talking about here.


Yes. This is what Skype does.
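For the curious: Skype's approach is PL/Proxy, which routes each database call to a shard by hashing a key. A rough sketch of that routing idea in Python (the shard names and hash function here are hypothetical, not Skype's actual setup):

```python
# Hypothetical sketch of hash-based partition routing in the style of
# PL/Proxy (the tool Skype built to shard PostgreSQL). Shard list and
# hash are illustrative. Each key maps deterministically to one shard,
# so single-key lookups stay on one node while the fleet scales out.

import zlib

SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]  # assumed names

def shard_for(key: str) -> str:
    # PL/Proxy conventionally uses hash(key) & (n - 1), with n a power of two.
    return SHARDS[zlib.crc32(key.encode()) & (len(SHARDS) - 1)]

print(shard_for("alice"))  # the same key always routes to the same shard
```

The limitation the thread is circling around shows up here too: queries confined to one key are cheap, but cross-shard joins and transactions need extra machinery, which is where "giving up SQL features" tends to start.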



