
You generally have to partition your data horizontally and thus give up many of the features that SQL has to offer

There are plenty of databases that will partition data without giving up any SQL features, but they cost money.




> There are plenty of databases that will partition data without giving up any SQL features, but they cost money.

They also either rely on a single huge SAN for storage (single point of failure + expensive as hell) like Oracle RAC, or they require specialized gear like infiniband to reduce intra-node latency like Exadata (starting price: seven figures) or they're analytics databases that are designed for huge queries with latencies to match like Vertica, ParAccel, etc. (Think minutes between data being loaded and being available to query.)

I'll take NoSQL, thanks.


Afaik Exadata wasn't even originally meant for OLTP. It seems like another case of a high-latency analytics/warehousing system being marketed as a "distributed database". They're now claiming they can get OLTP-grade performance with SSDs on Exadata, but I don't buy it. The promotion of Exadata is ironic, given that not too long ago I remember one of their engineers claiming (on his personal weblog) that OLTP on top of shared-nothing was impossible.

As for the high-latency analytics databases (Vertica, Greenplum et al), I don't see much market for them either. Their big advantage over Hadoop was claimed to be the ability of non-programmer analysts to use them (via SQL), but Hive (which now even has JDBC drivers for it, allowing it to work with existing OLAP tools) solves that problem as well.


Why would the need for "specialized gear like infiniband to reduce intra-node latency" be limited to parallel databases? (I assume you mean "inter-node"...)


This whole discussion is about parallel databases since that's the only way to scale beyond the performance of one machine.


Well, replace "parallel databases" with your favorite term for the parallel databases that fall outside NoSQL (VoltDB, Exadata, sharded MySQL, etc.). My point is that the alleged need for high-speed interconnects is orthogonal to SQL vs. NoSQL.


But it's not. SQL databases (strictly speaking, any requiring strong consistency, which is mostly RDBMSes) are highly latency-sensitive, whereas NoSQL databases like Cassandra design around that by saying "hey, you might not see the most recent write for a few ms, unless you request a higher consistency level." And most apps are fine with that. As a bonus you get multi-datacenter replication with basically the same code, another area where most RDBMSes are weak.

It's a classic design hack -- redefining your goal as an easier problem.
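A toy model of the tradeoff being described (purely illustrative; these class names are not Cassandra's API): with N replicas, a write acknowledged by W of them and a read that polls R of them are guaranteed to overlap only when R + W > N, which is roughly what requesting "a higher consistency level" buys you.

```python
# Toy sketch of Dynamo-style tunable consistency (the idea behind
# Cassandra's consistency levels; names here are illustrative).
# With N replicas, a write acked by W nodes and a read polling R nodes
# are guaranteed to intersect (and so see the latest write) iff R + W > N.

class Replica:
    def __init__(self):
        self.value, self.version = None, 0

class TunableStore:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]

    def write(self, value, w):
        # Acknowledge once w replicas have the write; the rest lag behind.
        version = max(r.version for r in self.replicas) + 1
        for r in self.replicas[:w]:
            r.value, r.version = value, version

    def read(self, r):
        # Return the newest value among r polled replicas;
        # may be stale whenever r + w <= n.
        polled = self.replicas[-r:]
        return max(polled, key=lambda rep: rep.version).value

store = TunableStore(n=3)
store.write("v1", w=2)
print(store.read(r=2))  # quorum read (2 + 2 > 3): guaranteed "v1"
print(store.read(r=1))  # polls only the lagging replica: None (stale)
```

The point of the "design hack" above: the application chooses r and w per operation, paying latency only when it actually needs the overlap.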


Oh, I see what you're saying. Yes, the interconnect is orthogonal (although you could argue that strong consistency requires more complex protocols like 2PC so interconnect latency becomes critical).
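To make the 2PC point concrete, here's a minimal sketch (illustrative Python, not any vendor's implementation): every distributed commit pays two full message rounds to every participant, so per-message interconnect latency multiplies directly into transaction latency.

```python
# Minimal two-phase commit sketch (illustrative, not a real protocol stack).
# Each commit costs two rounds of messages to every participant, which is
# why interconnect latency becomes critical for strongly consistent commits.

class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes/no and hold locks while prepared.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply the coordinator's decision.
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # network round 1
    decision = all(votes)                         # a single "no" aborts all
    for p in participants:
        p.finish(decision)                        # network round 2
    return decision

ps = [Participant(), Participant(), Participant(can_commit=False)]
print(two_phase_commit(ps))  # False: one dissenting vote aborts everyone
```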


Consistency, Availability, Partition tolerance. NoSQL stores usually sacrifice consistency, and instead settle for eventual consistency. SQL (i.e. RDBMS) stores, with their usual emphasis on transactions, must necessarily sacrifice something else. SQL that doesn't hew to a hard line on consistency and transactions doesn't really have all the features of SQL. This is the distinction that matters most, IMHO, in the NoSQL strand of thinking.


I admit that the SQL vendors (besides MySQL) made a mistake by putting ACID above scalability; that's clearly not always the right choice. However, CAP still allows a SQL database that is scalable, consistent, and available.


No, that's exactly what CAP doesn't allow. Unless by scalable you mean non-horizontal scaling. In which case yes, but we already knew that big machines make things fast.


> CAP still allows a SQL database that is scalable, consistent, and available

Name one please. It seems you are either fundamentally mistaken about what CAP implies or are constraining the "solution" to a clustered system that is effectively a single RDBMS hiding behind lots of tightly-coupled components.


A tightly-coupled (whatever that means) cluster sounds like a perfectly legitimate way to scale to me.


And what if that datacentre goes down? And what if you want reasonable (<50ms) latencies in different parts of the world?


Except for postgresql.


It certainly seems that the NoSQL movement is largely fueled by two facts: 1. MySQL sucks 2. Oracle is expensive

I'd love to see PostgreSQL get more attention, as I feel it has scaling up and scaling out handled fairly well, whereas MySQL/InnoDB has a hard time even scaling up (which is why the Drizzle project exists).


Does Postgres partition across a cluster? That's what we're talking about here.


Yes. This is what Skype does.
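For the curious: Skype's approach is PL/Proxy, which routes each database call to a shard by hashing a key. A rough sketch of that routing idea in Python (the shard names and hash function here are hypothetical, not Skype's actual setup):

```python
# Hypothetical sketch of hash-based partition routing in the style of
# PL/Proxy (the tool Skype built to shard PostgreSQL). Shard list and
# hash are illustrative. Each key maps deterministically to one shard,
# so single-key lookups stay on one node while the fleet scales out.

import zlib

SHARDS = ["pg-shard-0", "pg-shard-1", "pg-shard-2", "pg-shard-3"]  # assumed names

def shard_for(key: str) -> str:
    # PL/Proxy conventionally uses hash(key) & (n - 1), with n a power of two.
    return SHARDS[zlib.crc32(key.encode()) & (len(SHARDS) - 1)]

print(shard_for("alice"))  # the same key always routes to the same shard
```

The limitation the thread is circling around shows up here too: queries confined to one key are cheap, but cross-shard joins and transactions need extra machinery, which is where "giving up SQL features" tends to start.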



