NoSQL at Netflix (netflix.com)
86 points by abraham on Jan 28, 2011 | hide | past | favorite | 9 comments



Are we doomed to have heterogeneous data storage? I keep wondering if the NoSQL landscape will settle down. The RDBMS world has really consolidated around SQL and there are a few major databases out there such that it's not too hard to pick which one you should use. You can answer 90% of RDBMS selection questions with MySQL, PostgreSQL, and Oracle.

The NoSQL landscape is not as well defined, and perhaps it covers so much more territory that it will never settle down. Perhaps we will all be stuck using three different databases for their various features because we can't come up with a few technologies that are good enough for 90% of cases.


Like [programming] languages, there are pros, cons, and trade-offs for each flavor.

There are stores that are ahead in providing better, "runtime" control over facets of the CAP theorem -- ex. Cassandra.

There are stores that have several and more expressive datatypes -- ex. Redis.

Then there are stores like InnoDB, which some know better through its pairing with MySQL. The interesting parts here, as far as one store to rule them all, are developments like HandlerSocket or building other structures atop the same store -- ex. Voldemort.

Similarly, there are stores that share a common filesystem [structure] and can be queried in different manners -- ex. Apache HBase (or any Hadoop / HDFS based system).
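
To make the "runtime control over CAP" point concrete, here is a minimal sketch of the quorum rule that Dynamo-style stores such as Cassandra let you tune per request: with N replicas, a read that touches R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The level names ONE, QUORUM, and ALL mirror Cassandra's consistency levels; the helper functions are hypothetical and only do the arithmetic, they do not talk to any real cluster.

```python
def replicas_required(level: str, n: int) -> int:
    """Number of replicas that must respond for a given consistency level.

    Hypothetical helper: models only the ONE / QUORUM / ALL arithmetic.
    """
    levels = {
        "ONE": 1,
        "QUORUM": n // 2 + 1,  # strict majority of the replicas
        "ALL": n,
    }
    return levels[level]


def reads_see_latest_write(read_level: str, write_level: str, n: int) -> bool:
    """True when the read set and write set must share at least one replica."""
    r = replicas_required(read_level, n)
    w = replicas_required(write_level, n)
    return r + w > n


if __name__ == "__main__":
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        overlap = reads_see_latest_write(read_level, write_level, n=3)
        print(read_level, write_level, overlap)
```

Picking lower levels buys latency and availability at the cost of possibly stale reads, which is the trade-off being tuned at runtime.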


I wish the blog post went into more details about which Netflix services are implemented with SimpleDB, HBase, and Cassandra. He stresses that these (seemingly very similar) systems have unique capabilities, but he doesn't explain why they need all of them.


But these are all, like you said, stores. The RDBMS-equivalent term is "database engine"; one database management system can have multiple engines (like MySQL's InnoDB and MyISAM, for example) while keeping all its query parsing, planning, optimization, networking, etc. in common.


"Are we doomed to have heterogeneous data storage?"

If I rewrite this, the answer will become clear: "Won't there ever be a one-size-fits-all solution?"

We have technology for the 90% of cases. These are the 10%.


That's the beauty of the NoSQL landscape (not a fan of the term). Each db has a specific quality, and while a general SQL db can be moulded to meet a certain need, there may already be a db solution that is perfect for it.


And there is already a good "classification" system that you can use to categorize the solutions.

There are Column Stores (Cassandra, HBase), KeyValue Stores (Riak, Voldemort), Document Stores (MongoDB, maybe CouchDB), Graph Databases (Neo4j) and something I usually call Data-structure Stores (Redis).

For Column and KeyValue stores, you can often also say that they're implementations of BigTable or Dynamo.


Frankly, the great thing about leading edge datastore development right now is that it's all over the map. Many of them will fall off the map over time, but having a couple of additional models to add on to the current great RDBMS tools can only be a plus.

I imagine that a significant driver of datastore heterogeneity is that there are a number of very different in-house datastores that are supporting extremely successful commercial ventures. It used to be that "you don't get fired for choosing Oracle." Now that Amazon, Google, etc. have paved the way, it's much more feasible for a corporate infrastructure developer to try to find a way off the high-dollar proprietary systems like Teradata.

I also think it's interesting to note that some (most?) of the largest scale commodity RDBMS users couldn't get close to where they are without something like memcached.
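
For anyone who hasn't seen it, the usual way memcached sits in front of an RDBMS is the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. A rough sketch, with a plain dict standing in for memcached and a caller-supplied function standing in for the SQL query (the class and all names here are made up for illustration, not any client library's API):

```python
from typing import Any, Callable, Dict


class CacheAside:
    """Cache-aside reads: a dict stands in for memcached in this sketch."""

    def __init__(self, load_from_db: Callable[[str], Any]) -> None:
        self._cache: Dict[str, Any] = {}
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts how often we fall through to the database

    def get(self, key: str) -> Any:
        if key in self._cache:
            return self._cache[key]        # cache hit: database untouched
        value = self._load_from_db(key)    # cache miss: go to the database
        self.db_reads += 1
        self._cache[key] = value           # populate the cache for next time
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row changes."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    store = CacheAside(load_from_db=lambda key: f"row-for-{key}")
    store.get("user:1")
    store.get("user:1")
    print(store.db_reads)
```

Repeated reads of a hot key only hit the database once until the key is invalidated, which is where the scaling headroom comes from.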


Point to take away: use the right tool for the job!



