HBase to Cassandra: why we switched (ria101.wordpress.com)
22 points by jbellis on Feb 24, 2010 | hide | past | favorite | 4 comments



"[..] and Cassandra being more suitable for real time transaction processing and the serving of interactive data."

Does Cassandra actually support transactions?

"For example, adding a new node to the system becomes as simple as bootstrapping its Cassandra process and pointing it at a seed node (an existing node within your cluster)."

You could easily have this in a distributed system that has a single master (implemented as a distributed state machine), without all the disadvantages of a gossip protocol.

"Secondly I have come to the conclusion that Cassandra’s P2P architecture provides it with performance and availability advantages. Load can be very evenly balanced across system nodes thus maximizing the potential for parallelism, the ability to continue seamlessly in the face of network partitions or node failures is greatly increased, and the symmetry between nodes prevents the temporary instabilities in performance that have been reported with HBase when nodes are added and removed"

None of these features require a P2P system. Actually, a P2P system will in most cases be slower than a hierarchical one.


> Does Cassandra actually support transactions?

He means in the sense that databases have typically been divided into "transaction processing" (doing a small set of operations over and over with large concurrency) and "analytics" (doing potentially monstrous ad-hoc queries w/ very low concurrency).

> You could easily have this in a distributed system that has a single master (implemented as a distributed state machine), without all the disadvantages of a gossip-protocol.

Sure, but then you have all the disadvantages of a single master system. :)

For most systems the single master system and its potential for catastrophic downtime if failover goes badly (which it _always_ does eventually; if you claim otherwise you are a novice or selling snake oil) is the worse choice.

> a P2P system will in most cases be slower than a hierarchical one.

I call BS. An O(1) routing P2P system like Cassandra has no inherent speed disadvantage over a hierarchical system.

Case in point: Cassandra is substantially faster than HBase, its closest hierarchical competitor. There's also Hypertable, but to a first approximation nobody uses it so I don't know of any benchmarks.
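
To make the "O(1) routing" claim concrete: in a one-hop design every node knows the full token ring, so any node can work out a key's owner locally and forward the request in a single network hop. Below is a minimal Python sketch of that idea. The `Ring` class, the `token` helper, and the node names are hypothetical, and the MD5-based token is an assumption for illustration; this is not Cassandra's actual code.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption here)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical full view of the ring, as held by every node."""

    def __init__(self, nodes):
        # Sort nodes by token so the ring order does not depend on input order.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node whose token is greater than the key's token,
        # wrapping around to the start of the ring if there is none.
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


# Two nodes with the same membership view agree on the owner of a key
# without asking a master or any other node.
view_on_node_a = Ring(["node-a", "node-b", "node-c"])
view_on_node_b = Ring(["node-c", "node-a", "node-b"])
print(view_on_node_a.owner("user:42") == view_on_node_b.owner("user:42"))
```

The local lookup is a binary search over the node tokens, so it is logarithmic in the number of nodes, but it costs no extra network hops, which is what "O(1) routing" refers to here.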


> For most systems the single master system and its potential for catastrophic downtime if failover goes badly (which it _always_ does eventually; if you claim otherwise you are a novice or selling snake oil) is the worse choice.

I don't know what you're talking about. If implemented correctly, a hierarchical system is extremely unlikely to fail. I mean, if your network is split into 3 partitions, the master will be unavailable, but in that situation you're going to have worse problems than availability, because your web servers are unlikely to be even reachable from outside the datacenter.

> I call BS. An O(1) routing P2P system like Cassandra has no inherent speed disadvantage over a hierarchical system.

Nope. If you have a master that stores the dictionary, then all lookups are also O(1). Even better, you can randomly distribute keys across the nodes and are not bound by the hashing algorithm.

I don't know about the performance between HBase and Cassandra; I'm strictly talking about theoretical performance.
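
A rough sketch of the master-dictionary approach, for comparison: the master keeps an explicit key-to-node map, so a lookup is one hash-table access and a key can be placed on (or moved to) any node. The `Master` class, its methods, and the node names are hypothetical, and the sketch deliberately ignores replication and failover of the master itself.

```python
import random


class Master:
    """Hypothetical single master holding an explicit key -> node dictionary."""

    def __init__(self, nodes, seed=None):
        self._nodes = list(nodes)
        self._placement = {}
        self._rng = random.Random(seed)

    def assign(self, key: str) -> str:
        # Placement is arbitrary: here a random node, but any policy would do,
        # since nothing ties a key's location to a hash function.
        if key not in self._placement:
            self._placement[key] = self._rng.choice(self._nodes)
        return self._placement[key]

    def lookup(self, key: str) -> str:
        # Average-case O(1) dictionary access.
        return self._placement[key]

    def move(self, key: str, node: str) -> None:
        # Rebalancing is just an update to the dictionary entry.
        if node not in self._nodes:
            raise ValueError(f"unknown node: {node}")
        self._placement[key] = node


master = Master(["node-a", "node-b", "node-c"], seed=1)
placed_on = master.assign("user:42")
print(master.lookup("user:42") == placed_on)
```

The trade-off is that the dictionary grows with the number of keys (or key ranges) and every client depends on the master, or on a cached copy of its map, being reachable.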


> If implemented correctly, a hierarchical system is extremely unlikely to fail.

You should go show Google how they're doing it wrong so they can keep App Engine up.



