Cassandra is not row level consistent (datanerds.io)
274 points by leastangle on Nov 24, 2016 | 121 comments



Cassandra developer here.

Lots of comments here about how Cassandra is AP so of course you get inconsistent (non-serializable) results.

This is true, to a point. I'm firmly convinced that AP is a better way to build distributed systems for fault tolerance, performance, and simplicity. But it's incredibly useful to be able to "opt in" to CP for pieces of the application as needed. That's what Cassandra's lightweight transactions (LWT) are for, and that's what the authors of this piece used.

However! Fundamentally, mixing serializable (LWT) and non-serializable (plain UPDATE) ops will produce unpredictable results and that's what bit them here.
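Concretely, the pattern that bites you looks roughly like this (a sketch reconstructed from the post, reusing its table and column names; the plain-UPDATE release is a paraphrase of what the authors describe):

    -- acquire: serializable, goes through Paxos (LWT)
    INSERT INTO locks (id, lock, revision)
    VALUES ('Tom', true, 1)
    IF NOT EXISTS USING TTL 20;

    -- release: a plain, non-serializable write. This is the mix that
    -- produces the unpredictable ordering described in the article.
    UPDATE locks SET lock = false WHERE id = 'Tom';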

Basically the same as if you marked half the accesses to a concurrently-updated Java variable with "synchronized" and left it off of the other half as an "optimization."

Don't take shortcuts and you won't get burned.


I agree, Cassandra is doing here exactly what I (as a user) would expect.

When using a DB like C*, you always have to ask yourself, "does this update happen before or after this one - have I done anything to ensure that's the case?"

In this example, the second query (the UPDATE) is being partially applied before the first query (the INSERT) - and that's OK, because there's no ordering or dependency in the second query that forces it to run second. So the reordering of the queries that he's observing is legal, and can be simply avoided with "UPDATE ... IF revision = :last-rev".
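Something like this (a minimal sketch using the post's column names; the revision value is whatever the acquire step returned):

    UPDATE locks SET lock = false
    WHERE id = 'Tom'
    IF revision = 1;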


My biggest complaint about C*: the SQL-like interface. This seems like a terrible decision to me.

If you are used to relational databases and SQL, it's a real struggle to get your head in the right place. Some of the gotchas:

* Cell level vs row level consistency/locking (as stated in this article)

* Grouping / counting isn't really a thing.

* WHERE statements are actually used for identifying the key/value pair you want and not for filtering the rows.

* Indexes aren't really indexes. They are definitions of partitions and sort order.


That's actually my favourite part of C*. I think the CQL interface makes programming against it extremely easy, and, dare I say, enjoyable.

It makes getting started, and understanding what's going on, a breeze.


> the second query (the UPDATE) is being partially applied before the first query (the INSERT) - and that's OK

"Partially applied" is ok with a database?

The description of Cassandra on its site is "Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data."

If it's mission-critical data, I wouldn't do arbitrary things with it for conflict resolution that can corrupt data.


Cassandra is perfectly fine. If you want your writes to be consistent, learn to use LWTs properly. In the same way, if you want your data to be fully consistent in an RDBMS, learn to use the SERIALIZABLE isolation level (which is not the default in most RDBMSes, for performance reasons). If, in an RDBMS, you use SERIALIZABLE for half of the updates and READ UNCOMMITTED for the other half, guess what consistency guarantees you really get?


This is not arbitrary. It may not be intuitive coming from an RDBMS background, but it isn't arbitrary.

As Jonathan pointed out, it's like only using synchronized or volatile half the time. I don't call it arbitrary when that sometimes fails; I call it the expected result of not following the rules of the system.


If I follow your argument correctly, it is basically the same argument as "your C compiler is correct, what you've written is invalid and the standard allows undefined behaviour here".

Which may be technically valid as a defense of the compiler/database itself, but it's not a valid defense of the system as a whole: if a standard allows arbitrary execution instead of bailing out on non-standard (ambiguous) input, it is unreliable.


Is variable assignment in C/Java/... unreliable? It behaves very similarly to what C* does. Concurrent access and modification will produce undefined behaviour if you don't explicitly protect it.


Exactly.

Getting things like concurrent locking right is HARD. That is why there are so many simple languages that don't let you touch concurrency.

That doesn't mean there is no need for it in the world, or that no one should be able to use it.


RDBMSes can cause similar inconsistencies if you don't know what you're doing. It is like setting read uncommitted and then complaining about dirty reads.


Let's pretend I'm leading a blind child by telling them which direction they should go, and if they don't step carefully, they could be hurt.

If I can't see the child, should I continue to give them direction, or tell them to stop?

In this use case, the database makes changes to data without knowing what is correct and what is harmful. That is not the user's fault. It's a code choice.


No, it's the user's choice to use a DB that has this tradeoff.

As has been said a couple of times already, all of your complaints about C* would apply just as well to the locking mechanisms in most popular languages.


This would be a more sympathetic response if Cassandra did less to tempt devs not intimately familiar with it into doing things which will hose them later on. I've worked with it very successfully in the past and contributed to one of its client libraries, but I hesitate to recommend it because that trait makes it unnecessarily dangerous to use.


So, since their use of UPDATE is problematic, what is the correct way to release the locks?


DELETE ... IF EXISTS

DELETE ... IF <condition>

UPDATE ... IF <condition>
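For instance, with the post's lock table, a conditional release might look like this (a sketch; the condition values are the ones returned when the lock was taken):

    DELETE FROM locks
    WHERE id = 'Tom'
    IF lock = true AND revision = 1;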


Disclaimer: I work with the author.

The problem is that even when using LWTs for both updates you run into the same problem. The only thing you gain is that one of the statements (either lock or release) wins, and in this case that only works because the lock uses a TTL and is removed after some time.

Also, two conflicting statements from the same thread going to the same node should be easily serializable for Cassandra - or at least be logged.

> Don't take shortcuts and you won't get burned

Since you are the Datastax CTO, maybe some alignment with the marketing team on how features are communicated to users might help users not get burned? :D


This is too subtle. Incompatible operations should be rejected. Reliance on the programmer to correctly use systems that don't enforce consistency to write consistent transactions is a bad strategy.


This kind of nannying can be really expensive to do correctly when trying to build high performance systems -- to the point where it doesn't make sense to punish everyone just because someone who didn't read the manual MIGHT misuse the product. It's not at all unreasonable to mix transactions with different guarantees if they never touch the same data, and tracking that accurately without pissing off customers who have real performance needs seems like a fool's errand.


Read the CAP theorem, or stick to your single node.


The situation could be considered similar to .NET adding the IllegalCrossThreadException to Windows Forms in v2¹, defaulting to detecting invalid behavior with the option to disable checking (Control.CheckForIllegalCrossThreadCalls).

.NET v4.5 introduced a new approach (Task-based Asynchronous Pattern)² with changes to framework and language (C#5) to "simplify" implementing things correctly.

.NET took the developer-friendly path; Cassandra, not yet.

¹ https://web.archive.org/web/20060414185346/http://msdn.micro...

² http://stackoverflow.com/a/18033198


haha, I was just going to ask you what you thought of this, and I'm glad to see you responded already! Yes! :)


> Don't take shortcuts and you won't get burned.

I'm sorry that the user did something unexpected and then posted about it in a way that made your application look bad. I know that must be frustrating. However...

Calling the user out as doing something wrong when your application is failing because of a use case you can't handle properly just looks bad. You serve your users, not the other way around. Don't forget that.

If it were me and there were a case that my application couldn't handle properly, if I couldn't fix it, I'd raise an error, and then document clearly that they should not do this, such that when they search for that error, they'd find the answer. Then, I'd work to see if there were a way I could avoid the error altogether by not allowing that use case.


There is no "one size fits all" database or distributed system in existence. To say that there is, is to say that MySQL is as well suited to all use cases as Cassandra or Redis. They all store and serve data, right?

Cassandra makes no claims to be such a holy grail. Read their documentation, and you can see the use cases it is good for and those it is not.

The author of this blog post chose one it is not good for.

Put another way, "I'm sorry that the Lamborghini you bought broke when you attempted to go off-roading with it. Perhaps you should have bought a Jeep instead?"


It's not possible to detect whether the user wants CP. You can assume they want CP, but the entire point of Cassandra is that it doesn't make that assumption.

Cassandra is AP with opt-in CP. This is an explicit tradeoff. You're giving up the assumption (which enables error checking) that everything is CP in order to get AP performance. This tradeoff is one of the main use cases for which Cassandra exists.

The vast majority of the time, error checking is way more valuable than AP performance, so your approach to handling the error makes sense, but if that's your situation you shouldn't be using Cassandra. There are a wide variety of ACID-compliant relational databases that do what you want.

TL;DR: Using Cassandra and expecting CP error checking is like using a hammer and expecting screwdriver behavior.


The high-handedness doesn't improve your argument. Consider forgoing it next time.


As a long time Cassandra user it's easy to forget that some of Cassandra's semantics will be surprising to new users. That being said, if you are considering adopting an AP database it really is important for you to know the details about how write conflicts get resolved. This is perhaps the biggest difference between Cassandra and other databases like Riak, and ought to be part of your decision making process instead of a surprise you run into later.

That being said, using Cassandra for distributed locks is a terrible idea. I can't think of any way in which Cassandra would be better than using {Zookeeper,etcd,consul}. Trying to force a database to do something it really isn't designed for will almost always lead to disappointment (and often resentment) of said database.


Seconded. Don't use an AP system for distributed lock management. As with most things, use the right tool for the right job.

When I speak to developers about Riak I tell them the biggest difference between systems like Riak (including Cassandra) and traditional relational systems is not the data model, i.e. relational vs non-relational (or structured vs unstructured), but rather the architecture, i.e. distributed vs not distributed. As in a "C" system vs a "non-C" system, where "C" is consistency.

Disclaimer, I work for Basho, makers of Riak.


Minor nitpick: distributed systems can be consistent too. Your comment implies all distributed systems choose availability over consistency.


While they don't all pick one over another, there are limitations on how available and consistent you can make systems, for the same level of partition tolerance - https://en.wikipedia.org/wiki/CAP_theorem


While technically you're right, the CAP theorem uses a very strict definition of availability, which is often not needed to consider the system available in real-life. Also the consistency in CAP theorem is defined as linearizability, and this is just one of very many kinds of consistency. Often the systems are ok with weaker types of consistency.

Therefore, I'd really like that everyone stopped labeling distributed databases as AP or CP, because this is an oversimplification and most of the time, just plain wrong.


I think you can make Cassandra "C" by using quorum reads and writes. Also they support a SERIAL consistency level, which uses Paxos underneath, and supposedly achieves consistency as well.


The IF statement uses Paxos as the underlying implementation, and it may still result in an inconsistency if you are using two separate CQL statements. Cassandra does not provide a BEGIN ... END construct where everything is atomically committed or rolled back.


You can use Cassandra LWT as a building block for multi-partition transactions, in a similar way to how atomic CAS operations are used to build locks, mutexes and monitors, and monitors are then used to implement ACID transactions in an RDBMS. However, no one says it would be easy or performant, so this is probably not a good idea.


Right. Cassandra crawled at 20 TPS when we sent IF CQL statements. One thing to note is that Cassandra txns can never be rolled back - they can only be committed/retried and so on.


Quorum won't solve consistency problems like this unless you can also guarantee a serialized reader/writer for any given piece of data. The best you can do by pinning queries to primary replicas and enforcing R + W > N is read-your-own-writes (RYOW) consistency, and that depends on the aforementioned serialization point.


Quorum reads + writes and applying non-destructive updates only are enough to get linearizable consistency in Cassandra, even if clocks are not perfectly synchronized.
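In cqlsh terms that combination looks roughly like this (a sketch; assumes a replication factor of 3, so any write quorum and read quorum overlap in at least one replica):

    CONSISTENCY QUORUM;

    -- the write is acknowledged by 2 of 3 replicas, and the read also
    -- consults 2 of 3, so it is guaranteed to see the write
    INSERT INTO locks (id, lock, revision) VALUES ('Tom', true, 1);
    SELECT * FROM locks WHERE id = 'Tom';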


If you're applying non-destructive updates then you don't even need quorum to achieve consistency. You're just writing immutable data.


If you're applying non-destructive updates without quorum, you would not get monotonic read consistency, because stale reads would be possible. You could write a row, immediately read it back from another replica and find out the row is not there, because that replica didn't get it yet. You'd have only eventual consistency.


Datastax seems to think that it is a good idea: http://www.datastax.com/dev/blog/consensus-on-cassandra


That piece was written in 2014, which was a different time ;)

Plus, if we were to take the vendor words at face value, we'd all be using Docker and MongoDB in production.


I work for a large company that uses Docker in production, and a lot of companies do. There are a lot of big names using Kubernetes, Mesos or some other container orchestration system.


Just like many companies use mongodb and nanoservices.


:) My last client (very, very big client) for the past year ran nearly all their production systems with Docker and MongoDb...

Never had any problems with Docker. MongoDB on the other hand... Suffice to say it is really fragile when spread across multiple datacentres, which is probably not a surprise to HN.


Details matter. All updates in the Datastax post are protected by LWT. That code is correct. The OP's code was wrong, because he was mixing non-LWT updates.


Cassandra never claimed to be a consistent distributed database. It's really quite sad that someone had to find that out the hard way.


It is optionally consistent if you know how to use it and you know its limitations. Their use of Cassandra was obviously wrong. Mixing LWT and non-LWT is like having only half of your shared data protected by mutexes - this isn't going to work correctly.


In such a situation though, shouldn't it raise an error? At least that way, the user could have a chance to recover.

Why should it make an arbitrary decision that unknowingly corrupts data for some users?


The only way for C* to know that it should raise an error is if it would implicitly protect all writes with LWT and this is not what most users want.

Following the parent's example, if you don't protect memory access with a lock, you can't know that somebody else locked it.


"Must be consistent" is a property of the data, not of a particular transaction. Data on which inconsistent transactions are ever allowed should be declared, preferably with big hazard signs.


The data are not either consistent or inconsistent. The data are either correct or incorrect, and the definition of "correct" is determined by business requirements. Most of the time the weak eventual consistency model provided by Cassandra is sufficient to keep data correct if you are using it right. E.g. you don't need ACID and serializable consistency to do financial stuff correctly - banks and accountants figured out how to do this long before computers were invented. That's why Cassandra has serializable consistency as an opt-in only (LWTs), but this comes at a price in latency and availability. By using strong consistency all the time, you'd lose most of the benefits of Cassandra, and you could probably just replace it with a single RDBMS node (and suffer scalability and availability problems then).


Right, does Java throw an exception if you forget a lock around a shared variable?


> I can't think of any way in which Cassandra would be better than using {Zookeeper,etcd,consul}

This is the correct answer. Zookeeper & Curator make it nearly foolproof to implement a distributed lock correctly, whereas with Cassandra, it's the other way round.


> Zookeeper & Curator make it nearly foolproof to implement a distributed lock correctly, whereas with Cassandra, its the other way round.

Cassandra makes it nearly distributed to implement a foolproof lock correctly?


Yes, nearly! But not quite.


What about using Redis for distributed locks (assuming you are not using Redis Cluster)?


You'll want to use a library that implements Redlock: http://redis.io/topics/distlock


You really don't want to use redlock: http://martin.kleppmann.com/2016/02/08/how-to-do-distributed...

Antirez's rebuttal didn't really rebut Martin's overall point that Redlock is a very bad locking algorithm: http://antirez.com/news/101


Did anyone else ever publish a meaningful response? I thought Antirez's response was reasonable but I also recognize the inherent bias. I'd love to see responses from other experts, ideally written such that non-experts can reasonably understand.


I recall several people on twitter (I follow a lot of distributed systems people / cs professors as it is part of my job to build these things) who agreed with Martin's analysis.


I'll have to see if I can dig some of the responses up. It bugs me that I still don't know if this algorithm is actually safe.


Honestly given Antirez's response I'm genuinely surprised no one has written a TLA+ or similar formal verification proof. Then the outcome is binary; either redlock is sound and works as expected, or it is not.

Amazon's AWS Architect, James Hamilton is a big fan of this approach, as are most of the heavyweights in distributed systems:

http://perspectives.mvdirona.com/2014/07/challenges-in-desig...

EDIT: prefaced URL with http


Thanks, I learned the fencing token lock algorithm from this.


I personally wouldn't use redis for anything besides a cache.


Why not? It has a WAL persisted to disk at a desired interval. It provides the same durability levels as most databases.

http://redis.io/topics/persistence


Let me be clear, I think redis is a well engineered piece of software and I have been constantly impressed with the way antirez runs the project.

Specifically for the question of a locking service, I need more availability than what a single node can offer. I'm not sure if you are advocating for using redis cluster as a locking service, but since it intentionally doesn't offer write safety during a partition that does not seem advisable.



This:

    INSERT INTO locks (id, lock, revision)
    VALUES ('Tom', true, 1)
    IF NOT EXISTS USING TTL 20;
looks like a race condition. The same problem comes up in SQL databases - you can't lock a row that doesn't exist yet. If you write, in SQL:

    BEGIN TRANSACTION;
    SELECT * FROM locks WHERE id = 'Tom' AND lock = true AND revision = 1;
    -- if no records returned
    INSERT INTO locks (id, lock, revision) VALUES ('Tom', true, 1);
    COMMIT;
you have a race condition. If two threads make that identical request near-simultaneously, both get no rows back from the SELECT, and both do the INSERT. SELECT doesn't lock rows that don't exist.

The usual solution in SQL is to use UNIQUE indices which will cause an INSERT to fail if the record about to be inserted already exists.
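For example, in plain SQL (a sketch with PostgreSQL-style types):

    CREATE TABLE locks (
        id       TEXT    NOT NULL UNIQUE,
        lock     BOOLEAN NOT NULL,
        revision INT     NOT NULL
    );

    -- of two near-simultaneous identical INSERTs, the second fails with
    -- a duplicate-key error instead of silently creating a second row
    INSERT INTO locks (id, lock, revision) VALUES ('Tom', true, 1);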

I ran into this reassembling SMS message fragments, where I wanted to detect that all the parts had come in. The right answer was to do an INSERT for each new fragment, then COMMIT, then do a SELECT to see if all the fragments of a message were in. Doing the SELECT first produced a race condition.


Some SQL databases (e.g., SQL Server and PostgreSQL) offer key-range locking which, along with a serializable isolation level, will prevent this race condition.


ON DUPLICATE KEY is another construct that is used in some DBs (MySQL, or HBase with Apache Phoenix, and others).


This is not a race in Cassandra.

IF NOT EXISTS causes replicas to agree on the result using Paxos, and only responds with success if a certain number of replicas concur and write the transaction to disk.

I believe it is a quorum for the SERIAL consistency level, and a local quorum (quorum of replicas in the local DC) for LOCAL_SERIAL.
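In cqlsh the Paxos (serial) consistency level can be set explicitly, e.g. (a sketch, reusing the statement from the post):

    SERIAL CONSISTENCY LOCAL_SERIAL;

    INSERT INTO locks (id, lock, revision)
    VALUES ('Tom', true, 1)
    IF NOT EXISTS USING TTL 20;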


And there's another surprise waiting to be discovered. The execution of an LWT is not guaranteed to return an applied/not-applied response [1]. It can raise a WriteTimeout exception that means "I don't know if it applied". It looks like that case can be worked around by inserting a UUID and, on a WriteTimeout, reading the UUID back at SERIAL consistency and checking whether it is the one that was inserted. But generally this limitation of LWTs makes implementing some algorithms impossible, e.g. you can't implement a 100% reliable counter.

[1] https://issues.apache.org/jira/browse/CASSANDRA-9328
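A rough sketch of that workaround (the owner column is hypothetical, added only so a client can recognize its own write; the WriteTimeout handling itself lives in application code):

    -- acquire, tagging the row with a client-generated UUID
    INSERT INTO locks (id, lock, owner)
    VALUES ('Tom', true, c5e0f2a4-9d3b-4c1e-8f6a-2b7d9e4a1c30)
    IF NOT EXISTS USING TTL 20;

    -- on a WriteTimeout, re-read at SERIAL consistency (set on the
    -- driver) and check whether owner matches the UUID sent above
    SELECT owner FROM locks WHERE id = 'Tom';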


I railed against CQL right from the start and it's precisely because of this kind of thing. Imitating SQL has the side effect of setting certain expectations and drags a certain mental model along with it.


If you're not writing purely immutable data or can't 100% guarantee a serialized reader/writer, then you're just looking for trouble with Cassandra.


The post was quite interesting, but I find image macros and GIFs really distracting in technical writings.


Shouldn't Cassandra be using Lamport timestamps or even vector clocks there? Relying on timer and its resolution sounds strange for a database, especially a distributed one.


Cassandra's timestamps can be specified by the client: it's possible to use any integer as your timestamp, such as Lamport timestamps, or even atomic distributed counters (possibly using Cassandra's own counters).
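For example (a sketch; the integer after USING TIMESTAMP is whatever the client chooses, here an arbitrary counter value):

    UPDATE locks USING TIMESTAMP 42
    SET lock = false
    WHERE id = 'Tom';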

Vector clocks are still out, though.

Edit: Actually, you can't do Lamport timestamps because you can't query the current value of a timestamp.

Re-edit: That's wrong. I shouldn't believe any old blog I find. (I'm leaving it in the comment because it was quoted in a reply.)


> Edit: Actually, you can't do Lamport timestamps because you can't query the current value of a timestamp.

You can get the timestamp of each column back as part of a SELECT.
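For instance (a sketch using the built-in WRITETIME function):

    SELECT id, lock, WRITETIME(lock) FROM locks WHERE id = 'Tom';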

is that not enough?



"Conversely, if there are concurrent changes to a single field, only one will be retained, which is also what we want."

They are wrong though. As the HN submission illustrates, people want some ordering and eventual consistency, not a rule that selects a single field value during concurrent changes. And this is where a Lamport timestamp could help.


No matter what scheme you use, it is still a rule. Lamport timestamps or whatever are just different ways of resolving a conflict. In general, the conflict you have designed into your system in the first place is an easier problem to solve than having users understand more complex resolution methods -- especially ones that push a callback into the app design rather than just solving the basic problem in a pragmatic way.


Author here.

Great discussion around the CAP theorem but it misses the point. AP vs CP / Cassandra being AP is not relevant to this particular problem:

1) This is not a distributed systems corner case. You will run into this if you are running Cassandra on a single node. A node should be able to guarantee consistency internally during normal operation. If it is not able to do that, there is something wrong with the system.

2) This is a case where queries are being sent from the same process/thread and go to exactly the same nodes. Attach a simple, monotonically increasing query counter to each call and you can easily serialize it on the other side.


I see this as a basic misunderstanding of how LWT works. If you want to ensure serializable operations, then you need to use LWT with preconditions that ensure serializable operations.

Even better, stop trying to emulate the old and tired distributed lock methods that have been proven over and over again to be insufficient.


If it's not obtained synchronously, what does the lock achieve?


Couldn't you explain the problem in a less cringy way? Sounds like a bug to me. So file a bug report?


I think not strictly a bug, just surprising behavior. They linked to the documentation that shows that it is "expected" to work like that, and to two requests for features to help mitigate this issue.


Bug/feature, whatever. CASSANDRA-6123 addresses exactly row level consistency, but maybe here they found another case.


I guess I'm old or just not hip (most likely both) but I had to google WAT (I know WTF but WAT ... never seen it).

Even now I'm still not sure but I presume WAT = what!


In case this is not the explanation you found: http://knowyourmeme.com/memes/wat


I know it from this presentation: https://www.youtube.com/watch?v=AU2Rhq5eWa4


It is a particular form of "what" that is used when dumbstruck.


Closely related is WUT, particularly the popular variant LOLWUT: http://knowyourmeme.com/memes/lolwut


with a strong subtext of what the fuck / you must be fucking kidding me


Hazelcast has a distributed lock (see http://docs.hazelcast.org/docs/3.7/manual/html-single/index....) and I've used it for more than a year to synchronize jobs across a cluster.


Using Cassandra as a lock is a terrible, terrible idea.


With Oracle/PostgreSQL, readers never wait for writers and writers never wait for readers (http://philip.greenspun.com/sql/your-own-rdbms.html), using the underlying locking mechanisms (http://www.beej.us/guide/bgipc/output/html/singlepage/bgipc....).



It feels like it is 2013 all over again: https://aphyr.com/posts/294-jepsen-cassandra


As others have noted, this blog uses an approach to data modeling that is considered an anti pattern for an AP data store like Cassandra.


Does anyone have experience around this with HBase, a CP database?


Apache HBase committer here.

HBase is strictly CP (except for its geo-replication, and optional timeline-consistent region replicas).

It uses MVCC for row "transactions" to always keep rows consistent. HBase also has checkAndPut and checkAndDelete primitives, which are atomic, as well as Increment and Append, which are atomic and serializable.

http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html explains it fairly well.

Together with Apache Phoenix you have full multi-row transactions, but they obviously come at a price.


[flagged]


100,000 nodes at Apple and 2,500 nodes at Netflix would like to have a word with you.


Apple no longer uses Cassandra.



I'm not sure how you got that impression but they still have huge deployments of Cassandra.


Maybe they are thinking of Facebook.


Some details besides ad-hominem might actually be helpful. When you make statements like this it just looks like trolling.


Cassandra is not a person, so that was not ad hominem.


You are technically right (the best kind), but the underlying idea is valid, that this is just cussing out a product with no technical detail, and that's not constructive.


Technically, Cassandra was a person in Greek mythology. So still ad hominem.


You can't comment like this here. Please post civilly and substantively or not at all.

https://news.ycombinator.com/newsguidelines.html


I second other people's responses that you should clean this up and add a lot of substance.

Often the best comments start off exactly like yours: it's clear you feel very strongly about the issue, so edit it and say how/why! List the points that irked you.


Maybe your startup just sucked at Cassandra?


To be fair, it's easy to suck at Cassandra and most folks that use it used to suck at it.


To be fair, some startups are being trendy and using Cassandra when a traditional SQL database would better fit their needs.

One I worked at used a framework designed for SQL, wrote a Cassandra storage backend, didn't denormalize data at all (!) and then (badly) performed JOINs in software because the framework was built around the storage having relational operations (!!) topped off by their backend implementation not being aware of partitioning keys, and hence using unique values for them (!!!).

The first two I discovered when trying to diagnose why certain parts of our website/API were extremely slow. The third I discovered mid-import when suddenly the (staging) cluster stopped being able to process queries.

There wasn't any problem Cassandra was solving, our read volume was much higher than our write volume (like, 10x-100x), and we didn't even have that much data in the first place. Premature optimization (for a problem we were a long way from having) and being trendy.

I don't think Cassandra is particularly bad (and can be used in awesome ways for high-write, low-read immutable timeseries data) but you have to use Cassandra as Cassandra, not as a trendy MySQL.


Bet that was an optimal solution for the devs' CVs and attracting VC money.


Agreed. If Writes >> Reads and TPS > 1M Maybe Cassandra :-)


Agreed, Cassandra is sharp edges all around.


If you or anyone else enjoys it - ping me via email, I might have a cool job for you.


You should probably put your email in your profile then... ;)


My bad. Added.



