This isn't anything against Redpanda, but I'm always amazed how badly all these distributed databases do in Jepsen.
What would one use them for in practice that wouldn't be better served by, say, postgresql and streaming replication (the thing I've used) in case the server goes down? (I'm not suggesting there isn't a good application, just that I'm not knowledgeable enough to know of one.)
When a distributed database is designed, you must navigate and optimize several complex technical tradeoffs to meet the architecture and product objectives. The specific set of tradeoffs made -- and they are different for every platform -- will determine the kinds of data models and workloads that the database will be suitable for, especially if performance and scalability are critical as in this case.
The reason distributed databases tend to be buggy, especially in the first iterations, is straightforward if not simple to address. While it is convenient to describe technical design tradeoffs as a set of discrete, independent things, in real implementations they are all interconnected in subtle, complex, nuanced ways. Modifying one design tradeoff in code can have unanticipated consequences for other intended tradeoffs. In other words, there isn't a set of simple tradeoffs; there is a single, extremely high-dimensional tradeoff that is being optimized. Not only are complex, high-dimensional design elements difficult to reason about when writing the code the first time, but any changes to the code may shift how the tradeoffs interact in non-obvious ways. Humans have finite cognitive budgets, so unless it is obvious that a code change has the potential for unintended side effects, we generally don't spend the time to verify that it doesn't.
I can't tell you how many times I've seen tiny innocuous code changes alter the behavior of distributed databases in surprising ways. This is also why once the core code seems to be correct, people are reluctant to modify it if that can be avoided at all.
I'm constantly surprised more folks don't use FoundationDB. I'm pretty sure the Jepsen folks said something to the effect that the way FoundationDB is tested goes far beyond what Jepsen does (good talk on FDB testing: https://www.youtube.com/watch?v=4fFDFbi3toc).
My read is that most use cases just need something that works _enough_ at scale that the product doesn't fall over and any issues introduced by such bugs can be addressed manually (i.e. through customer support, or just sufficient ad-hoc error handling). Couple that with the investment some of these databases have put into onboarding and developer-acquisition, and you have something that can be quite compelling even compared to something which is fundamentally more correct.
As someone who is switching to FoundationDB: because it's not easy. It doesn't look like other databases, it isn't in fashion (yes, these things matter), and it requires thinking and adapting your application to really use it to its full potential. It could also benefit from a bit more developer marketing.
Having looked at FoundationDB a bit it wasn't clear why I would choose it. It has transactions, which is nice, but not that big of a deal despite how much time they put into talking about it. I actually don't even need transactions since all of my writes commute, so it's particularly uninteresting to me.
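(For concreteness, by commuting writes I mean things like counters, which even FDB would cover with its atomic operations rather than read-modify-write transactions. A rough sketch assuming its Python bindings, a locally running cluster, and a made-up key:)

    import struct
    import fdb

    fdb.api_version(630)
    db = fdb.open()  # default cluster file; assumes a local cluster is running

    @fdb.transactional
    def bump_counter(tr, key):
        # Atomic add: concurrent increments commute, so transactions touching
        # the same key never conflict with each other.
        tr.add(key, struct.pack("<q", 1))

    bump_counter(db, b"counters/page_views")  # hypothetical key, for illustration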
They say they're fast, but I didn't find a ton of information about that.
Ultimately the sell seemed to be "nosql with transactions" and I just couldn't justify putting more time into it. I did watch their excellent talk on testing, and I respect that they've put that level of effort into it, and it was why I even considered it, but yeah, what am I missing?
Different systems solve different problems and have different functional characteristics. Actually, one of the things Kyle highlighted in his report is write cycles (the G0 anomaly); it isn't a problem with the Redpanda implementation but a fundamental property of the Kafka protocol. Records in the Kafka protocol don't have preconditions and they don't overwrite each other (unlike database operations), so it doesn't make sense to enforce an order on the transactions and it's possible to run them in parallel. That gives enormous performance benefits and doesn't compromise safety.
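To make that concrete: a produce is a blind append with no read-precondition to violate, so two producers interleaving on the same topic simply leave both records in the log. A minimal sketch, assuming the kafka-python client and a local broker (topic name and payloads are made up):

    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    # Both sends are blind appends: neither reads existing state and neither
    # overwrites the other, so there is no lost-update hazard to serialize around.
    producer.send("events", key=b"user-42", value=b'{"action": "click"}')
    producer.send("events", key=b"user-42", value=b'{"action": "view"}')
    producer.flush()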
There's a lot of different ways to answer this, but I think about it as a different architectural paradigm. Yes you can do stream-ish things with Postgres but at some level of scale you'd be putting a square peg in a round hole.
Databases are only as bad as the filesystem wrt being mutable, but since we do expect them to be a "better filesystem" it's surprising we let them get away with losing data. IMO beyond transactions you should just be able to unwind any writes to a SQL database, including deletes, for at least a day.
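One crude way to approximate that with stock postgres is an audit trigger that squirrels away the old row on every update or delete. A sketch only, not a real point-in-time recovery story; the table, connection string, and retention policy are made up:

    import psycopg2

    DDL = """
    CREATE TABLE IF NOT EXISTS orders_audit (
        changed_at timestamptz NOT NULL DEFAULT now(),
        operation  text NOT NULL,
        old_row    jsonb NOT NULL
    );
    CREATE OR REPLACE FUNCTION orders_audit_fn() RETURNS trigger AS $$
    BEGIN
        -- Keep the pre-image of every changed or deleted row for later unwinding.
        INSERT INTO orders_audit (operation, old_row) VALUES (TG_OP, to_jsonb(OLD));
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;
    DROP TRIGGER IF EXISTS orders_audit_trg ON orders;
    CREATE TRIGGER orders_audit_trg AFTER UPDATE OR DELETE ON orders
        FOR EACH ROW EXECUTE FUNCTION orders_audit_fn();
    """

    with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
        cur.execute(DDL)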
But if you ask Alan Kay he'd say all programs should have an explicit concept of time and be able to operate on the past and future state of everything.
CAP-theorem and configuration. When choosing a distributed database you are in a way already by definition giving up a chunk of the C, Consistency.
Once you have a distributed database, it often has a myriad of tuning parameters that all affect which corner of the CAP triangle you end up in. Using these you have to choose which risks you are willing to accept. If all my replicas are in the same rack, any timeout issues would often be purely academic, and I can pragmatically design such a system very differently than if they are spread across the globe.
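As a concrete example of one of those knobs, staying with the Kafka-style systems this thread is about: the producer's acks setting is exactly this kind of dial. A small sketch assuming the kafka-python client and a made-up local broker:

    from kafka import KafkaProducer

    # Wait for all in-sync replicas before the write is acknowledged:
    # slower, but a leader crash right after the ack can't lose the record.
    durable = KafkaProducer(bootstrap_servers="localhost:9092", acks="all")

    # Acknowledge as soon as the leader has the record: lower latency, but the
    # record can be lost if the leader dies before replication catches up.
    fast = KafkaProducer(bootstrap_servers="localhost:9092", acks=1)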
I might also have such a high influx of low-value data that I can accept some losses; the cost of a total deadlock would be more than just one transaction lost in cyberspace, and not everyone is building a bank. That said, it still sucks a lot to have inconsistent data regardless of your application, so in such cases aim for lost data rather than wrong data. So in theory the distributed approach is considered wrong, but in practice it might just be good enough.
This also ties into how you model your data: a lot of the faults found in the latest mongodb analysis were around multi-document transactions, but nobody uses mongodb this way; mostly it's just a place where you dump standalone documents.
In the end, you have to go back to the original question of why you are choosing a distributed DB in the first place: is it for scale, HA, regionality or other reasons? Then design around that; it's never a silver-bullet, fix-all solution.
Taking your example of streaming replication: how would that behave if the primary acked to the client and then crashed before the replica received the transaction? The alternative of waiting for the replica before you ack instead gives you two sources of failure. You've now reinvented a distributed system and are in the same soup as all these other databases, just with other tradeoffs. :)
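In postgres terms that choice is the synchronous_commit dial. A rough sketch with psycopg2; the connection string and table are made up, and the synchronous variant assumes synchronous_standby_names is configured on the primary:

    import psycopg2

    conn = psycopg2.connect("dbname=app host=primary.internal")  # hypothetical
    with conn, conn.cursor() as cur:
        # 'local': ack once the primary's own WAL is flushed. A crash after the
        # ack but before the standby receives the WAL loses this transaction.
        cur.execute("SET LOCAL synchronous_commit TO 'local'")
        # 'remote_apply' would instead hold the ack until a standby has applied
        # it -- i.e. the "wait for the replica" tradeoff described above.
        cur.execute("INSERT INTO payments (amount_cents) VALUES (%s)", (4200,))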
> When choosing a distributed database you are in a way already by definition giving up a chunk of the C, Consistency.
Er, I don't think that's right. Distributed systems (even those running in asynchronous networks) can and often do satisfy linearizability. Gilbert & Lynch's proof of the CAP theorem just says that if you do choose linearizability (C) in an asynchronous network (P), you can't also guarantee total availability (A)--under some network faults, some operations may not complete. https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-cap.p...
totally different approaches tho. people have tried what you proposed many times before and for some scale succeeded. hard to compare at all when you dig into the details.
expect a companion post. it was super fun to partner with kyle on this. +1 would recommend to anyone building a storage system.