You are leaving CockroachDB out of that group as well. But it's not really a new generation of consensus algorithms. There is still Paxos, 2PC, etc., just at finer granularity. The trade-offs the article discusses still apply, though: you still trade something in each case for consistency, including latency.
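To see where the latency goes, here's a bare-bones sketch of classic two-phase commit (illustrative interface, not any particular database's API) — every transaction pays at least two cross-node round trips before it can acknowledge:

    // Minimal 2PC coordinator sketch (hypothetical names, purely
    // illustrative). The point: two phases = two round trips per commit.
    interface Participant {
      prepare(txId: string): Promise<"yes" | "no">; // round trip 1
      commit(txId: string): Promise<void>;          // round trip 2
      abort(txId: string): Promise<void>;
    }

    async function twoPhaseCommit(txId: string, participants: Participant[]): Promise<boolean> {
      // Phase 1: ask every participant to prepare (in parallel).
      const votes = await Promise.all(participants.map(p => p.prepare(txId)));
      if (votes.every(v => v === "yes")) {
        // Phase 2: unanimous yes, tell everyone to commit.
        await Promise.all(participants.map(p => p.commit(txId)));
        return true;
      }
      // Any "no" (or a timeout, not modeled here) forces an abort.
      await Promise.all(participants.map(p => p.abort(txId)));
      return false;
    }

However you layer Paxos or leases on top of this, the cross-node coordination is still on the commit path, which is exactly the latency trade the article describes.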
It's a shame these "NewSQL" databases attack AP databases, claiming that most projects don't need them, when in reality latency is super important and consistency is something of an afterthought, even for hardcore RDBMS users.
Sharding sucks and we really need to get away from it. I wonder if there are any databases like these that use a hierarchy to distribute data and manage access to it, rather than a flat topology of arbitrary chunks of data. Currently there are a lot of shitty hacks, like memory-resident databases in front of disk-backed ones, proxies, etc. We can do better than that if we build hierarchical data and node management into the network.
I mean, isn't that still just sharding? What does hierarchy give us?
Look at Bigtable [1], for example. Tablets are organized in a three-level, B-tree-like hierarchy, where data is only stored at the lowest level. Is this like what you're talking about?
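If it helps, here's roughly what that three-level lookup looks like as a toy model (illustrative names only, not the real API):

    // Toy model of Bigtable's tablet location hierarchy from the paper.
    // Rows live only in the leaf (user) tablets; the root and METADATA
    // levels just map row-key ranges to the next level down.
    type Tablet = { startKey: string; children?: Tablet[]; rows?: Map<string, string> };

    function findChild(tablet: Tablet, rowKey: string): Tablet {
      // Children are sorted by startKey; pick the last range covering rowKey.
      const kids = tablet.children!;
      let match = kids[0];
      for (const kid of kids) if (kid.startKey <= rowKey) match = kid;
      return match;
    }

    function read(root: Tablet, rowKey: string): string | undefined {
      const meta = findChild(root, rowKey); // hop 1: root -> METADATA tablet
      const leaf = findChild(meta, rowKey); // hop 2: METADATA -> user tablet
      return leaf.rows!.get(rowKey);        // hop 3: read from the user tablet
    }

In practice clients cache tablet locations, so after the first lookup most reads go straight to the leaf.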
Not necessarily. It's good to use a hierarchy, but they're still sharding. The client has to contact all the tablets directly, has to use certain key naming conventions to benefit from the sharding, and pays a seemingly unnecessary number of round trips. There's a lot of other extra complexity too, to the point where a Bigtable can perform very efficiently or many times worse depending on many factors. It sounds like a pet rather than cattle.
The downside of sharding is operationally it's a pain, and operational pain translates into pain either in your wallet or for your employees or customers, none of which you want.
Hierarchy is supposed to give you management, access, and performance gains, but if it's too specific you have to use the data exactly how the database wants you to, and then you can't take advantage of general interfaces. It shouldn't get in your way; it should just give you free performance boosts with known guarantees, and be as simple as possible while also surviving scaling.
I'm not saying this is easy, or even feasible, but it's where we should be headed if we want dev, ops, and the people in suits to be happy.
> The downside of sharding is operationally it's a pain
I am really curious about this premise. Isn't it only a pain if there's no direct system support? (i.e. you have to manually rebalance shards, split up queries at the application layer, etc.) Isn't this the whole thing that NewSQL systems are trying to solve?
Moreover, I don't quite understand what benefits a hierarchical structure provides that a flat (or mostly-flat) partitioning scheme doesn't, ceteris paribus in terms of system support. Especially: doesn't a hierarchical structure mean it takes O(log n) round trips to resolve a lookup rather than just O(1) as in a flat partitioning?
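To make the comparison concrete, here's the flat case as a sketch (plain modulo hashing, purely illustrative; real systems use consistent hashing or a directory so that adding nodes doesn't reshuffle everything):

    import { createHash } from "node:crypto";

    // Flat hash partitioning: one hash, one lookup, O(1) hops to find the
    // owning node. Not any specific system's scheme.
    function ownerOf(key: string, nodes: string[]): string {
      const digest = createHash("sha1").update(key).digest();
      const bucket = digest.readUInt32BE(0) % nodes.length; // one step to an owner
      return nodes[bucket];
    }

    // The client can route directly; no tree walk needed.
    console.log(ownerOf("user:42", ["node-a", "node-b", "node-c"]));

And the O(log n) hops in a hierarchy are only paid on a cold lookup; with location caching (as in Bigtable) it amortizes toward O(1) as well.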
Holy cow, this is an intense article, with great definitions for a bunch of detailed variations of distributed-systems jargon, and it even has some nice images to explain the ideas!
My specialty is in eventually consistent data types; their section on those is much smaller, but they have some links to follow up on CRDT stuff.
> are often limited in functionality and impose performance overheads
This is for the most part true of CRDTs. However, with some Stanford colleagues we just formalized and proposed, in a new paper, how to construct a generalizable CRDT: one that actually lets other, more case-specific CRDTs be built on top of it. We use this at gunDB, so you can get the advantages of the more optimized CRDTs but still have the rest of your data glued together.
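For intuition, here's the general shape of the idea as a minimal sketch (textbook state-based CRDT composition, not the actual construction from our paper): field-level CRDTs expose a merge, and a generic container just delegates to them.

    // A classic field-level CRDT: last-writer-wins register.
    // (Real LWW also tie-breaks equal timestamps by replica id; elided here.)
    type Lww<V> = { at: number; val: V };
    const lwwMerge = <V>(a: Lww<V>, b: Lww<V>): Lww<V> => (b.at > a.at ? b : a);

    // The generic glue: merge two documents field by field, delegating
    // each field to its own CRDT merge. Commutative and idempotent, so
    // replicas converge no matter the order updates arrive in.
    function mergeDoc<V>(a: Map<string, Lww<V>>, b: Map<string, Lww<V>>): Map<string, Lww<V>> {
      const out = new Map(a);
      for (const [k, v] of b) {
        const mine = out.get(k);
        out.set(k, mine ? lwwMerge(mine, v) : v);
      }
      return out;
    }

The container doesn't care what the field CRDTs are, which is what lets more specialized ones plug in underneath.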
Another set of really interesting CRDTs, which they didn't mention, are the ones used for making decentralized versions of Google Docs (a toy sketch follows the link below):
- And we recently did a "layman" animated explainer of a similar algorithm, which explains it for the non-academically minded, with simple analogies to "distributed systems" that happen in everyday life: http://gun.js.org/explainers/school/class.html
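Here's that toy sequence CRDT, in the same family as those algorithms (heavily simplified; dense identifiers, tombstones, and interleaving issues are all glossed over, and this is not gunDB's actual implementation):

    // Each character gets a position identifier; positions plus an author
    // tie-break give a total order, so concurrent inserts converge on
    // every replica once ops are exchanged.
    type Char = { pos: number; author: string; ch: string };

    function insertBetween(left: number, right: number, author: string, ch: string): Char {
      return { pos: (left + right) / 2, author, ch }; // midpoint identifier
    }

    function render(chars: Char[]): string {
      return [...chars]
        .sort((a, b) => a.pos - b.pos || a.author.localeCompare(b.author))
        .map(c => c.ch)
        .join("");
    }

    // Replica "a" writes "Hi"; replica "b" concurrently inserts "!"
    // between the two characters. Both replicas sort identically.
    const doc: Char[] = [insertBetween(0, 1, "a", "H"), insertBetween(0.5, 1, "a", "i")];
    doc.push(insertBetween(0.5, 0.75, "b", "!"));
    console.log(render(doc)); // "H!i"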
I'll definitely be referencing the OP's article in the future, though. Even though I'm in this world, I constantly get the vernacular mixed up, and this will be a really great dictionary to use.
You can read a comparison between Spanner's atomic-clock based approach and FaunaDB's consensus algorithm here: https://fauna.com/blog/distributed-consistency-at-scale-span...
And if you want to look at how these consistency tradeoffs show up in application programming, check out the bit about serialized indexes in the second half of this blog post: https://fauna.com/blog/consistent-transactions-in-a-globally...