Distributed algorithms in NoSQL databases

jchanimal · on Sept 29, 2017

This article leaves out the new generation of distributed consensus algorithms used by databases like Spanner and my employer FaunaDB.

You can read a comparison between Spanner's atomic-clock based approach and FaunaDB's consensus algorithm here: https://fauna.com/blog/distributed-consistency-at-scale-span...

And if you want to look at how these consistency tradeoffs show up in application programming, check out the bit about serialized indexes in the second half of this blog post: https://fauna.com/blog/consistent-transactions-in-a-globally...

zzzcpan · on Sept 29, 2017

You are leaving out CockroachDB from that group as well. But it's not really a new generation of consensus algorithms. There is still Paxos, 2PC, etc., just more granularity. The trade offs article discusses still apply though, you still trade something in each case for consistency, including latency.

It's a shame these "NewSQL" databases attack AP databases, claiming that most projects don't need them. While in reality latency is super important and consistency is something of an afterthought, even for hardcore RDBMS users.

0xbadcafebee · on Sept 29, 2017

Sharding sucks and we really need to get away from it. I wonder if there's any databases like these that use a hierarchy to distribute and manage access of data rather than with a flat topology of arbitrary chunks of data. Currently there's a lot of shitty hacks, like memory-resident databases in front of disk-backed, proxies, etc. We can do better than that if we build hierarchical data and node management into the network.

elvinyung · on Sept 29, 2017

I mean, isn't that still just sharding? What does hierarchy give us?

Look at Bigtable [1], for example. Tablets are organized in a three level B-tree-like hierarchy, where data is only stored at the lowest level. Is this like what you're talking about?

[1] https://research.google.com/archive/bigtable-osdi06.pdf

0xbadcafebee · on Sept 30, 2017

Not necessarily. It's good to use a hierarchy, but they're still sharding. And the client has to contact all the tablets directly. And the client has to use certain key naming conventions to benefit from the sharding. And there's a seemingly unnecessary amount of round trips. And a lot of other extra complexity, to the point where a bigtable can perform very efficiently, or it can perform many times poorer, depending on many factors. It sounds like a pet rather than cattle.

The downside of sharding is operationally it's a pain, and operational pain translates into pain either in your wallet or for your employees or customers, none of which you want.

Hierarchy is supposed to give you management, access, and performance gains, but if it's too specific you have to use the data exactly how the database wants you to and then you can't take advantage of general interfaces. It shouldn't get in your way, it should just give you free performance boosts with known guarantees and be as simple as possible while also surviving scaling.

I'm not saying this is easy, or even feasible, but it's where we should be headed if we want dev, ops, and the people in suits to be happy.

elvinyung · on Sept 30, 2017

> The downside of sharding is operationally it's a pain

I am really curious about this premise. Isn't it only a pain if there's no direct system support? (i.e. you have to manually rebalance shards, split up queries at the application layer, etc.) Isn't this the whole thing that NewSQL systems are trying to solve?

Moreover, I don't quite understand what benefits a hierarchical structure provides that a flat (or mostly-flat) partitioning scheme doesn't, ceteris paribus in terms of system support. Especially: doesn't a hierarchical structure mean it takes O(log n) round trips to resolve a lookup rather than just O(1) as in a flat partitioning?

vikiomega9 · on Sept 29, 2017

I've always thought about sharding with a caveat of some notion of an implicit index either via timestamps or composite keys.

virmundi · on Sept 29, 2017

It's an interesting article. Given some of the grammar errors, does anyone think that it's written by a computer?

bruth · on Sept 29, 2017

Should be noted it is from 2012 in the title.

random023987 · on Sept 29, 2017

Nice article, but the arrows are wrong on the push-pull diagram under gossip protocols.

marknadal · on Sept 29, 2017

Is this website related to the popular http://highscalability.com/ ?

Holy cow, this is an intense article, with great definitions for a bunch of detailed variations of distribute system jargon, and even has some nice images to explain the ideas!

My specialty is in eventually consistent data types, which there section is much smaller on, but they have some links to follow up on CRDT stuff.

> are often limited in functionality and impose performance overheads

This is for the most part true, with CRDTs. However we just formalized and proposed in a new paper, with some Stanford colleagues, how to construct a generalizable CRDT. One that actually lets other more case-specific CRDTs be built on top of it, which we use at gunDB, so that way you can get the advantages of the more optimized CRDTs but still have the rest of your data glued together.

Another set of really interesting CRDTs which they didn't mention, are the ones used for making decentralized versions of Google Doc:

- Martin's Kleppmann has an exceptionally great/interesting talk on this: https://www.youtube.com/watch?v=yCcWpzY8dIA&feature=youtu.be...

- And we recently did a "layman" animated explainer of a similar algorithm, that explains it for the non-academically minded, with simple analogies to "distributed systems" that happen in every day life: http://gun.js.org/explainers/school/class.html

I'll definitely be referencing the OP's article in the future though. Even though I'm in this world, I constantly get the vernacular mixed up. And this will be a really great dictionary to use.

bullen · on Sept 30, 2017

I think most databases are missing a fundamental requirement to make high latency distribution possible without massive headaches:

Async HTTP clients with thread pool and non-blocking IO.

http://github.com/tinspin/rupy/wiki/Fuse