Not to detract from the very intelligent and reasoned posting, but what tiny per...

kev009 · on April 3, 2014

That's a very obtuse point of view. I'm curious sociologically: what field do you work in, and what is your exposure to data?

Consider an inventory system for a big box retailer. I can't think of anything better than a fat-ass RDBMS as the primary data store. Sharding sounds like a horrific idea. There are myriad workloads like this.

Personally, I've seen pgsql handle terabytes of data just fine. It wasn't noteworthy, nor a source of problems that would even prompt considering something else. YMMV, but it's a good idea to let logic and reason dictate architecture instead of chasing the shiny thing or acting out of hubris.

davidw · on April 3, 2014

Well, yes and no.

Everybody knows that relational databases don't scale because they use JOINs and write to disk.

Also, relational databases weren't built for web scale. MongoDB handles web scale. You turn it on and it scales right up.

And before you knock shards, shards are the secret ingredient in the web scale sauce. They just work.

Furthermore, relational databases have impetus mismatch, and Postgresql is slow as a dog. MongoDB will run circles around Postgresql because MongoDB is web scale.

ro_sharp · on April 3, 2014

Are you being intentionally sarcastic? Because this reads a lot like http://www.mongodb-is-web-scale.com/

Edit: Whoops, just read your reply :)

GFischer · on April 3, 2014

His post was a textbook example of Poe's Law.

http://en.wikipedia.org/wiki/Poe's_law

"without a clear indication of the author's intent, it is difficult or impossible to tell the difference between an expression of sincere extremism and a parody of extremism"

pistle · on April 3, 2014

Thank you for this. To all the haters: davidw is quoting a funny animated poke at people who consider a DB problem lightly and throw out the NoSQL mantras without even understanding what they themselves are saying and implying.

rimantas · on April 3, 2014

Check this "web scale" out: http://smalldatum.blogspot.com (via http://dom.as/2014/03/31/mongo-io/ )

davidw · on April 3, 2014

I was actually referring to this: http://www.mongodb-is-web-scale.com/ - which is what contingencies' comments reminded me of, but I guess people either didn't get it or thought it was a bit stale. C'est la vie.

I am a happy Postgres user and always default to it unless I am really sure a project calls for something else.

captainmojo · on April 3, 2014

It cracked me up!

Same here. I tell people to start their datastore selection by looking for a reason NOT to use Postgres.

sanxiyn · on April 3, 2014

You know what is web scale? WebScaleSQL is. :)

http://webscalesql.org/

contingencies · on April 3, 2014

Sure, if you are running stats across everything in a nontrivial and frequently changing way, then you have a great ally in an RDBMS. But I don't believe many people do that, because usually that sort of stuff is pretty damn predictable, executed offline, or can be consolidated from shards.

However, if any of the following apply: (1) vastly different security requirements for different parts of your datastore, (2) vastly different backup schedules or temporal sensitivities, (3) privacy requirements deriving from different legal jurisdictions, (4) a wish to scale by running on commodity hardware, (5) zero tolerance for downtime ... and probably many other cases ... then in my experience you are going to run into serious issues with a conventional RDBMS, at least in the vast majority of configurations.

I'm all for logic and reason too... but your comments seem closer to name-calling and a single example.

arethuza · on April 3, 2014

Apart from your point on "wish to scale by running on commodity hardware", I'd say that relational databases handle all of those other things pretty well. It might cost you an arm and a leg for the licenses, hardware and network connections, but those goals are achievable.

Anyway, in a lot of environments it's applications that drive the choice of database engine, not the other way round.

kev009 · on April 3, 2014

I'd counter that many people have met each of your numbered points for the past 20 years using commercial RDBMSes.

I can't think of anything that is magnificently easier or better at addressing your points, especially all of them together. #4 seems less relevant: is it really cheaper than operationalizing a distributed system? These days it likely is for situations where consistency can be relaxed, but not for many business workloads.

Can you enlighten us with some example products for each of your points?

contingencies · on April 3, 2014

Haha, I went out and these comments got downvoted to Pluto. Honestly though, I haven't heard a decent argument in response other than "lazy is good". Sure, but architecturally you're basically in either the "engineers run the architecture" camp or the "it's an architecture of convenience for business purposes" camp. I'm in the former. I'd like to hope that some nontrivial subset of the participants here are in the former too, but most are no doubt in the latter. People get upset when you slight their world; that's understandable. The TL;DR is: even if people made a lot of stuff happen 20 years ago, that doesn't justify using the same methods today, and discussing the tradeoffs is constructive, not dismissive.

kev009 · on April 4, 2014

The reason you're getting downvoted so heavily is that you lobbed heavy accusations without any backup (projects, whitepapers, journal submissions, please). Distributed databases are still a specialty today, mainly because they have inherent tradeoffs. If you don't understand how hard those tradeoffs are, you SHOULD NOT be using a distributed database by default. I was hoping maybe you had something tangible to share.

contingencies · on April 4, 2014

If you look at what I actually said, I was expressing some skepticism with regard to the payoff from investing time in very low-level optimizations of a conventional RDBMS for most workloads, versus sharding the database and/or migrating to other storage models. That's a tangible line of thinking to consider. Note that I did not at any point say "someone's PhD asserts...", talk in absolutes, or dismiss RDBMS as a viable or proven option.

cwyers · on April 3, 2014

"However, in this day and age using them just feels a little ... lazy ... for most workloads."

IN DEFENSE OF BEING LAZY AS A PROGRAMMER

The essential mission of a computer programmer is to use computers to solve problems. Being lazy can come in one of two forms:

1) Solving problems badly or not solving them at all, or
2) Relying on someone else's solution instead of coming up with your own.

Using an RDBMS is Type-2 Lazy. Now, I want you to get out a pen and paper and write this next bit down, because it is the most important thing you will ever learn:

EVERYBODY SHOULD BE TYPE-2 LAZY BY DEFAULT, ONLY DEVIATING FROM THIS IF THERE IS A COMPELLING REASON NOT TO.

Why?

1) Other people's solutions have been used, which means they've been tested in real-world use. Problems you haven't thought of yet, because you don't yet have a working solution, have at least been discovered, because people are using it. Sometimes they've even been addressed.
2) Other people's solutions may have tools, documentation and communities built around them, making them easier to learn about, use and work with.

Two decades of work have gone into Postgres itself, and even longer into the general field of relational databases. Corner cases you can't even conceive of have been encountered and patched. The Postgres codebase contains a large amount of accumulated wisdom on how to store data in a safe and retrievable fashion. And large communities have sprung up to provide you with tools and wisdom on how to use it to best suit your needs.

NoSQL databases are useful for certain workloads and setups. It would be absolutely wrong to dismiss them out of hand. Having said that, anyone whose DEFAULT PREFERENCE is to eschew a traditional RDBMS as a data store in favor of software that has been around for less than a quarter of the time that even the newer popular RDBMSes have, because using well-tested solutions is LAZY, needs a restraining order keeping them at least 100 yards away from a keyboard.

contingencies · on April 3, 2014

Absolutely agree. The compelling reason can be business requirements as previously noted, e.g. scalability, security, law. Unfortunately, if you're doing something global and non-trivial, that's the rule rather than the exception, in my experience.

Confusion · on April 3, 2014

Don't feed the troll.