To my primitive brain, NoSQL always sounded like "I am going to makeup data model and integrity on the fly and let the user figure out how to handle it". No Thanks. But may be I don't get it. GraphQL honestly gives me the same vibes. Happy to be corrected.
Not that this is particularly relevant to the discussion, but GraphQL is actually much closer to SQL than most NoSQL solutions because it's statically typed and the schema is statically defined. The GraphQL schema acts as the contract of what the API can serve to the client, much like a schema in a relational database.
I also wanted to kill old, obsolete, crummy SQL when I first learned it as a teenager. It reminded me of FORTRAN or COBAL or something.
It took quite a lot of theory and maturity to understand why RDBMS is the right paradigm. SQL is imperfect, but it's good enough to not be worth replacing.
Whenever I've seen benchmarks of postgresql managing JSON versus mongodb, postgresql won (at least comparing apples-to-apples; postgresql is much more conservative by default, in terms of reliability and data consistency, than mongodb). There's literally no reason to going with a lot of NoSQL databases over using a SQL database as a key-value store.
> SQL is imperfect, but it's good enough to not be worth replacing.
Great way to put it.
> There's literally no reason to going with a lot of NoSQL databases over using a SQL database as a key-value store.
To be devil's advocate, the arguments I hear range more on ease of ops/scalability for noSQL, apparently some do it better than SQL implementations, and some drivers can make trivial operations easier as well. That said, I'd pick PostgreSQL for pretty much any new project I get to work on.
It turns out that most SQL databases, when used as a key-value store, shard better than most NoSQL databasing.
You run into issues with things like JOINs across shards, but there's literally no upside even there to NoSQL versus using a SQL databased in a disciplined way. I've designed systems with the same type of infinitely horizontally scalable KVS-type storage for all the changing state in SQL.
SQL also means you can do things like:
- Do local joins (if e.g. all the data for a user is in the same shard)
- Keeping a small set of relational data (not the stuff which needs to scale) and have one technology
- Use read replicas to scale some of the stuff which doesn't fit in the "infinitely horizontally scalable KVS" model
.... and so on. You can't not think about it, but if you do think about it, it's all upside and no downside.
1) "RDBMS is the right paradigm." That's high praise. RDBMS is exactly the Right Thing, with a trademark and all.
2) "SQL is imperfect, but it's good enough to not be worth replacing." That's definitely not high praise. It means that this implementation has all sorts of warts and annoyance. However, that doesn't rise over the bar of replacing.
As a teenager, I definitely did conflate the two. Perhaps that's where your misread came from?
It probably comes from 90% of discussion on SQL revolves around de-conflating the two. It’s pretty much impossible to complain about SQL without clarifying the difference, and even then. Really I think we should just permanently assume the other party has never conceived of such a difference, no matter how much they insist otherwise.
SQL is here to stay, if only because few can be convinced that another language could query the RDBMS directly (without first compiling down to SQL). Peolle also keep persisting in this belief it’s standardized, for god knows what reason
In abstract, I don't see a lot of downside to compiling down to SQL as an intermediate format. Or vice-versa, for that matter.
In practice:
1) Most attempts to do this historically have missed the point of an RDBMS (e.g. most ORMs).
2) SQL is cumbersome, but it's not quite cumbersome enough I'd want to bother with something else.
As a footnote, I find SQL-grade standards to be incredibly helpful. In 2022, I wish there were a strict standard, but in new domains, standards like this mean:
1) I can read code for virtually any database, with maybe gentle use of search engines, unless I get into really hairy corners. Ultimately, those, I can look up too.
2) If I write code for a new database, I don't need to learn anything; I can understand syntax with a few web searches.
3) For 90+% of SQL, I can write automatic scripts quickly and easily to translate between databases (when migrating, the remaining few percent, someone can do by hand, and it's a very tractable chore).
That's not the case jumping into a graph database or other stuff.
At the same time, they don't over-constrain things. If you want your database to have a BCD type, JavaScript stored routines, or some wonky form of virtual tables, you can.
Now, in 2022, we know enough to standardize all of this stuff, but I'm not sure we did when these tools were coming out.
As a footnote: I'm working in a domain with no standards, and if everyone could just pick JSON (or XML, or just about any one thing), we'd already have 50% of the benefit of a full standard. If people standardized a few nouns and verbs (e.g. 'user' versus 'user_id' versus 'actor' versus 'agent' versus ...), we'd be another 50% of the way there. Systems do different things, and I don't think going 100% of the way makes sense until we understand the domain better, but partial standardization is a huge win. I've been pushing hard for having partial standards, and hard for not having full standards yet.
Sometimes you just want to store data and worry about validation later. I'm in that situation now, with different tasks requiring different kinds of validation for the same data. Sometimes I want all the postal codes that users have entered, sometimes I want only the valid postal codes.
Just to be pedantic you can use SQL to query NoSQL databases, just look at the PartiQL ExecuteStatement API for DynamoDB. I assume your question is about why anyone would use a NoSQL databases instead of a relational database. The short answer is scale. If you have well define access patterns that rely only on a primary index, you can effectively scale to billions of QPS. If you want to make relational queries, and/or can’t know your access patterns in advance, use a relational database.
NoSQL is like simple assignment to storage and one layer of obvious indexing. Its lower level, and therefore much faster and predictable because there is no query planner guessing how to assemble things.
Tuning SQL on the other hand is a guessing game with the query planner. You can express a lot more but 1% of those expressions could down your DB.
I like both, but if you want performance NoSQL is the winner.
It matters to clarify this because if you get rid of every use case SQL+RDBMS optimize for, NoSQL is obviously going to be the winner. A car with no safety measures and without most consumer-oriented features is probably going to be "faster" than one that has them.