I inherited a GraphQL project that was a work of art in that respect: all access control was done in the database, triggers were used liberally, the works.
The resulting application was a complete performance failure. Databases are slow and generally involve latency. They are a source of truth and an engine for querying.
They are not meant to be your application's working data model.
Especially since it's literally untrue. SQL, with the addition of recursive queries (supported by all major databases since the late '90s AND used in an example within this very article), is Turing complete. There's literally no operation on data that can't in principle be expressed as an SQL query (albeit the construction might be painful and tedious).
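To make the recursion claim concrete, here's iteration expressed as a pure recursive CTE, a minimal sketch run through SQLite's in-memory engine via Python's stdlib driver (the `fib` table name and column layout are just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A recursive CTE (SQL:1999) expressing iteration in plain SQL:
# each step carries the Fibonacci pair (a, b) forward ten times.
(nth,) = conn.execute("""
    WITH RECURSIVE fib(n, a, b) AS (
        SELECT 1, 0, 1
        UNION ALL
        SELECT n + 1, b, a + b FROM fib WHERE n < 10
    )
    SELECT b FROM fib WHERE n = 10
""").fetchone()
print(nth)  # 55
```

No application-side loop anywhere; the "loop" lives entirely inside the query.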
But in all seriousness, DataFrame-centric operations are a superset of SQL (you can always do a df.sql("...") if you want to) and have much more efficient implementations of both OLTP/ORM requirements and OLAP/DS/BI requirements.
They also encourage composability, modularization, reuse, and unit and data testing...
So it's ironic that I feel SQL's replacement is another declarative language, its original inspiration: plain English.
Just a natural-language transpiler (like Palantir's Ontology plus Looker's Malloy; reverse disclaimer: I do not work for or enjoy either of these products, but the underlying concepts are correct) with some fancy domain heuristics and light AI. I suspect a Pareto-like model that supports 80% of use cases only needs a semantic graph with a few thousand nodes and edges.
> There's literally no operation on data that can't in principle be expressed as an SQL query (albeit the construction might be painful and tedious).
There are some problems though:
- A single query can't write to multiple tables.
- A single query can't return multiple resultsets.
You can write a stored procedure, but this is no longer "an SQL query", strictly speaking. And once you start writing stored procedures, you are no longer using just SQL, but whatever Ada-inspired procedural extension to SQL was implemented by the database vendor. In other words, you are using "a traditional programming language to work with the data".
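For what it's worth, most client APIs bake that one-query/one-resultset assumption right into the interface. A toy sketch with Python's stdlib SQLite driver (other drivers surface this differently, e.g. via a nextset()-style call, and the exact exception class has varied across Python versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One execute() call accepts exactly one statement, hence at most one
# resultset; a second statement in the same call is rejected outright.
caught = None
try:
    conn.execute("SELECT 1; SELECT 2;")
except (sqlite3.ProgrammingError, sqlite3.Warning) as err:
    caught = err
print("rejected:", caught)
```

So even where the SQL dialect could express more, the API shape enforces the single-statement model.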
A single query can write to multiple tables, using data-modifying CTEs in PostgreSQL for example.
You could also compose a SQL query that maps multiple resultsets into one resultset, although that feels a bit awkward:
WITH a AS (
  INSERT INTO a (k, v) VALUES ('a', 1.0) RETURNING *
), b AS (
  INSERT INTO b (k, v) VALUES ('b', 2.0) RETURNING *
)
SELECT row_to_json(a) FROM a
UNION ALL
SELECT row_to_json(b) FROM b;
You can only SELECT; you can't nest CTEs; the query planner has no understanding of joins that cross CTE boundaries, so they are totally unoptimized; and you can't use DISTINCT or GROUP BY, etc. Really, the CTE implementation in SQL Server is basically a parser-level hack.
Show me any SQL dialect that will allow the CTE you give here. (Edit: and what on earth is that supposed to actually mean?)
> Or can you only use SELECT for WITH queries
You can only use a SELECT inside a CTE (or should only be able to), because the 'E' stands for 'expression'. It seems Postgres does allow an INSERT with a RETURNING output, which sort of makes sense, but I doubt it's in the standard.
Your last sentence makes no sense to me. Give an example.
Also, you failed to give an example showing that GROUP BY/DISTINCT aren't allowed in CTEs.
Good to know, I wasn't aware PostgreSQL supports this.
I'm currently on SQL Server, and it doesn't support INSERT inside a CTE (and I think most DBMSes out there still don't). It would definitely make my life easier if it did...
> There's literally no operation on data that can't in principle be expressed as an SQL query
That may be technically true (or not, I don't know), but in many cases complex data manipulation (especially when it's done in multiple passes) practically needs a lot of RAM and a programming language to be time-efficient.
Technically true, but SQL has the ability to induce discipline. Every single time I've seen folks given the opportunity to do their analysis off a database or warehouse with pandas/Spark instead, they almost always end up writing bad code, processing one row or cell at a time. Almost always they could have done it with SQL itself and gotten much better performance, but this "extra flexibility" allowed them to be lazier.
SQL can also have the same issue with feature creep (Snowflake's array columns often invite people to write pathetic SQL trying to join on such a column, for example), but with the right design you can force your users to interact with your data more efficiently.
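To put the row-at-a-time point in code, a toy sketch (SQLite via Python's stdlib; the `sales` table and its values are made up): both versions get the same answer, but the second pushes the aggregation into the engine, which is exactly the discipline the comment above is describing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10.0), ("west", 5.0), ("east", 7.5)])

# Anti-pattern: drag every row into the client and aggregate by hand.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0.0) + amount

# Set-based: one GROUP BY, the engine does the aggregation.
sql_totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

print(totals == sql_totals)  # True: same answer, very different scaling
```

On three rows the difference is invisible; at warehouse scale, the row-at-a-time loop is the performance failure mode the comment describes.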
Them's fighting words :)
But point taken regarding the benefit of modern language tools applied to query-based work.