These are all really great complaints; and yet I still prefer writing plain SQL to using ORMs and their nifty tricks. I think the main reason is that I don't ever want to have a lot of SQL logic in my code. Short queries (say, 10 lines or less) are okay, if they're one-off and pertain to the function's logic and are never repeated by other pieces of code. But writing complex queries through an ORM layer means they're going to be in code -- and harder-to-maintain -- than if they were where they should be: In views, functions and procedures. ORMs also obscure the execution plan and make optimization more difficult, which in production isn't something I really want to worry about.
Writing SQL is like uploading a program; just like a shader to a GPU. And if the network stopped being the bottleneck, the DB server's load would become the problem. Just like with a GPU, you want to upload your programs and data as infrequently as possible. But unlike a GPU, you have full flexibility to permanently store your logic right there in your SQL DB. If it's well-documented, it's much more streamlined to write views and views-on-views, than to keep all that DB logic in your code. I don't find that it becomes a mess at all, because properly crafted views and procedures strongly encourage forced indexing and checking execution plans when you refactor them; as opposed to in-code, ad-hoc queries where it's hard to know if someone ever ran tests to optimize them.
Some statements do need to be very large, for efficiency. Others are better being broken up into sub-procedures, temp tables or even roundtrips. A great query is a work of art where each part of it is optimized not just for execution plan but also for read/write, network state and expected load. What I like about SQL is the flexibility to decide which way to go, and prioritize based on how often something will be used and what kind of strain it'll place on each of the layers in the stack. As with anything powerful, there are a million ways it can be abused. But what I've found in the ORM space and also with noSQL solutions is that simplicity of use comes at the price of flexibility and control.
[edit]
>> null handling is complicated
This would probably also be my biggest complaint about the language itself; the existence of IS NULL / IS NOT NULL and both null-safe and -unsafe comparisons gets confusing (and especially painful if you're trying to write a parser). I think it's one of those things like `undefined` vs `null` vs `false` in Javascript where, on second thought, it's just too late to fix it without breaking half a billion websites.
> I don't find that it becomes a mess at all, because properly crafted views and procedures strongly encourage forced indexing and checking execution plans when you refactor them; as opposed to in-code, ad-hoc queries where it's hard to know if someone ever ran tests to optimize them.
I have to provide caution here, when you create views and stored procedures, I've always found it a mess to maintain:
1) VCS for stored procedures and views is usually non existent. Good luck understanding how these change over time and who changed them.
2) Deploying application logic alongside these systems is very painful, because unless you've versioned your views/stored procedures, when you make a breaking change you need to stop the entire application, make the change in the DB, and restart the entire application again on the new version. Zero-downtime deploys would require versioned views/stored procedures.
3) It quickly becomes a challenge to answer the question "what happens when I do X?" reliably, where X is insert, delete, or even select a row. Once you have complex enough stored procedures, modifying the data changes things in unexpected or unintended ways. A dev unaware of how the system works might assume that inserting a row simply inserts a row, but instead due to stored procedures hidden from application logic, it might cascade into inserting rows elsewhere, deleting rows, or modifying other rows in other tables. Discovering these issues without knowing that they could exist is often done around midnight during a production outage, because a new feature was released that did something in the DB that was presumed safe, but wasn't. If the code for the business logic were in one place, the application, it would been much easier to see what the change would actually do.
I understand entirely that performance gains from good DB use are astronomical, but in my humble experience, I've found them to be more trouble in the long run than they are worth.
> VCS for stored procedures and views is usually non existent
This is a mindset problem, not a technology problem. Treat your stored procedures/functions/views like code, not like data. Keep a separate "code" schema, put your procedures/functions/views definitions in files, and store your files in Git, just like Java or Ruby code. Deployment then becomes an automated drop and recreate of the code schema in a single transaction with zero downtime.
> Deploying application logic alongside these systems is very painful
This is not my experience at all. The stored procedures form an API. If changing the API breaks your application, you are doing something fundamentally wrong. API versioning should only be necessary if third parties depend on your API, but I wouldn't recommend giving third parties direct access to the database anyway.
> Once you have complex enough stored procedures, modifying the data changes things in unexpected or unintended ways
I assume you mean triggers because stored procedures don't happen as side effects, you have to call them explicitly. Regarding triggers, I agree with everything you say.
Re: VCS, I’ve found that with a small amount of setup, tools like Liquibase[0] allow you to maintain functions and views (with “CREATE OR REPLACE” or equivalent) as SQL files in a file system with full VCS, diffing, etc. just like any other code.
You still need to sequentially define your changes (migrations) which isn't exactly the VCS experience people are used to. If all changes to your db require new file entries, rarely do you need to diff let alone bisect to find out the history. It does make it harder to find out the current shape of your app though with all the ALTER statements to a single table strewn about multiple files. I wonder if it's feasible to write a tool that allows you to generate sequential migrations from DDL organized like tradition code with modules and items dealing with the same domain in the same file after diffing it with the sequential migrations already in the codebase.
No you don't have to structure your project that way. You can have all your logic (views/functions) stay in a single file per, and use git like usual with the standard diff you are used to. Your changelog xml just needs to be setup for it.
So, after digging about in the docs, I found the Best Practices[0] page which lays out what you describe. I was disappointed to find it is just using idempotent creation statements for procedures and views and putting them all in files that are re-run on each migration step-along with each new roll forward migration-so that new versions of these logic objects replace any old ones. This is not exactly something that liquidbase provides, should be easily replicatable, and I was hoping it'd do diffs to provide similar experience on tables as well.
After some web searching, I came across a project[1] that's trying to do what I describe but it appears to be dead. I'm surprised that migration tools (that I looked through) don't already support such a basic approach that I suspect I'm missing something obvious. Some fundamental block in how SQL works or maybe it doesn't work that well in practice for, in concept, it sounds easy to script using what they call a shadow db and `pg_diff`.
Oh, ORMs are not the solution. I agree, ORMs often bring new issues (like, loading too many entries in memory), and don't solve many problems (encapsulation).
> I don't ever want to have a lot of SQL logic in my code.
That works if you are a small team. With big teams / multiple teams, the small problems become a huge problem: someone / some team will write huge statements, will use weird syntax, and so on. (C++ has similar problems btw) Specially the lack of proper encapsulation can become a huge problem.
> Some statements do need to be very large, for efficiency.
Well, then the problem is that SQL doesn't give you a good way to avoid large statements.
> there are a million ways it can be abused
Exactly. Other languages (like Java) make it harder to abuse the language. If you really want, you still can; but it's easier to refactor and maintain.
> Writing SQL is like uploading a program; just like a shader to a GPU... But unlike a GPU, you have full flexibility to permanently store your logic right there in your SQL DB.
I agree - this is a really interesting way to put this.
Writing SQL is like uploading a program; just like a shader to a GPU. And if the network stopped being the bottleneck, the DB server's load would become the problem. Just like with a GPU, you want to upload your programs and data as infrequently as possible. But unlike a GPU, you have full flexibility to permanently store your logic right there in your SQL DB. If it's well-documented, it's much more streamlined to write views and views-on-views, than to keep all that DB logic in your code. I don't find that it becomes a mess at all, because properly crafted views and procedures strongly encourage forced indexing and checking execution plans when you refactor them; as opposed to in-code, ad-hoc queries where it's hard to know if someone ever ran tests to optimize them.
Some statements do need to be very large, for efficiency. Others are better being broken up into sub-procedures, temp tables or even roundtrips. A great query is a work of art where each part of it is optimized not just for execution plan but also for read/write, network state and expected load. What I like about SQL is the flexibility to decide which way to go, and prioritize based on how often something will be used and what kind of strain it'll place on each of the layers in the stack. As with anything powerful, there are a million ways it can be abused. But what I've found in the ORM space and also with noSQL solutions is that simplicity of use comes at the price of flexibility and control.
[edit] >> null handling is complicated
This would probably also be my biggest complaint about the language itself; the existence of IS NULL / IS NOT NULL and both null-safe and -unsafe comparisons gets confusing (and especially painful if you're trying to write a parser). I think it's one of those things like `undefined` vs `null` vs `false` in Javascript where, on second thought, it's just too late to fix it without breaking half a billion websites.