> It is possible to keep database functions in git. It requires some discipline but it's not hard.
My experience differs. How do you check that the stored procedures in the DB are exactly as defined in Git?
I remember spending several hours on a bug that was caused by an altered stored procedure. One SQL function was not as it should have been. I don't know if someone had patched it live, or if there had been an unnoticed failure in a DB migration script.
With a compiled executable, consistency is guaranteed. With interpreted code (Python, PHP), a consistency check is a single command like `git status`. But there is no standard way to compare the state of stored procedures in the DB with their expected code in versioned files.
> I remember spending several hours on a bug that was caused by an altered stored procedure. One SQL function was not as it should have been. I don't know if someone had patched it live, or if there had been an unnoticed failure in a DB migration script.
If you have everything under version control you can just reapply that version's stored procedures from the database scripts. If someone or something is not handling failures in deployment scripts (not just database migrations), then you get that sort of problem.
I have done this with both T-SQL and PL/SQL. The stored code is all available in system views and easily extracted with a script. If you set up the script to run automatically at a given time, you know that the version in git is at most n hours out of date. If you leave the control up to the scheduled job, then your commit messages may not be what you want. But there's nothing keeping you from running your extraction script off schedule and then writing a good commit message.
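On SQL Server, for example, the source lives in `sys.sql_modules`, and a minimal extraction query (a sketch; writing the results out to files for git is left to your script) looks like:

```sql
-- T-SQL: dump the source of every stored module (procedures,
-- functions, views, triggers) for writing to files and committing.
SELECT
    OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
    OBJECT_NAME(m.object_id)        AS object_name,
    m.definition                    AS source_text
FROM sys.sql_modules AS m
ORDER BY schema_name, object_name;
```

On Oracle, the `ALL_SOURCE` view plays the same role.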
We store our database functions in SQL scripts that are automatically executed with every Flyway migration, so-called callback scripts.
In the callback script, every function is declared so that it is created or recreated. As Flyway is executed on each application start, the database functions are recreated with every application start, even if they didn't change.
This approach has its pros and cons but it ensures that database functions always match the version of the application code.
Because the callback script of a database function is stored in a file that is edited when needed, the Git history works the same as for application code.
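As a sketch of how this looks (assuming Postgres; the function and table are made up), a callback file like `afterMigrate.sql` in the migrations folder might contain:

```sql
-- afterMigrate.sql: Flyway runs this after every migration run,
-- so the function is (re)created on every application start.
CREATE OR REPLACE FUNCTION order_total(p_customer_id integer)
RETURNS numeric
LANGUAGE sql
STABLE
AS $$
    SELECT coalesce(sum(total), 0)
    FROM orders
    WHERE customer_id = p_customer_id;
$$;
```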
I want to thank you for saying you don't like SQL. In the future, please tell everyone involved where you work, so that you won't have to work on the SQL and they'll find someone else to do it.
I've worked at places where people hated SQL and were hiding it. Instead of crafting nice, performant SQL, they wrote `select * from table` and then did foreach loops in the code to filter the results they wanted.
Isn’t the author talking about procedural SQL? Regular SQL is just fine. Great even. But procedural SQL truly sucks and is highly inconsistent between DBs.
I'm interested in languages experimenting with integrating tables/dataframes and the relational model as native parts of the language.
With how much of the code I've written that's been around representing entities and doing CRUD on collections of them, it seems a bit wild not to have native "Table" types.
There's an HN user who has a neat language called "Empirical" based on this, and there's an active research language called "Flix" that has this as well. APL/K and other array languages feature a similar concept, as I understand it.
> You’ll be printf debugging at best, and you almost certainly won’t have access to powerful development tooling like code-completion through LSPs
Yes, if you stay away from enterprise-level databases, you will get what free beer gets you.
All big-name databases have proper IDE tooling with the ability to single-step stored procedure code.
> Deployment and versioning:
Likewise, there are source control tools for database schemas on enterprise databases.
> And sure, you may be able to activate an extension for an alternative scripting language with better syntax, but do you really want something like a Python VM running inside of your database?
No, I rather enjoy the advanced GC and JIT capabilities of JVM and CLR implementations.
Another good reason for code in the database - keep compute near the data. Particularly for aggregate functions where you're asking for a small summary of a potentially large amount of data, you're better off implementing that logic in the database to avoid excess data transfer.
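As a hedged illustration (the `events` table is made up), the point is to ship one summary row per group instead of the raw rows:

```sql
-- Computed in the database: a handful of rows cross the wire
-- instead of every raw event from the last 30 days.
SELECT date_trunc('day', created_at) AS day,
       count(*)                      AS event_count,
       avg(duration_ms)              AS avg_duration_ms
FROM events
WHERE created_at >= now() - interval '30 days'
GROUP BY 1
ORDER BY 1;
```

The anti-pattern is `select * from events` followed by the same aggregation in application code.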
Problems with code in the database:
1- Impossible to unit test.
2- The program only works with a specific brand, and sometimes a specific version, of the database.
3- Impossible to set a breakpoint or inspect variables with a debugger.
4- Database admins change code directly in production without first making the change in version control and using a deployment pipeline.
It being available means people will do it, and I want nothing to do with that.
My team is unfortunately responsible for maintaining a legacy monstrosity of an app that uses thousand-line stored procs. You’re kinda advocating for that as ‘right’. I’ll take a ‘fad’ database missing the ‘proper tooling’ to let people do such things any day.
While I admit it's doable with a traditional SQL database, it's sadly not possible with NewSQL and NoSQL databases.
Also, in my last 30 years working across different industries, it's extremely rare that DBAs write unit tests or a proper deploy system to automatically roll back if anything breaks while changing a database schema or stored procedure.
The best DBAs I have met use Liquibase or something similar to treat database code like app code. But this is the exception, not the norm.
Being committed to a specific language/vendor and the clunky debugging experience are valid points. However, if having code outside the DB is even an alternative for you, you are already far more deeply committed to some language/framework, or your app is trivial.
Debugging databases has slightly different aims than debugging code, and it's much easier, since you have the whole relational DB available as a debugging tool. It's effortless to store intermediate state in temporary tables.
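For instance (a hypothetical sketch in Postgres, with made-up table names), you can freeze an intermediate result and interrogate it like any other relation:

```sql
-- Snapshot the suspicious intermediate state...
CREATE TEMP TABLE debug_candidates AS
SELECT o.id, o.status, o.total
FROM orders AS o
WHERE o.updated_at >= now() - interval '1 day';

-- ...then query, join, and aggregate it at leisure.
SELECT status, count(*) AS n, min(total), max(total)
FROM debug_candidates
GROUP BY status;
```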
Interestingly, when one uses SQLite (or virtually any embedded/serverless[1] database engine), the code is always executed within the application process, regardless of whether it is written in pure SQL or in the hosting language, making the dichotomy irrelevant.
Not having support for stored procedures can be used by some as an argument against SQLite, but the fact is that stored procedures conceptually make no sense in embedded databases if you can extend the query language with the host code.
One big issue I've seen with SQLite is that if you add an index or view on an expression which uses code from the host process, no other application will be able to read data from the database file, unless that code lives in an extension those applications can also load.
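A sketch of that failure mode (the function name is hypothetical; application-defined functions are registered from the host, e.g. through SQLite's C API or a driver wrapper):

```sql
-- Works only in a process that has registered my_normalize():
CREATE INDEX people_norm_idx ON people (my_normalize(name));

-- Any other process opening the same file fails with
-- "no such function: my_normalize" as soon as the index
-- or the expression is touched.
SELECT * FROM people WHERE my_normalize(name) = 'smith';
```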
No. The distributed cache product was GemFire, which was derived from the caching strategy used in Gemstone/S and Gemstone/J. There was a series of acquisitions and I think Pivotal owns it now.
Gemstone/S is an ACID object database that uses a modified Smalltalk data model and a Smalltalk dialect as the data language. If you are developing for it on one of the supported client Smalltalk platforms, the experience is magical. E.g., if the GS Smalltalk throws an exception, it can be caught by the front end, and the debugger call stack display (ST doesn't unwind the stack when searching for a handler) shows both like they were on the same machine.
You write scripts to deploy your stored procedures, DDL, and functions, and you store them in version control. You run those scripts from a remote connection with the appropriate database client installed, using DB credentials with the appropriate privileges. Alternatively, you run the updates directly on the database server as part of an Ansible/Puppet/Chef script.
My point is that you make the database deployment part of your pipeline just like everything else.
I think that the underlying problem here is that most developers, and some DBAs, aren't keeping DDL under version control. Instead, they are just using GUI tools to alter tables and, on very special occasions, create procedures.
They don't treat the database with the same reverence as their app code and then blame the database for being out of sync.
Onebox deployment typically deploys the new revision of the application code to a single host/box and leaves the rest of the fleet in the older revision and then you run your tests against the new revision running in that single box. You also might monitor the metrics from that single box and attempt to find regressions from the rest of the fleet.
If the code is deployed to the database (version controlled or not), all application servers will start using the new code immediately, which is not what we would want from a onebox-first deployment strategy. Usually all the servers share the same database, and thus the code in the code-in-database model. What am I missing?
You could easily create a new revision of the function you want to change, and have the first box use the new revision while the rest keep using the old one.
In Postgres that would be trivial with revision schemas and clever use of the search_path.
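A rough sketch of the idea (names are hypothetical): each revision's functions go into their own schema, and only the onebox points its search_path at the new one:

```sql
-- Deploy the new revision alongside the old one.
CREATE SCHEMA rev_42;
CREATE FUNCTION rev_42.get_customer(p_id integer)
RETURNS TABLE (id integer, name text)
LANGUAGE sql STABLE
AS $$ SELECT c.id, c.name FROM customers AS c WHERE c.id = p_id $$;

-- On the onebox only, at connection setup:
SET search_path TO rev_42, public;
-- Unqualified calls now resolve to the new revision:
SELECT * FROM get_customer(1);
```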
Sure, it works for a narrow use case like a pure function that just queries and returns data. What about stored procedures/triggers that mutate data on insert/delete, etc.?
With one-box, you can assign a specific set of customers/canaries to be served by the new 1-box and validate your results. If there is a problem you rollback and only your canary data was affected. It is just not very simple IMO.
That is a neat idea. I was sort of thinking of that, but was concerned about invalid and unused database objects.
You'd avoid name collisions pretty handily, though. I'm guessing that you would just need some policy to drop all of the old versions after some period.
You'd also just need to bundle grant exec statements with the procedure DDL.
That would work well with Oracle and DB2 as well. I'm not totally sure about SQL Server, though.
Ah. I see my misunderstanding. This is my fault for trying to think at 4am.
That is a bit trickier. You will absolutely not be able to do that without computing stored procedure names using a version number, which seems awful and not worth the trouble.
That said, you can avoid application errors during rollout by keeping procedure signatures the same and not removing columns from returned result sets. The unfortunate side effect is that a bug shipped with the new nodes can also cause errors on the previously functional ones, since they share the procedure. Although, I guess you already had that potential exposure with any other database schema change that was made, right?
SQL is not terrible when you get used to it.
They conflate database functions with triggers. It's very easy to restrict your database code to pure functions.
It is possible to keep database functions in git. It requires some discipline but it's not hard.
Migration tools like Flyway help keep track of db function changes.
At small, single-server scale, DB functions can be much faster for fetching data.
Tools such as PostgREST prove that you can do much of your CRUD work in DB functions.
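For example (the function and table are hypothetical), PostgREST automatically exposes a Postgres function as an RPC endpoint at `/rpc/<function_name>`:

```sql
-- Becomes POST /rpc/search_customers under PostgREST.
CREATE OR REPLACE FUNCTION search_customers(q text)
RETURNS SETOF customers
LANGUAGE sql STABLE
AS $$
    SELECT * FROM customers
    WHERE name ILIKE '%' || q || '%';
$$;
```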