Hacker News new | past | comments | ask | show | jobs | submit login
Code in Database vs. Code in Application (brandur.org)
33 points by todsacerdoti on July 10, 2022 | hide | past | favorite | 41 comments



I disagree with the author on few points.

SQL is not terrible when you get used to it.

They conflate database functions with triggers. It's very easy to restrict your database code to pure functions.

It is possible to keep database functions in git. It requires some discipline but it's not hard.

Migration tools like Flyway help keep track of db function changes.

At small single server scale db functions can be much faster for fetching data.

Tools such as Postgrest prove that you can do much of your crud work in DB functions.


> It is possible to keep database functions in git. It requires some discipline but it's not hard.

My experience differs. How do you check that the stored procedures in the DB are exactly as defined in Git?

I remember spending several hours on a bug that was caused by an altered stored procedure. One SQL function was not as it should have been. I don't know if someone had patched it live, or if there had been an unnoticed failure in a DB migration script.

With a compiled executable, consistency is guaranteed. With interpreted code (Python, PHP), consistency check is done with a command like `git status`. But there is no standard way to compare the state of store procedures in the DB with their expected code in versioned files.


> How do you check that the stored procedures in the DB are exactly as defined in Git?

How do you know that a binary is produced by some code in git? Because you built it. Same principle applies.

But also, the db can list its code.


> I remember spending several hours on a bug that was caused by an altered stored procedure. One SQL function was not as it should have been. I don't know if someone had patched it live, or if there had been an unnoticed failure in a DB migration script.

If you have everything under version control you can just reapply that version's stored procedures off the database scripts. If someone or something is not handling failures in deployment scripts (not just database migrations) then you get that sort of problem.


I have done this with both T-SQL and PL/SQL. The stored code is all available in system views and easily extracted with a script. If you set up the script to run automatically at a given time, you know that the version in git is at most n hours out of date. If you leave the control up to the scheduled job, then your commit messages may not be what you want. But there's nothing keeping you from running your extraction script off schedule and then writing a good commit message.


We store our database functions in SQL scripts that are automatically executed with every Flyway migration, so-called callback scripts.

In the callback script every function is declared to be created or recreated. As Flyway is executed with each application start, the database functions are recreated with every application start, even if they didn’t change.

This approach has its pros and cons but it ensures that database functions always match the version of the application code.

Because the callback script of a database function is stored in a file that is edited when needed, the Git history works the same as for application code.


The solution is that nothing gets to production unless it gets migrated in via the CI pipeline. Admittedly it's still a problem during development.

But a tool like Flyway versions your DB. You know if you are out of date as the migration table is behind.

It's not as good as compiled code but it works when you need it.


I agree with all your points, except for SQL.

Getting lost in SQL is easy, due to its syntax and lack of standards.

Compare that to any of the mainstream programming languages, the difference in cognitive load to process is stark.


It might be Stockholm syndrome but I find SQL to be one of the best DSLs I know. It's only good at querying relational data but it's good at that.

I might have spent too much time in SQL but it one of my favorites.


I always live in the fear of accidentally destroying data or tables with SQL.

Not a rational fear, just an irrational one. A primitive fear of the unknown.

But, to be fair, I have had to work with some of the worst SQL constructs and stored procedures.

shudder


I want to thank you for saying you don't like SQL. Now in your future, please tell it to everyone involved where you work so that you won't work on it and they'll find someone else to do the SQL.

I've worked at place where people hated SQL and where hiding it. Instead of crafting nice and performing SQL, they wrote "select * from table" and then did foreach loops in the code to filter the results they wanted.


Isn’t the author talking about procedural SQL? Regular SQL is just fine. Great even. But procedural SQL truly sucks and is highly inconsistent between DBs.


I'm interested in languages experimenting with integrating tables/dataframes and the relational model as native parts of the language

With how much of the code I've written that's been around representing entities and doing CRUD on collections of them it seems a bit wild not to have native "Table" types.

There's an HN user who has a neat language called "Empirical" based on this, and there's an active research language called "Flix" that has this as well. APL/K and other array langs feature a similar concept as I understand it.

https://www.empirical-soft.com/

https://flix.dev/

I think this also meshes with the idea of LINQ and it's predecessor QUEL.


> You’ll be printf debugging at best, and you almost certainly won’t have access to powerful development tooling like code-completion through LSPs

Yes, you stay away from enterprise level databases, you will get what free beer gets you.

All big name databases have proper IDE tooling with ability to single step stored procedure code.

> Deployment and versioning:

Likewise, there are source control tooling for database schemas on enterprise databases.

> And sure, you may be able to activate an extension for an alternative scripting language with better syntax, but do you really want something like a Python VM running inside of your database?

No, I rather enjoy the advanced GC and JIT capabilities of JVM and CLR implementations.


Can you please provide some examples (anything other than Oracle)?

I do a lot of programming with stored procedures in MySQL and being able to single step stored procedure code is something I’d like to try.


- MS SQL, https://www.sqlshack.com/debugging-stored-procedures-sql-ser...

- DB 2 on z/OS, https://ibm.github.io/db2forzosdeveloperextension-about/docs...

- Informix, https://www.ibm.com/docs/en/informix-servers/12.10?topic=dsr...

Basically all enterprise level RDMS support debugging them for years.

As for MySQL, this seems to do the trick for VS users, https://dev.mysql.com/doc/visual-studio/en/visual-studio-deb...

I never really used MySQL, so I can't be of much help there.


Another good reason for code in the database - keep compute near the data. Particularly for aggregate functions where you're asking for a small summary of a potentially large amount of data, you're better off implementing that logic in the database to avoid excess data transfer.


problem with code in database. 1- impossible to unit test 2- program only work with a specific brand and sometimes version of the database 3- impossible to put breakpoint or inspect variable with a debugger 4- database admin change code directly in production without first making change in version control and using deployment pipeline


1 - https://docs.oracle.com/cd/E15846_01/doc.21/e15222/unit_test...

2 - Fair enough, so is the same when using C extensions, nothing new when using standards with multiple implementations

3 - https://www.thatjeffsmith.com/archive/2014/02/how-to-start-t... (2014 tutorial on purpose)

4 - Just like a UNIX admin can do the same in a production server if the culture isn't there


you are assuming I am using Oracle SQL on a database server I control.

What if instead we have to use Amazon Aurora or Azure cosmosdb …

you can’t simply use C for your stored procedure and you can’t simply login in the vm to have a debugger.


Nope, I am assuming that you are using a proper RDMS with enterprise capabilities and not fad databases with lower quality tooling.

Oracle was just an example of what the minimum bar should be.

With Oracle you can also use C for the stored procedures if you are so much inclined, although I really wouldn't advocate for it.

As for logging in, it is a matter of permissions and policies.


If embedding C functions into my database is the ‘right way’, well I’ll stick to the ‘wrong way’.


Apparently we have some reading comprehension problem => "...although I really wouldn't advocate for it.".


I skimmed, sue me :)

It being available means people will do it, and I want nothing to do with that.

My team is unfortunately responsible for maintaining a legacy monstrosity of an app that uses thousand line stored procs. You’re kinda advocating for that as ‘right’. I’ll take a ‘fad’ database missing the ‘proper tooling’ to let people do such things any day.


1. https://pgtap.org/ 2. As opposed to Python 2.7? 3. Different level of computing, SQL being a 4GL. (Also, EXPLAIN ANALYZE) 4. https://sqitch.org/


Tool like sqlitch are great and i used them myself when working for a startup.

but in big company only DBA are allowed to make change to the db not the developers. And sadly most dba refuse to use those tool.


While the points are true for manual CREATE OR REPLACE, same could be said of any ad-hoc editing stuff in production by hand.

1, 4 - can be achieved with tools such as DBT

2, 3 - certain databases come with proper Python environment with various tooling around that


while i admit it’s doable if using traditional SQL database it’s sadly not possible with newsql and noSQL databases.

Also in my last 30 year working for different industries, it’s extremely rare that DBA write unit test or a proper deploy system to automatically rollback if anything break while changing database schema or stored procedure.

the best DBA i have met use Liquibase or something similar to treat database code like app code. But this is the exception not the norm.


Commitment specific to language/vendor and debugging clunkiness are valid points. However, if having code outside the db is even an alternative for you, you are already far more deeply committed to some language/framework, or your app is trivial.

Debugging databases has slightly different aims than debugging code, and it's much easier since you have the whole relational db available as a debugging tool. It's effortless to store states in temporary tables.


Interestingly when one uses SQLite (or virtually any embedded/serverless[1] database engine), the code is always executed within the application process, regardless of being written in pure SQL of by the hosting language, making the dichotomy irrelevant.

Not having support for stored procedures can be used by some as an argument against SQLite, but the fact is that store procedures conceptually make no sense in embedded databases if you can extend the query language with the host code.

One big issue I've seen on SQLite is that if you add an index or view on an expression which uses code from the host process, no other application will be able to read data from the database file, unless you have such code in an extension that can be read by such applications.

[1] https://www3.sqlite.org/matrix/serverless.html


laughs in Gemstone/S

His position is arguably an engineering limitation of the databases used, rather than an actual principle.


That’s the datagrid product similar to oracle coherence and Gigaspace right?


No. The distributed cache product was GemFire, which was derived from the cacheing strategy used in Gemstone/S and Gemstone/J. There was a series of aquisitions and I think Pivotal owns it now.

Gemstone/S is an ACID object database that uses a modified Smalltalk data model and a Smalltalk dialect as a the data language. If you are developing for it on one of the supported client Smalltalk platforms, the experience is magical. eg. if the GS Smalltalk throws an exception, it can be caught by the front end and the debugger call stack display (ST doesn't unwind the stack when searching for a handler) shows both like they were on the same machine.

GemTalk Systems has a free trial that works with Squeak/Pharo as a client. https://gemtalksystems.com/small-business/gsdevkit/


On further searching GemFire has been open sourced as Apache Geode.

https://geode.apache.org/


How would you do one-box deployments with code in the database? Everything else could be solved with perhaps some or a lot of pain.


It's just scaffolding.

You write scripts to deploy your stored procedures, DDL, and functions and you store them in version control. You run those scripts from a remote connection with the appropriate database client installed using DB credentials with the appropriate privileges. Alternately, you run the updates directly on the database server as part of an Ansible/Puppet/Chef script.

My point is that you make the database deployment part of your pipeline just like everything else.

I think that the underlying problem here is that most developers, and some DBAs, aren't keeping DDL under version control. Instead, they are just using GUI tools to alter tables and, on very special occasions, create procedures.

They don't treat the database with the same reverence as their app code and then blame the database for being out of sync.


Onebox deployment typically deploys the new revision of the application code to a single host/box and leaves the rest of the fleet in the older revision and then you run your tests against the new revision running in that single box. You also might monitor the metrics from that single box and attempt to find regressions from the rest of the fleet.

If the code is deployed to the database (version controlled or not) all application servers will start using the new code immediately which is not what we would want from a onebox deployment first strategy. Usually all the servers will share the same database and thus the code in the code in database model. What am I missing.


You could easily create a new revision of the function you want to change, and have the first box to use the new revision while the rest keep using the old one.

In postgres that would be trivial with revision schemas and a clever usage of the SEARCH_PATH.


Sure, it works for a narrow use-case like a pure function that just queries and returns data. What about stored procedures/triggers that mutate data on insert/delete etc?

With one-box, you can assign a specific set of customers/canaries to be served by the new 1-box and validate your results. If there is a problem you rollback and only your canary data was affected. It is just not very simple IMO.


That is a neat idea. I was sort of thinking of that, but was concerned about invalid and unused database objects.

You'd avoid name collisions pretty handily, though. I'm guessing that you would just need some policy to drop all of the old versions after some period.

You'd also just need to bundle grant exec statements with the procedure DDL.

That would work well with Oracle and DB2 as well. I'm not totally sure about SQL Server, though.


Ah. I see my misunderstanding. This is my fault for trying to think at 4am.

That is a bit trickier. You will absolutely not be able to do that without computing stored procedure names using a version number, which seems awful and not worth the trouble.

That said, you can avoid application errors during rollout by keeping procedure signatures the same and not removing columns from returned result sets. The unfortunate side effect is that errors in your procedures can work with the new nodes and cause errors in the previously functional ones. Although, I guess you already had that potential exposure with any other database schema change that was made, right?




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: