Database Migrations (vadimkravcenko.com)
79 points by bndr on Oct 1, 2023 | 67 comments



Two additional rules I don't see followed often, but made a past life of mine much easier:

1. rollbacks are bullshit, stop pretending they aren't. They work fine for easy changes, but you can't rollback the hard ones (like deleting a field), and you're better off getting comfortable with forward-only migrations

2. never expose real tables to an application. Create an "API" schema which contains only views, functions, and procedures, and only allow applications to use this schema. This gives you a layer of indirection on the DB side such that you can nearly eliminate the dance of coordinating application changes and database migrations (sketched below)

You can get away without these rules for a long time, but 2 becomes particularly useful when more than one application uses the database.
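
A minimal sketch of what rule 2 can look like in PostgreSQL (all names here are made up for illustration):

    CREATE SCHEMA internal;   -- real tables live here
    CREATE SCHEMA api;        -- the only schema applications may touch

    CREATE TABLE internal.users (
        id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        email text NOT NULL
    );

    CREATE VIEW api.users AS
        SELECT id, email FROM internal.users;

    GRANT USAGE ON SCHEMA api TO app_user;
    GRANT SELECT, INSERT, UPDATE, DELETE ON api.users TO app_user;
    -- no grants at all on schema internal, so no direct table access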


Former DBA here, turned Python dev (Flask mostly) and now Data Engineer: 1 million percent this:

“ 1. rollbacks are bullshit, stop pretending they aren't. They work fine for easy changes, but you can't rollback the hard ones (like deleting a field), and you're better off getting comfortable with forward-only migrations”

Think of database migrations like you would API versioning: deprecate, then drop old columns only after declaring a breaking change and giving consumers time.
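
For example, a hedged sketch of that deprecate-then-drop timeline in SQL (table, column, and version names are hypothetical):

    -- v12 (today): mark the column deprecated and tell consumers to stop using it
    COMMENT ON COLUMN customers.fax_number IS 'DEPRECATED since v12, will be removed in v14';

    -- v14 (two releases later, once no consumer reads it): actually drop it
    ALTER TABLE customers DROP COLUMN fax_number;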

Or take the Google approach, where the protobufs keep all the fields from the start of time (or something along those lines), so that old applications can still work and the service devs can choose to migrate as needed.

But yeah, re: point 1, databases are best thought of like the arrow of time: evolve forward, not backwards, and you'll be fine and not lose data or worse.


> Think of database migrations like you would API versioning: deprecate, then drop old columns only after declaring a breaking change and giving consumers time.

This ties into my second point as I eventually made an "api_v2" database schema and migrated users as you describe.


nice! glad it worked out


Great post, thanks for the insight.

I have a question regarding this point:

> 2. never expose real tables to an application. Create an "API" schema which contains only views, functions, procedures, and only allow applications to use this schema.

Would a view-based indirection help with rollbacks? For example, in scenarios where a column is added/dropped, would it work to just join with a new relationship column? Rolling back would consist of updating the view to include/exclude the join operation, and the old data would remain in place.


Yes, this absolutely works. Rather than viewing it as a rollback enabler, I think of it as a way to try things while keeping an escape hatch. Instead of deleting a column from a table, you can remove it from the view, and only drop it from the table when you're really sure that's safe.

But dropping a column from the view is an API-breaking change, so rather than updating the existing view I'd make a new view, possibly in a new API schema (e.g. "api_v2"). Then I'd migrate clients/applications to the new view, and only when nothing uses the old view would I drop both the old view and the column from the underlying table, in new migrations.
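
A rough sketch of that sequence (all object names are invented; "internal" stands for whatever schema holds the real tables):

    CREATE SCHEMA api_v2;

    CREATE VIEW api_v2.orders AS
        SELECT id, customer_id, total        -- legacy_status intentionally left out
        FROM internal.orders;

    GRANT USAGE ON SCHEMA api_v2 TO app_user;
    GRANT SELECT ON api_v2.orders TO app_user;

    -- much later, once nothing selects from the old view:
    DROP VIEW api.orders;
    ALTER TABLE internal.orders DROP COLUMN legacy_status;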


This is spot on. I was having this exact discussion last week about how best to "strangle" an older application and deploy the new replacement for an upcoming project.


Agree with 1, hard pass on 2. I worked in a shop like that for a long time, and as you end up doing joins and subqueries against the views, query performance goes to shit. The stored procedure part, on the other hand, is a whole other level that I won't even get into.


I don't think that's what's meant by #2... About 12 years ago I worked at a place where all the CRUD operations were stored procs (calling tables/views/functions as needed). The application was only granted execute on the specific schemas, which meant no direct access for CRUD.

A number of schema changes and query optimizations could be handled by updating the stored procs without having to recompile the application.
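
A rough PostgreSQL-flavored illustration of that setup (the original shop may well have been on another RDBMS; all names here are invented):

    CREATE FUNCTION api.create_user(p_email text)
    RETURNS bigint
    LANGUAGE sql
    SECURITY DEFINER   -- runs with the owner's privileges, not the caller's
    AS $$
        INSERT INTO internal.users (email) VALUES (p_email) RETURNING id;
    $$;

    -- the application can call the proc but never touch the table
    REVOKE ALL ON internal.users FROM app_user;
    GRANT EXECUTE ON FUNCTION api.create_user(text) TO app_user;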


You're right, but I think GP has a point as well. Anything you expose to the application has a potential for misuse, and a lot of "misuse" of a DB means queries with bad performance.

But, I'd argue the API schema gives you better control over this by pushing what might otherwise be application-level logic into the database. I've solved a lot of bad ORM behavior this way.


Both are decent approaches, and we reduced a lot of friction by using code generation.

Unfortunately, modern development favors having all logic on the application side. That brings a lot of benefits, including better testing (stored proc testing frameworks never really caught on). However, I find that a lot of people I work with now rely too heavily on the ORM and don't know how to work with the database itself (writing suboptimal queries, not indexing properly, etc.).


Joins and subqueries are part of SELECT statements, not CRUD. The purpose of using views is the SELECT side of the application, not the CRUD side.


You're describing the "R" in CRUD, my friend: https://en.wikipedia.org/wiki/Create,_read,_update_and_delet...

In the example above, the select statements were inside stored procedures, and applications were only granted permissions to execute those with appropriate parameters.


I always wonder why this pattern is not more popular: https://sive.rs/pg

We try to put a clean API on everything, except the one place where it really matters...


SQL is the clean interface. Databases are not designed to run your extra business computations; they are there to store your data and give you access to it efficiently, with some guarantees. DBMSes are impossible to debug, cannot be shut down like a faulty service node, and locking or crashing the production one can grind your entire company to a halt. The above pattern seems insane to me.


SQL is specifically designed for "business" computations.


And is wildly misused. (Edit: I meant "extra business logic" above, nothing to do with businesses.) You commented above with the same thing I said: it turns out multiuser DBs are not good for running expensive code beyond the data access stuff, because it interferes with their main job in ways that are hard to diagnose or mitigate. Aggregation for business computations is fine, cleaning up text with regular expressions is borderline, and procedural code is a disaster waiting to happen. Putting a bunch of functions in the DB itself to hide the SQL seems to go against basic norms of how code is developed and deployed as application logic evolves, and it's not good architecture from a risk standpoint either.


It was. But then it turned out databases were bad at handling multiple connected users, or were expensive at it. Setting up database connections didn't perform, so they had to be pooled. Row- and field-level security wasn't there, or came with limitations. So we all migrated away to a different architecture. And perhaps we'll go back again, like we always do...

Chasing our tech-tail.


I really hate the db-as-code paradigm. It creates magic that is invisible to the app layer and leads to a lot of unintended consequences.


I’d really like to implement 2, but it’s quite difficult to make the switch to that approach when you already have a ton of tables.


If you are using Postgres, I think you can create a view for each table and put all the views in a schema, then you switch the app all at once by using `SET search_path TO`.
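
A quick sketch of that idea, with hypothetical table and role names:

    CREATE SCHEMA api;
    CREATE VIEW api.users  AS SELECT * FROM public.users;
    CREATE VIEW api.orders AS SELECT * FROM public.orders;
    -- ...one view per table...

    -- point the application at the new schema in one step
    ALTER ROLE app_user SET search_path TO api;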


You can, but I'd argue if your API schema is just 1-to-1 views for the underlying tables, you're not getting much value from this approach.


I was thinking of it as the starting point, then you evolve from there.

Tbh, views are a bit of a leaky abstraction (for example when it comes to constraints) so adopting them gradually seems like a good idea.


Until you change the underlying tables or the API schema, at which point you don’t need to change the other.


If your API view is a direct mapping of an underlying table (e.g. "SELECT *"), and you make a breaking change to that table, the view will also change and your application will break.

What I've done instead is change the underlying table as needed, and ensure the view interface stays the same, by changing the view definition. This lets you refactor the low-level schema without breaking clients, and this migration can be done in a single transaction.
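
For example, a sketch of renaming a base-table column without breaking the view's interface, all in one transaction (names are illustrative):

    BEGIN;
    ALTER TABLE internal.users RENAME COLUMN email TO email_address;
    CREATE OR REPLACE VIEW api.users AS
        SELECT id, email_address AS email    -- the interface column is still called "email"
        FROM internal.users;
    COMMIT;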


It is difficult, but it's possible, and I've had to take that hard road.

At a high level it looked like this:

1. create API schema and start filling it out with useful stuff

2. one by one, migrate applications to the new schema. You'll find stuff missing from the API schema, and you'll add it, to support each application. Ideally you'll find commonality across applications, so you end up with a cohesive API schema and not a jumble of application-specific stuff. But this takes discipline!

3. as each application moves over, remove their access from the low-level schema(s)
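
For step 3, the revocation can be as simple as something like this (role and schema names are invented):

    REVOKE ALL ON ALL TABLES IN SCHEMA internal FROM legacy_app;
    REVOKE USAGE ON SCHEMA internal FROM legacy_app;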


The number of times I've been asked to treat a database like any other stateless microservice is crazy. I've been asked to support time-machine-like behavior where a user can deploy any version of the database (given a list of schemas and other database config variables) and expect it to work, even if they were moving from v9 to v3 of the config.


I think #2 is a bit of a dated approach by now. Certainly there's a place for it, when you have a DBA acting as gatekeeper to a large DB and the application is stable and probably a monolith, but it adds friction and comes at a cost. It also tends to encourage a style of design where more business logic goes into the DB instead of stateless services, which eventually bottlenecks your ability to scale.


I last applied this approach a few years ago at an IIoT startup, where we had billions of rows of sensor data. We needed it all in one place, but with a fairly varied set of users, like an application team showing customers various slices of their data and a data science team applying all kinds of analytics to it.

When I showed up we had a single Postgres database with a half-dozen different clients. One managed the migrations in addition to doing real work, and the others were a mix of "direct" users and API layers used elsewhere. Tons of overlap, tons of repeated logic, tons of inconsistency.

I used this approach to first stem the plague of direct DB access, and later to consolidate our APIs. It worked pretty well, and the folks I left it to seemed to appreciate my approach.

We did have a lot of debates about what is or isn't "business logic" and what belongs in this DB or not. We tried to keep it pretty light; our raw data was many independent tables, and our API schema had functions for insertions and views to make that data usable in ways that didn't feel very application-specific.

I was pushing to replace our handwritten API service(s) with something like Hasura, though that didn't happen before I left. I think that's where this approach gets more powerful: manage your data with your database, get a web API for free. Then your data store is encapsulated behind a single service with a well-defined API.

You're right that it requires some serious attention to the DB. We wrote all migrations in PL/pgSQL (managed with sqitch), giving us a level of control that is hard to get from a language-specific migration framework. That required a lot of upskilling, but I think it was worth it for us.


Yeah, it's definitely a case-by-case situation, but I think with "modern" apps this is not a great approach. The situation you describe walking into is definitely horrible, and adding some discipline, and especially a solid DBA or lead, can mitigate a lot of damage. I've seen plenty of outages caused by a dev not understanding the query their ORM was generating, which then took down production at the worst possible time, and this approach can help.

This once was the "correct" way to do things, period. At least that's how I learned it originally. If you go back a couple of decades, "users" in an RDBMS meant literally end users, using a client application to directly query the DB from their desktop machine, and whatever permissions they had were what they had. In this environment, locking down everything except stored procedures and views was the only way to go. (Interestingly, we've sort of seen a resurgence of this design recently with some PaaS-based web applications where all the logic lives client-side somewhere in a React app.)

After that indoctrination, I remember initially feeling very resistant to a lot of this, in particular the practice of just granting full read/write access to a service account and letting developers write whatever they wanted, but times change. We were working under different constraints at the time. If you have services scoped to a specific responsibility that completely own their own data, and you don't have the situation you describe of many clients touching the same DB, a lot of the problems the procs+views-only pattern was intended to solve just go away on their own.


#1 is true. #2 is extremely tedious. I have a hard time imagining this is less work than an extra release.


Rather than 2, isn't it better to let a single application own the database and manage the schema?


That is the point of 2. The single application is the API.


I think GP was talking about a database-level API specifically


I was, but this approach makes it function as a single application, kinda. You're not exposing your underlying data storage directly to consumers.

The weird thing here is your interface is still SQL. Sometimes that's fine, but I think this API-schema approach really takes off when you use something like PostgREST, PostGraphile, or Hasura to automatically turn this API schema into a web API. Lots of nice benefits to those tools, and then you have a single service that is more what users might expect.

I actually discovered this API-schema approach from the docs of one of those tools... I just can't remember which one.


> I actually discovered this API-schema approach from the docs of one of those tools... I just can't remember which one.

Maybe from https://postgrest.org/en/v10.2/schema_structure.html


Yes, I think that was it! Interesting to see they have removed that page in more recent versions of the docs.


It wasn't removed, just badly reorganized. It's here now: https://postgrest.org/en/stable/explanations/db_authz.html#s...


The problem I see is that tables and views are a poor general purpose interface: if you don't know what the indexes are, you can't guarantee reasonable performance.


How does this work? You have the "real" database and this "api" database makes queries to the real database? How does that work?


In Postgres, every object (table, index, function, view, etc.) lives in a "schema", which is better thought of as a namespace. I put low-level objects like tables into one or more schemas, then create an "API" schema with views, functions, and procedures that operate on the tables in the other schemas. Then I grant the application users access only to that API schema.
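
A compact illustration, with all names invented; note that the application role can read through the view even though it has no privileges on the underlying table, because by default a view runs with its owner's privileges:

    CREATE SCHEMA data;   -- tables live here
    CREATE SCHEMA api;    -- application-facing objects live here

    CREATE TABLE data.devices (
        device_id bigint PRIMARY KEY,
        last_seen timestamptz,
        retired   boolean NOT NULL DEFAULT false
    );

    CREATE VIEW api.active_devices AS
        SELECT device_id, last_seen
        FROM data.devices
        WHERE NOT retired;

    GRANT USAGE ON SCHEMA api TO app_user;
    GRANT SELECT ON api.active_devices TO app_user;
    -- no grant on schema "data", so app_user cannot query data.devices directly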


thank you so much!


This is really good.

It covers one of the most common things people miss with regards to running migrations: it isn't possible to atomically deploy both the migration and the application code that uses it. This means if you want to avoid a few seconds/minutes of errors, you need to deploy the migration first in a way that doesn't break existing code, then the application change, and then often a cleanup step to complete the migration in a way that won't break.

Knowing how to do this isn't a common skill. It's probably a good topic for an interview question for senior engineering roles.


Great interview topic, I agree. Candidates should be able to identify any SQL DDL that might break an app, then decompose it into three safe steps as you've described. It's a core skill for working with an RDBMS but rarely explicitly taught.


Something as simple as a field rename can result in downtime if done naively, and showing a candidate a badly named field and asking them what they’d do to fix it can be quite illuminating!
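
For instance, one hedged sketch of renaming users.name to users.full_name across three deploys (a hypothetical table; the sync step can be dual writes in the app or a trigger):

    -- deploy 1 (expand): add the new column and backfill it
    ALTER TABLE users ADD COLUMN full_name text;
    UPDATE users SET full_name = name;
    -- ...keep the two columns in sync while both exist

    -- deploy 2: ship application code that reads and writes only full_name

    -- deploy 3 (contract): drop the old column once nothing references it
    ALTER TABLE users DROP COLUMN name;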


This is a fantastic article! It shows that even simple migrations (like adding or removing a column) can be quite tricky to deploy in concert with the application deployment.

We (at Xata) have tried for a while to come up with a generic schema migration system for PostgreSQL that makes this easier. We ended up using views and temporary columns in such a way that we can provide both the "old" and the "new" schema simultaneously. Up/down triggers convert newly inserted data from old to new and the other way around. This also has the advantage that it can do rollbacks instantly by just dropping the "new" view.

We were just planning to announce this as an open source project this week, but actually it is already public, so if you are curious: https://github.com/xataio/pgroll
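
For readers curious what "both schemas at once" can mean in practice, here is a very rough sketch of the general pattern; this is not pgroll's actual implementation, and all names are invented:

    -- physical change: add the replacement column alongside the old one
    ALTER TABLE users ADD COLUMN full_name text;
    UPDATE users SET full_name = name;

    -- two "virtual" schemas: old clients keep seeing "name", new clients see "full_name"
    CREATE SCHEMA app_v1;
    CREATE SCHEMA app_v2;
    CREATE VIEW app_v1.users AS SELECT id, name FROM users;
    CREATE VIEW app_v2.users AS SELECT id, full_name FROM users;

    -- keep the two physical columns in sync while both versions are live
    CREATE FUNCTION users_sync() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            -- one of the two columns is NULL, depending on which schema wrote the row
            NEW.full_name := COALESCE(NEW.full_name, NEW.name);
            NEW.name      := COALESCE(NEW.name, NEW.full_name);
        ELSIF NEW.name IS DISTINCT FROM OLD.name THEN
            NEW.full_name := NEW.name;           -- old schema updated the row
        ELSIF NEW.full_name IS DISTINCT FROM OLD.full_name THEN
            NEW.name := NEW.full_name;           -- new schema updated the row
        END IF;
        RETURN NEW;
    END;
    $$;

    CREATE TRIGGER users_sync BEFORE INSERT OR UPDATE ON users
        FOR EACH ROW EXECUTE FUNCTION users_sync();

    -- "rolling back" is then just dropping app_v2; app_v1 never stopped working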


Such is life. This perspective seems to come from an application developer who sees the database as an extension of the app. When you're big enough to have a database ops team, they will come with a different perspective, and most likely be highly skeptical of storing schema changes in the database.

If you're not too cool for MySQL, check out skeema.io for a declarative, platform-agnostic approach to schema management and a path to happiness.


I'm not too cool for MySQL, but PostgreSQL has been the default anywhere I've been in fifteen years. Is anybody aware of something equivalent for pg? I see a lot of old projects that aren't getting updates anymore.


Your comment is written like you have an undisclosed stake in skeema.io.


Founder of skeema.io here. GP is a fan and does not have a stake in the company.

Skeema is used by several hundred companies, including GitHub, Twilio, and Etsy. We have a lot of fans in the MySQL community, and just because someone enjoys the product does not mean they’re a shill.


Thanks for the clarification, and I wasn’t suggesting he was a shill - it just sounded like it, and typically such claims are prefixed with a disclaimer on HN.


Nope, it just made my life better and I want others who feel the pain in TFA to know there are options.


Thank you for spreading the word, I greatly appreciate it. Building a bootstrapped product/business in the MySQL space has been quite challenging.

Most newer entrants in the schema management space are VC-funded and went wide instead of deep in terms of the range of supported DBs. I personally believe in deep, expert-level coverage of a specific DB, resulting in better functionality and a safer schema management toolchain. However, it does make word-of-mouth more challenging, since MySQL is unpopular here. So these blog posts often don’t mention Skeema, since authors coming from other DBs have never encountered it.


The pain of database migrations is what originally pushed me towards event sourcing. The database is read-only (but immediately mirrors changes applied to the event stream), and I can have multiple databases connected to the same stream, with different schemas.

This makes schema changes easy to perform (just create a new database and let the system populate it from the events), easy to do without downtime (keep the old database available while the new database is building), and easy to roll back (keep the old database available for a while after the migration).

The trade-off is that changing/adding event types now needs to be done carefully (first deploy code able to process the new event types, then deploy the code that can produce the new events), whereas a SQL database supports new UPDATE or INSERT without a schema change.
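
As a toy illustration of "let the system populate it from the events", assuming an events table with columns (account_id bigint, event_type text, payload jsonb):

    CREATE TABLE account_balances (
        account_id bigint PRIMARY KEY,
        balance    numeric NOT NULL
    );

    INSERT INTO account_balances (account_id, balance)
    SELECT account_id,
           SUM(CASE event_type
                   WHEN 'deposited' THEN  (payload->>'amount')::numeric
                   WHEN 'withdrawn' THEN -(payload->>'amount')::numeric
                   ELSE 0
               END)
    FROM events
    GROUP BY account_id;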


This assumes your stream has the full event history … or that history/state is irrelevant, correct?

If not, did you leave out a ‘copy history from source to target’ step?


Indeed, the event stream contains the full history of all events that have ever been produced during the lifetime of the application.


I really like the datomic guidelines for change, which negates a category of migration headaches: https://blog.datomic.com/2017/01/the-ten-rules-of-schema-gro...


I still think that for many cases, small to medium enterprises should consider migrations with downtime. If it’s B2B, it’s relatively normal to have maintenance periods.


Strong agree, you can usually find some time when the application doesn’t have to be available. It’s so much easier to just shut the service layer down, take a snapshot, migrate, and bring the services back.


Great write up! At PeerDB, we’ve been using refinery https://github.com/rust-db/refinery to handle database migrations of our catalog Postgres database from Rust. It is easy, typesafe and gets the job done. Thought this would be useful for users building apps with Rust!


I think the Heisenberg uncertainty principle applies to database migrations.

You can migrate or you can have no downtime; you cannot do both.


Effectively zero downtime migrations are certainly possible, but they are very complex to implement and require a couple layers of indirection. Rarely worth the cost imo.


Nice writeup. It's one of those things that isn't taught in school and is always learned the hard way.

We at Bytebase also recognize this and have spent over two years building a solution for teams to coordinate database migrations better.


What I want are state-based migrations that are smart enough to handle objects that depend on each other.


What if your (read) DB is just a projection?


Insightful. Also, your website design is very clean and nice - I like it.


Very hard to read on mobile though. I either have to scroll left to right for every line, or zoom out and have very small text. Seems like the content div has a min-width or something that prevents the text from wrapping on a narrow screen.



