Pgroll: zero-downtime, reversible schema migrations for Postgres (xata.io)
328 points by ksec 11 months ago | 149 comments



For those curious, as I was, how this works, beyond the details in the readme and blog post, note that "schema" in this context is both a loose term (we changed the schema, old schema, new schema) AND a concrete thing in PostgreSQL[0]. It's helpful to know that pgroll implements the first one (let's change the schema) using the second one (make Schema objects in PG [1]). The magic is in creating different views in different Schemas that map appropriately to underlying tables (in a third Schema).

Presumably (didn't see this mentioned in the docs yet) the trick is that you re-deploy the app with a new connection string referencing the new schema (as in [3]), while an old app deployment can keep referencing the old schema with its connection string.

Hopefully I got that right.

[0] https://www.postgresql.org/docs/current/ddl-schemas.html [1] https://github.com/xataio/pgroll/blob/main/pkg/roll/execute.... [3] https://stackoverflow.com/a/67525360


You're right. I wish schema wasn't such an overloaded term :)

In order to access either the old or new version of the schema, applications should configure the Postgres `search_path`[0] which determines which schema and hence which views of the underlying tables they see.
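For example, something like this (just a sketch; the version schema name depends on your migration's name, so the one below is only illustrative):

    -- New application instances point their search_path at the new version schema;
    -- old instances keep using the previous one.
    SET search_path TO 'public_02_add_assignee_column';  -- illustrative version schema name
    SELECT * FROM users;  -- now resolves to the users view inside that schema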

This is touched on in the documentation here[1], but could do with further expansion.

[0] - https://www.postgresql.org/docs/current/ddl-schemas.html#DDL... [1] - https://github.com/xataio/pgroll/blob/main/docs/README.md#cl...


I mentally change schema to namespace when thinking about the postgresql feature.


You’re not alone. That’s also how PostgreSQL itself thinks about schemas! https://www.postgresql.org/docs/current/catalog-pg-namespace...


You got it right! I wrote a blog post a few years back about how this technique works for anyone curious: https://fabianlindfors.se/blog/schema-migrations-in-postgres...


Help me understand the value of undoable migrations. I've always operated under "the only path is forward" and you release a new version that fixes the issue or create a new migration that does some kind of partial rollback if necessary. Once a migration is live, naively rolling things back seems like you're asking for problems.

I also only perform migrations as part of app init, not separately.


I believe this is one of the reasons why migrations become scary in many cases. If something goes wrong "the only path is forward". Also, rolling out new versions of the application means either breaking the previous versions (with some instances still running) or doing the migration in several steps.

We believe there is a better way. The way pgroll works, you can start a migration and keep the old & new schemas working for as long as you need to roll out your app. If the new version of the app/schema doesn't behave as you were expecting, you only need to roll back the commit and undo the migration. pgroll guarantees that the previous version keeps working during the whole process.

There is a graph in the readme depicting this concept:

https://github.com/xataio/pgroll/blob/main/docs/img/migratio...


> If the new version of the app/schema doesn't behave as you were expecting, you only need to rollback the commit and undo the migration.

If I delete a "last_name" column, apply the migration, and then decide I shouldn't have deleted users' last names, do I get that data back?


Just from my understanding from having read the linked website: yes, you do.

"Applying the migration" doesn't actually do anything to the table, it just creates a new schema containing views over the old one, where the view for the table whose column you deleted hides the column. You can then try if your app still works when using accessing this schema instead of the old one. If you're happy, you can "complete" the migration, at which point only the table structure actually gets altered in a non-reversible way.


^ this is exactly how it works :)


But if it works like that, aren't there schema migrations that change the actual content of a column and are therefore not undoable?


Any pgroll operations[0] that require a change to an existing column, such as adding a constraint, will create a new copy of the column, apply the change to that new column, and backfill it using the 'up' SQL defined in the migration.

There are no operations that will modify the data of an existing column in-place, as this would violate the invariant that the old schema must remain usable alongside the new one.
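In plain SQL terms the shape of it is roughly this (illustrative only, with invented names; pgroll generates its own trigger functions and column names):

    -- Adding e.g. a CHECK constraint: a shadow column carries the constrained data.
    ALTER TABLE products ADD COLUMN _pgroll_new_price numeric
      CHECK (_pgroll_new_price >= 0);

    -- Writes through the old schema get mirrored into the shadow column by a trigger
    -- that applies the migration's 'up' expression.
    CREATE FUNCTION products_sync_price() RETURNS trigger AS $$
    BEGIN
      NEW._pgroll_new_price := GREATEST(NEW.price, 0);  -- the 'up' SQL
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER products_sync_price
      BEFORE INSERT OR UPDATE ON products
      FOR EACH ROW EXECUTE FUNCTION products_sync_price();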

[0] - https://github.com/xataio/pgroll/tree/main/docs#operations-r...


Maybe this is explained somewhere in the docs but I'm lazy: how does it cope with possible performance issues in highly trafficked tables? Can you somehow control the backfill speed if it's taking up too much I/O?


There's nothing about this in the docs :)

Backfills are done in fixed-size batches to avoid taking long-lived row locks on many rows, but there is nothing in place to control the overall rate of backfilling.
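Conceptually each batch is something like this, repeated (with a commit in between) until no rows are left. This is a simplified sketch with invented names, not the real implementation, though like the real thing it is keyed off the primary key:

    UPDATE users
    SET _pgroll_new_email = lower(email)   -- the migration's 'up' expression
    WHERE id IN (
      SELECT id FROM users
      WHERE _pgroll_new_email IS NULL
      ORDER BY id
      LIMIT 1000                           -- fixed batch size keeps row locks short-lived
    );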

This would certainly be a nice feature to add soon though.


Also, if the data isn't deleted couldn't this lead to database bloat?


The bloat incurred by the extra column is certainly present while the migration is in progress (ie after it's been started with `pgroll start` but before running `pgroll complete`).

Once the migration is completed any extra columns are dropped.


I recently had to do a migration on a Timescale hypertable: a table whose jsonb columns contained arrays of arrays of numbers was migrated to a new table holding the same data as two-dimensional Postgres numeric[][] arrays (better storage characteristics).

Our workflow was something like:

1) Create the new hypertable

2) Create an AFTER INSERT trigger on the first table to insert transformed data from the first table into the second table, and delete it from the first table (this ensured applications could continue running against the first schema/table, without any new data being added to the first table after the migration); a rough sketch of this trigger is below the list

3) Iterate over first table in time-bucketed batches using a plpgsql block to move chunks of data from first table to second table.
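Roughly the shape of the step 2 trigger (table/column names changed here, and jsonb_to_numeric_matrix() is a hypothetical stand-in for the actual jsonb -> numeric[][] conversion):

    CREATE FUNCTION move_to_new_table() RETURNS trigger AS $$
    BEGIN
      INSERT INTO readings_v2 (ts, device_id, samples)
      VALUES (NEW.ts, NEW.device_id, jsonb_to_numeric_matrix(NEW.samples));
      DELETE FROM readings_v1 WHERE ts = NEW.ts AND device_id = NEW.device_id;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER readings_v1_move
      AFTER INSERT ON readings_v1
      FOR EACH ROW EXECUTE FUNCTION move_to_new_table();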

Would pgroll enable a similar workflow? I guess I'm curious if the way pgroll works would similarly create a trigger to allow apps to work with the initial schema as a stopgap... I guess pgroll would perform the whole migration as a series of column updates on a single table, but I'm unclear on whether it attempts to migrate all data in one step (potentially locking the table for longer periods?) while also allowing applications using the old schema to continue working so there is no downtime as changes are rolled out.

Has pgroll been tested with timescaledb at all?


To do this with pgroll I would use an alter_column migration, changing the type: https://github.com/xataio/pgroll/tree/main/docs#change-type. This would:

1) Create a new column with the desired type (numeric[][] in your case)

2) Backfill it from the original one, executing the up function to do the casting and any required transformation

3) Install a trigger to execute the up function for every new insert/update happening in the old schema version

4) After complete, remove the old column, as it's no longer needed in the new version of the schema

Backfills are executed in batches, you can check how that works here: https://github.com/xataio/pgroll/blob/main/pkg/migrations/ba...

I don't think any of us has tested pgroll against timescaledb but I would love to know about the results if anyone does!


Is my understanding correct that the need to copy columns makes starting a migration potentially extremely expensive on a large database?


Yes, for those pgroll migrations that require a new column + backfill, starting the migration can be expensive.

Backfills are done in fixed size batches to avoid long lived row locks, but the operation can still be expensive in terms of time and potentially I/O. Options to control the rate of backfilling could be a useful addition here but they aren't present yet.


This is almost exactly how I did a similar migration, also in Timescale. I used PL/pgSQL and sqitch, did you use a migration tool?


No, this was all done in handwritten .sql scripts. I don't think it matters too much in this case; we're using Rust and the sqlx CLI to drive the migrations, but that basically just runs the SQL migration scripts.


Apologies for the off-topic-ness, but no matter where I've tried putting the mouse focus on the post, Up/Down don't work to scroll (but PgUp/PgDown are fine).

(I very much appreciate the effort to provide tooling that puts all these things together, btw)


Thanks for reporting! We will look into it.


Sorry, forgot to say: Win10 / Firefox.


That's great that pgroll does this, but the heavy lifting for supporting this comes at a huge cost on the application side, IMO.


Do you mean the extra configuration required to make applications use the correct version of the database schema, or something else?


Yea, keeping your application consistent with two different schema versions. And I'm not talking about this from a blue/green standpoint, but from the standpoint of whatever pgroll does instead so that when the rollback happens you don't lose data.


I don't see the need to keep your application consistent with both schema versions. During a migration pgroll exposes two Postgres schemas - one for the old version of the database schema and another for the new one. The old version of the application can be ignorant of the new schema and the new version of the application can be ignorant of the old.

pgroll (or rather the database triggers that it creates along with the up and down SQL defined in the migration) does the work to ensure that data written by the old applications is visible to the new and vice-versa.

A rollback in pgroll then only requires dropping the schema that contains the new version of the views on the underlying tables and any new versions of columns that were created to support them.


Eh, isn't this making it easier on the application side? Today, when I make a change in my app that needs a migration, I need multiple steps to make sure I don't break old instances of the app still running. With this it looks like one can avoid all that?


That is what this project is trying to achieve. By allowing your client apps to access both the old and the new schema at the same time, you can have the old instances of your application working while the new ones (using the new schema) get deployed.

They can work in parallel for a while until you complete the rollout and declare the migration done.


I don't think 'undoable' is the clearest description; the crux is this:

> Keep two versions of the schema (previous and next) accessible at the same time during the whole migration process

This has some obvious advantages. Like you said you can't easily roll back once a migration is fully live, but it helps a lot if you can cancel a migration once it turns out it doesn't work.


Yes, that's exactly it. I generally agree with "always move forward" but if rolling back is as easy as dropping the "new" view, that makes it a lot less scary.


Isn't that what transactions are for?


That wouldn't allow you to partially roll out your new code base (depending on the altered schema), nor to easily revert such a roll out.


That would be hard to coordinate if your application is even slightly complicated.


Not really...

- Start migration, create transaction and locks

- Do migration things

- Do some SELECTs, INSERTS, or w/e in your code to validate the schema matches your model.

- End transaction or rollback if failure

None of this is free, but it's by no means hard IMO. I do something similar in Go: https://github.com/candiddev/shared/blob/main/go/postgresql/...


> Start migration, create transaction and locks

If you do that on a table or two with millions of rows, you've effectively locked your database from all other access, even SELECTs.

ALTER statements typically grab exclusive locks, blocking both writes and reads. Chain a couple together and you're looking at high CPU, queued queries, and an unresponsive database for hours.

Once you realize what's happening and kill the migration in a panic, it may take tens of minutes or more for the DB to revert back to its initial state, much to everyone's impatient horror.

In simple programs and databases, everything is possible and quick. Complex programs and large databases disabuse you of your naïveté with a quickness.

I love Postgres's transactional DDL, but as the article's mention of renaming columns highlights, one transaction simply isn't enough for the job.


The locks I was referring to were more around advisory locks to prevent multiple migrations from running. For the locks you mention, all migrations create some kind of table lock typically, it's up to the dev to make them as small as necessary. I don't follow your strawman.


The locks I was referring to were a direct result of many (most?) ALTER statements. Multiple clients trying to migrate isn't even in this equation. Just one client migrating can ruin your day with a single ALTER statement given a large enough database.

What's the biggest table you've ever had to modify?

Adding a column with a default is easy. What about deleting a column from a table with 10s of millions of rows? Renaming a column? Changing a column's type? Adding a CHECK constraint? Adding a NOT NULL? Adding a foreign key constraint that someone accidentally removed in the previous migration while data keeps flowing?
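Even the by-the-book safe path for just one of those is multi-step, e.g. adding a CHECK constraint to a big table (and the two steps need to be separate transactions, otherwise the exclusive lock from the first is held through the whole validation scan):

    -- Step 1: brief ACCESS EXCLUSIVE lock; existing rows are not checked yet.
    ALTER TABLE orders
      ADD CONSTRAINT orders_amount_positive CHECK (amount >= 0) NOT VALID;

    -- Step 2: scans the table but only takes SHARE UPDATE EXCLUSIVE,
    -- so reads and writes keep flowing.
    ALTER TABLE orders VALIDATE CONSTRAINT orders_amount_positive;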

Those concerns aren't about your explicit advisory locks. Not by a long shot.

https://www.postgresql.org/docs/current/explicit-locking.htm...


> I don't follow your strawman.

how large are the DB migrations you've done? your experience is seemingly completely different to mine.


During development I'll often keep rolling the same change back and forth (with minor changes), rather than recreate the database from scratch/a backup each time.


If your organization requires change management (as many are contractually obligated to), then you don't have much of a choice. Every change needs to be tested, every change needs a way to fully and completely roll back the change, and the rollback has to be tested.

Additionally, the people executing the change are not necessarily those who have developed the change. They need two big buttons: Do and Undo. If the change fails or breaks something, they hit the Undo button and tell the developers about it and by the way, here are the logs, please go reproduce it on the test system and try again.

I know this is not "devops," but it's still how a lot of high-availability software deployments work, particularly when hampered by bureaucratically-imposed processes.

Finally, database schema changes are a fine way to irreversibly munge your data if you are not careful. (This goes beyond SQL.) If that happens, there is no such thing as a path forward, the only way to save the ship is to restore from your most recent backup.


A couple of places I've worked lived like this. We used the free version of Flyway with Spring Boot and Quarkus. We got by really well "rolling forward". My next gig used Sequelize and MySQL. Aside from hating those technologies, I hated that false sense of security baked into "oh well, we can always roll back"


> I hated that false sense of security baked into "oh well, we can always roll back"

That's why I like pgroll's approach, in that there isn't really a "rollback procedure" that might make things worse during emergencies, but rather old and new schemas that remain working simultaneously until the migration is marked as complete and there are no clients using the old schema. "Rolling back" is actually cancelling the new schema migration and falling back to the previous version that's been working at all times, thus minimizing the risk.


Congrats to the lovely Xata team on another great launch! I'm the creator of Reshape [0], a similar tool which inspired pgroll, and seeing the concept taken even further is really exciting.

[0] https://github.com/fabianlindfors/reshape


Been reading the code... very tidy. However, the Complete step (e.g. in op_set_notnull.go) renames the temporary column to the proper column name on the underlying table, but while the docs describe the view on the new schema getting updated to refer to the now-renamed underlying column, I do not seem to find the step where that happens? Also, shouldn't those two steps be in a transaction to ensure no failed queries in between - otherwise that's enough to qualify as downtime imho? Quite dubious to see that `createView` is only called once, on `Start`, and that there don't seem to be locks or transactions.

Unless obviously the view has magic to use either column name based on what's available on the underlying, but I did not see that either on `createView`.


There is no code to do this because it's actually a nice feature of postgres - if the underlying column is renamed, the pgroll views that depend on that column are updated automatically as part of the same transaction.
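You can see the same behaviour in vanilla Postgres, nothing pgroll-specific:

    CREATE TABLE t (a int);
    CREATE VIEW v AS SELECT a FROM t;
    ALTER TABLE t RENAME COLUMN a TO b;
    SELECT pg_get_viewdef('v'::regclass);
    -- the stored definition now references column b (aliased back to a),
    -- and queries against v keep working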


Very cool, thank you!


Very cool! Congratulations to the authors on the release! I'm the author of a similar (zero-downtime migration) tool for PG called QuantumDB[0]. It was the first (to my knowledge at least) tool to support foreign keys, by creating table copies (keeping those in sync using triggers), and exposing multiple schemas through a custom database driver. I never got to a production-ready version, unfortunately, but I'm happy this one did. I'm seeing a lot of familiar concepts, and it looks well thought out.

[0] https://github.com/quantumdb/quantumdb


So during the migration both schemas are valid, if I understood correctly?! It would be awesome if "during migration" could be lifted to the point where it is possible to keep both schemas (old and new) for as long as I want and do migrations/transformations of incoming requests (like queries) on the fly. Then I could map my different API versions to different schemas and these on-the-fly transformations would be able to take care of the rest in many scenarios :-)


This is actually the case: old and new schemas are available and working until you complete the migration, and you can run that step whenever you want.

The aim is not to deal with conditional logic in the app handling both schemas, but to have an old version of the app linked to the old schema and the new one using the new schema.


So if I want to sunset my API version X in 1 year for whatever reason, and I am able to support the old schema X, which API X maps to, for that time period without any hassle (and not only during a migration), this would be a much bigger feature / USP for me than everything else mentioned. I am really curious to look deeper into this :-)


I had the same thought; eager to see if anyone can explain why this wouldn't work, or (better) how they're already doing this today.


This is very cool! Schema migrations have always been tough and fraught with peril.

That being said, I've always found that doing the migrations within the app is safer, because rollbacks are tied to the code. For example to change a constraint, we add a new column with the new constraint, and then change the code to read from the new column and old column, take the new value if it exists, otherwise use the old value, perform the operation, then write the new value to the new column and old column. We do this for a while and then do a background migration of the data at the same time, slow enough to not overload the database. At some point the new column is fully populated, and then we can put a new version of the code that only reads the new column. Then you check your stats to make sure there are no reads from the old column, and then you can delete the old one whenever you want, which is a very fast operation. Then you get your space back when you do your vacuum (protip: make sure you have solid vacuuming schedules!).

What are some use cases where you'd want to migrate the schema without also migrating the code at the same time?


To my reading that’s exactly what this is intended for.

You do “pgroll start”, let it run, and then when that’s done you deploy your new code. Then when you’re confident the new code is fine, you do “pgroll complete”. If at any time you realize you got it wrong, you roll back the code and then you do “pgroll rollback” and it’s as if nothing happened (but data changes that went through the new code and schema are still around, if the change you made allows).


Right, but the difference is that this is done on the backend with no app awareness. Doing it in the app ties the app logic to the database schema. Using pgroll would allow the database and app to get out of sync.


Allowing the database and the application to be out of sync (to +1/-1 versions) is really the point of pgroll though.

pgroll presents two versions of the database schema, to be used by the current and vNext versions of the app while syncing data between the two.

An old version of the app can continue to access the old version of the schema until such time as all instances of the application are gracefully shut down. At the same time the new versions of the app can be deployed and run against the new schema.


But then how do you avoid errors introduced due to the lack of app awareness? For example, the old app can keep writing data with the old schema, whereas the new version makes assumptions based on the new schema, which could be broken by new data written through the old schema that doesn't follow the same assumptions.

Seems like a dangerous way to introduce very hard to find data inconsistencies.


> For example the old app can keep writing data with the old schema, whereas the new version makes assumptions based on the new schema

To do safe migrations it's a good idea to avoid rolling out versions that start making assumptions about the new schema until the migration is done and dusted.

This does of course start to create weird territory where migrations cease to become trivially "stackable" if you want to retain the whole zero-downtime thing, but this is true for a lot of "graceful update" patterns in computing (in fact I've not seen it solved properly before).


> To do safe migrations it's a good idea to avoid rolling out versions that start making assumptions about the new schema until the migration is done and dusted.

Or make the app do the migration as part of its normal operation, so you have total control over it. :)

I think we're saying the same thing, I'm just suggesting it's safer to do it in the app itself, because then app changes and database changes are tied together instead of divorced from each other.

If you only have one or two people working on the code that interacts with the table that is being changed, both methods are pretty much the same. But if you have a large team or want to be able to rollback better, then doing it all in the app code feels like the better way to go for me.


For migrations altering columns (for instance adding a constraint), data gets upgraded/downgraded between old and new versions through the up/down functions. These are defined by the migration. They work like this:

- For rows inserted/updated through the old version of the schema, the up function gets executed, copying the resulting value into the new column

- For rows inserted/updated through the new version, the down function gets executed, copying the resulting value into the old column

For instance, you can test that the up function works before releasing the new app relying on it, just by checking that the values present in the new schema are correct.


What happens if you write into the old table with data that violates the constraint the new schema adds? Does the up function fail?


An example would make this more concrete.

This migration[0] adds a CHECK constraint to a column.

When the migration is started, a new column with the constraint is created and values from the old column are backfilled using the 'up' SQL from the migration. The 'up' SQL rewrites values that don't meet the constraint so that they do.

The same 'up' SQL is used 'on the fly' as data is written to the old schema by applications - the 'up' SQL is used (as part of a trigger) to copy data into the new column, rewriting as necessary to ensure the constraint on the new column is met.

As the sibling comment makes clear, it is currently the migration author's responsibility to ensure that the 'up' SQL really does rewrite values so that they meet the constraint.
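For instance, if the new column gets a CHECK (length(title) >= 4) constraint, the 'up' SQL might be a rewrite roughly along these lines (a hypothetical example, just to show the shape):

    -- Pad too-short titles so they satisfy the new constraint.
    SELECT CASE WHEN length(title) >= 4 THEN title
                ELSE rpad(title, 4, '.')
           END AS title
    FROM (VALUES ('long enough'), ('hi')) AS s(title);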

[0] - https://github.com/xataio/pgroll/blob/main/examples/22_add_c...


If I may be so free, I read your docs quite carefully last night and it wasn’t immediately clear to me that “up” and “down” are automagically run to keep both the old and the new data in sync, while a migration is pending. To my reading this is a spectacular feature because it solves the “the only way is forward” problem (where you can’t actually ever roll back a migration or risk losing / making inconsistent some recent data) for many kinds of migrations.

May I suggest spelling this out more explicitly in the docs? As far as I can tell it’s only mentioned by example halfway down the tutorial. To me it seems like one of pgroll’s true killer features, worth more coverage :-)


Yes, this is the case as of today. It's important to get the up/down functions right.

The good thing is that this will probably be detected during the start phase, as data would fail to be backfilled.

We are thinking of ideas to improve this in the longer term, for example:

* Having a dry-run mode where you check that the data present in the DB will stay valid after applying the up function

* Optionally allow for this to happen, but "quarantine" the affected rows so they are no longer available in the new schema


I wasn't able to see the blog article (it was returning a 404 by the time I looked at this). I'm considering introducing this to the eng team I am a part of, because multiple teams and multiple projects often touch the same db.

Anyone know how well this works with very large datasets? The backfill sounds like it would take a while to do.

Does this Go binary need to be continuously running, or does it keep track of migration state in the database?


> Anyone know how well this works with very large datasets? The backfill sounds like it would take a while to do.

It can take a long time, yes. It's somewhat similar in that regard to, for example, gh-ost for MySQL, which also does backfills. The advantage of Postgres here is that backfill is required for fewer migration types, and pgroll only does backfills when needed.

> Does this Go binary need to be continuously running, or does it keep track of migration state in the database?

The latter, you only run the Go binary when doing schema changes.


How can you properly plan for e.g. disk storage requirements? Does the tool calculate that upfront via some sort of dry-run mode? For companies with larger datasets this would be a rather important consideration. Also, those backfills will generate a lot of network traffic in clustered environments.


This is a good point, I believe we can look into trying to estimate storage needs or timings before a migration. It definitely looks like a nice to have.


When performing a rollout of new code across multiple machines, you can expect to have both your old code and new code running at the same time. That’s why you might want to support both the old and new schema at the same time.


Zero-downtime, undoable, schema migrations for Postgres... But definite downtime and undone blog post...

> Page not found

> We're sorry, but the page you requested could not be found. This could be because the page has been moved or deleted. We're tracking these errors so we can fix them in the future.


This should be back up now: https://xata.io/blog/pgroll-schema-migrations-postgres

That's what we get for trying to fix some text ;-)



Sorry about it, small glitch with our blog as we tried to fix something in the title and accidentally broke the metadata for it :)


When can we expect blogroll to be released??


Well played, well played.


This looks very nice indeed but I see a few possible problems which I have seen with pg_repack which might apply to this approach as well:

You can't change table names unless you take a lock. How exactly do you switch the original table to be a view pointing to the original table? The docs don't go into detail how this is done exactly, I'll check the code later.

It looks like the tool maintains two copies of the table but how exactly this copy process is done isn't explained. A potential issue is that you need to have disk space and I/O capacity available to support this.

The copy table + trigger approach might not work for databases of significant size. For example I have seen instances with >50k qps on a table where it is not possible to run pg_repack because it never catches up and it also doesn't ever manage to take the lock which is needed to switch to the new table. This can be simulated with overlapping long running queries.


> This looks very nice indeed but I see a few possible problems which I have seen with pg_repack which might apply to this approach as well:

Thank you for your input! I'm one of the pgroll authors :)

> You can't change table names unless you take a lock. How exactly do you switch the original table to be a view pointing to the original table? The docs don't go into detail how this is done exactly, I'll check the code later.

pgroll only performs operations requiring a short lock, like renaming a table. It sets a lock timeout for these operations (500ms by default), to ensure we avoid lock contention if other operations are taking place. We plan to add an automatic retry mechanism for these timeouts so there is no need for manual intervention.
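In plain SQL terms that protection is roughly equivalent to this (sketch):

    SET lock_timeout = '500ms';
    -- Fails fast with a lock timeout error instead of queueing (and blocking
    -- everyone else) behind long-running transactions.
    ALTER TABLE orders RENAME TO orders_old;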

One cool thing about views is that they will automatically get updated when you rename a table/column, so the view keeps working after the rename.

> It looks like the tool maintains two copies of the table but how exactly this copy process is done isn't explained. A potential issue is that you need to have disk space and I/O capacity available to support this.

> The copy table + trigger approach might not work for databases of significant size. For example I have seen instances with >50k qps on a table where it is not possible to run pg_repack because it never catches up and it also doesn't ever manage to take the lock which is needed to switch to the new table. This can be simulated with overlapping long running queries.

pgroll doesn't really copy full tables, but individual columns when needed (for instance when there is a constraint change). It is true that I/O can become an issue; backfilling is batched, but the system should have enough capacity for it to happen. There are some opportunities to monitor I/O and throttle backfilling based on it.


What I'd love to see is state-based migrations similar to what Prisma offers - but that can handle views, functions, and complex logic that references other things - and have it be smart enough to change those as well. Or at least walk you through any dependent changes. I'd pay for that.


For things like stored procs, triggers, and views, there's a lot of vendor-specific (e.g. Postgres vs MySQL vs SQL Server) edge cases in syntax, introspection, and operational best practices. That's true of tables too of course, but at least the introspection part tends to be fully functional for tables in all major database systems. For other object types, introspection can be half-baked and things can get painful in general. It's much harder to design a generic declarative tool which works across multiple DBs without making sacrifices in expressiveness, safety, and ergonomics.

So most likely you're going to want a Postgres-specific tool for this, but I'm not sure one exists yet that handles everything you're looking for here.

I'm the author of a product called Skeema which does handle all this (tables, procs/funcs, views, triggers) for MySQL and MariaDB, and in my opinion this is an area where MySQL/MariaDB's relative simplicity -- in things like e.g. lack of transactional DDL -- actually makes this problem easier to solve there. For example Skeema explicitly doesn't handle data migrations because you can't atomically combine schema changes and data changes in MySQL/MariaDB in the first place.

btw when describing/searching for this, I always say "declarative" and never "state-based". "Declarative" is consistent with terminology used by other infra-as-code such as Terraform and Kubernetes. The main places I see calling it "state-based" are marketing blog posts from commercial schema management tools using an imperative migration approach (Liquibase, Bytebase, etc). To me it feels like they say "state-based" in order to make the declarative competition seem more strange/foreign...


One of the Bytebase authors here. We use "state-based" and "declarative" (https://www.bytebase.com/blog/database-version-control-state...) interchangeably.

Our initial launch did use "state-based", but we later changed it to "declarative". You can check the product here https://demo.bytebase.com/project/blog-102#gitops

Of course, it's always a bit confusing to have multiple ways to reference the same thing. So be it.


I was referring to blog posts, for example [1], in which you admittedly use both terms in the text, but you focus more on "state-based" for example in the headline and image. You also erroneously conflate declarative migrations with somehow involving `mysqldump` for some reason? This is what I mean when I say posts like this feel like they're designed to make declarative migrations seem strange and foreign.

I realize this post is two years old, and you're understandably not going to mention a competing product like mine. But it feels like a disingenuous strawman argument to claim that declarative schema management requires `mysqldump`, considering that Skeema (my product) and SQLDef (another declarative tool) were both released in 2016 and are both widely used.

[1] https://www.bytebase.com/blog/database-version-control-state...


We stated the reality at the time of writing, that the migration-based approach was more common. I think that still holds true today. Meanwhile, there are more solutions introducing the state-based approach (including Bytebase itself).

>> it feels like a disingenuous strawman argument to claim that declarative schema management requires `mysqldump`

The article explicitly said it's a hybrid approach and never intended to claim this way. Otherwise, it would be an obvious mis-statement.


> We stated the reality at the time of writing that migration-based approach is more common.

Yes, and I have not disputed this at any point here.

> The article explicitly said it's a hybrid approach and never intended to claim this way. Otherwise, it would be an obvious mis-statement.

Your blog post said "State-based approach stores the desired end state of the entire schema in the code repository. For MySQL, it means to store the schema dump created by mysqldump." I think that is indeed an obvious mis-statement. It would give a casual unfamiliar reader the impression that declarative schema management somehow involves/requires mysqldump.


I’m 100% in on Postgres but what you describe sounds awesome.


If you’re seriously willing to pay money for this, send me an email — I’m considering implementing this in my pgmigrate tool but not sure if it’s worth the development time. I have all the groundwork done, probably achievable in about 10 hours of concerted effort.


I’ve looked everywhere for this in NodeJS & adjacent stacks; almost all migration tools seem to focus on tables, columns and rows. None seem to deal with views, functions, triggers.

I only got back into Postgres this year, after almost a decade away from SQL. It’s kind of bizarre to me that the migration tooling is still at the stage where a one-line change to e.g. a Postgres function requires the whole function to be dropped and re-created?

I understand this is needed at the db level, but surely a “definition” that generates the final migration is doable; it would make such a huge difference in code reviews and to understand how a function/etc changed over time.

Am I just looking in the wrong place? Does this exist? If not, how come? Is it really that hard to do?


I believe Migra can generate those changes for you via diffing, not sure how well it handles dependent views/functions/etc


Thanks! I’ll give it a look (their docs are offline atm)

DrizzleKit and several others do this for table changes, but nothing I’ve found (possibly excluding Flyway and other Java options) do views/functions/etc.


The migra maintainer has abandoned it for greener pastures. If you’re interested in sponsoring an alternative, send me an email. My pgmigrate tool has all the groundwork necessary to make this possible but I have held off implementing this because I am not personally interested in using it.


Oh, that’s unfortunate.

I’m not too keen on sponsoring an alternative when the dev doesn’t particularly care about the feature — it didn’t work out last I tried something similar. But, perhaps there’s another path: Supabase.

They are in (imo) desperate need of better migration tooling, and they sponsor several open source projects that boost their over-all offering.

AFAIK, they haven’t done much in the migration space yet (aside from their alpha db branching feature), so I expect they’ll co-opt an open source solution at some point soon. May be worth pinging them? Seems it could be a win-win there.


SSDT for mssql can do it, I have my schema as SQL files in the repo, the tooling can diff and migrate between schema versions.


"Undoable" in this case means "possible to undo", not irreversible!


It wouldn't mean "irreversible" regardless. It would mean "not possible to do" (ie impossible) or "possible to undo".


That is a funny word. I am a big proponent of in-word dashes, which in this case could help clarify. It is "undo-able", not "un-doable".


Yeah, needs a dash there. Undo-able.


Thank you for noticing this! We are looking into changing the wording :)


Thanks, I was wondering why they were advertising the process to be impossible. A better term would have been "reversible".


Seems like very unfortunate wording then as irreversible is exactly what I thought


"With easy rollback", because rollback is well understood


yes, horrible choice of word, i read it to mean one way schema migration (but zero downtime so maybe worth the risk)

i think he should use the obvious word if this is what it means: reversible


This does look awesome, though I think I'd need a lot of trust in it before I went down this route. It seems to take a pretty drastic approach and it's unclear how complex a schema it would be able to handle. e.g. how well would its use of triggers to synchronize fields work with a schema that itself uses triggers in various ways? I can imagine some weird interactions that could take place.

I'd also be a bit nervous that any of these tricks may themselves cause (perhaps temporary) performance problems.


There is no magic bullet to database migration unless your app is deployed with the database itself and can atomically switch to new version as a DDL change is committed, or you take down the whole ship in maintenance mode and bring it back up once app + DB is migrated.

The only way to do it without downtime is to make the app forward compatible before making DB changes, and make it backward compatible before undoing the DB changes.

There will always be a short period where the app version and DB version will be out of sync.


Cool, we use Liquibase; I wonder how this compares to it?

Also, how do you handle the backfilling of columns? How do you make sure you don't miss any data before dropping the old column?


I don't know much about Liquibase, but I believe it doesn't support accessing both the old and the new schema versions at the same time? (I could be wrong here)

Backfilling happens in batches; we use the PK of the table to update all rows. A trigger is also installed so that any new insert/update executes the backfill mechanism to update the new column.

More details can be found here: https://github.com/xataio/pgroll/blob/main/pkg/migrations/ba...


Using this thread to ask: is there some simple SQL-based migration tool for psql?


Check out: https://github.com/ariga/atlas. It’s like Terraform for databases.

(I'm one of the authors of this project).


For Postgres, how does the schema diffing aspect compare to migra?

https://github.com/djrobstep/migra

I'm asking because, although migra is excellent and there are multiple migration tools based on it (at least https://github.com/bikeshedder/tusker and https://github.com/blainehansen/postgres_migrator), issues are piling up but development seems to be slowing down.


Hey,

I'm @a8m's co-founder and also a maintainer of Atlas. I don't have an exact feature comparison, but Atlas (and especially the Postgres driver) is heavily maintained, with advanced features like views and materialized views (functions and procedures coming up, followed by triggers and more).

What features specifically are you missing in migra? Perhaps we can prioritize them. Also feel free to join our Discord https://discord.gg/zZ6sWVg6NT


hey HN, repo is here: https://github.com/xataio/pgroll

Would love to hear your thoughts!


Great to see more innovation in this space! How does this compare to?

https://github.com/shayonj/pg-osc


Hi there, I'm one of the pgroll authors :)

I could be mistaken here, but I believe that pg-osc and pgroll use similar approaches to avoiding locking and to how backfilling happens.

While pg-osc uses a shadow table and switches to it at the end of the process, pgroll creates shadow columns within the existing table and leverages views to expose old and new versions of the schema at the same time. Having both versions available means you can deploy the new version of the client app in parallel to the old one, and perform an instant rollback if needed.


Thanks for the reply, a write up on pros/cons in these approaches would be fantastic. I have no clue which is better but I believe pgosc is heavily inspired by github/gh-ost, their tool for online schema change for mysql.


Does pgroll have any process to address table bloat after the migration? One of the (many) nice things about pg-osc is that it results in a fresh new table without bloat.


Another pgroll author here :)

I'm not very familiar with pg-osc, but migrations with pgroll are a two phase process - an 'in progress' phase, during which both old and new versions of the schema are accessible to client applications, and a 'complete' phase after which only the latest version of the schema is available.

To support the 'in progress' phase, some migrations (such as adding a constraint) require creating a new column and backfilling data into it. Triggers are also created to keep both old and new columns in sync. So during this phase there is 'bloat' in the table in the sense that this extra column and the triggers are present.

Once completed however, the old version of this column is dropped from the table along with any triggers, so there is no bloat left behind after the migration is done.


Thanks for the reply. My question was specifically about the MVCC feature that creates new rows for updates like this. If you're backfilling data into a new column then you'll likely end up creating new rows for the entire table and the space for the old rows will be marked for re-use via auto-vacuuming. Anyway, bloat like this is a big pain for me when make migrations on huge tables. It doesn't sound like this type of bloat cleanup is a goal for pgroll. Regardless, it's always great to have more options in this space. Thanks for your work!


> ... so there is no bloat left behind after the migration is done.

This is only true after all rows are rewritten after the old column is dropped. In standard, unmodified Postgres, DROP COLUMN does not rewrite existing tuples.


Is it possible to make it work using SQL only in the future?

Also, what about if the user can just maintain one schema file (no migrations), and the lib figures out the change and applies it?


Hi, one of the authors of pgroll here.

Migrations are in JSON format, as opposed to pure SQL, for at least a couple of reasons:

1. The need to define up and down SQL scripts that are run to backfill a new column with values from an old column (eg when adding a constraint).

2. Each of the supported operation types is careful to sequence operations in such a way to avoid taking long-lived locks (eg, initially creating constraints as NOT VALID). A pure SQL solution would push this kind of responsibility onto migration authors.

A state-based approach to infer migrations based on schema diffs is out of scope for pgroll for now but could be something to consider in future.


Thanks for releasing this tool! I actually interpreted the question differently: Rather than manipulating in SQL, would you consider exposing it as something like a stored procedure? Could still take in JSON to describe the schema change, and would presumably execute multiple transactions under the hood. But this would mean I can invoke a migration from my existing code rather than needing something out-of-band, I can use PG’s existing authn/authz in a simple way, etc.


> Also, what about if the user can just maintain one schema file (no migrations), and the lib figures out the change and applies it?

Because that only solves the DDL issues and not the DML. It is still useful though.

I use a schema comparison tool that does exactly this to assist in building my migration and rollback plans, but when simply comparing two schemas there is no way to tell the difference (for example) between a column rename and a drop column/add column. The tooling provides a great scaffold and saves a ton of time.


The link to the introductory blog post here appears to be broken

https://xata.io/blog/pgroll-schema-migrations-postgres


Cool stuff! Do you have any thoughts about how this compares to https://github.com/fabianlindfors/reshape?


Great question! Reshape was definitely a source of inspiration, and in fact, our first PoC version was based on it.

We decided to start a new project for a couple of reasons. First, we preferred to have it in Go, so we could integrate it more easily into Xata. And we wanted to push it further, based on our experience, to also deal with constraints (with Reshape, constraints are shared between versions).


Thank you for the project! I did a brief scan for an explanation of the PG 14+ version constraint. Which 14-specific features are you relying on?


We use a few statements that are not supported by previous versions. For instance `CREATE OR REPLACE TRIGGER`.

Supporting previous versions would be possible, but we went for the most recent ones to reduce complexity.


The 3 schemas idea is new to me and a cool one, but how about a migration that moves a column out into a few linking tables? What does that look like? I've changed a note table column into a note-type and its attached ID so you can add a note to anything, initially populated with the notes from the original table column. How does that undo?


I didn't fully understand the change that you are explaining, but in general having old & new schemas working relies on the migration definition having the proper `up` & `down` functions defined. These are postgres functions so you can fit any logic in them to ensure that a column deleted in the new schema gets properly backfilled to the old one.

I wonder if there will be cases where this gets too complex or otherwise hits any limits. It may be possible that for those cases a raw SQL migration is required.


How do people typically do migrations in production anyway - what tools are used? what are the best practices?


Flyway is generally the most popular. Django/alembic in the python world.

I think most migration tools are missing a lot of features that would make them safer and easier to use. Modern teams merge a lot and deploy frequently, but most migration tools are built for a db admin manually running commands.

I wrote a migration cli tool (and golang library) that I think is more suited to modern workflows, if you check it out I’d appreciate any feedback!

https://github.com/peterldowns/pgmigrate


I used sqitch in a past job and loved it, but I had to basically implement what pgroll does automatically in order to ensure smooth migrations, and I hit plenty of issues along the way. Learned a lot, but pgroll looks much friendlier.


Flyway and Liquibase are typical migration tools.

Best practice is to have your new code backwards-compatible with the old schema so you can do blue-green deployments.


At my job we do database migrations 1 week and then deploy code the next. That way if any problems happen in either, it’s easier to roll back.


In one of the diagrams: "Rename column lastname", did they mean "add column lastname"?


Ah yes, good catch! Fixing..


Why does a migration tool need to be subscription-based? A one-time fee should be enough.


Hi, one of the authors here!

The tool is open source and doesn't require any subscription :)


To add context for others:

Reading the homepage as a total DB noob, all signs point to this product being a CLI tool. However, when you get to the bottom, you see that it is subscription-based with a free tier. Reading the feature list ("availability zones", "storage", etc), it sounds like this is in fact a service with a CLI frontend? But again, this is a layman perspective, and some clarity from the authors (or other commenters) might help.

Update: I think I see what happened. Xata is a serverless DB service, and this is a tool they wrote that can be used independently of their service (I assume). The subscription options presented to the user look like they are related to this CLI tool, but they are in fact for the broader Xata service.


> Update: I think I see what happened. Xata is a serverless DB service, and this is a tool they wrote that can be used independently of their service (I assume). The subscription options presented to the user look like they are related to this CLI tool, but they are in fact for the broader Xata service.

Yes, that's right. Xata is a Postgres-based service and we're working on exposing the Postgres DB directly and unrestricted to our users. As part of this, we're also open-sourcing parts of the platform. We'll have more such open source projects soon.

This is not exactly the case with pgroll, as its approach is different from what we do today in Xata, but we'll be incorporating pgroll in Xata soon.


If I'm using alembic for schema migrations already, how do I make use of this?


We are looking to build integrations with other tools, but for now it isn't recommended to use pgroll alongside another migration tool.

To try out pgroll on a database with an existing schema (whether created by hand or by another migration tool), you should be able to have pgroll infer the schema when you run your first migration.

You could try this out in a staging/test environment by following the docs to create your first migration with pgroll. The resulting schema will then contain views for all your existing tables that were created with alembic. Subsequent migrations could then be created with pgroll.

It would be great to try this out and get some feedback on how easy it is to make this switch; it may be that the schema inference is incomplete in some way.


Any chance of doing something similar with sqlite?


Are there any plans for an abstraction over JSON?


How would I fit this into an alembic workflow?


if you're in rails, then in my experience you just add `safe-pg-migrations` gem and call it a day :D


Is this ready for prod use?


I didn't look too deeply at this; as soon as it said JSON, I was gone.

I've been using dbmate which uses SQL and works really well.


This looks amazing, thank you!



