You could certainly get the same effect without downtime prior to this. It just needed a few extra steps and some cooperation from the application: adding a nullable column without a default has always been instantaneous, so you would do that, then update all the rows that are missing the value.

Finally, you add the default to the schema, which at that point is in the millisecond range because all/most rows already have a value.

And then you add the NOT NULL constraint, which will also be in the ms range.

You can do all of this in a transaction too, because Postgres' schema modifications are transactional.

This is how I have added countless columns with default values without downtime over the last few years.
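
Roughly, in SQL, that sequence looks like this (just a sketch with made-up table/column names; the backfill is shown as a single statement here, though in practice you'd batch it, as discussed in the replies):

    -- 1. Nullable column without a default: metadata-only, effectively instant
    ALTER TABLE orders ADD COLUMN status text;

    -- 2. Backfill existing rows
    UPDATE orders SET status = 'new' WHERE status IS NULL;

    -- 3. Add the default, which only affects future inserts
    ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'new';

    -- 4. Add the constraint (Postgres checks existing rows for NULLs here)
    ALTER TABLE orders ALTER COLUMN status SET NOT NULL;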




Updating a billion rows to give a value to a new column this way is incredibly slow, since a blanket UPDATE still runs as a single implicit transaction, and so does the constraint check before the default is applied. It can also use a LOT of disk space.


We always did that in batches of about 1k to 10k rows. It takes a while, but at least it doesn't hammer the DB.
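
Something like the following, driven from a script that repeats until zero rows are updated (a sketch; the batch size and the orders/status/id names are placeholders, with id being the primary key):

    -- One batch; rerun until it reports UPDATE 0
    UPDATE orders
    SET    status = 'new'
    WHERE  id IN (
        SELECT id
        FROM   orders
        WHERE  status IS NULL
        LIMIT  10000
    );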


If you are doing this, pay attention to the number of dead tuples accumulating across batches. We had to VACUUM ANALYZE after each batch on at least one occasion where the index stopped being used by the query planner past a certain threshold of dead tuples. Even though there was no table lock, the performance hit from the long-running queries made the application unusable.

The following query will show the dead tuple count for the table:

    select tuple_count, dead_tuple_count from pgstattuple('table_name');
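
Note that pgstattuple is a contrib extension, so it has to be enabled once per database before that query works; and when dead tuples do pile up, the manual VACUUM ANALYZE between batches mentioned above is the fix (sketch, hypothetical table name):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- between batches, once dead tuples accumulate
    VACUUM ANALYZE orders;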


The article does address this: it still has to check the DB for NULLs after you're done. Less downtime, but for huge tables that check does take a significant amount of time.


This optimization only applies to new columns with a stable default, so a table scan is not required to check for NULLs: every row gets the same default, which either is NULL or isn't.
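
Which is why, on Postgres 11+, the whole dance collapses to a single statement for non-volatile defaults; the default is kept in the catalog rather than written into every existing row (sketch with made-up names):

    -- Metadata-only on PG 11+: existing rows are not rewritten
    ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';

    -- A volatile default (e.g. random()) still forces a full table rewrite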


I was talking about the way to do it before: you could create a nullable column, fill in the value in batches, then add the default and make it non-nullable. But that last part took some time (not in the range of ms) for large tables.


> But that last part took some time (not in the range of ms) for large tables

Create a temporary index covering that column; then it will be in the range of ms.


You can create a temporary partial index on that table; then the check will be instantaneous. There is also `create index concurrently`, which won't take locks that block writes to the table.

With this, the re-check for NULLs when adding the NOT NULL constraint will take hardly any time at all.
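
A sketch of that approach (made-up names; the partial index mainly makes it cheap to confirm the backfill is complete, and how much the SET NOT NULL step itself benefits depends on your Postgres version):

    -- Built without blocking writes; cannot run inside a transaction block
    CREATE INDEX CONCURRENTLY orders_status_null_idx
        ON orders (id) WHERE status IS NULL;

    -- Fast pre-check that no NULLs remain (can use the partial index)
    SELECT count(*) FROM orders WHERE status IS NULL;

    ALTER TABLE orders ALTER COLUMN status SET NOT NULL;

    DROP INDEX CONCURRENTLY orders_status_null_idx;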


Sorry, I should have clarified: “accidental downtime”. This doesn't make anything new possible, but it will definitely eliminate a common way to shoot oneself in the foot.



