Important PostgreSQL 14 update to avoid silent corruption of indexes (migops.com)
330 points by avi_vallarapu on June 9, 2022 | 93 comments



It is noted in the post, I'll repeat it to be clear:

This corruption can only occur _during_ a (re)index with CONCURRENTLY specified, on rows that are modified during the reindex operation, and only for that index.

No other indexes are impacted, and an index can only be impacted when the concurrent updates on the table don't touch any indexed columns (i.e., HOT updates).

Nevertheless, if you frequently run CIC (CREATE INDEX CONCURRENTLY), you could be hitting this issue -- right now you can detect it with amcheck, and fix it with a non-concurrent REINDEX of the index (yes, this locks the table).
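For reference, a minimal sketch of the detect-and-fix cycle (the index name idx_foo is a placeholder):

    -- Detect: heapallindexed=true makes amcheck verify that every heap
    -- tuple has a matching index entry
    CREATE EXTENSION IF NOT EXISTS amcheck;
    SELECT bt_index_check('idx_foo'::regclass, true);

    -- Fix: a plain REINDEX, which blocks writes to the table while it runs
    REINDEX INDEX idx_foo;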

Alternatively (not listed in the blog post, but possible if you can't afford table locks), you should be able to safely CIC (without corruption) by doing manual snapshot control while concurrently reindexing the index (takes 3 database sessions):

In session 1 start a REPEATABLE READ read-only transaction. In session 2, start the concurrent (re)index. In session 3, monitor pg_stat_progress_create_index for session 2 to get to a 'waiting for ...' phase.

When you detect that the (re)index session has arrived in a waiting phase, start a REPEATABLE READ transaction in session 3, and then end (commit or rollback, it doesn't matter) the transaction in session 1. Now sessions 1 and 3 switch roles; repeat this lock-step until the (re)indexing session has completed.
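Roughly, the choreography looks like this (a sketch; idx_foo is a placeholder, and since REPEATABLE READ only takes its snapshot at the first query, the SELECT 1 matters):

    -- Session 1: pin a snapshot before the reindex starts
    BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
    SELECT 1;  -- forces the snapshot to actually be taken

    -- Session 2: run the concurrent rebuild
    REINDEX INDEX CONCURRENTLY idx_foo;

    -- Session 3: poll until session 2 reaches a 'waiting for ...' phase
    SELECT pid, phase FROM pg_stat_progress_create_index;

    -- Once session 2 is waiting: open a REPEATABLE READ transaction here in
    -- session 3 (BEGIN ... + SELECT 1), then COMMIT in session 1. Swap the
    -- roles of sessions 1 and 3 and repeat until the REINDEX completes.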


What? That seems absurdly apologist even for me, and I love Postgres.

Everyone seems to be downplaying this corruption issue saying it only happens when x or y and it can be avoided by doing complex operation z but… if you build anything that isn’t a toy or demo on Postgres you NEED to build indexes concurrently.

They shipped an optimization to the feature that lets indexes be built concurrently, yet failed to perform thorough enough testing to ensure concurrent updates don’t disrupt the correctness of the index? Yikes, that doesn't exactly make me confident in using any new Postgres features.

Further, there’s no workaround that wouldn’t cause a production outage (due to requiring a table lock) until they release the new build, which is somehow not happening as soon as the fix is available.

How are we not more alarmed by the impact of this?


> Everyone seems to be downplaying this corruption issue saying it only happens when x or y and it can be avoided by doing complex operation z but…

I don't say corruption is not happening, but I clarify that you might not be affected: this corruption is not as "silent" as the title of the original post would make you think (it only occurs during concurrent reindex/index creation, which is not something that happens on its own).

> if you build anything that isn’t a toy or demo on Postgres you NEED to build indexes concurrently

I fully disagree. Many customer databases I've seen are <5GB in size (largest table ~800MB), used by latency-sensitive workloads only 8 hours each day, and mostly read-only.

Locking DDL outside that 8-hour window would be perfectly fine, and even within that window a non-concurrent reindex generally would have been OK - sorting 800MB does take some 10 seconds, but that is only a small hiccup for data-modifying workloads.

> Further, there’s no workaround that wouldn’t cause a production outage (due to requiring a table lock) until they release the new build, which is somehow not happening as soon as the fix is available.

As my comment describes, that is incorrect. If you carefully maintain snapshots in concurrent connections on that database, you can safely reindex the table concurrently while taking no more locks on the table than PG13 did. These connections do not need to hold a lock on the target table; they only need to register and hold their snapshot at the right times.


> this corruption is not as "silent"

Whether corruption is silent or not has no relation to how it can be caused. It just means that the corruption is not automatically detected and/or resolved, and the system will happily use the corrupted data. Which seems to be the case here, as you admit that both detection and fixing require manual intervention.


The title implies that data is silently being corrupted, which is true, but only in well-known and detectable workloads (CIC/RIC).

However, contrary to what the title would make you think, once they are created the indexes do not get more corrupt over time; the corruption does not spread. Instead, they get less and less corrupt over time as the rows missing from the index are either updated in a non-HOT manner, or deleted.


> They shipped an optimization to the feature that lets indexes be built concurrently, yet failed to perform thorough enough testing to ensure concurrent updates don’t disrupt the correctness of the index?

From what I can tell, a problem _may_ occur in those circumstances, but there’s no guarantee it _will_ occur. That can make finding it in testing difficult.

A performant ACID database is a very complex piece of software. It is a pipe dream to think any such feature would ship without bugs.

See for example https://support.microsoft.com/en-us/topic/fix-data-corruptio..., which shows a similar issue on SQL Server.


There's nothing apologist about the comment you're replying to. The parent is not defending the behaviour; they are explaining which scenarios are affected, and what the appropriate mitigations are. Nobody here is denying that it's a serious issue.


These types of issues have happened with Oracle Database, Microsoft SQL Server, and MySQL.

That’s the reason why you should always be at least one major release behind the latest.


> if you build anything that isn’t a toy or demo on Postgres you NEED to build indexes concurrently.

Except when you can't, because you have so many writes that the reindex never finishes and the DB grinds to a halt with a load of 300. Don't ask me how I know.


Software has bugs.

Sometimes they're missed in testing.

Do you hold the software that you write to the same standard?


To be fair, I think most users of Postgres hold it to a higher standard than the software they write themselves, and reasonably so.

Postgres has developed a sterling reputation for reliability, and this does tarnish it somewhat. (Not by much in my personal view, but not zero either).


Not the first time there’s been a PostgreSQL bug like this; as always, it will get a regression test and the PGDG will be better for it.

It’s not like other database products haven’t had very similar issues, either [https://support.microsoft.com/en-us/topic/fix-data-corruptio...]

Hell, last night I literally had a MSSQL server where Windows Server Failover Clustering randomly decided to delete all of the AlwaysOn Availability Group configurations from the node; along with random NTFS corruption (no unclean shutdown, SAN is fine) resulting in us having to restore master and msdb from backups before we could rejoin it to those AGs.

Bugs happen.


Sure.

But notice how we're now comparing Postgres to MSSQL? Generally I think of Postgres as head and shoulders better. I still do, but by a little less. That's okay.

Did they act responsibly, fix it quickly, and communicate effectively? Yes, tremendously so – which makes me feel overall better about using Postgres generally. But, I'll feel ever so slightly more hesitant about adopting a major version shortly after it comes out next time. Again, that's okay.


> Again, that's okay.

Don't need to be reassuring to HN commenters :)


So out of all the bugs, you choose this one as tarnishing its reputation? Don't go looking up "PostgreSQL and fsync" then...


Why do you say that anything but a toy or demo NEEDS concurrent index builds? So far I have only built toys and demos and never needed them. Are they a must to dramatically speed up production workloads, or am I missing something?


It's just a question of locking.

You'll be fighting tooth and nail against anything larger than a row-level lock if you're targeting the consumer internet and allow user-generated data/user writes.

You'll probably never care if that's not the case, which is arguably the usual situation. Even B2C websites often don't have user-generated content to speak of, so they're pretty much just talking from a bubble (I do get where they're coming from, I couldn't have done them either at some of my previous employers).


Non-concurrent indexing locks the table against writes until it's done creating the index. If it's a large table, it locks it for a long time. This makes anything trying to write to that table time out.

This is a non-issue on things that don't have real traffic going to them, or on small enough tables that the indexing time is trivial.
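For concreteness, a quick sketch of the difference (table and index names are made up):

    -- Plain build: takes a SHARE lock, blocking all writes to the table
    -- (reads still work) until the index build finishes
    CREATE INDEX idx_orders_customer ON orders (customer_id);

    -- Concurrent build: takes a SHARE UPDATE EXCLUSIVE lock, so writes keep
    -- flowing; the cost is two table scans and no use in transaction blocks
    CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);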


I'll note that this process basically keeps a snapshot alive, which holds the cleanup horizon at a point where CIC/RIC can still see all the row versions it needs to build the full index, so that data is not cleaned up during the critical phases of the reindex.

In PG14, maintaining that horizon was disabled for backends running CIC/RIC, resulting in this bug. If you manually keep that horizon from moving using other backends on the same database, the old versions that CIC/RIC expects will not be removed, and thus no corruption will occur.
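If you want to check that a backend is actually holding the horizon back, pg_stat_activity exposes it (a sketch):

    -- Backends holding a snapshot; the oldest backend_xmin is what limits
    -- cleanup of old row versions
    SELECT pid, state, backend_xmin, age(backend_xmin) AS xmin_age
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY age(backend_xmin) DESC;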


A simpler fix is to upgrade to the fixed version and concurrently reindex any indexes that could have been affected.


That fixed version is not yet released (that is planned for 2022-06-16, in 6 days), and database upgrades are not trivial.

Sure, it's easier to "just go to the version that has this fixed", but if that's not possible, this might just be the next best thing.


Minor version upgrades are trivial, just a binary swap and restart away.
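On Debian/Ubuntu with the PGDG packages, for example, it's roughly this (a sketch; package and service names vary by distro):

    sudo apt-get update
    sudo apt-get install --only-upgrade postgresql-14
    sudo systemctl restart postgresql@14-main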


Just revert d9d0762, or use the actual REL_14_STABLE branch.
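i.e., roughly this, assuming a standard source build (configure options omitted):

    git clone https://git.postgresql.org/git/postgresql.git
    cd postgresql
    git checkout REL_14_STABLE   # the branch already carries the fix
    ./configure && make && sudo make install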


Looking at the file with the changes https://github.com/postgres/postgres/blob/master/src/backend... , I have to say this source code repository is so well documented/commented and structured that it really gives you huge trust in using Postgres in your stack.


Large C codebases _have_ to be exceptionally nice, or they immediately collapse under their own weight. As a dev team, the language teaches you this the hard way. I've never seen a terrible huge C codebase (but have seen many in other languages).


> I've never seen a terrible huge C codebase

I have 100% confidence they exist. They just don't get uploaded to GitHub out of shame or embarrassment.


Can confirm they do. I worked at a company that was producing control systems for electric engines. Great environment and fun job, but the code was beyond redemption: 15k-line files with 2k+ line #ifdef blocks that ran different code for different customers, some variable names that were just curses against pushy clients, and not a single abstraction in sight.

Not only do they exist, they power massive machines that could crush a person in the blink of an eye.


I think OP just found a really well done C file.

You can definitely see many famous projects on GitHub where, even though the code is logical, you can't help but feel it's a bit all over the place.

CPython isn't all sunshine and roses.

A broad generalization, but with C and C++ I'd say the standards for "good code" have dramatically risen in the last 10 years.

Variable names seem to have doubled in size.


> I've never seen a terrible huge C codebase

libssl/openssl (before the rewrite/fixes especially)? Maybe doesn't qualify as huge?

See eg: https://www.youtube.com/watch?v=GnBbhXBDmwU "LibreSSL with Bob Beck" (the first 30 days)


Proprietary device drivers come to mind...


They don't tend to be huge && in C


AMD's linux graphics driver has entered the chat


Thankfully they don't follow the whole mindset of "self-documenting code"


PostgreSQL code quality is exceedingly fine indeed.


File could definitely be broken up a bit. Over 5000 lines! Just a nitpick though.


Quite the opposite: I hate it when projects have dozens upon dozens of modules with one function each. Multiple huge files are the best sweet spot. (Only crazy and exceptional things like putting everything into a single file damage readability, imho.)


Files with one function vs files with 5000 lines are not the only two options.


I've got it! We could use files with one function that spans 5000 lines.


How about a function that spans 5000 files


Why even use 5000 lines when you can often combine them all into one line?


That's the job of the compiler, then we can use a disassembler to read the actual optimized source code.


300 to 1000 lines per file is best IMO.


IMO, paintings with the color blue are the best.

What does number of lines have to do with anything?


At each extreme:

A source code file with very few lines means more cognitive load to remember which file (or package) some functionality is in, and likely more effort to maintain the code.

Whereas source code files which are very large mean more cognitive effort to remember where in the file some function is. In many languages, variables can be local to a file, which means use of such variables is riskier, etc.

It may be desirable to combine smaller files into a more coherent whole, or to split up an overly complicated large file into several smaller files.

Sure, without seeing the code, I don't think you can come up with a concrete rule that holds in all cases. Rules of thumb can still be useful as an indication of maintenance effort.


I get the impression that long files are culturally acceptable in systems-level C code. E.g., just a cherry-picked file from Linux: kernel/sched/core.c is over 11k lines.

https://github.com/torvalds/linux/blob/master/kernel/sched/c...


I feel the long-file issue is mostly moot - the problem isn't file length but spaghetti code. If it makes sense for code to be in one file, it should be in one file. Breaking it up simply to reduce file length is counterproductive.


There are 18609 .c files in my checked-out copy of the FreeBSD src tree. The median length is 258 lines; 90% are 1373 lines or shorter; 99% are 5241 lines or shorter.

The statistics for the 6071 .c files in the FreeBSD kernel are somewhat higher -- median is 460 lines; 90th percentile is 2070 lines; 99th percentile is 7678 lines -- but your example of a 11133 line file is definitely at the extreme high end.


I didn't run any stats when I found that file. Just clicked around in github a handful of times looking for something that seemed like it'd be complex.


Yes - are people using editors that don't let them have multiple views of the same file, or something?


A filepath is an index and a hierarchy that adds information and structure. It can't be completely replaced by editor affordances.


Depends on the language. C# somewhat replaces paths with namespaces, then you navigate classes and methods with editor tooling. I remember back when I was on Visual Studio writing C++ that they did something similar.


Actually, I don't mind big files. It is simpler to scan through one or do a quick search than if you had a bunch of smaller files. And 5000 lines is not awkward for most editors, especially as many editors can collapse functions.


This feels like a good area for tooling (editors, source hosts, SCM extensions) to improve the experience. I don’t always mind large source files (and sometimes may prefer them over large file system hierarchies), but they can be a pain to navigate in some circumstances.

As an example, making several related changes in very different parts of a file, where you need to cross-reference between them. The changes themselves might be small, but it’s a huge cognitive burden to alternate/iterate through them. I’d love to have a view which temporarily projects those targets as if they’re isolated files without changing the actual structure on disk. I’d love it so much I actually do this manually for a lot of tasks, creating temp files to prepare edits for related areas of code. But then I lose a lot of the benefits of tools which understand what’s being referenced. It would be great to just type a quick command (or click or whatever) to say “don’t refactor this function to another file, but let’s pretend you did, for a while”.


This is how older editors like emacs work. You interact with files through buffers, and multiple windows can show the same buffer while viewing different sections simultaneously. So you can investigate or edit different parts of one huge file the same way you would smaller ones.


You just blew my mind.

I use JOE and JOE also supports multiple views into the same file. In fact, to open another file is two commands: the open view/window (^KO) command followed by edit file command (^KE). I've always used this facility for as long as I can remember and it never really occurred to me until now that people using more modern editors and especially GUI editors may not enjoy this same convenience--either not possible or no simple chain of command inputs to get there. And it's not like I don't use GUI editors, just not in situations where I would realize this feature was missing.


You can do this in VS Code as well, even have it side by side. Am I missing something?


I just opened IntelliJ and it has this feature too, where you can "Split right" or "Split down" and have multiple views of the same file. Thanks for letting me know this is something editors might support.

For some reason I have not noticed that feature before. It's not like it is hidden either, as it is in the "right click" context menu that I use daily. I guess I need to learn the tool, so that I don't miss useful features like this.


For the "temporarily projects those targets as if they’re isolated files" part, you want something like narrow-indirect or edit-indirect.

https://www.emacswiki.org/emacs/NarrowIndirect

https://github.com/Fanael/edit-indirect


> The changes themselves might be small, but it’s a huge cognitive burden to alternate/iterate through them. I’d love to have a view which temporarily projects those targets as if they’re isolated files without changing the actual structure on disk.

This is exactly how vim buffers (for instance in a split) work.


I’m glad to see I’ve brought the emacs/vim people together for a common purpose!


This is what :[v]split is for...


While I'm not sure that was a consideration here, sometimes C compilers produce better machine code when they have access to more function definitions. E.g., SQLite recommends that embedders use the single ~10MB sqlite3.c amalgamation file[1] for both ease of use and performance reasons.

[1]: https://www.sqlite.org/amalgamation.html


Seems like something the compiler should take care of.


It can when you use LTO, but that tends to be very slow for large programs.
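For reference, with GCC or Clang it's a single flag at both compile and link time (a minimal sketch):

    # Compile each unit with LTO information, then link with LTO enabled
    gcc -O2 -flto -c a.c
    gcc -O2 -flto -c b.c
    gcc -O2 -flto a.o b.o -o app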


It is slow. Our (C++) project's MSVC release build ends with a glorious 2-minute run of link.exe with LTO (/LTCG /O2) and aggressive inlining (/Ob3) enabled.

Limiting translation unit size in C++ helps with faster edit/compile/run cycles as well, which doesn't seem to be a concern for C codebases in the 21st century.


You hit most of the same issues (and some additional ones) when compiling the same code within one translation unit...


Of those 5000 lines, it looks like half are comments or whitespace, and none of the functions look longer than a couple hundred lines or deeper than a few levels of control flow. Pretty harsh nitpick.


The point of calling it a nitpick was to indicate precisely that my comment wasn't meant to be taken harshly.


But nitpick still means it's a nit, and I don't really think it is one at all. And you're saying it in response to someone who said the code is very clean, which is quite petty.


> nitpick: engage in fussy or pedantic fault-finding.

Yes, I'm fully aware that the comment is petty. The point is to communicate that I don't disagree that the code is clean and well written.


A more official announcement seems to be at https://www.postgresql.org/message-id/165473835807.573551.15...



It’s impressive that this is a single revert. That speaks to how atomically Postgres development is done.

Also not surprised to see it was the EDB team with the expertise to fix it. Their model is a little outdated but they have a lot of experts working there.


I'm not sure who you're referring to as the EDB team. Please share what makes you think that.

The commit [1] (dug up by someone else in this discussion) references 4 people (including the committer), and only one of them seems to be an EDB employee. (Info gleaned from their respective LinkedIn profiles, and from email signatures sent to the pgsql-hackers list.)

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...


All of those mentioned in the commit helped in the diagnosis, while Andres was the one to detail the full chain of events describing how the corruption happened, in [0].

[0] https://www.postgresql.org/message-id/20220524190133.j6ee7zh...


I'm running 12.11. Do I need to do anything?


You are running an up-to-date version, which seems to have no known critical bugs.


an upgrade


The folks on 14 probably wish they held off for a bit. :-D


pg_amcheck can be helpful for validation if it's a B-tree index.
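e.g., something like this (database and index names are placeholders):

    # --heapallindexed verifies that every heap tuple appears in the index
    pg_amcheck --heapallindexed --index=idx_foo mydb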


FWIW, amcheck can be useful but I am not 100% sure that it can detect all the broken cases either.


Awesome update!


I love Postgres. But when I look at the bug list of every release it makes me scared. The types of bugs they have are indicative of a poor development process.


I worked on Oracle and MySQL for ages and saw tons of bugs. As a support company, the number of bugs we encounter with Postgres is zero, or hardly one a year. I am still curious to see which bugs made you so scared!


Yeah, ORA-00600 comes to mind. At my previous company we had to restore from backups once because of this error.


ORA-600 just means an Oracle process crashed while executing a query. Could be a bug in the query optimizer or a dead disk controller. Just restoring a backup without understanding the root cause seems like setting up the next occurrence...


No, it's indicative of the complexity of the system.

Please show another similarly complex system with fewer bugs: that basically doesn't happen.


Yep. We all have dirty laundry; it's just that we can see it in open-source projects.


And at least we see it. It's much easier to appreciate the threat when you can see the submerged part of the iceberg :-)


Don't look at MariaDB or MySQL changelog or issue tracker then. You'd get a heart attack.


Can you name a few such bugs please?


Surprising how people feel the need to respond to a post by someone with the handle "TedShiller".

Is that a real name?

Sorry if it is and I hurt your feelings.


What types of bugs are those? Can you explain a bit more?



