Important PostgreSQL 14 update to avoid silent corruption of indexes (migops.com)
330 points by avi_vallarapu on June 9, 2022 | 93 comments



It is noted in the post, I'll repeat it to be clear:

This corruption can only occur _during_ a (re)index with CONCURRENTLY specified, on rows that are modified during the reindex operation, and only for that index.

No other indexes are impacted, and an index can only be impacted when the concurrent updates on the table don't touch any indexed columns (i.e., HOT updates).

Nevertheless, if you frequently run CIC (CREATE INDEX CONCURRENTLY), you could be hitting this issue -- right now you can detect it with amcheck, and fix it with a non-concurrent REINDEX of the index (yes, this locks the table).
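For reference, a minimal sketch of the detect-and-fix cycle (the index name idx_foo is a placeholder):

    -- Detect: heapallindexed=true makes amcheck verify that every heap
    -- tuple has a matching index entry
    CREATE EXTENSION IF NOT EXISTS amcheck;
    SELECT bt_index_check('idx_foo'::regclass, true);

    -- Fix: a plain REINDEX, which blocks writes to the table while it runs
    REINDEX INDEX idx_foo;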

Alternatively (not listed in the blog post, but possible if you can't afford table locks), you should be able to safely CIC (without corruption) by doing manual snapshot control while concurrently reindexing the index (takes 3 database sessions):

In session 1 start a REPEATABLE READ read-only transaction. In session 2, start the concurrent (re)index. In session 3, monitor pg_stat_progress_create_index for session 2 to get to a 'waiting for ...' phase.

When you detect that the (re)index session has arrived in a waiting phase, start a REPEATABLE READ transaction in session 3, and then end (commit or rollback, it doesn't matter) the transaction in session 1. Now sessions 1 and 3 switch roles; repeat this lock-step until the (re)indexing session has completed.
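Roughly, the choreography looks like this (a sketch; idx_foo is a placeholder, and since REPEATABLE READ only takes its snapshot at the first query, the SELECT 1 matters):

    -- Session 1: pin a snapshot before the reindex starts
    BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
    SELECT 1;  -- forces the snapshot to actually be taken

    -- Session 2: run the concurrent rebuild
    REINDEX INDEX CONCURRENTLY idx_foo;

    -- Session 3: poll until session 2 reaches a 'waiting for ...' phase
    SELECT pid, phase FROM pg_stat_progress_create_index;

    -- Once session 2 is waiting: open a REPEATABLE READ transaction here in
    -- session 3 (BEGIN ... + SELECT 1), then COMMIT in session 1. Swap the
    -- roles of sessions 1 and 3 and repeat until the REINDEX completes.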


What? That seems absurdly apologist even for me, and I love Postgres.

Everyone seems to be downplaying this corruption issue saying it only happens when x or y and it can be avoided by doing complex operation z but… if you build anything that isn’t a toy or demo on Postgres you NEED to build indexes concurrently.

They shipped an optimization to the feature that lets indexes be built concurrently, yet failed to perform thorough enough testing to ensure concurrent updates don’t disrupt the correctness of the index? Yikes, that doesn't exactly make me confident in using any new Postgres features.

Further, there’s no workaround that wouldn’t cause a production outage (due to requiring a table lock) until they release the new build, which is somehow not happening as soon as the fix is available.

How are we not more alarmed by the impact of this?


> Everyone seems to be downplaying this corruption issue saying it only happens when x or y and it can be avoided by doing complex operation z but…

I don't say corruption is not happening, but I clarify that you might not be affected: this corruption is not as "silent" as the title of the original post would make you think (it only occurs during concurrent reindex/index creation, which is not something that happens on its own).

> if you build anything that isn’t a toy or demo on Postgres you NEED to build indexes concurrently

I fully disagree. Many customer databases I've seen are <5GB in size (largest table ~800MB), used by latency-sensitive workloads only 8 hours each day, and mostly read-only.

Locking DDL outside that 8-hour window would be perfectly fine, and even within that window a non-concurrent reindex generally would have been OK - sorting 800MB does take some 10 seconds, but that is only a small hiccup for data-modifying workloads.

> Further, there’s no workaround that wouldn’t cause a production outage (due to requiring a table lock) until they release the new build, which is somehow not happening as soon as the fix is available.

As my comment describes, that is incorrect. If you carefully maintain snapshots in concurrent connections on that database, you can safely reindex the table concurrently while taking no more locks on the table than PG13 did. These connections do not need to hold a lock on the target table; they only need to register and hold their snapshot at the right times.


> this corruption is not as "silent"

Whether corruption is silent or not has no relation to how it can be caused. It just means that the corruption is not automatically detected and/or resolved, and the system will happily use the corrupted data. Which seems to be the case here, as you admit that both detection and fixing require manual intervention.


The title implies that data is silently being corrupted, which is true, but only in well-known and detectable workloads (CIC/RIC).

However, contrary to what the title would make you think, once they are created the indexes do not get more corrupt over time; the corruption does not spread. Instead, they get less and less corrupt over time as the rows missing from the index are either updated in a non-HOT manner, or deleted.


> They shipped an optimization to the feature that lets indexes be built concurrently, yet failed to perform thorough enough testing to ensure concurrent updates don’t disrupt the correctness of the index?

From what I can tell, a problem _may_ occur in those circumstances, but there’s no guarantee it _will_ occur. That can make finding it in testing difficult.

A performant ACID database is a very complex piece of software. It is a pipe dream to think any such feature would ship without bugs.

See for example https://support.microsoft.com/en-us/topic/fix-data-corruptio..., which shows a similar issue on SQL Server.


There's nothing apologist about the comment you're replying to. The parent is not defending the behaviour; they are explaining which scenarios are affected, and what the appropriate mitigations are. Nobody here is denying that it's a serious issue.


These types of issues have happened with Oracle Database, Microsoft SQL Server, and MySQL.

That’s the reason why you should always be at least one major release behind the latest.


> if you build anything that isn’t a toy or demo on Postgres you NEED to build indexes concurrently.

Except when you can't, because you have so many writes that the reindex never finishes and the DB grinds to a halt with a load of 300. Don't ask me how I know.


Software has bugs.

Sometimes they're missed in testing.

Do you hold the software that you write to the same standard?


To be fair, I think most users of Postgres hold it to a higher standard than the software they write themselves, and reasonably so.

Postgres has developed a sterling reputation for reliability, and this does tarnish it somewhat. (Not by much in my personal view, but not zero either).


Not the first time there’s been a PostgreSQL bug like this; as always, it will get a regression test and the PGDG will be better for it.

It’s not like other database products haven’t had very similar issues, either [https://support.microsoft.com/en-us/topic/fix-data-corruptio...]

Hell, last night I literally had a MSSQL server where Windows Server Failover Clustering randomly decided to delete all of the AlwaysOn Availability Group configurations from the node; along with random NTFS corruption (no unclean shutdown, SAN is fine) resulting in us having to restore master and msdb from backups before we could rejoin it to those AGs.

Bugs happen.


Sure.

But notice how we're now comparing Postgres to MSSQL? Generally I think of Postgres as head and shoulders better. I still do, but by a little less. That's okay.

Did they act responsibly, fix it quickly, and communicate effectively? Yes, tremendously so – which makes me feel overall better about using Postgres generally. But, I'll feel ever so slightly more hesitant about adopting a major version shortly after it comes out next time. Again, that's okay.


> Again, that's okay.

Don't need to be reassuring to HN commenters :)


So out of all the bugs, you choose this one as tarnishing its reputation? Don't go looking up "PostgreSQL and fsync" then...


Why do you say that anything but a toy or demo NEEDS concurrent index builds? So far I have only built toys and demos and never needed them. Are they a must to dramatically speed up production workloads, or am I missing something?


It's just a question of locking.

You'll be fighting tooth and nail against anything larger than a row-level lock if you're targeting the consumer internet and allow user-generated data/user writes.

You'll probably never care if that's not the case, which is arguably the usual situation. Even B2C websites often don't have user-generated content to speak of, so they're pretty much just talking from a bubble (I do get where they're coming from, I couldn't have done them either at some of my previous employers).


Non-concurrent indexing locks the table against writes until it's done creating the index. If it's a large table, it locks it for a long time. This makes anything trying to write to that table time out.

This is a non-issue on things that don't have real traffic going to them, or on small enough tables that the indexing time is trivial.
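For concreteness, a quick sketch of the difference (table and index names are made up):

    -- Plain build: takes a SHARE lock, blocking all writes to the table
    -- (reads still work) until the index build finishes
    CREATE INDEX idx_orders_customer ON orders (customer_id);

    -- Concurrent build: takes a SHARE UPDATE EXCLUSIVE lock, so writes keep
    -- flowing; the cost is two table scans and no use in transaction blocks
    CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);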


I'll note that this process basically keeps a snapshot alive, which holds the cleanup horizon at a point where CIC/RIC can still see all the row versions it needs to build the full index, so that data is not cleaned up during the critical phases of the reindex.

In PG14, maintaining that horizon was disabled for backends running CIC/RIC, resulting in this bug. If you manually keep that horizon from moving using other backends on the same database, the old versions that CIC/RIC expects will not be removed, and thus no corruption will occur.
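If you want to check that a backend is actually holding the horizon back, pg_stat_activity exposes it (a sketch):

    -- Backends holding a snapshot; the oldest backend_xmin is what limits
    -- cleanup of old row versions
    SELECT pid, state, backend_xmin, age(backend_xmin) AS xmin_age
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY age(backend_xmin) DESC;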


A simpler fix is to upgrade to the fixed version and concurrently reindex any indexes that could have been affected.


That fixed version is not yet released (that is planned for 2022-06-16, in 6 days), and database upgrades are not trivial.

Sure, it's easier to "just go to the version that has this fixed", but if that's not possible, this might just be the next best thing.


Minor version upgrades are trivial, just a binary swap and restart away.
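On Debian/Ubuntu with the PGDG packages, for example, it's roughly this (a sketch; package and service names vary by distro):

    sudo apt-get update
    sudo apt-get install --only-upgrade postgresql-14
    sudo systemctl restart postgresql@14-main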


Just revert d9d0762, or use the actual REL_14_STABLE branch.
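i.e., roughly this, assuming a standard source build (configure options omitted):

    git clone https://git.postgresql.org/git/postgresql.git
    cd postgresql
    git checkout REL_14_STABLE   # the branch already carries the fix
    ./configure && make && sudo make install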


Looking at the file with the changes https://github.com/postgres/postgres/blob/master/src/backend... , I have to say this source code repository is so well documented/commented and structured that it really gives you huge trust in using Postgres in your stack.


Large C codebases _have_ to be exceptionally nice, or they immediately collapse under their own weight. As a dev team, the language teaches you this the hard way. I've never seen a terrible huge C codebase (but have seen many in other languages).


> I've never seen a terrible huge C codebase

I have 100% confidence they exist. They just don't get uploaded to GitHub out of shame or embarrassment.


Can confirm they do. I worked at a company that was producing control systems for electric engines. Great environment and fun job, but the code was beyond redemption: 15k-line files with 2k+ line #ifdef blocks that ran different code for different customers, some variable names that were just curses against pushy clients, and not a single abstraction in sight.

Not only do they exist, they power massive machines that could crush a person in the blink of an eye.


I think OP just found a really well done C file.

You can definitely see many famous projects on GitHub where, even though the code is logical, you can't help but feel it's a bit all over the place.

CPython isn't all sunshine and roses.

A broad generalization, but with C and C++ I'd say the standards for "good code" have dramatically risen in the last 10 years.

Variable names seem to have doubled in size.


> I've never seen a terrible huge C codebase

libssl/openssl (before the rewrite/fixes especially)? Maybe doesn't qualify as huge?

See eg: https://www.youtube.com/watch?v=GnBbhXBDmwU "LibreSSL with Bob Beck" (the first 30 days)


Proprietary device drivers come to mind...


They don't tend to be huge && in C


AMD's linux graphics driver has entered the chat


Thankfully they don't follow the whole mindset of "self-documenting code"


PostgreSQL code quality is exceedingly fine indeed.


File could definitely be broken up a bit. Over 5000 lines! Just a nitpick though.


Quite the opposite: I hate it when projects have dozens upon dozens of modules with one function each. Multiple huge files are the best sweet spot. (Only crazy and exceptional things like putting everything into a single file damage readability, imho.)


Files with one function vs files with 5000 lines are not the only two options.


I've got it! We could use files with one function that spans 5000 lines.


How about a function that spans 5000 files


Why even use 5000 lines when you can often combine them all into one line?


That's the job of the compiler, then we can use a disassembler to read the actual optimized source code.


300 to 1000 lines per file is best IMO.


IMO, paintings with the color blue are the best.

What does number of lines have to do with anything?


At each extreme:

A source code file with very few lines means more cognitive load to remember which file (or package) some functionality is in, and likely more effort to maintain the code.

Whereas source code files which are very large mean more cognitive effort to remember where in the file some function is. In many languages, variables can be local to a file, which means use of such variables is riskier, etc.

It may be desirable to combine smaller files into a more coherent whole, or to split up an overly complicated large file into several smaller files.

Sure, without seeing the code, I don't think you can come up with a concrete rule that holds in all cases. Rules of thumb can still be useful as an indication of maintenance effort.


I get the impression that long files are culturally acceptable in systems-level C code. E.g., just a cherry-picked file from Linux: kernel/sched/core.c is over 11k lines.

https://github.com/torvalds/linux/blob/master/kernel/sched/c...


I feel the long-file issue is mostly moot - the problem isn't file length but spaghetti code. If it makes sense for code to be in one file, it should be in one file. Breaking it up simply to reduce file length is counterproductive.


There are 18609 .c files in my checked-out copy of the FreeBSD src tree. The median length is 258 lines; 90% are 1373 lines or shorter; 99% are 5241 lines or shorter.

The statistics for the 6071 .c files in the FreeBSD kernel are somewhat higher -- median is 460 lines; 90th percentile is 2070 lines; 99th percentile is 7678 lines -- but your example of a 11133 line file is definitely at the extreme high end.


I didn't run any stats when I found that file. Just clicked around in github a handful of times looking for something that seemed like it'd be complex.


Yes - are people using editors that don't let them have multiple views of the same file, or something?


A filepath is an index and a hierarchy that adds information and structure. It can't be completely replaced by editor affordances.


Depends on the language. C# somewhat replaces paths with namespaces, then you navigate classes and methods with editor tooling. I remember back when I was on Visual Studio writing C++ that they did something similar.


Actually, I don't mind big files. It is simpler to scan through one or do a quick search than if you had a bunch of smaller files. And 5000 lines is not awkward for most editors, especially as many editors can collapse functions.


This feels like a good area for tooling (editors, source hosts, SCM extensions) to improve the experience. I don’t always mind large source files (and sometimes may prefer them over large file system hierarchies), but they can be a pain to navigate in some circumstances.

As an example, making several related changes in very different parts of a file, where you need to cross-reference between them. The changes themselves might be small, but it’s a huge cognitive burden to alternate/iterate through them. I’d love to have a view which temporarily projects those targets as if they’re isolated files without changing the actual structure on disk. I’d love it so much I actually do this manually for a lot of tasks, creating temp files to prepare edits for related areas of code. But then I lose a lot of the benefits of tools which understand what’s being referenced. It would be great to just type a quick command (or click or whatever) to say “don’t refactor this function to another file, but let’s pretend you did, for a while”.


This is how older editors like emacs work. You interact with files through buffers, and multiple windows can show the same buffer while viewing different sections simultaneously. So you can investigate or edit different parts of one huge file the same way you would smaller ones.


You just blew my mind.

I use JOE and JOE also supports multiple views into the same file. In fact, to open another file is two commands: the open view/window (^KO) command followed by edit file command (^KE). I've always used this facility for as long as I can remember and it never really occurred to me until now that people using more modern editors and especially GUI editors may not enjoy this same convenience--either not possible or no simple chain of command inputs to get there. And it's not like I don't use GUI editors, just not in situations where I would realize this feature was missing.


You can do this in VS Code as well, even have it side by side. Am I missing something?


I just opened IntelliJ and it has this feature too, where you can "Split right" or "Split down" and have multiple views of the same file. Thanks for letting me know this is something editors might support.

For some reason I have not noticed that feature before. It's not like it is hidden either, as it is in the "right click" context menu that I use daily. I guess I need to learn the tool, so that I don't miss useful features like this.


For the "temporarily projects those targets as if they’re isolated files" part, you want something like narrow-indirect or edit-indirect.

https://www.emacswiki.org/emacs/NarrowIndirect

https://github.com/Fanael/edit-indirect


> The changes themselves might be small, but it’s a huge cognitive burden to alternate/iterate through them. I’d love to have a view which temporarily projects those targets as if they’re isolated files without changing the actual structure on disk.

This is exactly how vim buffers (for instance in a split) work.


I’m glad to see I’ve brought the emacs/vim people together for a common purpose!


This is what :[v]split is for...


While I'm not sure that was a consideration here, sometimes C compilers produce better machine code when they have access to more function definitions. E.g., SQLite recommends that embedders use the single ~10MB sqlite3.c amalgamation file[1] for both ease of use and performance reasons.

[1]: https://www.sqlite.org/amalgamation.html


Seems like something the compiler should take care of.


It can when you use LTO, but that tends to be very slow for large programs.
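For reference, with GCC or Clang it's a single flag at both compile and link time (a minimal sketch):

    # Compile each unit with LTO information, then link with LTO enabled
    gcc -O2 -flto -c a.c
    gcc -O2 -flto -c b.c
    gcc -O2 -flto a.o b.o -o app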


It is slow. Our (C++) project's MSVC release build ends with a glorious 2-minute run of link.exe with LTO (/LTCG /O2) and aggressive inlining (/Ob3) enabled.

Limiting translation unit size in C++ helps with faster edit/compile/run cycles as well, which doesn't seem to be a concern for C codebases in the 21st century.


You hit most of the same issues (and some additional ones) when compiling the same code within one translation unit...


Of those 5000 lines, it looks like half are comments or whitespace, and none of the functions look longer than a couple hundred lines or deeper than a few levels of control flow. Pretty harsh nitpick.


The point of calling it a nitpick was to indicate precisely that my comment wasn't meant to be taken harshly.


But nitpick still means it's a nit, and I don't really think it is one at all. And you're saying it in response to someone who said the code is very clean, which is quite petty.


> nitpick: engage in fussy or pedantic fault-finding.

Yes, I'm fully aware that the comment is petty. The point is to communicate that I don't disagree that the code is clean and well written.


A more official announcement seems to be at https://www.postgresql.org/message-id/165473835807.573551.15...



It’s impressive that this is a single revert. That speaks to how atomically Postgres development is done.

Also not surprised to see it was the EDB team with the expertise to fix it. Their model is a little outdated but they have a lot of experts working there.


I'm not sure who you're referring to as the EDB team. Please share what makes you think that.

The commit [1] (dug up by someone else in this discussion) references 4 people (including the committer), and only one of them seems to be an EDB employee. (Info gleaned from their respective LinkedIn profiles, and from email signatures sent to the pgsql-hackers list.)

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...


All of those mentioned in the commit helped in the diagnosis, while Andres was the one to detail the full chain of events describing how the corruption happened, in [0].

[0] https://www.postgresql.org/message-id/20220524190133.j6ee7zh...


I'm running 12.11. Do I need to do anything?


You are running an up-to-date version, which seems to have no known critical bugs.


an upgrade


The folks on 14 probably wish they held off for a bit. :-D


pg_amcheck can be helpful for validation if it's a B-tree index.
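e.g., something like this (database and index names are placeholders):

    # --heapallindexed verifies that every heap tuple appears in the index
    pg_amcheck --heapallindexed --index=idx_foo mydb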


FWIW, amcheck can be useful but I am not 100% sure that it can detect all the broken cases either.


Awesome update!


I love Postgres. But when I look at the bug list of every release it makes me scared. The types of bugs they have are indicative of a poor development process.


I worked on Oracle and MySQL for ages and saw tons of bugs. As a support company, the number of bugs we encounter with Postgres is zero, or hardly one a year. I am still curious to see which bugs made you so scared!


Yeah, ORA-00600 comes to mind. At my previous company we had to restore from backups once because of this error.


ORA-600 just means an Oracle process crashed while executing a query. Could be a bug in the query optimizer or a dead disk controller. Just restoring a backup without understanding the root cause seems like setting up the next occurrence...


No, it's indicative of the complexity of the system.

Please show another similarly complex system with fewer bugs: that basically doesn't happen.


Yep. We all have dirty laundry; it's just that we can see it in open-source projects.


And at least we see it. It's much easier to appreciate the threat when you can see the submerged part of the iceberg :-)


Don't look at MariaDB or MySQL changelog or issue tracker then. You'd get a heart attack.


Can you name a few such bugs please?


Surprising how people feel the need to respond to a post by someone with the handle "TedShiller".

Is that a real name?

Sorry if it is and I hurt your feelings.


What types of bugs are those? Can you explain a bit more?



