
From reading your vadosware.io notes, I'm intrigued that replacing fdatasync with fsync is supposed to make a difference to durability at the device level. Both functions are supposed to issue a FLUSH to the underlying device, after writing enough metadata that the file contents can be read back later.

If fsync works and fdatasync does not, that strongly suggests a kernel or filesystem bug in the implementation of fdatasync that should be fixed.
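For anyone who wants to compare the two calls in isolation, here is a minimal sketch in Python. The `durable_write` helper and the temp-file names are made up for illustration, and `os.fdatasync` only exists on platforms that provide it (Linux does), so the sketch falls back to `os.fsync` elsewhere. It only shows where each call sits in the write path; it cannot by itself show what the device does with the FLUSH.

```python
import os
import tempfile

def durable_write(path, data, sync=os.fsync):
    """Hypothetical helper: write all of `data` to `path`, then call `sync(fd)`."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file to stable storage before returning.
        sync(fd)
    finally:
        os.close(fd)

# Assumption: fall back to fsync where fdatasync is not available.
fdatasync = getattr(os, "fdatasync", os.fsync)

with tempfile.TemporaryDirectory() as tmpdir:
    fsync_path = os.path.join(tmpdir, "fsync.bin")
    fdatasync_path = os.path.join(tmpdir, "fdatasync.bin")
    durable_write(fsync_path, b"x" * 8192, sync=os.fsync)
    durable_write(fdatasync_path, b"x" * 8192, sync=fdatasync)
    sizes = (os.path.getsize(fsync_path), os.path.getsize(fdatasync_path))
```

Swapping the `sync` argument is the only difference between the two runs, which is roughly the kind of change being discussed here.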

That said, I looked at the logs you showed, and those "Bad Address" errors are the EFAULT error, which only occurs in buggy software, or some issue with memory-mapping. I don't think you can conclude that NVMe writes are going missing when the pg software is having EFAULTs, even if turning off the NVMe write cache makes those errors go away. It seems likely that that's just changing the timing of whatever is triggering the EFAULTs in pgbench.
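To illustrate where EFAULT comes from, here is a rough sketch, assuming Linux with glibc reachable via `ctypes.CDLL(None)`. It deliberately hands `write(2)` a pointer that is not mapped in the process (address 1 is an arbitrary choice), which is the kind of bug that produces "Bad address". The function name is made up, and this is not a claim about what pg or pgbench actually does.

```python
import ctypes
import errno
import os
import tempfile

# Assumption: Linux, with the C library already loaded into the process.
libc = ctypes.CDLL(None, use_errno=True)
libc.write.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.write.restype = ctypes.c_ssize_t

def write_from_bad_pointer():
    """Call write(2) with an unmapped buffer address; return (ret, errno)."""
    with tempfile.TemporaryFile() as f:
        # Address 1 is not mapped, so the kernel cannot copy the buffer in.
        ret = libc.write(f.fileno(), ctypes.c_void_p(1), 16)
        return ret, ctypes.get_errno()

ret, err = write_from_bad_pointer()
print(ret, errno.errorcode.get(err), os.strerror(err))
```

The error is raised when the kernel tries to read the caller's buffer, before any request reaches the block device, which is why I would be wary of reading it as evidence about the NVMe drive.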




> From reading your vadosware.io notes, I'm intrigued that replacing fdatasync with fsync is supposed to make a difference to durability at the device level. Both functions are supposed to issue a FLUSH to the underlying device, after writing enough metadata that the file contents can be read back later.

Yeah, I thought the same initially, which is why I was super confused.

> If fsync works and fdatasync does not, that strongly suggests a kernel or filesystem bug in the implementation of fdatasync that should be fixed.

Gulp.

> That said, I looked at the logs you showed, and those "Bad Address" errors are the EFAULT error, which only occurs in buggy software, or some issue with memory-mapping. I don't think you can conclude that NVMe writes are going missing when the pg software is having EFAULTs, even if turning off the NVMe write cache makes those errors go away. It seems likely that that's just changing the timing of whatever is triggering the EFAULTs in pgbench.

It looks like I'm going to have to do some more experimentation on this -- maybe I'll get a fresh machine and try to reproduce this issue again.

What led me to suspect the NVMe drive of dropping writes was the complete lack of errors on the pg and OS side (dmesg, etc.).



