Yep this would be fantastic. The ironic thing about it is that a fine-grained wr...

hyc_symas · on April 3, 2023

Write barriers have been discussed for years and years. Personally I prefer grouped writes. https://www.spinics.net/lists/linux-fsdevel/msg70047.html

The underlying SCSI and SATA protocols would support this with command queueing. I would have envisioned using an fcntl() to set the current group ID on an fd.

josephg · on April 3, 2023

Grouped writes seem more or less equivalent to write barriers in terms of their semantics, but probably with 1 less syscall - which is nice. But I'd take any of these approaches over years of discussion & no solutions.
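To make the equivalence concrete, here is a toy model in Python. It is only a sketch of the semantics, not a kernel or filesystem API: the function names `barriers_to_groups` and `is_valid_order` are made up for illustration, and writes are represented by unique string labels. It assumes a barrier means "everything before it must complete before anything after it", and that writes sharing a group ID may be reordered freely.

```python
def barriers_to_groups(ops):
    """Translate a barrier-style op list into (group_id, label) pairs.

    `ops` is a list of ("write", label) and ("barrier", None) tuples.
    Every barrier simply starts a new group, so the two models carry
    the same ordering information.
    """
    group = 0
    grouped = []
    for kind, label in ops:
        if kind == "barrier":
            group += 1
        else:
            grouped.append((group, label))
    return grouped


def is_valid_order(grouped, completion_order):
    """Return True if a completion order respects the group ordering.

    Writes inside one group may complete in any order, but no write
    from a later group may complete before one from an earlier group.
    """
    group_of = {label: group for group, label in grouped}
    if sorted(completion_order) != sorted(group_of):
        return False  # every write must complete exactly once
    seen = [group_of[label] for label in completion_order]
    return seen == sorted(seen)


# Two data writes, a barrier, then a commit record.
ops = [("write", "a"), ("write", "b"), ("barrier", None), ("write", "commit")]
grouped = barriers_to_groups(ops)
```

In this model the device is free to complete `a` and `b` in either order, but `commit` has to come last; an order such as `a, commit, b` is rejected.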

I wonder what it would take to actually implement this stuff in Linux. A lot of work, sure - but far from impossible.

Thanks for lmdb btw! The design is lovely. I've used it in a couple of projects, and I read through your code for inspiration while I was designing a little storage engine for something I've been working on. It's delightful.

hyc_symas · on April 3, 2023

> Thanks for lmdb btw!

Thanks, that's great feedback! Happy to hear that you're both using it successfully and that you actually read the code and learned from it.

eru · on April 3, 2023

Yes, POSIX compatibility is the culprit for many of the issues we have with filesystems today.

The POSIX requirements were made in and for a different world.

For example they also completely break down for distributed network filesystems. (Or at least destroy your hope of performance.)

josephg · on April 3, 2023

POSIX doesn't stop Linux from introducing new, non-POSIX APIs. E.g., io_uring isn't part of POSIX but Linux has it anyway. As far as I can tell, an API for write barriers, grouped writes, transactional writes or iocp should be able to work alongside the POSIX API just fine. It's just another API for writing files which user programs can opt in to using.

eru · on April 3, 2023

Yes, though for most practical file systems you still get hit by essentially having to also provide a POSIX compatible API.

bheadmaster · on April 3, 2023

They break down performance for programs that expect a zero-latency filesystem and access it in a C-style sequential fashion. If we rewrote all UNIX utilities to be concurrent by default, the file API wouldn't make things that much worse.

transactional · on April 3, 2023

It’s interesting to me that in the original proposal for fixing fsync to actually be durable (https://lwn.net/Articles/270891/), there was thought given to the desire for a non-flushing write barrier (ctrl-f SYNC_FILE_RANGE_NO_FLUSH), but it appears that it never got implemented.
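For context, here is a minimal sketch of what applications typically do today in the absence of such a barrier: they call `fsync` between the two writes, which enforces the ordering but also forces a full flush. The helper names `write_all` and `write_then_commit` are hypothetical, and the sketch assumes a single append-only file with a payload followed by a commit record.

```python
import os


def write_all(fd, data):
    """Write every byte of `data`, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_then_commit(path, payload, commit_record):
    """Write the payload, then the commit record, in a guaranteed order.

    The first fsync is used purely as an ordering point: the commit
    record must not reach the disk before the payload. A non-flushing
    barrier would give the same ordering without waiting for the flush.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, payload)
        os.fsync(fd)  # ordering point, but also a full (slow) flush
        write_all(fd, commit_record)
        os.fsync(fd)  # durability for the commit record itself
    finally:
        os.close(fd)
    return len(payload) + len(commit_record)
```

Only the second `fsync` is really about durability; the first one is there because there is no cheaper way to express "this write must land before that one".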