*Cloud Storage FUSE does not support overwriting in the middle of a file. Only s...

8organicbits · on May 2, 2023

One challenge with writes in the middle is that it changes the file hash. Cloud services typically expose the object hash, so changing any bit of a 1TB file would require a costly read of the whole object to compute the new hash.
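To make the cost concrete, here is a rough sketch that streams an object through a whole-object digest. MD5 is only an assumption here, standing in for whatever digest a given service exposes. A digest like this has no cheap way to fold in one changed byte, so the sketch simply re-hashes everything and counts the bytes it had to read.

```python
import hashlib
import io


def object_hash(stream, block_size=1 << 20):
    """Hash a whole object by streaming it; returns (hex digest, bytes read)."""
    h = hashlib.md5()  # stand-in for the service's object digest (assumption)
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        h.update(block)
        total += len(block)
    return h.hexdigest(), total


data = bytearray(b"x" * 4_000_000)
before, _ = object_hash(io.BytesIO(bytes(data)))
data[2_000_000] = ord("y")  # change a single byte in the middle
after, read = object_hash(io.BytesIO(bytes(data)))
print(before != after, read)  # True 4000000
```

One changed byte means all 4,000,000 bytes get read again. Scale that to 1TB and you have the problem described above.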

You could split the file into smaller chunks and reassemble at the application layer. That way you limit the cost of changing any byte to the chunk size.
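A minimal sketch of that idea, assuming a hypothetical app-level `ChunkedFile` that stores fixed-size chunks, each with its own SHA-256. An in-place overwrite only rewrites and rehashes the chunks it touches. The class, `CHUNK_SIZE`, and the tiny 4-byte chunks are all illustrative, not any service's API.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so the effect is easy to see


class ChunkedFile:
    """Hypothetical app-level file stored as independently hashed chunks."""

    def __init__(self, data: bytes, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        self.hashes = [hashlib.sha256(c).hexdigest() for c in self.chunks]
        self.bytes_rehashed = 0  # how much data overwrite() had to re-read

    def size(self) -> int:
        return sum(len(c) for c in self.chunks)

    def overwrite(self, offset: int, new: bytes) -> None:
        """Overwrite bytes in place; only the chunks touched are rewritten and rehashed."""
        end = offset + len(new)
        if offset < 0 or end > self.size():
            raise ValueError("overwrite must stay within the existing file")
        if not new:
            return
        first = offset // self.chunk_size
        last = (end - 1) // self.chunk_size
        for idx in range(first, last + 1):
            start = idx * self.chunk_size
            chunk = bytearray(self.chunks[idx])
            lo = max(offset, start)
            hi = min(end, start + len(chunk))
            chunk[lo - start:hi - start] = new[lo - offset:hi - offset]
            self.chunks[idx] = bytes(chunk)
            self.hashes[idx] = hashlib.sha256(self.chunks[idx]).hexdigest()
            self.bytes_rehashed += len(chunk)

    def read_all(self) -> bytes:
        return b"".join(self.chunks)


f = ChunkedFile(b"hello world, object storage!")  # 28 bytes -> 7 chunks
f.overwrite(6, b"W")
print(f.read_all())      # b'hello World, object storage!'
print(f.bytes_rehashed)  # 4: one chunk, not the whole file
```

The trade-off is the chunk size: smaller chunks make edits cheaper but mean more objects and more per-chunk metadata to track.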

That could also support inserting or removing a byte. You'd have a new chunk of DEFAULT_CHUNK_SIZE+1 (or -1). Split and merge chunks when they get too large or too small.
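A rough sketch of insert/delete with split and merge, assuming variable-size chunks kept in a plain list. `ChunkList`, `MAX_CHUNK`, `MIN_CHUNK`, and the split-in-half / merge-with-neighbour policy are illustrative choices, not something from the comment. A real version would also persist and rehash only the chunks that changed.

```python
DEFAULT_CHUNK_SIZE = 8               # hypothetical; small so splits/merges are visible
MAX_CHUNK = 2 * DEFAULT_CHUNK_SIZE   # split above this
MIN_CHUNK = DEFAULT_CHUNK_SIZE // 2  # merge below this


class ChunkList:
    """Hypothetical variable-size chunk list supporting byte insert/delete."""

    def __init__(self, data: bytes):
        self.chunks = [data[i:i + DEFAULT_CHUNK_SIZE]
                       for i in range(0, len(data), DEFAULT_CHUNK_SIZE)] or [b""]

    def _locate(self, offset: int, allow_end: bool = False):
        """Map a byte offset to (chunk index, offset within that chunk)."""
        if offset < 0:
            raise IndexError("offset out of range")
        for idx, chunk in enumerate(self.chunks):
            if offset < len(chunk):
                return idx, offset
            offset -= len(chunk)
        if allow_end and offset == 0:  # inserting at the very end of the file
            return len(self.chunks) - 1, len(self.chunks[-1])
        raise IndexError("offset out of range")

    def insert(self, offset: int, data: bytes) -> None:
        idx, pos = self._locate(offset, allow_end=True)
        chunk = self.chunks[idx]
        self.chunks[idx] = chunk[:pos] + data + chunk[pos:]  # only this chunk changes
        self._rebalance(idx)

    def delete(self, offset: int) -> None:
        idx, pos = self._locate(offset)
        chunk = self.chunks[idx]
        self.chunks[idx] = chunk[:pos] + chunk[pos + 1:]
        self._rebalance(idx)

    def _rebalance(self, idx: int) -> None:
        chunk = self.chunks[idx]
        if len(chunk) > MAX_CHUNK:
            # Split in half; each half is >= DEFAULT_CHUNK_SIZE, so neither needs a merge.
            mid = len(chunk) // 2
            self.chunks[idx:idx + 1] = [chunk[:mid], chunk[mid:]]
            self._rebalance(idx + 1)  # right half first so idx stays valid
            self._rebalance(idx)
        elif len(chunk) < MIN_CHUNK and len(self.chunks) > 1:
            # Merge with the next chunk (or the previous one if this is the last).
            if idx + 1 < len(self.chunks):
                self.chunks[idx:idx + 2] = [chunk + self.chunks[idx + 1]]
            else:
                self.chunks[idx - 1:idx + 1] = [self.chunks[idx - 1] + chunk]
                idx -= 1
            self._rebalance(idx)  # the merged chunk may now be too large

    def read_all(self) -> bytes:
        return b"".join(self.chunks)


c = ChunkList(bytes(range(32)))      # four 8-byte chunks
c.insert(10, b"\xff")                # chunk 1 becomes DEFAULT_CHUNK_SIZE + 1
print([len(ch) for ch in c.chunks])  # [8, 9, 8, 8]
for _ in range(5):
    c.delete(0)                      # chunk 0 shrinks to 3 bytes, then merges
print([len(ch) for ch in c.chunks])  # [12, 8, 8]
```

Each edit touches one chunk, plus at most a neighbour when a split or merge kicks in, which is what keeps the cost bounded by the chunk size.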

Of course at some point if you are using a file metaphor you want a real file system.

throwawaaarrgh · on May 2, 2023

pretty standard limitation of object storage services iirc

jefftk · on May 2, 2023

Doesn't this mean that most programs you might want to use with the FUSE API won't actually work? They'll do fine for a while, until they try to seek, and then they'll get an error?

Or is there a large group of programs that only ever write sequentially?
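For reference, the kind of write the question is about looks like this: open an existing file for update, seek into the middle, and overwrite. That contrasts with an append, which always lands at the end. This sketch runs against a local temp file. It only illustrates the two call patterns and doesn't show where, or with what error, a FUSE mount with the limitation quoted above would reject the first one. The helper names are hypothetical.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Non-appending write: seek into an existing file and overwrite bytes."""
    with open(path, "r+b") as f:  # read/write without truncating
        f.seek(offset)
        f.write(data)


def append(path: str, data: bytes) -> None:
    """Sequential write: always lands at the end of the file."""
    with open(path, "ab") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "demo.bin")
    append(p, b"hello world")
    overwrite_in_place(p, 6, b"WORLD")  # the pattern the limitation is about
    with open(p, "rb") as f:
        result = f.read()

print(result)  # b'hello WORLD'
```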

jsnell · on May 2, 2023

I'd think non-appending writes are quite rare in practice, other than databases. Even when the application is logically overwriting data, in other kinds of programs it's almost always implemented as writing to a new file + an atomic rename, not in-place modification.
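The "new file + atomic rename" pattern, sketched with the standard library. `atomic_write` is a hypothetical helper name. `os.replace` is atomic on POSIX when source and destination are on the same filesystem. Whether a rename on a FUSE-mounted bucket is atomic depends on the backend, so don't assume it is.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace path's contents by writing a new file and renaming it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")  # same dir => same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are durable before the rename
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp)
        raise


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "config.json")
    atomic_write(p, b'{"v": 1}')
    atomic_write(p, b'{"v": 2}')  # logically an overwrite, physically a whole new file
    leftover = sorted(os.listdir(d))
    with open(p, "rb") as f:
        result = f.read()

print(result)    # b'{"v": 2}'
print(leftover)  # ['config.json']
```

Every write here is sequential into a fresh file, which is why this style of program never asks the filesystem for an in-place edit.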

dontlaugh · on May 2, 2023

Even databases tend to do sequential writes, whether to a WAL or LSM tree.
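A toy write-ahead log to show what "sequential" means here. Records are only ever appended, each framed with a 4-byte big-endian length. On replay, a torn final record is ignored. The framing and function names are hypothetical, not any particular database's format. Real engines also do checkpointing or compaction, which this sketch leaves out.

```python
import os
import struct
import tempfile


def wal_append(path: str, record: bytes) -> None:
    """Append one length-prefixed record; the file is only ever extended."""
    with open(path, "ab") as f:
        f.write(struct.pack(">I", len(record)) + record)
        f.flush()
        os.fsync(f.fileno())


def wal_replay(path: str) -> list:
    """Read records back in order, stopping at a truncated tail."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # end of log, or a torn header
            (n,) = struct.unpack(">I", header)
            body = f.read(n)
            if len(body) < n:
                break  # torn final record from a crash mid-append
            records.append(body)
    return records


with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "wal.log")
    wal_append(log, b"set a=1")
    wal_append(log, b"set b=2")
    records = wal_replay(log)

print(records)  # [b'set a=1', b'set b=2']
```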

hawski · on May 2, 2023

Most programs either write a full file every time and replace the old file with a single move, or append to an old file. Writing in the middle could happen in a program writing to some kind of archive or disk image. There is probably a whole group of programs that do this that I'm not familiar with, but I'm pretty sure of my first sentence.

jefftk · on May 2, 2023

I'm not completely confident (I tried looking in the source and it wasn't immediately obvious) but I think emacs does small in-place edits when you're working with very large files.

formerly_proven · on May 2, 2023

sqlite, dbm and similar, various zip libraries and the file formats built on those

throwawaaarrgh · on May 2, 2023

well yeah, but there's a lot of things FUSE makes easier. no need to implement a client library, no need to write some custom wrapper or rsync thing to sync files to the bucket or bucket to local system, etc. it won't work for every app but for the ones it does support it saves a ton of extra work and maintenance.

scoobydoobydrew · on May 2, 2023

This works; there is nothing stopping it. But, just like all cloud object storage, it will trigger a complete rewrite of the object when saved.