One challenge with writes in the middle is that they change the file hash. Cloud services typically expose the object hash, so changing any bit of a 1TB file would require a costly read of the whole object to compute the new hash.
You could split the file into smaller chunks and reassemble them at the application layer. That way you limit the cost of changing any byte to the chunk size.
That could also support inserting or removing a byte: you'd end up with a chunk of DEFAULT_CHUNK_SIZE+1 (or -1), and you'd split or merge chunks when they get too large or too small.
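A minimal sketch of that bookkeeping, assuming variable-sized chunks stored as byte strings; the name DEFAULT_CHUNK_SIZE and the half/double split-merge thresholds are just illustrative choices, not anything a real service prescribes:

```python
DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB default

def affected_chunks(offset, length, chunk_sizes):
    """Indices of chunks overlapping the byte range [offset, offset+length).
    Only these chunks need to be re-hashed and re-uploaded after a write."""
    hits, pos = [], 0
    for i, size in enumerate(chunk_sizes):
        if pos < offset + length and pos + size > offset:
            hits.append(i)
        pos += size
    return hits

def rebalance(chunks, target=DEFAULT_CHUNK_SIZE):
    """Merge chunks under target/2 into their neighbor, split chunks over
    target*2 back toward the target size."""
    min_size, max_size = target // 2, target * 2
    out = []
    for c in chunks:
        if out and len(c) < min_size:
            out[-1] += c          # merge undersized chunk into its neighbor
        else:
            out.append(c)
    final = []
    for c in out:
        while len(c) > max_size:  # split oversized chunks
            final.append(c[:target])
            c = c[target:]
        final.append(c)
    return final
```

Inserting a byte then only touches one chunk: grow it by one, re-hash and re-upload it, and call `rebalance` if it crossed the size threshold.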
Of course at some point if you are using a file metaphor you want a real file system.
Doesn't this mean that most programs you might want to use with the FUSE API won't actually work? They'll do fine for a while, until they try to seek, and then they'll get an error?
Or is there a large group of programs that only ever write sequentially?
I'd think non-appending writes are quite rare in practice, other than databases. Even when the application is logically overwriting data, in other kinds of programs it's almost always implemented as writing to a new file + an atomic rename, not in-place modification.
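For reference, that new-file-plus-atomic-rename pattern is a few lines in Python; `os.replace` maps to the atomic rename on both POSIX and Windows, so readers see either the old contents or the new, never a mix:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write a full replacement file, then swap it in with one atomic rename."""
    dirname = os.path.dirname(path) or "."
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure data is durable before the swap
        os.replace(tmp, path)     # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```

Notably, this pattern never does an in-place write at all, which is why a sequential-write-only backend can still support these programs.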
Most programs either write a full file every time and replace the old one with a single move, or append to an existing file. Writing in the middle could happen in a program writing to some kind of archive or disk image. There is probably a whole group of programs that do this I'm not familiar with, but I'm pretty sure of my first sentence.
I'm not completely confident (I tried looking in the source and it wasn't immediately obvious) but I think emacs does small in-place edits when you're working with very large files.
well yeah, but there's a lot FUSE makes easier. no need to implement a client library, no need to write some custom wrapper or rsync job to sync files to the bucket or bucket to the local system, etc. it won't work for every app, but for the ones it does support it saves a ton of extra work and maintenance.
This seems like a big limitation?