
Cloud Storage FUSE does not support overwriting in the middle of a file. Only sequential writes are supported.

This seems like a big limitation?




One challenge with writes in the middle is that they change the file hash. Cloud services typically expose the object hash, so changing any bit of a 1TB file would require a costly read of the whole object to compute the new hash.

You could split the file into smaller chunks and reassemble at the application layer. That way you limit the cost of changing any byte to the chunk size.

That could also support inserting or removing a byte. You'd have a new chunk of DEFAULT_CHUNK_SIZE+1 (or -1). Split and merge chunks when they get too large or small.

Of course at some point if you are using a file metaphor you want a real file system.
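
A rough sketch of the chunking idea, in case it helps. Everything here is made up for illustration: `ChunkedFile`, `CHUNK_SIZE`, and the in-memory list standing in for separate objects in a bucket. It only handles in-place overwrites with fixed-size chunks, not the insert/remove case with split and merge.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and tiny, just so the example is easy to follow


class ChunkedFile:
    """A logical file stored as fixed-size chunks, each standing in for one object."""

    def __init__(self, data: bytes, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        # One hash per chunk, so an edit never forces rehashing the whole file.
        self.hashes = [hashlib.sha256(c).hexdigest() for c in self.chunks]
        self.rewritten = 0  # total bytes "re-uploaded" by overwrites

    def size(self) -> int:
        return sum(len(c) for c in self.chunks)

    def read(self) -> bytes:
        return b"".join(self.chunks)

    def write_at(self, offset: int, payload: bytes) -> None:
        """Overwrite bytes in place; only the chunks that are touched get rewritten."""
        if offset < 0 or offset + len(payload) > self.size():
            raise ValueError("write must stay within the existing file")
        pos = 0
        while pos < len(payload):
            idx, start = divmod(offset + pos, self.chunk_size)
            n = min(self.chunk_size - start, len(payload) - pos)
            old = self.chunks[idx]
            new = old[:start] + payload[pos:pos + n] + old[start + n:]
            self.chunks[idx] = new
            self.hashes[idx] = hashlib.sha256(new).hexdigest()
            self.rewritten += len(new)
            pos += n


f = ChunkedFile(b"abcdefghij")
f.write_at(3, b"XY")  # straddles the first two chunks; the third is untouched
print(f.read(), f.rewritten)
```

The cost of an overwrite is bounded by the chunks it touches rather than by the file size, which is the whole point. The price is that the "file" no longer exists as one object with one hash.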


pretty standard limitation of object storage services iirc


Doesn't this mean that most programs you might want to use with the FUSE API won't actually work? They'll do fine for a while, until they try to seek, and then they'll get an error?

Or is there a large group of programs that only ever write sequentially?


I'd think non-appending writes are quite rare in practice, other than databases. Even when the application is logically overwriting data, in other kinds of programs it's almost always implemented as writing to a new file + an atomic rename, not in-place modification.
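
For anyone who hasn't seen the pattern, here is a minimal sketch of "write a new file + atomic rename". `atomic_write` is a name I made up. The atomicity claim is for a local POSIX filesystem; I'm not assuming a FUSE-mounted bucket gives the same guarantee.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` by writing a temp file and renaming it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)          # purely sequential write, no seeking
            tmp.flush()
            os.fsync(tmp.fileno())   # push the bytes to disk before the rename
        os.replace(tmp_path, path)   # readers see either the old file or the new one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that the only write pattern the storage layer ever sees here is a sequential write to a fresh file, which is exactly what the limitation allows.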


Even databases tend to do sequential writes, whether to a WAL or LSM tree.


Most programs either write a full file every time and replace the old file with a single move, or append to an old file. Writing in the middle could happen in a program writing to some kind of archive or disk image. There is probably a whole group of programs that do this that I'm not familiar with, but I'm pretty sure of my first sentence.


I'm not completely confident (I tried looking in the source and it wasn't immediately obvious) but I think emacs does small in-place edits when you're working with very large files.


sqlite, dbm and similar, various zip libraries and the file formats built on them


well yeah, but there's a lot of things FUSE makes easier. no need to implement a client library, no need to write some custom wrapper or rsync thing to sync files to the bucket or bucket to local system, etc. it won't work for every app but for the ones it does support it saves a ton of extra work and maintenance.


This works, there is nothing stopping it, but as with all cloud object storage it will trigger a complete rewrite of the object when saved.



