Is using flatbuffers as the on-disk storage format for an application a hare-brained idea?
If yes, is it a less hare-brained idea than using the ctypes Python module to mmap a file as a C struct? That's what I'm currently doing to get 10x speedup relative to SQLite for an application bottlenecked on disk bandwidth, but it's unergonomic to say the least.
Flatbuffers look like a way to get the same performance with better ergonomics, but maybe there's a catch. (E.g. I thought the same thing about Apache Arrow before, but then I realized it's basically read-only. I don't expect to need to resize my tables often, but I do need to be able to twiddle individual values inside the file.)
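For concreteness, the ctypes-plus-mmap approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual application code: the `Record` layout, the field names, and `RECORD_COUNT` are all hypothetical, and the fields are chosen to be naturally aligned so no packing directive is needed.

```python
import ctypes
import mmap


class Record(ctypes.Structure):
    # Hypothetical fixed-size record; the field names are illustrative only.
    _fields_ = [
        ("id", ctypes.c_uint64),
        ("score", ctypes.c_double),
    ]


RECORD_COUNT = 4  # assumed table size for the sketch
Table = Record * RECORD_COUNT


def create_table(path):
    """Create a zero-filled file large enough to hold the whole table."""
    with open(path, "wb") as f:
        f.write(bytes(ctypes.sizeof(Table)))


def set_score(path, index, score):
    """Overwrite one record in place through a writable memory map."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), ctypes.sizeof(Table)) as mm:
            table = Table.from_buffer(mm)  # shares memory with the file
            table[index].id = index
            table[index].score = score
            del table  # drop the buffer export so the mmap can close
            mm.flush()


def get_score(path, index):
    """Read one record by copying just its bytes out of a read-only map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            rec = Record.from_buffer_copy(mm, index * ctypes.sizeof(Record))
            return rec.id, rec.score
```

Only the bytes of the touched record are written, which is the "twiddle individual values" property in question; the cost is that the struct layout effectively becomes the file format.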
I don't think it's hare-brained; I think it'd be great. It's no more hare-brained than storing stuff to disk in any other format like JSON or YAML.
That said, the ergonomics are absolutely awful for modifying existing objects: in general you can't change an existing object in place, so you need to serialize a whole new object.
There's also a schemaless version (flexbuffers) which retains a number of the flatbuffers benefits (zero-copy access to data, compact binary representation), but is also a lot easier to use for ad-hoc serialization and deserialization; you can `loads`/`dumps` the flexbuffer objects, for example.
It really depends on what you're storing and how you need to access the content to meet performance requirements. A simple example of where flatbuffers shines is in TCP flows or in Kafka where each message is a flatbuffer. In Kafka, the message size and type can be included in metadata. In TCP, framing is your responsibility. Serializing a message queue to a flat file is reasonable and natural.
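Since framing is left to you over TCP and in flat files, one common option is a length prefix in front of each message. The sketch below assumes a 4-byte little-endian prefix and treats each payload as opaque bytes (in practice each payload would be a finished flatbuffer); `write_frame` and `read_frames` are hypothetical helper names, not part of any library.

```python
import struct

# Assumed framing: a 4-byte little-endian unsigned length before each payload.
HEADER = struct.Struct("<I")


def write_frame(stream, payload: bytes) -> None:
    """Append one length-prefixed message to a binary file-like stream."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_frames(stream):
    """Yield each payload in order until a clean end of stream."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream between frames
        if len(header) < HEADER.size:
            raise EOFError("truncated length prefix")
        (length,) = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated payload")
        yield payload
```

This works as written for regular files and in-memory streams. A raw socket can return fewer bytes than requested, so a network reader would need to loop until the full header and payload have arrived.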
Regarding files-as-C-structs: that isn't necessarily hare-brained if you can trust the input; older Microsoft Word .DOC files were essentially memory dumps. However, twiddling bits inside a flatbuffer isn't recommended per the documentation; rather, the guidance is to replace the entire buffer. If you don't want to manage the IO yourself, then a key/value store that maps indices to flatbuffers is entirely possible. I'd suggest a look at Redis.
Read-only is a good case. AFAIK one of the use cases of flatbuffers is that you can mmap a huge flatbuffer file and then randomly access the data quickly without paying a huge deserialization cost.
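The access pattern behind that use case can be illustrated without FlatBuffers itself. The sketch below is a stand-in that uses fixed-size records and the standard `struct` module; the record layout and the helper names are assumptions made for the example. The point is that a lookup decodes only the bytes of the record it touches, and the OS pages in just that part of the file.

```python
import mmap
import struct

# Hypothetical record layout: uint64 key followed by float64 value.
RECORD = struct.Struct("<Qd")


def write_records(path, records):
    """Write (key, value) pairs back to back as fixed-size records."""
    with open(path, "wb") as f:
        for key, value in records:
            f.write(RECORD.pack(key, value))


def read_record(path, index):
    """Decode a single record straight from a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only RECORD.size bytes at this offset are read and decoded.
            return RECORD.unpack_from(mm, index * RECORD.size)
```

A real flatbuffer adds offset tables so that variable-size and optional fields can be reached the same way, but the cost model is similar: no up-front parse of the whole file.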