Is using flatbuffers as the on-disk storage format for an application a hare-bra...

maximilianburke · on Jan 17, 2023

I don't think it's hare-brained; I think it'd be great. No more hare-brained than storing stuff to disk in any other format like json or yaml.

That said, the ergonomics are absolutely awful for modifying existing objects: you generally can't modify an existing object in place, so you need to serialize a whole new object.

There's also a schemaless version (flexbuffers) which retains a number of the flatbuffers benefits (zero-copy access to data, compact binary representation), but is also a lot easier to use for ad-hoc serialization and deserialization; you can `loads`/`dumps` the flexbuffer objects, for example.

nerpderp82 · on Jan 17, 2023

> ctypes Python module to mmap a file as a C struct

Tell me more! Is your data larger than memory? You need persistence?

You might take a look at Aerospike, even on a single node, if you need low-latency persistence.

politician · on Jan 17, 2023

It really depends on what you're storing and how you need to access the content to meet performance requirements. A simple example of where flatbuffers shines is in TCP flows or in Kafka where each message is a flatbuffer. In Kafka, the message size and type can be included in metadata. In TCP, framing is your responsibility. Serializing a message queue to a flat file is reasonable and natural.

Regarding files-as-C-structs: That isn't (necessarily) harebrained (if you can trust the input); Microsoft Word .DOC files were essentially memory dumps. However, twiddling bits inside a flatbuffer isn't recommended per the documentation; rather, the guidance is to replace the entire buffer. If you don't want to manage the IO yourself, then a key/value store that maps indices to flatbuffers is entirely possible. I'd suggest a look at Redis.
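For anyone curious what the files-as-C-structs approach looks like in Python, here is a rough sketch using only `ctypes` and `mmap` from the standard library. The `Header` layout, its field names, and the `demo` helper are made up for illustration; this is plain struct mapping, not FlatBuffers, and it only makes sense when you trust the file contents.

```python
import ctypes
import mmap
import os
import tempfile


class Header(ctypes.LittleEndianStructure):
    # Hypothetical layout: 4 + 4 + 8 bytes, naturally aligned (16 bytes total).
    _fields_ = [
        ("magic", ctypes.c_uint32),
        ("version", ctypes.c_uint32),
        ("count", ctypes.c_uint64),
    ]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "state.bin")

    # Create a zero-filled file exactly the size of the struct.
    with open(path, "wb") as f:
        f.write(bytes(ctypes.sizeof(Header)))

    # Map the file and view it as a struct; field writes go to the mapping.
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)
        hdr = Header.from_buffer(mm)
        hdr.magic = 0x44454D4F
        hdr.version = 1
        hdr.count = 42
        del hdr      # drop the view so the mapping can be closed
        mm.flush()   # push changes to the file
        mm.close()

    # Re-read the file independently to confirm the values persisted.
    with open(path, "rb") as f:
        again = Header.from_buffer_copy(f.read())
    return again.magic, again.version, again.count
```

Note the `del hdr` before `mm.close()`: while a ctypes view created with `from_buffer` is alive, closing the mmap raises `BufferError`.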

chillacy · on Jan 18, 2023

Read-only is a good case; AFAIK one of the use cases of flatbuffers is that you can mmap a huge flatbuffer file and then randomly access the data quickly without paying a huge deserialization cost.
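A rough illustration of the mmap-and-seek access pattern, using fixed-size `struct` records as a stand-in for a real flatbuffer (the record layout and the names `RECORD`, `write_records`, and `read_record` are assumptions). Only the bytes of the requested record are decoded; the rest of the file is never parsed.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: uint32 id followed by float64 value (12 bytes).
RECORD = struct.Struct("<Id")


def write_records(path, records):
    """Write (id, value) pairs back to back as fixed-size records."""
    with open(path, "wb") as f:
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))


def read_record(path, index):
    """Map the file read-only and decode just the record at `index`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)
```

A real flatbuffer follows offsets stored in the buffer rather than computing `index * size`, but the cost model is similar: the OS pages in only the parts of the file you actually touch.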