
Is using flatbuffers as the on-disk storage format for an application a hare-brained idea?

If yes, is it a less hare-brained idea than using the ctypes Python module to mmap a file as a C struct? That's what I'm currently doing to get a 10x speedup relative to SQLite for an application bottlenecked on disk bandwidth, but it's unergonomic, to say the least.
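For readers unfamiliar with the approach, here is a minimal sketch of mapping a file as an array of C structs with ctypes and mmap. The `Record` layout and the helper names `create_file`, `twiddle`, and `read_record` are hypothetical, made up for illustration; this is not the poster's actual code.

```python
import ctypes
import mmap


class Record(ctypes.Structure):
    # Hypothetical fixed-size record: two 8-byte fields, 16 bytes total.
    _fields_ = [
        ("key", ctypes.c_uint64),
        ("value", ctypes.c_double),
    ]


RECORD_SIZE = ctypes.sizeof(Record)


def create_file(path, count):
    """Create a zero-filled file with room for `count` records (count > 0)."""
    with open(path, "wb") as f:
        f.truncate(count * RECORD_SIZE)


def twiddle(path, index, key, value):
    """Overwrite one record in place through a writable mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            rec = Record.from_buffer(mm, index * RECORD_SIZE)
            rec.key = key
            rec.value = value
            # Drop the struct view before the mapping closes; otherwise
            # mmap refuses to close while exported buffers exist.
            del rec
            mm.flush()


def read_record(path, index):
    """Read one record through a read-only mapping (copies 16 bytes)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            rec = Record.from_buffer_copy(mm, index * RECORD_SIZE)
            return rec.key, rec.value
```

The layout depends on the platform's byte order and alignment, so a file written this way is only portable between machines that agree on both.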

Flatbuffers look like a way to get the same performance with better ergonomics, but maybe there's a catch. (E.g. I thought the same thing about Apache Arrow before, but then I realized it's basically read-only. I don't expect to need to resize my tables often, but I do need to be able to twiddle individual values inside the file.)



I don't think it's hare-brained; I think it'd be great. It's no more hare-brained than storing stuff on disk in any other format, like JSON or YAML.

That said, the ergonomics for modifying existing objects are absolutely awful: in general you can't change an existing object in place, so you need to serialize a whole new object.

There's also a schemaless version (flexbuffers) which retains a number of the flatbuffers benefits (zero-copy access to data, compact binary representation), but is also a lot easier to use for ad-hoc serialization and deserialization; you can `loads`/`dumps` the flexbuffer objects, for example.


> ctypes Python module to mmap a file as a C struct

Tell me more! Is your data larger than memory? You need persistence?

You might take a look at Aerospike, even on a single node, if you need low-latency persistence.


It really depends on what you're storing and how you need to access the content to meet performance requirements. A simple example of where flatbuffers shines is in TCP flows or in Kafka where each message is a flatbuffer. In Kafka, the message size and type can be included in metadata. In TCP, framing is your responsibility. Serializing a message queue to a flat file is reasonable and natural.
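To make "framing is your responsibility" concrete, here is a minimal sketch of length-prefixed framing, assuming a 4-byte little-endian length header in front of each serialized message. The header format and the names `frame` and `unframe` are assumptions for illustration; the same scheme works for appending messages to a flat file.

```python
import struct

# Assumed header: one unsigned 32-bit little-endian payload length.
HEADER = struct.Struct("<I")


def frame(payload: bytes) -> bytes:
    """Prefix a serialized message with its length."""
    return HEADER.pack(len(payload)) + payload


def unframe(stream: bytes):
    """Yield each complete payload in `stream`.

    Stops at the first incomplete message, which a real receiver
    would keep buffered until more bytes arrive.
    """
    offset = 0
    while offset + HEADER.size <= len(stream):
        (length,) = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(stream):
            break  # partial message: wait for more data
        yield stream[start:end]
        offset = end
```

Each yielded payload would then be handed to the flatbuffer accessors as-is, with no further parsing step.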

Regarding files-as-C-structs: that isn't (necessarily) hare-brained (if you can trust the input); Microsoft Word .DOC files were just memory dumps. However, twiddling bits inside a flatbuffer isn't recommended per the documentation; rather, the guidance is to replace the entire buffer. If you don't want to manage the IO yourself, then a key/value store that maps indices to flatbuffers is entirely possible. I'd suggest a look at Redis.


Read-only is a good case. AFAIK, one of the use cases for flatbuffers is that you can mmap a huge flatbuffer file and then randomly access the data quickly without paying a huge deserialization cost.
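As a rough illustration of that access pattern, the sketch below maps a file read-only and decodes a single record at a computed offset, so only the pages actually touched get read. It uses the standard `struct` module as a stand-in for flatbuffer accessors; the record layout and the names `write_records` and `read_record` are hypothetical.

```python
import mmap
import struct

# Hypothetical fixed-size record: uint64 key, float64 value (16 bytes).
RECORD = struct.Struct("<Qd")


def write_records(path, records):
    """Write (key, value) pairs back to back as fixed-size records."""
    with open(path, "wb") as f:
        for key, value in records:
            f.write(RECORD.pack(key, value))


def read_record(path, index):
    """Decode one record straight out of a read-only mapping.

    Nothing else in the file is parsed or copied.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)
```

A long-running reader would keep the mapping open across lookups instead of reopening the file each time.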



