I really like the look of Vortex[1]! One of my industry pet peeves is all the useless UTF-8 bytes in server logs. I'd like to log data in a sane, schemaful, binary format, and this looks like it could be a good way to do that. Bonus points if we can wire this up as a physical layer for e.g. DataFusion[2] so I can analyze my logs with the dataframe abstraction.
I've worked on a database that considered FSST as a way to query over compressed log lines. We found that the compression ratio depended heavily on how repetitive the data was. In the end, segmenting by service (Apache, our Go stuff, our Rust stuff, etc.) yielded pretty good results, and log lines of ~200 bytes compressed pretty well.
We ended up not using it in production because the worst cases were absolutely terrible compared to our dumber skippable zstd.
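To put rough numbers on the best and worst cases, here is a toy sketch of FSST-style encoding. It is not the real FSST implementation and not what we ran: the symbol table is hand-picked rather than trained, and the names `encode`, `decode`, `friendly` and `hostile` are made up for illustration. It only assumes the basic FSST layout: one-byte codes, up to 255 symbols of 1 to 8 bytes each, and code 255 reserved as an escape for a raw byte.

```python
ESCAPE = 255        # reserved code: the next byte is a raw literal
MAX_SYMBOL_LEN = 8  # FSST symbols are 1 to 8 bytes long


def encode(data: bytes, symbols: list) -> bytes:
    """Greedy longest-match encoding against a fixed, hand-picked symbol table."""
    assert len(symbols) <= 255
    assert all(1 <= len(s) <= MAX_SYMBOL_LEN for s in symbols)
    code_of = {s: i for i, s in enumerate(symbols)}
    out = bytearray()
    pos = 0
    while pos < len(data):
        # Try the longest candidate first, then shrink.
        for length in range(min(MAX_SYMBOL_LEN, len(data) - pos), 0, -1):
            code = code_of.get(data[pos:pos + length])
            if code is not None:
                out.append(code)
                pos += length
                break
        else:
            # No symbol matches here: spend two bytes on one input byte.
            out.append(ESCAPE)
            out.append(data[pos])
            pos += 1
    return bytes(out)


def decode(encoded: bytes, symbols: list) -> bytes:
    """Inverse of encode: expand codes, copy escaped bytes through."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == ESCAPE:
            out.append(encoded[i + 1])
            i += 2
        else:
            out += symbols[encoded[i]]
            i += 1
    return bytes(out)


# Hypothetical table tuned to one service's access-log lines.
symbols = [b"GET /api", b"/v1/user", b" HTTP/1.", b"1 200 ", b"s/"]

friendly = b"GET /api/v1/users/ HTTP/1.1 200 "  # fully covered by the table
hostile = bytes(range(128, 160))                # nothing in the table matches

print(len(friendly), "->", len(encode(friendly, symbols)))
print(len(hostile), "->", len(encode(hostile, symbols)))
```

With this layout the ceiling is 8x (every code expands to an 8-byte symbol) and the floor is a 2x blow-up (every byte escaped), which is why a table trained on one service's logs can do badly on data it has never seen.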
EDIT: Question about FSST: let's say I build a strings table like:
Is there some optimal length for compressed given the 255 symbols limit?

[1] https://github.com/spiraldb/vortex
[2] https://github.com/apache/datafusion