
The only benefit this format provides is the ability to read some columns without needing to read all columns. Unfortunately it is not a seekable format. That's a pretty big miss.

It also wouldn't be that hard to make it seekable. All you would have to do is make each tsv file two columns: record-id, value.




What do you mean it's not seekable?

> ZIP files are a collection of individually compressed files, with a directory as a footer to the file, which makes it easy to seek to a specific file without reading the whole file... The nature of .zip files makes it possible to seek and read just the columns required without having to read/decode the other columns.
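A minimal sketch of what that quote describes, using Python's standard `zipfile` module. The one-member-per-column layout and the `<name>.tsv` member names are assumptions for illustration, not necessarily how ZSV names things.

```python
import io
import zipfile


def build_archive(columns):
    """Write each column as its own compressed member and return the archive bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, values in columns.items():
            # Hypothetical naming: one member per column, one value per line.
            zf.writestr(name + ".tsv", "\n".join(values) + "\n")
    return buf.getvalue()


def read_column(archive_bytes, name):
    """Read one column; the other members are located via the directory but never decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        with zf.open(name + ".tsv") as member:
            return member.read().decode("utf-8").splitlines()


archive = build_archive({
    "id": ["1", "2", "3"],
    "city": ["Oslo", "Lima", "Pune"],
})
cities = read_column(archive, "city")
```

This only shows column-level access: `read_column` still decompresses the whole column it asks for, which is the limitation the parent comment is pointing at.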


I mean seeking within a column.


There are two ways to limit the number of rows you have to read per column. One is file partitioning, that is, having many ZSV files rather than one giant one, ideally organized by partitioning key field(s). The other is mentioned as an extension to the format itself, which functions much like row groups do in Parquet. https://github.com/Hafthor/zsvutil?tab=readme-ov-file#row-gr...

Thanks for taking a look.


It wouldn’t be too hard to add a secondary “file” to the zip holding an extra index.
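A rough sketch of that idea, assuming a hypothetical `city.idx` member that stores the byte offset of every `STRIDE`-th row of `city.tsv` as JSON. None of these names or the index layout come from the ZSV spec; they are made up for illustration.

```python
import io
import json
import zipfile

STRIDE = 2  # hypothetical: record one byte offset every STRIDE rows


def build_indexed(values):
    """Write a column member plus a sidecar index member of byte offsets."""
    data = bytearray()
    offsets = []
    for i, value in enumerate(values):
        if i % STRIDE == 0:
            offsets.append(len(data))  # offset into the uncompressed column data
        data += value.encode("utf-8") + b"\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("city.tsv", bytes(data))
        zf.writestr("city.idx", json.dumps(offsets))
    return buf.getvalue()


def read_row(archive_bytes, row):
    """Use the index to jump near the requested row, then scan forward at most STRIDE - 1 lines."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        offsets = json.loads(zf.read("city.idx"))
        with zf.open("city.tsv") as member:
            member.seek(offsets[row // STRIDE])
            for _ in range(row % STRIDE):
                member.readline()  # skip rows between the indexed offset and the target
            return member.readline().decode("utf-8").rstrip("\n")


indexed = build_indexed(["Oslo", "Lima", "Pune", "Kyiv", "Rome"])
```

One caveat: on a deflated member, `zipfile`'s `seek` still decompresses forward from the start of the member, so an index like this saves line parsing but not decompression. To skip decompression as well, the column would need to be stored uncompressed or split into independently compressed chunks, which is roughly what row groups give you.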



