The only benefit this format provides is the ability to read some columns withou...

karaterobot · 2024-04-13T23:30:41 1713051041

What do you mean it's not seekable?

> ZIP files are a collection of individually compressed files, with a directory as a footer to the file, which makes it easy to seek to a specific file without reading the whole file... The nature of .zip files makes it possible to seek and read just the columns required without having to read/decode the other columns.
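As a rough illustration of the quoted point, here is a minimal Python sketch using the standard `zipfile` module. The one-member-per-column layout, the newline-separated values, and the names `write_columns`, `read_column`, `id`, and `city` are assumptions made for the example, not the actual ZSV layout.

```python
import io
import zipfile


def write_columns(buf, columns):
    """Write each column as its own compressed member (hypothetical layout)."""
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, values in columns.items():
            zf.writestr(name, "\n".join(values))


def read_column(buf, name):
    """Read one column; the central directory locates the member directly."""
    with zipfile.ZipFile(buf) as zf:
        # Only the requested member is decompressed; the others are untouched.
        return zf.read(name).decode("utf-8").split("\n")


buf = io.BytesIO()
write_columns(buf, {"id": ["1", "2", "3"], "city": ["Oslo", "Lima", "Pune"]})
cities = read_column(buf, "city")
```

Reading `city` here never decodes `id`, which is the column-level seekability the quote describes. It does not help with seeking to a particular row inside a column.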

thehappypm · 2024-04-13T23:38:32 1713051512

Seeking within a column

hafthor · 2024-04-14T15:52:53 1713109973

There are two ways to limit the number of column-rows you have to read. One is file partitioning, that is, having many ZSV files rather than one giant one, ideally organized by partitioning key field(s). The other is described as an extension to the format itself, and it functions much like row groups do in Parquet. https://github.com/Hafthor/zsvutil?tab=readme-ov-file#row-gr...

Thanks for taking a look.

pbnjay · 2024-04-14T00:14:48 1713053688

Wouldn’t be too hard to add a secondary “file” in the zip with an extra index.
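A minimal sketch of what that could look like, in Python. Everything here is an assumption for illustration and not part of the ZSV spec: the `index.json` member name, the `STRIDE` sampling interval, newline-separated values, and storing columns uncompressed (`ZIP_STORED`) so that a byte offset maps directly onto the member's data.

```python
import io
import json
import zipfile

STRIDE = 2  # hypothetical: record a byte offset for every 2nd row


def write_with_index(buf, columns):
    """Write columns as stored members plus an index of sampled row offsets."""
    index = {}
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, values in columns.items():
            offsets, pos, chunks = [], 0, []
            for i, value in enumerate(values):
                if i % STRIDE == 0:
                    offsets.append(pos)  # byte offset where row i starts
                line = (value + "\n").encode("utf-8")
                chunks.append(line)
                pos += len(line)
            zf.writestr(name, b"".join(chunks))
            index[name] = offsets
        zf.writestr("index.json", json.dumps(index))  # the secondary "file"


def read_value(buf, column, row):
    """Jump to the nearest indexed row at or before `row`, then scan forward."""
    with zipfile.ZipFile(buf) as zf:
        offsets = json.loads(zf.read("index.json"))[column]
        with zf.open(column) as member:
            member.seek(offsets[row // STRIDE])
            for _ in range(row % STRIDE):
                member.readline()  # skip rows between the index point and target
            return member.readline().decode("utf-8").rstrip("\n")


buf = io.BytesIO()
write_with_index(buf, {"city": ["Oslo", "Lima", "Pune", "Kyiv", "Doha"]})
value = read_value(buf, "city", 3)
```

The trade-off is that offsets like these only save work when the member is stored uncompressed. With deflate, seeking inside a member still means decompressing from the start, which is presumably why the row-group approach splits the data into separately compressed pieces instead.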