Very cool, I stand corrected. I hope one day I have another opportunity to play ...

gricardo99 · on May 10, 2016

Do you really mean flat CSV text files? I get the simplicity of that, but it seems really expensive in both speed and size. Then again, I'm used to tables with more than a dozen columns, and with kdb+ you pull in only the columns of interest and the rows of interest (thanks to on-disk sorting and grouping), which is a smaller subset, often a much smaller one.
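To make the column/row argument concrete, here is a toy sketch in Python of a column-per-file layout with rows sorted by time. It is loosely inspired by the idea of a splayed table but is NOT kdb+'s on-disk format; the function names (`write_splayed`, `read_column`, `query`) and the float64-per-column encoding are assumptions for illustration. A query reads the sorted `time` column, binary-searches it for the row range, and then seeks into only the requested columns' files, so unrequested columns are never opened.

```python
import bisect
import os
import tempfile
from array import array

# Hypothetical toy layout (not kdb+'s real format): each column lives in its
# own file, <name>.bin, as raw float64 values; rows are sorted by "time".

def write_splayed(dirpath, columns):
    """Write each column (name -> list of floats) to dirpath/<name>.bin."""
    for name, values in columns.items():
        with open(os.path.join(dirpath, name + ".bin"), "wb") as f:
            array("d", values).tofile(f)

def read_column(dirpath, name, start=0, stop=None):
    """Read rows [start, stop) of one column, opening only that column's file."""
    path = os.path.join(dirpath, name + ".bin")
    itemsize = array("d").itemsize
    n_rows = os.path.getsize(path) // itemsize
    stop = n_rows if stop is None else stop
    out = array("d")
    with open(path, "rb") as f:
        f.seek(start * itemsize)        # jump straight to the first wanted row
        out.fromfile(f, stop - start)   # read just the wanted slice
    return out

def query(dirpath, t_lo, t_hi, cols):
    """Return `cols` for rows with t_lo <= time < t_hi.

    Because "time" is sorted, binary search gives the row range; the other
    requested columns are then read only over that range.
    """
    times = read_column(dirpath, "time")
    lo = bisect.bisect_left(times, t_lo)
    hi = bisect.bisect_left(times, t_hi)
    return {c: list(read_column(dirpath, c, lo, hi)) for c in cols}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Made-up sample values, just to exercise the functions.
        write_splayed(d, {
            "time":  [1.0, 2.0, 3.0, 4.0, 5.0],
            "price": [10.0, 10.5, 11.0, 10.8, 11.2],
            "size":  [100.0, 200.0, 150.0, 300.0, 50.0],
        })
        print(query(d, 2.0, 4.0, ["price"]))  # → {'price': [10.5, 11.0]}
```

In this run the `size` file is never opened and only two rows of `price` are read. With a flat CSV, every byte of every line would have to be parsed to answer the same query.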

yummyfajitas · on May 11, 2016

By count, most of my data sets are CSV. I could probably gain something by switching to HDF5, but a gzipped CSV is usually good enough and simpler. By volume (i.e. my 2 or 3 biggest data sets), I probably use mostly HDF5. I haven't tried Feather yet, but it looks pretty nice.

KDB would probably be better, but don't underestimate what you can do with just a bunch of files.
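As one illustration of the "bunch of files" approach, here is a minimal Python sketch using only the standard library, assuming a hypothetical layout of one gzipped CSV per day named `YYYY-MM-DD.csv.gz`. The names `write_day` and `load` are made up for this example. Choosing files by name gives coarse row pruning (similar in spirit to partitioning), but unlike a columnar store, `csv.DictReader` still parses every field of every line in the files it opens, even when you keep only a couple of columns.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical layout: one gzipped CSV per day, named YYYY-MM-DD.csv.gz.

def write_day(dirpath, day, rows, fieldnames):
    """Write one day's rows (list of dicts) to dirpath/<day>.csv.gz."""
    path = os.path.join(dirpath, f"{day}.csv.gz")
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

def load(dirpath, first_day, last_day, cols):
    """Yield dicts holding only `cols`, from files whose day is in [first_day, last_day]."""
    suffix = ".csv.gz"
    for name in sorted(os.listdir(dirpath)):
        if not name.endswith(suffix):
            continue
        day = name[: -len(suffix)]
        # ISO dates compare correctly as strings, so skip files outside the range
        # without opening them.
        if not (first_day <= day <= last_day):
            continue
        with gzip.open(os.path.join(dirpath, name), "rt", newline="") as f:
            for row in csv.DictReader(f):  # still parses every field of each line
                yield {c: row[c] for c in cols}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fields = ["sym", "price", "size"]
        # Made-up sample rows, just to exercise the functions.
        write_day(d, "2016-05-09", [{"sym": "A", "price": "10.0", "size": "100"}], fields)
        write_day(d, "2016-05-10", [{"sym": "B", "price": "20.0", "size": "200"}], fields)
        write_day(d, "2016-05-11", [{"sym": "C", "price": "30.0", "size": "300"}], fields)
        print(list(load(d, "2016-05-10", "2016-05-11", ["sym", "price"])))
        # → [{'sym': 'B', 'price': '20.0'}, {'sym': 'C', 'price': '30.0'}]
```

Note that values come back as strings, so any type conversion is left to the caller. That is part of the speed cost of CSV that a binary format like HDF5 avoids.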