I dream of a database that uses CSV (with metadata/indices in CSV comment lines) as its storage format. You could even use commented-out padding to align data to blocks, etc.

Imagine never having to dump or convert data, because you can always just open the file in OpenOffice.
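A minimal sketch of the reading side, assuming a hypothetical convention where "#"-prefixed lines carry the metadata/indices/padding:

    import csv
    import io

    def read_csv_db(path):
        """Split a hypothetical CSV 'database' file into '#' comment lines
        (metadata, indices, block padding) and ordinary data rows."""
        metadata, data = [], []
        with open(path, newline="") as f:
            for line in f:
                if line.startswith("#"):
                    metadata.append(line[1:].strip())
                else:
                    data.append(line)
        rows = list(csv.reader(io.StringIO("".join(data))))
        return metadata, rows

The file stays a plain CSV, so a spreadsheet or any text editor can still open it (though a spreadsheet will show the comment lines as ordinary rows).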




recfiles [1] are a sort of poor man's plaintext database that you can edit with any text editor. I found manually entering the data mildly annoying and repetitive, but visidata [2] supports the format, so I've been meaning to learn to use that for easier data entry and for tabular visualization.

[1] https://www.gnu.org/software/recutils/manual/recutils.html#I... [2] https://www.visidata.org/

Bonus: A nice concise intro that someone wrote last week: https://chrismanbrown.gitlab.io/28.html
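For reference, a recfile is just "Field: value" lines, with records separated by blank lines and comments starting with "#" (a made-up example):

    # books.rec
    %rec: Book

    Title: GNU Emacs Manual
    Author: Richard M. Stallman

    Title: The UNIX Programming Environment
    Author: Brian W. Kernighan
    Author: Rob Pike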


In PostgreSQL you can use the file_fdw extension:

https://www.postgresql.org/docs/current/file-fdw.html
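A sketch of the setup, run here through the psycopg2 driver (the server/table names and connection string are made up; the OPTIONS follow the file_fdw docs):

    import psycopg2  # assumes the psycopg2 driver is installed

    conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
    with conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS file_fdw")
        cur.execute("CREATE SERVER IF NOT EXISTS csv_files "
                    "FOREIGN DATA WRAPPER file_fdw")
        cur.execute("""
            CREATE FOREIGN TABLE IF NOT EXISTS people (
                id   integer,
                name text
            ) SERVER csv_files
            OPTIONS (filename '/var/data/people.csv',
                     format 'csv', header 'true')
        """)
        cur.execute("SELECT count(*) FROM people")  # reads go straight to the CSV
        print(cur.fetchone()[0])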


I think the parent comment was about reading and writing csv files. The documentation you linked says "Access to data files is currently read-only."


I missed that. You are right.


What I've done in the past is:

Column 1 is the creation timestamp

Column 2 is the modification timestamp

On read, if a row's creation timestamp has already been seen and its modification timestamp is later, you treat it as an update; otherwise you treat it as an insert. Your app(s) can do a simple file append for writes, and you even get the full version history at every point in time.
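A rough sketch of that logic (the column layout beyond the two timestamps, and sortable ISO-8601 timestamp strings, are my assumptions):

    import csv

    def load_current_state(path):
        """Replay an append-only CSV log. Column 1 (creation timestamp)
        doubles as the record key; for each key, the row with the
        latest modification timestamp (column 2) wins."""
        records = {}
        with open(path, newline="") as f:
            for created, modified, *fields in csv.reader(f):
                seen = records.get(created)
                if seen is None or modified > seen[1]:  # insert, or newer update
                    records[created] = [created, modified, *fields]
        return list(records.values())

    def append_record(path, created, modified, *fields):
        """Writes are a plain file append; history is never rewritten."""
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([created, modified, *fields])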

I did a lot of looking but didn't find a command-line tool that automates this process. It works fine for small projects of, say, 100,000 records. It wouldn't work well for something like a notes app, though, because you'd be storing every modification as a new entry.


That's interesting. I'm trying to imagine your workflow, and thinking about what serverless SQL platforms like Amazon Athena let you do now - i.e., you can more or less dump CSV files in blob storage and query them. Is that what you meant?


AWS Athena can do this. Dump your CSVs in S3 and query them in Athena.

Personally, I use SQLite for smaller datasets, with a CSV importer wrapped around it.
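A small importer along those lines (the table name, a header row, and all-text columns are my assumptions):

    import csv
    import sqlite3

    def import_csv(db_path, csv_path, table):
        """Load a CSV file with a header row into an SQLite table,
        treating every column as text for simplicity."""
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join(f'"{c}"' for c in header)
            marks = ", ".join("?" for _ in header)
            with sqlite3.connect(db_path) as con:
                con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
                con.executemany(
                    f'INSERT INTO "{table}" ({cols}) VALUES ({marks})',
                    reader,
                )

    import_csv("data.db", "people.csv", "people")

(The sqlite3 shell can also do this directly with .mode csv and .import.)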


Why dump CSVs when you can outright store in them?


Sorry, I don’t understand your question. But just in case this answers it:

S3 is cloud storage on AWS. Athena can work directly off the CSVs stored on S3.

Where I said “dump” it was just a colourful way of saying “transfer your files to…”. I appreciate “dump” can also mean different things with databases, so maybe that wasn't the best choice of word on my part. Sorry for any confusion there.


Dump is the proper term in this case, though, given S3's limitations (i.e., no appending to an existing object), which mean you need to either create a new file for each insert (very expensive) or rewrite the whole file for each update. So in practice it's a workable read-only replica fed by dumps of updates. For reasonably small datasets it can work; beyond that you're looking at rebalancing, partitioning, etc., and you're probably better off with Parquet, Avro, etc., since by then you're usually at the stage of introducing Spark or similar.


I'm not sure I share that dream...

CSV is so horribly non-standardized and painful to parse. JSON seems a much more suitable candidate.


PrestoDB. Or one of the many SaaS offerings like Athena, BigQuery, or U-SQL.



