-- Read the CSV directly through a foreign table instead of loading it first
CREATE EXTENSION file_fdw;
CREATE SERVER stations FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE records (
    station_name text,
    temperature float
) SERVER stations OPTIONS (filename 'path/to/file.csv', format 'csv', delimiter ';');
SELECT station_name, MIN(temperature) AS temp_min, AVG(temperature) AS temp_mean, MAX(temperature) AS temp_max
FROM records
GROUP BY station_name
ORDER BY station_name;
I just modified the original post to add the file_fdw example. Again, neither instance (PG or ClickHouse) was optimised for the workload: https://ftisiot.net/posts/1brows/
In the first one, the table is dropped, recreated, populated and queried (a sketch follows below).
In the second example, the table is created as a foreign table via the file FDW pointing at the CSV file.
In both examples the loading time is included in the total time.
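For reference, here is a minimal sketch of what that first approach could look like, assuming a plain table with the same name and columns and the same placeholder CSV path as above; the original post's exact statements may differ:

-- Hypothetical sketch of the first approach: drop, recreate, bulk-load, then query
DROP TABLE IF EXISTS records;
CREATE TABLE records (
    station_name text,
    temperature float
);
-- Bulk-load the CSV into the local table (placeholder path, ';' delimiter as above)
COPY records FROM 'path/to/file.csv' WITH (FORMAT csv, DELIMITER ';');
-- Then run the same aggregation as in the file_fdw example
SELECT station_name, MIN(temperature) AS temp_min, AVG(temperature) AS temp_mean, MAX(temperature) AS temp_max
FROM records
GROUP BY station_name
ORDER BY station_name;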
Using FDWs on a daily basis, I fully realize their power and appeal, but this exclamation made me pause and think: is this really how we think today? That reading a CSV file directly is a cool feature, state of the art? Sure, FDWs are much more than that, but I would assume we could achieve far more with machine learning, and not just the current wave of LLMs.
Why not have the machine consider the data it is currently seeing (types, even actual values), think about what end-to-end operation is required and how often it needs to be repeated, make a time estimate (then verify the estimate, adjust it for the next run if needed, and keep a history for future use), and choose one of the methods at its disposal (index autogeneration, conversion of raw data, denormalization, efficient allocation of the memory hierarchy, ...)? Yeah, I'm not focusing on this specific one billion rows challenge, but rather on what computers today should be able to do for us.