
> But my post is a frustration with the sheer amount of tooling and knowledge required to get started - for example, the case where non-partitioned data is foisted upon non-experts.

How is that different from throwing a massive relational database at someone who doesn't know how to manage indexes and other optimizations?




Not OP, but I'd guess there's greater industry awareness of relational DBs than there is of Parquet files. I've been on the receiving end of a Parquet file that I didn't know how to crack open, and the ambiguity about how to proceed was frustrating.


This is true. There are two tools you need to know for this: DuckDB and VisiData. With these tools, Parquet is almost as easy as CSVs (but a few orders of magnitude more powerful and faster).

Parquet is also usable in Polars and pandas, and in Apache Spark too, but that's getting into complicated territory.
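
A minimal sketch of the pandas round trip, assuming pandas with a Parquet engine (pyarrow or fastparquet) installed; the file name and data are made up for illustration:

    import pandas as pd

    # Tiny hypothetical dataset
    df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [3.5, 22.1]})

    # Write to Parquet (pandas delegates to pyarrow/fastparquet)
    df.to_parquet("weather.parquet")

    # Read it back; schema and dtypes come along for free
    back = pd.read_parquet("weather.parquet")
    print(back.shape)  # (2, 2)

Polars is nearly identical: `pl.read_parquet("weather.parquet")`.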

In DuckDB it's literally just

   SELECT * FROM 's3://bucket/*.parquet'
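
A slightly fuller sketch of the same idea, assuming the `duckdb` Python package and a local file (querying `s3://` URLs additionally requires DuckDB's httpfs extension and credentials); the file name is made up:

    import duckdb

    con = duckdb.connect()

    # Write a tiny hypothetical Parquet file so the query below has input
    con.execute("COPY (SELECT 1 AS id, 'a' AS tag) TO 'tiny.parquet' (FORMAT PARQUET)")

    # Query the Parquet file directly -- no import step, no schema declaration
    n = con.execute("SELECT count(*) FROM 'tiny.parquet'").fetchone()[0]
    print(n)  # 1

The same `SELECT` works unchanged against a glob of files or an S3 prefix once httpfs is loaded.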


Relational database technology is more highly proven, stable, documented, and consistent than the constellation of big data solutions. Learning about indexes etc. 20 years ago would still help you today. Learning about this year's big data stack may not even help you next year.





