
> But my post is a frustration with the sheer amount of tooling and knowledge required to get started - for example, the case where non-partitioned data is foisted upon non-experts.

How is that different from throwing a massive relational database at someone who doesn't know how to manage indexes and other optimizations?




Not OP, but I'd guess there's greater industry awareness of relational DBs than there is of Parquet files. I've been on the receiving end of a Parquet file that I didn't know how to crack open, and the ambiguity about how to proceed was frustrating.


This is true. There are two tools you need to know for this: DuckDB and VisiData. With these tools, Parquet is almost as easy as CSVs (but a few orders of magnitude more powerful and faster).

Parquet is also usable in Polars and pandas, and in Apache Spark too, but that's getting into complicated territory.
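
A minimal sketch of the pandas round trip, assuming pandas with a Parquet engine (pyarrow or fastparquet) installed; the file name and data are made up for illustration:

    import pandas as pd

    # Tiny hypothetical dataset
    df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [3.5, 22.1]})

    # Write to Parquet (pandas delegates to pyarrow/fastparquet)
    df.to_parquet("weather.parquet")

    # Read it back; schema and dtypes come along for free
    back = pd.read_parquet("weather.parquet")
    print(back.shape)  # (2, 2)

Polars is nearly identical: `pl.read_parquet("weather.parquet")`.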

In DuckDB it's literally just

   SELECT * FROM 's3://bucket/*.parquet'
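
A slightly fuller sketch of the same idea, assuming the `duckdb` Python package and a local file (querying `s3://` URLs additionally requires DuckDB's httpfs extension and credentials); the file name is made up:

    import duckdb

    con = duckdb.connect()

    # Write a tiny hypothetical Parquet file so the query below has input
    con.execute("COPY (SELECT 1 AS id, 'a' AS tag) TO 'tiny.parquet' (FORMAT PARQUET)")

    # Query the Parquet file directly -- no import step, no schema declaration
    n = con.execute("SELECT count(*) FROM 'tiny.parquet'").fetchone()[0]
    print(n)  # 1

The same `SELECT` works unchanged against a glob of files or an S3 prefix once httpfs is loaded.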


Relational database technology is more highly proven, stable, documented, and consistent than the constellation of big data solutions. Learning about indexes etc. 20 years ago would still help you today. Learning about this year's big data stack may not even help you next year.





