
Real-world performance is complicated since data science covers a lot of use cases.

If you're just reading a small CSV to do analysis on it, then there will be no human-perceptible difference between Polars and Pandas. If you're reading a larger CSV with 100k rows, there still won't be much of a perceptible difference.

Per this (old) benchmark, there are differences once you get into 10 million rows/500MB+ territory: https://h2oai.github.io/db-benchmark/
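If you'd rather check this on your own machine than trust a benchmark, here is a rough sketch of how you might time it. Assumptions: the helper names (`write_synthetic_csv`, `time_read_and_sum`, `compare`) and the three-column synthetic data are made up for illustration, it only measures a CSV read plus a column sum (not the groupby/join workloads in the linked benchmark), and a library that isn't installed is reported as `None` rather than failing. Single-run timings like this are noisy, so treat the numbers as a ballpark.

```python
import csv
import importlib
import os
import random
import tempfile
import time


def write_synthetic_csv(path, n_rows, seed=0):
    """Write a header plus n_rows of made-up (id, group, value) rows."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "group", "value"])
        for i in range(n_rows):
            writer.writerow([i, rng.randrange(100), rng.random()])


def time_read_and_sum(library, path):
    """Seconds to read the CSV and sum the `value` column.

    Returns None if the library is not installed. The import happens
    before the timer starts, so import cost is not counted.
    """
    if library not in ("pandas", "polars"):
        raise ValueError(f"unsupported library: {library}")
    try:
        mod = importlib.import_module(library)
    except ImportError:
        return None
    start = time.perf_counter()
    # Both libraries expose read_csv and column access by name.
    mod.read_csv(path)["value"].sum()
    return time.perf_counter() - start


def compare(n_rows):
    """Time both libraries on the same temporary CSV of n_rows rows."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.csv")
        write_synthetic_csv(path, n_rows)
        return {lib: time_read_and_sum(lib, path) for lib in ("pandas", "polars")}


if __name__ == "__main__":
    # Raise the row count toward 10 million to approach the scale
    # where the benchmark reports differences (this needs disk and RAM).
    for n in (1_000, 100_000):
        print(n, compare(n))
```

At 1k and 100k rows both reads should finish too quickly to notice; the gap should only become visible as you push `n_rows` toward the 10 million mark.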




DuckDB is publishing updates to the H2O.ai benchmark: https://duckdb.org/2023/11/03/db-benchmark-update.html



