
I have not worked with Spark, but I have used Athena/Trino and BigQuery extensively.

I don't really understand the hype around Polars, other than that it fixes some annoying issues with the Pandas API at the cost of backwards compatibility.

With a single-node engine you have a ceiling on how good it can get.

With Spark/Athena/BigQuery the sky is the limit. It is such a freedom to not be limited by available RAM or CPU. They just scale to what they need. Some queries squeeze CPU-days of work into just a few minutes.




I'm using both Spark and Polars. To me the additional appeal of Polars is that it is much faster and easier to set up.

Spark is great if you have large datasets, since you can easily scale as you said. But if the dataset is small-ish (<50 million rows) you hit a lower bound in Spark on how fast the job can run. Even if the job is super simple it takes 1-2 minutes. Polars, on the other hand, is almost instantaneous (< 1 second). That doesn't sound like much, but it makes a huge difference when iterating on solutions.
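
For a sense of what that iteration loop looks like, here's a minimal Polars sketch (the file name and column names are made up); a small aggregation like this typically returns in well under a second on one machine:

    import polars as pl

    # Hypothetical input: a small-ish CSV (up to tens of millions of rows).
    # scan_csv builds a lazy query; nothing is read until .collect().
    result = (
        pl.scan_csv("events.csv")
        .filter(pl.col("status") == "ok")
        .group_by("user_id")
        .agg(pl.col("duration_ms").mean().alias("avg_duration_ms"))
        .collect()
    )

    print(result)

No cluster, no session startup, no job submission overhead; you just run it again after each change.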


>With a single-node engine you have a ceiling on how good it can get.

Well, you are a lot closer to that with Polars than with Pandas at least.



