I have not worked with Spark, but I have used Athena/Trino and BigQuery extensively.
Personally, I don't really understand the hype around Polars, other than that it fixes some annoying issues with the Pandas API by sacrificing backwards compatibility.
With a single-node engine there's a ceiling on how good it can get.
With Spark/Athena/BigQuery the sky is the limit. It's such a freedom not to be limited by available RAM or CPU; they just scale to whatever they need. Some queries squeeze CPU-days of compute into just a few minutes.
I'm using both Spark and Polars; to me the additional appeal of Polars is that it's also much faster and easier to set up.
Spark is great if you have large datasets, since you can easily scale as you said. But if the dataset is small-ish (<50 million rows) you hit a lower bound in Spark on how fast the job can run. Even if the job is super simple, it takes 1-2 minutes. Polars on the other hand is almost instantaneous (< 1 second). That doesn't sound like much, but to me it makes a huge difference when iterating on solutions.
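To make the startup-overhead point concrete, here's a minimal sketch of the same tiny aggregation in both engines (the file name and column names are made up). The Polars version runs entirely in-process; the PySpark version first has to stand up a session, and on a small dataset that fixed cost alone can dominate the runtime.

    import polars as pl

    # Polars: lazy scan + aggregation, all in-process.
    # No session/JVM startup, so re-runs while iterating are near-instant.
    out = (
        pl.scan_parquet("events.parquet")     # hypothetical file
          .group_by("user_id")
          .agg(pl.col("amount").sum())
          .collect()
    )

    from pyspark.sql import SparkSession, functions as F

    # Spark: the same job pays a fixed cost to spin up the session
    # (JVM, scheduler, executors) before any data is touched.
    spark = SparkSession.builder.appName("demo").getOrCreate()
    out = (
        spark.read.parquet("events.parquet")
             .groupBy("user_id")
             .agg(F.sum("amount"))
             .collect()
    )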