

banku_brougham 48 days ago | on: Non-elementary group-by aggregations in Polars vs ...

I had a similar experience with Spark. The Scala API in particular felt very expressive and concise once I got used to certain idioms. Also +1 on DuckDB, which is excellent.

Spark does have its frustrations, though. I remember getting stuck on winsorizing over groups. Hilariously, there are two functions called `percentile_approx` and `approx_percentile`, and it wasn't clear from the docs whether they were the same function or at least did the same thing.
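
For anyone unfamiliar with the term: winsorizing over groups means clamping each value to percentile bounds computed within its own group. Here is a rough plain-Python sketch of the idea. It is not Spark code; the function names, the 25th/75th percentile cutoffs, and the sample data are all made up for illustration, and Spark's approximate percentile functions would not necessarily produce the same bounds.

```python
from collections import defaultdict
from math import floor


def percentile_linear(sorted_vals, p):
    """Percentile of an ascending list via linear interpolation, p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def winsorize_by_group(rows, lower=0.05, upper=0.95):
    """Clamp each (group, value) row to its own group's percentile bounds."""
    # Collect the values of each group.
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)

    # Compute one (lower_bound, upper_bound) pair per group.
    bounds = {}
    for group, values in by_group.items():
        ordered = sorted(values)
        bounds[group] = (
            percentile_linear(ordered, lower),
            percentile_linear(ordered, upper),
        )

    # Clamp every row to its group's bounds, preserving the input order.
    return [
        (group, min(max(value, bounds[group][0]), bounds[group][1]))
        for group, value in rows
    ]


# Hypothetical sample data: each group has one extreme value.
rows = [("a", v) for v in [1, 2, 3, 4, 100]] + [
    ("b", v) for v in [10, 20, 30, 40, -500]
]
clipped = winsorize_by_group(rows, lower=0.25, upper=0.75)
```

With these cutoffs, the 100 in group "a" is pulled down to 4 and the -500 in group "b" is pulled up to 10, while the values already inside each group's bounds are left alone.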

Given all that, the ergonomics of Julia for general-purpose data handling are really unmatched, IMO. I've got a lot of clean, readable data pipeline and shaping code that I revisited a couple of years later and could easily understand. And updating it with new, more type-generic functions is a breeze. Very enjoyable.

appplication 48 days ago

Spark docs are way too minimal for my taste, at least the API docs.

banku_brougham 48 days ago

Yeah, I couldn't get it done in the Spark API alone. I had to combine the Spark API and Spark SQL because the window function I needed was (probably) not available in the Spark API. It was inelegant, I thought.
