Hacker News | Nihilartikel's comments

I did non-trivial work with Apache Spark DataFrames and came to appreciate them before ever being exposed to Pandas. After Spark, Pandas just seemed frustrating and incomprehensible. Polars is much more like Spark, and I am very happy about that.

DuckDB even goes so far as to include a clone of the PySpark DataFrame API, so somebody there must like it too.


I had a similar experience with Spark; especially in the Scala API, it felt very expressive and concise once I got used to certain idioms. Also +1 on DuckDB, which is excellent.

There are some frustrations in Spark, however. I remember getting stuck on Winsorizing over groups. Hilariously, there are two identical functions, `percentile_approx` and `approx_percentile`, and it wasn't clear from the docs that they were the same, or at least did the same thing.
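To pin down what "Winsorizing over groups" means, here is a plain-Python sketch rather than Spark code. It assumes exact nearest-rank percentiles instead of the approximate ones `percentile_approx` computes, and the `winsorize_by_group` and `nearest_rank` helpers, the `(group, value)` row shape, and the 5th/95th percentile defaults are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def nearest_rank(sorted_vals: list[float], p: int) -> float:
    """Exact nearest-rank percentile for integer p in [0, 100] (an assumed method)."""
    k = max(1, math.ceil(p * len(sorted_vals) / 100))
    return sorted_vals[k - 1]


def winsorize_by_group(rows: list[tuple[str, float]], lower: int = 5,
                       upper: int = 95) -> list[tuple[str, float]]:
    """Clip each value to its own group's [lower, upper] percentile bounds."""
    by_group: dict[str, list[float]] = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)

    # One (low, high) bound pair per group, computed from that group's values only.
    bounds = {}
    for group, values in by_group.items():
        s = sorted(values)
        bounds[group] = (nearest_rank(s, lower), nearest_rank(s, upper))

    # Preserve input order; each value is clamped into its group's bounds.
    return [(g, min(max(v, bounds[g][0]), bounds[g][1])) for g, v in rows]
```

With group "a" holding 1 through 9 plus an outlier of 100, winsorizing at the 10th/90th percentiles clamps the 100 down to 9, while a small group "b" of [5, 6] is left unchanged.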

Given all that, the ergonomics of Julia for general-purpose data handling are really unmatched IMO. I've got a lot of clean, readable data pipeline and shaping code that I revisited a couple of years later and could easily understand. And making updates with new, more type-generic functions is a breeze. Very enjoyable.


Spark docs are way too minimal for my taste, at least the API docs.


Yeah, I couldn't get it done in the Spark API. I had to combine Spark and Spark SQL because the window function I needed was (probably) not available in the Spark API. It was inelegant, I thought.


I have not worked with Spark, but I have used Athena/Trino and BigQuery extensively.

Personally, I don't really understand the hype around Polars, other than that it fixes some annoying issues with the Pandas API by sacrificing backwards compatibility.

With a single-node engine, you have a ceiling on how good it can get.

With Spark/Athena/BigQuery the sky is the limit. It is such a freedom not to be limited by available RAM or CPU; they just scale to what they need. Some queries squeeze CPU-days into just a few minutes.


I'm using both Spark and Polars. To me, the appeal of Polars is that it is also much faster and easier to set up.

Spark is great if you have large datasets, since you can easily scale, as you said. But if the dataset is small-ish (<50 million rows), you hit a lower bound in Spark on how fast the job can run: even if the job is super simple, it takes 1-2 minutes. Polars, on the other hand, is almost instantaneous (< 1 second). That doesn't sound like much, but to me it makes a huge difference when iterating on solutions.


>With a single node engine you have a ceiling how good it can get.

Well, you are a lot closer to that with Polars than with Pandas at least.


I don't know how well the Polars implementation works, but what I love about PySpark is that Spark is sometimes able to push those groupings down to the database. Not always, but sometimes. However, I imagine that many people love Polars/Pandas performance for transactional queries (from start to finish, get me a result in less than a second, as long as the number of underlying rows is not greater than 20k-ish). PySpark will never be super great for that.
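PySpark's own pushdown goes through its data source connectors and isn't reproduced here. As a generic illustration of why pushing a grouping down to the database helps, this is a stdlib-only sketch against an in-memory SQLite table; the `sales` table, its rows, and the helper names are made-up assumptions.

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    """Hypothetical sample data: four sales rows across two regions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 7.5)],
    )
    return conn


def totals_client_side(conn: sqlite3.Connection) -> tuple[dict, int]:
    """No pushdown: fetch every row, then group in Python."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals: dict[str, float] = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals), len(rows)


def totals_pushed_down(conn: sqlite3.Connection) -> tuple[dict, int]:
    """Pushdown: the database runs GROUP BY, returning one row per group."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)
```

Both functions return the same totals, but the pushed-down version transfers only one row per group instead of every underlying row, which is where the savings come from on large tables.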


I thought the same thing about Spark, coming from R and later Pandas.


I had just been playing the splendid voxel game Teardown recently, and on reading this headline I noticed that characters in the game are named after Amanatides and Woo! What a fun Easter egg.


This looks pretty good! Results of roughly this caliber are already really common with local, freely usable tools and models, though. Picking one at random: https://github.com/jtscmw01/ComfyUI-DiffBIR

The Reddit StableDiffusion and related groups have a ton of upscaling workflows that use diffusion models, GANs, and the like to dream up the additional pixels for extreme zoom-and-enhance use cases.


Weeellll. Not every forum has a dang. Just saying.


Almost every one does.


Most are nowhere near as thoughtful and effective as he is, though.


I'm a big fan of the film grain modelling in HEVC over H.264 at almost any bit rate. It just makes everything look way less 'compressed' to me.


I'm reminded fondly of the opening chapter of QNTM's Fine Structure, which sets up the story with a similarly baffling (yet self-consistent, in my opinion) but wildly evocative narration of a blazing dogfight across a multiverse of hyperspaces.


One of the ways I failed to enjoy Max Gladstone's Empress of Forever was that it seemed hellbent on coining a significant new glossary's worth of vocabulary along the way. I had to be well rested and attentive to muddle through with some marginal comprehension of any given chapter's plot; I certainly couldn't read it while drowsy or distracted.


I run a physical Trello of Post-it notes running across the bottom of my ultrawide monitor.

Priority descends from left to right.

When finishing a task, I pick the next one from the left that fits the time/attention/deadline budget I have available.

It works mostly, at the expense of some crumpled up paper.
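The selection rule above is essentially a first-fit scan over a priority-ordered list. A minimal sketch, assuming each note carries rough time and attention estimates and that deadlines are already folded into the left-to-right order; the `Note` fields and `next_task` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    """A hypothetical post-it: a task plus rough cost estimates."""
    title: str
    minutes: int  # estimated time needed
    focus: int    # estimated attention needed, e.g. 1 (low) to 3 (high)


def next_task(board: list[Note], minutes_free: int, focus_free: int) -> Optional[Note]:
    """Scan left to right (highest priority first); return the first note that fits."""
    for note in board:
        if note.minutes <= minutes_free and note.focus <= focus_free:
            return note
    return None  # nothing fits the current budget
```

With a board of a 4-hour, high-focus refactor followed by a 30-minute review and a 10-minute email reply, an hour of moderate focus picks the review, and a 15-minute low-focus gap picks the email.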


Same with my kindergartener! Like, what's their use if I have to phrase everything as an imperative command?


Much like the LLMs, in a few years their capabilities will be much improved and you won't have to.


I've found the Unicode cat emoji to be an effective delimiter to avoid escaping more common chars in my cat-separated-value artifacts.

Of course the cat emoji is escaped by the puppy emoji if it occurs in a value. The puppy emoji escapes itself when needed.
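A minimal Python sketch of the cat-separated scheme, assuming 🐱 (U+1F431) as the delimiter and 🐶 (U+1F436) as the escape; both emoji choices and the `encode_row`/`decode_row` names are assumptions. Any cat or puppy inside a value is prefixed with a puppy, so a literal puppy becomes two puppies, much like `\\` for a literal backslash.

```python
CAT = "\U0001F431"    # 🐱 field delimiter (assumed choice of cat emoji)
PUPPY = "\U0001F436"  # 🐶 escape character (assumed choice of puppy emoji)


def encode_row(values: list[str]) -> str:
    """Join values with CAT, prefixing any CAT or PUPPY inside a value with PUPPY."""
    escaped = []
    for value in values:
        out = []
        for ch in value:
            if ch in (CAT, PUPPY):
                out.append(PUPPY)  # escape the special character
            out.append(ch)
        escaped.append("".join(out))
    return CAT.join(escaped)


def decode_row(line: str) -> list[str]:
    """Split on unescaped CATs; a PUPPY means 'take the next character literally'."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == PUPPY:
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling puppy escape at end of row")
            current.append(nxt)
        elif ch == CAT:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields
```

For example, `encode_row(["a", "b"])` is `"a🐱b"`, and a value consisting of a single puppy is written as two puppies; `decode_row` reverses both.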


There is https://github.com/SixArm/usv, which is exactly that, but with special Unicode characters.


In the '80s I thought we should have an entire character set just for code. While it was never implemented, the idea has arguably aged well.

I also considered a dedicated keyboard, like APL's, just to be dense about it.

Have each character signed by the keyboard so that we have proof of who typed it and when.

People who don't work here don't get to write code. It just won't happen. Haha.


> In the '80s I thought we should have an entire character set just for code.

APL got pretty close.


Instructions unclear, my puppy emoji is now chasing its own tail


last line unclear ⬛ an example would be great!


I read that as: the puppy emoji escapes itself, so two puppy characters print a single puppy, similar to how Python strings use \\ to print \


Think backslashes in shell: \$ is just $, and \\$ is a literal ‘\$’.


I hear what you're saying, though I've had good luck (i.e. haven't had to give it a thought in two years) with Syncthing pumping my vaults (and all business files, for that matter) to my phone, laptop, and backup NAS.

