I did non-trivial work with Apache Spark dataframes and came to appreciate them before ever being exposed to Pandas. After Spark, Pandas just seemed frustrating and incomprehensible. Polars is much more like Spark, and I am very happy about that.
DuckDB even goes so far as to include a clone of the PySpark dataframe API, so somebody there must like it too.
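For reference, that clone currently lives behind an experimental module. Here's a minimal sketch of what using it looks like, assuming a recent DuckDB release; the module path and exact surface are marked experimental, so details may shift between versions:

```python
import pandas as pd

# Experimental DuckDB module that mimics the PySpark API; path may change.
from duckdb.experimental.spark.sql import SparkSession
from duckdb.experimental.spark.sql.functions import col, lit

spark = SparkSession.builder.getOrCreate()

# Build a "Spark" DataFrame from a pandas DataFrame, then use familiar
# PySpark-style transformations backed by DuckDB.
df = spark.createDataFrame(
    pd.DataFrame({"age": [34, 45, 23], "name": ["Joan", "Peter", "John"]})
)
df = df.withColumn("location", lit("Seattle"))

rows = df.select(col("age"), col("location")).collect()
print(rows)
```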
I had a similar experience with Spark; especially in the Scala API, it felt very expressive and concise once I got used to certain idioms. Also +1 on DuckDB, which is excellent.
There are some frustrations in Spark, however. I remember getting stuck on Winsorizing over groups. Hilariously, there are identical functions called `percentile_approx` and `approx_percentile`, and it wasn't clear from the docs whether they were the same or at least did the same thing.
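For what it's worth, one way that kind of group-wise Winsorizing can be done without window functions at all is to compute the per-group bounds with a groupBy and join them back. A rough sketch with made-up column names (`grp`, `x`), assuming Spark 3.1+ where `percentile_approx` is exposed in `pyspark.sql.functions`:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("a", 1.0), ("a", 2.0), ("a", 100.0), ("b", 3.0), ("b", 4.0), ("b", -50.0)],
    ["grp", "x"],
)

# Per-group 5th/95th percentile bounds via percentile_approx.
bounds = df.groupBy("grp").agg(
    F.percentile_approx("x", 0.05).alias("lo"),
    F.percentile_approx("x", 0.95).alias("hi"),
)

# Clamp each value to its own group's bounds.
winsorized = df.join(bounds, "grp").withColumn(
    "x_wins", F.least(F.greatest(F.col("x"), F.col("lo")), F.col("hi"))
)
winsorized.show()
```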
Given all that, the ergonomics of Julia for general-purpose data handling are really unmatched, IMO. I've got a lot of clean and readable data pipeline and shaping code that I revisited a couple of years later and could easily understand. And making updates with new, more type-generic functions is a breeze. Very enjoyable.
Yeah, I couldn't get it done in the Spark API; I had to combine Spark and Spark SQL because the window function I needed was (probably) not available in the DataFrame API. It was inelegant, I thought.
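For anyone hitting the same wall, the workaround usually looks something like the sketch below: register the DataFrame as a temp view and write the window expression in Spark SQL, which accepts functions that aren't (obviously) exposed in the DataFrame API. Column and view names are made up, and it assumes a Spark version that allows `percentile_approx` as a window aggregate:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("a", 1.0), ("a", 2.0), ("a", 100.0), ("b", 3.0), ("b", 4.0)],
    ["grp", "x"],
)
df.createOrReplaceTempView("t")

# Drop into SQL for the window expression, then keep working with the
# result as a normal DataFrame.
result = spark.sql("""
    SELECT grp, x,
           percentile_approx(x, 0.5) OVER (PARTITION BY grp) AS grp_median
    FROM t
""")
result.show()
```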
I have not worked with Spark, but I have used Athena/Trino and BigQuery extensively.
For me, I don't really understand the hype around Polars, other than that it fixes some annoying issues with the Pandas API at the cost of backwards compatibility.
With a single-node engine, there's a ceiling on how good it can get.
With Spark/Athena/BigQuery, the sky is the limit. It is such a freedom not to be limited by available RAM or CPU. They just scale to what they need. Some queries squeeze CPU-days of work into just a few minutes.
I'm using both Spark and Polars; to me, the additional appeal of Polars is that it is also much faster and easier to set up.
Spark is great if you have large datasets, since you can easily scale, as you said. But if the dataset is small-ish (<50 million rows), you hit a lower bound in Spark in terms of how fast the job can run. Even if the job is super simple, it takes 1-2 minutes. Polars, on the other hand, is almost instantaneous (<1 second). Doesn't sound like much, but to me it makes a huge difference when iterating on solutions.
I don't know how well the Polars implementation works, but what I love about PySpark is that sometimes Spark is able to push those groupings down to the database. Not always, but sometimes. However, I imagine that many people love Polars/Pandas performance for transactional queries: from start to finish, get me a result in less than a second, as long as the number of underlying rows is not greater than 20k-ish. PySpark will never be super great for that.
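To make the pushdown point concrete, here's a hedged sketch with a JDBC source; the connection string, table, and column names are placeholders. Filter pushdown has been supported for a long time, while aggregate pushdown is newer and depends on the Spark version and data source, so the honest check is to look at the physical plan rather than assume it happened:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://host:5432/db")  # placeholder connection
    .option("dbtable", "events")                      # placeholder table
    .load()
)

daily = (
    df.filter(F.col("event_date") >= "2024-01-01")
      .groupBy("event_date")
      .count()
)

# "PushedFilters" (and, where supported, pushed aggregates) show up in the
# plan when Spark managed to push work into the database.
daily.explain()
```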
I had just been playing the splendid voxel game Teardown recently, and noticed on reading this headline that characters in the game are named after Amanatides and Woo! What a fun Easter egg.
This looks pretty good! Results of roughly this caliber are already really common with local, freely usable tools and models, though. Picking one randomly: https://github.com/jtscmw01/ComfyUI-DiffBIR
The Reddit StableDiffusion and related groups have a ton of upscaling workflows that use diffusion models, GANs, and the like to dream up the additional pixels for extreme zoom-and-enhance use cases.
I'm reminded fondly of the opening chapter of QNTM's Fine Structure, which sets up the story with a similarly baffling (yet self-consistent, in my opinion) but wildly evocative narration of a blazing dogfight across a multiverse of hyperspaces.
One of the ways I failed to enjoy Max Gladstone's Empress of Forever was that it seemed hellbent on coining a significant glossary's worth of new vocabulary along the way. I had to be well rested and attentive to muddle through with some marginal comprehension of any given chapter's plot; I certainly couldn't read it while drowsy or distracted.
I hear what you're saying - though I've had good luck (i.e., I haven't had to give it a thought in two years) with Syncthing pumping my vaults (and all business files, for that matter) to phone, laptop, and backup NAS.