Above that it says “DataFrames for a new era” hidden in their graphics. I believ...

CobrastanJorji · 2024-01-09T00:19:04.000000Z

It seems like it's a disease endemic to data products. Everybody, from the big cloud providers to the small data products, builds something whose selling point is "I'm the same as Apache X but better." But if you don't know what Apache X is, you have to go read up on that, and its website might say "I'm the same as Whatever Else but better," and you have to go read up on that. I don't want to figure out what a product does by walking a "like X but better" chain and applying diffs in my head. Just tell me what it does!

I get that these are general purpose tools with a lot of use cases, but some real quick examples of "this is a good use case" and "this is a bad use case, maybe prefer SQL/nosql/quasisql/hadoop/a CSV file and sed" would be really helpful, please.

sanderjd · 2024-01-09T04:16:14.000000Z

I dunno, I get the criticism, but also, every field assumes a large amount of "lingua franca" in order to avoid documenting foundational things over and over again.

Programming language documentation doesn't all start with "programming languages are used to direct computers to do things"; it is assumed the target audience knows that. Database documentation similarly doesn't start out with discussing what it means to store and access data and why you'd want to do that.

It's always hard to know where to draw this line, and the early iterations of a new idea really do need to put more time into describing what they are from first principles.

I remember this from the early days of "NoSQL" databases. They spilled lots of ink on what they even were trying to do and why.

But in my view this isn't one of those times. I think "DataFrames" are well within a "lingua franca" that is reasonable to expect the audience of this kind of tool to understand. This is not an early iteration of an unfamiliar concept; it is an iteration of an old, mature, and foundational concept with essentially universal penetration in the field where it is relevant.

Having said all that, I came across this "what is mysql" documentation[0] which does explain what a relational database is for. It's not the main entry point to the docs, but yeah, sure, it's useful to put that somewhere!

0: https://dev.mysql.com/doc/refman/8.0/en/what-is-mysql.html

esafak · 2024-01-09T03:10:43.000000Z

If you don't know what the comparison product is either, then you are not the target customer. This is a library for analyzing and transforming (mostly numerical) data in memory. Data scientists use it.

makapuf · 2024-01-09T06:29:40.000000Z

See also: Is it Pokémon or Big Data? https://pixelastic.github.io/pokemonorbigdata/

ryandrake · 2024-01-09T01:15:29.000000Z

I run into the same problem. I don't know what Pandas are (besides the bears) and at some point up the "it's like X" chain, I guess you have to stop and admit you're just not the target user of this tech product.

selcuka · 2024-01-09T02:47:26.000000Z

> I guess you have to stop and admit you're just not the target user of this tech product.

On the other hand, how can you become a target user if you don't know that a product category exists?

sanderjd · 2024-01-09T04:31:06.000000Z

This project is a solution to a particular kind of problem. The way you become a target user of that solution is by first having the problem it's a solution to.

If you have the problem "I want to analyze a bunch of tabular data", you'll start researching and asking around about it, and you'll quickly discover a few things: 1. people do this with (usually columnar / "OLAP") SQL query interfaces, 2. people usually end up augmenting that with some in-memory analyses in a general-purpose programming environment, 3. people often choose R or Python for this, 4. both of those languages lean heavily on a concept they both call "data frames", 5. in Python, this is most commonly done using the pandas library, which is pervasive in the Python data science / engineering world.
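
For anyone who hasn't met the term, here is a toy sketch of what a "data frame" roughly amounts to: named columns of equal length, plus operations like filtering rows and aggregating by group. This deliberately uses only the Python standard library and is not the pandas or polars API; the function names (`filter_rows`, `group_mean`) and the sample data are made up for illustration.

```python
# Toy illustration only: a "data frame" as named, equal-length columns.
# This is NOT the pandas or polars API; every name here is hypothetical.
from collections import defaultdict

# Column-oriented table: each key is a column name, each value a list of cells.
frame = {
    "city": ["Oslo", "Oslo", "Lima", "Lima", "Lima"],
    "temp_c": [3.0, 5.0, 19.0, 21.0, 20.0],
}


def filter_rows(frame, column, predicate):
    """Keep only the rows where predicate(cell) is true for the given column."""
    keep = [i for i, cell in enumerate(frame[column]) if predicate(cell)]
    return {name: [cells[i] for i in keep] for name, cells in frame.items()}


def group_mean(frame, key, value):
    """Average the `value` column within each distinct entry of `key`."""
    groups = defaultdict(list)
    for k, v in zip(frame[key], frame[value]):
        groups[k].append(v)
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}


# Rows with a temperature of at least 5 degrees C.
warm = filter_rows(frame, "temp_c", lambda t: t >= 5.0)

# Mean temperature per city.
means = group_mean(frame, "city", "temp_c")
```

Real data frame libraries offer the same kinds of operations (select, filter, group, aggregate, join) over typed columns, with far better performance and ergonomics than this sketch.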

Once you've gotten to that point, you'll be primed for new solutions to the new problems you now have, one of which is that pandas is old and pretty creaky and does some things in awkward and suboptimal ways that can be greatly improved upon with new iterations of the concept, like polars.

But if you don't have these problems, then the solution won't make much sense.

esafak · 2024-01-09T03:08:49.000000Z

That's on you. If you want to become a data engineer and data scientist -- the two software positions most likely to use polars -- get learning. Or don't: learn it when you need it.