> What Pandas does is notoriously hard to fit into a compile-time type system. Certainly too hard to go into the brains of scientists who didn't grow up coding.
I'm not sure that's true. Doesn't Pandas handle ETL and some analysis? There is nothing inherent to ETL that makes it a hard problem for compiled languages.
In your opinion, what does Pandas do that's hard to do with compile-time languages?
I didn't say "with compile-time languages" but "with compile-time type systems". And many similar tools in statically typed languages end up defining one static type (a generic "data frame" or "array") that doesn't care what the data inside actually looks like.
This even starts with basic Numpy and handling tensor objects. It's not easy for a type checker to understand what operations you can do with what shape of tensor. Worse, most often you don't know (or want to know) some of the dimensions or even dimensionality of some of the objects. Then it is impossible to check all of this at compile time.
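To make that concrete, here is a minimal sketch. The helper name `try_matmul` is made up for illustration. All three arrays have the same static type, `np.ndarray`, so a checker that only sees that type cannot tell the valid product from the invalid one. The mismatch only shows up when the line runs.

```python
import numpy as np

def try_matmul(a, b):
    """Return the shape of a @ b, or the error text if the shapes don't line up."""
    try:
        return (a @ b).shape
    except ValueError as exc:
        return f"ValueError: {exc}"

a = np.zeros((3, 4))
b = np.zeros((4, 5))
c = np.zeros((5, 4))

# Same static type for every operand; only the runtime shapes differ.
ok = try_matmul(a, b)   # inner dimensions 4 and 4 match
bad = try_matmul(a, c)  # inner dimensions 4 and 5 do not match
```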
> This even starts with basic Numpy and handling tensor objects. It's not easy for a type checker to understand what operations you can do with what shape of tensor.
That doesn't sound like a Python problem.
Instead, it sounds like the natural consequence of numpy's design, which doesn't organize its data types into subtypes and leaves things like shape as runtime properties. That reflects numpy's take on vectors, matrices, and tensors, which in terms of types are all just big arrays with runtime properties.
To put things in perspective, in C++, Eigen supports fixed-size dense vectors and matrices whose size is specified and known at compile time. I'm sure Python doesn't impose more static type constraints than C++ does.
Of course it's not a Python problem. All similar tools have the same "problem": they can't easily fit that stuff into their type systems, so they invent some way to not care about it.
It isn't a matter of "compile time": explicit type declarations and definitions can often be formally sound but practically worthless.
Significant types in ETL-style applications typically come from outside (e.g. a certain CSV column in the input file contains a date in YYYYMMDD format, or maybe YYYYDDMM, figure it out, and don't forget time zones or your accounting will go wrong).
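A small sketch of the date example, using only the standard library. The value in `raw` is a made-up CSV cell. Both parses succeed and both produce the same Python type, so no type checker can tell you which reading is the right one. That knowledge has to come from whoever produced the file.

```python
from datetime import datetime

raw = "20230405"  # hypothetical cell from an input CSV column

# Read as YYYYMMDD: 5 April 2023
as_ymd = datetime.strptime(raw, "%Y%m%d").date()

# Read as YYYYDDMM: 4 May 2023
as_ydm = datetime.strptime(raw, "%Y%d%m").date()

# Same type, different meaning.
same_type = type(as_ymd) is type(as_ydm)
same_value = as_ymd == as_ydm
```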
Then types are mostly complex but obvious and easily deduced (e.g. multiplying matrices of compatible shapes necessarily gives a matrix of a certain shape, so why should the program say anything more detailed or lower-level than "do a matrix multiplication" or "do a tensor product"?); they are often a dynamic and unpredictable property of the data, not a useful abstraction.
The source code shouldn't need to say anything about the type of the resulting matrix explicitly, perhaps. But why shouldn't the type system keep track of shapes and deduce the accurate type for the result of said multiplication?
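For what it's worth, the deduction rule itself is tiny. The sketch below is not a type checker, just the rule such a checker would have to apply, written as a plain function over shapes with no data involved. `Shape` and `matmul_shape` are names I made up, and it only covers the 2-D case.

```python
from typing import Tuple

Shape = Tuple[int, ...]

def matmul_shape(left: Shape, right: Shape) -> Shape:
    """Deduce the shape of a 2-D matrix product from the operand shapes alone."""
    if len(left) != 2 or len(right) != 2:
        raise TypeError("this sketch only handles 2-D matrices")
    if left[1] != right[0]:
        raise TypeError(f"inner dimensions differ: {left[1]} vs {right[0]}")
    # (m, k) times (k, n) gives (m, n)
    return (left[0], right[1])

result = matmul_shape((3, 4), (4, 5))
```

The hard part is not this rule but the cases where one of the dimensions is only known after reading the data, which is where the static approach runs out.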