> What Pandas does is notoriously hard to fit into a compile-time type system. C...

bayesian_horse · on April 22, 2022

I didn't say "with compile-time languages" but "with compile-time type systems". And many similar tools in a statically typed language will necessarily create a way to have one static type that doesn't care what the data inside actually looks like.

This even starts with basic Numpy and handling tensor objects. It's not easy for a type checker to understand what operations you can do with what shape of tensor. Worse, most often you don't know (or want to know) some of the dimensions or even dimensionality of some of the objects. Then it is impossible to check all of this at compile time.
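To make this concrete, here is a minimal sketch in plain Python. The `Tensor` class and `matmul` function are hypothetical stand-ins, not NumPy's API. The only assumption is that shape lives in a runtime attribute, as it does on `numpy.ndarray`. Both calls have the same static signature, so a checker that sees only `Tensor` has nothing to object to, and the mismatch surfaces only when the code runs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tensor:
    """Hypothetical array type: one static type, shape known only at run time."""
    shape: tuple[int, ...]


def matmul(a: Tensor, b: Tensor) -> Tensor:
    # The annotations say only "Tensor in, Tensor out".
    # The shape rule is enforced here, at run time.
    if len(a.shape) != 2 or len(b.shape) != 2:
        raise ValueError("this sketch only handles 2-D operands")
    if a.shape[1] != b.shape[0]:
        raise ValueError(f"shape mismatch: {a.shape} @ {b.shape}")
    return Tensor((a.shape[0], b.shape[1]))


# Inner dimensions agree, so this succeeds.
ok = matmul(Tensor((2, 3)), Tensor((3, 4)))

# Same static types as above, but this fails when executed.
try:
    matmul(Tensor((2, 3)), Tensor((4, 5)))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```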

arinlen · on April 22, 2022

> This even starts with basic Numpy and handling tensor objects. It's not easy for a type checker to understand what operations you can do with what shape of tensor.

That doesn't sound like a Python problem.

Instead, it sounds like the natural consequence of numpy being designed so that its data types aren't organized into subtypes, with shape left as a runtime property. This is a natural reflection of numpy's take on vectors, matrices, and tensors, which in terms of types are just big arrays with runtime properties.

To put things in perspective, in C++, Eigen supports static dense vectors and matrices whose size is specified and known at compile-time. I'm sure Python doesn't impose any more static type constraints than C++ does.

bayesian_horse · on April 23, 2022

Of course it's not a Python problem. All similar tools have the same "problem": they can't easily fit that stuff into their type systems, so they invent some way to not care about it.

HelloNurse · on April 22, 2022

It isn't a matter of "compile time": explicit type declarations and definitions can often be formally sound but practically worthless.

Significant types in ETL-style applications typically come from outside (e.g. a certain CSV column in the input file contains a date in YYYYMMDD format, or maybe YYYYDDMM, figure it out, and don't forget time zones or your accounting will go wrong).
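A small illustration of the date example, using only the standard library. The input string is a made-up CSV cell. The point is that both layouts parse cleanly to the same static type (`date`), so no type declaration can tell you which reading is the right one.

```python
from datetime import datetime

# Hypothetical CSV cell; nothing in the string says which layout it uses.
raw = "20220411"

as_ymd = datetime.strptime(raw, "%Y%m%d").date()  # read as YYYYMMDD
as_ydm = datetime.strptime(raw, "%Y%d%m").date()  # read as YYYYDDMM

# Both values are valid dates of the same type, months apart.
print(as_ymd.isoformat())
print(as_ydm.isoformat())
```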

Then types are mostly complex but obvious and easily deduced (e.g. multiplying matrices of compatible shapes necessarily gives a matrix of a certain shape, why should the program say anything more detailed or lower-level than "do a matrix multiplication" or "do a tensor product"?); they are an often dynamic and unpredictable property of the data, not a useful abstraction.

int_19h · on April 23, 2022

The source code shouldn't need to say anything about the type of the resulting matrix explicitly, perhaps. But why shouldn't the type system keep track of shapes and deduce the accurate type for the result of said multiplication?
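For what it's worth, here is a rough sketch of what that could look like with Python's `typing` module. The `Matrix` class and `matmul` function are hypothetical, and nothing here ties the runtime `rows`/`cols` values to the type parameters. The idea is only that the shared type variable `K` expresses "inner dimensions must agree", so a static checker such as mypy or pyright should be able to work out the result type of the call on its own.

```python
from typing import Generic, Literal, TypeVar

M = TypeVar("M", bound=int)
K = TypeVar("K", bound=int)
N = TypeVar("N", bound=int)


class Matrix(Generic[M, N]):
    """Hypothetical matrix whose row and column counts appear as type parameters."""

    def __init__(self, rows: int, cols: int) -> None:
        # Sketch limitation: these values are not checked against M and N.
        self.rows = rows
        self.cols = cols


def matmul(a: "Matrix[M, K]", b: "Matrix[K, N]") -> "Matrix[M, N]":
    # K is shared between the two parameters, so the inner dimensions
    # must match for the call to type-check.
    return Matrix(a.rows, b.cols)


a: "Matrix[Literal[2], Literal[3]]" = Matrix(2, 3)
b: "Matrix[Literal[3], Literal[4]]" = Matrix(3, 4)

# No annotation on c: the result type is left for the checker to deduce.
c = matmul(a, b)
```

Python itself does not enforce any of this at run time. The annotations only matter to an external checker.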

bayesian_horse · on April 23, 2022

Because the shape can be dynamic, for example.
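A tiny sketch of what "dynamic" means here, in plain Python with made-up data. The hypothetical `keep_positive_rows` filter takes two inputs of identical shape (3 rows, 2 columns) and returns results whose row count depends entirely on the values, which is not available before the program runs.

```python
from __future__ import annotations


def keep_positive_rows(rows: list[list[float]]) -> list[list[float]]:
    """Keep only rows whose first entry is positive; the row count depends on the data."""
    return [row for row in rows if row[0] > 0]


# Two made-up batches with the same shape going in.
batch_a = [[1.0, 2.0], [-1.0, 5.0], [3.0, 4.0]]
batch_b = [[-2.0, 0.0], [-1.0, 1.0], [-3.0, 9.0]]

# The shapes coming out differ, and only the data decides how.
shape_a = (len(keep_positive_rows(batch_a)), 2)
shape_b = (len(keep_positive_rows(batch_b)), 2)
```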