> Pandas does is notoriously hard to fit into a compile-time type system What, m...

qsort · on April 22, 2022

Not hard per se, but extremely unergonomic.

A pandas dataframe types to something like Iterable[SomeKindOfProductType[int, str, str, ... (78 other columns)]]. The formal type of a dataframe in the middle of a transformation is... not very useful to know.

smallerfish · on April 22, 2022

You wouldn't typically type tabular data in any language. Give it (at most) a DataFrame<StoreTransaction> type if you must, where StoreTransaction describes the structure of a row - maybe declaring only columns that model generation code was doing typed operations on (e.g. numerics vs strings) to avoid the need for reflection.

bayesian_horse · on April 22, 2022

Either you type the tabular data at compile time or you don't get type checking of tabular data at compile time.

The number and types of the columns aren't necessarily known at compile time. Which leads to runtime errors. Even in a statically typed language, such dataframes are a kind of "dynamic typing escape hatch". As complexity of a software increases, such mechanismus of dynamic typing and throwing runtime errors creep up all over the place.

smallerfish · on April 22, 2022

Sure. If you're dealing with untyped data at runtime you can't type it at compile time. Not a new issue, and handled all the time in otherwise typed languages.

int_19h · on April 23, 2022

It's not very useful to type, but that's why we have type inference.

It's certainly very useful to know, even if that knowledge is indirect via the type of the resulting dataframe after all the transforms.

bayesian_horse · on April 23, 2022

From dabbling in F# I have a feeling such "information" would be somewhat more annoying than useful.