The premise is that Python's type hints don't provide enough information about things like numpy arrays. Fixing that properly is hard, because the facts that matter in that sort of code, each of which could hypothetically be encoded in a type system, include things like:
1. What's the datatype of the array elements?
2. Does this array alias other memory?
3. Is the access pattern I want contiguous in memory?
4. If you track the provenance of an array, does it include something like a "width" dimension and a "height" dimension?
5. How many dimensions are there?
6. What's a good semantic description (type) for each dimension?
7. As an exact integer (or modulo some power of 2 or whatever), how big is each dimension?
And on and on and on. An honest-to-goodness type hint capturing that sort of crap in a way that's statically analyzable is a nightmare, and even writing the machinery to make a hint like that reasonable to read and write wouldn't be trivial. Even if you could, it'd probably generate so much noise that, for any particular use of an array, it would distract you from the aspects you actually care about.
A nice hybrid solution, IMO, falls out of the fact that Python allows arbitrary objects to be used as type hints: a string describing the aspects you're using/providing on a particular array works as decent documentation for other developers. A few examples:
1. The array describes a typical 24-bit, 3-channel image. You might use a type hint like 'u8:(w,h,3)' to indicate that it's a 0-255 integer field rather than a 0-1 float field, which dimensions are width/height/channels, and that it's a 3-channel image. It'd probably be good to also label those channels with a convention like 'u8:(w,h,(rgb))' or 'u8:(w,h,3):rgb', or with Hungarian-style naming, or something (no particular recommendation on my end since I'm not usually working with heterogeneous data like that, but choosing the wrong encoding or even the wrong coordinate space for RGB is a big deal, so you'd probably want to represent that somehow).
2. You have a function signature with multiple inputs, the computation is mostly arbitrary, but it's important that some dimensions align. Then label them identically; something like matmul(left: '(n, d)', right: '(d, k)') -> '(n, k)' (sketched in code after this list).
3. You're doing some ML thing on time-series medical data, and it's common to have giant dense tensors floating around. Label what all the dimensions mean with a type like '(batch, r, a, s, channel, t)', or use longer names as appropriate depending on your audience and the background knowledge you can assume.
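To make the convention concrete, here's a minimal sketch. Nothing enforces the strings, and the `mean_brightness` helper is hypothetical; the annotations are the ones from the examples above.

```python
import numpy as np

# Minimal sketch: the strings are ordinary annotations that Python stores in
# __annotations__ and otherwise ignores, so they act purely as documentation.
# (A static checker won't know what to make of them, which is fine -- they're
# notes for humans.)
def matmul(left: '(n, d)', right: '(d, k)') -> '(n, k)':
    return left @ right

def mean_brightness(img: 'u8:(w,h,3):rgb') -> 'f64:()':
    # Hypothetical helper, just to show the image-style annotation in use.
    return img.mean()
```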
Libraries like einops and functions like `np.einsum` take that a step further and require stringified descriptions of the operation you're trying to do. They have a bit of a learning curve, but the crux of the idea is that instead of writing garbage like `arr[3,6,-4:,np.newaxis,...].T.reshape(4 * n, -1)` or, God forbid, some sort of roll/transpose logic, you write a higher-level description.
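As a taste of what that buys you on the reshape/transpose side (assuming einops is installed; the (batch, channel, height, width) layout is just for the demo):

```python
import numpy as np
from einops import rearrange  # assumes einops is installed

imgs = np.arange(8 * 3 * 32 * 32).reshape(8, 3, 32, 32)  # hypothetical (b, c, h, w) batch

# Index-based: you have to mentally simulate the transpose to see what comes out.
flat_a = imgs.transpose(0, 2, 3, 1).reshape(8, 32 * 32, 3)

# Named: the pattern is the description of the operation.
flat_b = rearrange(imgs, 'b c h w -> b (h w) c')

assert np.array_equal(flat_a, flat_b)
```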
A couple of examples with einsum (both worked through in code after this list):
1. The dot product of v and w is `np.dot(v, w)` or `np.sum(v * w)`, and it's also `np.einsum('d,d', v, w)`. Arguably einsum is a bit of syntactic noise for such a simple example, but if v and w have different shapes than you expect, the simpler spellings silently produce garbage: `np.dot` will do a matrix multiplication on 2D inputs, and the elementwise sum-of-products is arguably closer to correct most of the time, but if you thought you had 1D inputs and actually wanted a channeled operation like matrix multiplication on higher-dimensional ones, it's simply wrong, and nothing in the type system catches it. einsum, by contrast, will just barf if the stated dimensions don't match your expectations. Moreover, with optimize=True it'll actually fall back to whichever of the simpler implementations is fastest.
2. Imagine you have a matrix A of shape (n, n) and a matrix X of shape (n, d), and you want to compute something like A @ v @ A.T for each column v of X. You can write that with standard numpy operators, but it looks like garbage and hides what's actually happening. The einsum solution is just `np.einsum('vw,wd,nv->nd', A, X, A)`: you're contracting over `v` and `w` and are left with `n` and `d`. It's not perfect, since you only get single-letter names to work with, but it's a hell of a lot better than the equivalent options, and much easier to make suitably fast (just pass optimize=True).
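Both examples, worked through (the shapes and random seed are just for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3

# Example 1: dot product. einsum agrees with np.dot on matching 1D inputs,
# and raises instead of guessing when the shapes disagree.
v, w = rng.normal(size=n), rng.normal(size=n)
assert np.allclose(np.einsum('d,d', v, w), np.dot(v, w))
# np.einsum('d,d', v, rng.normal(size=n + 1))  # ValueError, not silent garbage

# Example 2: A @ v @ A.T for each column v of X, columns stacked back into (n, d).
A, X = rng.normal(size=(n, n)), rng.normal(size=(n, d))
fast = np.einsum('vw,wd,nv->nd', A, X, A, optimize=True)
slow = np.stack([A @ X[:, j] @ A.T for j in range(d)], axis=1)
assert np.allclose(fast, slow)
```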
And then einops is even better, because roll/transpose logic is incredibly fiddly: you're prone to off-by-one errors in your choice of dimension, or you need to deeply understand how the function works to avoid footgun-style mistakes. An API like `swapaxes(arr, 'batch', 'time')` is 10x easier to use than `swapaxes(arr, 0, 5)` -- imagine somebody adding an extra dimension in a world where positions are referenced absolutely, and where getting it wrong means the program still runs and produces interesting-looking garbage, because the behavior of `np.dot` and everything else in the library depends on the shape of the inputs.
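einops doesn't literally spell it `swapaxes(arr, 'batch', 'time')` -- that signature is hypothetical -- but `rearrange` expresses the same intent by name. A rough sketch (axis sizes are arbitrary):

```python
import numpy as np
from einops import rearrange  # assumes einops is installed

x = np.zeros((2, 3, 4, 5, 6, 7))  # hypothetical (batch, a, b, c, d, time) layout

# Positional: "swap axes 0 and 5" stays correct only as long as the layout never changes.
swapped = np.swapaxes(x, 0, 5)

# Named: the same intent, spelled out. If someone later adds a seventh dimension,
# this raises instead of silently operating on the wrong axis.
swapped_named = rearrange(x, 'batch a b c d time -> time a b c d batch')

assert swapped.shape == swapped_named.shape == (7, 3, 4, 5, 6, 2)
```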