There's also pandas_exercises by Guilherme Samora (https://github.com/guipsamora/pandas_exercises) which is very good - it's split across multiple notebooks and is more extensive than my repo.
One thing that's holding me back with numpy is not knowing the runtime complexity of its operations. Of course I can profile code, but I'd like better awareness when writing it in the first place. Without an algorithms background, I don't have strong intuitions about the complexity of the primitives (e.g. np.unique). Any suggestions?
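For concreteness, this is the kind of profiling I mean: a minimal sketch that times np.unique at a few sizes (the docs say it returns sorted output, which hints at a sort under the hood, i.e. roughly O(n log n)):

    import numpy as np
    from timeit import timeit

    # Time np.unique at a few input sizes; if the per-element cost
    # grows only slowly with n, O(n log n) is a plausible fit.
    for n in (10**5, 10**6, 10**7):
        a = np.random.randint(0, n, size=n)
        t = timeit(lambda: np.unique(a), number=3) / 3
        print(f"n={n:>10,}  {t:.4f}s  ({t / n * 1e9:.1f} ns/elem)")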
If you're working on small (<= 15 by 15) matrices, the StaticArrays package [1] is also native Julia and is much faster than Base.Array. Since a StaticArray's size is known after type inference, it can be allocated on the stack, which is nice.
One downside is that unless you're doing BLAS-style operations, writing non-trivial transformations of StaticArrays always seems to require generated functions.
Anyway, I think this is a feature that numpy doesn't provide.
That only works for functions written in pure Python, though, right? Although, having said that, I'm not sure how many of the functions you'd actually want to look at the source for are written in C/Cython/Fortran.
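A quick way to check which camp a function falls into is the standard-library inspect module; for instance, np.unique turns out to be a pure-Python wrapper, while np.add is a compiled ufunc:

    import inspect
    import numpy as np

    # np.unique is defined in Python, so this prints its source:
    print(inspect.getsource(np.unique)[:300])

    # np.add is a compiled ufunc, so there is no Python source:
    try:
        inspect.getsource(np.add)
    except TypeError:
        print("np.add: no Python source (compiled)")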
What other library tells you about complexity? And as you say, if you don't know algorithms well, I'm pretty sure your own implementations won't have better complexity anyway.
C++'s standard library containers & algorithms have strict algorithmic complexity requirements & guarantees.
For example, from std::vector::insert [1]:
Complexity
1-2) Constant plus linear in the distance between pos and end of the container.
3) Linear in count plus linear in the distance between pos and end of the container.
4) Linear in std::distance(first, last) plus linear in the distance between pos and end of the container.
5) Linear in ilist.size() plus linear in the distance between pos and end of the container.
Well, there are often multiple ways of using numpy operations to do what you want, so it's good to have an idea of what numpy is doing under the hood in order to pick the right tool for the job at hand.
For example, np.einsum, for all its greatness, historically wasn't faster than np.tensordot, but it was more flexible. You can tell einsum to try to use the same underlying BLAS functions that tensordot uses (which can parallelise the computation) where applicable, and that optimisation will likely become einsum's default once the devs iron out some bugs. For now, though, it pays to know how the two methods differ, as in the sketch below.
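A rough sketch of the difference, using a plain matrix product (shapes arbitrary; the optimize flag is the knob mentioned above):

    import numpy as np

    A = np.random.rand(200, 300)
    B = np.random.rand(300, 400)

    # tensordot hands the contraction straight to BLAS:
    C1 = np.tensordot(A, B, axes=([1], [0]))

    # plain einsum runs its own C loops; optimize=True lets it
    # rewrite the contraction so it can take the same BLAS path:
    C2 = np.einsum('ij,jk->ik', A, B, optimize=True)

    assert np.allclose(C1, C2)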
Not odd at all, actually. Numpy might implement certain functions differently from other libraries. With a background in algorithms, you could make an educated guess as to complexity, but without knowing the exact implementation it's still a guess.
For #15, if the number of elements is large, it will be slower than you might expect, since the maxx function is written in pure Python.
But in my experience it is still much faster than a for loop in pure Python.
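Assuming #15 is the exercise that wraps a scalar maxx with np.vectorize, a sketch of why both observations hold: np.vectorize still calls the Python function once per element, so it only beats an explicit Python loop by a modest margin, while the compiled np.maximum ufunc is in a different league.

    import numpy as np

    def maxx(x, y):
        # scalar-only pure-Python max, as in the exercise
        return x if x >= y else y

    pair_max = np.vectorize(maxx, otypes=[float])

    a, b = np.random.rand(10**6), np.random.rand(10**6)

    # pair_max(a, b) calls maxx a million times in Python;
    # np.maximum(a, b) does the same work in one compiled loop.
    assert np.allclose(pair_max(a, b), np.maximum(a, b))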
Oh wow, I wish I'd known about r_ and c_ a few months ago! I'm still annoyed with numpy for being clunkier than Matlab for linear algebra, but resources like this are good for verifying that I'm doing stuff in a numpy-ic way. Thanks!
(Also numpy has some really nice features over Matlab, like [None,:] broadcasting and being able to index a parenthesized expression or function output without naming it. Ok, the latter is not really a feature, more of an example of how Matlab is broken as a language)
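For anyone who hadn't seen these either, a small sketch of the toys mentioned above:

    import numpy as np

    # Matlab-style row/column concatenation:
    np.r_[1:4, 10, 20]                   # array([ 1,  2,  3, 10, 20])
    m = np.c_[np.ones(3), np.arange(3)]  # 3x2 matrix

    # [None, :] inserts an axis, so broadcasting gives e.g. all
    # pairwise differences without a loop:
    x = np.arange(4)
    x[:, None] - x[None, :]              # 4x4 matrix

    # ...and you can index an expression directly, no temporary name:
    (m @ np.ones((2, 5)))[:, 0]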
I've found the following to be quite helpful but would love to know if anyone knows of other resources in a similar vein: https://pandas.pydata.org/pandas-docs/stable/cookbook.html