A really great way to learn more about NumPy is with Math Inspector[1]. It creates a block coding environment that works with the entire numpy/scipy stack, and it has an interactive 2D and 3D plotting library that extends the functionality of matplotlib. Also, I made it =) I just spent the past 3 months working day and night and released a massive update a few days ago. Check out the YouTube video at the bottom of the page for more in-depth information.
Could you elaborate more on Jupyter? I use it almost daily, and despite its warts (such as a lack of source control and explicit dependencies), it's pretty darn useful.
No doubt useful, I just find that its failings are whisked away as “you don’t need to worry about source control because it won’t work!” It prevents users from learning things that would make their lives easier because it can’t do those things; it gets them stuck at a local optimum. It’s great for what it’s great for, but as soon as you leave its usefulness domain, Jupyter becomes a burden to integrate even in a simple way.
There are IDE plugins to use the Jupyter kernels though. Atom's Hydrogen[1] or VS Code's Python support[2] come to mind. JetBrains IDEs have similar plugins.
They enable the use of both package managers and SCM without any overhead to speak of.
This is quite amazing! I might use it when I give lectures.
How abstracted is the interface? Will there be an API that lets me plug in another language, for instance? I would be incredibly surprised if that is already the case, but I thought it would not hurt to ask ;)
Python is very well suited to being mapped to a block coding environment because it has such a large number of built-in introspection helpers that enumerate all the things each module/object/function can do.
I have considered doing a similar project for JavaScript, but I had assumed it would require an entirely new codebase for each programming language given how different every language is under the hood.
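For anyone curious what that enumeration looks like in practice, here is a rough sketch of the kind of introspection Python offers (this is just an illustration, not Math Inspector's actual code):

    import inspect
    import numpy as np

    # List everything the module exposes, skipping private names.
    public_names = [name for name in dir(np) if not name.startswith("_")]

    # Pull a callable's signature and docstring, e.g. to render a block's
    # inputs and help text. Works for Python-level wrappers like np.clip
    # (C-implemented ufuncs like np.add may not expose a signature).
    sig = inspect.signature(np.clip)
    doc = inspect.getdoc(np.clip)
    print(sig)
    print(doc.splitlines()[0])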
There's always been that extra little step of going from working code to pretty graphs/animations that acts as a barrier to entry, to the point where I almost never do that part. It's not that it's hard; it's just always a PITA with a bunch of required boilerplate, and it always requires that I dig through the API.
This streamlines that process in a way I didn't think would be possible.
That said.... maybe I'm doing something really dumb, but it's basically unusable for me right now on Windows, since every time I type the letter "d" it just highlights the line in the interpreter instead of typing the letter. I see this issue has already been mentioned on GitHub though.
I'd love to keep playing around with this some more. I'm also SUUUUUPER curious about doing some GIS integrations using the qgis python libraries.
oh whoops! I saw that error earlier and forgot to fix it. The idea was to replicate Sublime Text's feature for highlighting words with ctrl+d, but there is some kind of error in detecting when ctrl is being used as a modifier key on Windows. I'm going to fix that later today and upload a new build. Thanks for the heads up
EDIT: problem should be fixed now. I have just uploaded a new windows build to the website and pushed the changes to master on GitHub. Tested it on my windows machine and it looks like the fix is working properly. Re-downloading the installer file and installing again should resolve the issue.
Great question! The plotting library is its own standalone module, and there is a `plot` function which accepts an arbitrary number of arguments. This is how the app is able to plot multiple graphs at the same time.
Using a tuple for a point and a list for a line makes it easy to plot points and lines at the same time, with a single function call and without any nasty keyword arguments.
For example
>>> plot((0,0), (1,1), [(0,0),(1,1)])
The other reason was that it makes the code a lot nicer in some places. I tried using a keyword argument at first and there was a lot of additional logic to keep track of.
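A toy sketch of that kind of dispatch (just an illustration built on matplotlib, not the actual Math Inspector implementation) might look like:

    import matplotlib.pyplot as plt

    def plot(*args):
        # Tuples are treated as single (x, y) points, lists as polylines of points.
        fig, ax = plt.subplots()
        for arg in args:
            if isinstance(arg, tuple):
                ax.scatter(*arg)
            elif isinstance(arg, list):
                xs, ys = zip(*arg)
                ax.plot(xs, ys)
        plt.show()

    plot((0, 0), (1, 1), [(0, 0), (1, 1)])  # two points plus the line between them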
wow this is fantastic. not just the product itself but the website. I can tell you put a lot of love into this project and I'm downloading it as I type this.
There is! I haven't built and codesigned the Linux version yet (I just ran out of steam and needed a few days of rest), but I set up everything in a VM and got it working on my Mac through X11.
If you follow the instructions in the install.sh[1] script (which is for doing VM testing with Vagrant), it should hopefully be simple to install from the source code on your system.
NumPy is incredible to me: it serves not only as a critical bare-metal layer, but also an essential tool in its own right, and I can’t imagine using Python without it. I do numerical work—ML/data science/statistics—and I simply couldn’t accomplish any of what I do without the functionality provided by NumPy. np.array alone is worth all the king’s gold.
It is interesting to me that in many respects it serves as “guts”. I definitely drop down to use numpy directly with regularity. But it’s also possible to do 80-90% of the job without ever explicitly using the module itself. It’s baked in everywhere, to the point that it feels like just another standard library.
Exciting to see progress! Keep up the good work, numpy team :)
Heh, I’m probably misusing that term—forgive me, I’m just a script kiddy! :D
I meant that it can serve the role of low-level, nitty-gritty machinery, entirely separate from its use as a module in its own right.
Eg, pandas (in some sense) is just a convenience layer on top of numpy—but to me that’s like saying any piece of software is “just a convenience layer on top of python.” It’s partly a means to an end! Not just an end unto itself.
Numpy enables a whole new class of functionality, independent of its direct use as part of my software. Is there a better word for that?
Quick summary: "This NumPy release is the largest made to date; some 684 PRs contributed by 184 people have been merged."
> Annotations for NumPy functions. This work is ongoing and improvements can be expected pending feedback from users.
> Wider use of SIMD to increase execution speed of ufuncs. Much work has been done in introducing universal functions that will ease use of modern features across different hardware platforms. This work is ongoing.
> Preliminary work in changing the dtype and casting implementations in order to provide an easier path to extending dtypes. This work is ongoing but enough has been done to allow experimentation and feedback.
> Extensive documentation improvements comprising some 185 PR merges. This work is ongoing and part of the larger project to improve NumPy’s online presence and usefulness to new users.
> Further cleanups related to removing Python 2.7. This improves code readability and removes technical debt.
> Preliminary support for the upcoming Cython 3.0.
Type annotations seem like the biggest deal to me. I'd say if you care a lot about SIMD and performance issues, you should be thinking of moving to Julia, though the SIMD work is still a valuable technical achievement.
I would rephrase your statement as: "If you care about SIMD, performance issues, and type annotations, you should look into Julia."
Numpy is an incredible piece of software and provides performance for one of the most mainstream languages. It has been one of the main building blocks in the Python takeover of data science, ML, etc. But if I had the choice, I would have moved to Julia during my previous work/projects as soon as it reached v1.
The type annotation story is indeed better with Julia, but having type annotations for NumPy is beneficial for the many users for whom Julia isn't a win: where number crunching isn't the main thing going on, where Python's better library situation matters, and where you want to avoid the complication of calling Python from Julia.
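For example, something along these lines is now possible with the new numpy.typing module (`normalize` here is just a made-up helper for illustration):

    import numpy as np
    import numpy.typing as npt

    # ArrayLike covers anything NumPy can coerce to an array: lists, scalars, ndarrays.
    def normalize(x: npt.ArrayLike) -> np.ndarray:
        a = np.asarray(x, dtype=np.float64)
        return (a - a.mean()) / a.std()

    normalize([1, 2, 3])  # type checkers can now reason about the argument and return types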
We should be careful to appreciate the “types” in Julia for what they are at their essence: a way to direct dispatch of methods. Certainly the Julia compiler reasons about types in order to generate efficient code. But Julia types also affect the meaning of programs, not just the performance. In fact, the presence or absence of “type annotations” on the arguments in a method definition doesn’t affect the code that that method generates. It simply affects whether that method gets dispatched to or not for a given function call.
I think it’s helpful to consider that this particular use of types is morally different from when type annotations are used to 1. Document intent, 2. Run programs faster, 3. Reason about correctness statically.
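A very loose Python analogue of that distinction, using functools.singledispatch (single-argument dispatch only, unlike Julia's multiple dispatch):

    from functools import singledispatch

    # The annotation picks WHICH implementation runs; it doesn't change
    # how any individual implementation behaves or is compiled.
    @singledispatch
    def describe(x):
        return "something else"

    @describe.register
    def _(x: int):
        return "an integer"

    @describe.register
    def _(x: list):
        return "a list"

    describe(3)       # 'an integer'
    describe([1, 2])  # 'a list'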
"sliding_window_view" method will be a great addition. Sliding windows are used quite frequently when analysing data and currently you can kind of achieve it using "as_strided" method but that method is very cumbersome to use. But from the examples given, "sliding_window_view" is much easier to use.
Basically, across projects, I've been reusing a snippet that uses some as_strided magic for years now. The snippet looks seriously deranged, it will be great to refer to something built in... also for my colleagues who now have to understand my as_strided shits.
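For comparison, roughly what the two approaches look like on a 1-D array:

    import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view, as_strided

    x = np.arange(6)

    # New in 1.20: just say what you want.
    sliding_window_view(x, 3)
    # array([[0, 1, 2],
    #        [1, 2, 3],
    #        [2, 3, 4],
    #        [3, 4, 5]])

    # The old as_strided route needs manual shape/stride bookkeeping
    # (and can silently read out of bounds if you get it wrong).
    w = 3
    as_strided(x, shape=(len(x) - w + 1, w), strides=(x.strides[0], x.strides[0]))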
I _love_ numpy, and I am getting excited about jax, too.
However, I do have one request for it. Getting the argmax of a multi-dimensional array, in terms of the array's dimensions, is difficult for new users.
np.argmax(np.array([[1,2,3],[1,9,3],[1,2,3]])) is 4, rather than (1,1). I understand why, but it seems strange to me that argmax cannot return a value the user can use to index their array.
Having to then feed that `4` into unravel_index() with the array's shape as a parameter seems less elegant than, say, passing an "as_index=True" parameter to argmax.
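The current two-step dance, for anyone who hasn't hit this:

    import numpy as np

    a = np.array([[1, 2, 3],
                  [1, 9, 3],
                  [1, 2, 3]])

    flat = np.argmax(a)                    # 4 -- index into the flattened array
    idx = np.unravel_index(flat, a.shape)  # (1, 1)
    a[idx]                                 # 9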
NumPy/Pandas is one of the reasons I'm stuck on Python. I'd prefer a strongly typed language like Go/Java/Rust, but there isn't the same library ecosystem or community. Any recommendations? It's for applications, not just DS/ML, so Julia isn't really in scope.
Because for some reason data scientists have managed to misuse Python syntax. E.g. `np.ogrid[ -200000000:200000000:100j,-500000:500000:100j]`: it's completely inexplicable what this does. They have managed to overload the index/slice operator and imaginary numbers to produce two arrays.
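For the record, the imaginary "step" is interpreted as a point count (endpoint inclusive, like linspace rather than arange), and ogrid returns open, broadcastable grid axes:

    import numpy as np

    np.ogrid[0:1:5j]
    # array([0.  , 0.25, 0.5 , 0.75, 1.  ])  -- 5 evenly spaced points, endpoint included

    # With two slices you get two broadcastable axes instead of full 2-D grids:
    y, x = np.ogrid[0:2:3j, 0:1:2j]
    y.shape, x.shape   # ((3, 1), (1, 2))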
I've been getting some odd warnings (thousands of them) from CI runs against 1.20, typical:
some-file.py:83: DeprecationWarning: `np.bool` is a
deprecated alias for the builtin `bool`. To silence
this warning, use `bool` by itself. Doing this will
not modify any behavior and is safe. If you specifically
wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance:
https://numpy.org/devdocs/release/1.20.0/notes.html#deprecations
return self.z[j][i]
But none of the lines mentioned in these warnings reference np.bool.
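For reference, the deprecated spelling and the warning-free replacements (whether the usage is in your own code or somewhere in a dependency):

    import numpy as np

    mask = np.zeros(3, dtype=np.bool)   # emits the DeprecationWarning in 1.20

    mask = np.zeros(3, dtype=bool)      # the builtin works fine as a dtype
    mask = np.zeros(3, dtype=np.bool_)  # the actual NumPy scalar type, if you need it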
I haven't seriously used Python for quite a while, so I don't know if this is possible/usual. Maybe there's some kind of code generation happening at runtime that would throw off line numbers, or add/remove/substitute lines compared to the on-disk script files?
That's what I meant by code generation: that maybe NumPy at runtime adds/removes/substitutes some code to optimize dynamically, although I don't know if that's a thing in Python/NumPy.
TF-NumPy is not a competitor to NumPy. It just introduces a small subset of the NumPy API into the TF codebase. In TF you still have immutable tensors, so something like `tensor[mask] = new_value` doesn't work.
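A quick illustration of the difference (the TensorFlow part is sketched in comments, assuming TF is installed):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    mask = a > 1.5
    a[mask] = 0.0   # fine: NumPy arrays are mutable in place

    # TensorFlow tensors are immutable, so t[mask] = 0.0 isn't available;
    # a functional replacement looks something like:
    #   import tensorflow as tf
    #   t = tf.constant([1.0, 2.0, 3.0])
    #   t = tf.where(t > 1.5, tf.zeros_like(t), t)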
[1] https://mathinspector.com/