NumPy 1.20

calhoun137 · on Jan 31, 2021

A really great way to learn more about numpy is with Math Inspector[1]. It creates a block coding environment that works with the entire numpy/scipy stack, and it also has an interactive 2D and 3D plotting library that builds on the functionality in matplotlib. Also, I made it =) I just spent the past 3 months working day and night on it and released a massive update a few days ago. Check out the YouTube video at the bottom of the page for more in-depth information.

[1] https://mathinspector.com/

ksm1717 · on Jan 31, 2021

Fantastic. Jupyter is a handicap disguised as dataman’s best friend. This is actually intuitive.

calhoun137 · on Jan 31, 2021

Thank you!!! I worked so hard on this project, and really appreciate the kind words

Scene_Cast2 · on Jan 31, 2021

Could you elaborate more on Jupyter? I use it almost daily, and despite its warts (such as lack of source control and a lack of explicit dependencies), it's pretty darn useful.

ksm1717 · on Jan 31, 2021

No doubt it’s useful; I just find that its failings are whisked away as “you don’t need to worry about source control because it won’t work!” It prevents users from learning things that would make their lives easier, because it can’t do those things; it gets them stuck at a local optimum. It’s great for what it’s great for, but as soon as you leave its domain of usefulness, Jupyter becomes a burden to integrate even in a simple way.

411111111111111 · on Jan 31, 2021

There are IDE plugins that use Jupyter kernels, though. Atom's Hydrogen [1] or VS Code's Python support [2] come to mind, and JetBrains IDEs have similar plugins.

They let you use both package managers and SCM with pretty much no overhead to speak of.

> 1: https://blog.nteract.io/hydrogen-interactive-computing-in-at...

> 2: https://devblogs.microsoft.com/python/data-science-with-pyth...

dunefox · on Jan 31, 2021

There's Pluto for Julia: https://github.com/fonsp/Pluto.jl

throwaway10102 · on Jan 31, 2021

You can use it with source control, I do it for about 18 notebooks I use on a daily basis:

https://github.com/kynan/nbstripout
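For anyone wondering what "stripping" means here: a notebook is a JSON file whose code cells carry `outputs` and an `execution_count`, and those are what make diffs noisy. The sketch below is a simplified illustration of the idea, not nbstripout's actual implementation (which handles more, such as metadata); `strip_outputs` and `strip_file` are hypothetical names. In practice, `nbstripout --install` registers the tool as a git filter for the current repository.

```python
import json
from typing import Any, Dict


def strip_outputs(nb: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a notebook (nbformat 4 dict) with code-cell outputs cleared."""
    cleaned = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results, plots, tracebacks
            cell["execution_count"] = None  # drop the In [n] counter
    return cleaned


def strip_file(path: str) -> None:
    """Rewrite an .ipynb file in place with its outputs stripped (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```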

krastanov · on Jan 31, 2021

This is quite amazing! I might use it when I give lectures.

How abstracted is the interface? Will there be an API that lets me plug in another language, for instance? I would be incredibly surprised if that were already the case, but I thought it wouldn't hurt to ask ;)

calhoun137 · on Jan 31, 2021

Python is very well suited for being mapped to a block coding environment because it has such a large number of helper functions which enumerate all the things each module/object/function can do.
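Python's standard `inspect` module is one example of those helpers: it can list a module's public callables and report their signatures, which is the kind of information a block editor could turn into blocks. This is a minimal sketch, not how Math Inspector actually does it; `list_callables` is a hypothetical name. Some C-implemented callables expose no signature, so those map to `None`.

```python
import inspect
import json
from types import ModuleType
from typing import Dict, Optional


def list_callables(module: ModuleType) -> Dict[str, Optional[str]]:
    """Map each public callable in `module` to its signature string (or None)."""
    result: Dict[str, Optional[str]] = {}
    for name, obj in inspect.getmembers(module, callable):
        if name.startswith("_"):
            continue  # skip private helpers
        try:
            result[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            result[name] = None  # some builtins have no introspectable signature
    return result


if __name__ == "__main__":
    for name, sig in list_callables(json).items():
        print(name, sig)
```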

I have considered doing a similar project for javascript, but I had assumed it would require an entire new codebase for each programming language given how different every language is under the hood.

jarmitage · on Jan 31, 2021

Great project - you're on the front page: https://news.ycombinator.com/item?id=25978962

calhoun137 · on Jan 31, 2021

getting on the front page like this was quite unexpected and has completely made my day =) =) =)

Enginerrrd · on Jan 31, 2021

Dude, this is a GREAT concept!

There's always been that extra little step of going from working code to pretty graphs/animations, and it's enough of a barrier to entry that I almost never do that part. It's not that it's hard; it's just always a PITA with a bunch of required boilerplate, and it always requires digging through the API.

This streamlines that process in a way I didn't think would be possible.

That said... maybe I'm doing something really dumb, but it's basically unusable for me on Windows right now, since every time I type the letter "d" it just highlights the line in the interpreter instead of typing the letter. I see this issue has already been mentioned on GitHub, though.

I'd love to keep playing around with this some more. I'm also SUUUUUPER curious about doing some GIS integrations using the QGIS Python libraries.

calhoun137 · on Jan 31, 2021

Oh, whoops! I saw that error earlier and forgot to fix it. The idea was to replicate Sublime Text's feature for highlighting words with Ctrl+D, but there is some kind of error in detecting when Ctrl is being used as a modifier key on Windows. I'm going to fix that later today and upload a new build. Thanks for the heads up.

EDIT: The problem should be fixed now. I have just uploaded a new Windows build to the website and pushed the changes to master on GitHub. I tested it on my Windows machine and it looks like the fix is working properly. Re-downloading the installer and installing again should resolve the issue.

gugagore · on Jan 31, 2021

I really want more interfaces like this, with diagram representations of source code, and convenient and interactive plotting. Thank you!

Can you speak to why a scatter plot and a line plot are distinguished using `(xs, ys, zs)` and `[xs, ys, zs]`?

calhoun137 · on Jan 31, 2021

Great question! The plotting library is its own standalone module, and there is a `plot` function that accepts an arbitrary number of arguments. This is how the app is able to plot multiple graphs at the same time.

Using tuple/list for points/lines makes it easy to plot points and lines at the same time, with a single function call, without any nasty keyword arguments.

For example

>>> plot((0,0), (1,1), [(0,0),(1,1)])

The other reason was it makes the code a lot nicer in some places. I tried using a keyword argument at first and there was a lot of additional logic to keep track of.
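The tuple/list convention can be dispatched on with plain `isinstance` checks. The following is a hypothetical sketch of that idea (the name `classify_plot_args` is made up, and this is not Math Inspector's actual code): tuples become points, lists become lines through points, so one call can mix both without keyword arguments.

```python
from typing import Any, Dict, List, Tuple


def classify_plot_args(*args: Any) -> Dict[str, List[Any]]:
    """Split plot(...) arguments: tuples are points, lists are lines through points."""
    points: List[Tuple[float, ...]] = []
    lines: List[List[Tuple[float, ...]]] = []
    for arg in args:
        if isinstance(arg, tuple):
            points.append(arg)
        elif isinstance(arg, list):
            lines.append(arg)
        else:
            raise TypeError(f"unsupported plot argument: {arg!r}")
    return {"points": points, "lines": lines}


# Mirrors the example call above: two points plus one segment joining them.
parts = classify_plot_args((0, 0), (1, 1), [(0, 0), (1, 1)])
print(parts)  # {'points': [(0, 0), (1, 1)], 'lines': [[(0, 0), (1, 1)]]}
```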

cultofmetatron · on Jan 31, 2021

wow this is fantastic. not just the product itself but the website. I can tell you put a lot of love into this project and I'm downloading it as I type this.

sandGorgon · on Jan 31, 2021

No love for linux? :(

calhoun137 · on Jan 31, 2021

There is! I didn't build and codesign the Linux version yet (I just ran out of steam and needed a few days of rest), but I set up everything in a VM and got it working on my Mac through X11.

If you follow the instructions in the install.sh[1] script (which is for doing VM testing with Vagrant), it should hopefully be simple to install from the source code on your system.

https://github.com/MathInspector/MathInspector/blob/master/i...

michaericalribo · on Jan 31, 2021

NumPy is incredible to me: it serves not only as a critical bare-metal layer, but also an essential tool in its own right, and I can’t imagine using Python without it. I do numerical work—ML/data science/statistics—and I simply couldn’t accomplish any of what I do without the functionality provided by NumPy. np.array alone is worth all the king’s gold.

It is interesting to me that in many respects it serves as “guts”. I definitely drop down to use numpy directly with regularity. But it’s also possible to do 80-90% of the job without ever explicitly using the module itself. It’s baked in everywhere, to the point that it feels like just another standard library.

Exciting to see progress! Keep up the good work, numpy team :)

gspr · on Jan 31, 2021

I agree with everything you say, but I don't understand this part:

> it serves not only as a critical bare-metal layer

What's bare-metal about NumPy??

michaericalribo · on Jan 31, 2021

Heh, I’m probably misusing that term—forgive me, I’m just a script kiddy! :D

I meant that it can serve the role of low-level, nitty-gritty machinery, entirely separate from its use as a module in its own right.

Eg, pandas (in some sense) is just a convenience layer on top of numpy—but to me that’s like saying any piece of software is “just a convenience layer on top of python.” It’s partly a means to an end! Not just an end unto itself.

Numpy enables a whole new class of functionality, independent of its direct use as part of my software. Is there a better word for that?

m3at · on Jan 31, 2021

I think you sum it up nicely! The better word you're looking for might be cornerstone?

moreati · on Jan 31, 2021

perhaps foundation, or base layer.

michaericalribo · on Jan 31, 2021

Sidebar: I originally used the word “gunmetal” and that...was even less correct.

Get this guy a dictionary!

sandGorgon · on Jan 31, 2021

Curious question: do you not use dataframes instead? Isn't that much nicer as a UX?

michaericalribo · on Jan 31, 2021

Yeah! That’s what I mean...under the hood data frames use numpy arrays, but the abstractions of pandas are much easier to work with.

chalst · on Jan 31, 2021

Quick summary: "This NumPy release is the largest so made to date, some 684 PRs contributed by 184 people have been merged."

> Annotations for NumPy functions. This work is ongoing and improvements can be expected pending feedback from users.

> Wider use of SIMD to increase execution speed of ufuncs. Much work has been done in introducing universal functions that will ease use of modern features across different hardware platforms. This work is ongoing.

> Preliminary work in changing the dtype and casting implementations in order to provide an easier path to extending dtypes. This work is ongoing but enough has been done to allow experimentation and feedback.

> Extensive documentation improvements comprising some 185 PR merges. This work is ongoing and part of the larger project to improve NumPy’s online presence and usefulness to new users.

> Further cleanups related to removing Python 2.7. This improves code readability and removes technical debt.

> Preliminary support for the upcoming Cython 3.0.

Type annotations seem like the biggest deal to me. I'd say that if you care a lot about SIMD and performance, you should be thinking about moving to Julia; still, the SIMD work is a valuable technical achievement.
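As a rough illustration of what the annotations enable: NumPy 1.20 ships a `numpy.typing` module with aliases such as `ArrayLike` for "anything convertible to an array", which a checker like mypy can use. A minimal sketch; `normalize` is just an example function, not part of NumPy.

```python
import numpy as np
import numpy.typing as npt


def normalize(values: npt.ArrayLike) -> np.ndarray:
    """Scale values so they sum to 1; accepts lists, tuples or arrays."""
    arr = np.asarray(values, dtype=float)
    total = arr.sum()
    if total == 0:
        raise ValueError("values must not sum to zero")
    return arr / total


print(normalize([1, 1, 2]).tolist())  # [0.25, 0.25, 0.5]
```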

notagoodidea · on Jan 31, 2021

I would rephrase your statement as: "If you care about SIMD, performance issues and type annotations, you should look into Julia".

NumPy is an incredible piece of software and brings performance to one of the most mainstream languages. It has been one of the main building blocks of the Python takeover in data science, ML, etc. But if I had had the choice, I would have moved to Julia in my previous work/projects as soon as it reached v1.

chalst · on Jan 31, 2021

The type annotation story is indeed better in Julia, but having type annotations for NumPy is beneficial for the many users for whom Julia isn't a win: cases where number crunching isn't the main thing going on, where Python's better library situation matters, and where you want to avoid the complication of calling Python from Julia.

gugagore · on Jan 31, 2021

We should be careful to appreciate the “types” in Julia for what they are at their essence: a way to direct the dispatch of methods. Certainly the Julia compiler reasons about types in order to generate efficient code. But Julia types also affect the meaning of programs, not just their performance. In fact, the presence or absence of “type annotations” on arguments in a method definition doesn't affect the code generated for that method. It only affects whether that method gets dispatched to for a given function call.

I think it’s helpful to consider that this particular use of types is morally different from when type annotations are used to 1. Document intent, 2. Run programs faster, 3. Reason about correctness statically.
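Python has a much more limited analogue of types steering dispatch rather than speed: `functools.singledispatch`, where the annotation on the first parameter decides which implementation runs. This is single dispatch on one argument only, not Julia's multiple dispatch, so treat it as an analogy rather than an equivalent. `describe` is a made-up example.

```python
from functools import singledispatch


@singledispatch
def describe(x: object) -> str:
    """Fallback implementation for types with no registered method."""
    return "something else"


@describe.register
def _(x: int) -> str:
    # Selected purely from the annotation `int` on the first parameter.
    return "an integer"


@describe.register
def _(x: float) -> str:
    return "a float"


print(describe(3), describe(3.0), describe("3"))  # an integer a float something else
```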

ZuLuuuuuu · on Jan 31, 2021

The new `sliding_window_view` function will be a great addition. Sliding windows are used quite frequently when analysing data, and currently you can kind of achieve them with `as_strided`, but that function is very cumbersome to use. From the examples given, `sliding_window_view` is much easier to use.
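For reference, a minimal sketch of the new function (it lives in `numpy.lib.stride_tricks` as of 1.20): it returns a read-only view of every window, so a moving average becomes a reduction over the last axis. The array values are only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(6)                       # [0 1 2 3 4 5]
windows = sliding_window_view(x, 3)    # view of shape (4, 3): [0 1 2], [1 2 3], ...
moving_avg = windows.mean(axis=-1)     # 3-point moving average

# Window shapes can be multi-dimensional, e.g. all 2x2 patches of a 3x4 "image".
img = np.arange(12).reshape(3, 4)
patches = sliding_window_view(img, (2, 2))  # shape (2, 3, 2, 2)

print(moving_avg.tolist())  # [1.0, 2.0, 3.0, 4.0]
print(patches.shape)        # (2, 3, 2, 2)
```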

isoprophlex · on Jan 31, 2021

Yes! This is a big win for me!

Basically, across projects, I've been reusing a snippet that uses some as_strided magic for years now. The snippet looks seriously deranged, so it will be great to refer to something built in... also for my colleagues, who now have to understand my as_strided shit.
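For comparison, the kind of `as_strided` snippet being replaced looks roughly like this for the 1-D case. A sketch, with `windows_as_strided` as a hypothetical name. The danger is that `as_strided` does no bounds checking, so a wrong shape or stride silently reads memory outside the array, which is why the length check matters.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def windows_as_strided(x: np.ndarray, w: int) -> np.ndarray:
    """Overlapping length-w windows of a 1-D array, built by hand with as_strided."""
    x = np.asarray(x)
    if x.ndim != 1 or not 1 <= w <= x.shape[0]:
        raise ValueError("need a 1-D array and 1 <= w <= len(x)")
    n_windows = x.shape[0] - w + 1
    step = x.strides[0]  # bytes between consecutive elements
    # Each row starts one element after the previous one, so both strides are `step`.
    return as_strided(x, shape=(n_windows, w), strides=(step, step), writeable=False)


print(windows_as_strided(np.arange(5), 3).tolist())  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```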

jphoward · on Jan 31, 2021

I _love_ numpy, and I am getting excited about jax, too.

However, I do have one request for it. Getting the argmax of a multi-dimensional array, in terms of the array's dimensions, is difficult for new users.

np.argmax(np.array([[1,2,3],[1,9,3],[1,2,3]])) is 4, rather than (1,1). I understand why, but it seems strange to me that argmax cannot return a value the user can use to index their array.

Having to then feed that `4` into unravel_index() with the array's shape as a parameter seems less elegant than say passing a parameter of "as_index=True" to the argmax.
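In the meantime, the two-step version is short enough to wrap. A minimal sketch; `argmax_nd` is a hypothetical helper, not a NumPy function.

```python
from typing import Tuple

import numpy as np


def argmax_nd(a: np.ndarray) -> Tuple[int, ...]:
    """Return the index of the maximum as a tuple usable directly as a[idx]."""
    flat_idx = np.argmax(a)  # index into the flattened array
    return tuple(int(i) for i in np.unravel_index(flat_idx, a.shape))


a = np.array([[1, 2, 3], [1, 9, 3], [1, 2, 3]])
idx = argmax_nd(a)
print(idx, a[idx])  # (1, 1) 9
```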

kakadzhun · on Feb 12, 2021

Consider this:

  In [1]: np.argmax(np.array([[1,2,3],[1,9,3],[1,2,3]]).flat)
  Out[1]: 4

montebicyclelo · on Jan 31, 2021

Alternatively you could use flat:

  a = np.array([[1,2,3],[1,9,3],[1,2,3]])
  idx = np.argmax(a)
  a.flat[idx]  # 9

6gvONxR4sf7o · on Jan 31, 2021

Does that work the same way with strided arrays?

montebicyclelo · on Jan 31, 2021

Assuming you mean what I think you mean, it does work.

e.g. a[::2, ::3].flat[idx], where idx runs from 0 to width*height - 1 of the view

(idx can also be a NumPy array, for getting multiple values)
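A small check of that claim, using a non-contiguous view (the array contents are arbitrary): `.flat` walks the view in C (row-major) order, so the flat index from `argmax` on the view lines up with `view.flat`, and an index array fetches several values at once.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)
view = a[::2, ::3]  # rows 0 and 2, columns 0 and 3: [[0, 3], [12, 15]]

idx = np.argmax(view)               # flat index within the view, not within `a`
print(view.flat[idx])               # 15
print(view.flat[[0, 3]].tolist())   # [0, 15]
```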

u678u · on Jan 31, 2021

NumPy/Pandas is one of the reasons I'm stuck on Python. I'd prefer a strongly typed language like Go/Java/Rust, but there isn't the same library ecosystem or community. Any recommendations? It's for applications, not just DS/ML, so Julia isn't really in scope.

optimalsolver · on Jan 31, 2021

Python is strongly typed. Did you mean statically typed?

You can use optional type hints in Python now, and install mypy as a type checker.
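To make the distinction concrete: "strong" means Python refuses to silently mix incompatible types at runtime, while "static" checking is what mypy adds on top of annotations, before the code runs. A small sketch; the function names are illustrative.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Annotated for a static checker; Python itself does not enforce the hints."""
    return sum(values) / len(values)


def str_plus_int_raises() -> bool:
    """Strong typing: '1' + 1 is a runtime TypeError, not an implicit conversion."""
    try:
        "1" + 1  # type: ignore[operator]
    except TypeError:
        return True
    return False


print(mean([1.0, 2.0, 3.0]), str_plus_int_raises())  # 2.0 True
# A checker such as mypy would reject mean("abc") before running anything,
# since a str is not a Sequence[float].
```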

throwaway10102 · on Jan 31, 2021

Latest Python (> 3.7?) typing combined with mypy --strict is the best of both worlds. Highly suggest you try it out.

eigenspace · on Jan 31, 2021

Julia is not just a datascience and ML language. It's a very nice general purpose programming language.

throwaway9930 · on Jan 31, 2021

If you know Python, then trying NumPy is a pain.

Because for some reason data scientists have managed to misuse the Python syntax. E.g. with `np.ogrid[ -200000000:200000000:100j,-500000:500000:100j]`, it's completely inexplicable what this does. They have managed to overload the index/slice operator and imaginary numbers to produce two arrays.

(Example taken from here https://asecuritysite.com/comms/plot06 )
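For what it's worth, the snippet does have a documented meaning: in `np.ogrid` (and `np.mgrid`), a complex step like `100j` means "this many points, endpoint included", as in `np.linspace`, while a real step means a spacing, as in `np.arange`. `ogrid` returns an "open" grid: one array per axis, shaped so they broadcast against each other. A sketch of the explicit equivalent:

```python
import numpy as np

# The original one-liner: 100 points along each axis, endpoints included.
x, y = np.ogrid[-200000000:200000000:100j, -500000:500000:100j]

# The same grid spelled out with linspace and explicit axis placement.
x_explicit = np.linspace(-200000000, 200000000, 100)[:, np.newaxis]  # shape (100, 1)
y_explicit = np.linspace(-500000, 500000, 100)[np.newaxis, :]        # shape (1, 100)

print(x.shape, y.shape)  # (100, 1) (1, 100)
print(np.allclose(x, x_explicit), np.allclose(y, y_explicit))  # True True
```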

jjgreen · on Jan 31, 2021

I've been getting some odd warnings (thousands of them) from CI runs against 1.20. A typical one:

    some-file.py:83: DeprecationWarning: `np.bool` is a
    deprecated alias for the builtin `bool`. To silence 
    this warning, use `bool` by itself. Doing this will 
    not modify any behavior and is safe. If you specifically
    wanted the numpy scalar type, use `np.bool_` here.
    Deprecated in NumPy 1.20; for more details and guidance: 
    https://numpy.org/devdocs/release/1.20.0/notes.html#deprecations

    return self.z[j][i]

But none of the lines mentioned in these warnings reference np.bool.

Lev1a · on Jan 31, 2021

I haven't seriously used Python for quite a while, so I don't know if this is possible or usual, but maybe there's some kind of runtime code generation that throws off the line numbers, or adds, removes or substitutes lines compared to the on-disk script files?

jjgreen · on Jan 31, 2021

I don't actually use "np.bool" anywhere in the project :-|

Lev1a · on Jan 31, 2021

That's what I meant by code generation: maybe NumPy adds/removes/substitutes some code at runtime to optimize dynamically, although I don't know if that's a thing in Python/NumPy.

ssl232 · on Jan 31, 2021

Are you using libraries that use `np.bool`? These could trigger warnings to stderr that you would see.

jjgreen · on Feb 1, 2021

That was the issue: https://github.com/numpy/numpy/issues/18281
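When a deprecation warning fires from inside a dependency, one way to find the real origin is to escalate it to an exception, which produces a full traceback. From the command line, `python -W error::DeprecationWarning your_script.py` does that globally. Below is a minimal in-process sketch; `find_deprecation_source` is a hypothetical helper, and `legacy` merely stands in for library code.

```python
import traceback
import warnings
from typing import Any, Callable, Optional


def find_deprecation_source(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Optional[str]:
    """Run func with DeprecationWarning turned into an error.

    Returns the formatted traceback of the first DeprecationWarning raised,
    or None if func finishes without emitting one.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func(*args, **kwargs)
        except DeprecationWarning:
            return traceback.format_exc()
    return None


def legacy() -> None:
    # Stand-in for library code that still uses a deprecated alias.
    warnings.warn("`np.bool` is a deprecated alias", DeprecationWarning, stacklevel=2)


tb = find_deprecation_source(legacy)
print(tb)  # shows the full call chain down to the warn() call
```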

user2049 · on Jan 31, 2021

It is great to see type annotations! A huge step in the right direction.

enriquto · on Jan 31, 2021

We are slowly approaching the ease of use and efficiency of Fortran. In a few more decades we'll be there!

klyrs · on Jan 31, 2021

Modern Fortran has classes, too! I've occasionally joked that Fortran 25 and Python 4 are going to be the same language...

teddyh · on Feb 1, 2021

“I don't know what the language of the [future] will look like, but I know it will be called Fortran.”

— Tony Hoare, 1982

gugagore · on Jan 31, 2021

I assume you say it at least a bit in jest, and with a good deal of sincerity.

Do you know of any interactive environment for producing Fortran code? Anything like a REPL?

cycomanic · on Jan 31, 2021

In development, but there is lfortran: https://lfortran.org/

jabl · on Jan 31, 2021

The next step after type annotations will be having the type specified by the first character in the variable name.

mrcactu5 · on Feb 1, 2021

There's a random shuffle function, so you don't have to write one yourself from an algorithms textbook. Lots of data types get randomly shuffled during coding.

Shuffles arrays and lists. https://numpy.org/doc/stable/reference/random/generated/nump...
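A minimal sketch with the `Generator` API (available since NumPy 1.17); the legacy `np.random.shuffle` works too. `shuffle` permutes in place along the first axis, while `permutation` returns a shuffled copy. The seed is arbitrary and only there for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded generator for reproducible runs

deck = np.arange(10)
rng.shuffle(deck)  # in place; returns None

original = np.arange(10)
shuffled = rng.permutation(original)  # new array; `original` is left untouched

# Multi-dimensional arrays are shuffled along the first axis only:
# whole rows move, but each row's contents stay together.
grid = np.arange(6).reshape(3, 2)
rng.shuffle(grid)
```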

sandGorgon · on Jan 31, 2021

The most interesting question here is about the newest competitor to NumPy, TensorFlow: https://www.tensorflow.org/guide/tf_numpy

With the added advantage that TF is natively accelerated on M1.

Will TF-NumPy maintain parity?

qoqosz · on Jan 31, 2021

TF-NumPy is not a competitor to NumPy. It just brings a small subset of the NumPy API into the TF codebase. In TF you still have immutable tensors, so something like `tensor[mask] = new_value` doesn't work.
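To illustrate the difference in plain NumPy: masked assignment mutates an array in place, whereas immutable tensors push you toward building a new array from a condition. This is a sketch only; `tf.where` is named as the TensorFlow counterpart of the functional style, and the values are arbitrary.

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0, -4.0])
mask = a < 0

# Mutable NumPy style: boolean-mask assignment edits the array in place.
clipped_inplace = a.copy()
clipped_inplace[mask] = 0.0

# Functional style: build a new array instead of mutating one. This is the pattern
# immutable tensors push you toward (TensorFlow's counterpart is tf.where).
clipped_functional = np.where(mask, 0.0, a)

print(clipped_inplace.tolist(), clipped_functional.tolist())  # [1.0, 0.0, 3.0, 0.0] [1.0, 0.0, 3.0, 0.0]
```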

RocketSyntax · on Jan 31, 2021

Hmm. I am checking dtypes via `np.floating` and `np.signedinteger`. What will this change to?
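If I read the 1.20 notes correctly, the deprecation targets aliases of Python builtins such as `np.bool`, `np.int` and `np.float`; abstract scalar types like `np.floating` and `np.signedinteger` are not builtin aliases, so checks written with `np.issubdtype` against them should keep working. A small sketch; `dtype_kind` is a hypothetical helper.

```python
import numpy as np


def dtype_kind(arr) -> str:
    """Classify an array's dtype using NumPy's abstract scalar hierarchy."""
    dt = np.asarray(arr).dtype
    if np.issubdtype(dt, np.floating):
        return "float"
    if np.issubdtype(dt, np.signedinteger):
        return "signed int"
    if np.issubdtype(dt, np.unsignedinteger):
        return "unsigned int"
    return "other"


print(dtype_kind([1.5, 2.0]), dtype_kind(np.array([1], dtype=np.int32)))  # float signed int
```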