NumPy 1.20 (numpy.org)
168 points by heydenberk on Jan 31, 2021 | 60 comments



A really great way to learn more about numpy is with Math Inspector[1]. It creates a block coding environment which works with the entire numpy/scipy stack. It also has an interactive 2d and 3d plotting library that updates the functionality in matplotlib. Also, I made it =) Just spent the past 3 months working day and night and released a massive update a few days ago. Check out the youtube video on the bottom of the page for more in depth information.

[1] https://mathinspector.com/


Fantastic. Jupyter is a handicap disguised as dataman’s best friend. This is actually intuitive


Thank you!!! I worked so hard on this project, and really appreciate the kind words


Could you elaborate more on Jupyter? I use it almost daily, and despite its warts (such as lack of source control and a lack of explicit dependencies), it's pretty darn useful.


No doubt useful, I just find that its failings are whisked away as “you don’t need to worry about source control because it won’t work!” It prevents users from learning things that would make their lives easier because it can’t do those things, so it gets them stuck at a local optimum. It’s great for what it’s great for, but as soon as you leave its usefulness domain jupyter becomes a burden to integrate even in a simple way.


There are IDE plugins to use the jupyter kernels though. Atom's Hydrogen[1] or VS Code's Python support[2] come to mind. JetBrains IDEs have similar plugins.

They enable pretty much both package managers and scm without any overhead to speak of.

> 1: https://blog.nteract.io/hydrogen-interactive-computing-in-at...

> 2: https://devblogs.microsoft.com/python/data-science-with-pyth...


There's Pluto for Julia: https://github.com/fonsp/Pluto.jl


You can use it with source control, I do it for about 18 notebooks I use on a daily basis:

https://github.com/kynan/nbstripout


This is quite amazing! I might use it when I give lectures.

How abstracted is the interface? Will there be an API that lets me plug in another language, for instance? I would be incredibly surprised if that is already the case, but I thought it would not hurt to ask ;)


Python is very well suited for being mapped to a block coding environment because it has such a large number of helper functions which enumerate all the things each module/object/function can do.

I have considered doing a similar project for javascript, but I had assumed it would require an entire new codebase for each programming language given how different every language is under the hood.


Great project - you're on the front page: https://news.ycombinator.com/item?id=25978962


getting on the front page like this was quite unexpected and has completely made my day =) =) =)


Dude, this is a GREAT concept!

There's always been that extra little step of going from working code to pretty graphs/animations that acts as a barrier to entry to where I almost never do that part. It's not that it's hard, it's just always a PITA with a bunch of required boilerplate and always requires that I dig through the API.

This streamlines that process in a way I didn't think would be possible.

That said.... maybe I'm doing something really dumb, but it's basically unusable for me right now on windows since every time I type the letter "d" it just highlights the line in the interpreter instead of typing the letter. I see this issue has been mentioned already in github though.

I'd love to keep playing around with this some more. I'm also SUUUUUPER curious about doing some GIS integrations using the qgis python libraries.


oh whoops! I saw that error earlier and forgot to fix it. The idea was to replicate sublime text's feature for highlighting words with ctrl+d, but there is some kind of error for detecting when ctrl is being used as a modifier key in windows. I'm going to fix that later today and upload a new build. Thanks for the heads up

EDIT: problem should be fixed now. I have just uploaded a new windows build to the website and pushed the changes to master on GitHub. Tested it on my windows machine and it looks like the fix is working properly. Re-downloading the installer file and installing again should resolve the issue.


I really want more interfaces like this, with diagram representations of source code, and convenient and interactive plotting. Thank you!

Can you speak to why a scatter plot and a line plot are distinguished using `(xs, ys, zs)` and `[xs, ys, zs]`?


Great question! The plotting library is its own standalone module, and there is a `plot` function which accepts an arbitrary number of arguments. This is how the app is able to plot multiple graphs at the same time.

Using tuple/list for points/lines makes it easy to plot points and lines at the same time, with a single function call, without any nasty keyword arguments.

For example

>>> plot((0,0), (1,1), [(0,0),(1,1)])

The other reason was it makes the code a lot nicer in some places. I tried using a keyword argument at first and there was a lot of additional logic to keep track of.
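
To illustrate the tuple/list convention described above, here is a minimal sketch of how a variadic function could tell points from lines by argument type. The name `split_plot_args` is hypothetical and is not taken from Math Inspector's source; the real `plot` function may work quite differently.

```python
def split_plot_args(*args):
    """Hypothetical helper: sort arguments into points (tuples) and lines (lists)."""
    points, lines = [], []
    for arg in args:
        if isinstance(arg, tuple):
            # A bare tuple such as (0, 0) is treated as a single point.
            points.append(arg)
        elif isinstance(arg, list):
            # A list of tuples such as [(0, 0), (1, 1)] is treated as one line.
            lines.append(arg)
        else:
            raise TypeError(f"expected tuple or list, got {type(arg).__name__}")
    return points, lines


# Mirrors the call in the comment above: two points and one line segment.
points, lines = split_plot_args((0, 0), (1, 1), [(0, 0), (1, 1)])
```

Dispatching on the container type keeps the call site free of keyword arguments, which seems to be the trade-off the author is describing.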


wow this is fantastic. not just the product itself but the website. I can tell you put a lot of love into this project and I'm downloading it as I type this.


No love for linux? :(


There is! I didn't build and codesign the linux version yet (just ran out of steam and needed a few days of rest), but I set up everything in a VM and got it working on my mac through x11.

If you follow the instructions in the install.sh[1] script (which is for doing VM testing with Vagrant), it should hopefully be simple to install from the source code on your system.

https://github.com/MathInspector/MathInspector/blob/master/i...


NumPy is incredible to me: it serves not only as a critical bare-metal layer, but also an essential tool in its own right, and I can’t imagine using Python without it. I do numerical work—ML/data science/statistics—and I simply couldn’t accomplish any of what I do without the functionality provided by NumPy. np.array alone is worth all the king’s gold.

It is interesting to me that in many respects it serves as “guts”. I definitely drop down to use numpy directly with regularity. But it’s also possible to do 80-90% of the job without ever explicitly using the module itself. It’s baked in everywhere, to the point that it feels like just another standard library.

Exciting to see progress! Keep up the good work, numpy team :)


I agree with everything you say, but I don't understand this part:

> it serves not only as a critical bare-metal layer

What's bare-metal about NumPy??


Heh, I’m probably misusing that term—forgive me, I’m just a script kiddy! :D

I meant that it can serve the role of low-level, nitty gritty machinery, entirely separate from its use case as a module unto its own right.

Eg, pandas (in some sense) is just a convenience layer on top of numpy—but to me that’s like saying any piece of software is “just a convenience layer on top of python.” It’s partly a means to an end! Not just an end unto itself.

Numpy enables a whole new class of functionality, independent of its direct use as part of my software. Is there a better word for that?


I think you sum it up nicely! The better word you're looking for might be cornerstone?


perhaps foundation, or base layer.


Sidebar: I originally used the word “gunmetal” and that...was even less correct.

Get this guy a dictionary!


Curious question - do you not use dataframes instead ? Is that not much nicer as a UX ?


Yeah! That’s what I mean...under the hood data frames use numpy arrays, but the abstractions of pandas are much easier to work with.


Quick summary: "This NumPy release is the largest so made to date, some 684 PRs contributed by 184 people have been merged."

> Annotations for NumPy functions. This work is ongoing and improvements can be expected pending feedback from users.

> Wider use of SIMD to increase execution speed of ufuncs. Much work has been done in introducing universal functions that will ease use of modern features across different hardware platforms. This work is ongoing.

> Preliminary work in changing the dtype and casting implementations in order to provide an easier path to extending dtypes. This work is ongoing but enough has been done to allow experimentation and feedback.

> Extensive documentation improvements comprising some 185 PR merges. This work is ongoing and part of the larger project to improve NumPy’s online presence and usefulness to new users.

> Further cleanups related to removing Python 2.7. This improves code readability and removes technical debt.

> Preliminary support for the upcoming Cython 3.0.

Type annotations seem the biggest deal to me. I'd say if you care a lot about SIMD and the performance issues, you should be thinking of moving to Julia: it's still a valuable technical achievement.


I would rephrase your statement as: "If you care about SIMD, performance issues and type annotations, you should look into Julia".

Numpy is an incredible piece of software and provides performance for one of the most mainstream languages. It has been one of the main building blocks in the Python takeover of data science, ML, etc. But if I had had the choice, I would have moved to Julia in my previous work/projects as soon as it reached v1.


The type annotation story is indeed better with Julia, but having type annotations for NumPy is beneficial for many users for whom Julia isn't a win, where number crunching isn't the main thing going on and Python's better library situation is important and you want to avoid the complication of calling Python from Julia.


We should be careful to appreciate the “types” in Julia for what they are at their essence: a way to direct dispatch of methods. Certainly the Julia compiler reasons about types in order to generate efficient code. But Julia types also affect the meaning of programs, not just the performance. In fact, the presence or absence of “type annotations” on arguments in a method definition don’t affect the code that that method generates. It simply affects whether that method gets dispatched to or not for a given function call.

I think it’s helpful to consider that this particular use of types is morally different from when type annotations are used to 1. Document intent, 2. Run programs faster, 3. Reason about correctness statically.


"sliding_window_view" will be a great addition. Sliding windows are used quite frequently when analysing data, and currently you can kind of achieve them using "as_strided", but that function is very cumbersome to use. From the examples given, "sliding_window_view" is much easier to use.


Yes! This is a big win for me!

Basically, across projects, I've been reusing a snippet that uses some as_strided magic for years now. The snippet looks seriously deranged, it will be great to refer to something built in... also for my colleagues who now have to understand my as_strided shits.
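
For comparison, a minimal sketch of the two approaches discussed here, assuming NumPy 1.20 or later. The array and window size are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

x = np.arange(6)
window = 3

# New in NumPy 1.20: a read-only view of every length-3 window.
views = sliding_window_view(x, window_shape=window)

# The older manual approach: shape and strides must be worked out by hand,
# and a mistake here can silently read out-of-bounds memory.
n_windows = x.shape[0] - window + 1
step = x.strides[0]
manual = as_strided(x, shape=(n_windows, window), strides=(step, step),
                    writeable=False)

# Example use: a moving average over the windows.
moving_avg = views.mean(axis=1)
```

Both produce the same 4x3 array of windows without copying the data; the difference is only in how much bookkeeping the caller has to do.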


I _love_ numpy, and I am getting excited about jax, too.

However, I do have one request for it. Getting the argmax of a multi-dimensional array, in terms of the array's dimensions, is difficult for new users.

np.argmax(np.array([[1,2,3],[1,9,3],[1,2,3]])) is 4, rather than (1,1). I understand why, but it seems strange to me that argmax cannot return a value the user can use to index their array.

Having to then feed that `4` into unravel_index() with the array's shape as a parameter seems less elegant than say passing a parameter of "as_index=True" to the argmax.
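
For reference, the current two-step workaround looks roughly like this (the `as_index=True` parameter suggested above is hypothetical and does not exist in NumPy):

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 9, 3], [1, 2, 3]])

# argmax returns a position in the flattened array.
flat_idx = np.argmax(a)

# unravel_index converts it to a tuple of per-axis indices for a.shape.
idx = np.unravel_index(flat_idx, a.shape)

# The tuple can be used directly to index the original array.
largest = a[idx]
```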


Consider this:

  In [1]: np.argmax(np.array([[1,2,3],[1,9,3],[1,2,3]]).flat)
  Out[1]: 4


Alternatively you could use flat:

a = np.array([[1,2,3],[1,9,3],[1,2,3]])

idx = np.argmax(a)

a.flat[idx] # 9


Does that work the same way with strided arrays?


Assuming you mean what I think you mean, it does work.

e.g. a[::2, ::3].flat[idx], where idx is from 0 to width*height of the view

(idx can also be a NumPy array, for getting multiple values)
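
A small sketch of the strided case, with an array made up for illustration: `.flat` indexes the view in its own row-major order, regardless of the underlying strides.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# Strided view: every second row, every third column.
view = a[::2, ::3]          # [[0, 3], [12, 15]]

# argmax is computed over the view's elements, not the base array's.
idx = np.argmax(view)
largest = view.flat[idx]

# An index array fetches several elements at once.
several = view.flat[np.array([0, 3])]
```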


NumPy/Pandas is one of the reasons I'm stuck on Python. I'd prefer a strongly typed language like Go/Java/Rust but there isn't the same library or community. Any recommendations? It's for applications, not just DS/ML, so Julia isn't really in scope.


Python is strongly typed. Did you mean statically typed?

You can use optional type-hinting in Python now, and install MyPy as a type-checker.


Latest Python (> 3.7?) typing combined with mypy --strict is the best of both worlds. Highly suggest you try it out.
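
As a minimal sketch of what optional type hints look like, here is a made-up function annotated so that a static checker such as mypy can verify callers. The function name and data are illustrative only.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list of floats."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


result = mean([1.0, 2.0, 3.0])

# A call such as mean("abc") still fails only at runtime in plain Python,
# but a type checker would report the argument type mismatch beforehand.
```

The annotations are ignored by the interpreter itself, so the checking only happens when the type checker is run as a separate step.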


Julia is not just a datascience and ML language. It's a very nice general purpose programming language.


If you know Python, then trying NumPy is pain.

Because for some reason data scientists have managed to misuse the python syntax. E.g. `np.ogrid[ -200000000:200000000:100j,-500000:500000:100j]` it's completely unexplainable what this does. They have managed to overload index/slice operator and imaginary numbers to produce two arrays.

(Example taken from here https://asecuritysite.com/comms/plot06 )
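
For readers puzzled by the syntax, a smaller sketch with made-up bounds: an imaginary step such as `5j` is read as "this many points, endpoints included", so each slice behaves like `np.linspace`, and the two results are shaped to broadcast against each other.

```python
import numpy as np

# Two open-grid axes: 5 points from -1 to 1, and 3 points from 0 to 10.
rows, cols = np.ogrid[-1:1:5j, 0:10:3j]

# A more explicit spelling of the same thing.
rows_alt = np.linspace(-1, 1, 5).reshape(-1, 1)   # column vector, shape (5, 1)
cols_alt = np.linspace(0, 10, 3).reshape(1, -1)   # row vector, shape (1, 3)

# Broadcasting the two axes evaluates a function on the full 5x3 grid.
grid = rows + cols
```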


I've been getting some odd warnings (thousands of them) from CI runs against 1.20, typical:

    some-file.py:83: DeprecationWarning: `np.bool` is a
    deprecated alias for the builtin `bool`. To silence 
    this warning, use `bool` by itself. Doing this will 
    not modify any behavior and is safe. If you specifically
    wanted the numpy scalar type, use `np.bool_` here.
    Deprecated in NumPy 1.20; for more details and guidance: 
    https://numpy.org/devdocs/release/1.20.0/notes.html#deprecations

    return self.z[j][i]

But none of the lines mentioned in these warnings reference np.bool


I haven't seriously used Python for quite a while so I don't know if this is possible/usual, maybe there's some kind of code generation happening at runtime that would throw off line numbers or added/removed/substituted lines compared to the on-disk script files?


I don't actually use "np.bool" anywhere in the project :-|


That's what I meant by code generation: maybe NumPy adds/removes/substitutes some code at runtime to optimize dynamically, although I don't know if that's a thing in Python/NumPy.


Are you using libraries that use `np.bool`? These could trigger warnings to stderr that you would see.
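
One way to track down where such a warning really originates is to turn it into an exception and read the full traceback. The sketch below uses only the standard `warnings` module; `third_party_helper` and `my_code` are hypothetical stand-ins, not NumPy internals.

```python
import traceback
import warnings


def third_party_helper():
    # Hypothetical library code. stacklevel=2 attributes the warning to the
    # caller's line, which is why the reported line may look unrelated.
    warnings.warn("`np.bool` is a deprecated alias", DeprecationWarning,
                  stacklevel=2)


def my_code():
    return third_party_helper()


with warnings.catch_warnings():
    # Escalate deprecation warnings to errors inside this block only.
    warnings.simplefilter("error", DeprecationWarning)
    try:
        my_code()
        trace = ""
    except DeprecationWarning:
        # The traceback lists every frame, including the library function.
        trace = traceback.format_exc()
```

The same effect is available for a whole run with `python -W error::DeprecationWarning`, which can be easier to apply in CI.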



It is great to see type annotations! A huge step in the right direction.


we are slowly approaching the ease of use and efficiency of fortran. In a few decades more we'll be there!


Modern fortran has classes, too! I've occasionally joked that fortran 25 and python 4 are to be the same language...


“I don't know what the language of the [future] will look like, but I know it will be called Fortran.”

— Tony Hoare, 1982


I assume you say it at least a bit in jest, and with a good deal of sincerity.

Did you ever know an interactive environment for producing Fortran code? Anything like a REPL?


In development, but there is lfortran: https://lfortran.org/


The next step after type annotations will be having the type specified by the first character in the variable name.


There's a random shuffle function, so you don't have to write one out from an algorithms textbook. Lots of data types get randomly shuffled during coding.

Shuffles arrays and lists. https://numpy.org/doc/stable/reference/random/generated/nump...
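
A short sketch of shuffling with NumPy. The linked URL is truncated, so this assumes the `Generator` API from `np.random.default_rng`; the legacy `np.random.shuffle` function behaves similarly.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Arrays are shuffled in place along the first axis.
arr = np.arange(10)
rng.shuffle(arr)

# Plain Python lists can be shuffled in place as well.
items = ["a", "b", "c", "d"]
rng.shuffle(items)

# permutation returns a shuffled copy and leaves the input untouched.
original = np.arange(10)
shuffled_copy = rng.permutation(original)
```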


The most interesting question here is the newest competitor to Numpy - Tensorflow - https://www.tensorflow.org/guide/tf_numpy

With the added advantage that TF is natively accelerated on M1.

Will TF-Numpy maintain parity ?


TF-Numpy is not a competitor to NumPy. It just introduces a small subset of the NumPy API to the TF codebase. In TF you still have immutable tensors, so something like `tensor[mask] = new_value` doesn't work.
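
To make the difference concrete, here is a NumPy-only sketch with made-up data, contrasting in-place masked assignment with an out-of-place equivalent. The out-of-place form is the style an immutable-tensor library needs; TensorFlow has a `tf.where` with similar semantics, but no TensorFlow code is run here.

```python
import numpy as np

tensor = np.array([1.0, -2.0, 3.0, -4.0])
mask = tensor < 0
new_value = 0.0

# In-place masked assignment: works on mutable NumPy arrays.
in_place = tensor.copy()
in_place[mask] = new_value

# Out-of-place alternative: builds a new array, leaves the input unchanged.
out_of_place = np.where(mask, new_value, tensor)
```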


Hmm. I am checking dtypes via `np.floating` and `np.signedinteger`. What will this change to?
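
A minimal sketch of checking dtype categories with `np.issubdtype`, which takes the abstract types `np.floating` and `np.signedinteger` as the category to test against. As far as I can tell, the 1.20 deprecations target aliases of builtins such as `np.bool` and `np.int`, not this usage, but that is worth confirming against the release notes. The helper name `kind_of` is made up.

```python
import numpy as np


def kind_of(arr):
    """Classify an array by dtype category (illustrative helper)."""
    if np.issubdtype(arr.dtype, np.floating):
        return "float"
    if np.issubdtype(arr.dtype, np.signedinteger):
        return "signed int"
    return "other"


float_kind = kind_of(np.zeros(2, dtype=np.float32))
int_kind = kind_of(np.zeros(2, dtype=np.int16))
uint_kind = kind_of(np.zeros(2, dtype=np.uint8))
```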



