Python Is Not C: Take Two (ibm.com)
115 points by kermatt on Dec 17, 2015 | hide | past | favorite | 28 comments



This is a useful walk-through of optimization in Python.

> The lesson is clear: do not write Python code as you would do in C. Use numpy array operations rather than iterate on arrays. For me it meant a mental shift.

I think the lesson is that "performance critical software requires attention to very many specific details." Using a garbage collected language that allocates objects on the heap by default means that you'll have to take special steps to change the default behavior of the system. You will see very similar behavior with C++, Java, etc.

> I didn't use Pypy so far because it only supports part of Numpy, and not Pandas, Scipy, and Sklearn. It may be that Pypy supports enough of Numpy to run my closest distance code though. If that's the case, then I'll blog about it soon.

I'll confess, I was eagerly awaiting the pypy results. IMO using pypy gives me the best of both worlds: more of my code can be naive Python with Python idioms and still yield sufficient performance. For critical algorithms where I can't hit sufficient performance, it's appropriate to use cffi or something similar to optimize.


> The lesson is clear: do not write Python code as you would do in C. Use numpy array operations rather than iterate on arrays. For me it meant a mental shift.

That's true in C as well. If you're writing performance-critical code, you use things like BLAS and LAPACK (possibly even the optimized, often parallel versions supplied by your compiler vendor).
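The same principle can be seen from Python itself: np.dot dispatches to a BLAS routine (ddot for double-precision vectors), so it should agree with a hand-written loop numerically while running far faster on large arrays. A minimal sketch (not from the article):

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)
y = np.ones_like(x)

# The "C-style" loop the article warns against: one Python-level
# iteration (and float boxing) per element.
loop_sum = 0.0
for a, b in zip(x, y):
    loop_sum += a * b

# One call into optimized BLAS code.
blas_sum = np.dot(x, y)

assert np.isclose(loop_sum, blas_sum)
```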


Thank you, you made Pypy move higher on my todo list.

You have a good point about gc and memory layout. That's why I like C or C++ when writing high performance software. You can control memory locality more easily than with other languages.


Sure, but what if my work depends on a library that uses C extensions that isn't supported by pypy? Any of the scikits, for example.


Unfortunately I have no answer for those. For now scikit and pypy seem mutually exclusive.


I like Numba, but it's sometimes jarring writing C-like code in a language which almost explicitly advises against it. Once you get into the habit of "FOR LOOP BAD" it's difficult to get out of!

It's interesting seeing what the effect of Big-O is. Once you introduce a logarithmic solution, it doesn't really matter that you're using Python because it's asymptotically so much better than the n^2 brute force approach. Nearest neighbour is a particularly nice demonstration of this, because it's an algorithm which is frequently used in realtime and obviously you don't want a performance hit of 2 seconds for every query!
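The log-vs-quadratic gap is easiest to see in one dimension, where the stdlib bisect module plays the role a k-d tree plays in 2D (illustrative sketch only; the function names are mine):

```python
import bisect
import random

random.seed(0)
track = sorted(random.uniform(0, 100) for _ in range(10_000))

def nearest_linear(points, q):
    # O(n) brute force: scan every point.
    return min(points, key=lambda p: abs(p - q))

def nearest_bisect(points, q):
    # O(log n): binary search in the sorted list, then compare the
    # (at most) two neighbours that bracket q.
    i = bisect.bisect_left(points, q)
    candidates = points[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - q))

q = 42.0
assert nearest_linear(track, q) == nearest_bisect(track, q)
```

With 10,000 points, the brute-force scan does 10,000 comparisons per query while the binary search does about 14, which is the same asymptotic win a k-d tree gives in higher dimensions.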

Note Numpy has a builtin np.radians() which is a little cleaner than a one-value lookup table.

    In [2]: %timeit np.random.random((10000))*3.14
    1000 loops, best of 3: 245 µs per loop

    In [3]: %timeit np.radians(np.random.random((10000)))
    1000 loops, best of 3: 262 µs per loop


Yeah I've started using Haskell a lot lately, which doesn't really do for loops per se. It was pretty natural to use `map` though because I was plenty used to Numpy style (ok really it's originally APL style but Numpy is where I picked it up).

Numba is pretty nifty for things that are unnatural in this style, where an algorithm is better expressed in an index-dependent style, like e.g. a numerical ODE solver. Sometimes you can get around it with tricks like convolution, but not always.
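A forward Euler step is the classic example of such an index-dependent loop: each value feeds the next, so there is no single vectorized numpy expression for it. A sketch of the kind of loop Numba handles well (the try/except makes it run even without Numba installed):

```python
import numpy as np

try:
    from numba import njit          # JIT-compile the loop if available
except ImportError:
    def njit(f):                    # fall back to plain Python
        return f

@njit
def euler(y0, dt, n):
    # Solve dy/dt = -y with forward Euler. y[i] depends on y[i-1],
    # so the loop cannot be replaced by one vectorized numpy op.
    y = np.empty(n)
    y[0] = y0
    for i in range(1, n):
        y[i] = y[i - 1] + dt * (-y[i - 1])
    return y

y = euler(1.0, 0.01, 1000)
```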


I guess it's kind of beside the point of the article, but I wonder if the author considered using a library like Shapely[1].

[1] http://toblerity.org/shapely/manual.html


Thank you for the suggestion.

At first glance, it does not seem to provide an object for dealing with a finite set of points. Using it would first require transforming the track into a continuous line via interpolate(), then project()ing each waypoint to find the nearest point on that line. The issue is that this will not give me one of the original points of the track, which is what I needed.

But I may be wrong.


I'm curious why that was. From the graph showing the RAAM datapoints, there's a region with large gaps between points; wouldn't it choose a track point that's well away from the original?

I've written some code to deal with this before (in my case, I wrote a distributed timer for cycle races, predicting current rider locations and gaps from sparse user reports). For me, finding nearest point on the segments of the polyline worked, but choosing only vertices would have made the time estimates terrible. Were the points somehow more fixed than that, eg towns/motels on the route?
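The vertex-vs-segment distinction can be sketched in numpy (illustrative only, and the function name is mine, not the timer code described above): project the query point onto each segment of the polyline and clamp the projection parameter to [0, 1].

```python
import numpy as np

def nearest_on_segments(track, q):
    # For each segment a->b, project q onto the segment and clamp
    # the parameter t so the projection stays on the segment.
    a, b = track[:-1], track[1:]
    ab = b - a
    t = np.einsum('ij,ij->i', q - a, ab) / np.einsum('ij,ij->i', ab, ab)
    t = np.clip(t, 0.0, 1.0)
    proj = a + t[:, None] * ab
    d = np.linalg.norm(proj - q, axis=1)
    return proj[np.argmin(d)]

track = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
q = np.array([5.0, 3.0])
# The nearest vertex to q is ~5.8 away, but the nearest point on
# the polyline itself is (5, 0), only 3 away.
print(nearest_on_segments(track, q))   # -> [5. 0.]
```

With sparse track points, the gap between the two answers is exactly the error a vertex-only lookup would introduce into a time estimate.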


The waypoints are not distributed regularly, but the track points are.

Here is why performance mattered in my case.

I needed to predict future positions of the cyclist (see https://www.ibm.com/developerworks/community/blogs/jfp/entry...). The computation relied mostly on weather forecasts provided for each track points.

I then needed to know where to start my computation, i.e. find the track point closest to the current location of the cyclist.


Good point. I'm not sure off the top of my head if Shapely has a specific function for what you're trying to do. It does have highly-optimized distance functions though. Might be interesting to compare those to your version.

One way I've approached this problem that seems pretty fast is to load a set of points into a PostGIS-enabled database, then use the `ST_Distance` function and order by distance.

[Edit]

Shapely solution:

    from shapely.geometry import MultiPoint, Point

    points = MultiPoint(((0, 0), (1, 1), (4, 4)))
    point = Point((3, 3))
    # sort points by distance to the query point
    # (on Shapely 2.x, iterate over points.geoms instead)
    sorted(((p, p.distance(point)) for p in points), key=lambda item: item[1])


An alternative solution that I discovered a year ago. My problem was that I had a list of 2D points lying in an image. I wanted to calculate the distance to the nearest point for every pixel - somewhat similar to a Voronoi diagram except I needed the distances as well as the cells.

When I asked Stackoverflow, having tried Kdtree, someone suggested a simple brute force solution in OpenCL (using the pyopencl library):

https://stackoverflow.com/questions/24020409/producing-a-nea...

Scipy's KDtree took around 30 seconds per 1MP image. The OpenCL solution took 50ms (at the expense of my screen locking up)!
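A CPU analogue of that brute-force kernel can be written in a few lines of numpy (a sketch, with a small grid and made-up seed points; the GPU version just runs the same per-pixel minimum in parallel):

```python
import numpy as np

# For every pixel, compute the distance to every seed point and
# take the minimum: an (n_pixels, n_points) distance matrix.
h, w = 64, 64
ys, xs = np.mgrid[0:h, 0:w]
pixels = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)

points = np.array([[10.0, 10.0], [50.0, 30.0]])   # (row, col) seeds

d = np.linalg.norm(pixels[:, None, :] - points[None, :, :], axis=2)
dist_map = d.min(axis=1).reshape(h, w)
```

This is O(pixels x points) in both time and memory, so for a 1MP image it needs chunking on the CPU; the OpenCL version wins because the GPU happily brute-forces all pixels at once.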

As the response said, "And the improvement was quite good." No kidding.


Thanks a lot. I'll have a look at using OpenCL in my case too.


Does everyone at IBM still use a 1024*768 monitor? Come on, guys, we've moved on from 12px sized fonts.

Edit: BTW, irony intentional.


No, I am not ;)

More seriously, are you unhappy with the default layout of developerWorks blogs? What changes do you suggest? Not that I am in any way responsible for it, but I can certainly pipe your suggestions to the powers that be.


This is a good example of Sayre's Law [1], which I only learned about from another HN post the other day. ;)

[1] https://en.wikipedia.org/wiki/Sayre%27s_law


Lol and thanks. I'll reuse that one for sure!


I'll echo what other posters have said. Larger fonts. Highlight the code samples, maybe even add a subtle background color to them.

Less important but annoying, why are the share buttons stacked? It creates about 150px of wasted space.


As long as you're taking suggestions - i would like to see bigger fonts, syntax highlighted code and maybe more color contrast between sidebar/top/comments and the main content.


On a retina screen, 12px is practically unreadable, particularly in Helvetica Neue, which is quite a light font. To be honest, IBM's site has had tiny fonts for as long as I can remember, they just become more problematic as resolutions increase over time.


I have a retina-class screen and I didn't notice that there was a problem, because I zoomed here as I do for almost all sites.


Maybe I should get into the habit of doing that, then. I've only ever felt the need to do it for one site before (Daring Fireball which uses an unbelievable 11px font!), and I've been put off in the past by the inferior behaviour of zoom as opposed to text size (which now seems to not even be available in Chrome). Minimum font size would be a good workaround if it didn't apply to the developer console, which is pretty crazy.


Being able to zoom on mobile would be essential (I cannot read the whole table on a Samsung S3 with Firefox).


Article shows up great for me and I'm using 2560x1600, although syntax highlighted code snippets would be nice.

Maybe adjust your browser font settings or DPI settings?


TL;DR if you want fast code: write Python but execute C.


Exactly.

Note that you execute C code anyway when you use Python's standard implementation (CPython). The point is to use the best C code you can get with limited effort.


> The lesson is clear: do not write Python code as you would do in C. Use numpy array operations rather than iterate on arrays. For me it meant a mental shift.

This was actually one of the main motivators for me to learn Julia. No need to vectorize loops. Looping through things is idiomatic Julia code that the compiler knows what to do with.

