First Python 2.7 interpreter to use multiple cores (python.org)
130 points by spazz on May 9, 2012 | 40 comments



To be a bit pedantic: the multiprocessing module works fine for multiple cores in regular CPython; I actually used it for a little graphics program I wrote. Nice to see the GIL gone for multithreading, though :)


It isn't pedantic, just irrelevant. Using multiprocessing is an entirely different beast from using threads. All shared values have to be explicitly allocated that way in shared memory (see the sketch below). Since there is a separate interpreter for each process, there is no problem with internal state getting messed up. Threads are just a harder problem for Python, and one which has never seen an effective solution.[1] That is why this kind of work is exciting.

[1] You may argue (correctly) that multiprocessing and using pipes to communicate is a better solution than threads to begin with (for many applications). However, sometimes a threaded solution is the right solution, and previously, if you needed threads, Python didn't work for you.
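To make "explicitly allocated" concrete, here is a minimal sketch (the worker function and the counts are made up for illustration):

    import multiprocessing

    def worker(counter, lock):
        # Ordinary Python objects are copied into each child process;
        # only Value/Array are explicitly allocated in shared memory.
        for _ in xrange(1000):
            with lock:
                counter.value += 1

    if __name__ == '__main__':
        counter = multiprocessing.Value('i', 0)  # a shared int, allocated explicitly
        lock = multiprocessing.Lock()
        procs = [multiprocessing.Process(target=worker, args=(counter, lock))
                 for _ in range(4)]
        for p in procs: p.start()
        for p in procs: p.join()
        print counter.value  # 4000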


Using multiprocessing is different from using threads. But that doesn't make it irrelevant, because it is a completely functional, already-existing way of using multiple cores.


I wouldn't call it fine. Try hitting ctrl-c when running a process Pool. Worse, it's not just ctrl-c: any abnormal termination, such as a segfault, will screw everything up as well. The bug is acknowledged, but it seems they won't fix it. In my opinion that makes it better to just leave it out of the standard library.
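For reference, the workaround people usually pass around (not a fix) is to avoid the blocking Pool.map and use map_async with a timeout, so KeyboardInterrupt can actually get through; square() and the timeout value are placeholders:

    import multiprocessing

    def square(x):
        return x * x

    if __name__ == '__main__':
        pool = multiprocessing.Pool(4)
        try:
            # pool.map() blocks in a way that swallows ctrl-c on 2.7;
            # map_async().get(timeout) lets KeyboardInterrupt through.
            results = pool.map_async(square, range(100)).get(timeout=3600)
            pool.close()
        except KeyboardInterrupt:
            pool.terminate()
            raise
        pool.join()
        print results[:5]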


To be even more pedantic, Python already runs on multiple cores: its threads poll the GIL until they can run. I believe some version of Python 3 fixes this behavior so they don't have to poll the GIL.


The GIL is not a spinlock. Since only one core holds the GIL at a time, it's still just using one core (even if that one core is being switched around).

The GIL is released during the execution of some C code, so another thread may execute Python code in the meantime, but that's probably beside the point too.
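A toy timing sketch of that distinction, assuming a multi-core machine (the loop sizes are arbitrary): two CPU-bound threads serialize on the GIL, while two sleeping threads overlap because the C-level sleep releases it.

    import threading, time

    def cpu_bound():
        n = 0
        for i in xrange(10 ** 7):  # pure-Python loop: holds the GIL
            n += i

    def io_bound():
        time.sleep(1)              # C-level call that releases the GIL

    def timed(target):
        start = time.time()
        threads = [threading.Thread(target=target) for _ in range(2)]
        for t in threads: t.start()
        for t in threads: t.join()
        return time.time() - start

    print 'cpu-bound: %.2fs' % timed(cpu_bound)  # roughly 2x a single thread
    print 'sleeping:  %.2fs' % timed(io_bound)   # ~1s, not ~2s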


As we are in the <pedantic> sub-thread, we can point out that a Python process actually can use more than one core at the same time, just by trying to acquire the GIL.

This is how Python programs can show up using more than 100% CPU in `top`.

http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-und... covers this


I much prefer a share-nothing approach, using sockets + subprocess for communication. It scales nicely.
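A toy version of the pattern; the one-JSON-line-per-message protocol and the squaring task are invented for the example, and the script runs itself as the child:

    import json, socket, subprocess, sys

    if len(sys.argv) > 1:   # child: connect back, serve one request, exit
        f = socket.create_connection(('127.0.0.1', int(sys.argv[1]))).makefile()
        task = json.loads(f.readline())
        f.write(json.dumps(task['x'] ** 2) + '\n')
        f.flush()
    else:                   # parent: listen, spawn the child, send one task
        server = socket.socket()
        server.bind(('127.0.0.1', 0))
        server.listen(1)
        child = subprocess.Popen([sys.executable, __file__,
                                  str(server.getsockname()[1])])
        f = server.accept()[0].makefile()
        f.write(json.dumps({'x': 7}) + '\n')
        f.flush()
        print json.loads(f.readline())  # 49
        child.wait()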


Depends on what you're doing. Shoving a numpy array with 10^7 elements over the wire isn't going to scale nicely at all. There are still a lot of computing scenarios where shared memory is the best approach.


You do not need threads for shared memory. I know of quite a few high-performance software systems that use process isolation and yet leverage shared memory via mechanisms like mmap() for IPC.
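A minimal (Unix-only) sketch of that, using an anonymous mmap inherited across fork(); the payload is a single double just for illustration:

    import mmap, os, struct

    # mmap(-1, n) gives an anonymous MAP_SHARED region, inherited by children.
    buf = mmap.mmap(-1, 8)

    if os.fork() == 0:            # child: write into the shared region
        buf[0:8] = struct.pack('d', 3.14)
        os._exit(0)

    os.wait()                     # parent: observes the child's write
    print struct.unpack('d', buf[0:8])[0]  # 3.14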


A context switch from one thread to another is less expensive than from one process to another. And there are other advantages to running in a shared memory address space. It tends to make some things complicated, but if getting the most out of your CPUs is the goal, threads are the way to go.


On some systems (Linux) there is no difference once the threads/processes are initialized. A simple benchmark confirms it (see my comment about mmap).


Good point, but shared memory via mmap isn't very Pythonic, since you have to manage the shared memory manually, you can't allocate Python objects in the shared region, etc.


You don't need to use mmap explicitly. A multiprocessing.Array and a numpy array can share the same memory, which can then be used from different processes, e.g., http://stackoverflow.com/a/9849971
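Roughly the pattern from that answer (the array size and the doubling step are arbitrary here):

    import multiprocessing
    import numpy as np

    def worker(shared):
        arr = np.frombuffer(shared.get_obj())  # a view, not a copy
        arr *= 2                               # in place, visible to the parent

    if __name__ == '__main__':
        shared = multiprocessing.Array('d', 5)  # doubles in shared memory
        np.frombuffer(shared.get_obj())[:] = np.arange(5)
        p = multiprocessing.Process(target=worker, args=(shared,))
        p.start(); p.join()
        print np.frombuffer(shared.get_obj())   # [ 0.  2.  4.  6.  8.]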


Yes, but if you're doing HPC, idiomatic Python comes second to raw performance.


It seems to use a Software Transactional Memory scheme. I hope that it will work better with the hardware transactional memory in Intel's next-generation Haswell architecture.


This is incredibly interesting, and the people working on it are incredibly smart.

But I think it's a slight oversell to call this a python 2.7 interpreter. For instance, most scientific computing code that runs in cpython 2.7 won't run on pypy.

It's impressive from a computer science point of view. But, pypy has a ways to go before I'd call it a python interpreter without adding qualifications.


PyPy is officially sanctioned as a Python 2.7 interpreter. Not being able to run scientific code doesn't rule it out; this is more a problem of the scientific code being too implementation-specific (for good reasons, though). It's known to be a problem for a lot of people (including me), but it doesn't disqualify PyPy as a Python 2.7 interpreter (without qualifications).


My apologies. I mistakenly lumped C extensions in as an inherent part of python. I was wrong.


> most scientific computing code that runs in cpython 2.7 won't run on pypy

Pure Python, or stuff written in / depending on C modules like NumPy? PyPy is supposed to be 2.7-compliant for pure Python, while C extension support is still experimental.


I'm not following this. Does it mean the GIL is finally gone?


This is the first of what will hopefully be many releases of PyPy using STM for multithreaded safety. Up until this release, Armin had a special API for creating and running transactions. Now he has wrapped Python's threading module to use STM; see http://morepypy.blogspot.com/2012/05/stm-update-back-to-thre... for more information.

Note that this is independent of CPython development.


That is the goal of the effort (see the pypy blog for the history of the effort, http://morepypy.blogspot.com ). But as noted in TFA, it is currently very, very slow, so no, the problem is not solved. Creating a very slow GIL-free interpreter has been possible for a long time; the hard part is making one that isn't unacceptably slow.

Also, it's not CPython. Technically, if you want the GIL to be gone from Python in general, you can already use IronPython or Jython, which do not use a global interpreter lock.


It is a branch of PyPy (a fast and compliant Python 2.7 interpreter) that behaves as if the GIL were still there (statements are always run atomically, as are the new 'with atomic' blocks), but will actually run multithreaded programs on multiple cores (as long as contention is low). It will also provide a transaction module (implemented with threads and atomic blocks) so that existing reactors can be ported, allowing coroutine-based programs to run on multiple cores too. This isn't faster at the moment; that will become the focus once this is stable. A small sketch of what this enables is below.

Current documentation: https://bitbucket.org/pypy/pypy/raw/stm-thread/pypy/doc/stm....

The plan: http://pypy.org/tmdonate.html
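So code like the following, written against the plain threading module, is the target. A sketch assuming the semantics just described (each statement runs atomically), under which this prints 200000, whereas ordinary CPython can lose updates on the non-atomic +=:

    import threading

    counter = [0]

    def work():
        for _ in xrange(100000):
            counter[0] += 1  # one statement: atomic under the STM branch

    threads = [threading.Thread(target=work) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print counter[0]  # 200000 under STM; CPython threads can race here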


I will get excited when it actually works out to be faster than interpreters with the GIL.

I'm a bit disappointed that Python seems to have no interest in standardizing a lightweight-thread (greenlet, coroutine) interface that isn't an awful mess. yield is powerful but coroutine implementations with it are baroque and opaque. Java-style threading has a million gotchas and the interface is just a dog. I want to be able to teach people with a week of experience to write concurrent programs. (What about writing turtle programs with multiple turtles running at the same time?)

If Python doesn't do it, then I want someone else to eat their lunch.
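For what it's worth, greenlet (third-party, i.e. exactly the non-blessed kind of thing I'm complaining about) is roughly the shape I'd want; ping/pong are made-up names:

    from greenlet import greenlet  # third-party: pip install greenlet

    def ping():
        print 'ping'
        gr2.switch()        # explicitly hand control to the other coroutine
        print 'ping again'

    def pong():
        print 'pong'
        gr1.switch()

    gr1 = greenlet(ping)
    gr2 = greenlet(pong)
    gr1.switch()            # prints: ping, pong, ping again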


Most likely it never will be faster than just the GIL. It may come very close to the same performance, but nothing is likely to beat a "one lock check every N thousand instructions" approach when you have a single thread (see the snippet below).

If it's within 5%, though, I'll be happy, especially if it's possible to switch backends. It seems that it is now, so if you know you're running only a single thread you could just choose the cut-down version.
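That "one lock check every N instructions" knob actually exists in CPython 2.x as the check interval (the new value below is arbitrary):

    import sys

    # CPython 2.x only considers releasing the GIL every N bytecode
    # instructions; the default is 100.
    print sys.getcheckinterval()   # 100
    sys.setcheckinterval(10000)    # check less often: faster single-threaded code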


Stackless works fine for this kind of thing.


Which of the existing coroutine implementations have you tried, and what did you dislike about them?


The issue isn't that all the third-party implementations are no good; several are good. It's that there is no Pythonic one which is also standard and blessed, which would help with interoperability as well as providing one obvious way to do it.

I'd rather Python moved in the direction of Go than Java.


In my humble opinion, the greatest problem with all of them is that there are so many. Most of them are rather fine, I think, but it can be a serious pain in the ass getting your head around all of them.


I've only needed one (Twisted, which is actually slightly outside the space occupied by gevent et al). You can't stop people from building more, but feel free to bet on a popular project if you don't want to deal with choice.


In a world where I only work with my own code, that works. Unfortunately that is not my reality.


Yes (although so far only in this branch of PyPy). It basically uses fine-grained locking instead of a global lock. I can't explain it simply so I'll just point to recent discussion: https://news.ycombinator.com/item?id=3941642 and https://news.ycombinator.com/item?id=3464484 and the beginning of this project https://news.ycombinator.com/item?id=2710235


My understanding of the GIL is that its inclusion was justified by the fact that removing it would speed up multithreaded programs at the cost of slowing down single-threaded programs (the latter being more common than the former).

Given that, and given that patches for removing the GIL were submitted for an earlier version a while back (2.4 or before, I believe), is it infeasible or impossible to design an interpreter that detects whether a program requires support for multiple threads and then acts accordingly? Since using threads currently requires importing a module, it seems like it would be unambiguous.


Detection would be non-trivial. On the other hand, it could be a simple command-line option, like specifying the number of cores you want your program to run on.


Not really - we already have __future__ imports, which must be declared at the top of the file before any other imports.

I think this could be achieved in a similar, simple fashion: some sort of straightforward 'multithreaded' pragma shouldn't be too much of a burden.


__future__ is per-module, so while it is good for syntax changes, I think it is inappropriate for runtime pragmas.


I'm not sure about that. If you hook thread creation into invalidating the code that was JITted without STM, you might be able to handle it reasonably well.

Of course, the slow path would just use STM all the time.


> ... invalidating the code that was JITted ...

I mean, that is non-trivial.


Maybe. If you determine that you only need to do it on spawning the first thread, and never go back, then you can probably afford the expense of throwing out ALL of the JIT cache and starting from nothing.



