Hm, this solution seems very cumbersome and inelegant, not like Python's "batteries included" approach at all. It means that Python will have native threads that behave as expected minus true parallel execution, so you shouldn't use those, even though the interface is fairly simple. Instead, you should learn to use this weird contraption that is neither multiprocessing nor intuitive multithreading, and which comes with a cumbersome interface.
I get that the GIL is a very hard problem to solve, but this solution is so inelegant in my eyes that Python would be better off without it. I'd feel better if this were a hidden implementation detail that could be improved transparently. Just my two cents.
>This means that python will have native threads that behave as expected minus true parallel execution, so you shouldn't use those, even though the interface is fairly simple.
Python already has exactly that, and has had that for ages.
>Instead, you should learn to use this weird contraption that is neither multiprocessing nor intuitive multithreading and comes with a cumbersome interface.
It also comes with performance improvements over multiprocessing, so there's that.
Besides, the "cumbersome interface" is irrelevant, as it would be easy to wrap and forget about, the same way nobody really uses urllib directly.
>And the point is that this should be fixed, instead of adding yet another clunky way of doing the same thing.
Well, unless we have a (1) capable (2) volunteer the point is moot.
Those who actually work on Python development assessed both scenarios and found that adding "yet another clunky way", if not optimal, was a more feasible, easier and better use of their limited time.
>Then why bother having it, if the only reason for it to exist is so that it can be wrapped?
Because lower-level primitives can be used in more useful ways than some constraining higher-level construct, though the latter is still nice to have.
You could implement requests without urllib. You cannot implement C without assembly. That's the distinction between useful layers of abstraction and pointless ones. If a better interface to subinterpreters is bound to be developed, then that should've been the stdlib interface in the first place.
Coming from the JS world, asyncio feels fine to me. I've noticed a lot of Python programmers who have a strong background in C and Python struggle with asyncio. But in their case, I think it's more them disliking the async programming paradigm rather than the specific asyncio API.
Personally, I have JS experience (browser/node.js), as well as asio in C++/Boost, EventMachine in Ruby and Twisted/Tornado in Python. I strongly disagree that people who dislike asyncio only do so because they don't like the event loop. That just sounds like a typical quick dismissal of criticism. No, it's an objectively awful, convoluted API.
But if you think you can write correct asyncio applications of any significant complexity, that's great. Doesn't change the fact that a lot of highly experienced developers struggle to do so.
> But if you think you can write correct asyncio applications of any significant complexity, that's great.
To be fair, I've never written an asyncio application of any significant complexity (at least compared to the async JS application I've written). But nothing jumps out at me as horrific with asyncio. The worst thing is the split between futures and coroutines, but once you realize that there's no real need to use futures any more with asyncio (although you can integrate futures-based libraries if needed), you're good to go.
But fair criticism, I've only written small asyncio applications. The only sizable async-Python application I've made used Tornado and ZeroMQ async APIs (although even that eventually ran on the asyncio event loop and used an asyncio-based library).
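For what it's worth, the "coroutines only, no explicit Futures" style mentioned above can stay quite small. A minimal sketch, using asyncio.sleep as a stand-in for real async I/O:

```python
import asyncio

async def fetch(x):
    # stand-in for a real awaitable I/O call (HTTP request, db query, ...)
    await asyncio.sleep(0)
    return x * 2

async def main():
    # Plain coroutines plus gather: no explicit Future objects anywhere.
    return await asyncio.gather(*(fetch(i) for i in range(3)))

results = asyncio.run(main())
print(results)  # [0, 2, 4]
```

Futures-based libraries can still be bridged in when needed, but nothing here requires touching them directly.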
I completely disagree - Python threads are basically "green threads", so they have their place but aren't related to parallelisation. But true multiprocessing is ugly when you have hundreds of cores, which is where CPUs are going. There is no standard UI convention on most OSes to group those processes per app, in terms of signals or stats or whatever.
So besides the unproven possibility of removing the GIL, subinterpreters are the best way forward, better than threads or the multiprocessing package.
> they have their place but aren't related to parallelisation
You can parallelize all sorts of things with Python threads--just not some things you'd expect to be able to parallelize, due to the GIL. Waiting on or buffering I/O, calling out to compiled code, doing cryptographic operations--all of those can be parallelized, as (in many cases) they entail releasing the GIL.
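A minimal sketch of that: `time.sleep` releases the GIL, just like most blocking I/O calls, so the five waits below overlap rather than run back-to-back (the exact timing is illustrative, not guaranteed):

```python
import threading
import time

def slow_io():
    # time.sleep releases the GIL while blocking, like most real I/O calls
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=slow_io) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(round(elapsed, 1))  # roughly 0.2, not 1.0: the five waits overlap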
> But true multiprocessing is ugly when you have hundreds of cores
Why?
> There is no standard UI convention on most OSes to group those processes per app, in terms of signals or stats or whatever.
I have no idea what this means. What do UIs have to do with process groups? Do you know how many processes your Chrome instance is running on the operating system? There are very solid conventions regarding process management, at least on Unix-ish systems: process groups and parent-child relationships are well established and well understood, as is their relationship with signals and signal handling.
That's true, behind the scenes they can be running on multiple CPUs, which is the definition of parallelism.
I think threads are a bad abstraction for doing parallelism personally, though. Programs designed to run in parallel should be deterministic, unless they need concurrency for some other reason. I think trying to shoehorn parallel programming into Python's threads isn't necessarily the best approach.
As far as I'm concerned, if I'm using threads and they happen to get scheduled on to multiple cores, then that's a nice optimization, but isn't necessary for what I use threads for.
If that’s what he’s saying, he’s wrong. Python threads are based on native threads. They do execute in parallel; they just don’t do so very well because of the GIL.
It's not that Python shouldn't have subinterpreters, but that they should only be added once they have a better API - at least as good as multiprocessing.
It's somewhat similar to the GIL removal effort in Ruby [1]
They are isolating the GIL into Guilds there, which are containers for language threads sharing the same GIL. They are providing two primitives for communication between threads in different guilds: send, for immutable data (zero copy), and move, for mutable data (copy). They remove the need for the boilerplate code for marshalling and unmarshalling. However, I bet that there will be some library to hide that code in Python too.
I proposed something similar for Python 9 years ago.[1] Guido didn't like it.
Objects would be either thread-local, shared and locked, or immutable.
Thread-local objects must be totally inaccessible from other threads, and not leakable across thread boundaries, for memory safety. (Python has "thread local" objects now, but it's just naming, and not airtight against leaks. You can assign a thread-local object to a global variable.) Shared and locked objects lock when you enter, unlock when you leave. Objects are thread-local by default, so single-thread programs work as before.
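To illustrate the "not airtight" point with today's threading.local: nothing stops the object itself from leaking across thread boundaries; only its attributes are per-thread. A small sketch:

```python
import threading

local = threading.local()
local.value = "set in main thread"

# Nothing stops the "thread-local" object itself from leaking: it can be
# stored in a global, captured in a closure, passed to another thread, etc.
leaked = local

seen = {}

def worker():
    # The object is fully reachable here; only its *attributes* are
    # per-thread, so the value set in the main thread is simply absent.
    seen["has_value"] = hasattr(leaked, "value")

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen["has_value"])  # False: attribute invisible, but the object leaked fine
```

A memory-safe thread-local object, in the sense proposed above, would have to make the `leaked = local` step itself an error.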
Minimize shared and locked, while using thread-local or immutable objects as much as possible. Locking is needed only for shared and locked objects.
This is almost conventional wisdom today, but 9 years ago, it was too radical.
Retrofitting concurrency is never pretty. But we have to. Individual CPUs are about the same speed per thread that they were a decade ago.
Move is hard without ownership. Rust can do moves because it knows that there's nothing else pointing to the thing being moved. Python doesn't know that.
Yeah, I’d much prefer a traditional subinterpreter or isolate design with shared-nothing semantics to the current "raise an exception if shared non-frozen objects are touched" approach.
It’s much easier to map existing multi process code onto shared nothing sub interpreters.
> This, in turn, means that Python developers can utilize async code, multi-threaded code and never have to worry about acquiring locks on any variables or having processes crash from deadlocks.
Dangerous advice. Whether this is true or not depends on lots of things such as how many and which operations you're doing on those variables.
Sure, CPython might do lots of simple operations atomically, but this is not enough to avoid the need for all locks. Threads can still interleave their execution in many ways.
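For example, even under the GIL a bare `counter += 1` is a read-modify-write spread across several bytecodes, so threads can interleave between the read and the write and increments get lost. A sketch (only the locked variant is deterministic; the unlocked one may silently come up short):

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe(n):
    global counter
    for _ in range(n):
        # compiles to load / add / store: another thread can run between
        # those steps, so some increments can be lost
        counter += 1

def safe(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=safe, args=(50_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000 with the lock; swap in unsafe() and it may be less
```

So "CPython makes some operations atomic" is true, but it does not generalize to compound operations on shared state.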
The current state of threading and parallel processing in Python is a joke. While they are still clinging to the GIL and single core performance, the rest of the world is moving to 32 core (consumer) CPUs.
Python's performance, in general, is crappy[1] and is beaten even by PHP these days. All the people that suggest relying on multiprocessing probably haven't done anything that's CPU- and memory-intensive, because if you have code that operates on a "world state", each new process will have to copy that from the parent. If the state takes ~10GB, each process will multiply that.
Others keep suggesting Cython. Well, guess what? If I am required to use another programming language to use threads, I might as well go with Go/Rust/Java instead and save the trouble of dabbling with two languages.
So where does that leave (pure-)Python? It can only be used in I/O bound applications where the performance of the VM itself doesn't matter. So it's basically only used by web/desktop applications that CRUD the databases.
It's really amazing that the machine learning community has managed to hack around that with C-based libraries like SciPy and NumPy. However, my suggestion would be to drop the GIL and copy whatever model has been working for Go/Java/C#. If you can't drop the GIL because some esoteric features depend on it, then drop them as well.
Cython is nice, but debugging it requires gdb. For the PyCharm-loving end-users it may be quite cumbersome.
Those recommending multiprocessing have probably never been in that bitter spot where serializing the data takes exactly as long as computing on it.
The consistent requirement has been that Python will drop the GIL for anything that doesn't make single-threaded performance suffer. There has been substantial work to this end but no solution to date has achieved this goal.
> If the state takes ~10GB each process will multiply that.
In POSIX there is such a thing as copy-on-write memory during forks. So if that state is mostly read-only, the additional memory required by each child process should be minimal.
This is essentially the same concurrency model as Workers in JS engines - on the one hand it’s a fairly limiting crutch[1], on the other hand it is harder to create a bunch of different classes of concurrency bugs.
[1] vs fully shared state of C-like, .NET, JVM, etc, etc. Rust’s no-shared-mutable state model allows it to do some fun stuff but python (and JS) don’t really have a strong concept of mutable vs immutable, let alone ownership so I don’t think it would be applicable?
Very limited shared state (only C types, including structs), which in practice means (for non-trivial apps) some form of marshaling from Python classes to the C structs. Boilerplate abounds!
In other words, nothing beyond mmap with simple size calculations, which is what you'd really want to abstract away.
As @maayank has already said, python’s shared state is limited to pure c-structures, rather than actual python objects. Web workers have the same through SharedArrayBuffer (I am unsure though whether the security pains from spectre, etc have allowed them to be turned on again).
SharedArrayBuffer is fundamentally the same as python’s C-only sharing - a bunch of raw bytes not tied to the host environment’s type system.
This is just a way to do the same thing as "multiprocessing", but with less memory usage. You still have multiple Python instances that send messages back and forth.
I wonder if they ever fixed the cPickle bug that broke it if you were using cPickle from multiple threads.
Yeah, it's got some of the same weaknesses as multiprocessing (and several new ones). Conceivably you could provide an API for handing off objects to the other interpreter without copying. I'm imagining an API like:
my_foo = interpreterX.pass_object(my_foo)
(The assignment being required to delete the originating reference from the source interpreter.) The interface would be obligated to check that there are no references that escape to the current interpreter and then my_foo and all referenced objects could be handed off to the other interpreter in whole.
I don't have any intuitions for if that would be cheaper than copying or not, and getting it right is certainly more difficult than serialization. (Because of the complexity, it's not worth having if it isn't cheaper.)
If they live in the same process (address space), then it's a lot harder to correctly manage concurrent memory access (at both low level and high).
It's not a theoretical problem, just a very likely pragmatic observation. (Meaning, we'll see bugs in both the interpreter and the code using this feature.)
Plus if one of the threads crashes, the whole process aborts. (Sure, the interpreter can handle a lot of faults gracefully, but not all.)
The main weakness is the one pas described in the sibling comment. Historically, Python C modules, inside and outside stdlib, have more or less safely been able to assume there was one global interpreter. This means they may have global state (apparently even some stdlib C libraries assume this). With multiple interpreters in the same process address space, any global state in C modules conflicts.
I'm not claiming any originality here — the author of the article recognizes this problem and describes it, starting with:
> Because CPython has been implemented with a single interpreter for so long, many parts of the code base use the “Runtime State” instead of the “Interpreter State”, so if PEP554 were to be merged in its current form there would still be many issues.
Less memory usage, and - hopefully - without all the quirks that crop up with multiprocessing. Off the top of my head: subprocesses don't always want to die along with the main process; error conditions can cause the underlying IPC layer to end up in a permanently stalled state.
It avoids some quirks but also introduces new quirks multiprocessing doesn't have, like broken C modules (including parts of the core interpreter and stdlib) that have global state, rather than per-interpreter state. There's a huge ecosystem of Python libraries in the world and most have been able to more or less ignore the distinction between per-interpreter state and global state prior to this proposal. (Not true if you actually used the C API to embed many interpreters in a process, but most people don't do that.)
I don't think you can do this with traditional dynamic linkers — the separate memory space is not so difficult (requires relocatable libraries, -fPIC, which is usually already enabled on ASLR systems) — but you would want to be able to load the same .so twice without symbol naming conflicts. I don't think most dynamic linkers (ld-linux / rtld-elf) support that. I could be mistaken, I am not very familiar with any implementation.
There is nothing preventing you from adding this support to an existing dynamic linker and using it for your program, though.
> but you would want to be able to load the same .so twice without symbol naming conflicts
This is supported in ld-linux by using the dlmopen() glibc function and distinct namespaces. Loading the same .so file multiple times is one of the use cases explicitly mentioned in the manual page[1].
That requires making the native parts of the stdlib position independent (relocatable). I don't know what obstacles are there currently, but basically you'd need to have a runtime linker that gives a reference to the loaded object/module to access the symbols, but these are usually handled by the C compiler and linker. So it's doable, just a lot of effort.
Theoretically, if this proposal lands, all of the Python stdlib will have to be fixed not to have global state — the main concern would be 3rd party C libraries, I think.
No, Mr. Click-baity-title it’s not. They’re still there just you can use many interpreters now like one would when using the multiprocessing module. I do like the idea of Go-like queues for message passing.
From my limited understanding, I think Eric Snow’s push to use subinterpreters is to move an orchestration layer for multiple Python processes from the service layer to the language layer. It may also modularize Python’s C API scope. It may also be one of the cheapest ways to provide true CPU-bound concurrency in Python, which is important given Python’s limited resources.
Wow, just like perl threads since perl 5.8 (1)
When in doubt, look at the granddaddy of scripting languages; all your trials and tribulations in scripting land have been considered in the past.
Let's all sing 'Living in the Past' by Jethro Tull (2); this one is also good (3)
Tcl has had threads that were subinterpreters for over a decade. I find it quite ironic that Python, it would seem, is reinventing it, only less elegantly.
I'm personally glad that Python is (poorly) copying this feature from Tcl. This means it's closer to the time when JavaScript (poorly) copies it from Python ! ;-)
This sounds like an application (or variation) of the apartment threading model[0]. Given the problem and its description/characteristics (Global Interpreter Lock), this sounds like an elegant approach.
There's nothing wrong with the GIL as long as you know it's there. It makes writing concurrent code in Python semi-magical, and that's a huge benefit. Concurrent != parallel though, so if there's really a need to scale up to multiple cores there's always the option of forking with multiprocessing or "sub interpreters."
I can think of maybe having network code run in its own process and the UI in another. That way there's no risk of bottlenecks slowing down the UI, and transfers are likewise protected. If you look at bottle.py, it seems this approach could add A LOT of performance for managing downloads/uploads if it's done right.
It means you never have to worry about message passing or locks, because they aren't a thing at all. On the other hand, what looks like concurrent code often isn't, and is actually run slower than single process code, also because of the GIL.
That reduces to "you don't have to worry about performance because the GIL doesn't let you write performant code". That's not what I think of as "helpful".
> Another issue is that file handles belong to the process, so if you have a file open for writing in one interpreter, the sub interpreter won’t be able to access the file (without further changes to CPython).
Wouldn't just using CLONE_FILES when forking off interpreters solve this problem?
It could be better phrased: "whilst CPython can be multi-threaded, only 1 thread can be executing Python code at any given time." Other threads can be doing other things at the same time -- just not actively interpreting Python bytecode.
It is because only one thread at a time holds the lock in order to avoid race conditions. The keynote[1] by Raymond Hettinger from PyBay '17 will be a great place to start if you are new to this.
Not all operations are CPU bound. For anything that is IO bound, such as reading a file, db access, network calls, etc, CPython threads work just fine.
Until someone adds a callback from C back into Python and the unexpected GIL contention makes performance dramatically worse even as compared to the single-threaded version.
Concurrency allows multiple threads to interleave with each other. It does not guarantee parallelism (two or more threads executing at the same time). It's similar to multiple threads running on a uniprocessor system, with the difference that I/O can happen in parallel.
I'm not the person you were originally responding to, but I know Python is popular in data science and such where there would be a lot more tied up in pure computation/number crunching.
I know that you already got an answer to this effect, but to provide some more information, most data science workflows in Python rely heavily on calls into numerical libraries (numpy, scipy, pandas, tensorflow, pytorch, matplotlib) that are Python wrappers over compiled binaries (mostly C and Fortran, a not-inconsiderable amount of handwritten Assembly), that have been constructed so that the wrapper safely yields the GIL before invoking the underlying binary. This is all the more important when considering libraries like tensorflow or pytorch that may involve complex long-running interaction with training resources across a network. Control is yielded to allow the interpreter to continue carrying out tasks like displaying the ongoing training progress, or loading training data.
I should have clarified: when I said "doesn't work particularly well", I meant that it's incredibly difficult to saturate, say, a 100-thread box. (Which with Rome isn't crazy!)
Only being able to run a single thread of "logic" basically makes that impossible, as you need to usually do some computation to figure out what bytes to send/receive, do something with the results, and so on.
The program doesn't have to be massively I/O bound for threads to be useful. Even with only a UI thread and a single background thread they're useful for animating a progress indicator in the UI thread while waiting for a slow network request to finish in the background thread, which works just fine in Python.
I know nothing about Python internals, but my understanding from the article is that this "overhead" is about creating a new sub-interpreter (loading modules is particularly slow in Python), not the performance of executing code after it's created.
The article also makes it clear that each sub-interpreter still has its own GIL, but two sub-intepreters can run at the same time without having to care about each other's GILs.
Message passing via serialization/copy is inherently more expensive than just passing a reference between threads. So any benefit vs. threaded Python depends on the ratio of IPC to actual Python bytecodes interpreted.
IPC-heavy programs with low concurrency may suffer worse under this model than threaded with traditional single GIL. As threads approach infinity, though, anything that scales beats the single GIL model.
Other overheads include spinning up a full interpreter state (including object and malloc caches, GC, etc) per sub-interpreter. And there are some modules with process-global semantics, such as signal-handling — it's unclear how that will be coordinated between co-interpreters, if at all.
Exactly. This seems more like a way to advertise that the GIL is gone while not actually having shared memory parallelism via threads.
It reminds me of how Python advertising tricks people into thinking Python has real parallelism by talking about asyncio and `import multiprocessing`; people then actually try those out, only to discover the sad state of affairs.
Are there any overall benchmarks for Python 3.8 yet? I know there are a bunch of performance improvements for calling functions and creating objects, but I have no idea how that translates to real software.
Huh. This sounds a lot like Ruby Guilds. This looks like it will land sooner, though likely in less complete form, as even the prototype Guild implementation has inter-guild communication.
Wouldn't it be good to have Python 4.x next, with all these workarounds cleaned up and only one right, pythonic way for parallel processing? Surely with a bit of backward compatibility sacrificed, like 2 vs 3.
> Surely with a bit of backward compatibility sacrificed, like 2 vs 3.
After 10 years, the 2to3 transition is still ongoing, and lots of companies will be dealing with python2 code for a long, long time. Breaking backward compatibility was a terrible decision. Breaking it again now that most libraries and open source projects have finally moved to python3 would be suicide.
Programming languages should take backward compatibility as one of their most important features, because of the hundreds of millions of lines of code already written and that would need to be changed and validated. This is especially true in a dynamic language where you can't even rely on the compiler to point out the issues.
Getting rid of the GIL would make the 2 to 3 transition look like kindergarten. There's lots and lots of Python code (and C extensions) that makes assumptions enabled by the GIL.
For starters:
If you get rid of CPython's atomic operations on lists and other simple objects, lots of Python code would need more locks than it needs now.
If, on the other hand, you wish to keep those operations atomic, without the GIL that means introducing fine-grained locks on specific objects, which would worsen even single-threaded performance. Imagine taking a pthread mutex each time you append an item to every list or store a number in every variable.
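For instance, today a plain `list.append` from multiple threads is effectively atomic under the GIL (it's a single C-level call), so code like this needs no user-level lock; a GIL-free CPython would have to lock the list internally on every call just to keep it working:

```python
import threading

items = []

def producer(n):
    for i in range(n):
        # list.append is one C-level operation, so the GIL makes it atomic:
        # no user lock needed, and no appends are lost
        items.append(i)

threads = [threading.Thread(target=producer, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(items))  # 40000: every append landed
```

Preserving that behavior without a GIL means paying for per-object locking on every such call, which is exactly the single-threaded slowdown described above.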
Considering this will basically require a total rewrite of the C extension API, the idea that it could be done in a minor release seems shocking (though I haven't had a chance to look yet).
It will probably come to something like that (for the Unicode issue alone) but it would be a disaster. Python has always had an issue with ongoing changes that break existing code. That was the idea behind Py3, that we would do all the breakage at once and be done with it forever. Ironically that made Py2 a much better target for a time as it was not changing very much.
Whoever is downvoting this, please realize this design stinks strongly of something Perl 5 had a long time ago, which almost nothing could make use of due to a variety of limitations.
The question in my mind is how the Python design addresses the problems that made the perl equivalent so unusable. It's only had ~20 years of context to do better, so I'm hopeful
"The "interpreter-based threads" provided by Perl are not the fast, lightweight system for multitasking that one might expect or hope for. Threads are implemented in a way that make them easy to misuse. Few people know how to use them correctly or will be able to provide help.
The use of interpreter-based threads in perl is officially discouraged."