If you are interested, PEP 703 describes the scenarios pretty well: https://peps.p...

johnjr · 2024-07-19T18:19:09.000000Z

I just wrote a post about how CPython is much faster without the GIL: https://news.ycombinator.com/item?id=40988244

arp242 · 2024-07-19T22:00:48.000000Z

I mean, only the threaded version, which is expected. For tons of cases Python without the GIL is not just slower, but significantly slower; "somewhere from 30-50%" according to one of the people working on this: https://news.ycombinator.com/item?id=40949628

All of this is why the GIL wasn't removed 20 years ago. There are real trade-offs here.

rbenchmark · 2024-07-20T20:47:01.000000Z

30-50% is an understatement. The latest beta is more than 100% slower in a simple benchmark:

https://news.ycombinator.com/item?id=41019626

BossingAround · 2024-07-20T07:04:17.000000Z

How is single-threaded code slower without the GIL?

pKasdhB · 2024-07-20T09:00:55.000000Z

Because in the --disable-gil build, things like ref-counting, dicts, freelists, etc. are locked, even when there is only a single thread.

This is the reason why previous attempts were rejected. But those attempts came from single individuals and not from a photo sharing website.

This matters if --disable-gil becomes the default in the future and is forced on everyone.
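For anyone who wants to check which build they are actually running, here is a minimal sketch. It assumes Python 3.13 or later for `sys._is_gil_enabled()` and falls back to "GIL is on" for older versions; the helper name `gil_status` is made up for illustration.

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled_now) for this interpreter."""
    # Py_GIL_DISABLED is 1 in the build config of --disable-gil builds;
    # on other builds it is 0 or missing (None).
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() was added in Python 3.13. Older versions do not
    # have it, and there the GIL is always on.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = probe() if probe is not None else True

    return free_threaded_build, gil_enabled_now


if __name__ == "__main__":
    build, enabled = gil_status()
    print(f"free-threaded build: {build}, GIL currently enabled: {enabled}")
```

Note that the two values can differ: a free-threaded build may still run with the GIL re-enabled at runtime, so checking only the build flag is not enough.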

nemetroid · 2024-07-20T11:59:03.000000Z

That cannot be the reason for a 30-50% slowdown. Uncontested locks are very fast.

krhsG · 2024-07-20T12:33:19.000000Z

They may be fast in C++, but not in the context of CPython. Here are the dirty details. Note that fine-grained locking has also been tried before:

https://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remov...

nemetroid · 2024-07-20T13:48:29.000000Z

Thanks for the link, that's an interesting read. Actually, the PyMutex referenced there is a good old pthread_mutex_t, the same one you'd use in C or C++. But I shouldn't have sounded so sure. Although uncontested locks are very fast, if the loop is tight enough, the added locking will still cost a significant amount.
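A rough way to see the tight-loop effect is the sketch below. It is only an illustration: it times a Python-level `threading.Lock`, not CPython's internal locking, so the ratio it prints says nothing directly about the free-threaded build. The function names `loop_plain` and `loop_locked` are made up for this example.

```python
import threading
import timeit


def loop_plain(n):
    """Tight loop with no locking."""
    total = 0
    for i in range(n):
        total += i
    return total


def loop_locked(n, lock):
    """Same loop, but takes and releases an uncontended lock each iteration."""
    total = 0
    for i in range(n):
        with lock:
            total += i
    return total


if __name__ == "__main__":
    n = 1_000_000
    lock = threading.Lock()
    # Take the best of several runs to reduce noise from the rest of the system.
    plain = min(timeit.repeat(lambda: loop_plain(n), number=1, repeat=5))
    locked = min(timeit.repeat(lambda: loop_locked(n, lock), number=1, repeat=5))
    print(f"plain:  {plain:.3f}s")
    print(f"locked: {locked:.3f}s ({locked / plain:.1f}x)")
```

Only one thread ever touches the lock here, so any difference between the two timings is pure uncontended acquire/release overhead relative to a very small loop body.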

However, PEP 703 specifically points out that performance-critical container operations (__getitem__/iteration) avoid locking, so I'm still highly skeptical that those locks are the cause of the 30-50%.

https://peps.python.org/pep-0703/#optimistically-avoiding-lo...

tialaramex · 2024-07-20T15:05:24.000000Z

The pthread_mutex_t is focused on compatibility at any cost. So while you're right that the C++ stdlib chooses this too, it's not actually a good choice for performance.

But I think you're right to be sceptical that somehow this is to blame for the Python perf leak.

tialaramex · 2024-07-20T18:19:33.000000Z

One of the things this spends some time on that was already obsolete in 2011 is using a pool of locks. In 1994 locks were a limited OS resource, so Python couldn't afford to sprinkle millions of them in the codebase. But long before 2011 Linux had the futex, so locks only need to be aligned 32-bit integers. In 2012 Windows got a similar feature, but it can do bytes instead of 32-bit integers if you want.

If a Linux process wants a million locks that's fine, that's just 4MB of RAM now.
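The arithmetic behind that figure, as a quick sketch. It only models the storage for the lock words with `ctypes` and does not create real futex-backed locks; the names `NUM_LOCKS` and `LockWord` are just for illustration.

```python
import ctypes

NUM_LOCKS = 1_000_000

# One futex-style lock word is an aligned 32-bit integer.
LockWord = ctypes.c_uint32

# A contiguous array of a million such words (storage only, not real locks).
lock_words = (LockWord * NUM_LOCKS)()

total_bytes = ctypes.sizeof(lock_words)
print(total_bytes)              # 4000000
print(total_bytes / 1_000_000)  # 4.0 (decimal MB)
```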