Ruby’s GIL and transactional memory (mikeperham.com)
82 points by ricny046 on Jan 1, 2014 | hide | past | favorite | 20 comments



Pypy has a (currently, software) transactional memory branch that is trying to remove the python GIL without changing the semantics

http://pypy.org/tmdonate.html (has the best executive summary of what the branch intends to accomplish)


Just like with Python, why would you even care about the GIL?

Writing multithreaded apps with low-level locking hardcoded everywhere is now quite clearly NOT the right way to build software. If you don't use locks, i.e. only use lock-free data structures and immutable state, then you won't care about the GIL. And you can use multiple processes and interprocess communication in place of threading. On Linux, the difference in performance between threads and processes is very small. Most people who complain about the GIL have not even profiled multithreading versus multiprocessing. They are just bound and determined to reinvent the wheel in their own code base.

There is no reason why you can't leverage C++ (ZeroMQ) or Erlang (RabbitMQ) to do the hard bits and write the rest of your app in nice simple Ruby (or Python) scripts that are designed according to the Actor Model.
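To illustrate the actor style in plain Ruby (a toy sketch of mine, no ZeroMQ or RabbitMQ involved, and the `CounterActor` name is made up): each actor owns a mailbox and processes one message at a time, so its internal state never needs a lock.

```ruby
# A minimal actor: owns a mailbox (a Queue) and a private thread that
# drains it one message at a time. Since only that thread ever touches
# @count, no locking is needed.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0
    @thread = Thread.new { run }
  end

  # Asynchronous send: just enqueue the message and return.
  def tell(msg)
    @mailbox << msg
  end

  # Shut the actor down and return its final state.
  def stop
    @mailbox << :stop
    @thread.join
    @count
  end

  private

  def run
    loop do
      msg = @mailbox.pop
      break if msg == :stop
      @count += 1 if msg == :increment
    end
  end
end

actor = CounterActor.new
1000.times { actor.tell(:increment) }
puts actor.stop  # 1000 -- the mailbox is FIFO, so :stop arrives last
```

Swapping the in-process `Queue` for a socket or a message broker changes the transport but not the programming model, which is the point the comment is making.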


> Writing multithreaded apps with low-level locking hardcoded everywhere is now quite clearly NOT the right way to build software.

It is a very opinionated view that writing multithreaded code with locking is not "the right way". Of course, using locks requires disciplined engineering, but there are applications where you might want to use threads in a single process rather than multiple processes, although it might be less common in the domains where Ruby is popular.

Multiple processes and IPC are not a whole lot easier than writing multithreaded code, especially if you have a nice concurrency framework with channels, etc. available.

> If you don't use locks, i.e. only use lock-free data structures and immutable state, then you won't care about the GIL.

Unfortunately, lock-free data structures or immutability will not avoid the GIL. The GIL is used to guard internal data structures in the interpreter; it's held practically the whole time code is being interpreted and released only when blocking I/O is happening (at least that's the way it works in Python).
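This is easy to observe (a small sketch assuming CRuby; the exact timings are approximate): blocking calls such as `sleep` release the GIL, so two sleeping threads overlap in time, whereas CPU-bound pure-Ruby code would not run in parallel.

```ruby
require 'benchmark'

# Two threads each sleep 0.3s. CRuby releases the GIL around blocking
# operations like sleep, so the two sleeps overlap and the total wall
# time is roughly 0.3s, not 0.6s. CPU-bound Ruby code in two threads
# would show no such speedup, because the GIL is held while
# interpreting.
elapsed = Benchmark.realtime do
  2.times.map { Thread.new { sleep 0.3 } }.each(&:join)
end
puts format('elapsed: %.2fs', elapsed)
```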

Not being able to run multithreaded code in parallel is a weakness in CPython and CRuby, and finding ways to avoid locking the GIL would make them better.


> Most people who complain about the GIL have not even profiled multithreading versus multiprocessing.

This is a fair point, but the issue might be about memory usage, not speed. A unicorn setup might have two to eight worker processes to service HTTP requests. Even with copy-on-write-friendly garbage collection, the memory usage of each additional process is significant. On the other hand, a thread-based solution (using JRuby, for example) can maintain a threadpool with hundreds of worker threads because the cost of an additional thread is nearly negligible.
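For illustration, here is a toy thread pool in Ruby (the names and sizes are mine, not from any real server): a few hundred worker threads share one interpreter, so each extra worker costs roughly a thread stack rather than a whole extra copy of the process.

```ruby
# A fixed pool of worker threads pulling jobs from one shared queue.
# Each additional thread costs a stack, not a duplicated interpreter,
# which is why a pool of hundreds is feasible where hundreds of
# processes would not be.
POOL_SIZE = 200
jobs = Queue.new
results = Queue.new

pool = POOL_SIZE.times.map do
  Thread.new do
    while (job = jobs.pop) != :done
      results << job * 2   # stand-in for servicing one HTTP request
    end
  end
end

1_000.times { |i| jobs << i }
POOL_SIZE.times { jobs << :done }  # one sentinel per worker
pool.each(&:join)
puts results.size  # 1000
```

Under CRuby's GIL this pool only helps for I/O-bound jobs; on JRuby, as the comment notes, the workers can also run Ruby code in parallel.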


Why would you need hundreds of worker threads? If you're using an event-loop in each process (which you probably want if only to minimize context-switching overhead, and is how Unicorn does it), then you need only one process per physical core in the machine. Anything else will just sit in the runqueue and cause context switches.

There is definitely annoying memory overhead with multiprocess (vs. multithreaded) architectures, but it's on the order of 2x-8x, not 100x. And that's 2x-8x the code size of the application, not data size - you only need duplicate interpreter objects, anything at the app or framework level (like templates or data files) can be stored in read-only shared memory or just COW'd with no writes. (It's technically not even every interpreter object - a number of function objects are completely static data that will never have additional references made, and so COW means they'll be shared perpetually between processes.)


> If you're using an event-loop in each process (which you probably want if only to minimize context-switching overhead, and is how Unicorn does it)

Unicorn is a pre-forking multiprocess server so I don't know why it would be using an event loop.
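The pre-forking model can be sketched in a few lines (my own toy, not unicorn's code, assuming a fork-capable OS): the parent opens the listening socket once, forks N workers, and each worker blocks in accept on the inherited socket. No event loop anywhere; each worker handles one connection at a time.

```ruby
require 'socket'

# Parent opens the socket once; forked children inherit the file
# descriptor and all block in accept on it. The kernel hands each
# incoming connection to exactly one waiting worker.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

workers = 2.times.map do
  fork do
    loop do
      client = server.accept   # blocking accept, no event loop
      client.puts "hello from worker #{Process.pid}"
      client.close
    end
  end
end

# Exercise it once from the parent, then clean up the children.
reply = TCPSocket.open('127.0.0.1', port) { |s| s.gets }
workers.each { |pid| Process.kill(:TERM, pid); Process.wait(pid) }
puts reply
```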

Why threads over processes? Because memory isn't cheap when you don't own it yourself.


> Why would you need hundreds of worker threads?

Shrug, why not? You get a different (easier?) programming model where you can use blocking IO rather than an event loop.

> on the order of 2x-8x, not 100x

Yeah, but that 2x might be the difference between one virtual machine flavor and the next price up.


>> If you don't use locks, i.e. only use lock-free data structures and immutable state, then you won't care about the GIL.

Unless you're trying to do something that requires parallel processing. Crypto cracking was one example recently.

>> And you can use multiple processes and interprocess communication in place of threading.

This adds complexity.

>> There is no reason why you can't leverage C++ (ZeroMQ) or Erlang (RabbitMQ) to do the hard bits and write the rest of your app in nice simple Ruby (or Python) scripts that are designed according to the Actor Model.

This also adds complexity.

Basically it would be nice if threads in python acted like they do in other languages, rather than just pretending to.


There are tiers for performance. A few hundred (or thousand) ops/sec -- a GIL isn't going to hurt too much. A few thousand -- well-designed IPC is fine. A few tens of thousands -- lock-free gets important. Hundreds of thousands (or millions) -- userspace/kernel transitions and interrupt servicing can dominate, and an application's interactions with the OS need to be very carefully managed...

It all depends on how much you want to get out of your hardware.


The reason you care is that if you remove the GIL, you can write a 1:1 or green-threaded M:N library (actor model like Erlang, or dataflow like Oz, or whatever other model floats your boat) once in Ruby, and then write all the application code you want using just Ruby.

Using scripts running in separate OS level processes -- no matter what infrastructure you use to connect them -- isn't necessarily the ideal solution.


Based on our experience running large telephony applications, I would say that the threading approach, using JRuby as it provides a stable GIL-free interpreter, is vastly superior both in resource handling and developer/sysadmin productivity, also known as "less headaches".


Our experience at Google (using first CPython and then Java) was that the single-threaded multi-process approach led to better developer productivity but worse sysadmin productivity. It made reasoning about the behavior of the code significantly easier and wasted less time chasing down race conditions, livelocks, and deadlocks, but it meant that SREs and infrastructure teams had to spend more time managing memory consumption and dealing with deployment and monitoring hassles.

So it's a trade-off where the downsides often get pushed off into another group. As a developer, I really miss the CPython solution, which was a lot simpler and seemed more robust. But then, I wasn't the one responsible for pushing out new code or monitoring. I do think there were various optimizations we could've made to our other tools that might've compensated for the need to run more server processes, and wish we'd tried that before jumping to "Let's use a GIL-free language."


We are a pretty small team, and aside from one person most of us double as ops at some point. Not trying to generalize at all, but at our scale it is far more productive to go with something GIL-less and as self-managing as possible, such as the BEAM VM languages. Incidentally, Erlang/Elixir describe applications as a collection of lightweight in-VM processes, essentially allowing you to reason much as if you were writing single-threaded processes but without the real system implications. What I see as the most important thing here is that developers get a better pulse on the resource needs of their own software, since we do a lot of load testing before pushing anything to production, and the ops team can take our baseline data and handle applications almost as a black box. On the other hand, Docker essentially makes all my points moot by itself, so there is room for all approaches.


Moore's law-scale performance improvements have increasingly come from harnessing multiple cores and much of Ruby is not taking advantage of this. The GIL can limit performance on multi-core systems. This is a serious problem worth tackling. Many on this thread defending just status-quo solutions seem to miss this point.


I haven't read the linked paper yet, but if transactional memory can be used in a feedback loop with the developer to identify sources of resource contention (where, for example, more intricate locking could usefully replace both TM and the GIL) that sounds like a real win.


To me locking means waiting which means greatly reduced performance. That is the whole reason why event-based systems like NGINX and node.js are so popular.

Look at the first slide from this talk by Professor Michael Stonebraker to see why more locking is not such a good idea. http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker...

Just because we can do it doesn't mean that we should do it.


> To me locking means waiting which means greatly reduced performance. That is the whole reason why event-based systems like NGINX and node.js are so popular.

I think you've got it slightly wrong. What do you think happens when one request is being worked on in effectively one green-thread-equivalent in node.js? Everything else is blocked until the current operation yields. That's the equivalent of a GIL, yet it doesn't "reduce performance". If you switched to multithreading + locking (making it run in an M:N scheduling model), you'd gain performance, not lose it.


Ruby needs a JIT, then incremental GC, much more urgently than getting rid of the GIL and getting TM.


I thought the Ruby 2 standard library was thread-safe now?


GILs are at a lower level than that. A GIL is used to protect the internal state of the interpreter.

You (or your standard library) need ruby-level locks in ruby code when you do things like increment a counter. E.g.: "obj.count = obj.count + 1".

The interpreter needs interpreter-level locks when doing things like looking up attributes from an object's dictionary or hash table or whatever ruby calls it. That's why you don't always need a lock around a simple assignment to avoid interpreter crashes: "obj.count = 0" is safe. Without a ruby-level lock around that, there is a race condition with the previous example, and resetting the count to zero may not have an effect if you're unlucky.
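A sketch of such a Ruby-level lock (the `Counter` class here is hypothetical): the increment is a read-modify-write, and a thread switch can land between the read and the write even under a GIL, so a `Mutex` is needed to make the whole sequence atomic at the Ruby level.

```ruby
# "obj.count = obj.count + 1" compiles to a read, an add, and a write.
# The GIL only guarantees each of those steps is safe individually;
# a Mutex guarantees no other thread runs between them.
class Counter
  attr_reader :count

  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @count += 1 }
  end
end

c = Counter.new
threads = 8.times.map { Thread.new { 10_000.times { c.increment } } }
threads.each(&:join)
puts c.count  # 80000 with the lock; without it, updates can be lost
```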

Without a GIL, however, you could have problems like the interpreter segfaulting (instead of throwing a nice exception that you can handle). You may also have problems like that assignment turning into an infinite loop, depending on ruby's implementation of hash tables.

And if accessing a hash table isn't safe, you can't even use ruby-level locks. How would you create a lock? You'd need to access classes or functions in a module. The module stores those things in a hash table, which you can't access without a lock. That's what the GIL is for.



