Hacker News
Ruby 2.1 Garbage Collection: ready for production (samsaffron.com)
204 points by sunseb on April 8, 2014 | hide | past | favorite | 38 comments



I think the takeaway is that there actually is a straight-up bug in the 2.1.1 GC that causes unbounded memory growth, and that the new GC does typically result in higher memory use.

The memory issue isn't really that serious, as it seems to be a tradeoff for performance. Although it's not like Ruby is light on memory use as it is…

Far more interesting are some of the other issues, like this one: https://bugs.ruby-lang.org/issues/9262

For an app like Discourse 3-10% of request time is occupied looking up methods, due to cache inefficiency.

That's amazing, and demonstrates that there's probably still quite a lot of low-hanging performance fruit that Ruby can look to exploit.

All of that aside, performance is generally so much better in the 2.1.1 series that it's really worth using.


I think it's part bug and part having a GC with only two generations (old and young). When you have to choose between putting these tweener objects somewhere, you have to be more conservative and move them to the old generation. Once a third generation is added (Ruby 2.2?) this will be much smoother.

> For an app like Discourse 3-10% of request time is occupied looking up methods, due to cache inefficiency.

Hmmm, I thought Ruby 2.1 already had a per-class method cache, or maybe it was just per-class method cache invalidation, but I don't know how you could have one without the other. I'll have to reinvestigate this.

> That's amazing, and demonstrates that there's probably still quite a lot of low-hanging performance fruit that Ruby can look to exploit.

I'm not sure I share as much of a positive outlook. Short of adding JIT compilation, I think the gains from here on out will start to get smaller and smaller. The performance gains of RGenGC were very impressive, though.


I'm working on an "as static as possible" Ruby compiler as a hobby project, and it's incredibly frustrating at times to see the generated code grow to ridiculous size as I get closer to actually complying with real Ruby semantics... But I do still think there are substantial gains possible.

For starters, for most method calls there's no reason to do the expensive method lookups that MRI still uses, cache or no cache: you can use C++-style vtables, as long as you propagate updates to them downwards when a method is redefined. You do need a fallback to handle dynamically created methods with names not present when you generate the vtables, and optionally some way to reduce waste (since the vtables need to be the same size for all classes, with unimplemented methods replaced by pointers to method_missing thunks), but in terms of performance you can do fairly well, and compared to this GC blowup the memory waste would be small.
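To make the vtable idea concrete, here is a toy sketch in Ruby itself (all names here are invented for illustration, not taken from any real compiler): every selector gets a fixed slot index, every class carries an array of that size, and empty slots hold a method_missing thunk, so a call site compiles down to a constant-index array load plus a call instead of a hash lookup.

```ruby
# Selector -> fixed slot index, shared by all classes.
SLOTS = { greet: 0, area: 1 }

# Thunk standing in for a method_missing trampoline.
missing_thunk = ->(name, *) { raise NoMethodError, "undefined method #{name}" }

# A fresh vtable: every slot initially routes to the missing thunk.
def new_vtable(missing_thunk)
  Array.new(SLOTS.size) { |i| ->(*a) { missing_thunk.call(SLOTS.key(i), *a) } }
end

circle_vtable = new_vtable(missing_thunk)
circle_vtable[SLOTS[:area]] = ->(r) { 3.14159 * r * r }

# "Compiled" call site: constant-index load + call, no method cache lookup.
area = circle_vtable[SLOTS[:area]].call(2.0)
```

Note the waste the comment mentions: `circle_vtable` burns a slot on `:greet` even though circles never implement it, which is the "sparse vtables" cost of keeping all tables the same size.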

But there's also not much alternative but going for proper JIT'ing of at least some things.


Dynamic languages tend to gain more from JIT than from AOT compilation due to such issues.

On the other hand, have a look at Dylan, as it might inspire you:

http://opendylan.org/


For method lookup, other than for methods that are dynamically generated with names not known at compile time, the only additional gain you'll get from a JIT is by going to full-on inline caches. But vtables get you most of the speedup without the hassle of inline caches and tracing, and they don't prevent adding tracing and inline caching down the line.


With JITs you get devirtualization as well, so no need for vtables.

Something that's possible in AOT as well to a certain extent, but it requires a mix of profile-guided optimizations coupled with whole-program analysis.

Which has issues with dll/so anyway, as those calls cannot be optimized away the way they can in JITs.


> With JITs you get devirtualization as well, so no need for vtables.

That's what I referred to with "inline caches". The problem is that for Ruby you need fully polymorphic inline caches, with guards all over the place, because unless you do tons of analysis upfront you can't know whether the world has totally changed on you after any method call, and almost anything is a method call. (Call into code you haven't verified can't possibly call "eval", and you might find that adding two integers afterwards does not in fact add them, but returns a string, changes global variables, and what-not.)

The upshot is that, compared to vtables, you're not actually saving all that much. E.g., take "1 + 2 - 3". You could inline Fixnum#+ (and could reasonably do so with an AOT compiler too). But you need to add a type guard before the inlined fragment to verify that Fixnum#+ still is the Fixnum#+ you inlined, which at the minimum costs you a comparison and a branch, or you need to record every call-site with inlined code and be prepared to overwrite it with fixups if the implementation changes.

And if Fixnum#+ has been overridden, or the Fixnum#+ implementation has method calls, chances are you will need another guard before "-" too, because you might not even know for sure whether or not the object returned from "1 + 2" will be a Fixnum, so you might find that the inlined method suddenly is for the wrong class.
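The hazard being described is easy to demonstrate. A small sketch (using Integer, since modern Ruby folded Fixnum into it; in the 2.1-era Ruby discussed here it would be Fixnum#+): nothing stops a redefinition from changing what "1 + 2" means, which is exactly why inlined code needs guards.

```ruby
class Integer
  alias_method :old_plus, :+   # keep the original implementation around
  def +(other)
    "surprise"                 # a pathological redefinition
  end
end

result = 1 + 2                 # dispatches to the redefined method

class Integer                  # restore sanity
  alias_method :+, :old_plus
end
```

An inlined `1 + 2` that skipped the guard would still return 3 here, silently disagreeing with the interpreter's semantics.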

I'm planning on benchmarking inline caching for my compiler against vtables, but absent evidence to the contrary I'm expecting that there will be a very substantial number of cases where the complexity isn't worth it, or where they might even turn out to be slower.

> Something that's possible in AOT as well to a certain extent, but it requires a mix of profile-guided optimizations coupled with whole-program analysis.

It does if you want to do everything upfront, but you can pull things into inline caches with a mostly-AOT compiler relatively easily with just a little bit of extra information, and a few guards thrown in to do some basic tracing.


I've implemented a handful of simple dynamic languages years ago, and something I was interested in trying, but never did, was taking advantage of the MMU to replace guard clauses.

For example, mapping a few pages for vtables/method dictionaries read-only. When something like `def` or `define_method` comes along, catch the segfault (which in this case would actually mean "segmentation fault" instead of "I fucked up") and rewrite all JIT blocks or method caches that depend on that method table. Once everything has settled, generally after startup when the vtables tend to stay stable, the overhead seems like it'd be negligible.


Catching the vtable updates and propagating them downwards is pretty simple: you "just" need every class to know which classes inherit from it. There's an implementation of dynamic runtime updates of dispatch tables for Oberon, of all languages (though that version sidesteps the "sparse vtables" issue by splitting the vtables into interfaces and adding one extra level of indirection).

The tricky bit is if you have gone as far as inlining the method.


(Disclaimer: I know nothing about Ruby, but I know some things about JIT compilers)

Another way to handle this is to assume that Fixnum#+ hasn't changed when compiling a method that is using it (maybe add a check at method entry); but when it does get redefined you "deoptimize" the methods that you compiled while holding that assumption.
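Ruby itself exposes the hook needed for the invalidation half of this scheme. A hypothetical sketch (the `COMPILED` registry and `Calc` class are made up for illustration): a compiler records which optimized code assumed a method was stable, and the `method_added` callback flushes those entries whenever the method is (re)defined.

```ruby
# assumption -> "compiled" code built while holding that assumption
COMPILED = { double: :fast_native_version }

class Calc
  # Fired by the VM every time an instance method is (re)defined on Calc.
  def self.method_added(name)
    COMPILED.delete(name)   # deoptimize anything that assumed the old version
    super
  end

  def double(x)
    x * 2
  end
end

# Defining #double above already triggered the hook, discarding the
# optimized version; callers fall back to the plain method.
```

A real JIT does the analogous thing at the machine-code level, patching or abandoning compiled frames instead of deleting hash entries.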


That's what this referred to:

> or you need to record every call-site with inlined code and be prepared to overwrite it with fixups if the implementation changes.


Interesting post. You have really spent some time looking into it.

Compilers were one of the main focuses of my CS degree, so I am really into this type of discussion.

Good luck with the project.


Thanks.

It's really fascinating, and what fascinates me in particular with Ruby is exactly that once you start looking into it, there are new problems around every corner, and trying to make it as "ahead of time as possible" makes it even trickier. I absolutely agree with you that there are parts that are much easier to do if you JIT, though, and I'll have to go there anyway to handle "eval".


Yep, I keep on jumping between "love JIT" and "love AOT" in terms of implementations.

Currently my feeling is that most languages could benefit from both.

A JIT like environment for live coding and portable deployment.

And an AOT one for certain types of deployment where thin runtimes are desired.


Kind of depressing how far behind Ruby is from V8, Hotspot, CLR etc in terms of the sophistication of the GC, non-existent JIT etc. Still hoping someday someone will make the investment needed to catch up.


I would like to read an actual in-depth comparison. While the quality of the Ruby implementation undeniably has a long way to go, Ruby has very "loose" semantics, and I think some of the optimizations that are possible for Java, JavaScript or even Smalltalk are simply not possible for Ruby, because you have fewer assumptions to rely upon: basically everything has to be looked up at runtime.

Besides just nitty-gritty optimizations there are some high-level design issues which are interesting. JVM performance is great for languages like Java, but, for some reason probably related to the JVM, all Java web servers seem to be multi-threaded monsters, which turns debugging into a mess and seems to make DoS attacks way easier.

We use a self-hosted Jira on top of Tomcat at work; there are maybe 30 simultaneous users, and every time one of them does something computationally expensive or uses some buggy functionality the whole process goes nuts: memory leaks until exhaustion, load goes to maximum, etc., and the whole web server has to be restarted. With a web server with multiple worker processes you just restart the single worker and call it a day.

Does anyone happen to know why the worker-based hosting model with separate worker processes seems to be completely absent from the Java world in favour of multi-threaded servers?
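For reference, the worker model being described (the one Unicorn popularized for Ruby) can be sketched in a few lines, assuming a POSIX system with `fork`; the worker bodies here are placeholders, not real request handling. The point is that the parent only supervises, so one misbehaving worker can be killed and replaced without restarting the whole server.

```ruby
# Hypothetical preforking supervisor sketch.
def spawn_worker(id)
  fork do
    Signal.trap(:TERM) { exit! 0 }  # a real worker would serve requests here
    sleep                           # stand-in for the accept loop
  end
end

workers = {}                        # pid => worker id
2.times { |i| workers[spawn_worker(i)] = i }

Process.kill(:TERM, workers.keys.first)  # simulate one worker going bad
dead = Process.wait                      # the parent reaps the dead child...
id = workers.delete(dead)
workers[spawn_worker(id)] = id           # ...and respawns just that worker

workers.each_key { |pid| Process.kill(:TERM, pid) }  # clean shutdown
Process.waitall
```

The restart-one-worker behaviour the comment asks about falls directly out of the `wait`/respawn loop; a threaded server has no equivalent unit it can discard this cheaply.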


I fail to see how Ruby is more dynamic than Smalltalk, as I am not aware of any dynamic feature of Ruby that isn't present in Smalltalk as well.

Are you aware that HotSpot was born as a Smalltalk (Strongtalk, actually) JIT compiler?


I seem to remember discussions about this a decade ago, with Smalltalkers saying that the only big difference is that programmatic generation of code is not as common in ST as it is in Ruby (i.e. "attr_*" is possible in ST but doesn't actually exist).

Other than that I _think_ maybe ST has a fixed number of slots per class, compared to dynamically added instance/class variables in Ruby? My memory is fuzzy.

By the way, there _is_ a Ruby built by Smalltalk developers[0], exactly because the object model is 99% the same.

[0] http://maglev.github.io
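Both kinds of dynamism mentioned above are easy to show in one toy class (invented for illustration): attr_*-style programmatic method generation, and instance variables that appear at runtime rather than occupying pre-declared slots.

```ruby
class Point
  %w[x y].each do |name|                  # roughly what attr_accessor expands to
    define_method(name)       { instance_variable_get("@#{name}") }
    define_method("#{name}=") { |v| instance_variable_set("@#{name}", v) }
  end
end

pt = Point.new
pt.instance_variables            # empty: no slots reserved up front
pt.x = 3                         # @x springs into existence here
pt.instance_variable_set(:@z, 9) # an ivar no method ever mentioned
```

A Smalltalk class, by contrast, declares its instance variables up front, which is exactly the fixed-slots property that makes object layout easier for a compiler.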


I don't know Smalltalk well enough to say for sure, it's just my rather vague impression that Smalltalk is more compiler-friendly, likely based mostly on what I've read over the years about this, for example looking at:

http://lambda-the-ultimate.org/node/2606

Ruby permits adding methods to individual objects; in Smalltalk, all methods reside in classes.

In Ruby, it is practical and somewhat useful to add methods dynamically; in Smalltalk, the practice is generally to treat the methods and classes as static.

Have also a look at:

http://www.hokstad.com/the-problem-with-compiling-ruby

I don't think all five of those problems apply equally well to Smalltalk, do they?


> Ruby permits adding methods to individual objects; in Smalltalk, all methods reside in classes.

All Ruby methods reside in classes, too, "adding a method to an object" is really adding it to the object's singleton class (metaclass).

The difference between Ruby and Smalltalk is that in Smalltalk, each explicit class has an implicit metaclass (a class in which the class's instance methods are defined), whereas for Ruby each object (including, but not limited to, class objects -- and notably including other implicit metaclasses) has an implicit metaclass (often called the eigenclass).


And note that Ruby eigenclasses are in effect perfectly normal classes, with the exceptions that: 1) they get created dynamically the first time you define a method on an object, and 2) they are then "injected" into the inheritance chain, yet hidden from you when you try to follow it.

In effect, from an implementation point of view (at least for MRI), they're just normal classes except for when you define them, and a handful of insignificant cases where you have to check a flag to determine whether or not to consider them.
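The injection-yet-hiding can be seen directly from Ruby. A small sketch: defining a method "on an object" really defines it in the object's singleton class, which sits in front of the real class in the lookup chain but is absent when you ask the class itself for its ancestors.

```ruby
s = "hello".dup        # dup, in case string literals are frozen
def s.shout            # defines #shout in s's singleton class
  upcase + "!"
end

s.shout                                    # method lookup finds it first
s.singleton_class.instance_methods(false)  # only :shout lives there
s.class.ancestors                          # String's chain: eigenclass hidden
s.singleton_class.ancestors                # eigenclass first, then String...
```

So the "flag check" mentioned above is what keeps `s.class.ancestors` looking clean even though the eigenclass is genuinely part of the lookup chain.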


> http://www.hokstad.com/the-problem-with-compiling-ruby

I wrote that one in frustration when working on my Ruby compiler project, actually, and I think there are reasonable solutions to all of them; in fact you will find that most of them, possibly all, are solved to various extents in either Smalltalk or Self implementations. In particular, the Self papers are an invaluable resource for anyone who wants to implement languages as dynamic as Ruby or JavaScript...

The main problem is the effort. And implementations like MRI really have not tried very hard to address any of this: you'll note it first got a bytecode VM (instead of AST walking) with 1.9, and still does not JIT. AFAIK they've not even tested simple optimizations like partial vtables to avoid the costly method dispatch, etc.

A big part of the challenge is that Ruby is a remarkably hard language to implement, not just or even mainly because of the dynamic features (those make it hard to make it fast, not hard to implement) but because it's still largely unspecified, the parsing is chock full of annoying corner cases, and there are lots of dark, hairy areas of Ruby that almost nobody ever sees. I love Ruby, but I think it badly needs a revision that deprecates a whole chunk of functionality that nobody really uses but that affects implementations, as well as writing a proper spec and then tightening up a whole lot of areas where users would be largely unaffected, but which would make the language vastly easier to implement.

So a lot of effort that could have gone into making implementations faster goes into addressing annoying digressions instead.


Seems to me they would apply to JavaScript though, and that got quite fast with V8.


Being the one widely supported client-side language for the web means that lots more money has been poured into making JavaScript implementations effective than has gone into Ruby (I wouldn't be surprised if Google put more resources into V8 than have gone into all Ruby implementations combined.)


For example, Lars Bak (virtual machine master) is the head programmer of V8.

http://en.wikipedia.org/wiki/Lars_Bak_(computer_programmer)


True, but since V8 is open-source, shouldn't it now be possible for the Ruby developers to borrow the techniques from it that made Javascript so much faster?


None of the techniques that have gone into V8 are all that novel. Most of them, at least the ones that are relevant to Ruby, stem from research around Smalltalk and Self that has been well known for a very long time. The hard/time-consuming part is implementing them.


Ruby isn't JavaScript. Studying V8's solutions to the particular problems of making JavaScript efficient may help people seeking to build an efficient Ruby implementation, but how much is easily transferable I don't know.


No doubt V8 is much more mature than MRI. I think some language considerations do still apply, though: JavaScript doesn't have method_missing, so problem #5 doesn't apply, and I guess there is more.
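For anyone unfamiliar with the feature being referenced: method_missing lets a call "succeed" for a method that was never defined anywhere, which has no direct counterpart in the JavaScript V8 was designed for. A toy example (the `NullLogger` class is invented for illustration):

```ruby
class NullLogger
  # Called by the VM whenever a nonexistent method is invoked.
  def method_missing(name, *args)
    :ignored                   # swallow debug, warn, anything at all
  end

  # Keep respond_to? honest about the above.
  def respond_to_missing?(name, include_private = false)
    true
  end
end

log = NullLogger.new
log.debug("starting up")   # no #debug was ever defined, yet this dispatches
```

Because any failed lookup might land in a hook like this, a compiler can never treat "method not found" as a static error, which is the kind of assumption-killer the comment is pointing at.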


Kind of depressing how far it is behind all the Smalltalk VMs given the object model.


Don't begrudge the Smalltalkers their crumb of comfort :-)

http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...


Like Rubinius? Or why not JRuby?


Because there's always something different that needs to be done for them once your project starts becoming non-trivial. You might need a different version of a gem (e.g. pure-java Nokogiri), or they're behind recent MRI features. And if you care about concurrency, it's different everywhere.

In my personal experience I've found MRI to be the best experience simply because that's what most other people are using, and there's a lot to be gained from being in the mainstream.

Don't get me wrong- I would actually love for JRuby to become the de-facto Ruby implementation. So many headaches are caused by native code in gems. And we'd have a solid foundation for GC, concurrency, etc. But that's not the current reality.


Rubinius does not need gem replacements the way JRuby does; it still has a compatible C API.

> [...] or they're behind recent MRI features.

Part of this is due to MRI having literally no specification process at all. Python has the PEP system; no such thing exists in Ruby land. People have tried to change this in the past (http://rubyspec.org/design/) but with little to no success so far. As a direct result, there are only two ways to keep up to date with what changes in Ruby:

1. Follow every issue reported on bugs.ruby-lang.org, forever.
2. Wait until users report issues about something not being present, behaving differently, etc.

> And if you care about concurrency, it's different everywhere.

This is FUD. An implementation may offer different primitives for concurrency (e.g. Rubinius has Rubinius::Channel) but they also offer shared APIs. For example, the Thread class works across all implementations as you'd expect. Whether you use this on JRuby or Rubinius the end result is the same: proper concurrency.


For me it's always the triple-whammy of being a bit (or a lot) slower, needing vastly more memory, and having astronomical startup times. Not exactly an appealing combination.


If you're seeing a situation where you need vastly more memory with JRuby, please file it. While there is a higher base memory footprint due to the JVM being booted, I've found in practice it's much better on RAM than MRI. We saved a lot of money on EC2 by switching to JRuby because servicing a new request with a thread was a lot cheaper than forking a process running Rails.


I think the problem with Ruby having no JIT stems from the fact that there is no Ruby design spec. We have an implementation and no standard, compared to JavaScript, Lua, Python or Java.

Ruby may not ever catch up to the superfast LuaJIT or JavaScript's V8 in performance, due to the nature of the language itself. But that doesn't mean Ruby should be an order of magnitude slower than them.


And still, there is no RubyInstaller 2.1 for Windows =(.




