Optimizing Ruby lazy initialization in TruffleRuby with deoptimization (shopify.com)
150 points by kipply on April 11, 2020 | 66 comments



Nice to see some TruffleRuby love.

Ruby could have been Swift, if MacRuby had not been killed (well at least it lives on as RubyMotion).


Ruby? It's Dylan that could have been Swift:

https://en.wikipedia.org/wiki/Dylan_(programming_language)


Indeed, but that was with the old Apple, hence why I left it out.


Ruby is inherently much too slow to have ever been a Swift. All of its dynamic runtime features give it a pretty low performance ceiling compared to a statically typed, compiled language with very limited runtime introspection/reflection.


I think it is important to note that the parent was referring to MacRuby [1], which is a very different Ruby implementation from current CRuby.

And yes, MacRuby (not Ruby) could very much have been Swift.

[1] http://macruby.org


Common Lisp and Dylan kind of prove otherwise. Also, RubyMotion isn't that slow either, from what I can tell.

It is all a matter of how much money one wants to throw at the problem.

Also ironically, some of the high performance libraries in macOS/iOS are still being written in Objective-C, despite the goal to replace them with Swift.


RubyMotion is awesome but it should be noted that RubyMotion apps are statically compiled for each native platform. Also note that it has been rebranded to DragonRuby:

http://www.rubymotion.com/news/2019/03/01/the-sleeping-drago...


While it's still very slow, modern JavaScript VMs are fast enough for many uses. Ruby is slow, but it could be a lot faster with a lot of time and money.


JavaScript VMs are quite fast, actually. They get a lot of attention.


But developing in JS is considerably slower than in Ruby, and time to market should be king.


On a mobile device, efficiency should be king... that's not exactly Ruby's forte

I'm skeptical of your first point as well


How does backend development reflect on mobile?


"Ruby could have been Swift"

Swift's raison d'etre is mobile development


> time to market should be king

Eh, not really.


V8 is a work of art <3


What do you like the most in the V8 internals?


It's not nice to pick a favourite child. Though something I heard about recently that made me happy (less about V8 internals) is how they try to improve compilation speed by not compiling. There isn't a video about it yet but you can stalk https://2020.programming-conference.org/details/MoreVMs-2020... when it exists


BS. Ruby isn't slow. The same apps can be written in pretty much any language nowadays.

The problem is the developers. They are slow and dumb and lazy and most often also biased. Blaming the language is easy. A bad developer will always write spaghetti no matter which language he uses.


It’s both. Ruby IS slow. If you ever need to create a tree structure that isn’t available in a natively-compiled Gem, you’ll know what I mean.


This is only true if your applications are mostly high level glue.


Which is what most "business" apps are anyway.


> Benchmarking isn’t a perfect measure of how effective this change is (benchmarking is arguably never perfect, but that’s a different conversation), as the results would be too noisy to observe in a large project.

So... does that not mean that any micro-improvement would also be too small to be significant on any large project?


This was my question as well. If the change can't be measured even with a benchmark, what is the goal?


The goal of the blog post was less to talk about this optimization and more to introduce TruffleRuby and give a high-level understanding of how changes are made to programming language implementations. My goal when I started working on the feature was to introduce myself to TruffleRuby; it was actually my first PR. It's not an impressive change, but it helps get into a lot of the important aspects of how TruffleRuby is engineered.


I kinda suspect it may be time to restate the problem and revisit it.

A microbenchmark has nearly the entire resources of the machine available to itself. In a live scenario, an arbitrary and unknowable mix of other code will be sharing those resources. We assert that we cannot model the negative - or positive - effects of that other code on the behavior of the code being benchmarked.

Except us workaday programmers do that in practice all the time. We hear about a performance improvement, we estimate the likelihood of a payoff, then we try it, first in isolation and then in our code. Sometimes the change is bad. Sometimes the change is boring. Occasionally, the change is surprising.[1]

Why couldn't we come up with test fixtures that run some common workloads and stick the code being benchmarked into the middle? You'd have 3 data sets instead of two in that case, control, A, and B. You will want to know how (A - control) relates to (B - control) as a fraction.

1. Biggest perf surprise I ever got (aside from the time I predicted the benefit of 4 optimizations in a row with a margin of error of <3%), I got a 10x when I expected 2x. We had two methods that got the same data from the database, filtered it, and compared the results. I moved the lookup to the caller and made them both pure functions (which helped with testing them). I don't remember chaining the calls, but even if I did, both methods were roughly the same cost, so that should have been 3-4x. The rest I suspect was memory and CPU pressure.
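
The control/A/B comparison proposed above could be sketched roughly as follows. This is only an illustration: `time_it`, `background_workload`, `overhead_ratio` and `compare` are hypothetical names, the "common workload" is a trivial stand-in, and a real fixture would need far more care about noise and warmup.

```ruby
# Hypothetical sketch of a control/A/B benchmark fixture.
# None of these names come from the article or from TruffleRuby.

# Wall-clock seconds taken to run the block `iterations` times.
def time_it(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Stand-in for the "common workload" that surrounds the code under test.
def background_workload
  (1..200).map(&:to_s).join.length
end

# (A - control) as a fraction of (B - control).
# Returns nil when B's overhead is not positive, i.e. lost in the noise.
def overhead_ratio(control, a, b)
  b_overhead = b - control
  return nil unless b_overhead.positive?
  (a - control).to_f / b_overhead
end

# Times the workload alone, then with each variant placed in the middle of it.
def compare(variant_a, variant_b, iterations: 1_000)
  control = time_it(iterations) { background_workload; background_workload }
  a = time_it(iterations) { background_workload; variant_a.call; background_workload }
  b = time_it(iterations) { background_workload; variant_b.call; background_workload }
  { control: control, a: a, b: b, ratio: overhead_ratio(control, a, b) }
end

result = compare(-> { Array.new(50) { |i| i * 2 } }, -> { (0...50).map { |i| i * 2 } })
puts result.inspect
```

A `nil` ratio here is the interesting case for this thread: it means the difference between the variants could not be distinguished from the surrounding workload at all.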


This particular one is. Maybe we could measure time-to-warmup or hot performance and with extremely careful scientific process we can say X application has sped up by Y percent. It also would not be helpful, because we know that it's a micro-improvement. Having data on the results of this improvement in the context of compilation speed and machine code output is useful, since it tells us that it's not a micro-un-improvement and because that was the intermediary goal in this optimization.


Hopefully with Shopify's help we are closer to running Rails on Truffle.


I wonder if ruby could eventually officially migrate to Graalvm


That is unlikely; Matz has mentioned a few times that adopting something developed elsewhere as the core VM is something he doesn't want to do.

IIRC the reasoning goes along the lines of not wanting the risk of something being discontinued by a third party and having to shoulder the extra labor of supporting something with which the devs are not familiar.


I remain skeptical of JRuby. As of about 2-3 years ago, there were many benchmarks touting its performance, but in our large publicly deployed web app, it always underperformed and switching to MRI on Phusion more than doubled our performance on half the servers. We also spent like a developer-year on issues that accompanied being off the beaten path with JRuby, like various nokogiri complications.

I even contacted the devs at one point with a benchmark that ran jruby directly on strings for several minutes. The answer I got was always that it would eventually perform better than MRI with more warmup. Spoiler: it didn't. Well, maybe I was supposed to run it for another month and it would have.


This article is about TruffleRuby, not jruby.


Sorry to be a bit overbearing here~

It's worth thinking about JRuby-performance-skeptical-ness when thinking about TruffleRuby due to some similarities. Not going to outline in a comment, but suffice it to say that TruffleRuby was once called JRuby+Truffle.

TruffleRuby has the power to work through JRuby shortcomings. With GraalVM, TruffleRuby gets to have C extensions! Graal is also open source, so we get more control of what we want it to do and more understanding of what it does. In theory, that gives TruffleRuby more room to get faster.

It's also worth noting Charles Nutter's comment on the post, which mentions that HotSpot C2 already does this optimization! (though they wouldn't have this kind of control over branch profiling)


The main reason that JRuby and TruffleRuby parted ways was due to their desire to support C extensions. We went down that road many years ago and decided that the performance characteristics of the MRI extension API inside a managed VM like the JVM just would not scale, and most extensions are not thread safe or memory safe. In our estimation, C extensions are the biggest thing holding Ruby back. We wish the Truffle folks the best of luck in their efforts to fix their C extension performance and threading problems but they just aren't a fit for JRuby.


IIRC TruffleRuby is implemented very differently than JRuby and has a lot more potential. So I'm not sure any comparisons can really be made.


source: https://chrisseaton.com/truffleruby/

> TruffleRuby started as my internship project at Oracle Labs in early 2013. It is an implementation of the Ruby programming language on the JVM, using the Graal dynamic compiler and the Truffle AST interpreter framework. TruffleRuby can achieve peak performance well beyond that possible in JRuby at the same time as being a significantly simpler system. In early 2014 it was open sourced and integrated into JRuby for incubation, then in 2017 it became its own project, and now it is part of GraalVM. Since 2019 Shopify has sponsored development.

> In early 2014 it was open sourced and integrated into JRuby for incubation

Additionally, we also tried Truffle when it was JRuby, or in JRuby, however you want to put it, but without a lot of success. Things could be different now.


I interpreted that sentence as meaning it was integrated into the JRuby project as a "home" for the project, similar to being under the management of a particular github org. Not that its implementation was in any way changed or integrated into JRuby.


Your interpretation is understandable, but incorrect.

https://www.jruby.org/2016/05/03/jruby-9-1-0-0.html


Concretely, TruffleRuby never shared much in terms of the core language implementation with JRuby (essentially the parser and encoding stuff, never core methods/operators/etc).


If you're willing to try again, I'd love to see that benchmark. JRuby almost always beats MRI on computation-heavy benchmarks, and we have continued to get faster over the years. I would be very surprised if a string-heavy benchmark wasn't faster on JRuby without too much warmup.


Haven't tested this, but I feel like JRuby will really excel only if the solution to your problem benefits from hardcore multi-threaded computation, since not having a GIL in that case outweighs the added JVM overhead.

Also, I'm a sucker for Java but JVM-based things deployability just rocks.


JRuby is also usually faster (sometimes much faster) on straight-line CPU-heavy operations, but you're right... real parallelism is one of the biggest selling points.
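
A small sketch of the kind of CPU-bound work where that parallelism matters. `parallel_sum_of_squares` is a hypothetical example, not taken from any benchmark in this thread. On MRI the global lock keeps these threads from running Ruby code in parallel, so they only interleave; on an implementation without one, such as JRuby, they can use several cores.

```ruby
# Hypothetical example: split a CPU-bound computation across plain Ruby threads.
def parallel_sum_of_squares(n, workers)
  # Slice size is at least 1 so each_slice never receives 0.
  slice_size = [(n / workers.to_f).ceil, 1].max
  slices = (1..n).each_slice(slice_size).to_a

  # One thread per slice; Thread#value joins the thread and returns its result.
  threads = slices.map do |slice|
    Thread.new { slice.sum { |i| i * i } }
  end
  threads.sum(&:value)
end

puts parallel_sum_of_squares(1_000, 4)
```

The result is the same on every implementation; only the wall-clock time would be expected to differ.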


Sounds like every other believer of the JVM. I’ve been reading about how Java outperforms C for 20+ years now. When it doesn’t even come close, someone faithful of the JVM insists that I try some new garbage collector or wait for the next release.


Funny, I managed to spend a similar time in this industry and cannot remember any claim of Java being faster than C.

That's not to say it doesn't exist: too many monkeys with internet-connected typewriters will eventually make every claim. But Google turns up a paltry 652 results for "Java faster than C"[0][1].

[0]: https://www.google.com/search?client=safari&rls=en&q=%22java...

[1]: On a related note, I found one of those Google searches with a single hit. Isn't there a name for that? It's "PHP faster than C"

Edit: I'm on a roll here. "Google faster than C" also gets one result: https://www.google.com/search?client=safari&rls=en&q=%22goog...


Java running on a good VM such as HotSpot can outperform C in certain cases.


(btw I am the author of this post, thread questions if wanted. Trying to avoid draining myself addressing all the comments. Take care in these strange times <3)


So what's Shopify's goal in supporting TruffleRuby? Isn't the Ruby 3x3 promise good enough? Is Shopify looking to optimise memory footprint or speed or both?


I think "good enough" is not a perfect mentality for what we want from our services. Ruby 3x3 is great (3 times faster is insane!), and we are looking at many things to optimise Ruby. TruffleRuby does aim to get more than 3x faster (already does on optcarrot) and could also provide some insight to be applied elsewhere. Other reasons include TruffleRuby features such as Chrome dev tools debugging (I'm not endorsing it as superior or inferior to any existing Ruby debugging tools) and polyglot programming. All in all, the goal is most accurately described as "research".


As a general comment about production applications, memory is less important than speed since computers are cheap compared to having poorer user experience. Also, with JITs, consuming less memory isn't that realistic. You can find "complaints" about PyPy and Java consuming a lot of memory.


> All in all, the goal is most accurately described as "research"

So what is a successful outcome you're hoping for with this project? Or is it purely academic?


Not really sure there is a good answer for this. A past successful outcome was that we could actually get TruffleRuby to run an entire production application without code-changes. I suppose a future one could be that it's easy, fast and good to develop and ship an application running on TruffleRuby.


How close to that goal is TruffleRuby right now?


oof that's the big question isn't it. Would take a lot to describe, as well as a press-check since all my information / context about this is specific to Shopify applications


Fair enough :)


>'TruffleRuby has high potential in speed, as it is nine times faster than CRuby on optcarrot, a NES emulator benchmark developed by the Ruby Core Team.'

Really? But now the latest results are in apparently and judging by these benchmarks[0], I don't know what to make of this claim given how slow TruffleRuby is when compared to other similar languages or even Ruby. But it's clear that its LLVM-based cousin Crystal still blows it out of the ocean in every benchmark if you're really talking about 'high performance'...

[0] https://github.com/kostya/benchmarks


There are four benchmarks in there that compare CRuby and TruffleRuby and two of them are slower. TruffleRuby is 2x faster on bench.b (brainfuck implementation), 3x faster on matmul, 3x slower on base64 and 7x slower on json. All four of these are pretty specific benchmarks. Optcarrot has code that is much more representative of what a Ruby program might look like.


In addition, it's interesting to see that TruffleRuby does better than CRuby on the longer running benchmarks, and worse on the shorter ones. The page claims that JIT warmup "is applied when necessary", but there are no further details. If I had to bet, I would guess that in the case of TruffleRuby maybe not enough warmup is being done in these experiments.
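
For what it's worth, separating warmup from measurement in a harness could look something like this. It is a sketch and not what kostya/benchmarks does; `bench_with_warmup` and `median` are hypothetical helpers and the iteration counts are arbitrary.

```ruby
# Hypothetical helper: run the block `warmup` times untimed, then time `measured` runs.
def bench_with_warmup(warmup:, measured:)
  # Warmup runs give a JIT the chance to profile and compile hot code; results are discarded.
  warmup.times { yield }

  # Each measured run is timed individually so outliers remain visible.
  Array.new(measured) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
end

# Median is less sensitive than the mean to a few slow, not-yet-compiled runs.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

timings = bench_with_warmup(warmup: 20, measured: 10) do
  (1..10_000).reduce(:+)
end
puts "median seconds per run: #{median(timings)}"
```

Reporting both a cold first-run time and the post-warmup median would show whether an implementation is slow at peak or merely slow to warm up.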


Seems there isn't any warmup accounted for: https://github.com/kostya/benchmarks/issues/246


That sounds likely, but if it's not warmed up in some amount of time that a decent programmer believes to be enough, that probably means that it's warming up too slowly! With that being said, it could also mean that the faster ones could be even faster.

I totally expect base64 to perform like this though. It spends like one line in Ruby and then it's shipped off into the C extension, so really it's testing the differences between Sulong (LLVM bitcode interpreter for Graal) and a proper C compiler.


The optcarrot benchmark is pretty far from being typical Ruby code. It simulates several chips inside the Nintendo Entertainment System and essentially interprets the instructions those chips would execute. In actuality, real-world benchmarks like web applications perform best on JRuby.


Crystal is not Ruby.


Not technically but it can run a lot of Ruby code with no changes.


That is a very, _very_ dubious claim indeed considering that even stuff as trivial as `attr_reader`, `attr_writer` and `attr_accessor` go by a different name in Crystal.
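
For reference, this is what those three look like in Ruby. The `Point` class is just a made-up illustration. Crystal spells the equivalents `getter`, `setter` and `property`, so even a class this small does not carry over unchanged.

```ruby
# Made-up example class showing Ruby's three accessor helpers.
class Point
  attr_reader :x        # defines Point#x only
  attr_writer :y        # defines Point#y= only
  attr_accessor :label  # defines both Point#label and Point#label=

  def initialize(x, y)
    @x = x
    @y = y
  end
end

point = Point.new(1, 2)
point.label = "origin-ish"
point.y = 5
puts point.x
puts point.label
puts point.respond_to?(:y)  # there is no reader for y
```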


Although I didn't say it was, you're actually half right, since it's inspired by Ruby even if some code compiles 1:1 with Crystal; the same can be said of Elixir.

But the benchmarks here don't lie and one can still say that in syntax similarity, Crystal is a 'faster and memory efficient Ruby with static types'.


> one can still say that in syntax similarity, Crystal is a 'faster and memory efficient Ruby with static types'

Actually they don't even share the same syntax for declaring getters and setters. Honestly speaking I think people really only think of Crystal as "faster Ruby with static types" because they've mostly been exposed to C-style languages so anything that doesn't look like C just blurs together as "the same".

(Obviously Crystal is heavily inspired by Ruby, but there's a huge difference between "heavily inspired by" and "an implementation of".)


Why is there no TruffleRuby Native on that page? (i.e., the default)


FWIW, I ran bench.b locally:

MRI: 87.83s
TruffleRuby Native 20.0: 18.82s
truffleruby-dev Native: 14.09s



