Here's the summary: Ruby, like most other Linux applications from that era, was written to achieve parallelism by running multiple processes rather than multiple threads.
But running multi-process is heavy on the JVM, since it has to load the virtual machine and warm up the JIT on every process start, especially when the processes are short-lived.
Java solves this by encouraging multi-threaded designs instead of multi-process ones.
But Ruby wasn't built to run multi-threaded. Even though CRuby can run threads, it does so behind a global lock: only one thread can run in a process at a time, and the others have to wait their turn.
It looks like the multi-process approach is good enough for CRuby, and JRuby's attempt to turn it into a multi-threaded application didn't improve things.
The solution would be a JVM that can load the virtual machine, JIT, and execute applications as fast as C programs do.
> Ruby, like most other Linux applications from that era, was written to achieve parallelism by running multiple processes rather than multiple threads.
That’s... not really true. Ruby wasn’t really written with parallelism in mind at all, because it mostly ran on machines that couldn’t actually run processes in parallel. MRI threads were originally green threads, which allow a high degree of concurrency without parallelism, with less overhead than native threads.
When old MRI was replaced with YARV in 1.9 (which became the new MRI), it got native threads with a global VM lock (GVL, similar in spirit to Python’s GIL), which allowed running thread-safe native code with real parallelism while only having one thread running Ruby code at a time. This made Ruby thread-based concurrency somewhat more expensive, but made some parallelism possible (since native code can release the GVL, and common operations like waiting on I/O do).
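A quick way to see the GVL in action (a toy sketch of mine, not from the article; thread count and workload size are arbitrary):

    require "benchmark"

    # Pure-Ruby busy loop: it never releases the GVL.
    cpu_bound = -> { 20_000_000.times { } }

    serial = Benchmark.realtime { 2.times { cpu_bound.call } }
    threaded = Benchmark.realtime do
      2.times.map { Thread.new(&cpu_bound) }.each(&:join)
    end

    # On CRuby both numbers come out roughly equal, because only one
    # thread runs Ruby code at a time; on JRuby the threaded run uses
    # both cores and finishes in about half the time.
    puts "serial:   #{serial.round(2)}s"
    puts "threaded: #{threaded.round(2)}s"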
And Ruby 3.0 introduces a new parallelism model with Ractors (basically inspired by the Actor model), which are logically above the thread level (each contains its own set of nonshared threads) and below the process, don’t share mutable state within the VM, and each have their own VM lock, allowing a higher degree of Ruby parallelism without going multiprocess.
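The same workload as a Ractor sketch (Ruby 3.0+; Ractors are experimental, so a warning is printed on first use):

    # Each Ractor has its own VM lock, so the two busy loops can run
    # on two cores even in CRuby. Ractor blocks can't capture outer
    # mutable state, hence the inlined literal.
    results = 2.times.map do
      Ractor.new { 20_000_000.times { }; :done }
    end.map(&:take)
    p results # => [:done, :done]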
> Here's the summary: [...] But running multi-process is heavy on the JVM, since it has to load the virtual machine and warm up the JIT on every process start, especially when the processes are short-lived.
This is a valid summary of some facts, but not of the article or facts relevant to the article. Some quotes:
"Jekyll is not forking processes, so that is not the issue."
"The area where JRuby and TruffleRuby shine are long running processes that have had time to warm up. Based on suggestions I put together a repo of a simple small Jekyll build being built 20 times by the same process in a repo here. After 20 builds with the same running process the build times do start to converge, but even after that MRI Ruby is still fastest."
> * Real-World projects like Jekyll involve a lot more code, and JITing that code has a high start-up cost.
> * Real-world code like Jekyll or Rails is optimized for MRI Ruby, and many of those optimizations don’t help or actively hinder the JVM.
The title seems a bit provocative, though I guess if you were reading it from within the Jekyll community it makes sense without further disclaimers; otherwise the article seems fairly even-handed.
The threading stuff just seems like a special case of his second point.
What makes GraalVM interesting is that it builds upon the research done in JikesRVM and Maxine VM, along with a free (as in beer) offering.
Commercial versions of "a JVM that can load the virtual machine, JIT, and execute applications as fast as C programs do" have been available since around 2000, like Excelsior JET or the WebSphere Real Time JVM, among others.
The JIT cache used in recent versions of HotSpot started as part of the JRockit JVM, also a commercial-only product.
The way Android solved this problem (before they implemented AOT) was to fork the JVM after loading (and, I think, even JITing) all the core Java and Android classes. The real, somewhat fundamental problem with the JVM in a multi-process system isn't really having to "load the virtual machine and JIT for every process start" but the multiple, almost certainly uncoordinated garbage collectors.
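The zygote idea translates to Ruby terms roughly like this (a sketch of the general preforking pattern, not Android's actual mechanism):

    # Pay the expensive setup cost once in the parent, then fork;
    # children inherit the warmed-up state via copy-on-write pages.
    EXPENSIVE_STATE = (1..1_000_000).to_a # stand-in for loading libraries

    pids = 4.times.map do |i|
      fork do
        # Each child starts with EXPENSIVE_STATE already built.
        puts "worker #{i}: #{EXPENSIVE_STATE.sum}"
      end
    end
    pids.each { |pid| Process.wait(pid) }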
They still use the zygote. However, as expected, cheap Android devices aren't the best hardware to run an AOT compiler on, so with Android 7 they introduced multi-tier execution: an interpreter hand-written in assembly, a JIT compiler, and an AOT compiler that takes advantage of PGO data.
The AOT compiler only runs when the device is idle.
Starting with Android 10, they introduced a mechanism to upload PGO data to the store, so that when an APK is installed and such data is already available, the JIT/AOT doesn't have to relearn everything about the application from scratch.
IBM J9 has something called the JIT server. My understanding is that JIT compilation is done in one process and then provided as a service to clients (other processes).
Never used it though, so no idea how it works in practice.
Once I was running some JRuby stuff in Jenkins during a build, and the job kept hanging at some stage. I thought there must be a bug somewhere and force-killed it a couple of times with no success, but left the last run going before I headed home. A couple of hours later, I got an email saying the build had passed... eventually. It turned out JRuby was using /dev/random, and since Jenkins was running in a VM, not enough entropy was being generated. After pointing /dev/random at urandom, the hanging issue just disappeared.
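For anyone hitting the same thing: assuming the blocking came from the JVM's SecureRandom reading /dev/random (which matches these symptoms), the usual workaround is to point it at /dev/urandom through JRuby's -J passthrough:

    # The odd /dev/./urandom spelling defeats the JVM's path
    # canonicalization, which would otherwise fall back to /dev/random.
    export JRUBY_OPTS="-J-Djava.security.egd=file:/dev/./urandom"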
Now that this has gotten attention from the creators of both JRuby and TruffleRuby, I would not be surprised if Jekyll runs ridiculously fast on both implementations in the near future.
I remember about 10 years ago the promise was that JRuby was going to let ruby be basically as fast as Java for many things. Invokedynamic and all that.
It can be with work. This is basically true of all languages. I remember telling my team that I wanted them to use Java instead of python for some new service we were building because I wanted it to scale better. They were not happy about it, and they coded up two versions, one in python and one in Java. I was shocked by the results, they performed roughly equally in terms of latency and scaling. When I dug into it, the very abstract Java library that gave them the power of python ran about as fast as python. They could get rid of that library which made them far more productive, but then the project took longer and cost more.
In fairness, the Java version had an easier path for optimization, but there are no free lunches.
> When I dug into it, the very abstract Java library that gave them the power of python ran about as fast as python. They could get rid of that library which made them far more productive, but then the project took longer and cost more.
Sounds like they were determined to write Python on Java. Doing it that way likely has a lot of performance costs. However, you can’t assume that idiomatic Java code would take that much longer than Python code for a team that was familiar with Java. Likely it comes down to which languages and frameworks a team is familiar with.
I am curious what library this was and what it was doing.
We used the Spring Expression Language (SpEL) for dynamic evaluation of user-defined expressions in our code, and in most cases we could compile and cache the expressions after first use; invoking them was really fast, close to pure Java expressions.
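The same compile-once-and-cache strategy, sketched in Ruby rather than SpEL (ExpressionCache is illustrative and, being eval-based, only safe for trusted expressions):

    class ExpressionCache
      def initialize
        @cache = {}
      end

      # expr is a trusted, user-defined expression over `row`,
      # e.g. "row[:price] * 1.2". It is parsed into a lambda once;
      # every later call skips parsing entirely.
      def evaluate(expr, row)
        compiled = (@cache[expr] ||= eval("->(row) { #{expr} }"))
        compiled.call(row)
      end
    end

    cache = ExpressionCache.new
    p cache.evaluate("row[:price] * 1.2", price: 100) # => 120.0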
We also had some JVM-Python interop which we eventually got rid of (in favor of Kotlin), because we were unable to optimize it after a month of effort and it continued to be the biggest bottleneck in the system.
So I am not entirely convinced that there could be real-world usage scenarios that inherently demand so much runtime dynamism that most benefits of JVM optimizations are nullified.
Of course, I'd love to be enlightened otherwise, but rather happy with JVM as of now.
There is nothing at the JVM level that would disallow such dynamism. Clojure, JRuby, and Jython can all run on the JVM.
Also, if you are looking for interop, then GraalVM might be worth a look — not the better-known AOT part, but the runtime one, which can seamlessly do interop between a number of languages, and it even optimizes between them!
Yes, being possible and being performant are two very different things.
What I intended to convey in my previous comment was that using strategies like pre-compilation (e.g. Spring EL), it is possible to get good performance even for dynamic logic that isn't known until runtime.
So I was curious what was so dynamic about this use case that JVM performance dropped down to Pythonesque levels.
I don't want to speculate - maybe there is something the JVM is unable to optimize; maybe something weird is happening in the library; or maybe Python has gotten much better recently, or this use case was able to benefit from some Python lib with native bindings.
> I remember about 10 years ago the promise was that JRuby was going to let ruby be basically as fast as Java
I’m pretty sure it was more than 10 years ago that Charles Nutter wrote a detailed description of why that wasn’t going to happen without breaking compatibility with Ruby, identifying the specific language features preventing that.
I was surprised at the level of analysis and optimization here. Just running a flame graph and saying "I think IO is slow" isn't going to cut it. Let's break out a profiler and dig into the output, and also try some Java tools like YourKit.
Exactly, this is not JRuby being slow but a particular Ruby program being slow on JRuby. There is a reason for that, and it is extremely likely to be a fixable problem. This is code that was never optimized for the JVM, so there is probably all sorts of stuff happening that makes a lot of sense on MRI but is a bit suboptimal on the JVM.
One thing that comes to mind is that a lot of performance-critical stuff in Ruby is implemented via native libraries. The JRuby ecosystem has alternate implementations for a lot of that, but it is probably also able to interface with native code directly, which might be a bit of a bottleneck. And any alternate Java-based replacements for whatever is being called might have their own issues/bugs/etc.
But instead of hypothesizing what the problem might be (and getting it wrong repeatedly), profiling tends to be much more effective. I've done this a couple of times to diagnose performance issues, and it rarely is anything you'd expect. Once you know where it is spending its time, you can usually mitigate the issues. Use a profiler, add some logging, instrument the JVM, etc. There are lots of ways to do this. Even just knowing how often it starts a new process would be good to know; it's apparently more than once, because otherwise you wouldn't expect --dev to speed things up like it did.
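On that last point, even something this crude answers "how often does it start a new process?" before reaching for a full profiler (a sketch; the wrapper module is mine, not a standard tool):

    # Log every Process.spawn call; Kernel#system and backticks
    # would need similar wrappers.
    module SpawnLogger
      def spawn(*args)
        warn "[subprocess] #{args.first.inspect}"
        super
      end
    end
    Process.singleton_class.prepend(SpawnLogger)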
IMHO, interest in Ruby itself (irrespective of implementation) is declining.
A lot of people have realised that dynamic typing hinders maintenance of long-lived projects, and the tooling and dev experience of type-safe languages have also gotten much better over the last few years.
Despite having worked with Ruby for multiple years, I pick Kotlin/C# for new projects.
I know Ruby has recently introduced support for typing, but until the wider ecosystem embraces type safety it is going to be an uphill battle to write type-safe code in Ruby.
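For reference, the typing support mentioned is RBS, which ships with Ruby 3.0: signatures live in separate .rbs files and are checked by external tools (Steep, TypeProf) rather than at runtime. A minimal example:

    # greeter.rb -- plain Ruby, no annotations in the code itself
    class Greeter
      def greet(name)
        "Hello, #{name}!"
      end
    end

    # sig/greeter.rbs -- checked by Steep/TypeProf, ignored by the VM
    class Greeter
      def greet: (String name) -> String
    end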
So there was no easy way to get a profile from the JVM version, and data came only from looking at the CRuby profiles, where file access dominated. Then came an informed guess: slower JVM file access.
This process and lack of data in itself sounds like a recipe for performance problems in the JVM version, unless some low-probability coincidence prevented getting the profile data only in this case. Good measurements are a prerequisite for sustainable and maintainable performance work.
There was (is?) a time when JRuby couldn't pass the Ruby spec because of its IO abstraction written in Java. They changed that code to be written in Ruby instead, with FFI to call out to libc, which made it more correct but slower, AFAIK.
My biggest problem with JRuby really wasn't server performance or multi-threading; it was start time. If you used JRuby for something like scripting, you felt it way more. Try using it for a map-reduce process on large data workloads in Hadoop and you see JRuby's startup time materially.
Q: Why would one do that? A: The ability to bundle your code onto Hadoop machines you don't control.
Around 2010 I really enjoyed working with JRuby. At that time dynamic languages weren't really a thing, but I was able to convince my boss to try it in a project because of the Java library compatibility. So I used it for a Swing GUI application doing data processing, combined with R, Postgres and Processing. It was really fun to write. (Apart from that, I also used it on Google App Engine.) I never had the impression it was slower or less responsive than any other language; however, startup times were quite slow. But I think that was due to the JVM. The way Ruby developed - into the language of Rails - I don't see myself writing anything CPU-bound with it though.
That said, I wish the article would just include numbers without the startup times. I also remember people claiming back at that time JRuby would be much faster than MRI.
Seems like there's obviously a big I/O performance difference. Maybe it's something simple like buffering being set up differently or not at all, or sync vs. async. It'd be interesting to dig in more.
Even something like mmap can drastically improve performance, since it lets the kernel handle I/O asynchronously from your program's execution (so your code doesn't block as much or as easily on I/O).
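In Ruby that's been directly expressible since 3.1 via IO::Buffer (a sketch; the file name is made up):

    # Map the file instead of read()-ing it: the kernel pages data in
    # on demand and can read ahead without blocking the Ruby thread.
    File.open("_site/index.html") do |f|
      buffer = IO::Buffer.map(f, nil, 0, IO::Buffer::READONLY)
      html = buffer.get_string(0, f.size)
      # ... process html ...
    end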
We don't know that it's really I/O, as in "pushing some bytes to the system". All we know is that the author saw a hot method called "write" and stopped the analysis there. It might well be something like messing around with character encodings to get those bytes in the first place.
Thinking about this more, we know that the author saw a hot "write" method in the profile for the fast run and doesn't have a profile for the slow run. The slow versions could be spending most of their time in a completely different place.
If you want fast Ruby it seems Crystal is your only option. Ruby is an interpreted language released in 1996 for scripting. Why are we surprised when attempts to shoe-horn it into something else (Ruby 3.0, JRuby, Truffle Ruby) fall flat?
Ruby's basically as fast as Node.js. If you look at comparative benchmarks, Roda/Rack/Puma is within spitting distance of Koa, Fastify, etc. Express is actually slower in some benchmarks.
I don't think a lot of people are aware of just how much faster the Ruby ecosystem has gotten in the past few years (especially when you leave Rails out of the equation which is known not to do well in microbenchmarks).
I just heard Matz say yesterday that additional effort will be made to make Ruby run faster specifically on benchmarks. He said that while they don't always correlate completely to real-world performance, developers seem to care a lot about benchmarks, so he wants Ruby devs to feel good about themselves and score well on them.
> If you want fast Ruby it seems Crystal is your only option.
Crystal may be fast, but it's definitely not Ruby. Choices for fast not-Ruby are not lacking.
> Why are we surprised when attempts to shoe-horn it into something else (Ruby 3.0, JRuby, Truffle Ruby) fall flat?
Weird that you don’t put Ruby 1.9+ on that list, though that was as much or more of a switch from what immediately preceded it, with parallelism as an improvement area, as 3.0 is (sure, Ractors are a bigger language change, but going from green threads to native threads with a VM lock was a major implementation change). The difference is that 3.0’s relevant improvements are still experimental, and it’s easier to misrepresent “haven’t yet stabilized and seen wide production use” as “fell flat” than it would be to claim the same thing about 1.9’s improvements. But it’s not true in either case.
Could you not make the same argument about JavaScript? JavaScript was pretty slow for many years. Google then put some significant engineering investment into the V8 engine, resulting in a huge increase in JavaScript's performance.
I think JavaScript performance received very significant investments from the biggest players (Google, Apple, Microsoft, etc.) thanks largely to the fact that you can’t throw more compute at it — JavaScript has to run on crappy user devices developers can’t upgrade. Ruby (or Python) never received anywhere near the JavaScript level of investment in performance. Of course there are fundamental design limitations too.
With GraalVM around, niche languages can pretty much take advantage of the significant investment in the JVM. Its whitepaper is truly novel, but the gist of it is that one implements an AST-based interpreter for a dynamic language, and it then makes use of state-of-the-art JVM GCs, JIT, etc.
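To make "AST-based interpreter" concrete, here is a toy one in Ruby (Truffle's real API is Java and its nodes self-specialize as they run; these node classes are made up):

    # Each node knows how to evaluate itself; the "language" is just
    # arithmetic over variables. Truffle JIT-compiles such interpreters
    # by partially evaluating the eval methods.
    Num = Struct.new(:value) { def eval(env) = value }
    Var = Struct.new(:name)  { def eval(env) = env.fetch(name) }
    Add = Struct.new(:l, :r) { def eval(env) = l.eval(env) + r.eval(env) }
    Mul = Struct.new(:l, :r) { def eval(env) = l.eval(env) * r.eval(env) }

    ast = Add.new(Mul.new(Var.new(:x), Num.new(2)), Num.new(1)) # x * 2 + 1
    p ast.eval(x: 10) # => 21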
TruffleRuby is the fastest Ruby implementation around, and GraalJS can, after warmup, match the performance of V8, so it is indeed a very interesting technology.
But it would take a drastic / complete rewrite of the VM, as happened with Chrome's and Firefox's JavaScript VMs.
This means many man-years and a certain break in continuity, maybe with small but noticeable deviations in the VM's behavior. If we squint just so, we can consider Ruby 3.0 to be such a rewrite.
Crystal is a similar rewrite effort, but one that isn't trying to stay backwards-compatible.
For those of us with many man years of investment into existing Ruby systems, any further investment into speeding up the language's runtime and general evolution is welcome.
Since when did JRuby or TruffleRuby fall flat? The article is about a given use case/library that, as others mentioned, uses multiple processes, which doesn't really favor a JIT runtime.
> Ruby is an interpreted language released in 1996 for scripting. Why are we surprised when attempts to shoe-horn it into something else (Ruby 3.0, JRuby, Truffle Ruby) fall flat?
Because Lua (1993) and JavaScript (1995) can be very fast.
And JavaScript's speed seems mostly a result of throwing an amazing number of smart engineers at it over time, rather than anything inherent in the language itself.