
> direct threaded (using computed gotos)

I've seen different people mean different things by this, do you mean the IR is a list of bytecode handler addresses, and then the end of every handler is a load+indirect jump? Or is there also a dispatch table? In my experience the duplication of the dispatch sequence (i.e. no dispatch "loop") is worth 10-40% and then eliminating the dispatch table on top of that a bit more.

CPUs work hard to predict indirect branches these days, but the BTB is only so big. Getting rid of any indirect call or jump, regardless of whether it goes through a dispatch table, is a big win, perhaps 2-3x, because CPUs have enormous reorder buffers now and can load a ton of code ahead if branch prediction is good, which it won't be for any large program with pervasive indirect jumps.

> it always spills temporaries to memory. We're actively working on keeping things in registers rather than spilling.

In my experience that can be a 2x-4x performance win.

> It's not uncommon to see call sites with hundreds of different classes

Sure, the question is always about the dynamic frequency of such call sites. What kind of ICs does YARV use? Are monomorphic calls inlined?




> I've seen different people mean different things by this, do you mean the IR is a list of bytecode handler addresses, and then the end of every handler is a load+indirect jump? Or is there also a dispatch table? In my experience the duplication of the dispatch sequence (i.e. no dispatch "loop") is worth 10-40% and then eliminating the dispatch table on top of that a bit more.

It's the former. Each bytecode instruction is stored as its handler's address, and every handler ends with a load + indirect jump. There's no dispatch table (there are compile-time options that enable one, but I doubt anybody uses them since you'd have to specifically opt in when building Ruby).
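For readers unfamiliar with the technique, here's a minimal sketch of that dispatch style using GCC/Clang's labels-as-values extension. The instruction stream holds handler addresses directly, each handler ends with its own load + indirect jump, and there's no dispatch table or central loop. The opcode names and program are illustrative, not YARV's:

```c
/* Direct-threaded dispatch sketch. Requires GCC or Clang
   (uses the "labels as values" / computed-goto extension). */
long run(long start) {
    /* "Bytecode" is just an array of handler addresses. */
    void *program[] = { &&op_inc, &&op_inc, &&op_double, &&op_halt };
    void **ip = program;   /* instruction pointer */
    long acc = start;      /* accumulator */

    goto **ip++;           /* initial dispatch */

op_inc:
    acc += 1;
    goto **ip++;           /* dispatch sequence duplicated per handler */

op_double:
    acc *= 2;
    goto **ip++;

op_halt:
    return acc;
}
```

With `run(5)` the stream executes inc, inc, double, halt. Because each handler has its own indirect jump, the branch predictor gets one BTB entry per handler rather than a single shared, highly unpredictable dispatch branch.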

> Sure, the question is always about the dynamic frequency of such call sites. What kind of ICs does YARV use? Are monomorphic calls inlined?

In one of our production applications, the most popular inline cache sees over 300 different classes and ~600 shapes (this is only for instance variable reads; I haven't measured method calls yet but suspect it's similar).

The VM only has a monomorphic cache (YJIT generates polymorphic caches), and neither the VM nor the JIT do inlining right now.
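To make the monomorphic-cache idea concrete, here's a sketch of a shape-based inline cache for instance-variable reads. The struct layout and names (`shape_id`, `iv_index`, `lookup_iv_index`) are illustrative, not YARV's actual internals, and the slow-path lookup is stubbed out:

```c
/* Monomorphic inline cache for instance-variable reads (sketch). */
typedef struct {
    unsigned shape_id;   /* identifies the object's field layout */
    long ivars[4];       /* instance-variable slots */
} object_t;

typedef struct {
    unsigned cached_shape;  /* shape seen when the cache was filled */
    unsigned iv_index;      /* slot index valid for that shape */
    int filled;
} ivar_ic_t;

/* Slow path: resolve the slot for this shape. Stubbed for the sketch. */
static unsigned lookup_iv_index(unsigned shape_id) {
    (void)shape_id;
    return 1;  /* pretend the ivar always lives in slot 1 */
}

long read_ivar(object_t *obj, ivar_ic_t *ic) {
    if (ic->filled && ic->cached_shape == obj->shape_id) {
        return obj->ivars[ic->iv_index];   /* fast path: one compare + load */
    }
    /* Miss: full lookup, then overwrite the single cache entry. */
    ic->iv_index = lookup_iv_index(obj->shape_id);
    ic->cached_shape = obj->shape_id;
    ic->filled = 1;
    return obj->ivars[ic->iv_index];
}
```

A monomorphic cache like this has exactly one entry, so a call site that sees 300+ classes will thrash it constantly; that's the case a polymorphic cache (or megamorphic fallback) is meant to handle.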


Thanks for the replies. I could keep picking your brain, but maybe it's more efficient for me to read some documentation. Are there some design docs or FAQs or summaries of the execution strategies that you can point me to? Thanks.


> In my experience that can be a 2x-4x performance win.

What's the state of the art in register allocation? I see that the Android Runtime uses SSA form to allocate registers in linear time [0]. Are other language runtimes pushing the boundaries further, or in different ways?

[0] https://www.arxiv-vanity.com/papers/2011.05608/
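As background for the question above, here's a compact sketch of classic linear-scan allocation (in the style of Poletto & Sarkar), the baseline that SSA-based linear-time allocators like the one in the linked paper improve on. Intervals are assumed pre-sorted by start point, and the spill heuristic evicts whichever live interval ends last; all names are illustrative:

```c
/* Linear-scan register allocation over live intervals (sketch). */
#define NUM_REGS 2
#define SPILLED  (-1)

typedef struct { int start, end, reg; } interval_t;

/* iv[] must be sorted by ascending start; reg is filled in. */
void linear_scan(interval_t *iv, int n) {
    int active[NUM_REGS];  /* interval index occupying each reg, or -1 */
    for (int r = 0; r < NUM_REGS; r++) active[r] = -1;

    for (int i = 0; i < n; i++) {
        /* Expire intervals that ended before this one starts. */
        for (int r = 0; r < NUM_REGS; r++)
            if (active[r] != -1 && iv[active[r]].end < iv[i].start)
                active[r] = -1;

        /* Take a free register if one exists. */
        int got = -1;
        for (int r = 0; r < NUM_REGS && got == -1; r++)
            if (active[r] == -1) { active[r] = i; got = r; }
        if (got != -1) { iv[i].reg = got; continue; }

        /* No free register: spill whichever live interval ends last. */
        int victim = 0;
        for (int r = 1; r < NUM_REGS; r++)
            if (iv[active[r]].end > iv[active[victim]].end)
                victim = r;

        if (iv[active[victim]].end > iv[i].end) {
            iv[active[victim]].reg = SPILLED;  /* evict the longer interval */
            iv[i].reg = victim;
            active[victim] = i;
        } else {
            iv[i].reg = SPILLED;               /* current interval spills */
        }
    }
}
```

The SSA-based approach in the linked paper gets a similar single-pass structure but exploits the property that interference graphs of SSA programs are chordal, which avoids this kind of ad-hoc spill heuristic.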



