Yes, this was already becoming true around the time I was writing the linked article. And I also read the paper. :-) I also remember having access to a pre-Haswell Intel CPU and something a bit more recent, and I could see that the more complicated dispatcher no longer made as much sense on the newer one.

Conclusion: the rise of popular interpreter-based languages led to CPUs with smarter branch predictors.

What's interesting is that a token threaded interpreter dominated my benchmark (https://github.com/vkazanov/bytecode-interpreters-post/blob/...).

This trick is meant to simplify dispatching logic and also spread branches in the code a bit.
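
For anyone who hasn't seen it, here is a minimal sketch of token threaded dispatch in C. It is not the code from the linked benchmark: the opcodes, the `run` function and the tiny stack are made up for illustration, and it assumes GCC or Clang, since `&&label` and `goto *` are GNU C extensions rather than standard C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_PUSHI, OP_ADD, OP_DONE };

/* Runs the bytecode and returns the top of the stack at OP_DONE.
   There is no validation of the bytecode itself; this is a sketch. */
static int64_t run(const uint8_t *code)
{
    /* Table of handler addresses, indexed by the opcode token
       (GNU C "labels as values"). */
    static void *const labels[] = { &&op_pushi, &&op_add, &&op_done };

    int64_t stack[64];
    int64_t *sp = stack;
    const uint8_t *ip = code;

    /* Every handler ends with its own indirect jump, so there is one
       branch per handler instead of a single shared switch branch. */
#define DISPATCH() goto *labels[*ip++]

    DISPATCH();

op_pushi:
    assert(sp < stack + 64);
    *sp++ = (int64_t)*ip++;   /* the immediate is the next byte */
    DISPATCH();

op_add:
    assert(sp - stack >= 2);
    sp--;
    sp[-1] += sp[0];
    DISPATCH();

op_done:
    assert(sp > stack);
    return sp[-1];

#undef DISPATCH
}
```

Compared to a `switch` inside a loop, the dispatch logic is just a table lookup plus a jump, and the jump is repeated at the end of every handler, which is the "spreading branches" part.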