> For instance, the JavaScript world has switched from tracing JITs to method-at-a-time JITs, due to the compelling performance benefits.
This is a weird way of putting it. Method-at-a-time JITs have been around much longer and represent a more traditional approach to JIT compilers. Tracing JITs have only become popular in the last 5-10 years. And during that time they've been seen as the sort of hot new thing, so much so that there is a classic LtU thread from 2010 titled "Have tracing JIT compilers won?" (http://lambda-the-ultimate.org/node/3851)
While it's true that V8 has always been method-at-a-time and Mozilla has abandoned their tracing JIT TraceMonkey, LuaJIT is one of the fastest dynamic language implementations out there and is a tracing JIT. Unfortunately the benchmark game dropped LuaJIT so it's not easy to find benchmarks, but last I saw LuaJIT was pretty dominant speed-wise among dynamic language JIT implementations.
Mike Pall (LuaJIT author) argues that TraceMonkey's lack of compelling performance was more a result of trying to bolt tracing onto an existing VM as opposed to any shortcoming of tracing as an approach (http://lambda-the-ultimate.org/node/3851#comment-57643).
Dalvik (the Android VM), when they added a JIT, claimed to be "starting" with a tracing JIT which they would "supplement" with a method JIT later, which they seemed to feel could provide more performance. Their argument is that the method JIT operates over a larger code window; but Dalvik only seemed to operate on extremely short traces, so it has always been unclear to me whether this was just a weird limitation of their specific design :/.
I'd love to hear more about their issues with PyPy; it sounds like they wrote off PyPy simply because they don't understand why it works so well. Not to mention that this is mostly a re-hash of stuff found in unladen-swallow.
I mean, if your end goal is to write another Python, sure go for it. But it really sounds like these people haven't done their research. I see nothing to write home about.
--- EDIT ---
Not to mention that JS is a completely different language from Python. Every time you add two objects in Python you have the possibility of hitting a system-defined add, or __add__, or __getattr__, or __getattribute__, or __radd__, or __getattr__ (looking for __radd__), etc. That'll be fun....
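To make that concrete, here's a minimal sketch (toy classes, not anyone's real code) of the chain a single a + b can walk through:

    class Meters(object):
        def __init__(self, n):
            self.n = n
        def __add__(self, other):
            # Tried first: type(a).__add__(a, b)
            if isinstance(other, Meters):
                return Meters(self.n + other.n)
            return NotImplemented  # fall through to the reflected method

    class Feet(object):
        def __init__(self, n):
            self.n = n
        def __radd__(self, other):
            # Tried second: type(b).__radd__(b, a), once __add__ gives up
            return Feet(other.n * 3.28 + self.n)

    total = Meters(2) + Feet(1)  # hits Meters.__add__, then Feet.__radd__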
Their description does not suggest they don't understand how PyPy works, but rather they don't think they can tackle the yet unsolved failure modes (blowup, etc) of trace compilation.
The approach they're describing is one that already works in V8, JavaScriptCore, and IonMonkey, which is to mix type prediction, type analysis, and runtime handling of unexpected cases. Basically, you use type feedback information to get an initial set of types for a method, use type inference techniques to squeeze out type checks, and then compile the method in a way that handles the expected types in a fast path and traps into a slow path as necessary.
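A toy sketch of that guard-and-trap pattern (hypothetical names, nothing like the real machinery in any of those engines):

    def generic_add(a, b):
        # Slow path: full dynamic dispatch (__add__, __radd__, ...)
        return a + b

    observed_types = {}  # per-call-site type feedback from a profiling tier

    def specialize_add(site_id):
        # "Compile" a fast path for the one type pair seen at this site.
        if observed_types.get(site_id) != (int, int):
            return generic_add  # mixed or missing feedback: stay generic
        def fast_add(a, b):
            if type(a) is int and type(b) is int:  # guard
                return a + b                       # specialized fast path
            return generic_add(a, b)               # trap into the slow path
        return fast_add

    observed_types["site1"] = (int, int)
    add = specialize_add("site1")
    add(2, 3)      # guard passes, fast path
    add("x", "y")  # guard fails, traps to the slow path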
None of V8, JavaScriptCore, or SpiderMonkey does allocation removal, which is the single most important optimization PyPy and LuaJIT do, and which also goes back to Psyco. I think it is unknown how to do this well in method JITs.
This is not correct. V8 does sink allocations into deoptimization exits. It does not sink allocations out of the loops at the moment though.
> I think it is unknown how to do this well in method JITs
I don't think it is unknown. The main simplification for tracing JITs comes from the fact that deoptimization and loop exit can be elegantly treated within a uniform framework, which is a little bit harder for a method JIT: you need to find the right place to insert the materialization instruction after the loop, based on post-domination. Nothing hard or unsolvable though.
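For anyone unfamiliar with the term, here's a hand-written before/after sketch in Python of what allocation removal buys you (toy code; the JIT does this transformation on the compiled trace or method, not on the source):

    class Point(object):
        def __init__(self, x, y):
            self.x, self.y = x, y
        def __add__(self, other):
            return Point(self.x + other.x, self.y + other.y)

    def walk(steps):
        pos = Point(0, 0)
        for dx, dy in steps:
            pos = pos + Point(dx, dy)  # two short-lived Points per iteration
        return pos

    # What allocation removal aims for: keep the fields in registers/locals
    # inside the loop and only materialize a real Point at the loop exit
    # (or at a side exit, if a guard fails mid-loop).
    def walk_after_allocation_removal(steps):
        x = y = 0
        for dx, dy in steps:
            x += dx
            y += dy
        return Point(x, y)  # single allocation, sunk out of the loop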
Yet V8, etc, perform pretty well. Is there a reason to believe allocation removal is particularly more important for Python than JS? It might be, I haven't thought about it, but at first blush that doesn't seem to be the case.
It seems like there are a couple topics that come up when talking to people about switching to PyPy:
- performance or memory usage on larger programs
- C extension module support
I'm not going to promise that we will do a better job at these, but there are technical reasons to think that it's possible.
You're definitely right, Python has a lot of user-customizability that can make it harder to execute efficiently than JavaScript (though thankfully it has less than Ruby). Both PyPy and Pyston have their techniques for cutting through the complicated+expensive slow case and trying to predict and execute a fast path.
Just to be perfectly clear: it's 100% possible to write an efficient Ruby implementation, and if you have the right technical infrastructure, it's no more difficult than Python, or Javascript for that matter.
"Everytime you add two objects in Python you have the possibility of hitting a system defined add, or __add__ or __getattr__ or __getattribute__, or __radd__, or __getattr__ (looking for __radd__), etc"
That's pretty much a solved problem. Look up the method once, cache it for next time, and deoptimize if you're wrong or the definition of the class changes. If it keeps changing, build up a larger cache.
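i.e. an inline cache. A toy sketch of the monomorphic version (a real JIT bakes the guard into machine code and also invalidates the cache when the class is mutated):

    class InlineCache(object):
        """Toy monomorphic inline cache for one method-lookup site."""
        def __init__(self, name):
            self.name = name
            self.cached_type = None
            self.cached_method = None

        def lookup(self, obj):
            if type(obj) is self.cached_type:   # cheap guard: cache hit
                return self.cached_method
            # Miss: do the full (expensive) lookup and remember the result.
            method = getattr(type(obj), self.name)
            self.cached_type = type(obj)
            self.cached_method = method
            return method

    site = InlineCache("__add__")
    site.lookup(3)(3, 4)   # miss: full lookup on int, then call int.__add__
    site.lookup(5)(5, 6)   # hit: just the type guard, no lookup

The "larger cache" case is just keeping a small list of (type, method) pairs instead of one.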
Every time you add two objects in JS you have the possibility of hitting toString at least; every time you access a property it might be a getter or setter, or a Harmony proxy object. Python is more liberal, but in both cases you have to make guesses about object shapes (and objects not having those properties) to have any hope of performance.
I can't speak to the non-technical reasons mentioned in the post, but there's good reason to believe that LLVM is in a very different state than the time that Reid wrote this, particularly wrt JIT support: the JIT engine has been completely replaced. Both of the things that he mentions (lack of back-patching, lack of gdb support) have been added to LLVM mainline.
Definitely true; there are some overheads that could be low-hanging fruit to optimize, but the real solution is most likely to use a tiered-compilation system. ie only invoke the full LLVM code generator once a function has been called 10,000 times. It's definitely an open question as to how to get faster compilation times out of it; we have a simple LLVM interpreter but since LLVM isn't designed for interpretation it's pretty slow.
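In Python terms the tiering idea is roughly this (hypothetical names and threshold, just to illustrate, not how Pyston actually does it):

    COMPILE_THRESHOLD = 10000  # hypothetical hot-function threshold

    def expensive_codegen(fn):
        # Stand-in for invoking the full LLVM code generator.
        return fn

    def tiered(fn):
        state = {"calls": 0, "compiled": None}
        def wrapper(*args, **kwargs):
            if state["compiled"] is not None:
                return state["compiled"](*args, **kwargs)  # optimized tier
            state["calls"] += 1
            if state["calls"] >= COMPILE_THRESHOLD:
                state["compiled"] = expensive_codegen(fn)
            return fn(*args, **kwargs)  # cheap interpreter/baseline tier
        return wrapper

    @tiered
    def hot_function(x):
        return x * 2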
I think you can look at the people who are interested in adding LLVM tiers to their JITs (ex Apple, Facebook) to see that we're not the only ones that think there can be a place for "good but expensive" code generation in a JIT.
"(...). As for Unladen Swallow, there are some reasons to think that LLVM has matured greatly in the past few years, particularly in the JIT engine which has been completely replaced. I'm not sure if that's the only part to the story; I'd be interested in talking with any of the people who were involved or knowledgeable about the project."
It's my impression that Unladen Swallow failed because of insufficient manpower, not because of the general approach. It's true that LLVM is missing some features for a JIT like this (mainly OSR and back-patching), but those can be added. For example, there was a discussion a few months ago on llvm-dev about adding patching support [1] (it was originally meant for garbage collectors, but would also work for JITs).
It is my impression that LLVM has matured quite a bit since Unladen Swallow, not least in the JIT domain. A lot of Unladen Swallow's development was spent on improving LLVM rather than on implementing Python as fast as possible. I seem to recall that the LLVM developers have had discussions since Unladen Swallow on how to better support JIT compilers in LLVM, some of it precisely because Unladen Swallow failed.
Guido works at Dropbox. I am sure he is advising - and I bet he has seen pretty much all the previous attempts closely enough to keep them from making any obvious mistakes.
> Guido works at Dropbox. I am sure he is advising
Guido worked at Google during Unladen Swallow, he was not involved. From the comments of Pyston's announcements, here's the extent of GvR's involvement:
> Guido's advice has been extremely helpful, but so far we haven't been able to get any code from him :/
Right now they have no baseline compiler, but will interpret (un-optimised?) LLVM IR at first; the second tier is unoptimised LLVM compilation, then LLVM compilation with type-recording hooks, and finally a fully optimised compile. Given the history of the Unladen Swallow project and others using the LLVM JIT, they're likely to find they have a lot of work on their hands, particularly as PyPy is really rather good these days.
As always, if you're interested in LLVM or compiler stuff you should subscribe to http://llvmweekly.org (disclaimer: I write it) and follow @llvmweekly
It's mostly a question of manpower and the fact that Dropbox is still on Python 2.7 internally. If we can get more people on the team, we'd love to add support for Python 3.
AFAIK, a large chunk of Twisted is already working on Python 3. I don't know how long they have left to get 100% of it, but it's made substantial progress.
From what I gather, one of the things holding Mercurial back is support for Python 2.4, although the support of %-formatting for bytestrings will move the port of Mercurial forward.
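For reference, I believe this is PEP 461 bringing back bytes %-formatting; a quick illustration:

    # Works on Python 2; on Python 3 it only works once bytes %-formatting
    # lands (PEP 461), and raises TypeError before that:
    node = b"1f0cae0"
    rev = 42
    line = b"changeset %d:%s\n" % (rev, node)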
What does Python 3 offer in terms of performance, improved libraries, safety, or other major improvements to justify spending time switching production code to it and breaking library dependencies (which could be transitive as well)?
I don't see a company with a large Python code base justifying switching to Python 3. Yeah, for personal toy GitHub projects, sure, Python 3 is nice, but for many projects the rewards simply don't justify all the downsides of switching.
Looking back, how Python 3 was handled was a mistake. There should not have been a Python 3 when it happened; it should have happened a lot earlier. But if it was going to happen, it should have offered some drastic benefits -- the GIL being gone, an LLVM JIT with 30x improvements for numerical code crunching, integration with PyPy, ... I don't know, awesome new built-in libraries like "requests", Flask integrated in. Things like that.
What do we have instead? Unicode improvements, generator improvements, a Twisted-like async library (don't get me started on that), iterator cleanups around dicts... That is just not enough, sorry.
Between Dart at Google, Hack at Facebook, and now Pyston at Dropbox, it's really neat to me that there is a resurgence in interest in language implementations. The field seemed quite moribund through most of the late 1990's, early 2000's.
Dart and Hack are new languages, not new implementations of an existing language. Pyston seems like a new VM for Python (more comparable to HHVM, PyPy and V8).
By "Dart" and "Hack" I meant the Dart VM and HHVM. What I mean to say is that it's neat to see a renewed emphasis on serious, high-performance language implementations, whether for existing or new languages. For awhile it had seemed that there was just JVM/CLR on one side and a bunch of interpreted languages on the other.
Hack and HHVM are completely separate projects (they're even written in different languages: Hack in OCaml, HHVM in C++). They're both really interesting projects though.
I also toy with the idea of building a language, and my main contender is LuaJIT (perhaps with Terra). On the other hand, Julia has been done on LLVM...
It's pretty natural; we don't really have anything better than this stack, do we? If you're writing a JIT, why invent your own backend if LLVM is good enough? I mean, it's probably possible to write something better than LLVM, but it would be quite exceptional work. And what that section describes would be pretty typical for any duck-typed language lowered to LLVM. The devil's in the details, anyway. Actually, at this level of detail you could say that almost every cooking recipe sounds the same: "Chop the ingredients; put them in the kettle and put it on the fire for some time; season it."