> For instance, the JavaScript world has switched from tracing JITs to method-at-a-time JITs, due to the compelling performance benefits.
This is a weird way of putting it. Method-at-a-time JITs have been around much longer and represent a more traditional approach to JIT compilers. Tracing JITs have only become popular in the last 5-10 years. And during that time they've been seen as the sort of hot new thing, so much so that there is a classic LtU thread from 2010 titled "Have tracing JIT compilers won?" (http://lambda-the-ultimate.org/node/3851)
While it's true that V8 has always been method-at-a-time and Mozilla has abandoned their tracing JIT TraceMonkey, LuaJIT is one of the fastest dynamic language implementations out there and is a tracing JIT. Unfortunately the benchmark game dropped LuaJIT so it's not easy to find benchmarks, but last I saw LuaJIT was pretty dominant speed-wise among dynamic language JIT implementations.
Mike Pall (LuaJIT author) argues that TraceMonkey's lack of compelling performance was more a result of trying to bolt tracing onto an existing VM as opposed to any shortcoming of tracing as an approach (http://lambda-the-ultimate.org/node/3851#comment-57643).
Dalvik (the Android VM), when they added a JIT, claimed to be "starting" with a tracing JIT which they would "supplement" with a method JIT later, which they seemed to feel could provide more performance. Their argument is that the method JIT operates over a larger code window; but Dalvik only seemed to operate on extremely short traces, so it has always been unclear to me whether this was just a weird limitation of their specific design :/.
I'd love to hear more about their issues with PyPy; it sounds like they wrote off PyPy simply because they don't understand why it works so well. Not to mention that this is mostly a re-hash of stuff found in unladen-swallow.
I mean, if your end goal is to write another Python, sure go for it. But it really sounds like these people haven't done their research. I see nothing to write home about.
--- EDIT ---
Not to mention that JS is a completely different language from Python. Every time you add two objects in Python you have the possibility of hitting a system-defined add, or __add__, or __getattr__, or __getattribute__, or __radd__, or __getattr__ (looking for __radd__), etc. That'll be fun....
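To make that concrete, here's a minimal sketch (toy classes, not anyone's real code) of the chain a single a + b can walk through:

    class Meters(object):
        def __init__(self, n):
            self.n = n
        def __add__(self, other):
            # Tried first: type(a).__add__(a, b)
            if isinstance(other, Meters):
                return Meters(self.n + other.n)
            return NotImplemented  # fall through to the reflected method

    class Feet(object):
        def __init__(self, n):
            self.n = n
        def __radd__(self, other):
            # Tried second: type(b).__radd__(b, a), once __add__ gives up
            return Feet(other.n * 3.28 + self.n)

    total = Meters(2) + Feet(1)  # hits Meters.__add__, then Feet.__radd__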
Their description does not suggest they don't understand how PyPy works, but rather they don't think they can tackle the yet unsolved failure modes (blowup, etc) of trace compilation.
The approach they're describing is one that already works in V8, JavaScriptCore, and IonMonkey, which is to mix type prediction, type analysis, and runtime handling of unexpected cases. Basically, you use type feedback information to get an initial set of types for a method, use type inference techniques to squeeze out type checks, and then compile the method in a way that handles the expected types in a fast path and traps into a slow path as necessary.
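A toy sketch of that guard-and-trap pattern (hypothetical names, nothing like the real machinery in any of those engines):

    def generic_add(a, b):
        # Slow path: full dynamic dispatch (__add__, __radd__, ...)
        return a + b

    observed_types = {}  # per-call-site type feedback from a profiling tier

    def specialize_add(site_id):
        # "Compile" a fast path for the one type pair seen at this site.
        if observed_types.get(site_id) != (int, int):
            return generic_add  # mixed or missing feedback: stay generic
        def fast_add(a, b):
            if type(a) is int and type(b) is int:  # guard
                return a + b                       # specialized fast path
            return generic_add(a, b)               # trap into the slow path
        return fast_add

    observed_types["site1"] = (int, int)
    add = specialize_add("site1")
    add(2, 3)      # guard passes, fast path
    add("x", "y")  # guard fails, traps to the slow path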
None of V8, JavaScriptCore, or SpiderMonkey does allocation removal, which is the single most important optimization PyPy and LuaJIT do, and which also goes back to Psyco. I think it is unknown how to do this well in method JITs.
This is not correct. V8 does sink allocations into deoptimization exits. It does not sink allocations out of the loops at the moment though.
> I think it is unknown how to do this well in method JITs
I don't think it is unknown. The main simplification for tracing JITs comes from the fact that deoptimization and loop exit can be elegantly treated within a uniform framework, which is a little bit harder for a method JIT: you need to find the right place to insert the materialization instruction after the loop, based on post-domination. Nothing hard or unsolvable though.
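For anyone unfamiliar with the term, here's a hand-written before/after sketch in Python of what allocation removal buys you (toy code; the JIT does this transformation on the compiled trace or method, not on the source):

    class Point(object):
        def __init__(self, x, y):
            self.x, self.y = x, y
        def __add__(self, other):
            return Point(self.x + other.x, self.y + other.y)

    def walk(steps):
        pos = Point(0, 0)
        for dx, dy in steps:
            pos = pos + Point(dx, dy)  # two short-lived Points per iteration
        return pos

    # What allocation removal aims for: keep the fields in registers/locals
    # inside the loop and only materialize a real Point at the loop exit
    # (or at a side exit, if a guard fails mid-loop).
    def walk_after_allocation_removal(steps):
        x = y = 0
        for dx, dy in steps:
            x += dx
            y += dy
        return Point(x, y)  # single allocation, sunk out of the loop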
Yet V8, etc, perform pretty well. Is there a reason to believe allocation removal is particularly more important for Python than JS? It might be, I haven't thought about it, but at first blush that doesn't seem to be the case.
It seems like there are a couple topics that come up when talking to people about switching to PyPy:
- performance or memory usage on larger programs
- C extension module support
I'm not going to promise that we will do a better job at these, but there are technical reasons to think that it's possible.
You're definitely right, Python has a lot of user-customizability that can make it harder to execute efficiently than JavaScript (though thankfully it has less than Ruby). Both PyPy and Pyston have their techniques for cutting through the complicated+expensive slow case and trying to predict and execute a fast path.
Just to be perfectly clear: it's 100% possible to write an efficient Ruby implementation, and if you have the right technical infrastructure, it's no more difficult than Python, or Javascript for that matter.
"Everytime you add two objects in Python you have the possibility of hitting a system defined add, or __add__ or __getattr__ or __getattribute__, or __radd__, or __getattr__ (looking for __radd__), etc"
That's pretty much a solved problem. Look up the method once, cache it for next time, and deoptimize if you're wrong or the definition of the class changes. If it keeps changing, build up a larger cache.
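i.e. an inline cache. A toy sketch of the monomorphic version (a real JIT bakes the guard into machine code and also invalidates the cache when the class is mutated):

    class InlineCache(object):
        """Toy monomorphic inline cache for one method-lookup site."""
        def __init__(self, name):
            self.name = name
            self.cached_type = None
            self.cached_method = None

        def lookup(self, obj):
            if type(obj) is self.cached_type:   # cheap guard: cache hit
                return self.cached_method
            # Miss: do the full (expensive) lookup and remember the result.
            method = getattr(type(obj), self.name)
            self.cached_type = type(obj)
            self.cached_method = method
            return method

    site = InlineCache("__add__")
    site.lookup(3)(3, 4)   # miss: full lookup on int, then call int.__add__
    site.lookup(5)(5, 6)   # hit: just the type guard, no lookup

The "larger cache" case is just keeping a small list of (type, method) pairs instead of one.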
Every time you add two objects in JS you have the possibility of hitting toString at least; every time you access a property it might be a getter or setter, or a Harmony proxy object. Python is more liberal, but in both cases you have to make guesses about object shapes (and objects not having those properties) to have any hope of performance.
I can't speak to the non-technical reasons mentioned in the post, but there's good reason to believe that LLVM is in a very different state than the time that Reid wrote this, particularly wrt JIT support: the JIT engine has been completely replaced. Both of the things that he mentions (lack of back-patching, lack of gdb support) have been added to LLVM mainline.
Definitely true; there are some overheads that could be low-hanging fruit to optimize, but the real solution is most likely to use a tiered-compilation system. ie only invoke the full LLVM code generator once a function has been called 10,000 times. It's definitely an open question as to how to get faster compilation times out of it; we have a simple LLVM interpreter but since LLVM isn't designed for interpretation it's pretty slow.
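In Python terms the tiering idea is roughly this (hypothetical names and threshold, just to illustrate, not how Pyston actually does it):

    COMPILE_THRESHOLD = 10000  # hypothetical hot-function threshold

    def expensive_codegen(fn):
        # Stand-in for invoking the full LLVM code generator.
        return fn

    def tiered(fn):
        state = {"calls": 0, "compiled": None}
        def wrapper(*args, **kwargs):
            if state["compiled"] is not None:
                return state["compiled"](*args, **kwargs)  # optimized tier
            state["calls"] += 1
            if state["calls"] >= COMPILE_THRESHOLD:
                state["compiled"] = expensive_codegen(fn)
            return fn(*args, **kwargs)  # cheap interpreter/baseline tier
        return wrapper

    @tiered
    def hot_function(x):
        return x * 2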
I think you can look at the people who are interested in adding LLVM tiers to their JITs (ex Apple, Facebook) to see that we're not the only ones that think there can be a place for "good but expensive" code generation in a JIT.
"(...). As for Unladen Swallow, there are some reasons to think that LLVM has matured greatly in the past few years, particularly in the JIT engine which has been completely replaced. I'm not sure if that's the only part to the story; I'd be interested in talking with any of the people who were involved or knowledgeable about the project."
It's my impression that Unladen Swallow failed because of insufficient manpower, not because of the general approach. It's true that LLVM is missing some features for a JIT like this (mainly OSR and back-patching), but those can be added. For example, there was a discussion a few months ago on llvm-dev about adding patching support [1] (it was originally meant for garbage collectors, but would also work for JITs).
It is my impression that LLVM has matured quite a bit since Unladen Swallow, not least in the JIT domain. A lot of Unladen Swallow's development was spent on improving LLVM rather than on implementing Python as fast as possible. I seem to recall that the LLVM developers have had discussions since Unladen Swallow on how to better support JIT compilers in LLVM, some of it precisely because Unladen Swallow failed.
Guido works at Dropbox. I am sure he is advising - and I bet he has seen pretty much all the previous attempts closely enough to keep them from making any obvious mistakes.
> Guido works at Dropbox. I am sure he is advising
Guido worked at Google during Unladen Swallow, he was not involved. From the comments of Pyston's announcements, here's the extent of GvR's involvement:
> Guido's advice has been extremely helpful, but so far we haven't been able to get any code from him :/
Right now they have no baseline compiler, but will interpret (un-optimised?) LLVM IR at first; the second tier is unoptimised LLVM compilation, then LLVM compilation with type-recording hooks, and finally a fully optimised compile. Given the history of the Unladen Swallow project and others using the LLVM JIT, they're likely to find they have a lot of work on their hands, particularly as PyPy is really rather good these days.
As always, if you're interested in LLVM or compiler stuff you should subscribe to http://llvmweekly.org (disclaimer: I write it) and follow @llvmweekly
It's mostly a question of manpower and the fact that Dropbox is still on Python 2.7 internally. If we can get more people on the team, we'd love to add support for Python 3.
AFAIK, a large chunk of Twisted is already working on Python 3. I don't know how long they have left to get 100% of it, but it's made substantial progress.
From what I gather, one of the things holding Mercurial back is support for Python 2.4, although the support of %-formatting for bytestrings will move the port of Mercurial forward.
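For reference, I believe this is PEP 461 bringing back bytes %-formatting; a quick illustration:

    # Works on Python 2; on Python 3 it only works once bytes %-formatting
    # lands (PEP 461), and raises TypeError before that:
    node = b"1f0cae0"
    rev = 42
    line = b"changeset %d:%s\n" % (rev, node)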
What does Python 3 offer in terms of performance, improved libraries, safety, or other major improvements to justify spending time switching production code to it and breaking library dependencies (which could be transitive as well)?
I don't see a company with a large Python code base justifying switching to Python 3. Yeah, for personal toy GitHub projects, sure, Python 3 is nice, but for many projects the rewards simply don't justify all the downsides of switching.
Looking back, how Python 3 was handled was a mistake. There should not have been a Python 3 when it happened; it should have happened a lot earlier. But if it was going to happen, it should have offered some drastic benefits -- the GIL being gone, an LLVM JIT with 30x improvements for numerical code crunching, integration with PyPy, ... I don't know, awesome new built-in libraries like "requests", Flask integrated in. Things like that.
What do we have instead? Unicode improvements, generator improvements, a Twisted-like async library (don't get me started on that), iterator cleanups around dicts... That is just not enough, sorry.
Between Dart at Google, Hack at Facebook, and now Pyston at Dropbox, it's really neat to me that there is a resurgence in interest in language implementations. The field seemed quite moribund through most of the late 1990's, early 2000's.
Dart and Hack are new languages, not new implementations of an existing language. Pyston seems like a new VM for Python (more comparable to HHVM, PyPy and V8).
By "Dart" and "Hack" I meant the Dart VM and HHVM. What I mean to say is that it's neat to see a renewed emphasis on serious, high-performance language implementations, whether for existing or new languages. For awhile it had seemed that there was just JVM/CLR on one side and a bunch of interpreted languages on the other.
Hack and HHVM are completely separate projects (they're even written in different languages: Hack in OCaml, HHVM in C++). They're both really interesting projects though.
I also toy with the idea of building a language, and my main contender is LuaJIT (perhaps with Terra). On the other hand, Julia has been done on LLVM...
It's pretty natural; we don't really have anything better than this stack, do we? If you're writing a JIT, why invent your own backend if LLVM is good enough? I mean, it's probably possible to write something better than LLVM, but it would be quite exceptional work. And what that section describes would be pretty typical for any duck-typed language lowered to LLVM. The devil's in the details, anyway. Actually, at this level of detail you could say that almost every cooking recipe sounds the same: "Chop the ingredients; put them in the kettle and put it on the fire for some time; season it."