(Mark Shannon has been a core dev since 2018-05-15: https://devguide.python.org/developers/ though his involvement dates back to 2011 according to git log.)
He, Inada Naoki, and Victor Stinner are some of the developers on the optimization front who are capable of pulling off a JIT for CPython.
This is something I feel Python needs to do to stay relevant. .NET Core, Rust, and Go are examples of modern languages that deliver on the performance front.
I love Python and it is what I use every day. But very often I feel the need for better code generation. Numba and Cython feel like glue. PyPy is slow with extensions, and extensions are a huge part of the Python ecosystem, although HPy (https://github.com/hpyproject/hpy) is trying to change this.
I feel that this is a must for Python, and I would love to contribute money to this task.
It’s a chicken and egg situation. If Python weren’t so slow, there wouldn’t be such a heavy reliance on C extensions in the ecosystem, and consequently it would be easier to optimize CPython and PyPy because there would be less at stake with respect to breaking the ecosystem. Not only that, but lots of other things get better if C-extensions go away: much less need for executable project files which dramatically improves package manager performance (no need to recursively download a package and run setup.py to determine the next level of dependencies just to build the dependency tree), packaging gets dramatically easier (C projects have no standard reproducible build system so every C project including C extension projects must have its own package recipe and it’s often impractical to support more than a handful of target systems), etc.
HPy appears to be the right approach to fixing this, but there needs to be a concerted effort to migrate the ecosystem toward it and then deprecate the old, expansive interface. And then at some point in the distant future we can expect things to be as nice as other ecosystems are today.
This is absolutely the problem. In fast languages like C#, Java, Go, and even JS, almost nothing is native. It's normal to have even large projects with no FFI'd native code.
Python and Ruby are trapped in the C extensions spiral: the languages are slow, so everything uses C bindings, which in turn makes it hard to speed up the languages.
As for the Java and .NET platforms, it has been acknowledged that too many low-level capabilities were left out, and those features are being added, even if more slowly than many of us would like, so eventually even the stuff that might still require a native library today won't need one in the future.
Translation layer. There ought to be a thick, fat, slow layer between Fast Python and CPython.
You can write fast native code. You can write fast code by calling into native. The two rarely cross.
If we were clever, we could even have one front-end and multiple backends to go into fast Python, CPython, Cython, embedded micropython, etc. If we were double-plus-clever, we'd add GPGPU, multicore, and compute cluster to the mix.
We'd need someone with deep pockets and high risk tolerance to pull it off. It's a big change. Big changes often fail. Still, when I look at the amount of Python at Google... Well, it'd sure be a pity if all of that became legacy code when we all jumped ship to Julia.
I do believe this will be hard to change, though: Coming from academia my first Python experience was as a front-end to actively developed Fortran code. In that segment, the easy link to external code is one of the main selling points of Python.
To extend what you said a bit: PyPy is not only slow with extensions, it's also massively slower for running short-lived code that isn't repeated much, which is precisely the scenario of many if not most command line tools written in Python. Although I'm not sure there are enough people concerned about Python's performance in this scenario.
And yeah, I would be happy to pitch in too if CPython can be 50% faster reliably. It only takes 500 people each pitching in $100 to reach $50k.
Edit: somehow got $50k etched in mind when it says $500k. That’s more difficult from individual funding.
OTOH, $2M is a sneeze for large corporations. I totally understand why they want to push community funding, but I wonder if they're open to having corporate sponsors or something along those lines. The idea would be "yes, you pay the bill, but you don't get to push an agenda with this".
I hope that discussion thread continues, because so far there's some really juicy sass happening in there.
> > 1. I already have working code for the first stage.
> I don't mean to be negative, or hostile, but this sounds like you are saying "I have a patch for Python that will make it 1.5 times faster, but you will never see it unless you pay me!"
> I have this thing, e.g an iPhone, if you want it you must pay me.
> I think that speeding CPython 50% is worth a few hundred iPhones.
I wonder if he considered publishing it under a restrictive license that doesn't allow real world use. Then people could scrutinize his claims but there would still be an incentive to pay up for him to relicense it.
Why not release this super-dooper Python interpreter under a proprietary license? Just have people pay $1000 for it. If it works, then it’s a steal for people who use Python for business. “Well, you have to take me at my word and pay me first” is highly suspect.
>The losers are the people who made Python what it is today, for free.
Why are they "losers"? Did they sign any contract that something is owned to them? They did it for fun/personal reasons/ideology, and what they created is used by millions.
If they wanted to get paid for doing it, they very well could have - several core Python contributors have worked at large companies doing Python (including core Python) work, most famous of all Guido.
I've been observing something akin to Zawinski's Law ("Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.") but for programming languages:
Every dynamically-typed programming language attempts to expand until it has static types and compiles to machine code.
* Common Lisp has long had type annotations that can be used to generate specialized code.
* JS started out as a simple bytecode VM, then got a JIT from V8 and others, then TypeScript came and gave it static types.
* Python has had several JITs over the years—Unladen Swallow, PyPy, etc. It got static types with mypy (and others) and is now adding type annotations directly to the core language.
* Ruby's 3x3 plan involves adding a type-specializing JIT. Sorbet adds static types and I believe Matz wants the core language to go in that direction.
* Facebook created Hack to statically type PHP, and PHP core added scalar type declarations. Facebook's HipHop VM brings a JIT to PHP.
* LuaJIT brought a high performance JIT to Lua and there's been a number of projects that layer static type annotations onto the language.
* Dart started with an optional type system and moved to a fully sound static type system.
So what I see is a language that starts out simple and dynamically typed with a set of core libraries and idioms designed around dynamism. Then later people add static types on the front end to help people maintain larger programs. And they add type specializing JITs on the back end to generate faster code.
But right in the middle you're still stuck with a mountain of existing code designed around the assumption that code and data don't need to be statically shaped. So even though you end up doing all the work (and adding all the complexity) to design a static type system and native code generator, you don't get the full benefits.
The static type systems are almost always unsound in order to play nice with existing dynamic idioms, so the back end can't use the static types for optimization purposes. You end up with these fantastically complex type systems like TypeScript's and these incredibly complex JITs, but you still don't get the performance you get from a simple fully-statically typed language like Go or, hell, Pascal.
I think the reasonable take-away is that if you ever intend your language to be used for large programs, just take the hit and start off with static types. Your future self will thank you.
Not to beat this into the ground, but I realized I could have written a much shorter response.
This idea to "start with static types" presumes that there are no benefits to dynamically typed languages!
But the simplest argument is also the one I think that's the most true: dynamically typed languages let you explore the problem domain with faster feedback, leading to more successful software.
-----
Quote from John Ousterhout: The greatest performance improvement of all is when a system goes from not-working to working [1]
And we should never be cavalier about how hard that is. Most software projects fail; the successful ones are miracles!
I also take an expansive definition of "working" -- i.e. "useful to its users".
Also, TBH, types are kind of the wrong answer to the question. The question should be: how do we make the language efficiently expressive? For my money, that means defining constraints primarily on the code's main interfaces, where a given constraint may be a combination of traditional generics, dependent types, and/or declarative run-time checks on input/output values. (e.g. Eiffel comes to mind.)
For instance, in my kiwi language I can define an argument as:
list (whole number (0, 100), 4, 4)
Which is to say a 4-item list, where each item is an integer between 0 and 100. And since I don’t want to type all that every time I can give that constraint a descriptive name:
define type (CMYK color, list (whole number (0, 100), 4, 4))
I can even include user documentation in that definition so the whole lot’s self-describing.
Kiwi’s very late-bound and interpreted, so its constraints are implemented solely as run-time coercions with optional bounds checks; nothing fancy. But the semantics are quite well formalized so a linter could be implemented as an assistant authoring tool for users, and if you can implement a linter then you can implement a type checker; and so on.
What matters is that the language has a formal mechanism by which it can guarantee that a given value will always satisfy one or more user-defined requirements; and whether those requirements are checked at compilation, execution, or some combination is secondary to that. How rigorous the user makes these declarations, and if/where she makes them, is entirely up to her.
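To give a concrete feel for the idea, here's a minimal Python sketch (not kiwi, and not its actual semantics — every name here is made up for illustration) of a named, user-defined constraint implemented as a run-time coercion plus bounds check, roughly in the spirit of the CMYK example above:

    # Illustrative sketch only: a named constraint as a run-time coercion.
    def whole_number(lo, hi):
        def check(value):
            n = int(value)                   # coerce, as a late-bound language would
            if not (lo <= n <= hi):
                raise ValueError(f"{value!r} not in range {lo}..{hi}")
            return n
        return check

    def fixed_list(item_check, min_len, max_len):
        def check(value):
            items = list(value)
            if not (min_len <= len(items) <= max_len):
                raise ValueError(f"expected {min_len}-{max_len} items, got {len(items)}")
            return [item_check(v) for v in items]
        return check

    # roughly: define type (CMYK color, list (whole number (0, 100), 4, 4))
    cmyk_color = fixed_list(whole_number(0, 100), 4, 4)

    print(cmyk_color([0, 15, "100", 30]))    # coerces and validates: [0, 15, 100, 30]

Whether a checker evaluates such declarations ahead of time or the runtime enforces them on each call is exactly the "secondary" question described above.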
Remember, the goal of any language is to please its users. Not the machines it runs on, nor the designers who created it. And users’ needs are not constant, not even across the development cycle of a single program, so a language that cannot adapt to those users’ changing requirements as it goes has already failed its first hurdle.
..
Perhaps if the authors of existing languages like Python and C put less effort into chasing the constantly diminishing returns of post-hoc micro-optimizations, and more into thinking about how to design the next generation of languages so as to carry forward the good characteristics of their predecessors without their original already-painted-themselves-into-a-corner limitations, we might actually have languages that tick all the boxes by now.
Current implementation is proprietary so not on my GH, alas. (I am working on a new one, but it’s early days.) Some of its ideas carry over into iris though, which is. HTH
> I've been observing something akin to Zawinski's Law ("Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.") but for programming languages:
Even more applicable, in this case, would be Greenspun's tenth rule:
“Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”
I totally forgot that Lisp's List processing came from Fortran: a system called FLPL. This had the CAR and CDR naming (in an ugly form, with additional prefix and suffix characters: XCARF and XCDRF).
Statically typed languages are going the other way too: focusing on ergonomics for dealing with key-value data with heterogeneous types (like JSON objects), on generics/traits that don't need a specific base class and structural typing more generally, and on type inference to avoid having to specify types.
So I think we are more seeing an ongoing convergence on combined static/dynamic typing approaches.
Static languages are going all in on type inference. Types aren't annoying if they're hardly ever specified. It's largely a win-win. Even Java has decent type inference in newer versions.
I think the convergence here is toward 'automatically discoverable types'. For Python et al. this means type annotation (because it's extremely difficult to discover types otherwise).
For C++ et al this is things like the 'auto' keyword that allow skipping the type specification where the compiler can work it out.
The end-game is specifying just enough type information that it's still understandable to both humans and compilers.
Typescript is extremely good at this. Rust is pretty good. Even Java is okay now.
I firmly believe type inference is the future. Most objects in dynamic languages are statically typed anyway; the type just isn't exposed at compile time.
Type inference is good, but it’s not a full answer.
For instance, how often do you declare a variable as `int`, when what you really want to say is “an integer in the range 0...100”? A few languages (e.g. Eiffel) provide a formal mechanism for declaring these sorts of constraints, but most don’t, and you end up putting what should be declarative type-level information into the body of your code instead.
And then there’s “cutting-edge” stuff like dependent types, where you really want to express one argument’s type in terms of another argument’s, a classic example being an array indexing method, where you really want to declare the index at compile time as an integer in the range `0..<array.length`, and let the type system propagate that rule and its implications throughout the code that uses it. Whereas most “modern” languages chuck a run-time error if you’re lucky; or just ralph and dump stack like some antiquated 1970s throwback (yeah, looking at you, Apple’s Swift).
Huh. I generally enjoy Typescript, but have been frustrated by its inference at times, particularly with regard to generics -- there are places where it will do fine if you pass an arrow function directly, but if you pull it out into a const, the compiler will no longer be able to infer types (and thus fails to compile if you have noImplicitAny enabled).
> I think the reasonable take-away is that if you ever intend your language to be used for large programs, just take the hit and start off with static types. Your future self will thank you.
I think this is missing something: the creators of those languages wanted languages that were fun/productive for writing small programs. That is, extremely flexible languages (and yes maybe they were unaware back in the 90's how this would limit their optimization potential)
And all big programs were once small programs. Facebook is probably one of the easiest to see, since it was "just" a bunch of PHP scripts (although people tend to underestimate/dismiss it for that reason).
Just like nobody ever says: "Well I estimate that in 10 years Facebook will consist of 10M lines of PHP with 5,000 programmers working on it -- I should probably write it in another language".
Nobody ever says: "Well I think my language is going to be used by millions of people and will have billions of lines of code".
Well, there probably were people who thought that, but those were exactly the people who didn't make languages as useful as Python and JS :)
----
That said I think the phrase "irrational exuberance languages" referring to Python/JS is kinda funny, and in a way accurate ...
Although again I would say the unexpected part was not that they thought single core scaling would continue forever and make their languages fast, it's that those "slow" languages turned out to be the "best" ones for writing some of the most important systems of the last couple decades (not just commercial ones, but also Wikipedia, BitTorrent, etc.)
----
I'd make another analogy, to ISAs. If you talk to anyone who knows about CPU design, they'll say that x86 is shitty with a big pile of hacks.
"It would be better" if someone designed it from the beginning with current applications in mind. But if anyone actually did that back in 1980, they wouldn't have been successful.
And from my perspective, I mostly don't care, because the C compiler makes it all work for me (although I know the people who make it work care very much).
So I guess the point is that technology adoption proceeds by evolution, and trying to plan 10 or 20 years ahead of time never works.
----
Also, there is a pretty hard tradeoff between static types and metaprogramming. Recent languages have come closer to reconciling these features (Zig, Nim, D), but dynamic languages chose reflection/metaprogramming, and that's a primary reason why they became successful.
Ruby on Rails is a great example of that. It uses Ruby metaprogramming/reflection to the hilt, and lots of people who have no idea what that is love it, and they built tons of things with it (which now makes it an interesting optimization target).
Yeah, there's a longer essay you or I could write about path dependence here. The language that enables you to get to the point where your codebase is that big in the first place may not be the language best suited to that codebase once it gets there.
That's the main argument in favor of optional type systems like TypeScript. Start small in a dynamically typed language and then add the types later when you need them.
But my personal experience is that I've never found types to cause much friction when programming in the small. What I have found painful is not having GC and not having type inference. If I was doing a startup and needed to be able to prototype and change quickly, then C++ or Java would be pretty painful. But C# or any modern typed, managed language with a decent modern IDE? I would be surprised if you were any less productive than someone using Python or Ruby.
Yeah, "path dependence" pretty much sums up the comment. I should probably write something about that, because a huge factor in the design of https://oilshell.org. But I still get questions about it (like "Why can't we just get rid of shell and start over from scratch?")
A few points I would like to add on:
- I don't view the existence of TypeScript, MyPy, and Sorbet as evidence in favor of static typing. It's evidence in favor of gradual typing!
- The Oil project gave me a lot of experience with the relation between metaprogramming/reflection, dynamic types vs. static types, performance, and code length.
In particular the code moved from dynamic to static typing, and got a lot faster. But dynamic typing wasn't a mistake.
Short recap: Oil's code is 5-7x shorter than bash [1], and a lot of that is due to starting out as dynamically typed, with a lot of metaprogramming. (I still believe "size is code's worst enemy" -- big code is understood more poorly, which makes it harder to modify, regardless of static types.)
It is also something like 30x-50x slower than bash in Python! So, unusably slow. However the surprise is that I statically typed this code, and semi-automatically translated it to C++, and the result is now faster than bash. [2]
-----
So the high level, short code has enough semantic information to be fast (after you add explicit types).
The static typing process mainly involved expanding metaprogramming into textual code generation! I estimate that this was at least 9 months of rewriting.
That's what your arguments are missing IMO. If you're writing Java in Python, then sure Python is going to seem like it offers no advantages, and it might as well be statically typed.
But that's not how people write programs in dynamic languages (and honestly I thought you would appreciate that more, having written so much about dynamic languages!) The porting process taught me exactly how much dynamism I was using, and it was a lot! It was pulling a lot of weight.
I should show all the code generators and generated code in an essay... it's a very concrete demonstration.
So there is the fallacy that "type inference" solves the problem -- it's not that we're too lazy to write down types; it's that we're using techniques that static type systems can't handle. Good thread about that: https://twitter.com/sliminality/status/1317331149354463232
I'm not saying the Oil experience generalizes, since it's an unusual project, but it's definitely not as simple as "go with static types so you don't get trapped". That said, the conundrum you're talking about is very real.
-----
But despite writing all that, I'm actually leaning in your direction, and I started a statically typed language :)
I would start with an interpreter so it can have metaprogramming (e.g. like a constexpr interpreter, or what Zig does). I would like to add a gradual type system, but I don't really know how to write one, so the first cut will be a traditional static type system. (In this world, getting the 30-50x speedup relies on the program being 100% statically typed, yet gradual typing is still important IMO. It's very simple, no infinite treadmill of JIT work as you see in the "professional" projects.)
-----
This is probably something for an essay, but I would say dynamic types are demonstrably better than static types for at least 3 domains: UI, data science, and security/reverse engineering (and I have a bunch of experience to back this up). Basically anything that involves "learning about the world", or "schema discovery".
Really short summary: I used the MyPy front end, and basically printed its AST as C++. This involves a bunch of hacks, but it works.
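To make the flavor of that concrete, here's a toy sketch — emphatically not Oil's mycpp, which walks MyPy's *typed* AST and handles vastly more — that uses only the stdlib `ast` module to walk one trivially simple annotated function and print a C++-ish rendering; the source string and class name are just illustrative:

    # Toy illustration only (NOT mycpp): walk a Python AST, print C++-ish text.
    import ast

    src = """
    def add(x: int, y: int) -> int:
        return x + y
    """

    class ToyCpp(ast.NodeVisitor):
        def visit_FunctionDef(self, node):
            args = ", ".join(f"int {a.arg}" for a in node.args.args)
            print(f"int {node.name}({args}) {{")
            self.generic_visit(node)
            print("}")

        def visit_Return(self, node):
            # ast.unparse needs Python 3.9+
            print(f"  return {ast.unparse(node.value)};")

    ToyCpp().visit(ast.parse(src))
    # int add(int x, int y) {
    #   return x + y;
    # }

The real translator has to deal with classes, exceptions, garbage collection, and all the hacks mentioned above, but the "walk the tree, emit text" shape is the same.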
The following conditions helped a lot:
- I control all the code in Oil, so I can statically type all of it, which involves both annotations and occasional patches. A shell doesn't have many Python library dependencies; i.e. it doesn't depend on BeautifulSoup or something like that. It's basically all string and data structure manipulation, with a few sys calls.
- Oil has extensive tests. I run the same tests against the translated and compiled C++, which flushes out bugs in the translator. (Something like 915 out of 1700 tests pass now, so the translation process isn't done.)
- I also used unit tests to generate some of the type annotations with pyannotate.
There is still some translation left to do if anyone is interested in helping. You will probably learn something about both Python and C++!
>"It would be better" if someone designed it from the beginning with current applications in mind. But if anyone actually did that back in 1980, they wouldn't have been successful
Strongly disagree. Although x86 was dominant on desktop PCs with MS-DOS, the Nintendo Entertainment System used the MOS 6502, which is a lighter take on the Motorola 6800, while the Sega Mega Drive used a full-blown m68000. The Apple II desktop also used the MOS 6502, HP used PA-RISC for their HP-UX servers, and Sun replaced their 68000s with SPARC in the late 80s. The Alpha architecture was so successful there was a Windows NT port for it. So it wasn't really game over as of the 80s, and even in the 90s the market was still diverse. It's only by the 00s that x86, together with the Windows NT series, achieved total dominance, even in the small-server segment.
>So I guess the point is that technology adoption proceeds by evolution, and trying to plan 10 or 20 years ahead of time never works
Right conclusion - wrong reasoning. Quality of product is never the main driver of sales. That's why to succeed you need to market it first and then find some way to make it usable. That's where, for example, Motorola, Alpha, and MIPS failed, and that's where ARM won as an umbrella brand for what are actually 4 incompatible architectures (original ARM, AArch32, AArch64, and Thumb a.k.a. SuperH).
One important point being that, while the languages mentioned have expanded to include static typing and native compiling, they haven't stopped being what they were originally -- they have expanded, rather than transformed. In other words, the meaning of "dynamic languages" can be taken to mean "everything a language might need to be efficient and useful".
I think we could have 2 backends if possible: an interpreter for prototyping and a compiler for production. I usually don't want an interpreter in prod and don't want a compiler when prototyping. That's why TS is successful, IMO. You can just test stuff in emit-only mode while still having the pieces of code that don't make sense yet get highlighted.
> Then later people add static types on the front end to help people maintain larger programs.
More like, people whose hobby it is to create static front ends for dynamic languages will invariably invade the ecosystem of any sufficiently popular dynamic language.
Most of the projects listed there are made by large companies, not hobbyists. I know there are a lot of dynamic language fans out there, but it really does seem that there is business value in using static languages.
- There are companies which would be willing to pay large amounts of money (more than he's asking for) for the speedups he's promising
- There are other similar efforts underway (such as ours) which have implemented some of his ideas and the gains are much smaller than he anticipates (for real code, microbenchmarks are another story)
Atm this reads like a shopping list, _but_ if it goes ahead, and even if it doesn’t fully deliver, I think this could be the impulse that Python needs to remain relevant in scientific programming.
Clearly it is the leader right now by a long way, but Julia is growing really fast.
If you’re talking about Debug vs Release mode in Visual Studio, there’s a huge performance hit with STL iterators in Debug mode. Like sometimes 10-20x slower for tight loops. This is obviously C++ not C, so it doesn’t apply to Python directly. But in general toolchains can have very unoptimized code paths in debug mode.
This seems like the first step towards such improvements, but I guess recompiling libraries will be mandatory, and already existing wheels won't be valid for Python 3.10
Isn't this more scary than the Python 3 migration?
Dude, I had to recompile all my extensions for every minor version of Python as recently as ... 2013? Wheels haven't been around for that long. I'd get a new laptop and curse, "Why didn't I think to write down all those Fortran compile flags I used for SciPy!"
What makes you think that the implementation of dicts is a low hanging fruit? Only asking because there has been a lot of work to make them faster: https://www.youtube.com/watch?v=npw4s1QTmPg
Agreed on optimizing core objects. I recently wrote a C base class (https://jcristharif.com/quickle/#structs-and-enums) for defining dataclass-like types that's noticeably faster (~5-10x) to init/copy/serialize/compare than other options (dataclasses, pydantic, namedtuples...). For some applications I write, this has a non-negligible performance impact, without requiring deep interpreter changes. Using the base class is nice - my application objects are still defined in normal Python code, but all the heavy lifting is done in the C extension.
However, this speedup comes at the cost of being less dynamic. I'm not sure how much more optimized core python objects could be without sacrificing some of the dynamism some programs rely on. Python dicts are already pretty optimized as is.
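A rough, pure-Python illustration of that "less dynamic, faster" trade-off (this is not quickle — its base class is in C and faster still — and the class names below are made up; absolute numbers will vary by machine and interpreter):

    # Sketch: trading per-instance __dict__ dynamism for speed via __slots__.
    import timeit
    from dataclasses import dataclass

    @dataclass
    class PointDC:
        x: int
        y: int

    class PointSlots:
        __slots__ = ("x", "y")      # no per-instance __dict__: smaller, faster, less dynamic
        def __init__(self, x, y):
            self.x = x
            self.y = y

    print("dataclass:", timeit.timeit(lambda: PointDC(1, 2)))
    print("__slots__:", timeit.timeit(lambda: PointSlots(1, 2)))

    p = PointSlots(1, 2)
    # p.z = 3   # AttributeError: this is the dynamism you gave up for the speed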
>I recently wrote a C base class (https://jcristharif.com/quickle/#structs-and-enums) for defining dataclass-like-types that's noticeably faster (~5-10x) to init/copy/serialize/compare than other options (dataclasses, pydantic, namedtuples...)
YouTube also encountered the same problem. Their solution sounds kinda like "never use pickle, because it's slow. Use custom serialization".
>It's very hard to estimate how much speedup a JIT will get you on a dynamic language like python and x5 speedup seems unrealistic.
You can get a 2x or even 10x speedup for many use cases without a JIT, as PHP 7 proved. You just need to start with a slow, not very optimized implementation, which CPython pretty much is.
As for 5x, JavaScript got a much bigger speed bump than that from its JITs (compared to the interpreted JavaScript of the pre-JSCore, TraceMonkey, and V8 era, circa 1997-2005), and it's just as dynamic as Python...
>There are other lower hanging fruits, like optimizing core data structures (e.g: the implementation of python dicts).
Funny that you should mention it, because the author of the proposal has already done significant work (available since Python 3.3 or so) optimizing the dicts...
I'm not saying you're wrong, but last I checked (admittedly, years ago) there was at least 1 Python JIT with a lot of development and user hours behind it, with good results. Seems like that experience should yield some decent estimates on speedup.
Is there any comprehensive overview of why Python is slow in the first place? There seem to be opportunities for small optimizations, but is that the only place where the slowness is? A few years ago I tried to find out what investigations had been done on the speed of bytecode generation and there was basically nothing. No one could even suggest how long bytecode generation took, only that "it's really slow". Isn't the common wisdom to measure, then optimize, not the other way around?
It has nothing to do with it being interpreted (this is a commonly misidentified issue with dynamic langs).
The fundamental issue is that python is a pointer machine: everything requires a dynamic lookup in memory.
Eg.,
x = [1, 2, 3]
len(x)
Here `x` is an actual string in memory, used as a key into a locals() dictionary that holds the values (cf. C, where it is just a memory address).
Likewise the list is a list of pointers (not a sequential array of values). And it's heterogeneous, i.e., the contents can be of any type.
Likewise `len` is a string that has to be looked up in a dictionary of functions.
etc.
The whole thing is many levels of indirection. Applying an operation to a value (eg., even x + y) requires jumping around the memory of the machine many times.
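You can watch those lookups happen by disassembling the snippet above with the stdlib `dis` module (exact opcode names differ between CPython versions, so treat the commented output as indicative):

    import dis

    # At module scope both `x` and `len` are resolved by *string* name in
    # namespace dictionaries; the disassembly makes the lookups visible.
    dis.dis("x = [1, 2, 3]\nlen(x)")
    # Typical output includes STORE_NAME (x)  -- a store into a namespace dict --
    # and LOAD_NAME (len) / LOAD_NAME (x)     -- dict lookups by string key --
    # before the call instruction itself.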
This is necessary, in general, to deliver on the dynamic lang. features python provides.
Julia solves some of these issues by using static type information to ditch this dynamic behaviour. My suspicion is that Python can follow a similar path (e.g., above, x could be compiled to a static homogeneous array of ints).
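For a pure-Python taste of the "list of pointers vs. homogeneous array" difference, the stdlib `array` module already stores unboxed machine integers contiguously (NumPy takes the same idea much further); the sizes printed are only rough, machine-dependent figures:

    import sys
    from array import array

    boxed = [1, 2, 3] * 1000
    unboxed = array("i", boxed)        # homogeneous C ints, stored contiguously

    print(sys.getsizeof(boxed))        # list of pointers (the int objects are extra)
    print(sys.getsizeof(unboxed))      # roughly 4 bytes per element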
The locals dict is lazily created from the locals in the stack frame. No dict there unless you use it. The article I linked to before has a really good explanation of this.
len() only causes one dictionary lookup and then it's cached.
> This is necessary, in general, to deliver on the dynamic lang. features python provides.
It's the most obvious way to implement these features of dynamic languages but not at all necessary.
It also does (to my knowledge) no real optimisation on user code. I've also read that a lot of Python's constructs and language semantics make it difficult to implement more performance optimisations.
PyPy has been working on it for 10 years, though, and their average is 4x on their benchmarks. Sure, I think this means they do 10x or 20x on certain tasks, and that's impressive. But still, it's not easy to do 5x across the board.
Frankly speaking, I had this impression too. The reason the JVM and JS use their own bytecode for JIT compilation is that the main means of optimization are inlining and scalar replacement (converting complex objects into simple values). E.g., if you have a container that holds an integer and does nothing else, then when JIT-compiling you can skip the whole boxing/unboxing and manipulate a single integer value stored in a register. However, you cannot do that in any simple way if your integer container is a C extension black box — you cannot "inline" the container's functions and eliminate its stores and loads.
This is why PyPy reimplements the standard library in RPython — so it can be JIT-optimized. But it feels like Mark Shannon knows nothing about these efforts — which is kinda strange considering his position as a core CPython developer.
>There are other lower hanging fruits, like optimizing core data structures (e.g: the implementation of python dicts)
Unfortunately, you cannot easily implement efficient data containers without rewriting existing Python code. That code relies heavily on dictionary-based access to pretty much everything, and you cannot easily convert "string hash" access into "record offset" access, because you cannot know a priori what structure an object has, and converting a hash into an offset is basically the same dictionary lookup. For example:
a = A()
a.field = varname + 1
What can you optimize here? What is "varname"? What is A's structure? Is "A" a class or a function? Not only are you unable to tell the semantics of the code just by looking at it — you can't even tell the semantics after you've examined "A" and "varname" on some previous iteration, because somebody might have declared/modified those in an outer scope or directly modified "A" or "varname".
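A small sketch of what can legally change under a compiler's feet here — everything below is ordinary Python, which is why CPython keeps doing dict lookups instead of baking in offsets (the names are the same illustrative ones as above):

    # Why `a.field = varname + 1` can't be compiled to a fixed offset.
    class A:
        pass

    varname = 1
    a = A()
    a.field = varname + 1                    # lands in the instance __dict__ (value 2)

    varname = "1"                            # same name, now a str: `+ 1` would raise
    A.field = property(lambda self: 42)      # the *class* grows a descriptor later

    b = A()
    print(b.field)                           # 42 -- attribute lookup semantics changed
    print(a.__dict__)                        # {'field': 2} -- the old slot is still there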
Last year, in my spare time, I've been working on an unpublished library for Python multitasking with shared-memory structures (I'll probably make a blog post in a few weeks and link it here), and I also encountered the problem of the inherently inefficient implementation of Python's basic types. However, I've yet to find a solution that doesn't break compatibility with existing code. For example, if you look at ctypes, it has some very efficient containers, but using them in regular Python code is a pain, and the C-Python interface eats most of the performance benefits of the efficient containers.
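For context, this is the kind of ctypes container meant above — fixed layout, no per-element Python objects, but clunky to drive from ordinary Python code (a sketch; the type and field names are just examples):

    import ctypes

    IntArray4 = ctypes.c_int * 4              # array type: four C ints, contiguous
    values = IntArray4(10, 20, 30, 40)

    class Point(ctypes.Structure):
        _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

    p = Point(1, 2)
    print(values[2], p.x, p.y)                # 30 1 2
    # Every element access still crosses the ctypes/Python boundary, which is
    # where much of the performance benefit gets eaten.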
So what's really needed for optimizing Python is some kind of Python subset, like RPython but probably more human-friendly, so that efficient containers can really become efficient and an automatic optimizer can select or create those efficient containers on its own. Just like the V8 JS engine does, which stores objects in records with a static structure. That happens to work in JS for most cases. By contrast, in Python it does not work for most cases, which is why we have so much trouble optimizing Python.
I hope this goes forward. Not just for the speed, but the energy benefits.
One thing I love about C and golang is how fast they make hardware feel. They can do much more with less hardware. I love writing Python, but it does feel a bit heavy. If every machine using Python required half as much hardware/power that would be amazing.
Was just talking about that trip with my wife last night, in relation to COVID statistics. "Measuring things sucks! I'm still traumatized from spending a week trying to reliably measure speedups in Python."
While it's always an interesting topic, since I used Python for the first time 10 years ago this has been tried and tried again, only to arrive at the same realization again and again:
1) the Python language was never made for speed of execution, it was made for speed of programming. It is an awesome language for prototyping, teaching and scripting
2) If you want speed, it likely means you are doing math operations, for example ML. In that case, you'd better learn to use state-of-the-art math libraries, built on decades of hardware and mathematical expertise you will never beat.
It would serve you better to :
- learn about existing state-of-the-art compiled libraries and tools in your problem space
- learn to profile your Python code
Python is a great orchestrator to glue libraries and external systems together. If you are reinventing everything in Python, you are not solving actual problems. It's a matter of using the right tool for the right task.
Python is a means to achieve something greater; it's not an end in itself.
All the code speedups in the world won't help if the allocator remains slow and escape analysis doesn't yield a lot of stack allocation opportunities. A nice JIT (tracing or not) needs to be combined with a nice GC to generate the maximum benefit.
I have a method for turning integers into strings that is much faster than Python's built in method when used with very large integers. I wonder if there's a place for it in this project?
If your method is performant, then you should be able to simply file a pull request to improve what's there. I don't see a reason you'd need to tie that contribution to any existing effort.
Having never contributed to an open source project, particularly one as popular and complex as Python, I'm not sure of the amount of justification necessary. Would I need to write a PEP?
There needs to be a sensible way to measure progress, so a set of benchmarks needs to be chosen. Not sure all funders can agree on those. Also, I believe many large companies who are running code at truly large scale are already using low-level languages like C++ (Google).
I personally think it's a bit premature to talk [edit: for the developers to talk] about it - they're attributing 2.25x to the addition of a JIT, which generally takes a long time to develop and tweak, adds significant complexity, and makes the performance profile complex (i.e. slow start, slow first requests, etc.).
I do like, though, seeing the JIT wave across interpreted languages (see Ruby). I'm also curious to see what the implementation is going to be.
>I personally think it's a bit premature to talk about it
Has that stopped HN before? We have devoted several top posts to the V language, LightTable, and several clearly doomed-to-fail efforts. This, in comparison, is rather tame and from a legitimate source, and even if it just gives 2x performance, that will still be something.
I personally think tracing JIT is a bad technology. It's over complicated and makes it hard to reason about runtime space and time efficiencies of code.
I don't know the best route for python to get more performance but it probably involves allowing people to opt out of dynamism and basically doing what cython does but for all possible code.
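A hedged sketch of what that opt-out can look like today: ordinary annotated Python, which CPython still runs fully dynamically, but which ahead-of-time compilers such as mypyc or Cython (in pure-Python mode) can use to emit specialized code; the `dot` function is just an illustrative example, not anyone's API:

    # Plain annotated Python (3.9+ for list[float]); under CPython this still
    # goes through the generic object protocol, but AOT compilers can use the
    # annotations to specialize the loop.
    def dot(xs: list[float], ys: list[float]) -> float:
        total: float = 0.0
        for x, y in zip(xs, ys):
            total += x * y
        return total

    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 32.0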
>I personally think tracing JIT is a bad technology. It's over complicated and makes it hard to reason about runtime space and time efficiencies of code.
Haven't Strongtalk, Java, and JS put those concerns to rest?
Is there a precise difference between how OpenJDK works, where it needs a warmup to begin running performantly, and how a tracing JIT works?
Is there a precise difference between how Ignition and TurboFan interact that makes it not a tracing JIT? I really thought both the Java and JS implementations were tracing JITs. Because if I'm going to whinge about technology I don't like, I want to whinge about the precise problem. :)
The problem with this and all other efforts for speed in CPython is the interop with C extensions. For JITted Python look at PyPy, sure the JIT gives you some performance, but they broke all C extensions along the way.
Many C extensions work with PyPy now. I have a service that uses pylibmc and pycurl, and it works with PyPy. Those extensions didn't rewrite with cffi or anything like that, PyPy added a compatibility layer, it's impressive.
But, you don't get the JIT performance if you use the C extensions a lot, it's better for PyPy performance if you find a pure-python alternative for extensions used in "hot paths".
The TruffleRuby approach is to run the C extensions in an interpreter with a JIT. This way you can change the C extension interface without changing the code, and the optimisations apply to both at the same time.
Shouldn't a well-designed C extension interface work fine regardless of whether a JIT is used? Java's JNI interface, for instance, works fine with or without a JIT. Is it just that the interface was designed without JIT in mind?
where the code starts executing without JIT, while another thread is instantiated that is doing JITting in the background, and as soon as the JITting is done, JITted version takes over seamlessly for the rest of the execution.
My guess is that it will stay that way. That said, it is probably feasible to do some parts of it, similar to the 3x3 project Ruby has. I wouldn't be surprised if numpy glue is a lot more tractable for optimization than Rails, but the interaction with C libs would pose some serious issues.
Maybe because the repo is 2 hours old and provides no actual information other than some hand waving. Also, the author asks for $2 million and doesn't seem to be a Python core developer. Given the repo's age, I'm probably answering Mark Shannon himself.
Edit: This post wasn't supposed to be a hostile attack against anybody. I just found it odd to be asked why a 2-hour-old repo wasn't better known. I did not know that Mark Shannon is a core dev; I certainly did not pick that up from the repo's contents.
The repository linked mentions that each stage should cost about $500k and there are 4 stages. You can argue that he is not asking for it, but at least he is valuating it at $2M.
200k per year would not be the worst salary to pay a developer for this work, and I'd say he's outlined about 5 to 10 years of work. Feels like you could negotiate a package of escalating compensation contingent on delivery milestones.
Yeah, I dunno. The more I work with Python (CPython) the more I'm starting to like that it can be used as a scripting layer over "really complicated stuff".
I.e. if you want to store and mess about with some data, Python is fine. If you want to store and mess about with a lot of data, use pandas, numpy, etc.
https://mail.python.org/archives/list/python-dev@python.org/...