(Mark Shannon has been a core dev since 2018-05-15: https://devguide.python.org/developers/ though his involvement dates back to 2011 according to git log.)
He, Inada Naoki, and Victor Stinner are some of the developers on the optimization front who are capable of pulling off a JIT for CPython.
This is something I feel Python needs to do to stay relevant. .NET Core, Rust, and Go are examples of modern languages that deliver on the performance front.
I love Python and it is what I use every day. But very often I feel the need for better code generation. Numba and Cython feel like glue. PyPy is slow with extensions, and extensions are a huge part of the Python ecosystem, although HPy (https://github.com/hpyproject/hpy) is trying to change this.
I feel that this is a must for Python, and I would love to contribute money to this task.
It’s a chicken and egg situation. If Python weren’t so slow, there wouldn’t be such a heavy reliance on C extensions in the ecosystem, and consequently it would be easier to optimize CPython and PyPy because there would be less at stake with respect to breaking the ecosystem. Not only that, but lots of other things get better if C-extensions go away: much less need for executable project files which dramatically improves package manager performance (no need to recursively download a package and run setup.py to determine the next level of dependencies just to build the dependency tree), packaging gets dramatically easier (C projects have no standard reproducible build system so every C project including C extension projects must have its own package recipe and it’s often impractical to support more than a handful of target systems), etc.
HPy appears to be the right approach to fixing this, but there needs to be a concerted effort to migrate the ecosystem toward it and then deprecate the old, expansive interface. And then at some point in the distant future we can expect things to be as nice as other ecosystems are today.
This is absolutely the problem. In fast languages like C#, Java, Go, and even JS, almost nothing is native. It's normal to have even large projects with no FFI'd native code.
Python and Ruby are trapped in the C extensions spiral: the languages are slow, so everything uses C bindings, which in turn makes it hard to speed up the languages.
As for the Java and .NET platforms, it has been acknowledged that too many low-level capabilities were left out, and those features are being added, even if more slowly than many of us would like, so eventually even the stuff that might still require a native library today won't need one in the future.
Translation layer. There ought to be a thick, fat, slow layer between Fast Python and CPython.
You can write fast native code. You can write fast code by calling into native. The two rarely cross.
If we were clever, we could even have one front-end and multiple backends to go into fast Python, CPython, Cython, embedded micropython, etc. If we were double-plus-clever, we'd add GPGPU, multicore, and compute cluster to the mix.
We'd need someone with deep pockets and high risk tolerance to pull it off. It's a big change. Big changes often fail. Still, when I look at the amount of Python at Google... Well, it'd sure be a pity if all of that became legacy code when we all jumped ship to Julia.
I do believe this will be hard to change, though: Coming from academia my first Python experience was as a front-end to actively developed Fortran code. In that segment, the easy link to external code is one of the main selling points of Python.
To extend what you said a bit: PyPy is not only slow with extensions, it's also massively slower for running short-lived code that isn't repeated much, which is precisely the scenario of many if not most command line tools written in Python. Although I'm not sure there are enough people concerned about Python's performance in this scenario.
And yeah, I would be happy to pitch in too if CPython can be 50% faster reliably. It only takes 500 people each pitching in $100 to reach $50k.
Edit: somehow got $50k etched in mind when it says $500k. That’s more difficult from individual funding.
OTOH, $2M is a sneeze for large corporations. I totally understand why they want to push community funding, but I wonder if they're open to having corporate sponsors or something along those lines. The idea would be "yes, you pay the bill, but you don't get to push an agenda with this".
I hope that discussion thread continues, because so far there's some really juicy sass happening in there.
> > 1. I already have working code for the first stage.
> I don't mean to be negative, or hostile, but this sounds like you are saying "I have a patch for Python that will make it 1.5 times faster, but you will never see it unless you pay me!"
> I have this thing, e.g an iPhone, if you want it you must pay me.
> I think that speeding CPython 50% is worth a few hundred iPhones.
I wonder if he considered publishing it under a restrictive license that doesn't allow real world use. Then people could scrutinize his claims but there would still be an incentive to pay up for him to relicense it.
Why not release this super-dooper Python interpreter under a proprietary license? Just have people pay $1000 for it. If it works, then it’s a steal for people who use Python for business. “Well, you have to take me at my word and pay me first” is highly suspect.
>The losers are the people who made Python what it is today, for free.
Why are they "losers"? Did they sign any contract that something is owned to them? They did it for fun/personal reasons/ideology, and what they created is used by millions.
If they wanted to get paid for doing it, they very well could have - several core Python contributors have worked at large companies doing Python (including core Python) work, most famous of all Guido.
I've been observing something akin to Zawinski's Law ("Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.") but for programming languages:
Every dynamically-typed programming language attempts to expand until it has static types and compiles to machine code.
* Common Lisp has long had type annotations that can be used to generate specialized code.
* JS started out as a simple bytecode VM, then got a JIT from V8 and others, then TypeScript came and gave it static types.
* Python has had several JITs over the years—Unladen Swallow, PyPy, etc. It got static types with mypy (and others) and is now adding type annotations directly to the core language.
* Ruby's 3x3 plan involves adding a type-specializing JIT. Sorbet adds static types and I believe Matz wants the core language to go in that direction.
* Facebook created Hack to statically type PHP, and PHP core added scalar type declarations. Facebook's HipHop VM brings a JIT to PHP.
* LuaJIT brought a high performance JIT to Lua and there's been a number of projects that layer static type annotations onto the language.
* Dart started with an optional type system and moved to a fully sound static type system.
So what I see is a language that starts out simple and dynamically typed with a set of core libraries and idioms designed around dynamism. Then later people add static types on the front end to help people maintain larger programs. And they add type specializing JITs on the back end to generate faster code.
But right in the middle you're still stuck with a mountain of existing code designed around the assumption that code and data don't need to be statically shaped. So even though you end up doing all the work (and adding all the complexity) to design a static type system and native code generator, you don't get the full benefits.
The static type systems are almost always unsound in order to play nice with existing dynamic idioms, so the back end can't use the static types for optimization purposes. You end up with these fantastically complex type systems like TypeScript's and these incredibly complex JITs, but you still don't get the performance you get from a simple fully-statically typed language like Go or, hell, Pascal.
I think the reasonable take-away is that if you ever intend your language to be used for large programs, just take the hit and start off with static types. Your future self will thank you.
Not to beat this into the ground, but I realized I could have written a much shorter response.
This idea to "start with static types" presumes that there are no benefits to dynamically typed languages!
But the simplest argument is also the one I think that's the most true: dynamically typed languages let you explore the problem domain with faster feedback, leading to more successful software.
-----
Quote from John Ousterhout: The greatest performance improvement of all is when a system goes from not-working to working [1]
And we should never be cavalier about how hard that is. Most software projects fail; the successful ones are miracles!
I also take an expansive definition of "working" -- i.e. "useful to its users".
Also, TBH, types are kind of the wrong answer to the question. The question should be: how do we make the language efficiently expressive? For my money, that means defining constraints primarily on the code's main interfaces, where a given constraint may be a combination of traditional generics, dependent types, and/or declarative run-time checks on input/output values. (e.g. Eiffel comes to mind.)
For instance, in my kiwi language I can define an argument as:
list (whole number (0, 100), 4, 4)
Which is to say a 4-item list, where each item is an integer between 0 and 100. And since I don’t want to type all that every time I can give that constraint a descriptive name:
define type (CMYK color, list (whole number (0, 100), 4, 4))
I can even include user documentation in that definition so the whole lot’s self-describing.
Kiwi’s very late-bound and interpreted, so its constraints are implemented solely as run-time coercions with optional bounds checks; nothing fancy. But the semantics are quite well formalized so a linter could be implemented as an assistant authoring tool for users, and if you can implement a linter then you can implement a type checker; and so on.
What matters is that the language has a formal mechanism by which it can guarantee that a given value will always satisfy one or more user-defined requirements; and whether those requirements are checked at compilation, execution, or some combination is secondary to that. How rigorous the user makes these declarations, and if/where she makes them, is entirely up to her.
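To give a concrete feel for the idea, here's a minimal Python sketch (not kiwi, and not its actual semantics — every name here is made up for illustration) of a named, user-defined constraint implemented as a run-time coercion plus bounds check, roughly in the spirit of the CMYK example above:

    # Illustrative sketch only: a named constraint as a run-time coercion.
    def whole_number(lo, hi):
        def check(value):
            n = int(value)                   # coerce, as a late-bound language would
            if not (lo <= n <= hi):
                raise ValueError(f"{value!r} not in range {lo}..{hi}")
            return n
        return check

    def fixed_list(item_check, min_len, max_len):
        def check(value):
            items = list(value)
            if not (min_len <= len(items) <= max_len):
                raise ValueError(f"expected {min_len}-{max_len} items, got {len(items)}")
            return [item_check(v) for v in items]
        return check

    # roughly: define type (CMYK color, list (whole number (0, 100), 4, 4))
    cmyk_color = fixed_list(whole_number(0, 100), 4, 4)

    print(cmyk_color([0, 15, "100", 30]))    # coerces and validates: [0, 15, 100, 30]

Whether a checker evaluates such declarations ahead of time or the runtime enforces them on each call is exactly the "secondary" question described above.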
Remember, the goal of any language is to please its users. Not the machines it runs on, nor the designers who created it. And users’ needs are not constant, not even across the development cycle of a single program, so a language that cannot adapt to those users’ changing requirements as it goes has already failed its first hurdle.
..
Perhaps if the authors of existing languages like Python and C put less effort into chasing the constantly diminishing returns of post-hoc micro-optimizations, and more into thinking about how to design the next generation of languages so as to carry forward the good characteristics of their predecessors without their original already-painted-themselves-into-a-corner limitations, we might actually have languages that tick all the boxes by now.
Current implementation is proprietary so not on my GH, alas. (I am working on a new one, but it’s early days.) Some of its ideas carry over into iris though, which is. HTH
> I've been observing something akin to Zawinski's Law ("Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.") but for programming languages:
Even more applicable, in this case, would be Greenspun's tenth rule:
“Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”
I totally forgot that Lisp's List processing came from Fortran: a system called FLPL. This had the CAR and CDR naming (in an ugly form, with additional prefix and suffix characters: XCARF and XCDRF).
Statically typed languages are going the other way too: focusing on ergonomics for dealing with key-value data with heterogeneous types (like JSON objects), on generics/traits that don't need a specific base class and structural typing more generally, and on type inference to avoid having to specify types.
So I think we are more seeing an ongoing convergence on combined static/dynamic typing approaches.
Static languages are going all in on type inference. Types aren't annoying if they're hardly ever specified. It's largely a win-win. Even Java has decent type inference in newer versions.
I think the convergence here is toward 'automatically discoverable types'. For Python et al. this means type annotation (because it's extremely difficult to discover types otherwise).
For C++ et al this is things like the 'auto' keyword that allow skipping the type specification where the compiler can work it out.
The end-game is specifying just enough type information that it's still understandable to both humans and compilers.
Typescript is extremely good at this. Rust is pretty good. Even Java is okay now.
I firmly believe type inference is the future. Most objects in dynamic languages are statically typed anyway; the type just isn't exposed at compile time.
Type inference is good, but it’s not a full answer.
For instance, how often do you declare a variable as `int`, when what you really want to say is “an integer in the range 0...100”? A few languages (e.g. Eiffel) provide a formal mechanism for declaring these sorts of constraints, but most don’t, and you end up putting what should be declarative type-level information into the body of your code instead.
And then there’s “cutting-edge” stuff like dependent types, where you really want to express one argument’s type in terms of another argument’s, a classic example being an array indexing method, where you really want to declare the index at compile time as an integer in the range `0..<array.length`, and let the type system propagate that rule and its implications throughout the code that uses it. Whereas most “modern” languages chuck a run-time error if you’re lucky; or just ralph and dump stack like some antiquated 1970s throwback (yeah, looking at you, Apple’s Swift).
Huh. I generally enjoy Typescript, but have been frustrated by its inference at times, particularly with regard to generics -- there are places where it will do fine if you pass an arrow function directly, but if you pull it out into a const, the compiler will no longer be able to infer types (and thus fails to compile if you have noImplicitAny enabled).
> I think the reasonable take-away is that if you ever intend your language to be used for large programs, just take the hit and start off with static types. Your future self will thank you.
I think this is missing something: the creators of those languages wanted languages that were fun/productive for writing small programs. That is, extremely flexible languages (and yes maybe they were unaware back in the 90's how this would limit their optimization potential)
And all big programs were once small programs. Facebook is probably one of the easiest to see, since it was "just" a bunch of PHP scripts (although people tend to underestimate/dismiss it for that reason).
Just like nobody ever says: "Well I estimate that in 10 years Facebook will consist of 10M lines of PHP with 5,000 programmers working on it -- I should probably write it in another language".
Nobody ever says: "Well I think my language is going to be used by millions of people and will have billions of lines of code".
Well, there probably were people who thought that, but those were exactly the people who didn't make languages as useful as Python and JS :)
----
That said I think the phrase "irrational exuberance languages" referring to Python/JS is kinda funny, and in a way accurate ...
Although again I would say the unexpected part was not that they thought single core scaling would continue forever and make their languages fast, it's that those "slow" languages turned out to be the "best" ones for writing some of the most important systems of the last couple decades (not just commercial ones, but also Wikipedia, BitTorrent, etc.)
----
I'd make another analogy, to ISAs. If you talk to anyone who knows about CPU design, they'll say that x86 is shitty with a big pile of hacks.
"It would be better" if someone designed it from the beginning with current applications in mind. But if anyone actually did that back in 1980, they wouldn't have been successful.
And from my perspective, I mostly don't care, because the C compiler makes it all work for me (although I know the people who make it work care very much).
So I guess the point is that technology adoption proceeds by evolution, and trying to plan 10 or 20 years ahead of time never works.
----
Also, there is a pretty hard tradeoff between static types and metaprogramming. Recent languages have come closer to reconciling these features (Zig, Nim, D), but dynamic languages chose reflection/metaprogramming, and that's a primary reason why they became successful.
Ruby on Rails is a great example of that. It uses Ruby metaprogramming/reflection to the hilt, and lots of people who have no idea what that is love it, and they built tons of things with it (which now makes it an interesting optimization target).
Yeah, there's a longer essay you or I could write about path dependence here. The language that enables you to get to the point where your codebase is that big in the first place may not be the language best suited to that codebase once it gets there.
That's the main argument in favor of optional type systems like TypeScript. Start small in a dynamically typed language and then add the types later when you need them.
But my personal experience is that I've never found types to cause much friction when programming in the small. What I have found painful is not having GC and not having type inference. If I was doing a startup and needed to be able to prototype and change quickly, then C++ or Java would be pretty painful. But C# or any modern typed, managed language with a decent modern IDE? I would be surprised if you were any less productive than someone using Python or Ruby.
Yeah, "path dependence" pretty much sums up the comment. I should probably write something about that, because a huge factor in the design of https://oilshell.org. But I still get questions about it (like "Why can't we just get rid of shell and start over from scratch?")
A few points I would like to add on:
- I don't view the existence of TypeScript, MyPy, and Sorbet as evidence in favor of static typing. It's evidence in favor of gradual typing!
- The Oil project gave me a lot of experience with the relation between metaprogramming/reflection, dynamic types vs. static types, performance, and code length.
In particular the code moved from dynamic to static typing, and got a lot faster. But dynamic typing wasn't a mistake.
Short recap: Oil's code is 5-7x shorter than bash [1], and a lot of that is due to starting out as dynamically typed, with a lot of metaprogramming. (I still believe "size is code's worst enemy" -- big code is understood more poorly, which makes it harder to modify, regardless of static types.)
It is also something like 30x-50x slower than bash in Python! So, unusably slow. However the surprise is that I statically typed this code, and semi-automatically translated it to C++, and the result is now faster than bash. [2]
-----
So the high level, short code has enough semantic information to be fast (after you add explicit types).
The static typing process mainly involved expanding metaprogramming into textual code generation! I estimate that this was at least 9 months of rewriting.
That's what your arguments are missing IMO. If you're writing Java in Python, then sure Python is going to seem like it offers no advantages, and it might as well be statically typed.
But that's not how people write programs in dynamic languages (and honestly I thought you would appreciate that more, having written so much about dynamic languages!) The porting process taught me exactly how much dynamism I was using, and it was a lot! It was pulling a lot of weight.
I should show all the code generators and generated code in an essay... it's a very concrete demonstration.
So there is the fallacy that "type inference" solves the problem -- it's not that we're too lazy to write down types; it's that we're using techniques that static type systems can't handle. Good thread about that: https://twitter.com/sliminality/status/1317331149354463232
I'm not saying the Oil experience generalizes, since it's an unusual project, but it's definitely not as simple as "go with static types so you don't get trapped". That said, the conundrum you're talking about is very real.
-----
But despite writing all that, I'm actually leaning in your direction, and I started a statically typed language :)
I would start with an interpreter so it can have metaprogramming (e.g. like a constexpr interpreter, or what Zig does). I would like to add a gradual type system, but I don't really know how to write one, so the first cut will be a traditional static type system. (In this world, getting the 30-50x speedup relies on the program being 100% statically typed, yet gradual typing is still important IMO. It's very simple, no infinite treadmill of JIT work as you see in the "professional" projects.)
-----
This is probably something for an essay, but I would say dynamic types are demonstrably better than static types for at least 3 domains: UI, data science, and security/reverse engineering (and I have a bunch of experience to back this up). Basically anything that involves "learning about the world", or "schema discovery".
Really short summary: I used the MyPy front end, and basically printed its AST as C++. This involves a bunch of hacks, but it works.
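To make the flavor of that concrete, here's a toy sketch — emphatically not Oil's mycpp, which walks MyPy's *typed* AST and handles vastly more — that uses only the stdlib `ast` module to walk one trivially simple annotated function and print a C++-ish rendering; the source string and class name are just illustrative:

    # Toy illustration only (NOT mycpp): walk a Python AST, print C++-ish text.
    import ast

    src = """
    def add(x: int, y: int) -> int:
        return x + y
    """

    class ToyCpp(ast.NodeVisitor):
        def visit_FunctionDef(self, node):
            args = ", ".join(f"int {a.arg}" for a in node.args.args)
            print(f"int {node.name}({args}) {{")
            self.generic_visit(node)
            print("}")

        def visit_Return(self, node):
            # ast.unparse needs Python 3.9+
            print(f"  return {ast.unparse(node.value)};")

    ToyCpp().visit(ast.parse(src))
    # int add(int x, int y) {
    #   return x + y;
    # }

The real translator has to deal with classes, exceptions, garbage collection, and all the hacks mentioned above, but the "walk the tree, emit text" shape is the same.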
The following conditions helped a lot:
- I control all the code in Oil, so I can statically type all of it, which involves both annotations and occasional patches. A shell doesn't have many Python library dependencies; i.e. it doesn't depend on BeautifulSoup or something like that. It's basically all string and data structure manipulation, with a few sys calls.
- Oil has extensive tests. I run the same tests against the translated and compiled C++, which flushes out bugs in the translator. (Something like 915 out of 1700 tests pass now, so the translation process isn't done.)
- I also used unit tests to generate some of the type annotations with pyannotate.
There is still some translation left to do if anyone is interested in helping. You will probably learn something about both Python and C++!
>"It would be better" if someone designed it from the beginning with current applications in mind. But if anyone actually did that back in 1980, they wouldn't have been successful
Strongly disagree. Although x86 was dominant on desktop PCs with MS-DOS, the Nintendo Entertainment System used the MOS 6502, which is a lighter take on the Motorola 6800, while the Sega Mega Drive used a full-blown m68000. The Apple II desktop also used the MOS 6502, HP used PA-RISC for their HP-UX servers, and Sun replaced their 68000s with SPARC in the late 80s. The Alpha architecture was so successful there was a Windows NT port for it. So it wasn't really game over as of the 80s, and even in the 90s the market was still diverse. It's only by the 00s that x86, together with the Windows NT series, achieved total dominance, even in the small-server segment.
>So I guess the point is that technology adoption proceeds by evolution, and trying to plan 10 or 20 years ahead of time never works
Right conclusion - wrong reasoning. Quality of product is never the main driver of sales. That's why to succeed you need to market it first and then find some way to make it usable. That's where, for example, Motorola, Alpha, and MIPS failed, and that's where ARM won as an umbrella brand for what are actually 4 incompatible architectures (original ARM, AArch32, AArch64, and Thumb a.k.a. SuperH).
One important point being that, while the languages mentioned have expanded to include static typing and native compiling, they haven't stopped being what they were originally -- they have expanded, rather than transformed. In other words, the meaning of "dynamic languages" can be taken to mean "everything a language might need to be efficient and useful".
I think we could have 2 backends if possible: an interpreter for prototyping and a compiler for production. I usually don't want an interpreter in prod and don't want a compiler when prototyping. That's why TS is successful, IMO. You can just test stuff in emit-only mode while still having the pieces of code that don't make sense yet get highlighted.
> Then later people add static types on the front end to help people maintain larger programs.
More like, people whose hobby it is to create static front ends for dynamic languages will invariably invade the ecosystem of any sufficiently popular dynamic language.
Most of the projects listed there are made by large companies, not hobbyists. I know there are a lot of dynamic language fans out there, but it really does seem that there is business value in using static languages.
- There are companies which would be willing to pay large amounts of money (more than he's asking for) for the speedups he's promising
- There are other similar efforts underway (such as ours) which have implemented some of his ideas and the gains are much smaller than he anticipates (for real code, microbenchmarks are another story)
Atm this reads like a shopping list, _but_ if it goes ahead, and even if it doesn’t fully deliver, I think this could be the impulse that Python needs to remain relevant in scientific programming.
Clearly it is the leader right now by a long way, but Julia is growing really fast.
If you’re talking about Debug vs Release mode in Visual Studio, there’s a huge performance hit with STL iterators in Debug mode. Like sometimes 10-20x slower for tight loops. This is obviously C++ not C, so it doesn’t apply to Python directly. But in general toolchains can have very unoptimized code paths in debug mode.
This seems like the first step towards such improvements, but I guess recompiling libraries will be mandatory, and already existing wheels won't be valid for Python 3.10
Isn't this more scary than the Python 3 migration?
Dude, I had to recompile all my extensions for every minor version of Python as recently as ... 2013? Wheels haven't been around for that long. I'd get a new laptop and curse, "Why didn't I think to write down all those Fortran compile flags I used for SciPy!"
What makes you think that the implementation of dicts is a low hanging fruit? Only asking because there has been a lot of work to make them faster: https://www.youtube.com/watch?v=npw4s1QTmPg
Agreed on optimizing core objects. I recently wrote a C base class (https://jcristharif.com/quickle/#structs-and-enums) for defining dataclass-like types that's noticeably faster (~5-10x) to init/copy/serialize/compare than other options (dataclasses, pydantic, namedtuples...). For some applications I write, this has a non-negligible performance impact, without requiring deep interpreter changes. Using the base class is nice - my application objects are still defined in normal Python code, but all the heavy lifting is done in the C extension.
However, this speedup comes at the cost of being less dynamic. I'm not sure how much more optimized core python objects could be without sacrificing some of the dynamism some programs rely on. Python dicts are already pretty optimized as is.
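A rough, pure-Python illustration of that "less dynamic, faster" trade-off (this is not quickle — its base class is in C and faster still — and the class names below are made up; absolute numbers will vary by machine and interpreter):

    # Sketch: trading per-instance __dict__ dynamism for speed via __slots__.
    import timeit
    from dataclasses import dataclass

    @dataclass
    class PointDC:
        x: int
        y: int

    class PointSlots:
        __slots__ = ("x", "y")      # no per-instance __dict__: smaller, faster, less dynamic
        def __init__(self, x, y):
            self.x = x
            self.y = y

    print("dataclass:", timeit.timeit(lambda: PointDC(1, 2)))
    print("__slots__:", timeit.timeit(lambda: PointSlots(1, 2)))

    p = PointSlots(1, 2)
    # p.z = 3   # AttributeError: this is the dynamism you gave up for the speed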
>I recently wrote a C base class (https://jcristharif.com/quickle/#structs-and-enums) for defining dataclass-like-types that's noticeably faster (~5-10x) to init/copy/serialize/compare than other options (dataclasses, pydantic, namedtuples...)
YouTube also encountered the same problem. Their solution sounds kinda like "never use pickle, because it's slow. Use custom serialization".
>It's very hard to estimate how much speedup a JIT will get you on a dynamic language like python and x5 speedup seems unrealistic.
You can get a 2x or even 10x speedup for many use cases without a JIT, as PHP 7 proved. You just need to start with a slow, not very optimized implementation, which CPython pretty much is.
As for 5x, JavaScript got a much bigger speed bump than that from its JITs (compared to the interpreted JavaScript of the pre-JSCore, TraceMonkey, and V8 era, circa 1997-2005), and it's just as dynamic as Python...
>There are other lower hanging fruits, like optimizing core data structures (e.g: the implementation of python dicts).
Funny that you should mention it, because the author of the proposal has already done significant work (available since Python 3.3 or so) optimizing the dicts...
I'm not saying you're wrong, but last I checked (admittedly, years ago) there was at least 1 Python JIT with a lot of development and user hours behind it, with good results. Seems like that experience should yield some decent estimates on speedup.
Is there any comprehensive overview of why Python is slow in the first place? There seem to be opportunities for small optimizations, but is that the only place where the slowness is? A few years ago I tried to find out what investigations had been done on the speed of bytecode generation and there was basically nothing. No one could even suggest how long bytecode generation took, only that "it's really slow". Isn't the common wisdom to measure, then optimize, not the other way around?
It has nothing to do with it being interpreted (this is a commonly misidentified issue with dynamic langs).
The fundamental issue is that python is a pointer machine: everything requires a dynamic lookup in memory.
Eg.,
x = [1, 2, 3]
len(x)
Here `x` is an actual string in memory, used as a key into a locals() dictionary that holds the values (cf. C, where it is just a memory address).
Likewise the list is a list of pointers (not a sequential array of values). And it's heterogeneous, i.e., the contents can be of any type.
Likewise `len` is a string that has to be looked up in a dictionary of functions.
etc.
The whole thing is many levels of indirection. Applying an operation to a value (eg., even x + y) requires jumping around the memory of the machine many times.
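You can watch those lookups happen by disassembling the snippet above with the stdlib `dis` module (exact opcode names differ between CPython versions, so treat the commented output as indicative):

    import dis

    # At module scope both `x` and `len` are resolved by *string* name in
    # namespace dictionaries; the disassembly makes the lookups visible.
    dis.dis("x = [1, 2, 3]\nlen(x)")
    # Typical output includes STORE_NAME (x)  -- a store into a namespace dict --
    # and LOAD_NAME (len) / LOAD_NAME (x)     -- dict lookups by string key --
    # before the call instruction itself.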
This is necessary, in general, to deliver on the dynamic lang. features python provides.
Julia solves some of these issues by using static type information to ditch this dynamic behaviour. My suspicion is that Python can follow a similar path (e.g., above, x could be compiled to a static homogeneous array of ints).
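For a pure-Python taste of the "list of pointers vs. homogeneous array" difference, the stdlib `array` module already stores unboxed machine integers contiguously (NumPy takes the same idea much further); the sizes printed are only rough, machine-dependent figures:

    import sys
    from array import array

    boxed = [1, 2, 3] * 1000
    unboxed = array("i", boxed)        # homogeneous C ints, stored contiguously

    print(sys.getsizeof(boxed))        # list of pointers (the int objects are extra)
    print(sys.getsizeof(unboxed))      # roughly 4 bytes per element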
The locals dict is lazily created from the locals in the stack frame. No dict there unless you use it. The article I linked to before has a really good explanation of this.
len() only causes one dictionary lookup and then it's cached.
> This is necessary, in general, to deliver on the dynamic lang. features python provides.
It's the most obvious way to implement these features of dynamic languages but not at all necessary.
It also does (to my knowledge) no real optimisation on user code. I've also read that a lot of Python's constructs and language semantics make it difficult to implement more performance optimisations.
PyPy has been working on it for 10 years, though, and their average is 4x on their benchmarks. Sure, I think this means they do 10x or 20x on certain tasks, and that's impressive. But still, it's not easy to do 5x across the board.
Frankly speaking, I had this impression too. The reason the JVM and JS use their own bytecode for JIT compilation is that the main means of optimization are inlining and scalar replacement (converting complex objects into simple values). E.g., if you have a container that holds an integer and does nothing else, then when JIT-compiling you can skip the whole boxing/unboxing and manipulate a single integer value stored in a register. However, you cannot do that in any simple way if your integer container is a C extension black box — you cannot "inline" the container's functions and eliminate its stores and loads.
This is why PyPy reimplements the standard library in RPython — so it can be JIT-optimized. But it feels like Mark Shannon knows nothing about these efforts — which is kinda strange considering his position as a core CPython developer.
>There are other lower hanging fruits, like optimizing core data structures (e.g: the implementation of python dicts)
Unfortunately, you cannot easily implement efficient data containers without rewriting existing Python code. That code relies heavily on dictionary-based access to pretty much everything, and you cannot easily convert "string hash" access into "record offset" access, because you cannot know a priori what structure an object has, and converting a hash into an offset is basically the same dictionary lookup. For example:
a = A()
a.field = varname + 1
What can you optimize here? What is "varname"? What is A's structure? Is "A" a class or a function? Not only are you unable to tell the semantics of the code just by looking at it — you can't even tell the semantics after you've examined "A" and "varname" on some previous iteration, because somebody might have declared/modified those in an outer scope or directly modified "A" or "varname".
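A small sketch of what can legally change under a compiler's feet here — everything below is ordinary Python, which is why CPython keeps doing dict lookups instead of baking in offsets (the names are the same illustrative ones as above):

    # Why `a.field = varname + 1` can't be compiled to a fixed offset.
    class A:
        pass

    varname = 1
    a = A()
    a.field = varname + 1                    # lands in the instance __dict__ (value 2)

    varname = "1"                            # same name, now a str: `+ 1` would raise
    A.field = property(lambda self: 42)      # the *class* grows a descriptor later

    b = A()
    print(b.field)                           # 42 -- attribute lookup semantics changed
    print(a.__dict__)                        # {'field': 2} -- the old slot is still there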
Last year, in my spare time, I've been working on an unpublished library for Python multitasking with shared-memory structures (I'll probably make a blog post in a few weeks and link it here), and I also encountered the problem of the inherently inefficient implementation of Python's basic types. However, I've yet to find a solution that doesn't break compatibility with existing code. For example, if you look at ctypes, it has some very efficient containers, but using them in regular Python code is a pain, and the C-Python interface eats most of the performance benefits of the efficient containers.
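For context, this is the kind of ctypes container meant above — fixed layout, no per-element Python objects, but clunky to drive from ordinary Python code (a sketch; the type and field names are just examples):

    import ctypes

    IntArray4 = ctypes.c_int * 4              # array type: four C ints, contiguous
    values = IntArray4(10, 20, 30, 40)

    class Point(ctypes.Structure):
        _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

    p = Point(1, 2)
    print(values[2], p.x, p.y)                # 30 1 2
    # Every element access still crosses the ctypes/Python boundary, which is
    # where much of the performance benefit gets eaten.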
So what's really needed for optimizing Python is some kind of Python subset, like RPython but probably more human-friendly, so that efficient containers can really become efficient and an automatic optimizer can select or create those efficient containers on its own. Just like the V8 JS engine does, which stores objects in records with a static structure. That happens to work in JS for most cases. By contrast, in Python it does not work for most cases, which is why we have so much trouble optimizing Python.
I hope this goes forward. Not just for the speed, but the energy benefits.
One thing I love about C and golang is how fast they make hardware feel. They can do much more with less hardware. I love writing Python, but it does feel a bit heavy. If every machine using Python required half as much hardware/power that would be amazing.
Was just talking about that trip with my wife last night, in relation to COVID statistics. "Measuring things sucks! I'm still traumatized from spending a week trying to reliably measure speedups in Python."
While it's always an interesting topic, since I used Python for the first time 10 years ago this has been tried and tried again, only to arrive at the same realization again and again:
1) the Python language was never made for speed of execution, it was made for speed of programming. It is an awesome language for prototyping, teaching and scripting
2) If you want speed, it likely means you are doing math operations, for example ML. In that case, you'd better learn to use state-of-the-art math libraries, built on decades of hardware and mathematical expertise you will never beat.
It would serve you better to :
- learn about existing state-of-the-art compiled libraries and tools in your problem space
- learn to profile your Python code
Python is a great orchestrator to glue libraries and external systems together. If you are reinventing everything in Python, you are not solving actual problems. It's a matter of using the right tool for the right task.
Python is a means to achieve something greater; it's not an end in itself.
All the code speedups in the world won't help if the allocator remains slow and escape analysis doesn't yield a lot of stack allocation opportunities. A nice JIT (tracing or not) needs to be combined with a nice GC to generate the maximum benefit.
I have a method for turning integers into strings that is much faster than Python's built in method when used with very large integers. I wonder if there's a place for it in this project?
If your method is performant, then you should be able to simply file a pull request to improve what's there. I don't see a reason you'd need to tie that contribution to any existing effort.
Having never contributed to an open source project, particularly one as popular and complex as Python, I'm not sure of the amount of justification necessary. Would I need to write a PEP?
There needs to be a sensible way to measure progress, so a set of benchmarks needs to be chosen. Not sure all funders can agree on those. Also, I believe many large companies who are running code at truly large scale are already using low-level languages like C++ (Google).
I personally think it's a bit premature to talk [edit: for the developers to talk] about it - they're attributing 2.25x to the addition of a JIT, which generally takes a long time to develop and tweak, adds significant complexity, and makes the performance profile complex (i.e. slow start, slow first requests, etc.).
I do like, though, seeing the JIT wave across interpreted languages (see Ruby). I'm also curious to see what the implementation is going to be.
>I personally think it's a bit premature to talk about it
Has that stopped HN before? We have devoted several top posts to the V language, LightTable, and several clearly doomed-to-fail efforts. This, in comparison, is rather tame and from a legitimate source, and even if it just gives 2x performance, that will still be something.
I personally think tracing JIT is a bad technology. It's over complicated and makes it hard to reason about runtime space and time efficiencies of code.
I don't know the best route for python to get more performance but it probably involves allowing people to opt out of dynamism and basically doing what cython does but for all possible code.
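A hedged sketch of what that opt-out can look like today: ordinary annotated Python, which CPython still runs fully dynamically, but which ahead-of-time compilers such as mypyc or Cython (in pure-Python mode) can use to emit specialized code; the `dot` function is just an illustrative example, not anyone's API:

    # Plain annotated Python (3.9+ for list[float]); under CPython this still
    # goes through the generic object protocol, but AOT compilers can use the
    # annotations to specialize the loop.
    def dot(xs: list[float], ys: list[float]) -> float:
        total: float = 0.0
        for x, y in zip(xs, ys):
            total += x * y
        return total

    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 32.0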
>I personally think tracing JIT is a bad technology. It's over complicated and makes it hard to reason about runtime space and time efficiencies of code.
Haven't Strongtalk, Java, and JS put those concerns to rest?
Is there a precise difference between how OpenJDK works, where it needs a warmup to begin running performantly, and how a tracing JIT works?
Is there a precise difference between how Ignition and TurboFan interact that makes it not a tracing JIT? I really thought both the Java and JS implementations were tracing JITs. Because if I'm going to whinge about technology I don't like, I want to whinge about the precise problem. :)
The problem with this and all other efforts for speed in CPython is the interop with C extensions. For JITted Python look at PyPy, sure the JIT gives you some performance, but they broke all C extensions along the way.
Many C extensions work with PyPy now. I have a service that uses pylibmc and pycurl, and it works with PyPy. Those extensions didn't rewrite with cffi or anything like that, PyPy added a compatibility layer, it's impressive.
But, you don't get the JIT performance if you use the C extensions a lot, it's better for PyPy performance if you find a pure-python alternative for extensions used in "hot paths".
The TruffleRuby approach is to run the C extensions in an interpreter with a JIT. This way you can change the C extension interface without changing the code, and the optimisations apply to both at the same time.
Shouldn't a well-designed C extension interface work fine regardless of whether a JIT is used? Java's JNI interface, for instance, works fine with or without a JIT. Is it just that the interface was designed without JIT in mind?
where the code starts executing without JIT, while another thread is instantiated that is doing JITting in the background, and as soon as the JITting is done, JITted version takes over seamlessly for the rest of the execution.
My guess is that it will stay that way. That said, it is probably feasible to do some parts of it, similar to the 3x3 project Ruby has. I wouldn't be surprised if numpy glue is a lot more tractable for optimization than Rails, but the interaction with C libs would pose some serious issues.
Maybe because the repo is 2 hours old and provides no actual information other than some hand waving. Also, the author asks for $2 million and doesn't seem to be a Python core developer. Given the repo's age, I'm probably answering Mark Shannon himself.
Edit: This post wasn't supposed to be a hostile attack against anybody. I just found it odd to be asked why a 2-hour-old repo wasn't better known. I did not know that Mark Shannon is a core dev; I certainly did not pick that up from the repo's contents.
The repository linked mentions that each stage should cost about $500k and there are 4 stages. You can argue that he is not asking for it, but at least he is valuating it at $2M.
200k per year would not be the worst salary to pay a developer for this work, and I'd say he's outlined about 5 to 10 years of work. Feels like you could negotiate a package of escalating compensation contingent on delivery milestones.
Yeah, I dunno. The more I work with Python (CPython) the more I'm starting to like that it can be used as a scripting layer over "really complicated stuff".
I.e. if you want to store and mess about with some data, Python is fine. If you want to store and mess about with a lot of data, use pandas, numpy, etc.
https://mail.python.org/archives/list/python-dev@python.org/...