I always wondered how Python can be one of the world's most popular languages without any company stepping up and making the runtime as fast as modern JavaScript runtimes.
A big part of what made Python so successful was how easy it was to extend with C modules. It turns out to be very hard to JIT Python without breaking these, and most people don’t want a Python that doesn’t support C extension modules.
The JavaScript VMs often break their extension APIs for the sake of speed, but their users are more accustomed to this.
Which is why I'm shocked that Python's big "we're breaking backwards compatibility" release (Python 3) was mostly just for Unicode strings. It seems like the C API and the various __builtins__ introspection API thingies should've been the real focus of the compatibility break, so that Python would have a better future for improvements like this.
On the other hand, rewriting the C modules and adapting them to a different C API is very straightforward after you've done one or two such modules. Perhaps it's even something an LLM like Copilot could be trained to do.
That's breakage you'd have to tread carefully on; and given the 2to3 experience, there would have to be immediate reward to entice people to undertake the conversion. No one's interested in even minor code breakage for minor short-term gain.
> any company stepping up and making the runtime as fast as modern JavaScript runtimes
There are a lot of faster python runtimes out there. Both Google and Instagram/Meta have done a lot of work on this, mostly to solve internal problems they've been having with python performance. Microsoft has also done work on parallel python. There's PyPy and Pythran and no doubt several others. However none of these attempts have managed to be 100% compatible with the current CPython (and more importantly the CPython C API), so they haven't been considered as replacements.
JavaScript had the huge advantage that there was very little mission-critical legacy JavaScript code around they had to take into consideration, and no C libraries they had to stay compatible with, meaning that modern JavaScript runtime teams could more or less start from scratch. The JavaScript world at the time was also a lot more OK with different JavaScript runtimes not being 100% compatible with each other. If you 'just' want a faster Python runtime that supports most of Python and many existing libraries, and are OK with having to rewrite some of your existing Python code or third-party libraries to make it work on that runtime, then there are several to choose from.
JS also had the major advantage of being sandboxed by design, so they could work from there. Most of the technical legacy centered around syntax backwards compatibility, but it's all isolated - so much easier to optimize.
Python with its C API basically gives you the keys to the kingdom at the machine-code level. Modifying something that has an API to connect to essentially anything is not an easy proposition. Of course, it has the advantage that you can make Python faster through performance analysis and moving the expensive parts to optimized C code, if you have the resources.
Google/Instagram have done bits, but the company that's done the most serious work on Python performance is actually Oracle. GraalPython is a meaningfully faster JIT (430% faster vs 7% for this JITC!) and most importantly, it can utilize at least some CPython modules.
They test it against the top 500 modules on PyPI, and it's currently compatible with about half.
Node.js and Python 3 came out at around the same time. Python had their chance to tell all the "mission critical legacy code" that it was time to make hard changes.
As much as I would have loved to see some more 'extreme' improvements to python, given how the python community reacted to the relatively minor changes that python 3 brought, anything more extreme would very likely have caused a Perl 6 style situation and quite possibly have killed the language.
Part of the issue with 3 is that the changes were so minor that they were just annoying. Like 2/3 now equals 0.666... instead of 0, thanks for the hard-to-find bugs. `print "foo"` no longer works, because they felt like it. Improvements like str being Unicode made more sense but were quite disruptive and could've been avoided too; just add a new type.
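For concreteness, a quick sketch of the division change being complained about (standard Python 2 vs 3 semantics, nothing project-specific):

```python
# Python 2's `2 / 3` truncated to 0; Python 3 split division into two
# operators, so ported code that relied on truncation must switch to `//`.
print(2 / 3)    # true division, yields a float
print(2 // 3)   # floor division: the old Python 2 integer behavior

half = 7 // 2
assert half == 3  # not 3.5
```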
What I would've preferred is for them to leave all that stuff alone, add nice features like async/await that don't break existing things, and make important changes to the runtime and package manager. Python's packaging is so broken that it's almost mandatory to have a Dockerfile nowadays, while in JS that's not an issue.
Python is already fast where it matters: often, it is just used to integrate existing C/C++ libraries like numpy or pytorch. It is more an integration language than one where you write your heavy algorithms in.
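That integration-language point in miniature, using only the standard library (a toy sketch): the builtin sum() is implemented in C inside CPython, while an equivalent pure-Python loop dispatches every iteration through the bytecode interpreter.

```python
def py_sum(xs):
    # Pure-Python equivalent of the builtin: same answer, but each
    # iteration runs through the interpreter loop.
    total = 0
    for x in xs:
        total += x
    return total

data = list(range(1_000_000))
# Identical results; on CPython, sum() is typically several times
# faster because its loop runs in C.
assert sum(data) == py_sum(data) == 499_999_500_000
```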
For JS, during the time it received its JITs, there was no cross-platform native-code equivalent like wasm yet, but JS did have to compete with plugins written in C/C++. There was also competition between browser vendors, which gave the period the name "browser wars". Nowadays, at least, the speed improvements for the end user thanks to JIT aren't that great either; Apple even provides a mode to turn off JIT entirely for security.
Having recently implemented parallel image rendering in corrscope (https://github.com/corrscope/corrscope/pull/450), I can say that friends don't let friends write performance-critical code in Python. Depending on prebuilt C++ libraries hampers flexibility (eg. you can't customize the memory management or rasterization pipeline of matplotlib). Python's GIL inhibits parallelism within a process, and the workaround of multiprocessing and shared memory is awkward, has inconsistencies between platforms, and loses performance (you can't get matplotlib to render directly to an inter-process shared memory buffer, and the alternative of copying data from matplotlib's framebuffer to shared memory wastes CPU time).
Additionally a lot of the libraries/ecosystem around shared memory (https://docs.python.org/3/library/multiprocessing.shared_mem...) seems poorly conceived. If you pre-open shared memory in a ProcessPoolExecutor's initializer functions, you can't close them when the worker process exits (which might be fine, nobody knows!), but if you instead open and close a shared memory segment on every executor job, it measurably reduces performance, presumably from memory mapping overhead or TLB/page table thrashing.
> Python's GIL inhibits parallelism within a process, and the workaround of multiprocessing and shared memory is awkward, has inconsistencies between platforms, and loses performance
Well, IMHO the biggest problem with this approach to parallelism is that you're stepping out of the Python world with GC'ed objects etc. and into a world of ctypes and serialization. It's like you're not even programming Python anymore, but something closer to C with the speed of an interpreted language.
ProcessPoolExecutor doesn't let you supply a callback to run on worker process exit, only startup. Perhaps I could've looked for and tried something like atexit (https://docs.python.org/3/library/atexit.html)? In any case I don't want to touch my code at the moment until I regain interest or hear of resource exhaustion, since "it works".
Fair enough. If you ever do decide to touch it to address that, I suggest (in preference order):
- Subclassing ProcessPoolExecutor such that it spawns multiprocessing.Process objects whose runloop function wraps the stdlib "_process_worker" function in a try/finally which runs your at-shutdown logic. That'll be as reliable as any try/finally (e.g. SIGKILL and certain interpreter faults can bypass it).
- Writing custom destructors of objects in your call arguments which are aware of and can do appropriate cleanup actions for associated SharedMemory objects. This is less preferred than subclassing because of the usual issues with custom destructors: no real exception handling, and objects sneaking out into long-lived/global caches can cause destructors to run late (after the interpreter has torn down things your cleanup logic needs) or not at all.
- Atexit, as you suggest. This is least-preferred because the execution context of atexit code is, to say the least, weird. Much like a signal handler or pthread_atfork callback, it's not a place where I'd put code that does complicated I/O or depends on the rest of the interpreter being in ordinary condition.
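A hedged sketch of option 2, using weakref.finalize rather than a raw __del__ (a bit more predictable, though the same caveats about late or skipped execution at interpreter teardown still apply); all names here are illustrative:

```python
import weakref
from multiprocessing import shared_memory

def _cleanup(shm):
    # Detach and destroy the segment.
    shm.close()
    shm.unlink()

class ShmHandle:
    """Owns a SharedMemory segment and ties its cleanup to this object."""
    def __init__(self, size):
        self.shm = shared_memory.SharedMemory(create=True, size=size)
        # Fires when the handle is garbage-collected, or at interpreter
        # exit, whichever comes first.
        weakref.finalize(self, _cleanup, self.shm)

h = ShmHandle(64)
name = h.shm.name
del h  # CPython refcounting runs the finalizer here

# Re-attaching by name should now fail, showing the segment is gone.
try:
    shared_memory.SharedMemory(name=name)
    cleaned_up = False
except FileNotFoundError:
    cleaned_up = True
assert cleaned_up
```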
JavaScript has to be fast because its users were traditionally captive on the platform (it was the only language in the browser).
Python's users can always swap out performance critical components to another language. So Python development delivered more when it focussed on improving strengths rather than mitigating weaknesses.
In a way, Python being slow is just a sign of a healthy platform ecosystem allowing comparative advantages to shine.
I think the thing with python is that it's always been "fast enough" and if not you can always reach out to natively implemented modules. On the flipside javascript was the main language embedded in web browsers.
There has been a lot of competition to make browsers fast.
Nowadays there are 3 main JS engines, V8 backed by google, JavaScriptCore backed by apple, and spidermonkey backed by mozilla.
If python had been the language embedded into web browsers, then maybe we would see 3 competing python engines with crazy performance.
The alternative interpreters for Python have always been a bit more niche than CPython, but now that Guido works at Microsoft there has been a bit more of a push to make it faster.
Because it's already fast enough for most of us? Anecdote, but I've had my share of slow things in JavaScript that are not slow in Python. Try to generate a SHA256 checksum for a big file in the browser...
Have you tried to generate a SHA256 checksum for a file in the browser, no matter what crypto lib or API is available to you?
Have you tried to generate it using the Python standard lib?
I did, and doing it in the browser was so bad that it was unusable. I suspect that it's not the crypto that's slow but the file reading. But anyway...
> SHA256 in pure Python would be unusably slow
No one would do that, because:
> Python's SHA256 is written in C
Hence why comparing "pure Python" to "pure JavaScript" is mostly irrelevant for most day-to-day tasks, like most benchmarks.
> Javascript is fast. Browsers are fast.
Well, no they were not for my use case. Browsers are really slow at generating file checksums.
I thought that perhaps the difference could be due to the JavaScript version having to first read the entire file before it starts hashing, whereas the Python version does it incrementally (which the browser API doesn't support [0]). But changing the Python version to work like the JavaScript version doesn't make a big difference: 30 vs 35 ms (with a ~50 MB file) on my machine.
The slowest part in the JavaScript version seems to be reading the file, accounting for 70–80% of the runtime in both Firefox and Chromium.
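For reference, the incremental Python side of that comparison is essentially the standard hashlib pattern (a sketch; the chunk size is arbitrary):

```python
import hashlib
import os
import tempfile

def sha256_file(path, chunk_size=1 << 20):
    # Hash the file in chunks so it never has to fit in memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Tiny self-check against the well-known SHA-256 test vector for b"abc".
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"abc")
    path = f.name
digest = sha256_file(path)
os.unlink(path)
assert digest == "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
```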
Maybe 8 years is not much in a career? Maybe we had to support one of those browsers that didn't support it? Maybe your snarky comment is out of place? And even to this day it's still significantly slower than the Python stdlib according to the tester. So much for "why python not as fast as js, python is slow, blah blah blah".
The Python standard lib calls out to hand-optimized assembly versions of the crypto algorithms. It is of no relevance to a JIT-vs-interpreted debate.
It absolutely is relevant to the "python is slow reee" nonsense tho, which is the subject. Python-the-language being slow is not relevant for a lot of the users, because even if they don't know they use Python mostly as a convenient interface to huge piles of native code which does the actual work.
And as noted upthread that's a significant part of the uptake of Python in scientific fields, and why pypy despite the heroic work that's gone into it is often a non-entity.
This is a major problem in scientific fields. Currently there are sort of "two tiers" of scientific programmers: ones who write the fast binary libraries and ones that use these from Python (until they encounter e.g. having to loop and they are SOL).
This is known as the two language problem. It arises from Python being slow to run and compiled languages being bad to write. Julia tries to solve this (but fails due to implementation details). Numba etc try to hack around it.
PyPy is sadly vaporware. The failure from the beginning was not supporting the most popular (scientific) Python libraries. Nowadays it kind of does, but it's brittle and often hard to set up. And anyway, PyPy is not very fast compared to e.g. V8 or SpiderMonkey.
The major problem in scientific fields is not this, but the amount of incompetence and the race-to-the-bottom environment which enables it. Grant organizations don't demand rigor and efficiency, they demand shiny papers. And that's what we get. With god awful code and very questionable scientific value.
There are such issues, but I don't think they are a very direct cause of the two language problem.
And even these issues are part of the greater problem of late-stage capitalism, which in general produces god-awful stuff with questionable value. E.g. the vast majority of industry code is such.
fyi: the author of that post is a current Julia user and intended the post as counterpoint to their normally enthusiastic endorsements. so while it is a good intro to some of the shortfalls of the language, I'm not sure the author would agree that Julia has "failed" due to these details
Yes, but it's a good list of the major problems, and laudable for a self-professed "stan" to be upfront about them.
It's my assessment that the problems listed there are a reason why Julia will not take off and we're largely stuck with Python for the foreseeable future.
It is worth noting that the first of the reasons presented is significantly improved in Julia 1.9 and 1.10 (released ~8 months and ~1 month ago). The time for `using BioSequences, FASTX` on 1.10 is down to 0.14 seconds on my computer (from 0.62 seconds on 1.8 when the blog post was published).
There is pleeeenty of mission critical stuff written in Python, for which interpreter speed is a primary concern. This has been true for decades. Maybe not in your industry, but there are other Python users.
The point of Python is quickly integrating a very wide range of fast libraries written in other languages though, you can't ignore that performance just because it's not written in Python.
In lots of applications, all the computations already happen inside native libraries, e.g. Numpy, PyTorch, TensorFlow, JAX etc.
And if you have a complicated computation graph, there are already JITs at this level, based on Python code; e.g. see torch.compile, or TF XLA (done by default via tf.function), JAX, etc.
It's also important to do JIT on this level, to really be able to fuse CUDA ops, etc. A generic Python JIT probably cannot really do this, as this is CUDA specific, or TPU specific, etc.
Because the reason Python is one of the world's most popular languages (a large set of scientific-computing C extensions) is bound to every implementation detail of the interpreter itself.
There have been several attempts. For example, Google tried to introduce a JIT with a project named Unladen Swallow around 2009, but it ended up abandoned.
Unladen Swallow was massively over-hyped. It was talked about as though Google had a large team writing “V8 for Python”, but IIRC it was really just an internship project.
You might want to check out Mojo, which is not a runtime but a different language, though one designed to be a superset of Python. Beware that it's not yet open source; that's slated for this Q1.
1. Javascript is a less dynamic language than Python and numbers are all float64 which makes it a lot easier to make fast.
2. If you want to run fast code on the web you only have one option: make Javascript faster. (Ok we have WASM now but that didn't exist at the time of the Javascript Speed wars.) If you want to run fast code on your desktop you have a MUCH easier option: don't use Python.
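Point 1 is visible directly in the semantics: every JS number is an IEEE-754 float64, while Python ints are arbitrary precision, so a Python JIT can't simply assume machine-width numbers. A quick sketch:

```python
big = 2 ** 53

# In float64 (what every JS number is), 2**53 + 1 is not representable,
# so the addition is silently absorbed:
assert float(big) + 1.0 == float(big)

# Python ints stay exact at any size, which means the runtime can't
# just compile arithmetic down to fixed-width machine integers.
assert big + 1 != big
assert (2 ** 100).bit_length() == 101
```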
You could probably optimistically optimise some code, assuming it doesn't use any of the dynamic features of Python. You're going to get crazy performance cliffs though.
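A toy sketch of such a performance cliff (all names made up): any code can rewrite a class at runtime, invalidating whatever a speculative optimizer assumed about existing instances.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
assert p.norm() == 5.0  # a JIT might happily specialize this call site...

# ...but any module, at any time, can swap the method out:
Point.norm = lambda self: abs(self.x) + abs(self.y)

# Every existing instance sees the change immediately, so the
# specialized code must be thrown away (a "deoptimization").
assert p.norm() == 7
```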
New runtimes like NodeJS have expanded JS beyond web, and JS's syntax has improved the past several years. But before that happened, Python on its own was way easier for non-web scripts, web servers, and math/science/ML/etc. Optimized native libs and ecosystems for those things got built a lot earlier around Python, in some cases before NodeJS even existed.
Python's syntax is still nicer for mathy stuff, to the point where I'd go into job coding interviews using Python despite having used more JS lately. And I'm comparing to JS because it's the closest thing, while others like Java are/were far more cumbersome for these uses.
I think python is very well suited to people who do computation in Excel spreadsheets. For actual CS students, I'd rather see something like scheme be a first language (but maybe I'm just an old person)
They do both Python and Scheme in the same Berkeley intro to CS class. But I think the point of Scheme is more to expand students' thinking with a very different language. The CS fundamentals are still covered more in the Python part of the course.
Always interested in replies to this kind of comment, which basically boil down to "Python is so slow that we have to write any important code in C. And this is somehow a good thing."
I mean, it's great that you can write some of your code in C. But wouldn't it be great if you could just write your libraries in Python and have them still be really fast?
When I was a scientist, speed was getting the code written during my break, and if it took all afternoon to run, that was fine because I was in the lab anyway.
Even as I moved more in the software-engineer direction and started profiling code more, most of the bottlenecks came from things like "creating objects on every invocation rather than pooling them", "blocking IO", "using a bad algorithm", or "using the wrong data structure for the task". These problems exist in every language, and though a bad algorithm or the wrong data structure might matter less in a faster language, you're still leaving performance on the table.
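The "wrong data structure" case in miniature (a toy sketch, not from any real codebase): membership tests against a list scan linearly, while a set hashes, so identical logic scales very differently.

```python
needles = list(range(0, 10_000, 7))
haystack_list = list(range(10_000))
haystack_set = set(haystack_list)   # one-time O(n) conversion

# O(len(needles) * len(haystack)): each `in` scans the list.
hits_list = sum(1 for n in needles if n in haystack_list)

# O(len(needles)) on average: each `in` is a hash lookup.
hits_set = sum(1 for n in needles if n in haystack_set)

assert hits_list == hits_set == len(needles)
```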
> "Python is so slow that we have to write any important code in C. And this is somehow a good thing."
The good thing is that python has a very vibrant ecosystem filled with great libraries, so we don't have to write it in C, because somebody else has. We can just benefit from that when the situation calls for it
>I mean, it's great that you can write some of your code in C. But wouldn't it be great if you could just write your libraries in Python and have them still be really fast?
That really depends.
To make the issue clear, let's think about a similar situation:
bash is nice because you can plug together inputs and outputs of different sub-executables (like grep, sed and so on) and have a big "functional" pipeline deliver the final result.
Your idea would be "wouldn't it be great if you could just write your libraries in bash and have them still be really fast?". Not if it means turning bash into C, tanking productivity. And definitely not if that new bash can't run the old grep anymore (which is usually what's implied by the proposal in the case of Python).
Also, I'm fine with not writing my search engines, databases and matrix multiplication algorithm implementations in bash, really. So are most other people, I suspect.
Also, many proposals would weaken Python-the-language so it's not as expressive anymore. But I want it to stay as dynamic as it is. It's nice as a scripting language about 30 levels above bash.
As always, there are tradeoffs. Also with this proposal there will be tradeoffs. Are the tradeoffs worth it or not?
For the record, rewriting BLAS in Python (or anything else), even if the result was faster (!), would be a phenomenally bad idea. It would just introduce bugs, waste everyone's time, essentially be a fork of BLAS. There's no upside I can see that justifies it.
> But wouldn't it be great if you could just write your libraries in Python
Everybody obviously wants that. The question is are you willing to lose what you have in order to hopefully, eventually, get there. If Python 3 development stopped and Python 4 came out tomorrow and was 5x faster than python 3 and a promise of being 50-100x faster in the future, but you have to rewrite all the libraries that use the C API, it would probably be DOA and kill python. People who want a faster 'almost python' already have several options to choose from, none of which are popular. Or they use Julia.
The reason this approach is so much slower than some of the other 'fast' pythons out there that have come before is that they are making sure you don't have to rewrite a bunch of existing libraries.
That is the problem with all the fast python implementations that have come before. Yes, they're faster than 'normal' python in many benchmarks, but they don't support the entire current ecosystem. For example Instagram's python implementation is blazing fast for doing exactly what Instagram is using python for, but is probably completely useless for what I'm using python for.
Yes, but not so good when the JIT-ed Python can no longer reference the fast C code others have written. Every Python JIT project so far has suffered from incompatibility with some C-based Python extension, and users just go back to the slow interpreter in those cases.
> basically boil down to "Python is so slow that we have to write any important code in C. And this is somehow a good thing."
I think that's a pretty ignorant interpretation. Python has been built to have a giant ecosystem of useful, feature-complete, stable, well built code that has been used for decades and for which there is no need to reinvent the wheel. If that already describes the universe of libraries that you /need/ to be extremely fast and the rest of your code is IO limited and not CPU limited, why reinvent the wheel?
That makes your comment even more inaccurate because you likely don't need to write any "important" (which you are stretching to mean "fast") code in C -- you utilize existing off the shelf fast libraries that are written in Fortran, CUDA, C, Rust or any other language a pre-existing ecosystem was built in.
Try to think of a language that has mature capabilities for domains as far apart as what Django solves for, what pandas solves for, what pytorch solves for, and that still has fantastic tooling like Jupyter and Streamlit. I can't think of any other language with the combined off-the-shelf breadth and depth of Python. I don't want to have to write fast code in any language unless forced to, because the vast majority of the time I can customize a great off-the-shelf package and only write the remaining 1% of glue. I can't see why a professional engineer would need to take a remotely different approach 99% of the time.
Languages don't need to all be good at the same thing. Python currently excels as a glue language you use to write drivers for modules written in lower-level languages, which is a niche that (AFAIK) nobody else fills right now.
While I’m all for making Python itself faster, it would be a shame to lose the glue language par excellence.