RustPython – A Python-3 (CPython >= 3.11.0) Interpreter written in Rust (github.com/rustpython)
229 points by yla92 on March 7, 2023 | 136 comments



One of the awesome things about Rust is how easy it is to build Rust projects. Looking at the readme, it's a simple `cargo run` to compile and run the project. Compare that with CPython, where you have to worry about Makefiles, configuration, and third-party libraries on your system.


For a lot of Rust projects this is true.

But if they interface with external libraries then some extra steps may be needed.

Not to take away from the excitement. I love Rust and cargo is great.


Extra steps, i.e. installing pkg-config and the library itself on your system.


It’s gonna depend on what kind of system you use and what kind of external things are needed.

It can be more involved, it can be less involved.

For example https://github.com/servo/servo seems a bit involved to set up for development but at the same time not too bad either.


That's what Nix is for ;)


I kind of hate Rust, but I do think cargo looks great!


That's the biggest killer feature for me vs. C/C++. Any time I write C++, I avoid libraries like the plague unless they do a ton of stuff because it's such a hassle.


Does Cargo support building C or C++ dependencies? I see a lot of projects with a mix of Rust and C/C++ and I’d assumed that just meant I’d have to deal with 2 build systems.

I looked at Nix to help with multi-language builds (and cross compiling), but only had limited success.


It does support calling into other compilers and toolchains through build scripts and such. Take cc-rs[0] for example: this allows building C and C++ files natively without even calling an executable yourself.

In practice, I'd expect libraries to just call make/cmake/ninja for you, or (like openssl-sys) ask you to install the necessary libraries using your favourite package manager.

[0]: https://github.com/rust-lang/cc-rs


That’s pretty much what I was looking for, thanks!


Yes, through a `build.rs` file and crates such as `pkg-config`. Quite a few projects depend on C/C++ sources and are merely Rust bindings. However, the majority of crates don't require external dependencies.


It could technically do it with a build.rs script, but it typically expects to find the .so/.a on your system.


Yes, Rust has user-friendly tooling that's far superior to pip, Poetry, Pipenv, venv, etc.

Since RustPython can theoretically access Tokio, does that let it avoid CPython's GIL issues?


RustPython still uses the GIL. The GIL is not a consequence of technical limitations in the language used for the interpreter, meaning you could also remove the GIL in CPython. But it’s a design decision that makes many things in the interpreter and the code simpler.


RustPython doesn't use a GIL at all. RustPython uses an individual lock for each data type, so RustPython has more multithreading bugs.


You mean more multithreaded bugs are possible in the Python code that it's running, right? Not that RustPython itself has more multithreading bugs?


RustPython itself has more bugs, e.g. `dict()` doesn't work well in some multithreaded scenarios yet.


Why does RustPython need to use a GIL?

Is it an artificial artifact, or is it necessary to make RustPython work?


pip/Poetry/venv are all for Python code, whereas the OP is talking about building the CPython interpreter itself, which is written in C. Hence the Makefiles etc.


CPython isn't particularly hard to compile. In my experience it's pretty much just git clone, ./configure, and make.


At first I thought this would be about performance, but now I think it's not.

Their github.io page mentions interesting use cases though:

> RustPython can be embedded into Rust programs to use Python as a scripting language for your application, or it can be compiled to WebAssembly in order to run Python in the browser


Yeah the performance comparison doesn't look too great yet: https://rustpython.github.io/benchmarks

Currently it's more about clarity and a nice clean code base, but it can only get faster from here


This seems like a very optimistic take. For some reason, it made me think about Spectre and Heartbleed and how my computer just kept getting slower a couple of years back.


Any idea where the slowness is coming from?


RustPython hasn't adopted those optimization designs yet. One of the current major goals is this optimization: https://github.com/RustPython/RustPython/issues/3244

RustPython can't even run `1+1` without calling `int.__add__` yet; that is being worked on in https://github.com/RustPython/RustPython/pull/4615

So it still has a long way to go, not because of C vs. Rust but at the design level.
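To make the "fast path" idea concrete, here is a toy stack VM in Python. Everything in it (the `run` function, the `LOAD_CONST`/`ADD` opcode names) is hypothetical and is not how RustPython or CPython are actually structured: `ADD` handles two exact ints directly and only falls back to generic `__add__`/`__radd__` dispatch otherwise. The fallback is simplified (real Python also gives a subclass's reflected method priority), and in a Python sketch the "fast" branch still uses `int` addition underneath; in a real interpreter it would be a direct machine-level add.

```python
def run(code, consts):
    """Execute a tiny stack program; return (result, number of slow-path dispatches)."""
    stack = []
    slow_dispatches = 0
    for op, arg in code:
        if op == "LOAD_CONST":
            stack.append(consts[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            if type(left) is int and type(right) is int:
                # Fast path: both operands are exact ints, so skip method lookup.
                stack.append(left + right)
                continue
            # Slow path: generic protocol dispatch via __add__, then __radd__.
            slow_dispatches += 1
            result = NotImplemented
            add = getattr(type(left), "__add__", None)
            if add is not None:
                result = add(left, right)
            if result is NotImplemented:
                radd = getattr(type(right), "__radd__", None)
                if radd is not None:
                    result = radd(right, left)
            if result is NotImplemented:
                raise TypeError(
                    f"unsupported operand types for +: "
                    f"{type(left).__name__!r} and {type(right).__name__!r}"
                )
            stack.append(result)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop(), slow_dispatches


program = [("LOAD_CONST", 0), ("LOAD_CONST", 1), ("ADD", None)]
print(run(program, [1, 1]))    # (2, 0): fast path only
print(run(program, [1, 2.5]))  # (3.5, 1): int.__add__ returns NotImplemented, float.__radd__ runs
```

The point of the split is that the common case (small ints) never touches attribute lookup at all, which is the kind of design-level optimization the comment above refers to.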


I've been playing with making a Rust interpreter and reading other people's work. Here's an interesting blog post (not mine) on making an interpreter in Rust match the speed of a C implementation. It seems folks often have to resort to unsafe Rust to get the performance they want. I don't claim to fully understand that and am looking for more information myself.

http://www.dannyvankooten.com/blog/2022/rewriting-interprete...


With webservers we already saw that the way to make new things fast in Rust is to use lots of unsafe Rust when needed, optimize the code, and after that think about the right abstractions that minimize the unsafe code surface.


Well there's no production class garbage collector in Rust yet, so in RustPython they're just using Rc's I think. That's one place to start. The other will probably be the quality of the JIT.


Does Python have a GC these days?


It uses reference counting in the common case, plus a separate cycle-detecting GC for reference cycles, which refcounting alone can't free.
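To illustrate the split, a minimal sketch using CPython's standard `gc` and `weakref` modules (the `Node` class is made up): an acyclic object is freed as soon as its last reference goes away, while a two-object cycle survives until the cycle detector runs. The immediate-free behaviour is a CPython implementation detail; PyPy, for example, doesn't use refcounting.

```python
import gc
import weakref


class Node:
    """A plain object that may hold a reference to another Node."""

    def __init__(self):
        self.other = None


# Acyclic garbage: the last reference disappears on `del`, so CPython frees it at once.
a = Node()
a_ref = weakref.ref(a)
del a
freed_by_refcount = a_ref() is None

# A two-object cycle: each Node keeps the other's reference count above zero.
was_enabled = gc.isenabled()
gc.disable()  # pause automatic collection so the demonstration is deterministic
try:
    x, y = Node(), Node()
    x.other, y.other = y, x
    x_ref = weakref.ref(x)
    del x, y
    survived_refcounting = x_ref() is not None  # unreachable, but refcounts are still 1
    gc.collect()  # the cycle detector finds the unreachable pair and frees it
    freed_by_cycle_gc = x_ref() is None
finally:
    if was_enabled:
        gc.enable()

print(freed_by_refcount, survived_refcounting, freed_by_cycle_gc)  # True True True (CPython)
```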


Ah, I forgot the cycle-detector.

Thanks!


This is what Rust game engines need.

Some sort of sensible scripting language to do the game logic in.


Lua has been used a lot for that kind of embedding in the past. I think because it's easy to move data between Lua and C? Python would certainly be a big step up but I don't know how hard that kind of integration is.


Lua is a great language for game scripting. Perhaps that’s a better option.


Why is it such a great language? To me, as a developer entirely divorced from the gaming industry aside from occasionally dabbling in modding, Lua has always seemed completely inscrutable when I have encountered it.


Its memory layout is obscenely easy and consistent. That makes embedding it into another application straightforward and easy to manage, and because of this, extending the host interface to the embedded scripts is very simple. It's also serializable to a degree.

Python is also a decent glue language but creating a Python interpreter is a bunch more work and the API is a bit harder to work with.


They have a clean API for embedding. All Lua interpreter state is kept in a struct that you alloc. If you alloc two distinct structs, you have two completely independent Lua vms and you can run them on independent threads if you want to no problem, and so on. (Few language runtimes are clean in this way.)


yes, python is basically impossible to embed in a multithreaded app.


Inscrutable?

The language is _tiny_ (and kept that way on purpose)

Aside from stuff like metatables (which may require you to play around for a bit to understand their value), you can pick it up in a couple of hours. I'm not even kidding. So much so that you see people modifying code without even looking at the documentation.

https://www.lua.org/manual/5.4/

https://learnxinyminutes.com/docs/lua/


I think the main attractions are that it has pretty minimal syntax + core libraries, does a good job binding to C FFIs, and the interpreter is fast.


LuaJIT is insanely good


The C API is pretty straightforward and passes values through a stack, which avoids a lot of the reference counting you would need with Python's C API.

It's also a tiny codebase which is easy to vendor and ship.


It’s a tiny but capable language that’s easy to embed, and can be surprisingly fast with LuaJIT.


Sorry, Lua is simply not a great language.

The first problem is "Which Lua?" Lua, by itself, is almost completely useless. You first need to compile/install a bunch of things in order to make Lua useful. This has the Perl problem that everybody uses a different dialect of Lua.

The second part is that the Lua constructs for programming in the large are very weak. I believe that Adobe had a postmortem about this (Lightroom, I think?)

And then there is the language, itself. For me, the 1-based indexing is death in this day and age. Sorry. Zero-based is the dominant ecosystem and not fitting into that is simply not acceptable.


> You first need to compile/install a bunch of things in order to make Lua useful.

How is this any different from pretty much any language out there? From a quick eyeball of RustPython's Cargo.toml[1], there are about 70 different dependencies which all need to be compiled. I haven't worked too much with Autoconf, but I am pretty sure CPython has quite a few dependencies.

[1]: https://github.com/RustPython/RustPython/blob/main/Cargo.tom...

> The second part is that the Lua constructs for programming in the large are very weak.

This is deliberate because it forces you to use the tools you are given instead of reimplementing features that can already be implemented by using other primitives in Lua.

> I believe that Adobe had a postmortem about this (Lightroom, I think?)

I can't find this article. Has anyone else had any luck?


Lua is most hated lang


It seems reasonable in my ~300 line neovim config :D

and yes; i know that's a ridiculous example that's not really comparable for anything significant.


> Lua is most hated lang

Impossible. Most developers outside the games industry don't even know it exists.

... and if they knew about it, they would use it much more.


do you work in games


Civ4 used Python scripting, and it was a horrendous performance bottleneck :/


The Civ4 core game DLL was its own bottleneck. It's terribly inefficient C++. Years ago, I was working on my own mod to optimize it because I got tired of waiting 15 minutes for the AI to complete a turn on a huge map late in the game. It was not hard to get it down to just a few minutes. I think it could be used as a poster child for a codebase written by someone expecting the optimizer to work miracles: lots of macros hiding expensive call chains used in tight nested loops, repeated calls inside loops to fetch the same item that should have been fetched once outside the loop(s), etc.

Python was not the bottleneck, just the scapegoat for badly written code.
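A minimal Python sketch of the loop-invariant pattern described above. Civ4's real code is C++; the `GameState.current_turn()` accessor here is a hypothetical stand-in for a macro-hidden call chain. The first version re-fetches the same value in the innermost loop, the second hoists it out, and a counter shows how many lookups each performs.

```python
class GameState:
    """Hypothetical stand-in for an expensive accessor hidden behind a macro."""

    def __init__(self, turn):
        self.turn = turn
        self.lookups = 0  # counts how often the "expensive" accessor runs

    def current_turn(self):
        self.lookups += 1
        return self.turn


def score_repeated(state, units, passes=3):
    total = 0
    for unit in units:
        for _ in range(passes):
            # The same value is fetched again on every inner iteration.
            total += unit * state.current_turn()
    return total


def score_hoisted(state, units, passes=3):
    turn = state.current_turn()  # loop-invariant: fetch once, outside both loops
    total = 0
    for unit in units:
        for _ in range(passes):
            total += unit * turn
    return total


slow, fast = GameState(300), GameState(300)
print(score_repeated(slow, [1, 2, 3]), slow.lookups)  # 5400 9
print(score_hoisted(fast, [1, 2, 3]), fast.lookups)   # 5400 1
```

An optimizing compiler can only do this hoist itself if it can prove the call has no side effects, which expensive call chains hidden behind macros often make impossible.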


Yes, but (AFAIK) Python itself also became a bottleneck in the Civ4 gigamods, like the (still active) Caveman2Cosmos.


How did you even start optimizing Civ4? Did you replace some exported dll functions with your own? How did you access the state in memory?


Firaxis provided the source for the core DLL for modding. Don't recall exactly when, but it may have been with the Beyond the Sword expansion.


Python is the most incredible language because everything's dynamic. But it comes at an enormous cost - you can't make many assumptions or optimizations easily. It has tremendous utility for many applications. It's this utility and the wide body of existing software packages which makes it so tempting to use as a scripting engine. But it would be a bad move for any scripting that's in some critical region.
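One concrete example of the missing assumptions: even a call to a builtin like `len` is resolved by name each time the function runs, so an interpreter can't simply inline it. A small sketch (the `count_items` name is made up) that temporarily rebinds `builtins.len` and restores it afterwards:

```python
import builtins


def count_items(items):
    # `len` is looked up when the call executes (module globals, then builtins),
    # so the compiler can't bind it to the real built-in ahead of time.
    return len(items)


before = count_items([1, 2, 3])  # 3: the real built-in

original_len = builtins.len
builtins.len = lambda items: -1  # rebind the builtin process-wide
try:
    patched = count_items([1, 2, 3])  # -1: same function object, new behaviour
finally:
    builtins.len = original_len  # always restore the real built-in

after = count_items([1, 2, 3])  # 3 again
print(before, patched, after)  # 3 -1 3
```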


It would be nice to have an official, more performant statically typed/less dynamic subset of Python.

Yeah, we have mypy, but it's not official and not integrated. And subsets that do have a performance benefit (like TorchScript) are niche and/or not sustainably supported.


Not statically typed, but rpython is a statically analyzable subset of python that is used to write pypy: https://rpython.readthedocs.io/en/latest/rpython.html


That looks very reasonable. Converting a Python codebase would be much easier than porting to a new language.

Now all your imports, on the other hand...


Well -- other languages have shown up to try and fill this niche.


I think the greatest idea for Rust scripting is just supporting WASM scripts. I've had some success toying with wasmer in bevy.


If the experiences I've heard of from previous projects (1) that (successfully) embedded cython are anything to go by, Python isn't necessarily a good choice for this.

(1): e.g. awesome, blender


I think the worst thing about Python and Blender was not necessarily the choice of language, but the frequent API changes and not having anything close to an IDE-like environment.

Not only did old examples often just not work, there was no serious code completion to facilitate creating anything from scratch.


Exactly this. One of the reasons Maya remains favored in studio environments is its strong API stability guarantees. Python is excellent for automation when performance is less of a concern.


I'm curious what you've heard about them that suggests this?


As far as I can tell, the maintenance overhead of mapping all the available types to and from Python as nicely usable Python APIs is a major pain, and not just because of cython's current implementation but also because of, e.g., the conceptual mismatch between "good" Python and "good" C++/Rust/whatever implementations. I.e., you could say there is a paradigm mismatch between the underlying implementation (of whatever you extend) and the Python API, which makes the already hard problem of keeping an API stable even harder.

Furthermore, tooling for working in such contexts (as an API consumer) is often very limited or doesn't exist, and the API producer seldom has the time or money to produce such tooling either.

Lastly, there is the remaining problem of Python by itself being really slow. For its main use cases this often doesn't matter, because e.g. scientific computations are run by native extensions that Python delegates the work to. But that means you would also have to support native extensions, or hope that no extension needs to be really fast (which is surprisingly often the case), or find a similar workaround. So choosing something with a good JIT, AOT, or similar compilation has benefits.



Both of those things are already possible with CPython. For the former, I'd recommend PyO3; for the latter, pyodide. These projects take CPython as a dependency (though pyodide has to apply a few patches on top) and automate the annoying finicky FFI parts of getting it to talk to Rust or JavaScript.


surprised they didn't mention packaging a python project into a static executable, that sounds like a huge use case.


Thank you. That sounds like an important feature.


to be clear i don't know if the project supports the feature out of the box; it just seems to me like something it would enable, and that i would have expected to see called out as a use case in the README.


Thanks for clarifying. It doesn't look like it's there today, but hopefully it will be down the road.


Python in the browser is a good use case: a kind of edge-computing scenario where a Python API responds with data plus Python code for the client to post-process before displaying or acting on it.


You can use just the parser if you want to: https://rustpython.github.io/blog/2020/04/02/thing-explainer...

Which is great for embedding or customizing your scripting language needs.


I believe ruff [0] is using this for their linting / fixing

[0]: https://beta.ruff.rs/docs/


Yeah that's right, we use the RustPython parser and lexer internally. Works great! And it's actually seen a bunch of improvements over the past few months -- we contributed back some CPython compatibility improvements and a couple new language features (parenthesized with statements, match statements, *except support).


The first link on that page 404s. Anyone know the correct link to the parser code?



Ah, thanks. Fixing this now.


Now it would be cool to see a Rust compiler written in Python, to close the circle.


Python has many qualities, but I wouldn't use it to write a compiler.

That being said, I've seen much worse languages being used to write compilers, so... who knows?


Python has really nice lexer and parser libraries. It isn’t Haskell or ML but it works. It is slow as expected.


Lexing and parsing aren't the scary part.

I'm much more nervous about maintaining invariants as you implement compilation passes, especially in the absence of powerful pattern-matching.


Cool project. But I think any project like this, if they want to get real world adoption with the masses, should create a "batteries included" distribution similar to Anaconda that comes bundled with all the most popular libraries, where everything has been tested and known to work. Something like Anaconda without a GIL that just worked and had much better multi-threading support would be amazing.


Anaconda is a for-profit venture, while RustPython seems to be a loosely collected group of FOSS developers wanting to hack on a Python interpreter in Rust. Building a "batteries included" distribution seems like a noble goal but might be a bit too much to chew at the current stage.


For sure, but I don't think it has to be as complete or stable/reliable as Anaconda. Just something that bundles the most popular and important libraries, like requests, httpx, pandas, pytorch, sqlalchemy, fastapi, etc. Stuff that most Python devs are going to want or need, but which they probably won't be willing to spend too much time hacking to get to work with a non-standard Python implementation. Without that, I think it gets relegated to some Fibonacci.py type examples that, while cool, aren't of much use to most devs.


conda-forge isn't. And the conda tooling is not "for profit". It's only the anaconda distribution that is.


Yes, conda and Miniconda are FOSS while Anaconda is not. I don't think I said anything incorrect.


It could run with a significant performance penalty, and I would be thrilled if I knew it could package numpy, scipy, pandas, matplotlib, and jupyter. Those are the heavy hitter libraries which are likely to be dependent upon some hard packaging requirements.


Interested to know how well their JIT works and in what scenarios they see improvement


RustPython doesn't have a working JIT in the common meaning of JIT yet.


For jitted python, check PyPy.org


Is there any chance that a project like RustPython could supplant CPython?


> For your big question about a CPython replacement I think that this is our current goal. I do not think people will be using RustPython in their production any time soon but I do think we can be competitive in WASM. Having said that I do not think we should be 100% compatible with CPython. For example we have chosen to implement threading without the GIL. Being compatible with CPython is very helpful as we can use the documentation and tests of CPython.

https://github.com/RustPython/RustPython/issues/1940#issueco...


Not having a GIL makes this very interesting for all production uses. I hope this works out.


It also rules out using most C extensions which rely on it. These libraries are most of the draw of the Python ecosystem


TLDR: No

The hardest part of supplanting CPython is that its FFI has already been validated against a huge number of implementations. I believe Cinder and Pyston work with it out of the box, but they are on older versions of Python, and both Pyston and Cinder (or at least the relevant parts) want to be merged back into CPython. On the other hand, a JIT for CPython can be provided by other Python packages: Pyston has extracted its JIT, which can be added to 3.7 to 3.10 by installing it. See https://pypi.org/project/pyston/

PyPy is a separate, alternative implementation that has tried to achieve something different, but it still hasn't supplanted CPython after many years. It has a JIT and the ability to swap garbage collectors, but it suffers when interacting with C FFIs.

Pyodide runs in the browser, so it's something completely different, and C extensions only work there if they are rebuilt for WebAssembly.

IronPython (.NET) and Jython are largely inactive. GIL removal from CPython has also been attempted multiple times, and now that Microsoft is funding work to make CPython faster, it's even less likely that CPython will be supplanted.


GraalVM also has a nascent Python runtime, which could be extremely performant.

But I think open-source devs don't really want to put optimization work into GraalVM, as their hard work could get sucked up and locked behind the Enterprise Edition.


Mainly, limited developer resources make the other runtimes fade into the background, while putting resources into web deployment (Pyodide) and microcontrollers (MicroPython and CircuitPython) is a much better idea.


What's GraalVM, and where does it fit in with the larger Python ecosystem?


It's a "universal" VM from Oracle that supports many languages, plus other bells and whistles like AOT compilation of interpreted languages: https://www.graalvm.org/

But its Java support is the most prominent, and its runtime is significantly faster than OpenJDK.

Which is why it's so promising for Python. I'd argue that OpenJDK is way ahead of the Python runtime, and GraalVM is way ahead of OpenJDK.

But it's also kinda iffy because:

- Python support isn't very good now

- Oracle is Oracle. Hence they have the better optimizations walled off behind an "Enterprise Edition" registration and license.


I wouldn't touch it with a ten-foot pole: besides Oracle having sued Google over API reimplementation, they could eventually abandon the runtime.


Depends...

There are already other implementations. But CPython is the reference implementation. That's part of why we're only now seeing optimizations that reduce the code's readability. (I think, don't quote me on this)


No need to quote you, but GvR himself [0]. That's the Faster CPython initiative, which has been ongoing for about 2 or 3 years I think. 3.11 got some nice speedups from it, and 3.12 is on its way to get more. [1]

[0] https://github.com/faster-cpython/ideas/blob/main/FasterCPyt...

[1] https://github.com/faster-cpython/ideas


To solve what problem exactly?


To solve the problem that is C.


Any constructive comments or we’re /r/ProgrammerHumor now?


It wasn't supposed to be funny. C has specific problems. Rust is a solution to those specific problems.


I think the question is how those problems are applicable specifically to CPython's use of C.

I agree that it would be better, all else being equal, if CPython were implemented in a memory-safe language (though it would almost certainly need to use unsafe escape hatches in some places), but I think this'd be more viable if it were done by incrementally migrating the existing codebase with the involvement of the current maintainers, rather than as a third-party rewrite from scratch.


Then there's PyPy, a Python written in Python.


If they support python packages exposing the C ABI and expose a C ABI for embedding, I can't see why not.


They're doing threading without the GIL, which means it's not compatible with existing C extensions.

CPython is also building a no-GIL compile-time option. If that gets traction, some of the big performance-oriented C extensions like NumPy would likely build a no-GIL version of their extension too.

Then, if RustPython's non-GIL implementation is compatible with CPython's, and their C ABI is compatible, those C extensions might work.


GIL or no GIL is mostly irrelevant for NumPy, an extension that does most of its work with the GIL released.


Relying on the CPython GIL and releasing the GIL are two different things:

  1. Releasing the GIL means multithreading is opt-in for a given code section in NumPy. Only very specific parts of the code need to be thread-safe.
  2. Not relying on a GIL in the CPython runtime means multithreading becomes opt-out. Now all the code needs to be thread-safe by default, including the libs you depend on.

A lot of C/C++/Fortran scientific code is not thread-safe, and the whole scientific Python ecosystem depends heavily on those codebases.
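A Python-level analogy for the "opt-out" world (not NumPy's actual C code): when no global lock serializes everything, each piece of shared mutable state has to protect itself, for example with a per-object lock like the hypothetical `LockedCounter` below. This is roughly the per-data-type locking approach mentioned upthread for RustPython.

```python
import threading


class LockedCounter:
    """Shared counter that guards its own state with a per-object lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, n):
        # `self.value += n` is a read-modify-write; without the lock, two threads
        # could read the same old value and one update would be lost.
        with self._lock:
            self.value += n


def worker(counter, times):
    for _ in range(times):
        counter.add(1)


counter = LockedCounter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```

Every extension type would need this kind of discipline (in C, not Python) before it could run safely without a GIL, which is why the migration effort is large.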


I understand what you are saying, but I don't see how it's relevant? Numpy doesn't rely on the gil, which is why it releases it.


It releases the GIL only in very specific sections. Most of the NumPy C code runs under the GIL.

A quick check on master shows only 10-20 calls using NPY_BEGIN_ALLOW_THREADS (which is an alias to Py_BEGIN_ALLOW_THREADS).

A lot of the NumPy code manipulates Python runtime objects, and doing so without thread safety would likely break everywhere. A lot of effort would be needed to gradually make a large C extension thread-safe.


Only if the CPython devs will it so.


With HPy yes


I really hope they didn’t port the GIL too


The problem is that if you don't port the GIL, you can't port the C API, and thus you can't support most of the Python library ecosystem.


Are there really that many Python libraries implemented in C? Genuine question - I thought there were a few very high profile ones, but I always figured there weren't many (like 10% or less, total finger-in-the-air guess)


The number of them isn't as relevant as their usage. Lots of data science related modules are effectively wrappers around C code. Here's a few:

  NumPy 
  pandas 
  Matplotlib 
  TensorFlow 
  PyTorch  
These modules are critical to many of the workloads using Python (ML / AI / Data Science)


Right this is what I thought - a small number of popular ones, rather than an overall majority across pypi packages


In the data science space, which is probably one of the biggest users of Python, an interpreter that doesn't support these is dead in the water.


I think the question is whether it would suffice to port the few most popular native modules to directly use this project's bespoke Rust API instead of the standard CPython API. This'd probably be less work than building a real emulation layer for the CPython API, especially if they don't want to include the GIL. But if Python users need a lot of different native modules and not just a few popular ones, then that won't suffice.


"there were a few very high profile ones"

The absolute number (or fraction) implemented in C is less important than the high profile ones. numpy/scipy alone are used in a huge number of projects so if you can't support those two libraries alone it's pretty much a non starter.


Oh I’m not downplaying them, if they were small in number and minor then I’m sure CPython would’ve happily dumped the GIL already.


> Are there really that many Python libraries implemented in C?

There are a lot that use the C API, even if some of those aren't implemented in C (some are actually implemented in Rust.)


This is what I meant, “implemented in C” and “use the C API” are basically the same thing at some level - if they depend on the C/Python interface at all


The ones that are fast are implemented in something other than Python. Most of them C, some Rust, but all native code. Python itself is just too slow to be useful in many scenarios.


Porting the GIL is trivial compared to everything it would take to support the C API.


I have some good news for you


HN needs a new tab for titles that end in "written in Rust".


Why don't you write a 10 line JS Userscript for yourself and make this a reality?



Less than 10 lines, but only does half the job. Where's the new tab that has only Rust stories?


Left as an exercise for the inquisitive reader :)



