One of the awesome things about Rust is how easy it is to build Rust projects. Looking at the readme, it's a simple `cargo run` to compile and run the project, versus CPython, where you have to worry about Makefiles, configuration, and third-party libraries on your system.
That's the biggest killer feature for me vs. C/C++. Any time I write C++, I avoid libraries like the plague unless they do a ton of stuff because it's such a hassle.
Does Cargo support building C or C++ dependencies? I see a lot of projects with a mix of Rust and C/C++ and I’d assumed that just meant I’d have to deal with 2 build systems.
I looked at Nix to help with multi-language builds (and cross compiling), but only had limited success.
It does support calling into other compilers and toolchains through build scripts and such. Take cc-rs[0] for example: this allows building C and C++ files natively without even calling an executable yourself.
In practice, I'd expect libraries to just call make/cmake/ninja for you, or (like openssl-sys) ask you to install the necessary libraries using your favourite package manager.
Yes, through a `build.rs` file and crates such as `pkg-config`. Quite a few projects depend on C/C++ source and are merely Rust bindings. However, the majority of crates don't require external dependencies.
RustPython still uses the GIL. The GIL is not a consequence of technical limitations in the language used for the interpreter; you could also remove the GIL from CPython. It's a design decision that makes many things in the interpreter and in user code simpler.
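To illustrate the kind of simplicity meant here (this is my own illustrative example, not from the comment above): under CPython's GIL, a single C-level operation like `list.append` is effectively atomic, so plain Python code can share a list across threads without any explicit locking.

```python
import threading

items = []

def worker():
    for _ in range(10_000):
        # list.append is one C-level call; the GIL makes it effectively
        # atomic in CPython, so no lock is needed here
        items.append(1)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# no appends are lost: 8 threads x 10,000 items each
print(len(items))  # 80000
```

Without a GIL, the interpreter (and many C extensions) would need finer-grained locking to keep guarantees like this.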
pip/poetry/venv are all for Python code, whereas the OP is talking about building the CPython interpreter itself, which is written in C. Hence the Makefiles etc.
At first I thought this would be about performance, but now I think it's not.
Their github.io page mentions interesting use cases though:
> RustPython can be embedded into Rust programs to use Python as a scripting language for your application, or it can be compiled to WebAssembly in order to run Python in the browser
This seems like a very optimistic take. For some reason, it made me think about spectre and heartbleed and how my computer just kept getting slower a couple years back.
I've been playing with making an interpreter in Rust and reading other people's work. Here's an interesting blog post (not mine) on making an interpreter in Rust match the speed of a C implementation. It seems folks often have to resort to unsafe Rust to get the performance they want. I don't claim to fully understand that and am looking for more information myself.
With webservers we already saw that the way to make new things fast in Rust is to use lots of unsafe Rust when needed, optimize the code, and after that think about the right abstractions that minimize the unsafe code surface.
Well, there's no production-class garbage collector in Rust yet, so in RustPython they're just using `Rc`s, I think. That's one place to start. The other will probably be the quality of the JIT.
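For context on why plain reference counting (what `Rc` gives you) isn't the whole story: reference cycles never drop to zero, which is why CPython pairs its refcounting with a cycle collector. A small stdlib demo of the problem, from the Python side:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a   # a reference cycle: refcounts can never reach zero
del a, b                  # refcounting alone (like Rc) would leak these

# CPython's cycle collector finds and reclaims the unreachable cycle
collected = gc.collect()
```

An `Rc`-only interpreter either leaks such cycles or needs a separate cycle-detection pass like CPython's `gc` module.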
Lua has been used a lot for that kind of embedding in the past. I think because it's easy to move data between Lua and C? Python would certainly be a big step up but I don't know how hard that kind of integration is.
Why is it such a great language? As a developer entirely divorced from the gaming industry, aside from occasionally dabbling in modding, Lua has always seemed completely inscrutable to me when I have encountered it.
Its obscenely simple, consistent memory layout. It makes embedding it into another application straightforward and easy to manage. Extending the host interface to the embedded scripts is very simple because of this. It's also serializable to a degree.
Python is also a decent glue language but creating a Python interpreter is a bunch more work and the API is a bit harder to work with.
They have a clean API for embedding. All Lua interpreter state is kept in a struct that you alloc. If you alloc two distinct structs, you have two completely independent Lua vms and you can run them on independent threads if you want to no problem, and so on. (Few language runtimes are clean in this way.)
The language is _tiny_ (and kept that way on purpose)
Aside from stuff like metatables (which may require you to play around for a bit to understand their value), you can pick it up in a couple of hours. I'm not even kidding. So much so that you see people modifying code without even looking at documentation.
The first problem is "Which Lua?" Lua, by itself, is almost completely useless. You first need to compile/install a bunch of things in order to make Lua useful. This has the Perl problem that everybody uses a different dialect of Lua.
The second part is that the Lua constructs for programming in the large are very weak. I believe that Adobe had a postmortem about this (Lightroom, I think?)
And then there is the language, itself. For me, the 1-based indexing is death in this day and age. Sorry. Zero-based is the dominant ecosystem and not fitting into that is simply not acceptable.
> You first need to compile/install a bunch of things in order to make Lua useful.
How is this any different from pretty much any language out there? From a quick eyeball of RustPython's Cargo.toml[1], there are about 70 different dependencies which all need to be compiled. I haven't worked too much with Autoconf, but I am pretty sure CPython has quite a few dependencies.
> The second part is that the Lua constructs for programming in the large are very weak.
This is deliberate: it forces you to use the tools you are given, building features out of the primitives Lua already provides rather than waiting for the language to grow them.
> I believe that Adobe had a postmortem about this (Lightroom, I think?)
I can't find this article. Has anyone else had any luck?
The Civ4 core game DLL was its own bottleneck. It is terribly inefficient C++. Years ago, I was working on my own mod to optimize it because I got tired of waiting 15 minutes for the AI to complete a turn on a huge map late in the game. It was not hard to get it down to just a few minutes. I think it could be used as a poster child for a code base written by someone expecting the optimizer to work miracles: lots of macros hiding expensive call chains used in tight nested loops, repeated calls in loops to get the same item that should have been hoisted outside the loop(s), etc.
Python was not the bottleneck, just the scapegoat for badly written code.
Python is the most incredible language because everything's dynamic. But that comes at an enormous cost: you can't easily make many assumptions or optimizations. It has tremendous utility for many applications, and it's this utility, plus the wide body of existing software packages, that makes it so tempting to use as a scripting engine. But it would be a bad move for any scripting on a performance-critical path.
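A tiny example of the dynamism in question (my own illustration): even an ordinary method call can't be assumed stable, because classes are mutable at runtime, so an optimizer can't cache the lookup without guards.

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
before = g.greet()   # "hello"

# Classes are ordinary mutable objects, so the meaning of g.greet()
# can change between two calls to the same line of code.
Greeter.greet = lambda self: "patched"
after = g.greet()    # "patched"
```

Static languages resolve this at compile time; CPython must re-dispatch (or guard a cached dispatch) on every call.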
It would be nice to have an official, more performant statically typed/less dynamic subset of Python.
Yeah, we have mypy, but it's not official and not integrated. And the subsets that do have a performance benefit (like TorchScript) are niche and/or not sustainably supported.
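The "not integrated" part is easy to show (my own example): CPython stores type hints as metadata but never checks them at runtime, which is why any enforced or compiled subset has to be a separate tool today.

```python
def add(x: int, y: int) -> int:
    return x + y

# The hints are recorded as plain metadata on the function object...
hints = add.__annotations__   # {'x': int, 'y': int, 'return': int}

# ...but nothing enforces them: passing strings "works" fine.
result = add("ab", "cd")      # "abcd"
```

A tool like mypy reads those annotations statically; the interpreter itself ignores them entirely.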
I think the worst thing about Python and Blender was not necessarily the choice of language, but the frequent API changes and not having anything close to an IDE-like environment.
Not only did old examples often just not work, there was no serious code completion to facilitate creating anything from scratch.
Exactly this. One of the reasons Maya remains favored in studio environments is its strong API stability guarantees. Python is excellent for automation when performance is less of a concern.
As far as I can tell, the maintenance overhead of mapping all the available types from/to Python as nicely usable Python APIs is a major pain, and not just because of Cython's current implementation but because of, e.g., the conceptual mismatch between "good" Python and "good" C++/Rust/whatever implementations. I.e., you could say you have a paradigm mismatch between the underlying implementation (of whatever you extend) and the Python API, which makes the already hard problem of keeping an API stable even harder.
Furthermore, tooling to work in such contexts (as an API consumer) is often very limited or does not exist, but the API producer seldom has the time/money to produce such tooling either.
Lastly, there is the remaining problem of Python itself being really slow. For its main use cases this often doesn't matter, as e.g. scientific computations are run by native extensions that Python delegates the work to. But this means you would also have to support native extensions, or hope that no extension needs to be really fast (not needing to be really fast is surprisingly often the case), or find similar solutions. In turn, choosing something with a good JIT, AOT, or similar compilation has benefits.
Both of those things are already possible with CPython. For the former, I'd recommend PyO3; for the latter, pyodide. These projects take CPython as a dependency (though pyodide has to apply a few patches on top) and automate the annoying finicky FFI parts of getting it to talk to Rust or JavaScript.
To be clear, I don't know if the project supports the feature out of the box; it just seems to me like something it would enable, and that I would have expected to see called out as a use case in the README.
Python in the browser is a good use case: a kind of edge-computing scenario where a Python API responds with data plus Python code for the client to post-process before displaying or acting on it.
Yeah, that's right, we use the RustPython parser and lexer internally. Works great! And it's actually seen a bunch of improvements over the past few months -- we contributed back some CPython compatibility improvements and a couple of new language features (parenthesized with statements, match statements, `except*` support).
Cool project. But I think any project like this, if they want to get real world adoption with the masses, should create a "batteries included" distribution similar to Anaconda that comes bundled with all the most popular libraries, where everything has been tested and known to work. Something like Anaconda without a GIL that just worked and had much better multi-threading support would be amazing.
Anaconda is a for-profit venture, while RustPython seems to be a loosely collected group of FOSS developers wanting to hack on a Python interpreter in Rust. Building a "batteries included" distribution seems like a noble goal but might be a bit too much to bite off at the current stage.
For sure, but I don't think it has to be as complete or stable/reliable as Anaconda. Just something that bundles the most popular and important libraries, like requests, httpx, pandas, pytorch, sqlalchemy, fastapi, etc. Stuff that most Python devs are going to want or need, but which they probably won't be willing to spend too much time hacking to get to work with a non-standard Python implementation. Without that, I think it gets relegated to some Fibonacci.py type examples that, while cool, aren't of much use to most devs.
It could run with a significant performance penalty, and I would be thrilled if I knew it could package numpy, scipy, pandas, matplotlib, and jupyter. Those are the heavy hitter libraries which are likely to be dependent upon some hard packaging requirements.
> For your big question about a CPython replacement I think that this is our current goal. I do not think people will be using RustPython in their production any time soon but I do think we can be competitive in WASM. Having said that I do not think we should be 100% compatible with CPython. For example we have chosen to implement threading without the GIL. Being compatible with CPython is very helpful as we can use the documentation and tests of CPython.
The hardest part of supplanting CPython is the fact that the FFI is already validated across a huge number of implementations. I believe Cinder and Pyston work out of the box, though they track older versions of Python; Pyston wants to be merged back into CPython, and so does Cinder (or at least the relevant parts). On the other hand, a JIT in CPython can be achieved by other Python packages: Pyston has also extracted its JIT, which can be added to 3.7 - 3.10 by installing it. See https://pypi.org/project/pyston/
PyPy is a separate implementation that has tried to achieve something different, but it hasn't supplanted CPython in years of trying. It has had a JIT and the ability to change garbage collectors, but the system suffers when interacting with C FFIs.
Pyodide is in the browser, so it's something completely different, but it has no support for C FFIs.
IronPython (.NET) and Jython are inactive. Also, GIL removal from CPython has been attempted multiple times, but now that Microsoft is funding work to make CPython faster, it is even more unlikely that CPython will be supplanted.
GraalVM also has a nascent Python runtime, which could be extremely performant.
But I think OSS devs don't really want to put optimization work into GraalVM, as their hard work could get sucked up and locked behind the Enterprise Edition.
Mainly, developer resource exhaustion makes the other runtimes fade into the background, while putting resources into web deployment (Pyodide) and microcontrollers (MicroPython and CircuitPython) is a much better idea.
It's a "universal" VM from Oracle that supports many languages and other bells and whistles like AOT compilation of interpreted languages: https://www.graalvm.org/
But its Java support is the most prominent, and its Java runtime is significantly faster than OpenJDK.
Which is why it's so promising for Python. I'd argue that OpenJDK is way ahead of the Python runtime, and GraalVM is way ahead of OpenJDK.
But it's also kinda iffy because:
- Python support isn't very good right now
- Oracle is Oracle. Hence they have the better optimizations walled off behind an "Enterprise Edition" registration and license.
There are already other implementations, but CPython is the reference implementation. That's part of why we're only now seeing optimizations that reduce code clarity for reading. (I think, don't quote me on this.)
No need to quote you, but GvR himself [0]. That's the Faster CPython initiative, which has been ongoing for about 2 or 3 years I think. 3.11 got some nice speedups from it, and 3.12 is on its way to get more. [1]
I think the question is how those problems are applicable specifically to CPython's use of C.
I agree that it would be better, all else being equal, if CPython were implemented in a memory-safe language (though it would almost certainly need to use unsafe escape hatches in some places), but I think this'd be more viable if it were done by incrementally migrating the existing codebase with the involvement of the current maintainers, rather than as a third-party rewrite from scratch.
They're doing threading without the GIL, which means it's not compatible with existing C extensions.
CPython is also building a -no-gil compile-time option. If that gets traction, some of the big performance-oriented C extensions like Numpy etc would likely build a non-GIL extension version too.
Then, if RustPython's non-GIL implementation is compatible with CPython's, and their C ABI is compatible, those C extensions might work.
Relying on the CPython GIL and releasing the GIL are two different things:
1. Releasing the GIL means multithreading is opt-in for a given code section in NumPy. Only very specific parts of the code need to be thread-safe.
2. Not relying on a GIL in the CPython runtime means multithreading becomes opt-out. Now all the code needs to be thread-safe by default, including the libs you depend on.
A lot of C/C++/Fortran scientific code is not thread safe, and the whole scientific python ecosystem depends heavily on those codebases.
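A deterministic sketch of the opt-out hazard (my own example, using a barrier to force the bad interleaving that normally only happens by chance): any unguarded read-modify-write, whether in Python or in a C/Fortran extension running without a GIL, can lose updates.

```python
import threading

counter = 0
barrier = threading.Barrier(2)

def unsafe_increment():
    global counter
    read = counter       # both threads read 0
    barrier.wait()       # force both reads to happen before either write
    counter = read + 1   # both threads write 1: one increment is lost

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 1, not 2
```

With a GIL plus code that never drops it mid-operation, extensions get this safety for free; in a free-threaded runtime, every such pattern needs its own lock or atomic.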
It releases the GIL only in very specific sections. Most of the NumPy C code runs under the GIL.
A quick check on master shows only 10-20 calls using NPY_BEGIN_ALLOW_THREADS (which is an alias to Py_BEGIN_ALLOW_THREADS).
A lot of the NumPy code manipulates Python runtime objects, and doing so without thread safety would likely break everywhere. A lot of effort would be needed to gradually make a large C extension thread-safe.
Are there really that many Python libraries implemented in C? Genuine question - I thought there were a few very high profile ones, but I always figured there weren't many (like 10% or less, total finger-in-the-air guess)
I think the question is whether it would suffice to port the few most popular native modules to directly use this project's bespoke Rust API instead of the standard CPython API. This'd probably be less work than building a real emulation layer for the CPython API, especially if they don't want to include the GIL. But if Python users need a lot of different native modules and not just a few popular ones, then that won't suffice.
The absolute number (or fraction) implemented in C is less important than the high-profile ones. numpy/scipy alone are used in a huge number of projects, so if you can't support those two libraries it's pretty much a non-starter.
This is what I meant, “implemented in C” and “use the C API” are basically the same thing at some level - if they depend on the C/Python interface at all
The ones that are fast are implemented in something other than Python. Most of them C, some Rust, but all native code. Python itself is just too slow to be useful in many scenarios.
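You can see the native/Python split from Python itself (my own illustration): in CPython, C-implemented callables show up as a different type than pure-Python functions.

```python
import json
import math
import types

# math.sqrt is native code (a C builtin in CPython)...
sqrt_is_native = isinstance(math.sqrt, types.BuiltinFunctionType)

# ...while json.dumps is an ordinary Python function (bytecode),
# even though it delegates to a C accelerator internally.
dumps_is_python = isinstance(json.dumps, types.FunctionType)
```

Any alternative interpreter has to either reimplement the native half of such modules or emulate the CPython C API they're built against.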