For those interested in uv (instead of pip), uv massively sped up the release process for Home Assistant. The time needed to make a release went down from ~2.5 hours to ~20 minutes. See https://developers.home-assistant.io/blog/2024/04/03/build-i... for details. I'm just a HA user btw.
I would say, as someone who works on the performance of pip, that no one else was able to reproduce OP's severe performance issue. I'm not saying it didn't happen, just that it was an edge case on specific hardware (I am assuming it was this issue: https://github.com/pypa/pip/issues/12314).
Since it was posted, a lot of work has been done on the areas that likely caused the performance problems, and I would expect at least a doubling in performance in the latest version of pip. E.g. I created a scenario similar to OP's that dropped from 266 seconds to 48 seconds on my machine, and more improvements have been made since then. However, OP has never followed up to let us know if it improved.
Now, that's not to say you shouldn't use uv; its performance is great. But a lot of volunteer work has been put in over the last year (well before uv was announced) to improve the default Python package install performance. And one last thing:
> for a non-compiler language?
Installing packages from PyPI can involve compiling C, C++, Rust, etc. Python's packaging is very very flexible, and in lots of cases it can take a lot of time.
Python is slow compared to Rust, obviously. Beyond that, pip is at this point carrying a bunch of legacy decisions because of the ludicrously large number of hard left turns the Python packaging ecosystem has taken over the last 20 years.
Home Assistant is an absolute behemoth of a project, especially with regard to dependencies. Dependency resolution across a project of that size is nuts. There are probably few current projects that'd see as big an improvement as HA.
I'm not sure how their packaging even works. Last I checked they had some kind of extra installer that works at runtime?
I have no idea how they keep version conflicts from breaking everything. Do integrations have isolation or something?
I wish Python had a native way for different libraries in the same project to depend on different versions of a transitive dependency; that seems like it would make a lot of things simpler in big projects.
If no hard version is pinned, but only e.g. <= 2.1, pip will download EVERY SINGLE VERSION up to 2.1 to look for metadata. That can easily take hours if it happens multiple times.
When testing previous versions of uv, I saw it do that too. But uv uses other tricks to speed things up: it downloads in parallel, it takes advantage of PEP 658 metadata (so the package itself never needs downloading), and if that metadata is missing it falls back to HTTP byte-range requests to grab just the metadata portion of the wheel, and so on. pip has been learning some of these tricks in recent releases too.
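For anyone curious what the byte-range trick looks like: a wheel is just a zip, and the zip central directory (which tells you where *.dist-info/METADATA lives) sits at the end of the file. A minimal sketch, assuming an index that honours range requests; the wheel URL is made up:

    # Grab only the tail of a wheel with an HTTP Range request; the ZIP
    # end-of-central-directory record lives there, so a resolver can find
    # *.dist-info/METADATA without downloading the whole artifact.
    import urllib.request

    WHEEL_URL = "https://files.example.org/some_package-1.0-py3-none-any.whl"  # hypothetical
    TAIL = 64 * 1024  # usually enough to cover the central directory

    req = urllib.request.Request(WHEEL_URL, headers={"Range": f"bytes=-{TAIL}"})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honoured the range;
        # a 200 means it ignored it and sent the whole file.
        print(resp.status, len(resp.read()), "bytes fetched")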
One problem we have is that support for any repository features beyond PEP 503 (the 'simple' HTML index) is limited or entirely missing in every repo implementation except Warehouse, the software that powers PyPI. So if you use Artifactory, AWS CodeArtifact, Sonatype Nexus, etc., because you are running an internal repository, PEP 658 and PEP 691 support will be missing and uv runs slower; you may not even have Accept-Ranges support. (And if you want Dependabot, your repository needs to implement parts of the 'warehouse JSON API' - https://warehouse.pypa.io/api-reference/json.html - for it to understand your internal packages.)
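You can check what a given index supports yourself; this sketch just asks for the PEP 691 JSON form of the simple API and accepts PEP 503 HTML as a fallback. PyPI will answer with JSON; many internal mirrors will not:

    # The Content-Type of the response tells you which API the index speaks.
    import urllib.request

    url = "https://pypi.org/simple/requests/"
    req = urllib.request.Request(
        url,
        headers={"Accept": "application/vnd.pypi.simple.v1+json, text/html;q=0.1"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.headers.get("Content-Type"))  # JSON on PyPI; often plain HTML elsewhere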
I've been playing with https://github.com/simple-repository/simple-repository-serve... as a proxy to try to make our internal servers suck less; it's a very small codebase and easy to change. Its internal caching implementation isn't great, so I wrapped nginx around it too, caching aggressively and using stale-while-revalidate to reduce round trips. It made our Artifactory less painful to use, even with pip.
pip will not do that; it will attempt to use the latest version allowed by the user's requirements, and only if there is a conflict between two packages will it backtrack to older versions of a package. uv does exactly the same.
Further, if a package index supports PEP 658 metadata, pip will use that to resolve and not download the entire wheel.
uv does the same but adds extra optimizations: both clever ones that pip should probably adopt, and ones that rely on assumptions that don't strictly comply with the standards, which pip should probably not adopt.
I know Python packaging has its issues, but so far I personally have gotten pretty far with plain pip. The biggest shift for me was switching from the original virtualenv to the built-in venv module. On the other hand, if I wanted to be really serious about dependency management, I'd steal a page from FAANG and build a monorepo, avoiding all this hassle with package managers.
I would really encourage you to try your hand at a monorepo. I manage a Python monorepo in prod and dependency management is hell. Poetry has some newer features that I am looking at trying to implement, but the state of the ecosystem wrt big monorepos is horrible.
Poetry support for monorepos is really horrible, and all the maintainers say is "it's a tool intended to publish packages, not manage your dev environment".
What is this non-package mode you are referring to?
I was talking about the maintainers not wanting to include features a lot of the userbase would like (like monorepo stuff), because they are saying their target audience is package authors, while in fact most of the users aren't.
I’ve been working on a thing [1] to make monorepo (“workspace”) builds easier, works well with Rye/uv. Doesn’t do anything during dev, just removes the need for hacks and scripts at build-time.
Would be curious if anyone thinks this is a useful direction, ultimately hope uv/hatch include something like this.
Bingo, this is what I was referring to in my comment. And yes, I assume it is quite a lot of labor. But I feel that a lot of this perceived friction comes from people trying to cut corners and avoid doing that labor, while still getting the benefits.
the overarching dream of computing in general is for a few people to do the labour and then for everyone to benefit through the magic of free replication.
if some smart and dedicated engineers can do the work to build a tool that lets everyone trivially manage a monorepo, that is certainly the best possible situation to end up in.
Package A depends on C 1.0 and B depends on C 2.0. How much work it is to get down to one version of C in your dependencies depends on how different 1.0 and 2.0 are, and on how A and B use it. But if you want them resolved, it's up to you to do the engineering on A or B.
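Tiny illustration of that conflict using the `packaging` library (which pip uses internally for version logic); the specifiers are made up, but the point is that their intersection is empty:

    # A needs C in the 1.x range, B needs C in the 2.x range:
    # no single version of C satisfies the combined specifier set.
    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    needs_a = SpecifierSet(">=1.0,<2.0")   # what A accepts for C
    needs_b = SpecifierSet(">=2.0,<3.0")   # what B accepts for C
    combined = needs_a & needs_b

    for candidate in ("1.0", "1.5", "2.0", "2.4"):
        print(candidate, combined.contains(Version(candidate)))  # all False -> unresolvable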
Correct. The solution is to modify packages A and/or B, which is a high-cost approach (hence why FAANGs can throw warm bodies at it, while most everyone else throws their hands up in exasperation).
So the canonical solution to this is to use Bazel. The reason I did not end up using Bazel is because it is a pita to manage and I didn't have enough time to allocate to this problem to implement the FAANG solution.
The shell scripts I wrote are painful, but less painful than dealing with Bazel.
How is this personal? I have no idea who the person is... I'm commenting on something they wrote, not on something they are.
If a person in a home-cooking forum advises someone to use a butter knife to slice tomatoes, it's highly appropriate to tell them they are incompetent and shouldn't be advising anyone. This post is exactly the same: absurd advice that can only be explained by either an honest mistake or incompetence.
> it's highly appropriate to tell them they are incompetent
It's not on HN because it's a personal swipe and the HN guidelines ask you not to do those. It might work on your cooking forum but we know, empirically, that it doesn't work well here.
They didn’t say they were doing something goofy like that. They said they were running into problems at work. It could well be that they have a million-LOC repo that’s enormously complex with decades of legacy setup to consider. Presume that they’re competent at their job, and maybe what they’re trying to do isn’t trivially easy.
which means you need to come up with a bespoke dependency management solution. in either case, you're doing dependency management.
I encourage you to google "monorepo {python, poetry, pip}"; you're going to land on many multi-page blog posts describing arcane dependency management solutions.
The second option is traditionally what people mean by "monorepo" (multiple executables in a repo with a dependency tree). The first one you'd call a "monolith" (single executable in a repo).
The second one you'd want to manage with a tool like Bazel, which will use whatever plugins are appropriate for the language (pip in Python's case).
It's a legitimate ambition to try and build a tool within a single language's ecosystem to manage the second option, but it's a really hard problem that Bazel (and others) have already solved well, so you might as well use them instead.
So you can do a somewhat hacky thing with poetry workspaces to get the first one to contain multiple binaries, e.g. https://github.com/bleu/balpy-poc
But yes, you're right -- Bazel is the most correct monorepo solution. We started with a combination of the second structure, poetry, and shell scripts but have since moved to nix.
Yeah, nix is an interesting angle of approach. It rhymes a lot with Bazel (reproducibility, explicit/declarative dependency tree, etc.), but nix approaches the problem from a different, lower layer of abstraction (by replacing the OS-level equivalent of "pip": apt, pacman, snap, brew, and the like).
I think, in the long run, something like nix will win out. It makes sense for your OS "package build system" to be the same as your project build system.
If your python project depends on some mildly obscure python lib, do you write your own nix "package" (or whatever nix chooses to call them) wrappers for each release you end up using? Any other tripwires in your experience so far?
Nix calls them "derivations", but yeah you're right again. Lots of weird language in this space.
Specifically, we use this nix project [1] which provides a nice translation layer between a poetry project and a nix derivation. It allows our devs to use poetry where they want (with local relative paths in the pyproject.toml) and ci/cd to have more granular control of deps at build time.
There are some corner cases. As you correctly guessed, some more obscure Python libraries might require a few extra lines of code (e.g. to specify that a package needs setuptools, etc.).
You should never use requirements.txt, nor pyproject.toml and this has nothing to do with whether you put multiple projects into the same repository or not. This is just an all-around bad idea.
But, anyways. In a monorepo project you would have nothing to do with requirements.txt or pyproject.toml because all dependencies are already there. There's no need to install anything from anywhere...
1. In monorepo you don't have dependencies. This is the whole point of having a monorepo: everything is included. You can build on airgapped system, no unexpected inputs into your builds, no network partitioning problems. This is the whole reason why people do that. Total control and stability. But you pay for it by having to maybe manage third-party code yourself. By having to use more space. Probably, you will need more infrastructure to side-step tools that cannot be made to not use network etc. Hope this also answers the question about external dependencies: they become internal dependencies.
2. Why you should never use requirements.txt: the tradition of using it comes from a total misunderstanding of the goals of project deployment. This is surprising and upsetting, because it isn't a difficult concept. It comes from most developers not wanting to understand how the infrastructure of their project works, and from the willingness to settle on the first "solution" that "worked". The goal of deploying a project must be that byte-compiled Python code ends up in the platlib directory, accompanying data in the data directory, and so on (a quick way to see these directories is sketched after this list). The reliable way to accomplish this is to make a Python package and install it. Requirements.txt plays no role in this process, since package requirements need to be written into the METADATA file in the package's dist-info directory. Instead, the process that involves using requirements.txt typically ends up installing "something" that, at the time of writing, allowed the authors of the project to somehow make their code work, thanks to a combination of factors such as the current working directory, path configuration (.pth) files placed into platlib without their knowledge, etc. This is a very fragile setup, and projects designed this way usually don't survive multiple years without updates that keep modifying requirements.txt to chase the ever-changing environment.
3. pyproject.toml had more potential, in principle, but turned out to be a disaster. The idea behind this contraption was to organize configurations of multiple Python-related tools under one roof. But the chosen format was way too simplistic to realistically replace the configuration of other tools, and the configuration started to slide down two bad directions: either "string programming" (i.e. a technique where strings in the code develop special meaning and parsing), or delegation (i.e. pyproject.toml would contain a minimal code necessary to redirect to the real configuration). Second problem with pyproject.toml esp. when it comes to dependencies: there was no plan on how to solve this problem. No system. So, every tool that decided to support pyproject.toml decided to do it in a way that suits it. So, in the end, it's always better to just use the native configuration format of the tool that does the package building, instead of involving the middle-man: pyproject.toml. It always stands between the developer and their ability to debug the code, to fine tune the tool they need to run. It's the MS Windows registry all over again, but even more poorly executed.
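The sketch promised in point 2: the install locations being talked about aren't project-specific at all, they come straight from the interpreter's own `sysconfig`:

    # Print the directories a properly installed package ends up in:
    # platlib/purelib for (byte-compiled) code, scripts and data for the rest.
    import sysconfig

    paths = sysconfig.get_paths()
    for key in ("platlib", "purelib", "scripts", "data"):
        print(f"{key:8} -> {paths[key]}")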
----
So, what should you do instead? -- Well, there's a problem... there aren't any good Python tools :( And, as I mentioned before, the reason isn't even the tools or their authors; it's that the underlying design of Python's infrastructure is broken. So, you are bound to choose between the bad and the worse.
On the bright side: once you really understand what needs to be done, the problem is very simple. For any individual project you can solve all your deployment and packaging problems very easily in any language that can do infrastructure-related work. I've done this multiple times (out of frustration with Python's own tools) and have never had a reason to regret it.
So this example is a single repo with many services? I don’t think OP was talking about that. Why would you have a single repo with different requirements? At that point they should be independent projects
> So this example is a single repo with many services?
Yes, that's the meaning of monorepo. If it was a single service it would be a monolith.
> Why would you have a single repo with different requirements?
Real world example: I implemented a client/server system using Python, and they have completely different requirements (even the Python versions are different). I still want to share code between client and server codebases, so a monorepo is the perfect choice.
You still have dependencies. But they are included in the repo.
What happens when you have 2 different apps in your monorepo but one uses an older version of Django and wasn't upgraded yet. The monorepo doesn't handle that automatically, you need tooling. It's not as black and white as you say.
In a company I worked for that used monorepo we had multiple versions of Linux (CentOS 6 variants) all at the same time. Trust me, something like different versions of Django is not a really big problem in comparison.
But, to answer your question in a more practical way: what are you going to do with these two versions of Django? Are you planning on running a single pre-fork server with two different Django application servers? Are there going to be two different pre-fork servers? Do both Django versions have to be loaded by the same Python interpreter? Or maybe they don't even need to be deployed on the same compute node?
Once you can answer questions like these, the solution becomes obvious. Most likely, what you want is something like two pre-fork servers proxying HTTP traffic into two separate instances of Django-based Python Web applications. So, in your deployment script, you create eg. two virtual environments and place two different Django packages into these two environments, together with associated code for the Web application.
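Roughly, that deployment script would look something like this (paths, app names and Django versions are made up; the point is just that the two applications never share a site-packages):

    # Create two isolated environments, each with its own Django.
    import subprocess
    import venv

    apps = (("app_legacy", "Django==3.2.25"), ("app_new", "Django==5.0.6"))
    for name, django_spec in apps:
        env_dir = f"/opt/envs/{name}"
        venv.create(env_dir, with_pip=True)
        # Install the pinned Django plus the application package itself.
        subprocess.run([f"{env_dir}/bin/pip", "install", django_spec, f"./{name}"], check=True)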
There are, of course, ways to improve on that. Deploying via Python packaging is, in general, wasteful. A better way is to merge all the packages you need into a single filesystem snapshot (removing all the info directories and, potentially, all the source files, replacing them with byte-compiled ones, while also pruning other irrelevant data from the packages, such as readmes, test files, etc.). Going even further, you can Cythonize your code, or even embed the Python interpreter into your HTTP server so that you deploy your application as a single binary. And there are plenty of other options. The sky is the limit, really.
This is maybe common, but it contradicts the definition of a monorepo. You're just using the word incorrectly. "Mono" means "one". If you pull packages from elsewhere, it stops being "mono". In this situation there's really no difference between your team publishing multiple packages from multiple repositories and then assembling them for deployment, and doing the same thing but with third-party packages.
I would really encourage you to think more on this. Good enough for you... is not really the goal. We want standard tools that help with packages and virtual environments that can scale to an organization of 2 python devs... all the way up to hundreds or thousands of devs. Otherwise, it fragments the ecosystem and encourages bugs and difficult documentation that prevents the language from continuing to evolve effectively.
It would be great if that tool existed, but it doesn’t seem to right now. I can appreciate the instinct to improve packaging, but from an occasional Python developer’s perspective things are getting worse. I published a few packages before the pandemic that had compiled extensions. I tried to do the same at my new job and got so lost in the new tools, I eventually just gave up.
One of Python's great strengths is the belief that there should be one obvious, right way to do things. This lack of unity in the packaging environment is ruining my zen.
From ecosystem point of view I think all language specific package managers are crap. Instead of fragmenting the wider ecosystem on language boundaries, we should be looking more for language independent solutions. Nix is cool, as are buck2/bazel.
Normally I’d agree with you, but they’re talking about the built in tooling. That isn’t going away anytime soon. If it’s good enough for their purposes they should continue using it, especially with no clear consensus on a “better way”.
As long as you don’t care about Python versions I too find pip + venv sufficient.
But there is no way for you to define which Python version your project is built against.
If you’re building a package you probably need to test against multiple versions. If you are building a “project” that isn’t an installable, distributed package but a bunch of code that is shared with a few developers and does something (run a machine learning model, push itself to a cloud function, generate a report) you probably want to target 1 and only 1 Python version.
As for the monorepo approach, are you suggesting copy-pasting numpy and pandas into the repo?
What you mean by "project" is a service. A service includes the Python interpreter. Choosing and provisioning the interpreter seems out of scope for pip or any other Python-specific tooling IMO. What if your service needs another interpreter like Ruby, or a database, message broker etc? Your Python tool won't help with that. This is what docker, and higher-level orchestrators like docker compose and kubernetes, and pyinstaller are for.
Sure, but you can only document that choice in your pyproject.toml, which is well-established. You have no idea how someone is going to provide that interpreter. You don't know what OS or how to provision an interpreter. If you are the person provisioning that interpreter then just use the tools you already know, not yet another tool to do basic stuff like install an interpreter.
At first I was excited to see that a new tool would solve the Python "packaging" problem. But upon further reading, I realized that this was about _package management_, not so much about packaging a Python application that I've built.
Personally I haven't had many problems with package management in Python. While the ecosystem has some shortcomings (no namespaces!), pip generally works just fine for me.
What really annoys me about Python, is the fact that I cannot easily wrap my application in an executable and ship it somewhere. More often than not, I see git clones and virtualenv creation being done in production, often requiring more connectivity than needed on the target server, and dev dependencies being present on the OS. All in all, that's a horrible idea from a security viewpoint. Until that problem is fixed, I'll prefer different languages for anything that requires some sort of end user/production deployment.
> What really annoys me about Python, is the fact that I cannot easily wrap my application in an executable and ship it somewhere.
You are not wrong, but let's unpack this. What you're saying is that there needs to be an easy way for another person to run your application. What is needed for that? Well, you need a way for the application to make its way to the user, for some Python to be found there, and for that process to be transparent to the user.
That's one of the reasons why I wanted Rye (and uv does the same) to be able to install Python and not in a way where it fucks up your system in the process.
The evolved version of this is to make the whole thing, including uv, something you can set up automatically. Even today you can already (if you want to go nuts) have a curl-to-bash installer that installs uv/rye and your app into a temporary location just for your app and never breaks your user's system.
It would be nice to eventually make that process entirely transparent and not require network access, to come with an .msi for Windows, etc. However, the prerequisite for this is that a tool like uv can arbitrarily place a pre-compiled Python and all the dependencies you need at the right location for your user's platform.
The cherry on top that uv could deliver at some point is that fully packaged thing, and it will be very nice. But even before that, building a command-line tool with Python today is no longer an awful experience for your users, which I think is a good first step. Either via uvx, or if you want you can hide uv away entirely.
To put a finer point on this idea: even if one is against this whole notion of an "installation flow", work done to improve uv or rye would likely flow into alternative strategies as well.
The more we invest into all of this and get it right in at least one way, the easier it will be for alternatives (like pyoxidizer!) to also “just work”. At least that’s my belief.
There are tactical reasons to focus on one strategy, and even if it’s not your favorite… everything being compatible with _something_ is good!
Exactly. And in this particular case pyoxidizer, and the work that originally went into it, in many ways gave birth to parts of this. The standalone Python builds were what made me publish Rye, because there was something out there that had traction and a dedicated person making it work.
Prior to that, I had my own version of Rye for myself, but I only had Python builds that I made for myself and the computers I had. Critical mass is important, and all these projects can feed into each other.
Before Rye, uv, and pyoxidizer there was little community demand for the standalone python builds or the concept behind it. That has dramatically changed in the last 24 months and I have seen even commercial companies switch over to these builds. This is important progress and it _will_ greatly help making self contained and redistributable Python programs a reality.
It depends where and what you are trying to install, but generally for every sensible deployment target there's a tool (sometimes more than one) which handles building an installable binary for that target (e.g. an OS-specific installer). There's now even tooling to deploy to unusual places such as Android, iOS, or the browser. Naturally, in some cases specific packages won't work on specific targets, but because there are standard interfaces, if the code is capable of being run somewhere, the tooling for that target should be able to give you something that works.
After the whole npm VC rugpull + Microsoft acquisition, and OpenAI showing legal non-profit status is toothless marketing to VC-path-entangled leaders, I'm reluctant to cede critical path language infra to these kinds of organizations. Individual contributors to these are individually great (and often exceptional!), but financial alignment at the organizational level is corrupted out of the gate. Fast forward 1-4 years, and the organization is what matters. "Die a hero or live long enough to become the villain."
So fast lint, type checking, code scans, PR assistants, yes, we can swap these whenever. But install flow & package repo, no.
That is unfortunate given the state of pip and conda... But here we are.
That battle has already been lost for Python. Microsoft owns Python, they just don't make it public.
This is how I came to believe this is the case:
A few years ago I wanted to write Python bindings to kubectl. I discovered that for that to work cross-platform, I needed to make cgo use the same compiler on all platforms as Python does. Unfortunately, on MS Windows, cgo uses MinGW while Python uses MSVC. I wrote to the Python dev mailing list (which still existed at that time) and asked why they chose a proprietary compiler for their "open-source" project. The answer I received, in a roundabout way, was that MSVC was a historical choice which cannot presently be changed, because MS provides the Python Foundation with free infrastructure to run CI and builds, and it also provides developers to work on Python (i.e. MS employees get paid by MS to work on the Python interpreter). And that they are under orders not to drop MS tools from the toolchain.
Year after year the situation got worse. Like in a lot of similar projects, success created a lot of ground for mediocre nobodies to reach positions of power. The Python Foundation and satellite projects like the PyPA started to be populated by people whose way into those positions was not through contributing any useful code, but rather through writing pages of code of conduct. This code of conduct and the never-ending skirmishes around controlling positions eventually led to some old-timers leaving or being outright kicked out (the latest such event was the ban of Tim, the guy who, among other things, wrote Timsort, which is a somewhat famous feature of Python).
Year after year MS was pushing its usual agenda they do in every project they get their hands on: add crapload of useless features for the sake of advertising. Make the project swing every way possible, but mostly follow the fashion trends as hard as possible. This is how Python is now devoted to adding as much of ML-style types as possible (in the language with a completely different type system...), AoT compilation and JIT (in the language that's half of the time used to dynamically glue native libraries...) and so on. Essentially, making it a C#, but without curly braces.
MS is smart enough to understand that publicly announcing their ownership of Python will scare a lot of people away from the technology, so they don't advertise it much. But they keep working on ensuring developers' dependency on their tooling, and eventually they will come to collect on their investment.
There are many correct observations, but Python is owned by Microsoft, Instagram, RedHat and Bloomberg. Google fired the Python team this year, which makes it an attractive work place.
Node.js drove npm's initial popularity but didn't require open governance. Five years later, the npm repo owners incorporated as a for-profit and went full-on VC mode, didn't make as much money as they wanted, and sold to Microsoft.
The PSF and PyPA are more than welcome to get their act together and tell a good packaging story for Python. Unfortunately all they’ve had to offer is a billion contributor blog posts essentially saying “the system we set up for ourselves limits our ability to be helpful, and that’s not our fault, somehow.”
They’re all so caught up in their internal systems and politics[1] that they don’t seem to know why they’re actually there anymore.
So if someone’s gonna go do an actually-good job and capture the market in the way that Astral is, then that’s exactly what we as a community deserve.
[1]: I mean internal politics. This is not a weird alt-right "DEI HIRE!" rant.
Maybe a good example here is how the mamba team (quantstack) got frustrated with conda speed, splintered off to massively speed up things, and afaict it's now the new default solver. Even better, they seem to have a healthy consultancy doing projects like this, which solves the sustainability problem better than VC funding, which is a non-answer.
I'm for small teams doing big pushes, and the example shows such efforts can coexist with healthier long-term governance strategies. It's sad that the effort was needed, but nonetheless, a clear demonstration of separation of concerns working for small engineering team => bigger governance org.
It's also a bad look for the engineering priorities of Anaconda Inc. on their flagship product. I wonder how we got here: Anaconda devs help build GPU Python stacks, numba, etc., so they are quite good at what they do work on. Is every for-profit packaging-oriented company ultimately not paid enough to keep the packaging core fast? So VC money just speeds up the schedule to mediocrity?
VC funding redirects the company’s focus on whatever is going to drive profits in the short or medium term. This does not necessarily align with great engineering :/ It’s often a poisoned chalice.
There's one problem left with those tools: authority. They're not PyPA-endorsed; that's what makes them different from cargo.
At the same time pypa wasn’t able to provide a comprehensive solution over the years, and python packaging and development tools multiplied - just 3-4 years ago poetry and pipenv seemed to solve python packaging problems in a way that pip+virtualenv couldn’t.
We need pypa to now jump on the astral.sh ship - but will they do that without a certain amount of control?
I stopped worrying about what PyPA said when they endorsed Pipenv above other options, largely due to what seemed to me to be personal relationship reasons. Pipenv was disastrous at that time. It meant well, but at work we found ourselves waiting an hour for it to update its lockfile, and we had a relatively small number of widely-used dependencies in each repo. It just didn't work.
As a practical matter, I hugely appreciate the hard technical work PyPA does. However, I don't concern myself much with which toolsets they're recommending at the moment. Use what the community uses and don't worry about the "official" suggestions.
As a somewhat outside observer, I don't quite understand what the PyPA actually is. In some sense there is an unclear number of participants, and it's not even clear to what degree the PyPA relates to core Python or the PSF.
I think the real endorsement that could help would be the core Python project itself. In a perfect world the official Python tutorial would start with "here is how you install python" and it starts by installing uv, the same way as the official Rust docs point to rustup and cargo.
I hope strongly that the PSF will manage to establish some sort of relationship with Astral which would enable that to eventually be a reality.
Here are few things that... well, won't really work well:
* Python package installation, the package format, and the loading of modules are defective. The design is bad. It means that no implementation, blessed by the PSF or not, is going to solve the packaging problem. So there's no point in asking the PSF or PyPA to adopt any external tool. If the external tools are better than pip in some way, it's in speed or memory footprint etc. They will not solve the conceptual problems, because they don't have the authority to do that.
* PyPA and the PSF, the latter maybe to a lesser extent, are populated today by delusional mediocre coders who have no idea where the ship is going or how to steer it. They completely lack vision and an understanding of the problems they are supposed to deal with. They add "features", but they don't know if those features are needed, and in most cases it's just noise and bloat. From the perspective of someone who has to deal with the fruits of their labor, they just ensure that my job of "someone who fixes Python packaging issues" will never go away.
So... as "someone who fixes Python packaging issues" I kind of welcome the new level of hell coming from Astral. From where I stand, it is pouring more gas into a big dumpster fire. Just one more tool written in a non-mainstream, non-standardized, quickly evolving language, impossible to debug without a ton of instrumentation, with the source code hard to write and hard to understand. It's just another deposit towards my job security.
At my day job at a company I no longer work for, I wrote a program that took apart a list of wheels (that's what built Python packages are called) and combined them into a single wheel. The reason for doing this was to expedite deployment (at that time, pip installing from a list of wheels saved locally would take about a minute per wheel...). Later, I also wanted to write a Python packaging tool, but somewhere down the line I lost interest (the project, while not functional, can be found here: https://gitlab.com/python-packaging-tools/cog ). This is just to set the context for where I'm coming from.
So, in order to work on the projects listed above, I had to study the spec. I won't dwell on all the problems I discovered; here are just some highlights. The first time my eyebrows began to rise was when I read that the wheel format, despite being a binary package format, discourages programmers from putting binary artifacts in it... Yes, you read that correctly.
Now, to elaborate on the matter: when the Python interpreter loads Python source files installed in platlib (and possibly some other locations, I haven't researched this subject in depth), it byte-compiles the sources for future "expedited" loading. At first, the byte-compiled code was stored alongside the sources, but later it migrated into the __pycache__ directory. The unfortunate decision to byte-compile at load time rather than at install time is what, back in the day, created a lot of frustration for unsuspecting Python users who would install packages on their Linux system as sudo, but then be unable to run their programs, because the interpreter would break trying to write byte-compiled files into a directory not owned by the user running it. And this is how the --user option of pip was born. A history of kludges and bad fixes for a self-inflicted problem.
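For reference, byte-compiling at install time is a one-liner with the standard library; done up front, it's the step that removes the need for the interpreter to write into __pycache__ at runtime. Run it against whatever directory you install into (compiling a root-owned platlib obviously needs matching privileges):

    # Walk an install directory and write the .pyc files ahead of time.
    import compileall
    import sysconfig

    target = sysconfig.get_paths()["platlib"]  # or a venv's site-packages
    compileall.compile_dir(target, quiet=1, workers=0)  # workers=0 -> use all CPUs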
So... the advice not to put byte-compiled Python code in wheels is what tripped up the poor unwitting Python programmers: had they put the byte-compiled files in there, the problem would've gone away, and they could happily have installed and run their programs in a simple and straightforward way. Today, the number of kludges around this problem has grown by a lot, and simply reversing this advice will not work, but that isn't the point.
Anyway, the motivation for this advice? -- Premature optimization. The authors of the wheel format decided to "save space" for programmers publishing their packages. In their mind, if someone published a binary package but with... sources in it instead of binaries, that package could apply to more than one combination of OS/architecture/Python version. A huge improvement, considering most Python packages are hundreds of kilobytes big! Not to mention that anyone who ships native libraries with their Python packages has to package them for all those combinations anyway. And that's like half of all the useful Python packages.
This is what should've been done instead: copy from Java JARs. Have packages with byte-compiled code, have them used for deployment, and have source packages for those who want editor IntelliSense etc.
This story is just a drop in a bucket of all the bad decisions made when designing the Wheel format, but for the lack of space and time I will not go further into details.
Similarly, I will only touch on some problems with installation, just to give an example. Python allows specifying an arbitrary URL as a package dependency. This makes auditing, or even ensuring stable builds, a huge issue. I.e. you might think that all the packages you are installing come from the PyPI index (unless you configured pip to use something else)... but it's possible that some dependency specifies a URL that happened to be convenient for the author submitting the package at the time.
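That mechanism is the PEP 508 "direct reference": the requirement string itself carries a URL that bypasses whatever index you configured. A parsed example (URL made up), using the `packaging` library:

    from packaging.requirements import Requirement

    req = Requirement("somepkg @ https://files.example.com/somepkg-1.0-py3-none-any.whl")
    print(req.name)  # somepkg
    print(req.url)   # the arbitrary URL the installer would fetch from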
Another problem is what happens when a package for the desired combination of OS/architecture/Python version doesn't exist in the index known to pip: in this case, instead of failing, pip will download the source archive and try to build the package locally. This means that users get a build of the package that the authors are guaranteed never to have even run... And, unfortunately, quite often this process "succeeds" in the sense that some package is produced and installed. Infrequently, but still often enough to be a problem, such packages will have bugs related to API version mismatches, and some of those problems get swept under the rug. I've personally encountered a bug resulting from this situation where some NumPy arrays were assumed to have 16-bit integers but in fact had 32-bit integers. Those arrays represented channels in ECG data (readings from electrodes attached to the patient's scalp). The research was done and the paper was published before this hilarity came to light. (You may say that ECG is a borderline scam anyway, but still...)
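One practical guard against that silent build-from-source fallback is pip's --only-binary option, which makes the install fail loudly instead of compiling an sdist; shown here driven from Python, with a placeholder package name:

    import subprocess
    import sys

    # Refuse anything that isn't a prebuilt wheel for this platform.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--only-binary", ":all:", "numpy"],
        check=True,
    )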
Now, to the last part: the module loading. The whole reason why Python came up with the kludge of virtual environment is due to how modules are loaded. Python source doesn't have a way of specifying package version when requesting to import a module. Therefore, if more than one version of a package is found on sys.path, there's no telling what will be loaded.
What should've been done instead of virtual environments: Python modules should only have been loaded from platlib (not from the project source directory, as is often done during development). When loading modules, the package's dist-info directory should be examined and the dependencies from the METADATA file parsed. These dependencies would then be kept in memory and refined every time a new module-loading request is made, narrowing down the selection of versions that could be imported. This would allow Python to install and use multiple versions of the same package in the same Python installation without conflicts. This might not be a huge deal for developers working on their (single) projects, but it would be huge for developers packaging their projects for system use. Today, many Python-based projects available on Linux are packaged each with their entire virtual environment, and often even with the Python interpreter and a bunch of accompanying libraries. I.e. in order to install a project that has single-digit megabytes of useful code, often hundreds of megabytes of duplicated code are pulled into the system.
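The dependency information referred to here is already recorded per installed distribution and readable at runtime via importlib.metadata; what the import system doesn't do is use it to choose between co-installed versions. A small look at what's available:

    from importlib import metadata

    dist = metadata.distribution("pip")  # any installed package works here
    print(dist.version)
    print(dist.requires or [])           # the Requires-Dist entries from its METADATA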
As for the money part: dealing with Python problems pays the bills! :) I even sometimes get my name on scientific publications because that's often the kind of projects I have to deal with. So, I have both fame and compensation in good order (at least for now). But thank you for suggestion!
I frankly don’t give a fuck about PyPA at this point. They’ve proven themselves to be largely irrelevant in this context. Anything they get their hands on just seems to rot, so I’d unfortunately like them to stay far away from this. It pains me to say this. It’s against my philosophy. But it’s merely a reflection of the current state of things. I’ve been working with Python full-time for a decade. Other packaging ecosystems have essentially lapped Python’s.
At this point I also don’t care about nuances like “is it execution issues with PyPA, or that their set remit is faulty?” I’m sick of getting drawn into that stuff, too.
Armin advocates for 'uv' to dominate the space, but acknowledges it could be rug-pulled due to its VC backing. His solution to this potential issue is that it's "very forkable." But doesn't forking inherently lead to further fragmentation, the very problem he wants to solve?
Any tool hoping to dominate the Python packaging landscape must be community-driven and community-controlled, IMO.
Forking doesn't inherently lead to further fragmentation: the level of fragmentation post forking can still be much lower than before consolidating on the rug-pulled tool
(also, how many more decades does this imaginary community need to create a great dominant tool?)
I was looking this morning at migrating our software from poetry to uv at my company, due to poetry's slowness. So far I've been reading a lot of docs and not getting a lot done. I did the previous migration to poetry as well, which was vastly simpler. So far it seems that poetry tried to make a simple package manager that works like any other, while uv keeps quite a bit of the Python packaging insanity around.
At least uv doesn't lead to an absolute clusterfuck of poetry conflicting with virtualenv. Or the pyproject.toml format breaking with minor poetry changes. Or the amazingly dumb "sources", which don't work for transitive dependencies and lead to even longer resolve times when multiple indexes are involved.
How does it conflict with virtualenv? Last I checked, its default behavior was to automagically create a virtualenv for you if you ran it from the system Python (a bit weird but ok), or, if you were in a virtualenv, it would use it.
For our projects we use pyenv-virtualenv (so we can have specific python versions per project) and then poetry "just works" (though can be slow, hence rye, uv and friends).
Probably the entirety of the current python package management ecosystem. From the outside, it all looks insane but I suppose most users are Stockholm-syndrome-d into thinking everything is fine.
For Python, specifically? As in, pyproject.toml-compatible tooling, with virtual environments and the like? I've seen Nix being used for compiled languages, and maybe I'm missing something, but I haven't really seen it used to manage real-world Python projects. How do I manage a modern Python project with Nix? How do I publish one to PyPI?
Agree. Don't think gp has any idea what they're saying. Using nix to install python dependencies is a disaster, it's only good for the python versions themselves.
Works extremely well. Just don't expect to do version soup of random versions of everything. nixpkgs provides one of each that are chosen and known to work together.
> it's only good for the python versions themselves
Versioning Python isn't hard. pyenv, asdf, mise, now uv... I honestly don't see what Nix brings to the Python ecosystem. I can see using it to version Python if you already use it, but that's it.
What Nix can bring to the Python ecosystem is the ability to actually manage all of the non-Python dependencies that drive most useful Python code, and to install them in a portable way.
On my team we use Nix for actually distributing our Python programs, both directly onto machines for local development, and to build containers for deployment in the cloud. We use Poetry for development and generate the Nix package from our pyproject.toml.
Actually plugging Python into a general-purpose package manager for native dependencies is admittedly a pretty clunky experience today because Python packages lack sufficient metadata and packaging formats within the ecosystem are so fragmented. But with a sane implementation of something like PEP-725, that could actually make that pretty smooth, including for system package managers other than Nix if that's not your cup of tea.
I really like this framing - lots of incremental work by lots of people over time got us to the point where ~a few people at one company can radically improve the situation with a medium amount of work.
The churn is interesting. In 2019, I made a Python version manager and dependency manager written in Rust. I gave up after it seemed like no one wanted to use it. Everyone not satisfied with pip was on Poetry or Pipenv; I made mine because they both had user-interface problems of the sort I would run into immediately. (I believe Poetry would default to Python 2 and not give you a choice by default, or something to that effect.) Now there is a new batch.
The biggest challenge was dealing with older packages that used non-standard packaging and ran arbitrary code; generally ones that didn't have wheels.
From the article:
> As of the most recent release, uv also gained a lot of functionality that previously required Rye such as manipulating pyproject.toml files, workspace support, local package references and script installation. It now also can manage Python installations for you so it's getting much closer.
These are all things that the dead project I wrote could do.
The harsh reality is that this sort of tooling requires a lot of publicity to be successful. Nobody is going to try an unknown package manager from some random developer, but people will enthusiastically adopt anything pushed by a "famous" developer like Ronacher.
TY to the astral team for making my quality of life so much better and Armin for being brave enough to pass the torch.
Strong +1 on a one tool wins approach - I am so tired of burning time on local dev setup, everything from managing a monorepo (many packages that import into each other) to virtual environments and PYTHONPATH (I’ve been at it for like 8 years now and I still can’t grok how to “just avoid” issues like those across all pkg managers, woof!)
I am really excited to see what’s next. Especially looking for a mypy replacement and perhaps something that gives compiling python a “native” feeling thing we can do
Those people seem to have a passion for developing package managers (instead of just seeing it as a tool that needs to do the job), and as long as it is the case, I don't see how we wouldn't end up with one new package manager every year.
Rye was started by Armin as a collection of existing tools (mostly) with a frontend similar to cargo.
Then Astral came out with uv which aims to be a frontend into a collection of their own tools similar to cargo.
Armin and Astral agreed for Astral to take over Rye some time during uv development with (I assume) the goal for uv to fully replace Rye.
Use uv. As of 0.3.0 it covers most of rye now anyway. Especially if you’re writing projects and not consumable libs/apps (I haven’t used uv for anything other than package management so far).
With declarative tooling - unlike setup.py - we've lost the ability to install to locations not managed by Python. Most of my projects used to have a Python-code config file that I could include both in my code and in setup.py. Last I checked, none of that is possible now. Want to install a system binary? Ship an RPM or a flatpak.
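For anyone who never used it, this is the kind of thing an imperative setup.py let you do; the names and paths below are hypothetical:

    # Legacy setuptools: data_files could drop arbitrary files into system
    # locations at install time, alongside the Python package itself.
    from setuptools import setup, find_packages

    setup(
        name="mytool",
        version="1.0",
        packages=find_packages(),
        data_files=[
            ("/etc/mytool", ["conf/mytool.conf"]),
            ("/usr/lib/systemd/system", ["packaging/mytool.service"]),
        ],
    )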
I don't understand how we could lose so much flexibility and yet gain so little in return.
P.S I've only ever encountered minor dependency issues in my admittedly small projects using just pip and venv.
That's entirely appropriate imo - the system packager should be running pip or whatever and then putting the output where it wants; it's not for setuptools or whatever to declare that.
What path would you put in setup.py anyway? A different one for different distro preferences, a different one again for Windows, for macOS?
I'm kind of interested in this space -- can anyone point me at an article that goes over why this is harder for python than it seems to be for, e.g., ruby? Is there something inherent about the way code is imported in python that makes it less tractable? Or is it just that the python world has never quite all come together on something that works well enough?
(Note that I can certainly complain about how `bundler` works in ruby, but these discussions in python-land seem to go way beyond my quibbles with the ruby ecosystem)
The big problem is that Python is used in a lot of different contexts and has too many valid ways in which people are able to build, compile, release non-Pythonic extensions in their package. Original way for packaging code was effectively an executable script which made some things easy, but other things much harder. Modern efforts are trying to limit the flexibility of the package definitions without breaking all of this legacy code. I unfairly think of Ruby as only for Ruby on Rails, so only ever dealing with the web domain.
Here is a story about the nightmare it takes to compile the Fortran code that supports much of SciPy (backbone numerical computing library with algorithms for wide swaths of disciplines)
https://news.ycombinator.com/item?id=38196412
It's not. Just that Guido never cared about packaging, so it was left to a ragtag unpaid motley crew to piece together and later learn from industry practices that solidified a decade or two after they started.
Python's import system, as it is, makes some things slightly more complicated due to the need to use virtual environments (copies of the interpreter with separate library paths) as opposed to something like node_modules. This could easily be changed, if the powers-that-be actually cared about packaging. Packaging is handled by a separate group, which likes design by committee, and which enables the creation of separate third party tools instead of standardising on one good tool (uv, rye, and poetry, are all popular tools made by people who are explicitly not members of the packaging group).
Looking at the uv docs, it seems it doesn't support conda packages. That's a non-starter for us, as we need both conda and PyPI (some packages are in one, some in the other). So we'd probably look at pixi as a possible replacement for conda.
So uv and pixi are both nice in that they provide per-project environments (though using conda envs works just fine for us), but they don't solve the actual Python packaging problem in that they still depend on either PyPI or conda packages (and neither supports both).
Yes. They are tools to manage the user side of packaging (i.e. installing, managing versions, locking, environments) but can't do anything to fix the problematic ecosystem. That is a much harder problem to solve.
Is there any chance that computer scientists will analyse the software distribution situation across several language ecosystems and finally find a general solution, so that we can stop wasting so much time on these things?
It feels like we've been driving cars for 50 years and still haven't figured out a way to distribute gas.
Is there any research going on? The situation is totally crazy, especially for python.
I would like to see this done by top scientists. I would love to never have to spend any time again on the newest packaging tool.
What is the core of this problem and why is it not solved?
The linked post is the author of Rye's take on that.