How to improve Python packaging (chriswarrick.com)
292 points by Kwpolska on Jan 15, 2023 | 200 comments



This post is prompted by the survey of Python users, their feedback, and a current thread [0] on the Python forums discussing a way forward. I read the thread the other day; it's long, and there are a lot of opinions.

Towards the end of this post there is an interesting observation:

> Discourse, the platform that the discussion was held on, shows the number of times a link was clicked. Granted, this count might not be always accurate, but if we assume it is, the link to the results summary was clicked only 14 times (as of 2023-01-14 21:20 UTC). The discussion has 28 participants and 2.2k views. If we believe the link click counter, half of the discussion participants did not even bother reading what the people think.

Along with that, my concern is that there are too many cooks in the kitchen. A packaging system is never going to cater for more than 90% of use cases, but there will likely be a higher than average representation from people who would sit in the 10%. There is a danger of them never being able to agree on a solution that solves all problems, and then never launching anything.

It would be far better to get something out, have a stake in the ground, and build from there. It also desperately needs to be "Core Python"; my understanding is that the PyPA (Python Packaging Authority) is somewhat disconnected from core Python development.

Ultimately they need one system that is front and centre, that is the "officially endorsed" tool, and very clear messaging on that.

0: https://discuss.python.org/t/python-packaging-strategy-discu...


> Ultimately they need one system that is front and centre, that is the "officially endorsed" tool, and very clear messaging on that.

I think this is important; otherwise there is a risk that people switch to a new packaging tool and it is then subject to neglect/lack of resources - that makes people averse to switching to it in the first place. That is why virtualenv/pip is the lowest common denominator - everyone knows that, worst case, those will always continue to work. The "official" tool needs to inspire that sort of confidence.


Yes, there was some questionable comms around pipenv being "officially recommended", which upon closer inspection seemed to have come from the people who wrote it and not really been that official so far as I could see! That seems to have been walked back after a while, but not before it gained traction, and randoms bring up that it's official even now.


I really hate to be this “inside baseball” and negative about it all, but I think that the Kenneth Reitz factor is…unique (within the Python community), and uniquely dangerous in these situations. I am so deeply fed up with the Python packaging experience. At this point, both my eyes are set on the fastest way to some sort of standardisation and improvement. It’s because of this that I want to clearly acknowledge that IMO over the last 5 years we were thrown off course in no small part by one person’s lust for personal notoriety when the community clearly needed to bring their heads together on a solution. Pipenv was touted as the be-all end-all way too quickly, as you said not necessarily by ‘official sources’, and those shaky years left everyone a bit battered.

When I did my every-12-months check-in to see how Python dependency management was going, I sure enough saw that Pipenv is ‘under’ PyPA now. I then learned that a PyPA endorsement or association doesn’t mean a HUGE amount, for the reasons that you and others have already noted. Sure enough, PyPA’s own packaging documentation is hesitant to strongly prefer a front-end tool, let alone any other part of the stack.

A significantly improved Python dependency management story could very well include Pipenv on the front end for all I care. But not acknowledging the elephant in the room risks repeating past mistakes. Lest we end up with more of the celebrity developer culture we see in communities like JS, but without the seemingly limitless effort and resources.

None of this is to understate the technical challenges, especially taking into account pulling together the current fragmented ecosystem.


> there will likely be a higher than average representation from people who would sit in the 10%

And also an under-representation of "average users".

Python the language has always benefitted from its relative simplicity, which I attribute to people like GVR saying "no" to specialized features that accrue on languages like barnacles on a ship (looking at you C++).

With newer languages we see core design teams being much more opinionated about tooling for build and dependency/packaging management. This is no doubt in response to the dumpster fires of C++ and Python, where "the ecosystem will sort itself out" clearly hasn't worked.


I have known Python since version 1.6; its simplicity is deceptive. The language is as rich as C++; beginners only think otherwise because it looks simple on the surface.

When one starts exploring the standard library, everything that is exposed in meta-programming and runtime internals, or the usual language changes even across minor versions, the picture changes dramatically.

Once upon a time I devoured the release notes for each release; eventually I lost track of the changes.


A few years ago, I worked on a (company-strategic) experimental project that used Python. Apparently, it only worked on some specific versions of Python, due to changes in the internal order of sets, a fact that was of course not documented or checked in the tool.

Not the reason for which the project failed, but the weeks lost trying to replicate results between teams certainly didn't help.

Python is very readable, but I agree that the stdlib is anything but simple.


I always like to say that, barring divine inspiration, it's impossible to make the right choice. It's far more possible to make a choice and then work hard to make it right.


I think you're right. At the risk of making a "lowbrow dismissal", the very length of TFA is a clear symptom of the problem: design-by-committee. (The PyPA (Python Packaging Authority) reminds me of the movie Brazil. Don't get me started.)

- - - -

Someone should do for packaging what pathlib did for file I/O: make a Pythonic model/API for it. Then deployment becomes simple, scriptable, testable, repeatable, etc...


This might be a fun thought to entertain, but the PyPA doesn't really form design committees (at least, I haven't been on one). You can see exactly how Python packaging standards are made: they're done with PEPs[1], exactly the same as with every other Python standardization effort.

Indeed, most packaging PEPs start out exactly the way you've laid out: a tool or service writes a Pythonic API or model, and it gets standardized so that other tools can rely on it. TFA's problems (which are real ones!) stem mostly from the era before there were serious standardization efforts among Python packaging tools.

[1]: https://peps.python.org/topic/packaging/


> Don't get me started.


Thanks, this is an interesting nugget. Did you intend to append a link reference for [0]?


Yes, edited. Thanks.


Over time, I've grown to appreciate "pip-tools." Since it's a dead-simple extension of pip, I wish it could be upstreamed into pip itself; that seems like the most straightforward way of fixing a number of Python's packaging issues.


I think a lot of people will wrinkle their nose at pip-tools because it's a more manual workflow, but I really enjoy that aspect of it. Other package managers I've used are convenient at first, but at some point I end up fighting with them and they won't give ground. With pip-tools, at the end of the day I'm in charge.

Plus I really like that the end result is a requirements.txt that can be used with any plain Python+pip environment.
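
For anyone who hasn't tried it, the loop is roughly this (a minimal sketch; the file names are just the usual conventions):

    # requirements.in holds only your top-level deps, one per line
    echo "requests" >> requirements.in
    pip-compile requirements.in    # resolves and writes a fully pinned requirements.txt
    pip-sync requirements.txt      # makes the active venv match the pins exactly

    # any plain Python+pip environment can consume the same file:
    pip install -r requirements.txt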


Yup. I like to make it so that each executable has a symlink to a shell script.

The shell script checks if there’s a virtual environment set up with the packages in the requirements.txt installed (it takes a snapshot of the file because virtualenv doesn’t have a DB to query cheaply). Once the environment is set up, it dispatches to running from the virtualenv.

That way when you update a requirements.in file, it recompiles it (if the txt is out of date), installs any new packages, removes packages that shouldn’t be there anymore, and updates ones whose version changed (if any changes are found). It also lets you trivially run tools with disparate requirements.in files without conflicts, because each is siloed behind its own virtualenv.

This makes it a trivial experience to use these tools in a shared repo because there’s no worrying about packages / needing to remember to run some command before the right environment is set up. You just modify your code and run it like a regular command-line tool and packages automatically get deployed. It’s also amenable to offline snapshotting / distribution. In fact, I used this to distribute support tooling for factory lines of the Pixel Buds and it worked extremely well.
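
For the curious, a rough sketch of what that wrapper looks like (file names like tool.py, requirements.in, and .venv are placeholders here; the real thing dispatches on the symlink name and handles more edge cases):

    #!/usr/bin/env bash
    set -euo pipefail
    # Resolve the symlink back to the real script's directory (GNU readlink).
    here="$(cd "$(dirname "$(readlink -f "$0")")" && pwd)"
    venv="$here/.venv"
    stamp="$venv/requirements.sha256"

    # First run: create the venv and install pip-tools into it.
    [ -x "$venv/bin/python" ] || python3 -m venv "$venv"
    [ -x "$venv/bin/pip-sync" ] || "$venv/bin/pip" install -q pip-tools

    # Recompile the lock file only if the top-level deps changed.
    if [ "$here/requirements.in" -nt "$here/requirements.txt" ]; then
        "$venv/bin/pip-compile" -q -o "$here/requirements.txt" "$here/requirements.in"
    fi

    # Install/remove/upgrade packages only if the lock file changed.
    current="$(sha256sum "$here/requirements.txt")"
    if [ ! -f "$stamp" ] || [ "$current" != "$(cat "$stamp")" ]; then
        "$venv/bin/pip-sync" "$here/requirements.txt"
        echo "$current" > "$stamp"
    fi

    exec "$venv/bin/python" "$here/tool.py" "$@"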


You’ve described a worse, homegrown version of poetry.


That's pretty rude and ungenerous considering that I did this in 2015 before poetry even existed. Also, I could be wrong, but briefly taking a look, it seems like it still has the problem that it doesn't automatically update your virtualenv when the dependencies for an executable have changed (i.e. someone merges in code that uses a new dependency, you still have to know to run some commands to update your own local virtualenv).


Yes. It usually gets glossed over, but if you're building a Python application it's all you need.


I work on multirepos and I really dislike git's subtree/subrepo, so I use https://pypi.org/project/zc.buildout/. Yes, I know I can do [1]. But editing multiple repos at the same time is something I can only do with zc.buildout [2]. Still not perfect, but it does the job.

[1]

   $ cat req.txt
   requests
   git+ssh://git@gitlab.com/foo/bar/amber.git@0.4.0
   git+ssh://git@gitlab.com/foo/bar/rebam.git@0.4.0
[2]

   $ cat buildout.cfg
   [buildout]
   extends = versions.cfg
   extensions = mr.developer
   auto-checkout = *
   develop = .
   show-picked-versions = true
   update-versions-file = versions.cfg
   sources-dir = git-sources

   parts = py

   eggs = 
       amber
       rebam
       hammer

   [sources]
   amber = git git@gitlab.com/foo/bar/amber.git@0.4.0
   rebam = git git@gitlab.com/foo/bar/rebam.git@0.4.0

   [py]
   recipe = zc.recipe.egg
   eggs =
       ${buildout:eggs}
   interpreter = py-backend
   dependent-scripts = true


Ah, I haven't used buildout in years (I remember using it a lot when working with Plone).

I used it for a personal project but gave up a few years ago as some piece of the puzzle seemed broken and abandoned (something hadn't been updated to use a newer version of TLS or something).

I liked buildout though - it was a good system with its bin directory.


I've pretty much settled on poetry for most things at this point.

It still has a ton of rough edges, like error messages that are closer to a stack trace than to something actionable, and it's slow as hell at resolving, but it does the job 90 percent of the time without issue.

Also running everything in docker now anyway which side steps some other problems.


For anyone curious, the reasoning for the slowness is briefly described here: https://python-poetry.org/docs/faq/#why-is-the-dependency-re...


That's where we are as well, but yes, the problems you didn't sidestep with poetry+docker are still there: the PyPI ecosystem out there does not stop quantum fluctuations just because you want it to. If you pin too much you're not getting security fixes, and if you don't pin enough, you get bitten every time a tertiary dep you never asked for changes. Oh yeah, and there are now trojans upstream almost daily.


Yeah but that trade off is entirely separate.

We trade security for speed (or "velocity" if you want to be jargon about it).

I just pin everything and go through my projects every couple weeks and bump the deps (unless some really big CVE hits the news).


Does any FOSS package ecosystem have a good solution to this particular set of problems?

I know Go has historically done something interesting with selecting the "minimum viable version number" for each dependency, but that's the only relevant idea that comes to mind.

With PyPI, at least it's relatively trivial to self-host a PyPI instance and self-manage dependency updates.

In the Python ecosystem (vs e.g. Node), at least the total set of first+second-order dependencies for most normal projects is quite small (e.g. medium double digits to low triple digits). It doesn't feel too painful to manage that magnitude of upgrades on a monthly or quarterly basis.


In the JVM ecosystem you get the version your immediate dependency was built/released against, transitive dependencies are never silently upgraded (though if you want to upgrade then it's one command). IME that's a much better default; it does mean that when a security issue is found in a low-level library you get everyone doing a bunch of releases that just bump their dependencies, but the consistent behaviour is very much worth it.


Poetry still doesn't seem to support PEP621, instead requiring its custom, vendor-specific shape of pyproject.toml.


Have they spoken about why they don't support it? Or plans to support it in the future?

It seems like the shortest path to a universal package management tool for Python is to add this capability to Poetry, rather than to build something new.



It's not very well documented, but the PyPA tools do provide a unified experience when used correctly.

Here's a PyPA project (FD: one I work on) that uses a single pyproject.toml to handle all aspects of packaging (and most non-packaging tool configuration, to boot)[1]. With a single file like that, the only thing you need to do to start a local development environment is:

    python -m venv env && . env/bin/activate
    python -m pip install .[dev]
(We provide a `make dev` target that does that for you as well.)

Similarly, to build distributions, all you need is `build`:

    python -m build
[1]: https://github.com/pypa/pip-audit


Ultimately it needs to be "Python.org" that endorses the tool, not the PyPA; no one, in the grand scheme of things, knows who the PyPA are or whether theirs is the "one true way".

If you go to Python.org and follow through to the beginners guide [0] this is what's suggested:

> There are several methods to install additional Python packages:

> Packages can be installed via the standard Python distutils mode (python setup.py install).

> Many packages can also be installed via the setuptools extension or pip wrapper, see https://pip.pypa.io/.

That is so out of date and fundamentally confuses people coming to Python for the first time. How is pip secondary, and why is there no mention of venv?

The PyPA need to get buy in from Python Core to put one tool front and centre first. It needs to be like Rust with Cargo, literally the first thing you learn to use and core to all beginners guides.

That's not to diminish the work of PyPA, you are all amazing I just want your work to be more obvious!

0: https://docs.python.org/3/using/mac.html#installing-addition...


The future is here, it’s just inconsistently distributed. There’s also the problem of python.org not wanting to pick winners, i.e. not promoting activestate vs conda vs pypa.

https://packaging.python.org/en/latest/tutorials/installing-...

https://packaging.python.org/en/latest/tutorials/packaging-p...


Which is just, utterly utterly ridiculous. The Python packaging story will continue to be a mess right up until this particular inconsistency is resolved.


Nowhere in the PyPA documentation is your simple workflow described or mentioned. Instead it's a jumble of links to a myriad of tools including hatchling, flit, pdm, etc., and basically just a shoulder-shrug, 'I don't know, figure it all out yourself' message. This article makes a great point that the current PyPA 'guidance' is too confusing and vague for actual end users (i.e. people who don't work directly on the PyPA or haven't been studying Python packaging for decades).


I agree that it's confusing. That being said, there is an official PyPA tutorial that goes through the exact steps needed to produce the commands I suggested, so I don't think it's accurate to say that it's nowhere to be found[1].

Edit: Unfortunately it's easy to confuse the above tutorial with this one[2], which is specifically for setuptools. So I can appreciate end user confusion around the documentation, particularly in that respect!

[1]: https://packaging.python.org/en/latest/tutorials/packaging-p...

[2]: https://packaging.python.org/en/latest/guides/distributing-p...


No, I'm looking at link 1 and specifically have an issue with the huge table of options for how to set up your pyproject.toml--it has tabs for hatchling, flit, pdm, etc. but _zero_ description of why I would want to use one of those tools. It's just more confusing, like you're being thrown a bag of parts with no description of what to build or how to do it.

To be honest that entire pypa doc should be like two paragraphs long instead of 1000+ words. It should be basically, "ok to package your python app just run <official python tool to init a package>, that's it you're done!". Every decision should be made for me, like it is with npm, cargo, etc. I shouldn't have to think beyond running one command.

That's what python end users need. We don't need a million tools and huge docs with seemingly no vision or goal. We need to get shit done and the less time we faff about with packaging and tooling the better.


Yep. There being no documentation to really empower someone to work out what to use, to me very directly says “there’s some political spat going on that I’m being exposed to here”.


That "political spat" was actually targeted harassment towards a former PyPA member, the last time someone actively working on packaging.python.org tried to be remotely opinionated. Since this ended with them stepping away and no one else has volunteered, it remains unopinionated.


Sounds like the steering council or whatever that runs python nowadays needs to step up. If people are being targeted for trying to improve packaging then the steering council should make a decision and direct all commentary, etc. to them, not individuals documenting best practices.

The whole 'let the community figure it out' seems to have failed and is causing nothing but more confusion and now attacks on people. The council needs to step up and say, "this is how python packaging will work, period. End of story, end of debate. There is no more discussion on this, the decision is final. All other python packaging tools now are non-standard and not recommended for use anymore".

That was the one good thing a BDFL model for leadership could achieve, making a hard decision in the face of many strong opinions.


Edit: My sibling comment has another answer, which makes me believe I'm lacking context. So I've removed this comment, in the interest of not offering opinions outside of what I know.


A few weeks ago, I was attempting to help introduce someone to Python. I felt embarrassed at trying to paper-over how complicated and confusing the initial configuration is for a newbie. No officially endorsed best-practices to which I could point. Half a dozen tribes with their own competing solutions and guidance.

A definitive statement from Python.org as to The Way would go so far.


AFAIK in the Python 2 era it was the de facto standard (just swap "python -m venv" with "virtualenv"). All these new tools have just made it more complicated, and it's not clear to me what they gain; it seems to me they've simply succeeded in convincing people it's complicated by obscuring what's actually going on.
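
For reference, the whole "standard" workflow back then was roughly this (a sketch, using the modern venv module):

    python3 -m venv .venv     # used to be: virtualenv .venv
    . .venv/bin/activate
    pip install -r requirements.txt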


Correct. And most people back then were just using `pip install -r requirements.txt`, including different files for dev/test/prod as needed.

I worked at one company that used buildout back then, which seemed relatively rare, but buildout was more flexible and allowed you to do things like installing system packages or running your own scripts. We used it to pull down config files that weren't checked into the repo but that we still wanted to share, to help set up the local developer environment, and a few other things.


This is the first time I've heard of the PyPA, and looking at their website [1]: it is not a good sign if a packaging authority has a broken logo link on its main page [2].

[1]: https://www.pypa.io/ [2]: https://pypi.org/static/images/logo-large.svg/


So, the project you linked has a pyproject.toml file that uses flit:

    requires = ["flit_core >=3.2,<4"]
    build-backend = "flit_core.buildapi"
Python.org defaults to using hatchling in the docs[1].

PyPA has a sample repository using setuptools[2]. This project is linked from Python.org on this page.[3]

Python.org recommends using setuptools under packaging recommendations as well, with no mention of hatchling or flit even on that page.[4]

No wonder so many devs are frustrated about this. I've been writing mostly Python for work and fun the past 11 years and I'm f*cking confused what tools to use. When I start a new side project I don't know whether a tool is going to exist and still work in a few years, because there's 10 tools trying to do the same things and it's not clear what the community is actually getting behind.

[1]: https://packaging.python.org/en/latest/tutorials/packaging-p...

[2]: https://github.com/pypa/sampleproject/blob/main/pyproject.to...

[3]: https://packaging.python.org/en/latest/guides/distributing-p...

[4]: https://packaging.python.org/en/latest/guides/tool-recommend...


That's still about 4 times as much command as I'd consider normal/reasonable, compared to say "mvn install".


That repository doesn't seem to pin dependency versions. How do you integrate that in this workflow?


You could use `pip-compile` if you want full pinning. That's what we do on another project -- we use GitHub Actions with `pip-compile` to provide a fully frozen copy of the dependency tree for users who'd like that[1].

In the context of `pip-audit`, that makes a little less sense: most of our dependencies are semantically versioned, and we'd rather users receive patches and fixes to our subdependencies automatically, rather than having to wait for us to release a corresponding fix version. Similarly, we expect users to install `pip-audit` into pre-existing virtual environments, meaning that excessive pinning will produce overly conservative dependency conflict errors.

[1]: https://github.com/sigstore/sigstore-python/tree/main/instal...


Or if you don't want to install something else and are willing to just use version numbers (instead of also hashes like pip-compile in that link), "pip freeze" is built in.


The tricky thing with `pip freeze` is that it dumps your environment, not your resolved set: your environment also contains things like your `pip` and `setuptools` versions, any development tooling you have, and potentially your global environmental state (if the environment has global access and you forget to pass `--local` to `pip freeze`).

In other words, it's generally a superset of the resolutions collected by `pip-compile`. This may or may not be what you want, or what your users expect!


By default (at least in python3 nowadays) it also excludes pip, setuptools, distribute, and wheel; you need "--all" to include them in the output.


Oh, that's news to me! I stand corrected.

(The point about other development tooling is, I believe, still accurate -- if you e.g. have `black` installed, `pip freeze` will show it.)


Which is a huge limitation of many of the other tools. I have some beef with poetry, but it did at least get one thing correct: differentiating between the code required libraries and the development tooling (pytest, black, etc). There are hacky workarounds to accomplish this with other tools, but codifying this differentiation is incredibly valuable.


With pip-tools you can use a requirements.txt for production and a requirements-dev.txt for your development environment. The latter imports the former.
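
One way to lay that out (a sketch; pip-tools also supports constraining with -c instead of including with -r):

    $ cat requirements-dev.in
    -r requirements.txt    # "imports" the production pins
    pytest
    black

    $ pip-compile requirements-dev.in    # writes a pinned requirements-dev.txt
    $ pip-sync requirements-dev.txt      # dev env = production pins + dev tooling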


where is the

    install .[dev]
format/syntax defined? I was trying to find out what was possible and how to make sense of it in setup.cfg terms.

I found one mention in the docs but no more.


PEP-508[0] explains the grammar for "extras":

> Optional components of a distribution may be specified using the extras field:

  identifier_end = letterOrDigit | (('-' | '_' | '.' )* letterOrDigit)
  identifier    = letterOrDigit identifier_end*
  name          = identifier
  extras_list   = identifier (wsp* ',' wsp* identifier)*
  extras        = '[' wsp* extras_list? wsp* ']'
as well as explaining their behavior, albeit briefly:

> Extras union in the dependencies they define with the dependencies of the distribution they are attached to.

The resolution on . is explained by the pip documentation[1]:

> pip looks for packages in a number of places: on PyPI (if not disabled via --no-index), in the local filesystem, and in any additional repositories specified via --find-links or --index-url. There is no ordering in the locations that are searched. Rather they are all checked, and the “best” match for the requirements (in terms of version number - see PEP 440 for details) is selected.

[0]: https://peps.python.org/pep-0508/#grammar

[1]: https://pip.pypa.io/en/stable/cli/pip_install/#finding-packa...


you may have done

   pip install jupyter[notebook]
this is the same thing, just for . (the current directory) and the dev variant.


Yep. The key difference is that the former is specified in PEP 508, while `.` and its variants are a convenience feature that pip (and maybe some other tools) provide.


Yeah, but I'm missing data on how the variant interacts with the various requirements defined in setup.cfg (testing dependencies, for instance).


It’s a pip-ism; as far as I know, it’s not defined in any PEP. It should be in their documentation, however.


gonna dig deeper there then, thanks


Because of the insanity of python I run everything in Docker (through compose). No more issues with it not working on some dev's computer because of a missing wheel, or that they need to have X and Y c++ toolchain packages installed locally etc. No more trying to fix a broken setup after upgrading python or poetry versions, just "docker compose build" and you're up and running. No spending days getting a freshly cloned project up and running.

Then I point Pycharm to use that as my remote interpreter through docker compose.

Not watertight, but a hundred times better dev experience.


When the solution to basic problem like this is "use Docker", you realise how deeply flawed modern software development (in Python, at least) is.


Not sure I agree with "deeply flawed" verdict.

The flaws of python's packaging system (as well as the GIL and other hangups) emerge from the tradeoffs of inventing a language optimised for reusing and connecting as many disparate system binaries as possible.

It's not surprising that such a server scripting language is tightly coupled with the server environment that it runs in. Docker is just one way to ensure a reproducible server environment across machines.

What do you think it could have done differently?


What could have been done differently:

- Make dependency resolution deterministic by default. Unfortunately the whole ecosystem has to buy into it, but the benefits are huge.

- Stop building castles of sand by making more layers of tooling that have to run in Python (and will therefore run in some random poorly managed Python). The language install can't manage the build system install - if anything the build system install should manage the language install. Adding pip to the language distribution was such a backwards decision that it marks the point where I gave up on Python ever fixing their stuff.

- Ignore Linux system package managers (apt etc.), they have a fundamentally broken model and will infect your language ecosystem with that breakage if you try to cater to them.


to be fair to Python, almost every ecosystem is in "use Docker" land, even Go binaries will offer docker images!


I've tried this a couple of times but just generally not liked the experience and gone back. When you get a shell into the container so you can use your project's python env, the tools you have installed on your machine (bashrc, rg, aliases) aren't available. I ultimately prefer the hassle of managing environments natively over forfeiting the use of my tools or juggling shells.


I secretly believe Docker only exists because of how f'up Python's distribution story is.


I also have often wondered how much Docker owes its success to the shortcomings of python.


.deb packages…


On that note, why are we trying to use language specific packaging tools to build packages rather than just building OS specific packages where dependencies are handled by apt for deb packages or dnf for rpm packages?

These are language agnostic and can get the job done.


Because I have dozens of colleagues, each with their own os/distribution, developing an application that needs to lock down dot releases of dependencies to the versions running in production and update them at the same time. There is no way to do that using distribution specific tools without going insane.


>> why are we trying to use language specific packaging tools to build packages rather than just building OS specific packages where dependencies are handled by apt for deb packages or dnf for rpm packages?

>>

>> These are language agnostic and can get the job done.

> Because I have dozens of colleagues, each with their own os/distribution, developing an application that needs to lock down dot releases of dependencies to the versions running in production

I assume that the production environment isn't running dozens of os versions and distributions. For development, using a VM or container running the same os version and distro that's used in production and using that OS's package format for packaging the software and installing it in the dev environment (on the container or VM) would work. You're testing to see if the software works in the production environment, not someone's preferred os/distribution.

> and update them at the same time

I'm not as familiar with apt, but dnf has a version lock feature that would allow you to lock down the dependencies to specific versions. You could update them and test them during development and update the version lock file to pull down the updated version of the dependency when updating production.
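
A rough sketch of that dnf feature (it assumes the versionlock plugin, packaged as python3-dnf-plugin-versionlock on Fedora/RHEL-like systems):

    dnf versionlock add python3-requests      # pin whatever version is installed now
    dnf versionlock list                      # the pins can be kept in your image recipe
    dnf versionlock delete python3-requests   # release the pin when upgrading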


For every dependency that's missing from the upstream distribution in the exact version we need, we'd need to package that appropriately. We have nothing to gain here, nobody pays for that.


Once you package a dependency, updating to a new version requires minor changes unless it's a major version change. What you do gain is the ability to easily upgrade and downgrade a particular dependency and verify the integrity of the installed files (something that pip, for example, doesn't provide as far as I'm aware, but the OS package manager does).


This is just a crazy amount of work, given that the alternative is updating your lock file of choice and letting the transitive dependency resolution of your language do the rest. Note that I very much like proper clean .debs as an end user, and if that's my customer base I'd publish like that as well. But if I'm last in the chain and my customers are internal, I'd never ever in a million years take the route you propose.


Because where you have to build 1 or 2 packages for Windows and macOS (97% of the computers used by end users), you have to build tens of packages for the main Linux distros (and even then not all of them).


> you have to build tens of packages for the main Linux distros (and even then not all of them)

This is something that's typically handled by the package maintainers for each linux distro, rather than the ones who developed the application. Some developers maintain their own public repositories for packages they built and instruct end users to add their repositories to their package manager config, but they typically will also include the source archive (if it's open source) and build instructions for those running distributions they haven't built packages for.

Looking at Virtualbox[1], for example, in addition to Mac and Windows, they built packages for Redhat, Ubuntu, Debian, OpenSUSE, and Fedora, and they provided the sources along with build instructions[2] for those who are running distributions where there's no pre-built package. In fact, the last option is the most flexible one though it requires a bit more work for the end user.

[1] https://www.virtualbox.org/wiki/Linux_Downloads

[2] https://www.virtualbox.org/wiki/Downloads


I just checked and there are 1.3 million packages on npm, 600k on PyPI and 100k on NuGet. While I'm sure that most of them are either obsolete or useless, that's still an order of magnitude bigger than what a Linux distribution offers (60k packages for Ubuntu).

And VirtualBox is actually a good example of what I'm saying: despite being one of the major pieces of software in one of the fields where Linux is strong if not dominant:

- they feel obliged to distribute their Linux packages themselves

- they have to distribute 1 package for Windows, 2 for macOS, but 12 for Linux.


I'm not sure what goes into the decisions package maintainers make in terms of which packages to include in the OS repository, but, in my experience with python applications, most dependencies we needed could be found in the OS repository or some other supported package repositories (for RPM, repositories like epel or rpmfusion). The few we weren't able to find weren't that difficult to package and add to our internal package repository.

But this also brings up the issue of vetting dependencies. If you're pulling in a dependency that pulls in tens of other dependencies (direct and indirect), it gets difficult to vet them. PyPI and npm have already had issues with malicious packages being uploaded. On the other hand, I haven't really found the large number of dependencies to be an issue for Python packages available in the OS package repositories, and I'm not aware of any incidents with those repositories, unlike PyPI and npm.


Last thing the Linux package maintainers need is the entirety of PyPI dumped onto their lap to package and maintain.


Do you happen to have a link to a Python application that ships its own dependencies and has state-of-the-art Debian packaging?


It depends what you mean by state-of-the-art, but it's possible with dh-virtualenv: https://github.com/vincentbernat/pragmatic-debian-packages/t... (not something that can become an official package as it goes against Debian policies)


> that ships its own dependencies

You mean a link to a malformed .deb package?

It's not difficult to do but it's also the wrong way to do it.


Honestly, there is a lot more I'd like to do with my life than care about some Debian packaging practices when all I want to do is ship a company's internal application with its exact dependency versions. But it needs to work, and, e.g., "The virtualenv relocate-able flag has been always experimental, and never really worked;"

So in practice we're back to shipping whole distributions using Docker (or, like we did, abandon Python as an option for development but keep .deb packages for deployment).


No way. Docker is not designed for security.


Well, it's probably more secure than doing `pip install` and letting every dependency run whatever to install itself on your host...


Actually not at all assuming you are using a VM for a production workload.

Additionally, even if you only rely on a container - still no: other container engines have a lower attack surface.


I'm talking about local dev here.


It’s interesting to note that PEP 582 (i.e. __pypackages__) mode used to be the PDM default -- and was sort of its flagship feature -- before being made opt-in with the 2.0 release, the explanation being that the PEP had stalled, and that editor and IDE support for virtualenvs is much better (https://www.pythonbynight.com/blog/using-pdm-for-your-next-p...).

If you read through the discussion on the Python forum, one of the counterarguments to the PEP is that "If existing tools can already support this, why do we need a PEP for it?" But presumably a PEP would help push the ecosystem to accommodate __pypackages__, and to solve the aforementioned problems (like broader editor and IDE support).

For what it's worth (as someone that builds Python tooling full-time): I'm generally a fan of moving to a node_modules-like model.


That tooling argument is quite weak. PDM is the only tool I was able to find that has support for __pypackages__, the paths it uses seem to be slightly different to the PEP wording, and it uses a $PYTHONPATH/shell hack (`pdm --pep582` adds a folder with `sitecustomize.py` that does the magic). How do you change the $PYTHONPATH used by an IDE (and its magic features)?


I agree!


Oh, also, I think I've read the entire Discussions thread on this PEP, and from what I can tell, the PEP is missing champions who are eager to push it forward.

Part of the problem, I'm guessing, is that the PEP kind of sits in a weird space, because it's not up to the PyPA to "approve" it, or whatever -- PEPs are approved by the Python Steering Council.

Per Brett Cannon (Steering Council Member):

> Either consensus has to be reached (which it hasn’t) and/or someone needs to send this to the SC to make a decision (although it is a weird PEP in that it also impacts packaging, so it isn’t even clear who would get final say).

https://discuss.python.org/t/pep-582-python-local-packages-d...


Today I was preparing a minimal project for something I needed to share with someone else. I wasn’t sure what tools they would already have installed, so I checked out Python’s official packaging docs and browsed around.

One page gave instructions using “hatch” by default, while another page says the official recommendation is setuptools with a setup.cfg and only dynamic things declared in setup.py. Meanwhile, pyproject.toml support is being pushed elsewhere and is still in beta for a lot of setuptools features.

There’s way too many tools and too much confusion around which ones to use and which ones are the best to choose going forward. Why can’t we just be like Rust and have one tool that builds and formats and runs tests and everything else?

Pipenv is crap but it’s somehow gained the support of the PyPA; meanwhile I’m not sure if poetry is even mentioned by the Python docs or the PyPA, but it’s the one I’ve been using and it seems great - though I don’t even know if I’ve made the right choice anymore by using it.

All of this fragmentation just leads to developer confusion, newbies and people who have been using the language 10+ years alike.


My view is that it is not that hard (with any tool) to build a wheel. You can basically follow a guide; you have to supply the same information, it’s mostly just the format you do it in.

The problem is that compiled extensions are not a minor use case for Python; they’re hugely widely used in packages even if users don’t interact with them directly. Building a package with no dependencies is still not that hard. But there is no good way to distribute the compiled packages you depend on as their own thing - Intel releases MKL as a “Python” package, for example, but it is just the C library [1]. And of course, if you’re a package maintainer, the chance that the maintainer of your dependency has both uploaded it to a package manager for a language they don’t use and that it’s also the version you need is pretty slim. So the issue becomes “how do I bundle every single possible dependency I might need inside a wheel”, because source installs are not something most Python users understand. Conda and Spack try to fix this issue by being general package managers that can distribute packages in any language, allowing you to depend on FFTW or Eigen or SUNDIALS or whatever you need in your C extensions, and in all of the discussion I don’t think any proposal really tackles this gap.

[1] https://pypi.org/project/mkl/


I have a single, simple script (not a package!) that has dependencies. Actually, I have a few of these, just sitting in /usr/local/bin so I can execute them whenever.

How should I be managing environments for these scripts? Do I install dependencies in shared system python? Should I create a shared venv? Where should I store it? Any tools out there that make this decision for you and manage it?

Just the fact that homebrew occasionally updates my installed Pythons and breaks everything makes me reluctant to use Python as a scripting language at all. It's pretty reliable when I have a folder dedicated to a project (poetry is good for this) but I'm pretty fed up with how brittle it all is.


If you don't mind adding a pyproject.toml, you could use pipx[1] to install these scripts. The directory structure would look like this:

    <name>/
        pyproject.toml
        <name>.py
The pyproject.toml would look like this (using poetry, but you could use a different tool for this):

    [tool.poetry]
    # NOTE: Set <name> to your project name
    name = "<name>"
    description = "My Script"
    version = "0.0.0"
    authors = ["Me <me@example.com>"]

    [tool.poetry.dependencies]
    python = "^3.11"
    requests = "^2.28.2"

    [tool.poetry.scripts]
    # NOTE: Change <name> to match your package name in [tool.poetry]
    <name> = "<name>:main"

    [build-system]
    requires = ["poetry-core"]
    build-backend = "poetry.core.masonry.api"
Then you'd run `pipx install -e .` and the executable script will be installed in ~/.local/bin.

[1] https://pypa.github.io/pipx/


Thank you for trying to provide a workable solution; it's really not bad, but it has some downsides for me. pipx itself is installed inside a Python environment, so when brew breaks my Pythons, it breaks pipx as well. Anytime brew breaks my Pythons, I would need to do the install step again for every script (or write a tool myself which does it). Not a total deal breaker, but not really much better than my current situation, which pretty much just assumes any version of `requests` or `fire` is acceptable. Because Python itself is constantly being updated in ways that break the base Python environment on my machine, a workable solution would need to account for the fact that the base environment might need things reinstalled.


One option could be to have Python installed in a separate location specifically for this purpose and to NOT include it in PATH. Then it is "out of sight" of brew and the like. You can even make the entire location read-only once you are done with the installation of Python + pipx etc.


That would definitely work.


I use `pip install --user pipx` to install pipx, but I think a better option when using homebrew would be `brew install pipx`. With the latter, doing a `brew upgrade` should keep python and pipx in step.

EDIT: Thinking about this some more, doing `brew install pipx` would keep pipx from breaking on brew upgrades, but I guess your installed scripts would still fail if you upgrade from Python 3.N to 3.N+1.

So the only solution, I think, is to keep around all the versions of Python you use, assuming you don't want to upgrade all your scripts when upgrading Python.

I use pyenv to manage Python versions rather than Homebrew, and this seems to avoid the issues you're having, because the Python version used to install whichever tool via pipx sticks around until I explicitly uninstall it.


In an amazing coincidence, I just found another front-page post exploring this same topic! For Python, it seems to suggest using nix-shell.

1. https://dbohdan.com/scripts-with-dependencies


My comment with an example of a single file script which can specify its own dependencies:

https://news.ycombinator.com/item?id=34393630


Put them in ~/.local/bin, and put that in your PATH. Then pip install the reqs with --user.

If you want this automated use pipx, but it is overkill and loses simplicity.

The reason folks give dire warnings against simplicity is that sysad skills have plummeted in recent years. It’s easy to fix the rare conflict by putting the troublemaker in a venv. Outside of a large work project I never need to.


If you don't want to rely on the system or brew-installed Pythons for all your scripts, you might want to manage multiple Python installations with asdf-vm.

---

Disclaimer before I continue with my own suggestions: I have made small code contributions to pip and pip-tools, and maintain a Zsh frontend to pip-tools+venv, called zpy.

---

Suggestions:

- each folder of code maps to its own default venv, and may map to more venvs for different Python runtimes

- each folder of code has a requirements.in file with top-level dependencies, and a requirements.txt as a lock file

- you can either add a shebang line for your script which explicitly invokes the venv's Python and link that file into ~/.local/bin/, or instead create an external launcher script in ~/.local/bin/.

---

Here's an example of how that might be done using zpy functions:

  $ mkdir simple-scripts
  $ cd simple-scripts
  $ envin  # or in subcommand form: zpy envin
  ==> creating -> ~/.local/share/venvs/280…/venv :: ~/Code/simple-scripts

  $ pipacs httpx
  ==> appending -> requirements.in :: ~/Code/simple-scripts
  httpx
  ==> compiling requirements.in -> requirements.txt :: ~/Code/simple-scripts
  anyio==3.6.2              # via httpcore
  certifi==2022.12.7        # via httpcore, httpx
  h11==0.14.0               # via httpcore
  httpcore==0.16.3          # via httpx
  httpx==0.23.3             # via -r requirements.in
  idna==3.4                 # via anyio, rfc3986
  rfc3986==1.5.0            # via httpx
  sniffio==1.3.0            # via anyio, httpcore, httpx
  ==> syncing requirements.txt -> env :: ~/Code/simple-scripts

  $ print -rl -- 'from httpx import get' 'print(get("https://ifconfig.co/json").json())' >do_it.py

  $ vpyshebang do_it.py  # or: zpy vpyshebang do_it.py
  $ ln -s $PWD/do_it.py ~/.local/bin/do_it
If a Python runtime update breaks your venvs, you can probably fix things up using zpy's pipup function.


I use a separate directory and venv for each script. To execute the script, I use a shell script to call the venv's python interpreter. This is also how I use python scripts with cron/systemd.

    #!/bin/bash
    # myscript.sh
    venv/bin/python3 myscript.py
You could also skip the shell script and use aliases in your .bashrc.


I do something kind of like this, but all of my scripts break when the underlying env is suddenly broken when e.g. brew updates python without asking me and breaks all existing environments.

I'm sure I could come up with solutions that are very robust for my particular machine, but I would like something that allows me to share it as a gist and a teammate could just as easily use it, or I could use it myself on another machine without hassle. In other words, a solution contained within the script itself and maybe one or two binaries outside that make it possible.


I see, it might be heavy handed but running them inside Docker containers might provide you with the isolation you're looking for. You could also build and share these images with your teammates.

I've actually started using a lot of different CLI tools with Docker, especially when the tool isn't available for my OS.


I don't understand how there still isn't a good answer for this. It seems like such an obvious need, and for years I've heard go being promoted for solving this problem. I get why Python didn't start out with an answer for this, but in an era of 20 TB hard drives, I'm more than willing to burn space for ease of install.


Trying to build Python under Nix (i.e. solid, not “close enough”) is an education in how fucked up the Python packaging ecosystem is.

Which is a shame because Python is an amazing scripting language, the de facto numerical computing standard, and probably a better bash for most bash use cases.


One of the problems with Python packaging is Python Core Devs view “Python packaging” and “packaging Python” as two different things, but most people don’t bother with (or can’t tell) the difference :)


I must admit I just gave up on all these tools. Instead I just do

    pip install -r requirements.txt -t .lib

and I have

    import sys
    import os

    sys.path.append(f"{os.getcwd()}/.lib")    
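    # note: .lib is resolved relative to the current working directory, so run the script from the project root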

at the top of my script. Some will tell me this is silly, but it just works. Remember to add .lib to your .gitignore, else you'll have a lot of fun.


As the person who implemented the -t flag and knows the horrendously sharp edge cases that it fails on, all I can say is bless your heart.


I'd be curious to know what they are, as I'm interested in giving the -t flag a try myself.


Please do elaborate before digging my own grave, thanks :D


I did some looking around, and it sounds like it fails to work correctly with namespace packages. Something about the way you have to add them to the path leads to them being later in the path, so there could be times where they don't get found because something earlier in the path gets used instead.


I tinkered with this a little, and it's interesting. But what about using pip list or pip freeze? They don't show the packages in the .lib directory. Do you just add them by hand to the requirements.txt?


I add them by hand to requirements.txt and I try to keep it as minimal as possible. Yes, it makes it not 100% reproducible, but it forces me to think about each dependency, and I can see in my git log why it was added.


It's funny how "pip help list" or "pip help freeze" works. :-) I found I can add --path .lib to list/freeze those. It's an extra step, but not too terrible.


This is similar to how Node does it. I basically just copied that idea.


So we should all use PDM then, as it's the best so far among all the others? I read about it once in the past but never used it. I just use `venv + pip` all the time; it has worked well for my case so far.


So, I have recently returned to Python development after several years out. In my first project [1], I was building on an existing library that was already using Poetry - so obviously I went with that for my work. Although it was a bit of a learning curve, I quickly got accustomed to it, but still wondered why it had come about given my recollections of the other tools (virtualenv et al.) being “good enough”.

Then, more recently, I had to run a different project that lacked any documentation as to how I was to run it, had a setup.py file, a Pipfile, and more in it. In trying to get this to run, I managed to make a real pig’s ear of it, such that (no doubt thanks to my lack of experience with those tools) I eventually had to delete all my virtual environments, as none of them worked anymore…

So yes, I am 100% in the “one tool to rule them all” camp these days - and although PDM does look promising, right now it isn’t offering me anything above Poetry that I care that strongly about.

As Cato might say, PyPA delenda est.

1. http://github.com/grafana/pySigma-backend-loki


> There are two tools for dealing with packages in the Node world, namely npm and Yarn.

Not to mention pnpm. Or Bun. Or even Deno. All of which deal with packages, and make different choices.

This isn't to be a pedant, but I think it's a reasonable example of why it's less important that there's a single way to do a thing, or even a single tool vs. multiple alternatives; instead, what I care about is that I don't have to string together tens of tools to do that thing, and that there's some degree of compatibility and standardization in the API between those different alternatives.


Well, no. The JS package ecosystem is a slightly lesser mess than the Python ecosystem but only slightly so. I think it's a bad example to base any redesign off of. At least the author brought up .NET which mostly has its shit together when it comes to builds and packaging.


One more thing we need is app distribution for desktop apps written in Python. I need to be able to package a Python app as a single binary. There are tools like pyoxidizer https://pyoxidizer.readthedocs.io/en/stable/ . Hopefully one of them becomes standard in the Python community.


You probably already know about this, but Nuitka[0] is pretty great for building distributable Python apps.

[0]: https://nuitka.net


Note this post is 8700 words long and it doesn't really delve into how bad it is to build packages on Python; mostly it's about how bad it is to use them. It's an excellent writeup and the length is necessary. My guess is a post about creating packages would need to be 2x as long for the same level of detail.


I don't agree with all the points from the article, but I do agree there is a depth of learning to be had about creating packages (and doing it repeatably/scalably). I wrote a book about creating Python packages that just came out: https://pypackages.com

Even this book doesn't cover all options in each area, and it skips almost wholly over conda because I have no personal experience using it. conda and the work in the scientific community add complexity to both the creation and the consumption side of packaging, and that's one area where I'm not sure this post covers all the nuance when considering how a "one size fits all" solution might work in practice.


Reading this is really depressing. I just want Cargo for Python. Poetry it is for now, but it has quirks, and it is dog slow...


A lot of people said they want Cargo for Python but immediately backed out when you require them to `cargo run` their program. They want `python myscript.py` to still magically work, but that’s exactly where a lot of the magic comes from. Running `python` directly is like manually invoking `rustc` (or maybe slightly more automated like a make script); it works, but is on an entirely different layer of abstraction and doesn’t fit well with the Cargo one.

I’m not trying to imply you’re one of those people, but this may provide some insight why “Cargo for Python” is not a more widely adopted workflow.


That is funny, since poetry's modus operandi is 'poetry run python ...'

On topic, "Cargo for Python" should be official, singular (no competing standards), opinionated and fast (tm).


Is there any reason why `python foo.py` couldn't do the equivalent of what `cargo run` does, if that's the expectation of most users?


Most of those workflow tools support at least two execution contexts, one for development and one for deployment (`cargo run` vs executing the compiled binary directly, `npm run` vs `node foo.js`, etc.). There are various reasons for the separation, including hidden build steps or setting up the runtime environment; generally you want certain things automated during development but not when the program is deployed. A more accurate way to say this is that `python foo.py` can't replace `cargo run` while still doing what it currently does; obviously it could be changed to run the program in the development context. The problem is that it's currently used in multiple contexts, and nobody really wants their workflow to break.


The main reason I have stayed away from Python: packaging is a mess!


And one of the reasons I moved away from it! My blog post on the subject was posted here and it was very unpopular despite having similar points and also making comparisons with Node.

There's nothing special about Python that makes it worth enduring this pain. We have other languages with a better developer UX.


Well, not for ML we don't. Python killed everything else off. If you plan to work on core algorithms or fine-tune existing models, there is nowhere else to run.


I'd be keen to read your blog post if you have a link?


https://cedwards.xyz/breaking-up-with-python/

Bear in mind this was not written for an audience, let alone a HN audience.


Thanks, appreciate the link, I'll be back to yell at you shortly :D

...and having read it, the only thing I'd yell is that "I AGREE WITH ALL YOUR PAIN POINTS". Especially around documentation and package management.

I do like type annotations for improving the reading experience of Python codebases, even if you're not using a type checker, but trying to get people to use them consistently for new code in an org that has a large pre-existing codebase involves either a lot of carrots, or some CI based sticks, and I don't quite have the organisational influence to just drop that on people... ...yet.


I think it's crazy that they don't want to change because it would disrespect the work of the maintainers of existing projects. Python packaging is screwed if maintaining every packaging tool at the cost of functionality is the goal.


At a previous company I had a chance to sit down and talk with python developers across a number of groups, and a fun question was asked: Imagine Python 4 is another large breaking change on the scale of 2 to 3. What feature makes it worth it to you?

The most common and enthusiastic answer was: Packaging and distribution.

I think saying that "large numbers of Python developers think packaging is bad and should be fixed" undersells the current sentiment. There are large numbers of Python developers with boots on the ground who are _utterly sick to death_ of the problem and willing to accept drastic measures resulting in considerable suffering if it just means the issue goes away. People are, in other words, getting rather desperate and discontented.

Upstream needs to do a little soul-searching and come to an understanding that whatever ideological rock they're chained to, the one preventing a solution from emerging, has a high chance of not being moored in the needs of reality.


Well, who is upstream? The PyPA? The Python steering council? Python itself won't do much without PyPA approval, and the PyPA won't endorse an existing or new tool that solves most mainstream cases well.


It seems most people agree that Python's packaging isn't great, but, conversely, is there a language where most people agree that the approach to packaging is awesome? I mean, what's the gold standard to aspire to?


Don't know if it counts, but Java has done pretty well in my opinion. A single JAR file can combine everything you need to run an application or be distributed as a library, and as long as the java command is installed it just works.

Likewise, WARs make it dead simple to add/remove/update applications during runtime.

Of course there are always special cases, but packaging and distribution in the Java world have always been painless to me.


Python has similar facilities for distribution, since it can load code directly from .zip files, treating them as if they were directories - very similar to a JAR. The problem is at the stage where you have to acquire and manage those dependencies.
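For example, Python's zipimport machinery lets you put an archive on `sys.path` and import from it directly; a minimal sketch, assuming a hypothetical `deps.zip` that contains a top-level module `mylib.py`:

    import sys

    # Make the archive importable; zipimport reads modules straight out of it.
    sys.path.insert(0, "deps.zip")

    import mylib  # loaded from inside deps.zip, much like a class from a JAR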


After joining a company with a Python codebase, goddamn I miss Java so damn much.

For a single example, all packages in repositories are namespaced, and namespace ownership is verified. [0]

So there's no chance of typosquatting an existing package, nor do you have to worry about someone jumping in and claiming all the "good" names.

[0]: https://central.sonatype.org/publish/#individual-projects-op...


I think this is what success for unification looks like:

Let's say there are two tools, one called pyup and one called pygo.

    pyup: responsible for which versions of python are available on your machine, as well 
          as which one is the default "system" python. It also keeps itself and pygo updated.

    pygo: responsible for (among other things) 
          - allowing a user to specify a python version for a project
          - allowing a user to specify a python version for a script
          - helping the user install a compatible python version for the project with pyup
          - helping the user install a compatible python version for a script with pyup
          - selecting a compatible python version/env for project
          - selecting a compatible python version/env for a script
          - allowing a user to specify project dependencies
          - allowing a user to specify script dependencies within a script
          - determining which dependencies are required for a project
          - determining which dependencies are required for a script
          - installing project dependencies
          - installing script dependencies
          - installing projects or scripts as executables
I MUST NOT need a base python install to make any of this work. Tools are constantly mucking with the system python and with which one is on my path, so I can't trust the system python, period. pyup and pygo should be their own binaries which invoke python. example.py:

    #!/usr/bin/env pygo
    # version: ~3.11
    # requirements:
    #    requests~=2
    import requests
    requests.get("https://example.com")
When I run ./example.py for the first time:

    - pygo helps me install python 3.11
    - pygo installs requests in a location that I don't have to worry about
When I run ./example.py for the second time, the script runs without error.

If I still need to use something like virtualenv, poetry, or conda on top of this, the unification project has failed.


Gentoo's system sort of does both of these things. Among other things, you can control which packages are installed for which Python versions, dependencies across versions and default version for the system. Though it doesn't use the PyPI ecosystem directly, so everything has to be individually packaged for it to work.


I don’t think there’s much complaining in the C#/.NET ecosystem. Sure, there are alternate tools (FAKE for F#), but I believe most people are happy enough with dotnet.exe/msbuild/nuget.


I think Go (if you started using it in the `go mod` era) and Rust have the best stories around package management.

The interface around `go mod` is kind of confusing, but I have actual trust in the dependency graph it generates for me. Cargo has, afaict, nailed both the interface and trust in what's going on under-the-hood.

In the Python world, Poetry isn't too bad. It's terribly slow in comparison to `go mod` or Cargo, but I generally trust and understand what's happening on the inside, and its interface is fairly legible to newcomers.
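For reference, the day-to-day Poetry interface looks roughly like this (a sketch of the common commands; main.py is just a placeholder):

    poetry new myproject         # scaffold a project with a pyproject.toml
    poetry add requests          # add a dependency and update the lockfile
    poetry install               # create/refresh the virtualenv from the lockfile
    poetry run python main.py    # run inside the managed environment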


I'm a big fan of `mix` for Elixir. https://hexdocs.pm/mix/Mix.html


This. Mix is basically a mix (pun unintended) of Leiningen and Bundler. It works. Quite well.

Part of this is due to erlang solving a lot of the problems upfront (releases) and then the Hex team handling a lot of the rest.

But in general, the "Yehuda Katz" lineage (Bundler, Cargo, Yarn v1, and Mix, since José used to be mentored by Yehuda) has produced pretty good tools worth copying. At least as a base.


Ruby gems and Cargo are pretty awesome. Also, I find I like Arch Linux's pacman quite a bit, though that's a slightly different use case where versioning isn't resolved.


Cargo is a gold standard imho, but there are definitely some simplifying decisions it makes that might not fit Python:

- Dependencies and executables can run arbitrary code during build. For example, Cargo knows almost nothing about how to build C code, and the common workflow is to pull in the popular `cc` library for this and call it in your build.rs.

- There's mostly no such thing as "installing a library", and each project builds its dependencies from source. This is baked in at the Cargo level (the install command just doesn't work for library crates) and also at the language level (no stable ABI besides "extern C").

- Related to that, the final product of a Cargo workflow is a mostly-statically-linked binary, so there isn't really an equivalent to virtualenv.


> - Dependencies and executables can run arbitrary code during build. For example, Cargo knows almost nothing about how to build C code, and the common workflow is to pull in the popular `cc` library for this and call it in your build.rs.

Python packaging involves arbitrary code execution, even today. While it’s less obvious than it was with `setup.py`, if you’re installing from source, the package can specify any build backend it wants, and that build backend can do anything. This could be mitigated by having an allowlist of build backends, but who would be responsible for vetting the tools, and how would it be guaranteed that an allowed tool is not taken over by someone hostile, and that an allowed tool does not have a system to run arbitrary package-provided code?
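The classic `setup.py` path makes the point most directly: it is just a Python script that the installer executes when building from source. A contrived, hypothetical example:

    # setup.py (hypothetical) - executed at build/install time when this
    # package is installed from source, so anything at module level simply runs.
    import os
    from setuptools import setup

    os.system("echo 'arbitrary code running on the installing machine'")

    setup(name="example-package", version="0.1.0")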


I don't think the final build output being statically linked has any connection to a virtualenv. I've only ever used Python's virtualenvs to avoid having to install dependencies globally. When using Cargo you don't need to worry about that, because dependencies are saved per-project already and Cargo can sort itself out when you invoke a command.


R with CRAN. renv for reproducibility


What is clearly still missing on the python.org website is an obvious step-by-step HOWTO for the typical tight edit-code-run-debug Python development loop, complete with a standardized packaging approach.


https://chriswarrick.com/blog/2023/01/15/how-to-improve-pyth...

It is quite damning about how much better an experience Node/JS/TS with npm is than Python's clusterfuck. Any other JS/TS open source project is easy to get running. Even if you use Poetry in Python, you will run into everyone else using pip-tools, pipenv, PDM, conda, etc.


Glad to see this getting discussion. I hate virtualenvs for several of the reasons mentioned in the article - extra state to keep track of, and potentially installing packages to the wrong location if I failed to keep track of it.

I always just force my user packages to work. I usually only use numpy/scipy/matplotlib and occasionally a few others, so it's not that hard, but some sort of npm-like experience would be welcome. I know that many are struggling with these environments.


As someone who has next to no experience with npm, which benefits of npm would you like to see in venv or package management in Python?


I think the article describes it pretty well; it actually highlighted a few features of npm I wasn't familiar with. But to speak from my own experience, it's convenient that npm installs all the packages you need into a project-local node_modules folder. This eliminates the need for venv.

Mostly I just want to see venv eliminated. I really don't like the workflow it requires.


Not being able to exclude system packages from resolution when using __pypackages__ is a pretty glaring omission that will force people to continue using venvs until it’s fixed.
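For context, PEP 582 (the draft behind `__pypackages__`) proposes a per-project layout roughly like the following, which the interpreter or installer is meant to pick up automatically:

    myproject/
        myscript.py
        __pypackages__/
            3.11/
                lib/
                    requests/
                    ...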


conda is a bleeding pain in the ass; it installed a ton of stuff without even asking whether I needed it. Then, when I wanted to install a different version of Python, it kept running conflict resolution for hours before I got tired and killed it. Might as well just do the setup old-school via virtualenv using requirements.txt.

Dealing with all this is why I chose to use golang for writing a CLI utility (even though I'm not a big fan of its error handling boilerplate); static typing + produces a single binary that can be run without needing any of the environment setup. I am aware of various Python tools that can produce a binary too but I think they have their own edge cases and it is just nicer when that can be done out of the box without any dependencies.

> You can also find deficiencies in the tools for the other languages mentioned. Some people think Maven is terrible because it uses XML and Gradle is the way to go, and others think Gradle’s use of a Groovy-based DSL makes things much harder than they need to be and prefer Maven instead.

Yeah, but I have never had the installer crap out with Maven the way pip or its ilk do when installing a package. At worst, it can't find the package if the repositories are not configured properly.


Despite the article being misguided on a few points, its best bit is the detailed table comparing the functionality of the different Python packaging tools.

One missing detail is PDM's ability to support different build backends, which allows for some interesting capabilities: for example, using hatchling as the backend, it is possible to utilise hatch's support for dynamic versioning, which does not exist in PDM proper. I haven't tried, but I wouldn't be surprised if that could allow PDM to support C extensions by using setuptools as the backend...


One of the misguided points is this:

> It is also notable that PEP 20, the Zen of Python, states this:

> There should be one-- and preferably only one --obvious way to do it.

> Python packaging definitely does not follow it. There are 14 ways, and none of them is obvious or the only good one. All in all, this is an unsalvageable mess. Why can’t Python pick one tool?

So a few comments on that:

1. There is a reason PEP 20 is called "Zen of Python" and not "Zen of Python Packaging".

1a. Even if it applied to the ecosystem and not just the language itself - PEP 20 is a guide, not a set of divine laws.

2. There is one obvious way, right there in the Python documentation. [0] Yes, it lists several different tools, but "a way to do it" and "a tool to do it with" are two very different topics.

3. "Why can’t Python pick one tool?" I never understood this fixation on silver bullets and "one tool to rule them all"... As long as common principles are well defined - which has been true for Python for quite some time even before PEP 517, with things like PyPI and pip - what is the harm in having multiple competing solutions?

[0] https://packaging.python.org/en/latest/tutorials/packaging-p...


> Why can’t Python pick one tool?

Sometimes I wonder if developers are just making new tools out of self-interest and they want their name on some well-known project, and that's why they don't just work with one of the existing projects to implement their ideas and move the overall community towards a common working model.

> what is the harm in having multiple competing solutions?

Developer confusion and community fragmentation.


I am the author of pigar [1], and I use Go a lot. Go has its problems too, but I am a fan of the `import "url"` style of import statement: developers can write code first and sync the dependencies later with `go mod tidy`.

To fix the problems in Python's world, the Python community should simplify the tooling and cultivate the habit of declaring dependencies first (maybe this should even be mandatory).

[1]: https://github.com/damnever/pigar


Can't resist digging at Node.js even when writing up how infinitely better Node.js is at dealing with packages than Python, haha:

> Let’s try removing is-odd to demonstrate how badly designed this package is:

You literally just deleted is-even's dependency on is-odd and then have the audacity to be shocked that it broke?

There's a lot of hatred for the small-package philosophy of Node.js, but it's also a huge win, and it stands a good chance of being why JavaScript has been such a winner and gotten so far: very explicit small things that do what they say on the tin. Rather than repeat yourself by making a copy-pasted is-even and then maintaining both, it makes perfect sense to compose functionality, to build off what we have. And it is easier to understand the scope of what a package is & what it can do when it is explicitly limited in scope.

This is another place where there is a lot of loud, vociferous animosity against what is, but it's there for good reason. And with rare exception (the deliberate left-pad rage-quit breakage, for example), it serves well. With the caveat that, yes, inventorying your stuff is hard.


I'm biased, but I do like Python's package management much better than Node's. Even just regular old virtualenvs. Can't tell you how many times deleting the node_modules dir and reinstalling fixes a weird issue, but that's happened very rarely for me on the Python side.

Also, having to comb through a bunch of node packages of dubious quality to find some sort of standard approach happens way too often. Like take python requests vs axios.


I'd say that Node's package management has run laps around Python's, to the point where it's pretty embarrassing. Since it's relatively new, it was able to hit the ground running with best practices:

1. Declarative package manifests. Python's ecosystem is still a mess of various approaches, and the fact that you have to run the setup.py script to determine dependencies is a nightmare. Because of this, running dependency resolution and installing is an order of magnitude faster in Node than in Python.

2. Drop-dead simple isolated environments: everything's in `node_modules`. You literally can't make a mistake by blindly running `npm install` in the project dir. With Python it's on you to manage your virtualenv, which boils down to PATH manipulation and symlinks that you'll have to remember to undo when switching around. There's no default for what to call your venv either, so it's on you to settle on a standard and gitignore it. Every time you run `pip install`, you have to hesitate and make sure you're in the right environment, or else risk borking the wrong env (or your global!)

3. Out-of-the-box comprehensive lockfile support. Debugging Python dependency issues is a nightmare. There's literally no way to figure out why a dependency was installed without using 3rd party tools (like pipdeptree). In Node, simply running `npm install` will automatically generate a proper lockfile.

I work full stack, and the difference is like night and day. I barely think about dependency management in Node.


Respectfully, this reads like you're using outdated Python tools.

Give Poetry [1] a shot, it has all the things you've listed here. Just as Node.js has come a long way in the last 5 years, Python has, too. Albeit, in a fashion that was much less centralized, arguably to Python's detriment.

[1]: https://python-poetry.org/history/#100---2019-12-12, released 1.0 in 2019/12.


Oh yeah, I've used Poetry. I'm talking about Python out of the box. The dependency management ecosystem is super fractured and relies on 3rd party tools developing their own standards, and even then, they can't overcome fundamental limitations. For example, Poetry is great but locking/installation still takes way longer than Node because there are no declarative manifests.


Poetry can not give you things that do not (always) exist in the ecosystem, like properly declared dependencies [1], which are a must in most other package managers.

Poetry can be painfully slow if you happen to depend on many packages that only use `setup.py` to define dependencies.

[1] https://python-poetry.org/docs/faq/#why-is-the-dependency-re...


I've tried venv before, and the console (iTerm) just gets broken.


There’s a place for small packages, but is-even/is-odd is a bit too small to be a reasonable package. It is far easier to just write `x % 2 === 0` inline, which is an obvious idiom, instead of installing and importing a separate package for this. The use of is-odd by is-even can be confusing for users. For example, you may call isEven(0.5) and get the following error:

    RangeError: is-odd expects an integer.
        at isOdd (/tmp/mynodeproject/node_modules/is-odd/index.js:17:11)
        at isEven (/tmp/mynodeproject/node_modules/is-even/index.js:13:11)
        at Object.<anonymous> (/tmp/mynodeproject/index.js:3:13)

(But the main point of the demonstration was to showcase dependency resolution and where it looks for packages.)


isEven is in that stack trace - it should not confuse anyone with even a basic, introductory-level fluency in coding.

Is it too small? What if the language later evolves BigInt? Do we suffer a patchwork of libraries which have & haven't upgraded, sussing around each time to find out?

I think the key thing to recognize is that this is all opinion. Many people don't like the availability of so many options, or the ease with which dependencies have grown. And that's fine; there's some real pain here in having ballooning package trees. There's a level of conceit, though, that I feel often arises, where we mock & shiv packages like is-even. But to me, it's not absolute, it's a matter of taste & preference. It looks weird to outsiders, but it has been enormously powerful & helpful; it has been such a key successful element of JS that npm arose & made package management easy & package publishing easy, & that we begat a new behavior of capturing all the little helpful things we do & making them available.

Maybe there are good reasons for inlining simple things, but it's not clear to me what the gains really are, or what's wrong with is-even.


Doubling down: there's just a huge conceit that small modules are bad. And no evidence. People just love to hate. The dark side is easy & convenient & in reach, and one gains social clout, feels elite, by shitting on others. Boo to you.

It's not easy, it's not tame, but it's unclear what negativity beyond mild inconvenience has been generated. And much of the harm can be defused with more sensible protection, rather than simply giving all modules access to everything. Systems like WASI are finally engineering in built-in protection to sensibly de-risk imports; this is a fault of runtimes for not offering us protection, not of our burgeoning package ecosystems for having value and growing.

It's still unclear what the worthwhile protests are.


It would be nice if major node packages invested time in re-inventing the wheel, just a little bit.

Back when I used it, I really appreciated how little bloat Hapi added to node modules, compared with webpack for example.

Obviously there's a world of difference in what problems the two solve, but still...


Every time someone asks me what I think is wrong with Python packaging, I’ll show them this link.

Saved. Thanks for sharing!


Regarding system package upgrades breaking virtual environments: would hard links address this? If the system package manager removes a file, your venv still has a hard link to it so it doesn't see a difference.


It would also prevent the Python in your venv from receiving any updates when the system-wide Python is updated. At that point it's probably easier to just use your own Python installation, e.g. with pyenv.


Sure but installing a new copy would defeat the space savings. The goal of this was to get the desired effect without wasting as much space. I wasn't trying to optimize purely for ease of use.

Put another way, my suggestion was to migrate from symbolic links to hard links.


Time for GvR to saddle up the BDFL pony for a (last?) ride?


Glad to see dotnet CLI get a shout out. That tool really makes cross compilation a breeze. In Python I usually give up and resort to Docker.


Interesting article though it's missing at least a mention of Nix, which IMO solves a lot of the pain points described.


I get the impression the author has not been exposed to enough pain doing package maintenance on a webpack codebase.


I've been using PDM for at least a year and have no regrets: it's the go-to tool and we should flock to it!


I use sites as composable mini venvs, but there's no tooling for that


Please allow me to vent on python packaging.

I was working to debug a very weird error that was happening in my code only when my app was raising a warning through my `warnings.py`.

My code was importing numpy, which in turn was trying to import its own `warnings.py`, but because of Python's import precedence it was loading my `warnings.py`, which unfortunately had the same name.

How on earth did this pass the design review of Python's import/package system? Why are there no unique IDs for imported modules?

TLDR: An imported package took a dependency on my own code because of a module name clash, which led to unexpected behavior.
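For anyone who hasn't been bitten by this: when you run `python main.py`, the script's directory is put at the front of `sys.path`, ahead of site-packages, so a local file can shadow another top-level module with the same name (unless that module was already loaded into `sys.modules` before your file enters the picture). A rough sketch with hypothetical file names:

    # layout (hypothetical):
    #   app/
    #     warnings.py    <- clashes with the standard module name
    #     main.py

    # app/main.py
    import sys
    print(sys.path[0])        # the script's directory, searched first

    import warnings
    print(warnings.__file__)  # may point at app/warnings.py instead of the
                              # expected module; every later bare
                              # "import warnings" in the process then gets the
                              # same object back from sys.modules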


How does 'warnings.py' work with the rest of the code? Intrigued as I've put myself into a corner with the way warnings/errors are handled in my project.

By the way, I think there is a convention to handle these issues, but it appears numpy isn't using it. If I remember correctly, any module with __ in front will be called {package}.__{module}.


Just do what Go did.


Python packaging is a solved problem:

https://python-poetry.org/


as a Poetry user myself, ummmmmmm. no. lol


Poetry is one of the better options, but its nonstandard pyproject.toml metadata is not ideal. PDM is basically Poetry, but with the metadata standard in place and with support for __pypackages__.
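Roughly, the difference looks like this (two separate pyproject.toml sketches shown back to back for comparison; Poetry, at least as of this writing, keeps its metadata in its own `[tool.poetry]` tables, while PDM uses the standard PEP 621 `[project]` table):

    # Poetry-style metadata (tool-specific)
    [tool.poetry]
    name = "example"
    version = "0.1.0"

    [tool.poetry.dependencies]
    python = "^3.11"
    requests = "^2.28"

    # Standard PEP 621 metadata, as used by PDM (and others)
    [project]
    name = "example"
    version = "0.1.0"
    requires-python = ">=3.11"
    dependencies = ["requests>=2.28"]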


I would like to add that PDM also supports multiple dependency groups, and the developer is super fast at fixing issues.


It's not really solved if you have no choice but to work with codebases that don't use Poetry (quite common). There are 14 tools for people to choose from, and they aren't going away any time soon.


Poetry is one of the worst of the modern solutions to Python packaging. Pick either Hatch or PDM instead; they both have more and better-implemented features, plus their codebases aren't a complete mess.


Yes, there should be (one way) to do packaging.



