If anyone hasn't seen it, now is a good time to look at https://python-poetry.org/. It is rapidly becoming _the_ package manager to use. I've used it in a bunch of personal and professional projects with zero issues. It's been rock solid so far, and I'm definitely a massive fan.
I occasionally ask our principal / sr. Python engineers about this, and their response is always, "These things come and go, virtualenv/wrappers + pip + requirements.txt works fine - no need to look at anything else."
We've got about 15 repos, with the largest repo containing about 1575 files and 34 MB of .py source, 14 current developers (with about 40 over the last 10 years) - and they really are quite proficient, but haven't demonstrated any interest in looking at anything outside pip/virtualenv.
Is there a reason to look at poetry if you've got the pip/virtualenv combination working fine?
People who use poetry seem to love it - so I'm interested in whether it provides any new abilities / flexibility that pip doesn't.
I will give it 5-10 years. If they are serious about it, it will last that long. They're about halfway there.
Package management is terrible work. Nobody appreciates it. It's extremely complex, it has to work in enormous numbers of configurations, and very minor errors can have catastrophic impact on the security or availability of systems.
How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?
I'm in the same boat as you in that I'd like to keep using pip but the lack of a lock file is very dangerous because it doesn't guarantee reproducible builds (even if you use Docker).
In Ruby, Elixir and Node the official package managers have the idea of a lock file. That is the only reason I ever look into maybe switching away from pip.
Running a pip freeze to generate a requirements.txt file doesn't work nicely when you use a requirements.txt file to define your top level dependencies.
I've been bitten by issues like this so many times in the past with Python where I forgot to define and pin some inner dependency of a tool. Like werkzeug when using Flask. Or a recent issue with Celery 4.3.0 where they forgot to version lock a dependency of their own and suddenly builds that worked one day started to break the next day. These sets of problems go away with a lock file.
> How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?
`pip-compile` from `pip-tools` is my go-to for this.
> Running a pip freeze to generate a requirements.txt file doesn't work nicely when you use a requirements.txt file to define your top level dependencies.
Use setup.cfg to define your top level dependencies. Use requirements.txt as your "lock" file. But even then you won't get reproducible builds across different OSes, or with different non-Python things installed on your machines. Use Docker images to guarantee staging and production will be identical.
This isn't quite the same. Yes, I can update my root dependencies in the `requirements.txt` file, run `pip install -r requirements.txt` and then run `pip freeze > requirements.txt`, but that's convoluted and requires me to know exactly what my root dependencies are. Is `astroid` something our tools use directly, or is it just a dependency of `pylint`? It's not clear. A lockfile clears this up.
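One way to get a rough answer to the `astroid` question from an existing environment is to look for packages that nothing else depends on. A minimal sketch, assuming a hand-written dependency map (the `requires` data below is hypothetical and only for illustration; in a real environment it could be built from `importlib.metadata`):

```python
def top_level_packages(requires):
    """Return the packages that no other package in the map depends on.

    `requires` maps a package name to the list of names it depends on.
    """
    depended_on = {dep for deps in requires.values() for dep in deps}
    return sorted(name for name in requires if name not in depended_on)


# Hypothetical dependency map, not taken from any real project.
requires = {
    "pylint": ["astroid", "isort"],
    "astroid": ["wrapt"],
    "isort": [],
    "wrapt": [],
    "flask": ["werkzeug"],
    "werkzeug": [],
}

print(top_level_packages(requires))
```

This is only a heuristic: if your own code imports `astroid` directly as well, the graph alone cannot tell you, which is exactly the information a separate top-level file records.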
Yes, and in addition to the requirements file, pip supports a constraints file, which is the lockfile you describe. It's separate from the requirements file. It solves exactly this problem.
> Including a package in a constraints file does not trigger installation of the package.
Maybe I'm not following something but how do you get all of this to work like a lock file in other package managers?
Let's use Ruby as a working example:
1. You start a new project and you have a Gemfile.
2. This Gemfile is where you define your top level dependencies very much like a requirements.txt file. You can choose to version lock these dependencies if you'd like (it's a best practice), but that's optional.
3. You run `bundle install`
4. All of your dependencies get resolved and installed
5. A new Gemfile.lock file was created automatically for you. This is machine generated and contains a list of all dependencies (top level and every dependency of every dependency) along with locking them to their exact patch versions at the point of running step 3.
6. The next time you run `bundle install` it will detect that a Gemfile.lock file exists and use that to figure out what to install
7. If you change your Gemfile and run `bundle install` again, a new Gemfile.lock will be generated
8. You commit both the Gemfile and Gemfile.lock to version control and git push it up
At this point you're safe. If another developer clones your repo or CI runs today or 3 months from now everyone will get the same exact versions of everything you had at the time of pushing it.
It should be the same process, except that the constraints file is not automatically created or detected, so step 5 would be "pip freeze >constraints.txt" and step 6 would be "pip install -r requirements.txt -c constraints.txt".
The top level dependencies go in requirements.txt and trigger installation of those packages. Everything else goes in the constraints file, which constrains the version that will be installed if something triggers an installation of the package, but it doesn't by itself trigger the installation - it only locks/constrains the versions.
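To make the requirements/constraints split concrete, here is a toy model of it. It is a sketch of the idea, not of pip's actual resolver; the `depends_on` map is hypothetical and the version numbers are illustrative only:

```python
def plan_install(requested, depends_on, constraints):
    """Walk the requested packages and their dependencies.

    Each package that gets pulled in is pinned to the version found in
    `constraints`; None means nothing constrains it. Packages that appear
    only in `constraints` are never added to the plan.
    """
    plan = {}
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in plan:
            continue
        plan[name] = constraints.get(name)
        stack.extend(depends_on.get(name, []))
    return plan


# Hypothetical dependency map and illustrative pins.
depends_on = {"celery": ["vine", "kombu"], "kombu": ["vine"]}
constraints = {
    "celery": "4.3.0",
    "vine": "1.3.0",
    "kombu": "4.6.11",
    "pandas": "0.23.4",  # constrained, but nothing requests it
}

plan = plan_install(["celery"], depends_on, constraints)
print(plan)
```

Note that `pandas` is constrained but never installed, because nothing in the requested set triggers it.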
Mainly because you can't run pip3 install -c requirements-lock.txt on its own it seems. It requires the -r flag.
That is a lot more inconvenient than running `bundle install` and if you use Docker it gets a lot more tricky because a new lock file would get generated on every build which kind of defeats the purpose of it, because ideally you'd want to use the existing lock file in version control, not generate a new one every time you build your image.
Nice! I’ll be adding this to my virtualenv + requirements.txt + pip process. Not sure why everyone wants to overcomplicate Python dependency management with pyenv/poetry/etc.
You can also, thanks to the weird way requirements.txt works, put the line "-c constraints.txt" in requirements.txt. In that case you don't have to specify it when you run pip.
That should apply the constraints when installing packages. I don't know if there's also a way to validate what's already installed.
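For validating what is already installed, one option is a small script that compares the environment against the pins. A rough sketch, assuming the constraints file only contains plain `name==version` lines and comparing versions as exact strings (so `1.0` and `1.0.0` would count as different):

```python
from importlib import metadata


def parse_constraints(text):
    """Parse 'name==version' lines; blanks, comments and other lines are skipped."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (wanted, found)} for installed packages that differ.

    A package that is not installed is not reported, since a constraint
    does not require installation.
    """
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found is not None and found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Something like `find_mismatches(parse_constraints(open("constraints.txt").read()))` would then list the drift. Separately, `pip check` reports installed packages whose declared requirements are not satisfied, though it does not look at a constraints file.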
I'm not sure what you mean by, "How do you deal with keeping your top level dependencies and exact versions of all of your dependencies of dependencies separate in a way that's sane and 100% reproducible for a typical web app / repo that might not in itself be a Python package?"
We use requirements.txt + Docker/k8s to lock in the OS. All of the versions of python modules are defined like:
six==1.11.0
sqlalchemy==1.2.7
squarify==0.3.0
Which locks them to a particular version.
What type of dependencies aren't covered by this (I genuinely am a novice here so would love to be informed where this runs into problems)
Those aren't very locked down. If you install celery today you might get vine 5.0.0 but in X months from now you might get 5.9.4 which could have backwards compatibility issues with what celery expects.
So now you build your app today and everything works but X months from now you build your same app with the same celery version and things break because celery isn't compatible with that version of vine.
This happened a few months ago. Celery 4.3.0 didn't version lock vine at all and suddenly all celery 4.3.0 versions broke when they worked in the past. That was tracked at https://github.com/celery/celery/issues/3547.
Docker doesn't help you here either because your workflow might be something like this:
- Dev works locally and everything builds nicely when you docker-compose build
- Dev pushes to CI
- CI builds new image based on your requirements.txt
- CI runs tests and probably passes
- PR gets merged into master
- CI kicks in again and builds + tests + pushes the built image to a Docker registry if all is good
- Your apps use this built image
But there's no guarantee what you built in dev ends up in prod. Newer versions of certain deps could have been built in CI. Especially if a PR has been lingering for days before it gets merged.
A lock file prevents this because if a lock file is present the lock file gets used, so if you built and included a lock file in version control, then CI will build what you pushed from dev, so the chain is complete from dev to prod for guaranteeing the versions you want. That is how it works with Ruby, Elixir, Node and other languages too. They have 2 files (a regular file where you put your top level deps and a machine generated lock file). A lock file in Python's world would translate to what pip3 freeze returns.
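If you want to see whether this kind of drift is happening between your dev build and your CI build, comparing the two `pip freeze` outputs is enough. A small sketch, with illustrative freeze outputs standing in for the real ones:

```python
def parse_freeze(text):
    """Parse `pip freeze` style 'name==version' lines into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            pins[name.lower()] = version
    return pins


def drift(before, after):
    """Return {name: (old, new)} for every package whose version differs.

    A package present on only one side shows up with None on the other.
    """
    old, new = parse_freeze(before), parse_freeze(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Illustrative outputs only, not captured from a real build.
dev_build = "celery==4.3.0\nvine==1.3.0\n"
ci_build = "celery==4.3.0\nvine==5.0.0\n"

print(drift(dev_build, ci_build))
```

An empty result means both environments resolved to the same set of versions, which is what a committed lock file is meant to guarantee.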
Thanks very much - your description of our workflow is really good (and is pretty close to exactly what we have!)
I don't understand what dependencies everyone keeps talking about (which seems to be a big deal with Poetry) - when you run:
pip freeze
It captures every single python module, dependencies as well. Because everything in the dependencies file is listed as:
aaaaaaa==xy.z
You are guaranteed to have the exact same version.
We have all sorts of turf wars when someone wants to roll forward the version of a module, and, in the case of the big ones (Pandas) we sometimes hold off for 6-9 months before rolling it forward.
But there is something that Poetry is doing that is better than "pip freeze" - I think once I figure that out, I'll have an "aha" moment and start evangelizing it. I just haven't got there yet.
Why not keep the same docker image throughout the lifecycle? E.g. merge to dev branch, trigger ci (build image at this point), maybe deploy to a test environment, run more tests, then deploy to prod. No chance of packages changing since the image isn't rebuilt. Of course if not using docker, a lock file (i.e. actual dependency resolution) would seem essential for reproducibility.
First, how did you generate that requirements file?
Second, how do you separate dev dependencies from prod dependencies, and how do you update a dependency and ensure all of its transitive dependencies are resolved appropriately?
Lists every python module that's been loaded into the virtual environment. So, from my (admittedly new) understanding, that means we guarantee that in the production/devel/docker environment - every python module will be identical to whatever was installed in the virtual env.
Dependencies and transitive dependencies are guaranteed to be resolved/ensured because we list every one of them out in the requirements.txt file.
Yes, this works as you expect. What it lacks are three big things: making it easy to see what your direct dependencies are, separating dev dependencies from prod dependencies, and an easy way to update a dependency while resolving all transitive dependencies.
There are other shortcomings, but those are the big ones.
If you have a working setup and especially a strategy for updating routinely (e.g. using pip-tools) and pinning dependencies with hashes, there’s less reason to change. The biggest reason I’d give is consistency and ease of adoption, which might be coming down to “do you spend much time on-boarding new developers?” or “are you supporting an open source community where this causes friction?”. If you aren’t spending much time on it, perhaps try it on new projects to see what people think in a low-friction situation - in my experience that’s basically been “I like spending less time on tool support”.
They either haven't experienced the pain or are oblivious to it. Pip's old resolver is borked[0]:
> [The new resolver] will reduce inconsistency: it will no longer install a combination of packages that is mutually inconsistent. At the moment, it is possible for pip to install a package which does not satisfy the declared requirements of another installed package. For example, right now, pip install "six<1.12" "virtualenv==20.0.2" does the wrong thing, “successfully” installing six==1.11, even though virtualenv==20.0.2 requires six>=1.12.0,<2 (defined here). The new resolver would, instead, outright reject installing anything if it got that input.
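The inconsistency in that example is easy to check by hand. Below is a deliberately simplified version check, assuming plain numeric versions of up to three components; real tooling should rely on the `packaging` library's specifier handling rather than this sketch:

```python
import operator

OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text, width=3):
    """Turn '1.12' into (1, 12, 0), padding with zeros up to `width` parts."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=1.12.0,<2'."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tried first so '>=' is not read as '>'.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not OPS[symbol](candidate, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# six 1.11 satisfies the user's request but not virtualenv's requirement.
print(satisfies("1.11", "<1.12"))
print(satisfies("1.11", ">=1.12.0,<2"))
```

No version of six can satisfy both `<1.12` and `>=1.12.0,<2`, so rejecting the install outright is the only consistent answer.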
I haven’t found a compelling need to switch to Poetry, but independent experimentation and competition can be a good thing. I wouldn’t be surprised if the new pip resolver was partly inspired by Poetry, similar to how some NPM improvements were motivated by Yarn. [1]
I was one of those setup.py + requirements.txt (generated by pip-compile) people.
Though, poetry is actually quite good. There are still some things that I wish it had, like plugin support (for example I really miss setuptools_scm) or being able to use it for C packages.
But if your code is pure python it is great from my experience. The dependency resolver is especially good.
beware of zealot geeks bearing gifts. if your environment is currently working fine and you are only interested in running one version of python and perhaps experimenting with a later one then venv + pip is all you need, with some wrapper scripts as you say to make it ergonomic (to set the PYTHON* environment variables for your project, for example)
> Constraints files are requirements files that only control which version of a requirement is installed, not whether it is installed or not. Their syntax and contents is nearly identical to Requirements Files. There is one key difference: Including a package in a constraints file does not trigger installation of the package.
ERROR: You must give at least one requirement to install (see "pip help install")
So it seems like a strange choice of usage example. You have to provide both requirements and constraints for it to do anything useful (applying the version constraints to the requirements and their dependencies).
poetry has several advantages which I can no longer live without.
1. Packages are downloaded in parallel. This means dramatically quicker dependency resolution and download times.
2. Packages can be separated for development versus production environments.
3. Poetry only pins the packages you actually care about, unlike a pip freeze. One application I work on has 15 dependencies, which yields a little over 115 packages downloaded. pip freeze makes it impossible to track actual dependencies, whereas poetry tracks my dependencies - and the non-pinned packages are in the poetry.lock file.
I was happy with Pip until I spent time in the NPM/Yarn world. Frustrated with Pip I switched half of our projects to Pipenv. However, I found that it struggled to resolve dependencies. Poetry works like a dream now, and life is so much easier now that we have switched all our projects to it.
The methodology of specifying your core dependencies, but also having locked versions of your dependencies' dependencies works really well.
AND you can easily export to requirements.txt if you prefer to use that in production.
It's a difference in the community's engineering values. JavaScript devs pull in a dependency for virtually everything, whereas Python distributes an extensive standard library with the language. It's less important that the same thing is hypothetically possible in both communities and more important that specific communities have chosen to use similar toolkits differently.
> I occasionally ask our principal / sr. Python engineers about this, and their response is always, "These things come and go, virtualenv/wrappers + pip + requirements.txt works fine - no need to look at anything else."
At the macro level, this seems like a bit of a self-fulfilling prophecy: if all the senior and principal engineers using Python don't care to take a look at something else, then it's not too surprising that new solutions don't end up sticking around. That isn't to say that there does necessarily need to be a change in the Python community's choice of package manager, but the rationale for not even considering looking at other options doesn't seem super compelling.
I personally would rather avoid any per-language package managers and opt for distro-level package management (rpm/deb). That's something that has been there pretty much forever & has all the hard & dirty depsolving issues long solved.
Also I can use it for all software regardless of what language it's written in, not to mention not having to learn multiple half-baked package managers per language!
And lastly any non-trivial software project will need dependencies outside of the world of its own package manager anyway, so why not go all in properly with rpm/deb & make everything easier for your users & your future self.
> Is there a reason to look at poetry if you've got the pip/virtualenv combination working fine?
You probably have a bunch of scripts that do what poetry does (either that, or you repeat the same commands over and over A LOT).
Switching to poetry might have some initial overhead, but a big upside is that you stop using custom, internal tooling, and use something industry-standard. Importantly, it makes it easier for you to understand external projects (since you're familiar with the standard tooling), and faster to onboard newcomers.
I’m curious how they ensure every developer is using the same versions of things, as well as how they manage dev dependencies and transitive dependencies.
pip + venv + requirements.txt doesn’t solve this out of the box while most languages have common tools that do. Either they’ve rolled their own way to manage these things, or they’re rolling the dice every time they deploy.
I don't really understand the "Dependencies" thing (Or the difference between dev dependencies/transitive dependencies)- we literally list every single module in our environment, and its version - It's not clear to me what other dependencies there could be in a python development environment.
I do note we have three requirements.txt files, a requirements.txt, requirements-test.txt, and a requirements-dev.txt. So, presumably there is a need for different requirements that you've identified that I don't understand. So there's that.
Dev dependencies: a library you need during development, but that isn’t needed in production. I think your -test and -dev are this, but it’s not clear how you are maintaining all of these, and building for prod.
This is the main complaint: most modern languages have a standard set of tools and flows for achieving this. Python doesn’t, so everyone does it a bit differently, and when starting a new project you have to hand roll your own flow.
Or, use something like poetry but the python community as a whole doesn’t have a commonly used solution.
To clarify further - lockfile == reproducible builds without having to pip freeze every single dependency in the tree.
Fuzzy specs == effortless upgrades according to your risk tolerance for a given library (major version for boto3, minor version for pandas, something like that).
Poetry gets you the combination of the two: Let your dep versions float, and easily revert back to a previous deterministic build using the version-controlled lockfile if something breaks.
I'm wondering why we wouldn't want to pip freeze every single dependency in the tree. I'm looking at our requirements.txt, and everything is of the form:
vine==1.1.4
urllib3==1.25.10
wcwidth==0.1.7
I know that changing the versions of any of the underlying libraries is always a big conversation. (We're still locked in on pandas==0.23.4)
So, if I understand correctly, with Poetry, we might be able to say, "Keep Pandas at 0.23.4 and sqlalchemy at 1.2.7 but figure out all the other dependencies for us and load them."
Or, even better, "Keep Pandas at 0.23.x and sqlalchemy at 1.x.x but figure out all the other dependencies for us and load them."
The advantage here is security patches in underlying libraries come for free, while we focus on porting code for the really high-level + important Libraries which aren't always backwards compatible (Pandas)
Also - if we want to stick with specific versions, that's also possible with the lockfile - so every library will be exactly the same as the one in a build that worked.
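The "0.23.x" and "1.x.x" idea maps onto what Poetry calls caret requirements. As I understand the documented rule, `^` allows any upgrade that leaves the left-most non-zero component unchanged. A sketch of that expansion, assuming plain numeric versions (this is my reading of the rule, not Poetry's own code):

```python
def caret_range(spec):
    """Expand a caret spec such as '^1.2.7' into (lower, upper) version bounds.

    The lower bound is inclusive and the upper bound is exclusive.
    """
    parts = [int(p) for p in spec.lstrip("^").split(".")]
    # Bump the left-most non-zero component; if every written component
    # is zero, bump the last one that was written.
    bump = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:bump] + [parts[bump] + 1]
    lower = parts + [0] * (3 - len(parts))
    upper = upper + [0] * (3 - len(upper))
    return ".".join(map(str, lower)), ".".join(map(str, upper))


print(caret_range("^0.23.4"))  # pandas stays within 0.23.x
print(caret_range("^1.2.7"))   # sqlalchemy stays within 1.x.x
```

The lock file then records whichever exact versions were chosen inside those ranges, so a build that worked can be reproduced later.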
The thing I don't understand - is when I do:
pip install pandas==0.23.4
It does load the dependencies. Indeed, if I create a requirements.txt that just has:
pandas==1.0.3
pytz==2020.1
six==1.14.0
Then pip install -r requirements.txt goes and does:
So - I'm still at a loss of the advantage of poetry vs pip install, given that pip loads dependencies as well - the advantage of "fuzzy specs" seems minimal given it's such a big deal to upgrade the big packages.
Nothing locks the version of numpy that you got there. If you run the same thing again in a few weeks you might get a completely different version, and have no way to revert to the version you had before.
Lots of people like pip-tools, it would feel a lot more lightweight and closer to pip than Poetry does.
Pipenv exists but... steer clear for a multitude of reasons.
Personally I like that Poetry centers itself around the pyproject.toml standard. I also think that its usability and the enthusiasm of both the maintainers and the users is going to really carry it more into the Python mainstream in the coming years.
Personally I'm very disappointed that we keep inventing new standards like pyproject.toml when other things, like setup.cfg, have existed for an extended period of time, work well, and are supported for reading and writing by the stdlib.
I see pyproject.toml as more like 'tox.ini is nice, it's good that so many tools use it, but it's really nothing to do with tox', and bringing them (and hopefully those declining tox too) and setup.cfg into one.
This comment breaks the site guidelines—would you mind reviewing them and sticking to the rules when posting here? Note this one:
"Please don't post insinuations about astroturfing, shilling, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data."
Here's plenty of past explanation of why we have this rule: https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme.... The short version is that internet users are too quick to reach for "astroturfing" etc. as an explanation for what they don't like, and it poisons the ecosystem when people post these accusations without evidence. Someone else having different views—different tastes in package managers, for example—does not clear the bar for evidence.
I assume you are referring to me? I can assure you, I have absolutely zero stake in Poetry. I'm not one of the developers, I'm not even one of the contributors (excluding opening a bug report or two).
If you don't use your own PyPI for a bunch of internal packages, it works great imo. One more wish item would be having absolute path dependencies instead of only relative path.
SSL for internal services seems to be becoming common these days and that's not a bad thing. Corporate/institutional information warfare is becoming a pretty big deal now (see all the US hospitals being infiltrated) so hardening on the inside to at least slow down an internal threat is not a bad thing at all.
Agreed. I was in the boat of if it's working, don't change it and just using pip and setuptools over the years.
Switched to poetry for one of my libraries as a test a few weeks ago, noticeably more painless!
One other thing I liked about it is the community, I distinctly remember digging into the setuptools source code once to find something that was undocumented. With poetry, that was one Discord message away.
Pip + setuptools is "it's working, no need to change" in the same way that "SVN works so why bother with Git?". It seems fine until you try the alternative, and then the thing it replaces just feels painful to use.
It certainly has issues of its own, and I'm sympathetic to people who prefer the UI of alternatives like Mercurial. But wow, I'd never go back to svn (or god forbid, cvs) ever again.
Migrating from poetry 1.0.10 to 1.1.x has unfortunately been a major pain for us. Some of our dependencies do not yet fully support PEP517, which means we're stuck on 1.0.10 for now (e.g. mypy, which had a fix merged but not yet released, see https://github.com/python-poetry/poetry/issues/3094).
The poetry lock file also seems to get ignored for git dependencies. Say I depend on package Foo, on branch Bar, as a git dependency. At install time I get revision 1, which gets added to the lock file. Now let's say the head of branch Bar moves to revision 2. If I re-run poetry install, I now get revision 2, even though revision 1 is still mentioned in the lock file. The solution is simple: depend on revisions / tags, rather than on branches (and this sounds like good practice anyway), but it is surprising behavior.
And that's only relatively recently become not true of pip. Actually I'm not even sure it is always not true, maybe it depends on package manager (/packager)? python -m ensurepip still exists after all.
Particularly, it isn't an installation method someone writing a package manager should consider. Bootstrapping tends to be clumsy, but this is too much.
Interesting! A local company tried to hire me months ago to migrate a 2.6 legacy system to bleeding edge 2.7, I wonder if they have convinced someone yet.
I would have actually enjoyed that, the challenge being to use enough hacks and interfaces (six, __future__, etc) to make the subsequent, inevitable move to 3 a well-paid but trivial task...
You should ask to talk to a developer before accepting an obviously complicated migration. Ask about the numbers since they're not IP/proprietary. How many lines of code? How many functions? How many standard modules? How many custom libraries? Is everything Python? Are other languages being used too? That way you will know whether or not they are intentionally hiding the real issues from you to get you to do free work.
A bit off topic, but unit test coverage trivializes the work that goes into conducting a migration. Unit test coverage really only pertains to upgrades. A migration is not synonymous with an upgrade. Migrations are serious work that typically span entire ecosystems.
When I mentioned "the numbers" I wasn't hinting at unit test coverage. I was giving examples as to what the numbers could be. Obviously OP should ask lots of questions about all of the software running on a legacy kernel.
2.7 was essentially a bridge to Python 3. All features it has were backported from 3. This is why so many people before 2015 were saying that Python 3 doesn't provide any new features.
TBH I think the main thing about that release, for me, was the introduction of context managers ("with"). I don't remember any particular breakage but I wasn't doing sysadmin at the time, maybe there is something in the bowels of redhat that relied on some broken behaviour...
I actually rejected it because I thought it would be extremely boring after a few days, I did that for a couple of projects when I was writing desktop apps in Python 2.x and I found it very easy (2.7 to 3.x is not especially hard either in most cases).
I never understood the point of hiring a full team for at least 6 months just for that. The only explanation I can find is that the project manager was clueless about Python.
In my experience this is less to do with the fast pace of development, and more to do with the fact that many engineers use virtualenvs, and virtualenvs "vendor" the pip version so you need to update pip in each virtualenv you use. This is correct, but does lead to needing to upgrade so much.
No kidding. I pretty much don't even believe in the concept of a serious upgrade anymore, as far as pip goes anyway. As long as it still works when I run it, nobody is missing out on any serious upgrades as far as I'm concerned at this point.
That's a pretty bad mindset, new pip versions (20+) include support for binary distributions that saves a lot of time when installing pretty much anything that needs to be compiled.
Just run `pip install --upgrade pip` every month and you will be fine.
What they likely mean is that newer versions of pip support newer platforms underpinning the binary packages.
Python binary packages are a bit of a kludge, they take a binary built on a base platform like CentOS 5, and use tools like patchelf to take all the libraries that they dynamically link against and re-wrap them in a new package.
Until about a year ago, CentOS 5 was the newest base platform available. So if your library needed a glibc feature or some other library feature from the past decade you were SoL.
I swear... it is so incredibly frustrating that I want to punch my computer when I see that message at this point. I think I had to finally shut it up via https://stackoverflow.com/a/64853362 to get my blood pressure to finally drop.
I guess exaggeration doesn't come across well over text. I was trying to get across that it's super irritating. No I don't actually want to punch my computer...
It refuses to install when there are conflicts now.
$ pip install "six<1.12" "virtualenv==20.0.2" -q
ERROR: Cannot install six<1.12 and virtualenv==20.0.2
because these package versions have conflicting
dependencies.
ERROR: ResolutionImpossible: for help visit
https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
So if I understand correctly, pip will now install the list of packages in the same order instead of choosing randomly, so that when there are version conflicts, you always get the same result?
I'm surprised I never ran into the issue, but I suppose it mainly shows up if you have a large number of dependencies?
Is that what it does? My understanding is that it actually analyzes dependencies before installing, finding a solution that satisfies all of them. Kind of like what poetry or zypper in SuSE does.
Yeah, I've seen that pattern around and also used it myself. Useful if you're using the disk a lot, for any purpose.
Let's say you have 200 JS libraries that you need to install dependencies, build and test for each commit. Making the JS dependencies install into a common cache directory into /dev/shm will easily speed up the install phase (and make the projects share dependencies cache fairly easily), and you could also make the test data (like fixtures) go through /dev/shm if you need it to.
It's a fairly straight-forward optimization if disk is a relevant part of your run time and you don't need terribly much of it. Probably not exactly common, because many people don't bother, but also not extraordinary.
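As a rough illustration of the pattern, a build script can prefer a RAM-backed directory for its cache and fall back to the normal temp location when one is not available. The function name and prefix below are made up for the example:

```python
import os
import tempfile


def make_cache_dir(prefix="dep-cache-", ram_root="/dev/shm"):
    """Create a scratch directory on a RAM-backed mount when one is usable,
    otherwise fall back to the default temp location."""
    if os.path.isdir(ram_root) and os.access(ram_root, os.W_OK):
        return tempfile.mkdtemp(prefix=prefix, dir=ram_root)
    return tempfile.mkdtemp(prefix=prefix)


cache_dir = make_cache_dir()
print(cache_dir)
```

The resulting path can then be handed to whatever tool is doing the installing, for example through npm's `--cache` option or pip's `--cache-dir`. Anything stored there is lost on reboot, so it only suits data that can be rebuilt.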
I'd bet it's becoming increasingly popular/convenient to do so with the growth of container-based test suites - really trivial to include a "postgres-ram" image for your test stack instead of one that uses disk, and you don't even need to know what tmpfs is to do so.
If the tooling you use lets you create tmpfs mounts for your container it's very easy, yes. AFAIK those are RAM-backed by default. I think from inside a container it could be difficult to have the permissions necessary.
Pretty common if you run a lot of builds, have a big test suite or pay per minute.
In my environment unpacking a 200MB .tar.gz containing node_modules from previous runs takes half the time on RAM disk vs. SSD local storage. Python doesn't gain as much, packages tend to have fewer small files but there still is a small gain.
How much tests benefit from using a RAM disk depends heavily on what they do, I have seen 10% for React apps.
We also use other simple tricks like running databases with disabled fsync, mv instead of rm, cache whatever can be cached between CI runs, shallow checkouts and so on. Nothing ground breaking but it all adds up.
However, for pipelines which run for five minutes twice a day I don't even bother.
These days, “high amounts” is a lot more than most projects need - even ultra-portables have 8GB or more. Most projects don’t have unit test suites limited on file I/O measured in gigabytes unless they also have a budget.
Was very common in the spinning rust days to speed up io bound ops, and mentioned in yesterday’s Amiga thread as built in. Could still be useful if tests use a ton of disk.
Yikes! I'm surprised by the number of people who think temp RAM disk for QA testing came from using containers or the cloud. It was actually in use prior to that technology. And, it was popularized in the use of remote PXE installation which required modifications to the OS after the fact.
My contribution is correct. I'm not a parrot. Temp RAM disk started in Linux 2.3 (possibly older than that), it didn't come from builds inside containers or instances in the cloud.
Also maybe worth mentioning: the PyPI team is running a 1-to-1 UX feedback study for `pip` https://www.ei8fdb.org/thoughts/2020/03/pip-ux-study-recruit.... I'd be more interested in opting in to an open web survey (question and answer fields) though. Nevertheless, great to see they're open to user feedback in forms other than Git issues.
Your package manager should be boring, extremely backward and forward compatible, and never broken. Experience has shown this not to be true for Python. Several times over the years I’ve found myself pinning, upgrading, downgrading, or otherwise juggling versions of setuptools and pip in order to work around some bug. Historically I have had far more problems with the machinery to install Python packages than I have had with all of the other Python packages being installed combined, and that is absurd.
I don't disagree with your claim, but like all moral claims, I don't see how it's actually useful or actionable. Children should not get cancer. People should not drive without seatbelts. Our optic nerve should exit the back of our eye instead of having to go through a blind spot on our retina. And yet here we are.
The Python ecosystem is older than our modern conception of how a package manager should behave. Any new package manager must build on top of that chaotic history. Absent a time machine, your comment just seems like a vague lament about the state of the world.
Actionable: use backward-compatible behavior by default. Use opt-in for new features. Question: is the new resolver opt-in or is it enabled by default in the new pip version?
The action is simple: stop making breaking changes. There is frustration with breaking changes being in the history, but that is much less troubling than the recurring answer to the question “What is wrong with my code?” being “A new version of the package manager was released.”
It is also very possible to write code to completely work around old issues so the newest version of your package manager just works or at least outputs clear error messages with mitigation steps, but instead what you get is obscure stack traces when you run pip install.
Many of the breaking changes had to do with removing accidental features from the constraint file implementation. It seems a lot better to break that now rather than wait for people to accidentally start using and depending on those unintentional implementation quirks.
I tried the new version of pip and it's giving clear error messages when there are incompatibilities between versions. That should make your life easier.
This seems like an odd comment to make on a release which is making improvements, work done with the full knowledge that there’s room for improvement.
It’s also not universal: it’s been a decade since I’ve had to worry at all about that and even before then it was uncommon except on projects which weren’t trying to be very clever. If you had a clean setup this really hasn’t been the rule for a long time.
I mean, I do use a lot of C code and the things which break are external C level dependencies. You can blame Python for those but it won’t help fix them.
If a library doesn’t specify version compatibility and you don’t pin it but do choose to run an upgrade, why is that the installer’s problem? It’s like complaining about Toyota when you get a parking ticket.
See, this isn't necessary in other languages. All I want is an error, so that I can assess what the right thing to do is.
Honestly, I sometimes feel that people have Stockholm syndrome with Python packaging.
Nonetheless, I'm really happy about this release, as it will help to prevent these problems.
It's also worth noting that I didn't even learn Python for a long time because I kept breaking my install while trying to get started, so this stuff has real costs in terms of adoption.
The same problems happen in those languages — the underlying problem is someone not specifying version constraints they actually have. If you get more experience with those other languages you'll definitely encounter that with projects which aren't carefully following semantic versioning. In particular, this goes back to your previous confusion about C/C++ — if you're having a problem with those toolchains, the underlying problem is that they don't have a package manager at all and every downstream tool has to compensate for that.
The one part which is worse in Python than some of those other languages is that the default tool doesn't track all of the dependencies when you install a new package. If you're not using something like pipenv, poetry, etc. you can add "foo" to your requirements file and have it work now but break in the future if you aren't using a tool to pin specific versions (e.g. pip-tools). Some other tools like NPM or Cargo avoid that by default but that's certainly not true of all languages and it's something which is widely acknowledged as an area for improvement by the people recommending or working on the aforementioned tools.
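As a rough illustration of the "works now, breaks later" risk, the sketch below flags requirement lines that are not exact `==` pins. The `unpinned` helper and the sample file are hypothetical, and the regex deliberately ignores extras, environment markers, and URLs.

```python
import re
from typing import List

# Matches an exact pin such as "requests==2.25.0". Extras, markers, and
# wildcards like "==2.*" are deliberately out of scope for this sketch.
PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.!+]+$")


def unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that would float to whatever is newest."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or line.startswith("-"):  # skip options like "-r base.txt"
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


example = """\
foo
requests==2.25.0
flask>=1.1  # a range, not a pin
"""
print(unpinned(example))
```

Tools like pip-tools, pipenv, and poetry go further by also pinning the transitive dependencies, which a check like this cannot see.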
So, my major comparator for ease of installation is a statistical programming language called R.
Both Python and R have equal numbers of C/C++ dependencies, but the only breakage I experience is with Python.
This is for a number of reasons:
1) R has CRAN (like CPAN, but for R). If your package does not build on the latest version of R, it becomes unavailable.
2) If your package has C++ dependencies, the installation will error out and say which headers it's missing. This means that I can search for the header and install it.
3) Because of the removal of obsolete packages, dependencies just work and you can be assured that a package will build with a given version of R. Updates can be a problem, but they are opt-in at the package level.
The core issue with Python is the acceptance that different projects can have entirely different dependencies, which means that this breakage is relatively common.
Like, I build C/C++ on a semi-regular basis, and it can be super-painful, but Python is the only higher-level language that gives me this heartache.
Not to mention that if you install pip from the Linux repositories, you're setting yourself up for a world of pain (this actually stopped me from learning Python for a number of years).
To be fair, I mostly use conda for C/C++ dependencies as it works much better, but lots of packages are still only available through pip, which means I can't avoid the problem.
To reiterate, I am super happy with this release of pip, as it will make my life much better. It doesn't go far enough, but it is much, much better than what existed before.
Finally, Python packaging is a horrible wart on an otherwise good language, and I'm probably going to argue against using it in the future because of this.
> If your package does not build on the latest version of R, it becomes unavailable.
Wow. This would definitely make packaging much much easier, and it’s so cool R is able to do this.
I can’t imagine how angry people would get if Python ever goes there though. Each community has different tendencies, and one of Python’s I’ve observed is people really get pissed when you force them to upgrade anything.
I take it you're doing a lot of greenfield projects with the latest versions of Python then. I've encountered issues with Setuptools as recently as ~3 months ago:
The entire Python ecosystem is like this. Different conflicting versions of the language, different package managers broken in different ways, etc. It’s so bad that the core user base of data scientists usually just use machines that have all their libraries pre installed.
Contrast this with a language like Rust or JS, where you run one command once to install everything.
I don't know Rust well enough to comment, but doing a lot of development in both JS and Python, JS is no cakewalk either. I'd say I deal with at least as many packaging/versioning issues on JS as I do on Python.
Another issue is the two mutually incompatible ecosystems of async vs. non-async Python libraries and frameworks. You now have these entire mirror universes of code, with "[X]-async" versions of existing synchronous libraries, and new async frameworks which require using these libraries. The years of work that've gone into non-async libraries are now basically all for naught, with lots of people reinventing the wheel for every possible thing that interacts with a network. (I'm also not a fan of async/await in general.)
I'm a big Python fan; it was my first language and still by far my most-used and the one I know best. But the "developer experience" is undoubtedly a giant mess compared to something like Go or Rust. (Rust also appears to be having some growing pains related to async, but everything else seems way more solid.)
Despite Python's catchphrase "there should be one, and preferably only one, way to do it", Go really embodies this across every dimension. One way to install the runtime, one way to format, one way to package, one way to do concurrency, one way to test, one way to write most things. And every version is almost completely backwards compatible.
It's kind of crazy that a language that (justifiably) prides itself in its programming simplicity also makes you deal with tangled messes like https://xkcd.com/1987/
My one solace is that some all-in-one third-party tooling is finally close to the level of convenience you get with Rust. pyenv and poetry save a ton of headache when it comes to managing Python versions and packages, and "just work" in a pretty similar way as rustup and cargo.
> The years of work that've gone into non-async libraries are now basically all for naught,
That's FUD. Non-async code lives on, is maintained, and is not affected by the async alternatives popping up.
We had libraries based on twisted for over a decade, which were basically in the same position, but you're not mentioning those?
> It's kind of crazy that a language that (justifiably) prides itself in its programming simplicity also makes you deal with tangled messes
That comic is satire with many paths which don't really exist. Reality is much simpler if you're explicit about which version you're using and use a single source rather than OS, homebrew, Anaconda and upstream release at the same time. With one version manager and virtual environment per project everything works almost exactly like in Rust.
You're right, "all for naught" was going too far and I apologize for the hyperbole, but it's kind of like a blockchain fork: most new projects are gradually moving to the new async side of things, and this new mirror universe is being built at an accelerating pace while the previous fork (synchronous libraries) is decelerating.
In 5, 10, 15 years, a lot of code that might have otherwise still had some utility may no longer be practical. A lot of such code might have become obsolete after that long anyway, but some of it might be used far less than it could've been solely because it's outside of the future async status quo.
There are some awesome Perl projects out there, and their work definitely wasn't for naught, but it's hard to justify using them for anything serious in 2020, kind of like how it might end up being hard to justify using synchronous Python in 2025 or 2030.
(Not in the sense of running a standalone application, in which case who cares what it's written with, but in terms of something that may require some kind of continuous development or interfacing with other code.)
>We had libraries based on twisted for over a decade, which were basically in the same position, but you're not mentioning those?
With Twisted, you need to swap out certain network calls with Twisted's own APIs to use its full feature set, but it has an easy way to defer existing synchronous/blocking code: https://twistedmatrix.com/documents/12.2.0/core/howto/gendef... There's no way to do something like this with async/await, besides some hacky third-party attempts, as far as I know.
But, yes, I'm not a huge fan of Twisted's model, either. I was thinking more along the lines of a green thread model like gevent, which, although controversial, doesn't require you to change a single line of code beyond a monkey patch line at the top of the file. I'm not really swayed by Twisted's creator's anti-green thread manifesto (https://glyph.twistedmatrix.com/2014/02/unyielding.html).
And any language with a baked-in green thread model like Go also avoids this problem. Synchronous code can be run asynchronously, or synchronously, with no red/blue barrier.
>That comic is satire with many paths which don't really exist.
Yes, of course it's hyperbole, but it's not that far off, in my opinion. Now that I use pyenv for everything I no longer run into any issues like these, but before, it could sometimes become a nightmare.
I love tools like pyenv and poetry and they make Python way easier to deal with (it's still my favorite language, for sure), but part of the point of this thread is that it would be nice if universal tools like these were baked into the actual release like how Rust's and Go's tools are baked into theirs.
Tons of people still aren't aware that things like pyenv or poetry or Anaconda exist, or that they should consider trying them. Or they want to try them but are dissuaded due to work policy about changing things or using new third-party tools, or even any kind of third-party tools. Someone else in this thread brought this up here: https://news.ycombinator.com/item?id=25255536
> My understanding is Twisted doesn't suffer from the so-called "what color is your function problem"
It really does. The link you mentioned shows how to delegate some sync work to a thread to make it play nicely, but you wouldn't want to do it in each request for example. Twisted is its own world and either you're in or you're out. The async/await markers are much more generic in this case.
> There's no way to do something like this with async/await, besides some hacky third-party attempts, as far as I know.
Spawn a thread, share a queue, send the sync call you want to do, await on a queue read. It's effectively the same thing twisted is doing there.
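A minimal sketch of that recipe with only the standard library is below. The names `worker`, `call_sync`, and `slow_add` are made up for illustration, and the result comes back through an asyncio future rather than a second queue. The stdlib's `loop.run_in_executor` and `asyncio.to_thread` (Python 3.9+) package up much the same idea.

```python
import asyncio
import queue
import threading
import time


def worker(jobs: "queue.Queue") -> None:
    """Run blocking callables off the event loop and hand results back."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the thread down
            break
        func, args, loop, future = job
        try:
            result = func(*args)
        except Exception as exc:
            # Futures are not thread-safe, so resolve them on the loop's thread.
            loop.call_soon_threadsafe(future.set_exception, exc)
        else:
            loop.call_soon_threadsafe(future.set_result, result)


async def call_sync(jobs: "queue.Queue", func, *args):
    """Send a blocking call to the worker thread and await its result."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    jobs.put((func, args, loop, future))
    return await future


def slow_add(a: int, b: int) -> int:
    """Hypothetical blocking function standing in for a sync library call."""
    time.sleep(0.05)
    return a + b


async def main():
    jobs: "queue.Queue" = queue.Queue()
    thread = threading.Thread(target=worker, args=(jobs,), daemon=True)
    thread.start()
    try:
        return await asyncio.gather(
            call_sync(jobs, slow_add, 1, 2),
            call_sync(jobs, slow_add, 3, 4),
        )
    finally:
        jobs.put(None)
        thread.join()


results = asyncio.run(main())
print(results)
```

With a single worker thread the blocking calls run one after another; the point is only that the event loop stays free while they do.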
The JS ecosystem has just as many "issues" if you look at it objectively. Don't mean to be funny or lazy, but one need only look at this SO answer to see how weird/fragmented the JS package system is (the answer has a Table of Contents).
I’m fairly certain that yum and apt meet these requirements for “traditional” package managers, I know Maven does as well.
I think it depends a lot upon the culture of the ecosystem, maven being fairly conservative is a natural consequence of Java being very enterprise focused.
That's easy to disprove. See the bugtrackers for those projects. They're not bug free. See why "aptitude" was introduced to cover "apt"s failures. Also one of the reasons we had the yum to dnf migration is to replace the worse resolver in yum with a sat-solver in dnf.
But Python has been around for a while; I feel the frustration comes from the fact that other platforms seem to have managed to get a single working tool much earlier.
It's mind boggling to me that such a popular, mature, and beloved language has such a terrible package manager. Pip is seemingly broken more times than not. Deploying anything is a nightmare unless you use a virtual environment, and even when you do you still gotta cross your fingers and hope pip doesn't decide to break at some point in the future.
Maybe I am weird, but I've had far more problems with anaconda and its package management than I ever have with pip. Conda routinely takes hours to resolve whatever unholy issues it's running into sometimes
Java - all you have to do is mvn clean install -> Maven is amazing at dependency management. Same in JS - npm install and npm start works like a charm.
What about version management when a conflict happens between the dependencies that are installed and the ones that are needed? Admit it, Python is a big mess. That is why people recommend installing inside yet another weird piece of software, a conda environment, rather than directly.
Corner case, and a tad overblown from an "academic" perspective if you ask me - I've been doing day job (and after-hours hobby with ML/CV) development in python for the last 15 years and had that issue maybe 2-3 times.
On the other hand, with my very limited exposure to my team's JS/TS FE components, I've almost each time encountered issues with the supposedly easy "npm install" command. Broken NPM installs, cryptic error messages (ok get this one on Python too tbf), lingering version lock files, global vs local package issues, broken nodejs installs, node_modules being committed, node_modules being un-deletable, etc. Anecdotally, I'd say my experience on Python packaging and management has been worlds-better than for other languages.
Python has its issues, sure, but from my point of view, packaging is not a big one in comparison to other langs.
Have you tried PHP? It has an amazing package manager called 'composer'. The current release is popular and super amazing (runs 1/2 the internet). Python is a terrible divided (2 or 3?) and gross language from 1991.
Seen a few mentions of poetry. Not many for pip-tools which has been around longer, is less opinionated and has many of the same benefits https://github.com/jazzband/pip-tools
IME, pip and its inclusion in python installations made a great and very positive difference for using Python on Windows: before, third-party installations mostly (sic) didn’t succeed; after, they almost always succeed. I’m grateful.
Hynek, a CPython committer, wrote a blog post[1] on the state of Python application dependencies in 2018, updated in 2019 (no change in 2020, I asked). It also surfaced on HN 3 times but did not get much attention[2].
Conda, the package manager that applies a bunch of weird patches to packages that sometimes leave them broken, has no lock file or reproducibility built in and suffers from the same issue when using “pip:” sections in your environment.yml?
Most recently (October?) they vendored some package that was a system dependency before, and it broke the vendored versions of pip (e.g. on Debian).
I get, “not their fault” debian goes and modifies packages... but from a user perspective: it broke.
I would say my experience is that roughly every six months something to do with pip breaks for me... but I really can't be bothered trying to keep track of it.
I just try to avoid using it now. Down vote all you like, I don't care. Pip has broken my CI enough times that it's lost any good will it ever had with me.
It mostly breaks CIs or docker builds if you don't hardwire the pip version, because they tend to use the latest pip by default. It's mostly not directly pip's fault, but it can get annoying.
I remember issues when they changed their caching mechanism, or when a couple of libraries I was using were importing internal stuff from pip which got changed in a new version.
Also some packages for some reason need to be installed in the correct order and it's not immediately clear until pip tweaks their installation procedure.
I remember in an interview a developer asked me to resolve the very same problem. At the time I had actually never heard of it. He said to me that he knew I didn't have a lot of experience using Docker.
Conda is a huge dependency to support out of superstition, and there’s no guarantee that the sloppy practices which break when pip updates won’t happen there - if you have developers doing things like not testing upgrades or importing private modules from pip that’s going to break no matter which tool does the upgrade.
Sadly, pip 20.3 seems to have broken docker builds in one of my projects. The symptom is that the pip install seems to hang indefinitely (>40000 seconds). I switched back to 20.2 for now.
I prefer this resolver to `pip freeze` type pinning for dependency pull safety. Pip freeze makes it a nightmare to remove old packages if you have hundreds of packages frozen.
- fluid ones, where you specify immediate dependencies of your application with version ranges (typically versions that are api compatible with your app)
- locked versions (this is what requirements.txt is supposed to be)
You can get this kind of behavior if you define packages in setup.cfg in install_requires and then use pip-compile (from pip-tools) to generate requirements.txt based on it. pip-sync can then synchronize packages to requirements.txt.
Alternatively you could just use poetry which does all of this with a nicer interface.
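To illustrate the split between loose top-level ranges and a fully pinned lock, here is a small stdlib-only sketch. The two embedded files, the version numbers, and the helpers `top_level_names` and `locked_versions` are illustrative assumptions; it only checks that every top-level dependency has a pin and does no real version matching.

```python
import configparser
import re
from typing import Dict, List

# Illustrative setup.cfg fragment: loose, API-compatible ranges.
SETUP_CFG = """\
[options]
install_requires =
    requests>=2.20,<3
    click>=7
"""

# Illustrative lock in the style pip-compile writes: everything pinned,
# including transitive dependencies.
REQUIREMENTS_TXT = """\
certifi==2020.11.8    # via requests
click==7.1.2
requests==2.25.0
"""


def top_level_names(setup_cfg_text: str) -> List[str]:
    """Read install_requires and keep only the project names."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    return [
        re.split(r"[<>=!~ ;\[]", line.strip(), maxsplit=1)[0].lower()
        for line in raw.splitlines()
        if line.strip()
    ]


def locked_versions(requirements_text: str) -> Dict[str, str]:
    """Map each pinned project name to its exact version."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop "# via ..." comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


pins = locked_versions(REQUIREMENTS_TXT)
missing = [name for name in top_level_names(SETUP_CFG) if name not in pins]
print("locked:", pins)
print("top-level deps missing from lock:", missing)
```

A non-empty `missing` list would suggest the lock file is stale and needs regenerating from setup.cfg.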
setup.py (especially if you use declarative setup.cfg) works quite well, you can then use pip-tools to generate requirements.txt which then acts like a lock file.
Frankly I would still be using it if PyPA weren't trying so hard to kill it.
This forced me to try poetry though, and it is quite decent frankly. I wish it supported building C packages though, and I'm missing plugins like setuptools_scm, which generates the package version from SCM (e.g. git) tags.