If I see some JS, Go, or Rust code online I know I can probably get it running on my machine in less than 5 min. Most of the time, it's a ‘git clone’ and a 'yarn' | 'go install' | 'cargo run', and it just works.
With python, it feels like half the time I don't even have the right version of python installed, or it’s somehow not on the right path. And once I actually get to installing dependencies, there are often very opaque errors. (The last 2 years on M1 were really rough)
Setting up PyTorch or TensorFlow + CUDA is a nightmare I've experienced many times.
Having so many ways to manage packages is especially harmful for python because many of those writing python are not professional software engineers, but academics and researchers. If they write something that needs, for example, CUDA 10.2, Python 3.6, and a bunch of C audio drivers - good luck getting that code to work in less than a week. They aren’t writing install scripts, or testing their code on different platforms, and the python ecosystem makes the whole process worse by providing 15 ways of doing basically the same thing.
My proposal:
- Make poetry part of pip
- Make local installation the default (must pass -g for global)
- Provide official first party tooling for starting a new package
- Provide official first party tooling for migrating old dependency setups to the new standard
As a user of an application: `npm install` just works (same with `cargo build`). For Python, I’ll probably do `python -m venv env; . env/bin/activate` and then, well, it’s probably `pip install -r requirements.txt`, but sometimes it’ll be other things, and there are just too many options. I may well add --ignore-installed and reuse some locally installed packages, potentially at different versions, e.g. when setting up Stable Diffusion recently (the first time I’ve ever used the dGPU on my laptop) I wanted to use the Arch Linux packages for PyTorch and the like.
For managing dependencies in an existing library or application (as distinct from starting from scratch, where you’ll have to make several extra choices in Python, and where npm is badly messed up for libraries), both npm and Python are generally fairly decent, but the whole you-can-only-have-one-version-of-a-library thing from Python lends itself more to insurmountable problems.
My personal background: many years of experience with both npm and Python, but I’ve never done all that much packaging in either. (Also many years of Rust, and I have done packaging there, and it’s so much easier than both.)
> With python it always has been pip install package and we are done.
The worst I encountered was a somewhat lengthy manual to build a project (cannot remember what it was), and not only did you have to manually install all required dependencies -- npm automates this by having an npm-readable list in package.json -- but after running all those commands, the last note said something like "oh BTW, whenever we wrote "pip" above we actually meant "pip3", use that instead, and if you did it wrong, there's no way to undo it."
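One common way to sidestep the pip vs pip3 ambiguity is to call pip through a specific interpreter with `-m pip`, so the install always lands in that interpreter's environment. A minimal sketch, where the helper names `pip_command` and `pip_install` are made up for illustration:

```python
import subprocess
import sys


def pip_command(*args: str) -> list[str]:
    """Build a pip invocation bound to the interpreter running this script."""
    # "python -m pip" targets sys.executable's environment, whereas a bare
    # "pip" or "pip3" resolves to whatever happens to come first on PATH.
    return [sys.executable, "-m", "pip", *args]


def pip_install(*packages: str) -> None:
    """Install packages into the current interpreter's environment."""
    subprocess.run(pip_command("install", *packages), check=True)


if __name__ == "__main__":
    # pip's version line also reports which Python it belongs to.
    subprocess.run(pip_command("--version"), check=False)
```

This doesn't undo a wrong install, but it does make it explicit which Python a command is going to touch.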
Once I switched to pnpm many of those problems went away for me. I manage a slew of js/ts monorepos and pnpm is a godsend, not because of performance or workspaces, but because things get resolved sanely.
I used to feel this way in the JS ecosystem. It's been quite a while since I've encountered something insurmountable (unless it's something like Bit using bespoke packaging).
Libraries that use native code in JS or C-bindings in Go are equally annoying to get going if the documentation from the author is sub-par. In the JS world you'll get pages and pages of node-gyp errors that are completely incomprehensible until you realize the author never mentioned that you needed a half dozen little development library dependencies installed. Native C/C++ library code interfacing just sucks in general because there is zero package or dependency management in that world too.
I think some of the pain has been Apple's fault. Requiring conda for some official packages means you always have two competing ecosystems on one machine - which is just asking for pain.
What packages are you referring to? I’ve been doing professional Python dev on a Mac for the past three years and have never had a reason to use conda, so I’m curious what I’m missing.
It’s only the last 5 years, give or take, that you can install C-compiled packages with pip. When I started out, I had to apt/port/pacman install packages, then run pip install (or python setup.py even) to install dependencies. It’s hard for newer python converts to even comprehend the pain it used to be. Conda came with the compiled package concept, and pip came after. Dealing with dependencies these days is a breeze, relatively speaking. On windows you even needed a visual studio license, just to pip install numpy!
Just because the egg/wheel format existed didn't mean that they were actually used or actually worked. Being able to reliably pip install packages like numpy and gdal only started 3-4 years ago. There is a reason that Christoph Gohlke's python package site exists and was popular.
I started using numpy around 2007 and didn’t see reliable binary installs with good numerical performance until I discovered anaconda, much later (2016?). Maybe I hung out with the wrong people, or libraries. Some libraries did not have compiled wheels a few years ago, and M1 macs still run into the very occasional issue.
You’re running install-time code if you don’t, so I disagree. The wheel format is better suited even for pure python distribution. You can read more details on that site.
Conda environments are the most isolated from the host os, outside of using docker or a vm. They’re also the most heavyweight. As a package manager, you can install a lot of non-python stuff. It’s more similar to apt or yum than pip. You can even do things like run bash on windows with m2, which I usually find preferable to wsl.
pyarrow for example, if you don't have a wheel then good luck building one without using conda. It's not impossible but the developers don't really support it.
Official dependency and build tooling is not properly geared for C extensions. You'll be looking at compiler errors to figure out what dependencies you are missing.
Conda ships Intel MKL for linear algebra, too, meaning numpy.dot (matrix multiply), scipy, fft, solvers, … run 2-10x faster. Well, used to, I think the open source alternatives have improved since.
Installing Python only applications is trivial. What you're complaining about is all the missing code in other languages which isn't controlled by Python and depends on the OS to provide.
This is why we created Linux distributions in the first place. It is not the place of every language to reinvent the wheel - poorly.
I wish pip had some package deduplication implemented. Even some basic local environments have >100MB of dependencies. ML environments go into the gigabytes range from what I remember.
Cargo allows you to share a single package directory. Having a single site-packages and the ability to import a particular version (or latest by default) would solve this.
So if I have environments A, B, and C and each has a dependency ML-package-1.2 that's 100 MB, it means there's one copy of it in each environment? Meaning 3x100 MB?
By default, yes. It is possible to install ML-package-1.2 into your 'base' python and then have virtual environments A,B and C all use that instead of installing their own copies. However this is generally not considered best practice.
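For what it's worth, the shared-base setup described above can be sketched with the standard library's `venv` module. The helper name `create_shared_env` and the `env_a` directory are made up for illustration, and `with_pip=False` is only there to keep the example quick:

```python
import os
import tempfile
import venv
from pathlib import Path


def create_shared_env(env_dir: Path) -> Path:
    """Create a venv that can also import packages from the base interpreter."""
    # system_site_packages=True is the API counterpart of
    # "python -m venv --system-site-packages".
    venv.create(
        env_dir,
        system_site_packages=True,
        with_pip=False,
        symlinks=(os.name != "nt"),  # symlink the interpreter on POSIX
    )
    # The setting is recorded in the environment's pyvenv.cfg file.
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_shared_env(Path(tmp) / "env_a")
        print(cfg.read_text())
```

Environments created this way see whatever is installed in the base interpreter, which saves disk space but also means they are no longer fully isolated. That is presumably why it isn't considered best practice.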
As a senior dev who is just starting with python, I just wish pip could handle everything nicely on its own. I hate seeing a bunch of tutorials saying that I should use other tools (like poetry or whatever) because pip doesn't handle X use case.
I completely agree with your feeling. I've been working with JS, PHP, and Ruby for years, and packaging is pretty straightforward in every one of them. Language versioning works in every one of them, too (PHP is the most annoying, yeah, but even PHP is easier than Python). Ruby has several alternatives like rbenv or rvm, but any of them has every feature 99.9% of developers need, and they work just fine. I want that in python ):
For all the complaints I hear about JS/npm, it's been so much less of a hassle than Gradle configs, Gemfiles, Composer files, and whatever python has going on.
I don't know how I'd feel about poetry being part of pip. Didn't poetry just push out a change where they decided that randomly failing 5% of the time while running in a CI env was a good idea?
The Python ecosystem is a bit of a disaster. There are many useful packages and programs written in Python, but they're legit scary to install. I always search for Go or JS alternatives if they're available to avoid all the python headaches.
Every time I need to interact with Python, I find myself banging my head trying to figure out eggs vs wheels vs distutils vs setuptools vs easy_install vs pip vs pip3 vs /usr/bin/pip3 vs /opt/homebrew/bin/pip3 vs pip3.7 vs pip3.8 vs pipx vs pip-tools vs poetry vs pyenv vs pipenv vs virtualenv vs venv vs conda vs anaconda vs miniconda.
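When several pip3 and python binaries are fighting over the path, it can help to list every candidate in lookup order. A rough POSIX-oriented sketch (the function name `find_on_path` is hypothetical; on Windows the names would need an `.exe` suffix):

```python
import os
from pathlib import Path


def find_on_path(names, path=None):
    """Return every executable called one of `names`, in PATH lookup order."""
    search = os.environ.get("PATH", "") if path is None else path
    found = []
    for directory in search.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file() and os.access(candidate, os.X_OK):
                found.append(candidate)
    return found


if __name__ == "__main__":
    for exe in find_on_path(["python", "python3", "pip", "pip3"]):
        # resolve() shows where symlinks such as /usr/bin/pip3 really point.
        print(f"{exe} -> {exe.resolve()}")
```

The earliest entry for a given name is normally the one a shell would pick, barring aliases or shell functions.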
After working with Java for over a year professionally, I really appreciate its dependency management system (Maven or Gradle). Whereas with Python it is always a mess (Poetry looks promising though).
I’m curious because this gets repeated a lot by many people. What specific messes do you get into?
I’m asking because my experience with python these days is always just doing “python -m venv .venv” activating it and then using “pip install -r requirements.txt”
To update dependencies I do “pip install --upgrade dep”, run the tests, and then go “pip freeze > requirements.txt”. This never fails me. Sometimes updating dependencies does make tests fail, of course, but that’s the fault of the individual packages, not the packaging system. Even so, I’d say that is rare for me these days.
I know this might only work for 95% of the workflows out there, but I’m very curious as to what specific messes the last 5% end up struggling with and what makes people like you feel that it’s always a mess and not just “sometimes it gets messy” etc.
> After working with Java for over a year professionally, I really appreciate its dependency management system (Maven or Gradle).
Personally, it feels better in some ways and worse in others.
The whole pom.xml approach/format that Maven uses seems decent, all the way up to specifying which registries or mirrors you want to use (important if you have something like Nexus). Although publishing your own package needs additional configuration, which may mean putting credentials in text files, though thankfully this can be a temporary step in the CI pipeline.
That said, personally I almost prefer the node_modules approach to dependencies that Node uses (and I guess virtualenv to a degree), given that (at least usually) everything a particular project needs can easily be in a self-contained folder. The shared .m2 cache isn't a bad idea, it can just make cleaning it periodically/for particular projects kind of impossible, which is a shame.
I think one of the aspects that make dependencies better in JVM land is the fact that you oftentimes compile everything your app needs (perhaps without the JVM, though) into a .jar file or something similar, which can then be deployed. Personally, I think that that's one of the few good approaches, at least for business software, which is also why I like Go and .NET when similar deployments are possible. JVM already provides pretty much everything else you might need runtime-wise and you don't find yourself faffing about with system packages, like DB drivers.
That said, what I really dislike about the JVM ecosystem and frameworks like Spring, is the reliance on dynamic loading of classes and the huge amounts of reflection-related code that is put into apps. It's gotten to the point where even if your code has no warnings and actually compiles, it might still easily fail at runtime. Especially once you run into issues where dependencies have different versions of the same package that they need themselves, or alternatively your code doesn't run into the annotations/configuration that it needs.
Thankfully Spring Boot seems like a (decent) step forwards and helps you avoid some of the XML hell, but there is still definitely lots of historical baggage to deal with.
Personally, I like Python because of how easy it is to work with once things actually work and its relatively simplistic nature and rich ecosystem... but package management? I agree that it could definitely use some work. Then again, personally I just largely have stuck with the boring setup of something like pip/virtualenv inside of containers, or whatever is the most popular/widespread at any given moment.
Of course, Python isn't the only offender here: trying to work with old Ruby or Node projects also sometimes means trouble getting things up and running. Personally I feel that the more dependencies you have, the harder it will be to keep your application up and running, and to update it later (for example, even on the front end, consider React + lots of additional libraries vs something like Angular, which has more functionality out of the box, even if it is more tightly coupled).