The Many Layers of Packaging: Why PyPI Isn't an App Store (sedimental.org)
144 points by mhashemi on May 10, 2017 | 32 comments



> PyPI, pip, wheels, and the underlying setuptools machinations are all designed for libraries. Code for developer reuse.

Many Python applications seem to disagree with this point. The one I most recently deployed, Mayan EDMS[0], includes installing the application using pip.

On a side note, one thing that's always driven me nuts about having multiple package managers is that they don't talk to each other. If I install a system-wide Python library using pip, the system package manager isn't aware of it and will try to install the vendor-provided, and usually older, version to satisfy a dependency.

Likewise, there's no way for pip to ask the underlying OS to install some dependencies - just look at the packages you have to install to get Mayan EDMS running. This isn't as simple as asking for a given package name; you have to ask for the name that the underlying OS/package manager knows that package by. For example, "apt-get install postgresql" might suffice for Debian, but on FreeBSD you might need something like "cd /usr/ports/database/postgresql96; make; make install".
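To make the naming problem concrete, here is a minimal sketch of the lookup a cross-platform installer would need: one logical dependency name mapped to a different command per platform. The table, the platform keys, and the function name are all hypothetical; the two commands are just the ones quoted above, not verified install instructions.

```python
# Hypothetical lookup table: one logical dependency, one command per platform.
# The commands are copied from the comment above; the keys are assumptions.
INSTALL_COMMANDS = {
    "postgresql": {
        "debian": "apt-get install postgresql",
        "freebsd": "cd /usr/ports/database/postgresql96; make; make install",
    },
}


def install_command(dependency: str, platform: str) -> str:
    """Return the platform-specific command for a logical dependency name."""
    try:
        return INSTALL_COMMANDS[dependency][platform]
    except KeyError:
        # Either the dependency or the platform is missing from the table.
        raise LookupError(
            f"no known package for {dependency!r} on {platform!r}"
        ) from None
```

Someone has to maintain that table for every dependency and every platform, which is roughly the work that distro packagers already do by hand.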

I'm not singling out Mayan EDMS here, it just happens to be the most recent bigish Python application I've installed so most of what I went through is fresh in my mind.

[0] https://mayan.readthedocs.io/en/latest/topics/deploying.html...


Absolutely! The post mentions a little bit how pip is definitely used for distributing command-line applications, from fab to pex to ansible. So much so that pipsi[0] is a thing. Even the tiny static site generator used to generate the post is a CLI app that leans on PyPI for distribution[1]. The origins and designs for the tools listed are primarily for libraries. Entrypoints[2] and other features came much later. Use of these techniques is overextended when developers distribute nontrivial code on PyPI with a non-developer audience. People get a buggy install and conclude that Python packaging is the problem :)

As for the package manager crosstalk, I feel you. That's definitely a source of the demand for tools that create self-contained artifacts.

[0]: https://github.com/mitsuhiko/pipsi [1]: https://pypi.python.org/pypi/chert [2]: http://setuptools.readthedocs.io/en/latest/setuptools.html#a...


Python/Ruby/Node all suffer from this - the OS stubbornly thinks it knows what's best. Here's what I do.

1. Ignore the OS when developing - let it manage its own dependencies for its own Python - fine. I use pyenv/rbenv/nvm to install my own dev. version of <language> and use isolated virtual environments for developing.

2. Deploy using Docker - again - ignoring whatever the OS has installed.
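For step 1, the isolated environment can also be created from Python itself with the standard-library venv module. This is only a sketch of the idea, assuming Python 3; the function name and the "dev-env" directory name are made up for illustration, and pyenv/rbenv/nvm are not involved here.

```python
import os
import tempfile
import venv


def create_isolated_env(parent_dir: str, name: str = "dev-env") -> str:
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False keeps the sketch fast and offline;
    # pass with_pip=True to bootstrap pip into the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir
```

Calling create_isolated_env(tempfile.mkdtemp()) leaves a directory containing a pyvenv.cfg file, which is what marks it as a virtual environment separate from the OS-managed Python.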


That's kind of by design. You're supposed to develop with virtualenv. Then, if someone packages your app for inclusion into the distro, it's on them to package the compatible dependencies and your app. (I mean, it would be nice if the devs helped here, but that's what distribution maintainers are for)

Not sure why you need docker though. It's fairly easy to distribute python libraries in a simple tarball.


If you want to do something like kubernetes you need to slap your python tar into a docker image.


Not sure I understand your point. Kubernetes is a service for managing containers. If you want kubernetes, of course you'll need containers of some sort. But not all apps need multi-host orchestration.


I love the fact they discuss hardware as a deployment strategy.

I considered the same thing at one point where I needed to suggest a bullet-proof way to run some Python code internally at a range of customer sites. A Raspberry Pi plugged into their network isn't a bad fix. The amount of time saved by pushing all configuration issues on to the internal IT department outweighs the hardware cost by a few significant digits.

I'm not sure the sheer cheapness of hardware has really sunk in yet. A lot of problems might be better solved by snail-mailing small computers to people.


"Network appliances" were a big market for this kind of thing. We've also found it worked for us in the point-of-sale business.


appliances are still a big market for really large companies with money to spend.

it's kind of a dream for these kinds of clients; the only thing you're responsible for is "plugging" it in. the vendor owns everything else (and gives you access to very little of it!). this is also great for the vendor because they can sell what essentially amounts to a server that does all of the magic at a ridiculous markup without much regard to the quality of the software underneath (unfortunately)

i cracked open a vendor appliance a few years ago. it was very much like opening an abandoned closet that's been untouched for 50 years. not pretty.


Well people used to mail USB keys and CD-ROMs and such. It seems less popular now that everyone has gigabit fibre.


Hey HN. Current conda lead developer here. Just want to highlight that transactional rollback for all disk mutating operations (create, install, update, delete) was a big feature in our latest release series. We've closed over 1,000 open issues in the last several weeks, in large part because of the problems that transactions solved.
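For readers wondering what transactional rollback of disk operations means in practice, here is a rough sketch of the general idea. To be clear, this is not conda's implementation; the class and method names are hypothetical, and it only handles plain files.

```python
import os
import shutil
import tempfile


def _remove_if_present(path):
    if os.path.exists(path):
        os.remove(path)


class FileTransaction:
    """Apply file writes and deletes; undo all of them if the block raises."""

    def __init__(self):
        self._backup_dir = tempfile.mkdtemp(prefix="txn-")
        self._undo = []  # callables, run in reverse order on rollback

    def __enter__(self):
        return self

    def _backup(self, path):
        # Copy the existing file aside and register an undo step restoring it.
        saved = os.path.join(self._backup_dir, str(len(self._undo)))
        shutil.copy2(path, saved)
        self._undo.append(lambda: shutil.copy2(saved, path))

    def write(self, path, text):
        if os.path.exists(path):
            self._backup(path)
        else:
            # The file is new, so undoing the write means removing it.
            self._undo.append(lambda: _remove_if_present(path))
        with open(path, "w") as fh:
            fh.write(text)

    def delete(self, path):
        self._backup(path)
        os.remove(path)

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is not None:
                # Something failed: replay the undo steps newest-first.
                for undo in reversed(self._undo):
                    undo()
        finally:
            shutil.rmtree(self._backup_dir, ignore_errors=True)
        return False  # let the original error propagate after rolling back
```

If anything inside a `with FileTransaction() as txn:` block raises, every write and delete made through txn is reverted, so a failed install does not leave a half-modified environment behind.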


Ohhhh snap! I'll update the article tonight!


"The mark of an experienced engineer is to work backwards from deployment, planning and designing for the reality of production environments."

Important point!


The software equivalent of limiting pre-fabricated building pieces, wind turbines, space shuttles etc. by the size of the largest bridge they must pass, or other constraints of rail or canal delivery.


You mean the size of the smallest bridge?


Oops, yes.


10x developers outsource production to Docker.


lovely article.

i don't agree with pip not being an app store. the best way to deploy something is to deploy onto the most convenient medium for your target audience. so if i need a cli tool or app that happens to have an API, pip or homebrew/chocolatey/yum/apt/etc are the best delivery mechanisms for it. however, i wouldn't tell our execs to pip install some app they need; i would self-contain everything, ship it with a pretty installer and host it on s3.

that being said, rpm's or deb's are a great way of packaging anything to *nix servers and have been for ages. yum and apt are pretty much guaranteed to be available and the rpm build file format is pretty powerful (albeit unfriendly to look at). we've deployed app artifacts via rpm with good success


Just in case you tl;dr

"A summary of our lessons along the way:

1. Language does not define packaging, environment does. Python is general-purpose, PyPI is not.

2. Application packaging must not be confused with library packaging. Python is for both, but pip is for libraries.

3. Self-contained artifacts are the key to repeatable deploys. Containment is a spectrum, from executable to installer to userspace image to virtual machine image to hardware.

4. "Containers" are not just one thing, let alone the only option."


> pip is for libraries

Since when?


If you accept the arguments in the article, since "forever". If you don't, maybe talk about why not instead of asking rhetorical questions?


pip & similar package managers are a trainwreck everywhere but on developer machines.


As someone who uses Python on a Windows box, I can count a few times where I ended up having to find binaries that were pre-compiled, to install a package from PyPI. While that isn't the case for the majority of the packages I've used, I certainly wouldn't say it's something an end-user should/would do.

Same goes for npm, too.


Huh. I'm not originally from the python community, but I had no trouble figuring out how to get reproducible installs out of pip using the requirements.txt idiom.
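The usual shape of that idiom is "pip freeze > requirements.txt" on a known-good machine, then "pip install -r requirements.txt" everywhere else. It only reproduces if every line pins an exact version, so here is a small sketch of a check for that. The function name is made up, and the parsing is deliberately naive (it treats any "#" as a comment, which would mangle URL-style requirements).

```python
def unpinned_requirements(text: str) -> list:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            # Skip blank lines and pip options such as "-r other.txt".
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

An empty result means every requirement names an exact version; anything returned can float to a newer release on the next install. Note this says nothing about native dependencies, which is where the complaints elsewhere in the thread come from.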


The issue with using pip and package managers in a similar vein is that they're not distributing complete packages, which usually means that stuff is compiled when installing some application. End users don't have compilers, and ideally you'd want to avoid them on servers as well. Often this also needs some or many OS-level dependencies, like libraries and headers.

There are of course many other issues with pip and setup.py, far too many for a comment here.

But to give just one example: distutils/setuptools try to play god ^W build system with things like Extension. This works, badly. For example, most compilers that aren't GCC or MSVC compatible are not supported at all. Finding libraries? Not supported. Having proper dependency graph detection of files? Nope. If some "invisible" dependency of an output was modified, it won't recompile. Oops. It pretty much behaves like a Makefile written by a naive person, just without the "-jN" flag.


Python does now have a distributable binary format: wheels (.whl files). A wheel contains everything already built for the target operating system/architecture, and installation consists solely of moving files into their destinations.
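A wheel's filename already says which OS/architecture it was built for, following the PEP 427 pattern distribution-version(-build)-python-abi-platform.whl. Here is a minimal sketch of reading those tags; the type and function names are hypothetical, and a real tool would more likely rely on an existing packaging library than on this hand-rolled split.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


def is_pure_python(wheel: WheelName) -> bool:
    # An ABI tag of "none" plus a platform tag of "any" means the wheel
    # carries no compiled code tied to a specific OS or architecture.
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"
```

A made-up name like examplepkg-1.0-py2.py3-none-any.whl parses as pure Python, while examplepkg-1.0-cp36-cp36m-manylinux1_x86_64.whl is tied to CPython 3.6 on 64-bit Linux. If no wheel matches your platform, pip falls back to the source distribution, and that is when the compiler requirement comes back.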

Also, I don't have the raw numbers to know for certain, but I suspect packages which have compiled extensions are more rare than pure-Python packages.


Now I know that the PyPA guys do a ton of work to make all this stuff less brittle, and indeed, Python packaging has improved a lot over the years. But overall it's still brittle, complex, difficult to debug and reproduce, and generally falls apart randomly with a frequency that does not inspire trust.


There's not really a lot you can do about this though, is there?

It's not that hard to test that things work, or to see the failures. Comments I've seen like 'there should be a server that installs every package on its own and makes sure its tests pass' have a bit of an issue: we already know the issues exist. Having a big list of them in one place doesn't really give us a lot more information.

What's the solution? If you need to compile native dependencies, how do you take PyPI and Python packaging in general and make those native dependencies able to be portably compiled, reliably, on any platform?

Things that are straight up unworkable:

* Bundling native dependencies as binaries

* Specifying a single version of your native dependencies (the 'shrinkwrap' approach) and every library having its own


> Building native dependencies as binaries

Yep. http://conda.readthedocs.org


What? I said pip works great in my environment, not that it works great in yours. By all means, if you're concerned about those cases, use native packages or whatever you like.


I don't intend to contradict you, I'm just providing my angle on where the original comment came from.


That's great! The PyPA works hard to make pip+pypi behave intuitively. That said, a lot more applications need more machinery, yielding utilities like pip-tools[0] and conda.

[0]: https://github.com/jazzband/pip-tools



