Python Wheels Crosses 90% (pythonwheels.com)
223 points by groodt on June 8, 2020 | 218 comments



Today I spent at least an hour fighting with Python packaging. The more I think about it, the more I feel that self-contained static binaries are the way to go. Trying to load source files from all over the filesystem at runtime is hell. Or at least it's hell to debug when it goes wrong.

I would love to see a move towards "static" binaries that package everything together into a single, self-contained unit.


In the past week my coding colleague (whose age is only just into double digits) and I have had to:

* insert lines of code into asciidoctor Ruby to debug how the command line interface differs from the library interface;

* read through Raspberry Pi SenseHat code to figure out that the drivers won’t load via SSH (you have to plug in an HDMI cable!);

* rummage through Python’s PY Sequence List to answer the question “what kind of sort does Python actually use?”.

The problem with shipping binaries is that it lowers the standard for how easily the software can be built from source.

If a package is easier to ship as a binary because the source is hard to distribute reliably, we give up hope of end users being able to modify and build their own versions.

This is obviously bad for freedom. It’s also hard for debugging and learning.


If a package is easier to ship as a binary because the source is hard to distribute reliably, we give up hope of end users being able to modify and build their own versions

You might be giving up hope too soon, because what you describe has in my experience been the life story of source code from the start, to some extent. And I'm not blaming it for that. Distributing source code (well, not the distribution itself, but getting it to build) on all possible systems out there is hard and impossible to get 100% correct. It's simply inevitable to encounter systems on which it doesn't build because of their specific configuration, or user error.

Despite that, end users have been building their own versions and still do. Users who really want to don't let themselves be held back by a compiler error. Granted, it's possible that in a perfect world where everything just worked there would be somewhat more users doing that, but there's no such thing as a perfect world.

Moreover I do have the impression things are actually much better these days, mainly thanks to CI, which makes it easy to get out of the bubble and build on multiple platforms. At least I no longer have the feeling I had two decades ago, when I was more or less prepared to waste hours getting anything handed to me on the internet to build. Still, it would be very interesting to go and see the percentage of projects out there which just build according to the instructions.


You might find the Nix package manager fruitful. (Fair warning: Nix isn't "easy", but it is powerful. It takes some work to find your sea legs.)

It isn't always possible, but Nix places a lot of emphasis on building software from source (or downloading a pre-built copy from a binary cache), and it has fairly good support for overriding/customizing the expression that builds one or more pieces of software.

A few months ago I was trying to debug a test in the Oil shell project that broke only on macOS for very unclear reasons. I eventually figured out that, in a specific mode, bash (which both Nix and macOS use as sh by default) was filtering out a specific environment variable that this specific test was trying to set, before running a target program with a sh shebang. I only needed to add a few lines to the project's build expression to patch bash and use the patched version for the tests (all without affecting the system/user bash).


I expect Nix to become more popular due to its ability to build static binaries.

The key benefits that nixpkgs has over other Linux distributions' package sets on this topic are:

1. It allows you to _program_ your package set, thus adding static-linking overrides to thousands of libraries/executables in one go, instead of having to modify each package manually.

2. It allows the end user to choose what linking they want, as opposed to the Linux distribution making that decision for them upfront.

3. It allows you to do this from any Linux distribution.


> read through Raspberry Pi SenseHat code to figure out that the drivers won’t load via SSH (you have to plug in an HDMI cable!)

I'm really curious about this. Why would a driver care about whether you load it on a "real" tty vs. over the network? How does it even know?


I have to admit the facts known to me are simply: plugging in the HDMI cable stopped the bug.

We didn’t dig further than that, though the error was quite obscure until we read the source code, at which point we realized that the “FB” the error mentioned meant the Linux console framebuffer.

I assume the Python module for operating this device ships with a GUI, which is coupled to some Linux framebuffer initialization code, which in turn only works if the framebuffer is enabled, which presumably it isn’t if the host boots without an HDMI device connected.


I have a Raspberry Pi with a Sense Hat that has never had an HDMI cable plugged into it. I have written small programs with the Python module that operate the LED matrix that work fine AFAICT. The LED matrix also happens to be a framebuffer device, and it is possible that your error was about the LED matrix.

The LED matrix is an RGB565 framebuffer with the id "RPi-Sense FB". The appropriate device node can be written to as a standard file or mmap-ed.

https://www.raspberrypi.org/documentation/hardware/sense-hat...
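
For anyone curious, this is roughly what that looks like in practice. A minimal sketch (standard fbdev paths, not taken from this thread, and untested on your particular setup):

    # Minimal sketch: locate the Sense HAT LED matrix framebuffer by name and
    # fill it with a solid colour. It is an ordinary Linux fbdev device, so no
    # GUI or attached display is required.
    import glob
    import struct

    def find_sense_fb():
        for name_file in glob.glob("/sys/class/graphics/fb*/name"):
            with open(name_file) as f:
                if f.read().strip() == "RPi-Sense FB":
                    return "/dev/" + name_file.split("/")[-2]  # e.g. /dev/fb1
        raise RuntimeError("Sense HAT framebuffer not found")

    def fill(rgb565):
        # 8x8 pixels, 2 bytes per pixel (RGB565), written like a regular file
        with open(find_sense_fb(), "wb") as fb:
            fb.write(struct.pack("<64H", *([rgb565] * 64)))

    fill(0xF800)  # solid red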


You have a 10 year old coding colleague?


Not quite that young. I am a teacher with 85 teenagers under my wing, at the last count.

An important part of school is learning how to learn. It is a constant struggle with pupils to get them to actually think instead of googling for an answer.

Sometimes it feels a bit like I am a grumpy oldie telling them things like _in my day we had to look things up in books instead of using Google!_ but most of the time it’s valuable in and of itself to go through things from first principles.

(The sensehat bug was a case in point.)

As far as source code is concerned, it turned out to be quite tractable to follow the Python source code at least far enough to find the timsort.txt document.

If any of these kids go on to have technical careers, there will be plenty of time for googling answers later in life. Hopefully some of them will be writing the answers too.


On the flip side, it seems to me like having all of this friction may lose many folks early on. I absolutely agree that debugging and problem solving are core skills to master and fundamental for any programmer. As is understanding the system from first principles, and how all of the different subsystems interact. But in my opinion they shouldn't necessarily be the first thing someone encounters when they try to do X.


tbh the only source of truth is the code. Documentation and comments are all lies! shakes fist


It's a bit like tech support, where you have to assume the user is always lying.


I once got locked out of my Amazon account because I went through a weird flow where I ended up changing my email without clicking a confirmation link, and I made a typo, so it was changed to a nonexistent email.

No one in tech support believed me. No one was willing to help me. They told me to make a new account.

I've done my best not to buy from Amazon since.

Please don't be hostile to users.


That's more fraud detection than tech-support, sadly. Even if they believed you, they might not have been authorized to help. You wouldn't want random people to hijack your account by crying "typo" to the support guys.


Surely they could revert it back to the previous email account though?

Possibly after you provide some kind of validation like the last few digits of credit card tied to that account or something?


I'm almost sure Amazon used to allow you to log on with old email-password combinations, until they got flak for it. Not sure what they do now.


I think it's a tongue in cheek way to say that they are teaching their 10 year old child to code and they have had to shave some painful yaks.


The answer to “which sorting algorithm Python uses” is the first result on Google. The third is the official Python docs.


The context was one where we were learning about how things work, rather than wanting to just know the answer per se.

I always _try_ to show how one can answer problems from first principles before resorting to “go look it up on stack overflow”.

It boosts one's sense of usefulness as a teacher if one can occasionally give more in-depth answers as teachable moments.

For one thing, demonstrating deeper understanding is how you score high marks in exams.


I don't understand what you mean by "first principles".

Open up Python's source code and check: that is first principles, first hand.

As a matter of fact, I am going to go as far as saying that reading source code is the _only_ way to actually be any good at software engineering, because it's the only source of truth that never gets out of sync with the de facto behavior.

Did you expect to find a blog post about it (you can find it on the first page of Google search I mentioned)? Or a YouTube video (didn't pay attention to that)? Books are reasonably good but usually get out of sync quite fast.

And going back to first principles: the reason most of the literature may not mention what sorting algorithm Python uses is because a) it's irrelevant as a first principle knowledge b) it gets quickly out of sync.


I think you've misread the comment. You seem to both have the same definition of first principles.


> read through Raspberry Pi SenseHat code to figure out that the drivers won’t load via SSH (you have to plug in an HDMI cable!);

What? I can assure you this is not true.

My R Pi has never touched an HDMI cable and runs a sense HAT just fine. What OS?


I agree that being able to read source and modify the code for experimentation are important.

In my experience, the best way to read source code and experiment are to get the source from the upstream repo (eg. GitHub) and read/build from that. I do this with code written in C or C++ all the time.

The code is always out there (unless you're talking about proprietary software, which none of your examples above are).


> * read through Raspberry Pi SenseHat code to figure out that the drivers won’t load via SSH (you have to plug in an HDMI cable!);

I've suspected that for a long time without confirming it personally, so thank you for your time.


> rummage through Python’s PY Sequence List to answer the question “what kind of sort does Python actually use?”.

If you needed to dig through source code to find out that Python uses Timsort, I genuinely am curious about how bad you are at search engines.

I believe in reading source code to answer some things, but that example is really odd for anyone who has researched sorting algorithms since Python literally invented a sort used by many other languages now.


Ahem, a more uncharitable person might retort with:

If you need a billion dollar search engine and an internet connection to make sense of the source code on your local machine, I am genuinely curious how bad you are at reading source code!

But as I said, that’s pretty uncharitable. Because of course I didn’t have the source code for Python locally.

I downloaded it after Googling “Python source code”.


This reply made my day, thanks!


+ Negatively commenting on someone else's coding abilities.

+ Explaining something that is "obviously common knowledge".

+ Reducing motivation of participant to share their experience.

I think we have HN bingo right here.


It needs “here’s why you should have used XYZ instead, obviously” to really round it out in my opinion.


Don't forget someone plugging their startup which solves this problem for the price of only 2 coffees per month.


Is it your experience that a significant portion of folks who write code in some form or fashion have "researched sorting algorithms?"


It's fun and games until you find out one of the things you bundled into your static binary has a remote code execution CVE that is being actively exploited in the wild. Good luck tracking down all the artifacts and making sure they are rebuilt with the patched version, until the next CVE hits and you can do it all again for every static binary bundling the affected component.


Rolling out a new binary is a happy path process that gets used all the time; what's the problem here?


Yeah, I don't really understand the hate containers get for this. If the people who run your software aren't updating it you're screwed anyway. I promise you'll be way more disappointed if you assume that the system libs in the wild will be patched and up to date.

Like if building a pet app is your company/team's job then containers are more than fine. If you're a distro maintainer that has thousands upon thousands of packages then sharing libs starts to make a lot more sense.


> Like if building a pet app is your company/team's job then containers are more than fine. If you're a distro maintainer that has thousands upon thousands of packages then sharing libs starts to make a lot more sense.

The problem is that distributions get their software from upstream developers. And if upstream decides to go static or use embedded code copies, the maintainers have much more work trying to untangle everything.

In my experience, packaging Go software for Linux distributions is very frustrating and tedious because of that.

Packaging Python or C/C++ software is much easier.


This is ironic, because Go is much easier to build than C or C++, especially since the introduction of Go modules.

Perhaps this "untangling" isn't the best idea, at least not always?


Unbundling is often necessary due to these issues:

- the bundled things might have unclear licensing

- bundling stuff that is available in the repos is suboptimal, as the repo version will be updated for security issues while the bundled version will not

- you can use some mechanism to mark what stuff is bundled in a package, but then you need to make sure the bundled thing is patched and rebuild each thing bundling it, whereas with a system-wide dependency you just update that and you are done


It's pretty rare there are parts with "unclear" licensing. Does it happen? Sure. But "often"?

I'm fairly sure that almost all projects are receptive to updates for dependencies with security issues; I don't see why a distro needs to "untangle" anything for them.

> with system wide dependency you just update that and you are done

Unless the system-wide update won't work with your program.

The entire point of the Go tooling is that you can reliably and consistently build your software, and replacing random parts to create a weird unsupported (by the actual developers anyway) version is something the Go tooling was never designed to support: it goes exactly against what it was designed to do. Hence my previous comment: perhaps this isn't the best of ideas.


This talk (from 2011 but still very relevant today) mentions some of the licensing and other issues package maintainers have to face:

https://www.youtube.com/watch?v=fRk97h1FLow


is it really that bad to distribute statically-linked go binaries in a distro? Or rather, perhaps the better question - what's the dynamic linking benefit that a distro can't live without?

Wondering since the application developers I know seem to prefer static linking.


Security fixes, for one. It's more convenient to rebuild one shared library instead of having to rebuild all downstream consumers.


hm, fun. That seems understandable given the status quo.

I can't help but wonder if we'd all just benefit from better build systems, so that "rebuilding all downstream consumers" wouldn't be hard.


Then just leave it out of your distribution's repo. If it is a static binary then it is already easy to distribute, so no one needs the repo.


> If the people who run your software aren't updating it you're screwed anyway.

They're screwed. Hopefully you aren't.

Hopefully.


containers aren't for binaries/reproducible builds. Actually they're quite bad for it and make caching hard.


The problem is that you need to keep track of all the binaries on your system which have embedded code copies.


This is a solved problem; it's why apt and yum and all the other package managers exist. And/or use something like Ansible.


Maybe for software that runs on computers that you manage. But what about end-user software?


Do you think it's easier for end-users to update Python packages on their own, or just install the latest version of the app that they know about?


Since either can be automated in the distributed software itself, I'd say they are both equally easy for the end-user, if the developer really cares.


For Linux, it's easier to just perform an update. For half-developed OS'es, it's easier to download an installer, then Next, Next, Agree, Next, No, Next, wait, Finish.


The irony being that in the fully developed OS I experience:

- Outdated packages

- Having to build from source when the package is not in the repos

- Failed builds for no good reason

- Inability to have multiple versions of the same package

- having to install log(n) build tools if I want to build n packages

- etc

while in half-developed OS'es everything just works (application-wise, at least)


Isn't that still broken if code is dynamically linked? If you don't control the end-user platform, you're out of luck anyway; static builds neither help nor hurt.


You can pre-build the static bin for the target platforms, you know exactly what bits are in the blob on the user's box, and you can ship a binary diff and do incremental evergreen auto-updates. Probably faster and less brittle than testing/shipping and keeping Python up to date on the user's box.


But this doesn't necessarily work in the "loose binding" world either - because people pin their dependencies to a specific version to avoid breakage.

There's a fundamental unresolvable tension in automatic update systems between "not updating breaks things" and "updating breaks things"; you cannot solve for both at once and either one will get users mad at you, although not upgrading at least gives a pushback argument. See how all this has played out with Windows and Mac updates, for example.


Pinning dependencies to a specific version is also bad - that's lazy engineering & instant technical debt.

You really should check what APIs your dependencies provide and how stable that API is, then set a minimal dependency version. If you pin the dependency to a specific version, you are creating a headache for anyone trying to use your software later on (by potentially forcing them to use outdated and insecure software) and for distro maintainers trying to keep distro software up to date and secure.

What if the dependency is so unstable that you have to pin the version or even use a custom patched version? Well, then maybe it's not something you should be depending on & rather use something with fewer features but more stability. Otherwise you are really being irresponsible - using quick convenience at the cost of the long-term usability of your software.


I'm not sure exactly what form of dependency pinning you are hating on - but if you are suggesting we should ditch dep-pinning altogether, I have to totally disagree. It's never a good experience to wake up one day and see that your code that worked yesterday, no longer does; or worse, that it works for you but other people start saying otherwise. Failing to pin deps (e.g. with a lockfile) makes it rather easy for these bugs to slip in - it's just one accidental backwards-incompatible change in a transitive dependency of yours away, and good luck figuring out how to reproduce the problem reliably.


I should have mentioned that I'm seeing this from the point of view of a distro maintainer involved in Fedora.

The thing is that in general you don't want to have many versions of libraries available in the distro in parallel, as each has to be separately maintained, security patches applied, build issues fixed, etc., eating valuable maintainer time. Also in some cases libraries of different versions can't coexist on a system cleanly.

Then imagine every piece of software just pins versions of its dependencies to a specific random version that happened to work at the time. To satisfy those arbitrary dependencies the distro would have to maintain all these versions at the same time, which is simply impossible resource-wise, not to mention incredibly wasteful (in both maintainer time & system resources).

As for stuff breaking if you don't pin dependency versions - well, distros have mechanisms to handle that. For example, for Fedora there is a stable release every 6 months & stable releases are not expected to get major changes in libraries, just bug fixes and smaller enhancements.

And at the same time there is a rolling version of Fedora called Rawhide, where all the latest package versions land and where integration issues are addressed. So any breakage would happen on Rawhide and be addressed by maintainers (of the library/software affected or both) long before a new stable release is cut from Rawhide and users will actually use it.

For example, I'm maintaining the PyOtherSide Qt 5/Python bindings on Fedora. A while ago the build failed due to Python being updated to 3.9. I reported the issue upstream, which quickly fixed it, and I've built an updated version in Rawhide. All this long before a stable Fedora version gets Python 3.9, but I can be sure that when that happens, all will work fine.


Yes - although from the developer perspective, they want to work cross-distro without having to test a big compatibility matrix, and generally use the language's package manager (which hopefully behaves the same across all platforms) rather than the distro's one.

Especially with Python, and especially with the 2/3 split, people got used to assuming that the distro version was something broken to work around (e.g. Redhat, OSX), and that all "real" work happened in one's local language-specific package cache or venv.

I'm increasingly of the opinion that it's a mistake for distros to ship Python or Ruby packages in their distro-specific package format, but I can also see that's going to be a holy war.


the compatibility matrix is exactly what I'm wondering about too. It's far easier to test against a single version of a dependency and deploy against it too.

I can't help but wonder if there's a better way that could scale to open source. I personally like monorepo-based development a lot - where you have one version of every library for the whole repo, and a total ordering on changesets (and a strong test suite to catch regressions). But organizing open source into a monorepo or even a virtual-monorepo seems tricky. Might be the sort of practice that can work well for companies but not so much when decision power is more distributed.


As developer I use language specific packager (bundler/gems) and I know it well.

As a Linux user who sometimes builds packages (tinkers around) I need all kinds of dependencies - python, ruby, perl, haskell. It is much safer and faster to use distro packages.

Breaking changes on major versions are awful for any consumer. The Python 2/3 story is a shame. These can't be arguments against distro packages.


So if I understand you correctly, you're asking developers to change how they work in order to make package maintainers' lives easier while providing no benefit to either developer or end user. Is that correct?


The benefit for the developer is that maintainers report compatibility issues before the developer hits them and in general take over a lot of the user support, as well as the menial but necessary packaging tasks. For the user the benefit is they can actually deploy the thing in a real production system (that is not the developer's laptop) with potentially clashing libraries and be reasonably sure it continues to be secure in the future by using up-to-date maintained libraries, not some unstable release, no longer getting any security fixes, that the developer pinned it to 5 years ago and never bothered to change.


On the other hand, inserting a volunteer middleman between the developer and the user causes extra unnecessary friction and delays, and has even been the cause of bugs [0], not to mention that sometimes they explicitly ignore the wishes of the original developer.

[0] Remember when Debian generated predictable random numbers because a maintainer wanted valgrind to shut up?


On the other hand, stuff like the NPM leftpad issue can't easily happen as the package maintainer "middle man" usually inspects any upstream updates for general sanity. And even if it got in to Fedora for example, it would most likely break only the integration rolling release called Rawhide, where it would be quickly discovered and fixed, never reaching the overwhelming majority of users that are on Fedora stable releases.

Now in comparison people using NPM just blindly pulled random stuff directly from upstream without anyone doing any sanity checking at all - no wonder one package vanishing made the whole thing fall over, often directly in production.


Are we talking about source-code packaging for developers or binary packaging for users? Because the left-pad scenario really doesn't apply in the latter case.


By that logic 90% of javascript ecosystem should never be used in production. Maybe that's even the correct conclusion?


I think that's the correct conclusion. ;-)


There's no reason whatever binary packaging format you use can't store metadata on all of its internal dependencies. "Tracking down" vulnerable dependencies is almost trivial in this case.
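
In the Python case that metadata already exists; a minimal sketch with the stdlib (the package name and fixed version below are hypothetical):

    # Minimal sketch: every installed wheel records its name and version in
    # *.dist-info metadata, so finding environments that bundle a vulnerable
    # release is a lookup. "somepkg" and the version below are hypothetical.
    from importlib import metadata

    VULNERABLE_PACKAGE = "somepkg"
    FIRST_FIXED_VERSION = "1.2.3"

    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if name == VULNERABLE_PACKAGE:
            print(f"{name} {dist.version} installed; first fixed release is {FIRST_FIXED_VERSION}")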


And this metadata could point to the embedded region so it could be swapped out easily. Oh wait...


What’s the scenario where dynamic libraries are regularly upgraded but static binaries are not? Just seems totally non-sequitur as a reason to prefer dynamic libraries.


You mean like any normal Linux distro? Dynamic libraries installed from packages in the repositories will get updated automatically, while random static binaries downloaded from the internet or built from source will just sit there, perfectly stable and unchanging and progressively more vulnerable, unless the user takes action.


That's not static vs. dynamic though, that's "in package manager system" vs not. If package managers managed static binaries, they'd all be updated too since the package manager knows all the dependencies.


But what is the point of static compilation if maintainers update dependencies? Either dependencies are defined by the author or by maintainers. Dynamic libraries just help to automate the roll out.

And users would find outdated (maybe unmaintained) versions. The current system updates dependencies automatically or allows collaboration on life support (patched application, dependencies, config). It would be good to have an alternative like a container or virtual machine with a snapshot from days long past. But it should be clear this is not safe.


A developer who missed a CVE. Not everyone has a full-time job on their project. It could result in even more shaming of open source developers (if they freeze dependencies).


But in this scenario, how is the dynamic library going to be upgraded?


* A developer who missed a CVE fixed in his dependencies (if that was not clear)

Um, like all other packages?


Say we have program X with dependency Y. X+Y is either dynamic or static. X can either have responsive maintainers or unresponsive maintainers. Y can either change to fix a bug or change to add a bug. (With Heartbleed, I remember our server was fine because we were on some ancient version of OpenSSL.)

Here are the scenarios:

- dynamic responsive remove bug: Positive/neutral. Team X would have done it anyway.

- dynamic unresponsive remove bug: Positive.

- dynamic responsive add bug: Negative. Team X will see the bug but only be able to passively warn users not to use Y version whatever.

- dynamic unresponsive add bug: Negative. Users will be impacted and have to get Y to fix the error.

- static responsive remove bug: Positive/neutral: Team X will incorporate the change from Y, although possibly somewhat slower (but safer).

- static unresponsive remove bug: Negative. Users will have to fork X or goad them into incorporating the fix.

- static responsive add bug: Positive. Users will not get the bad version of Y.

- static unresponsive add bug: Positive. Users will not get the bad version of Y.

Overall, dynamic is positive 1, neutral 1, negative 2, and static is positive 2, neutral 1, negative 1. Unless you can rule out Y adding bugs, static makes more sense. Dynamic is best if "unresponsive remove bug" is likely, but if X is unresponsive, maybe you should just leave X anyway.


Sorry, this is offtopic. You asked how it is possible - I've answered. Dynamic vs static is discussed elsewhere in the thread; no silver bullet. I prefer to optimize against an unresponsive author [0]. Choose your own distro.

[0] https://news.ycombinator.com/item?id=23458020


This issue has different risk profiles based on the target user base.

If the install is on a consumer machine for regular usage then the right answer is shared libraries for the machine. It adds a lot of complexity for the packagers for that OS but you get a lot of safety for that consumer.

If the install is an in-house app deploying to servers a company controls/rents/whatever, then the right answer is probably a static application. Consumers are more likely to want to chase the latest version of everything; company-developed software is far less likely to want that. The problem then becomes less about packaging and more about deploying fixes quickly. A static binary that is rebuilt, tested, and deployed is going to be a smoother path to fixing that error than figuring out how to deploy your new shared library and avoiding any issues in your production environment caused by clashes with that library. This is made more problematic by the likelihood that you will need multiple versions of that shared library with the fix to meet the needs of applications that can not yet be upgraded. Static binaries make things easier in this scenario and thus quicker to resolve.


I mean, containers are basically just that and people seem to like them just fine.


People seem to like them just fine != People are patching CVEs


That's a question of process and tooling. There's plenty of software, OSS and proprietary, that scans containers for vulnerabilities and sends out alerts. There's also dependabot, which automatically creates PRs when dependencies get new versions.

Sure people may not use the tools they have but then again people also don't upgrade their OS packages or maven packages. So for those people packaging everything doesn't hurt any more since they've got rampant CVEs already.


Exactly, a container is a way to package all these disorganised, chaotic snowflake languages into something you can actually run in a standard way. It fulfills the function of an executable, but at a higher level of abstraction. We've come full circle again.


Containers solve a different problem.


Some ecosystems allow you to do that, no?


> Trying to load source files from all over the filesystem at runtime is hell

> "static" binaries that package everything together into a single, self-contained unit.

Ummm, except debugging "static" binaries without a symbol file is hell.

If there's a .py source somewhere on the disk, I can at least try edit the source file from the package and debug internal private variables as last resort.


Python can execute zipped files and attaching a zip to an executable is trivial. With a decent zip tool you also should be able to edit the zipped source files directly.
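
The stdlib even ships a helper for building those archives; a minimal sketch with zipapp (the project name and entry point are made up):

    # Minimal sketch using the stdlib zipapp module: bundle a source tree into
    # a single runnable .pyz archive. "myapp" and its entry point are
    # hypothetical placeholders.
    import zipapp

    zipapp.create_archive(
        "myapp/",                            # directory with the Python sources
        target="myapp.pyz",                  # single self-contained archive
        main="myapp.cli:main",               # entry point, "module:function"
        interpreter="/usr/bin/env python3",  # shebang so it runs directly
    )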


I’m curious. Can you explain the problems you were having? And how they would’ve been solved by a static binary.

> Trying to load source files from all over the filesystem at runtime is hell.

This sounds like a mess unrelated to Python. Why are you trying to load source files from "all over the filesystem" at runtime?

> I would love to see a move towards "static" binaries that package everything together into a single, self-contained unit.

Compiling source files from all over the filesystem would be equally annoying.


The python package requires a C extension that fails to compile because it requires some (non-python) dependency C library which it can't find (or use) despite it being installed to the system. Often because it expects version X but the system has installed version Y and I can't or don't want to install the particular version that this python package needs to compile its C code.


Assuming you are genuinely asking... Python programs are almost always run from source, and they usually import third party libraries. Those libraries are installed by Pip... somewhere on your system. And then Python has to find it somehow at runtime. That relies on environment variables being set correctly, and works differently on different systems and is basically a huge mess even before you consider things like virtualenv and the fact that non-Linux systems usually have multiple copies of Python installed (and Python 2 and 3!).
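
For what it's worth, you can at least see where a given interpreter will look with a few lines of stdlib; a minimal sketch, nothing project-specific:

    # Minimal sketch: show which interpreter is running and where it will
    # search for imports, in order.
    import sys
    import sysconfig

    print("interpreter:   ", sys.executable)
    print("site-packages: ", sysconfig.get_paths()["purelib"])
    for entry in sys.path:
        print("search path:   ", entry)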

It's basically a huge mess. There's even an XKCD about it.

Consider the alternative: Go compiles programs to a single statically linked executable with no dependencies. You literally just copy one file to your target machine and run it. It basically can't fail. That's part of the reason Go is so popular for server stuff.

And it's not just because Python isn't compiled. Other scripting languages handle this much better. Even JavaScript - for all the hate node_modules gets for being enormous, at least it works reliably!


Yes, I was genuinely asking. I'm very familiar with how the Python ecosystem works (I've been using it professionally for the past 8 years or so), and I disagree with your assessment. At least not when comparing to other dynamic languages.

> Those libraries are installed by Pip... somewhere on your system.

They go into your virtualenv.

> And then Python has to find it somehow at runtime.

If you use a virtualenv, this works 99.9% of the time.

> That relies on environment variables being set correctly

What environment variables? Just use `./venv/bin/python` directly without environment variables.

> a huge mess even before you consider things like virtualenv

There's nothing to consider, always use a virtualenv. It's the same thing for Node, except it implicitly handles it for you.

> and the fact that non-Linux systems usually have multiple copies of Python installed (and Python 2 and 3!).

How is that a problem? Just pick the interpreter you want to use when creating the virtualenv:

    virtualenv -p /usr/bin/python2 venv
    virtualenv -p python3 venv
    virtualenv -p python3.7 venv
> Consider the alternative: Go compiles programs to a single statically linked executable with no dependencies. You literally just copy one file to your target machine and run it. It basically can't fail. That's part of the reason Go is so popular for server stuff.

I agree that Go got it mostly right and it just works, except for the fact that it didn't even have a package manager for like 10 years, so pinning dependencies was impossible unless you forked repositories.

> Even JavaScript - for all the hate node_modules gets for being enormous, at least it works reliably!

In what way is Python + virtualenv + requirements.txt less reliable?


"There's no issue, it's easy! Just don't do it the obvious way. Instead use this other tool, then create an environment, and don't forget to activate it. You need to specify the interpreter you want, and then installing the dependencies is simple, just run `pip3 install -r requirements.txt`. I don't know what the problem is?"

> It's the same for thing for Node, except it implicitly handles it for you.

Indeed.


100% agree with the one addendum that you may want to use virtualenvwrapper.sh - as it makes using virtualenv somewhat more convenient. Mostly syntactic sugar though.


> for all the hate node_modules gets for being enormous, at least it works reliably!

Funnily enough node_modules is one of the main regrets Ryan Dahl, the creator of Node.js has: https://www.youtube.com/watch?v=M3BM9TB-8yA&t=755s


Yeah it's not a great design but it does at least work reliably!


The best thing would be if python had something like a cross between pip and python-poetry, but better integrated with the interpreter.

Python has the most unpythonic package experience ever. As elegant as the language is, packaging is a convoluted nightmare. It still might work for some project, but certainly is no fit for the average python user.


I started using pipenv this weekend and it seems like it's pretty good, at least for maintaining an environment within a project. I found it friendly at least.

That being said, I'll admit I don't know much about what it means to be pythonic. What do you think the drawbacks are to pipenv-type approach and what would be more pythonic in your mind ?


What I mean by pythonic is the "batteries included" paradigm: if you want to — say — quickly script something up that does some time calculations, you just go and import datetime. There is no need to research what datetime libraries are ok, download and install dependencies, etc. You are able to go and (with the included batteries) solve your problem.

I wish I could just as easily say: ahh, this script I made should become a package that I can reuse on my server. But then the server has Python 3.6 while your script has 3.8, so you end up installing another Python, painstakingly watching not to install over the old one, etc. When installing modules you have to do the same or set up a venv/pipenv.

All along the road there are possible ways to shoot yourself in the foot. Meanwhile in Rust you do a cargo new foobar, work on it, add dependencies to the Cargo.toml, copy it to the server, build a binary with cargo build --release and copy the binary into the PATH. You spent zero brainpower on not breaking things and had capacity to think about other stuff.

The closest thing we have in those terms in pythonland is poetry, which is a very good start. But this should be part of the language, not something one has to do extra.


Got it - that makes sense. I actually ended up with a smattering of the version problems when I was working with pipenv for the first time, and figured it was my fault in some way because I hadn't set up the "correct" python version on my system. That's somewhat disheartening to hear that it's a consistent problem.

Thanks for the insight and clarification.


> I would love to see a move towards "static" binaries that package everything together into a single, self-contained unit.

Isn't that why Docker is so popular? If we had Python static binaries (or the ability to have several versions of the same Python library installed) would we use Docker so much?


and docker is (was) written in Go, which kinda put more of a spotlight on static linking; the circle is complete


PyOxidizer might be of interest to you.


Around a decade ago I worked for a place that packaged and shipped python applications to windows desktops using a similar bundling tool, py2exe. It worked pretty well. Pretty much just unzips into a directory: there's your self-contained python app hiding inside an exe and some associated data files/dlls in the directory.

We still used wheel archives etc for managing some python dependencies during our dev and build process, but it was never a complete solution: some python packages depend upon non python shared libraries so you need to install them using an operating system level package manager. Then when shipping the software to users you cannot assume the user has a working version of python, or even if they do you don't want to increase your support burden by having your application running inside some arbitrary python environment.


Thanks, I hadn’t heard of PyOxidizer before. Looks like a significant improvement over tools like PyInstaller and Shiv, which produce self-extracting executables that are slow and/or leave files hanging around after running.


There are trade-offs to anything.

E.g.:

- shiv leaves files hanging around, but you can address those files directly. With PyOxidizer, you will need to use https://pyoxidizer.readthedocs.io/en/stable/config_api.html#... as a way to read non-Python files. Because most projects don't know this and just use open(), they won't work out of the box and you may need to monkey patch open().

- pyoxidizer requires a compiler, which is easy on linux, but a higher requirement on windows. Compared to shiv, which is just a pip install away.

There is no perfect solution yet, because you often have many things to balance: bringing in the Python VM or not, allowing compiled extensions or not, providing support for open() or not, etc.


Yes, that looks like exactly what I had in mind. Thanks for the reference.


Agreed. I'd go one step further and call that a docker container. If it needs to run on a server, the preferred format these days is a docker container. I too spent a bit of time fighting pipenv lately and generally trying to figure out the various ways python tries to virtualize its barfing all over the filesystem by default.

IMHO python, node.js, ruby, and other scripting languages share a historical confusion of concerns related to development tooling and attempts to counter the unfortunate choice of installing dependencies globally that tend to leak to the server side in inappropriate ways. With Docker.


Except we solved the problem at a higher level of abstraction, creating a more complex system. If every language used LLVM (for example) we'd have almost the same effect.

There are other things docker can do (like mapping volumes) but I would be much happier if we solved it at the level of the executable.


Your happiness and practical reality are two different things. Docker is there right now and is sort of the de-facto way of getting stuff onto a server in basically all projects I've been on in the last five years. Programs are more than just their binaries. Most programs also expect non-binary content like config files, data files, shell scripts, symbolic links, etc. Packaging that up in a sane way is basically what docker is good at.


> Today I spent at least an hour fighting with Python packaging. The more I think about it, the more I feel that self-contained static binaries are the way to go. Trying to load source files from all over the filesystem at runtime is hell.

Most python packages I have seen will put their python source in the python site-packages directory. Where were the source files you were trying to load located on the file system?


You may enjoy this StackOverflow answer on The Return of Static Linking, summarising advancements on the topic:

https://stackoverflow.com/questions/3430400/linux-static-lin...


I do prefer static self contained binaries, but I think it's important to note that Python's package management system is horrible, and making self contained binaries is not the only way to fix it.


> The more I think about it, the more I feel that self-contained static binaries are the way to go. Trying to load source files from all over the filesystem at runtime is hell.

One need not depend on the filesystem for such things. And as bad as python packaging is, others get it better. One can have less-than-static-binaries without being quite that bad.

> I would love to see a move towards "static" binaries that package everything together into a single, self-contained unit.

Plenty of problems with that too!


>I would love to see a move towards "static" binaries that package everything together into a single, self-contained unit.

You mean Docker containers? :-)


VMWare images or even hardware images: it's easier to buy new phone than to patch old one.


zipapp? It’s not like a self-contained solution didn’t exist for a long time. (If you mean the Python installation also needs to be contained in that self-contained binary — sorry you chose a scripting language.)

And yesterday I spent at least an hour fighting with cargo build. Without knowing exactly why we fought both our anecdotes are quite useless.


Even more so in a world where storage is cheap.


Just use Poetry. I haven't fought with Python packaging in a long time.


I see a lot of references to Poetry, but I've yet to see any uptake with the "professional" python developers that I know - which could be entirely due to inertia, rather than "best tool for the job" - most of them seem to still be in the pip/virtualenv/requirements.txt world for all of their development/docker/k8s deployments.

I wonder if there have been any surveys on poetry, and whether its trajectory (presumably exponential) suggests that it's going to become a dominant package management utility in the next 10 years or so?


IMO: Python is a garbage language, and should never be used for professional applications. Try to write an app whose very first block is a try/except for KeyboardInterrupt, with or without an if __name__ == '__main__' block. Now run the program a couple dozen times, and press ctrl+c as fast as you can. You will wind up with a debug-level stack trace dump for an uncaught KeyboardInterrupt exception from the core libraries that is thrown before your first line of userland code is parsed or executed. Python runs so much internal setup code before your first line of code is even considered. It's literally impossible to write a Python program that doesn't dump debug garbage to its output, without writing a separate bash script that uses trap to proxy SIGINT and SIGTERM to the running process. Python is a fucking joke in my eyes; and this is before discussing the way async/await was implemented by gutting the core of the language in a way that makes zero fucking sense. Python is a total… fucking… joke. It has some nice features when all you care about is the syntax and ease of development, but the fact is it's a terribly broken language. Use C. Or C++. Or Rust. Or Go. Or Ruby, or literally no joke… even PHP. Python is literally THAT HORRIBLE once you understand its internals. It absolutely disgusts me to say, but PHP with its single-threaded model is put together better than Python.
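
For reference, this is the pattern being described; a minimal sketch, and the point is that a ctrl+c delivered while the interpreter is still starting up fires before any of it runs:

    # Minimal sketch of the pattern described above: wrap everything,
    # including the __main__ guard, in a try/except for KeyboardInterrupt.
    # A SIGINT that arrives during interpreter startup (before this file is
    # executed) can still surface as an uncaught KeyboardInterrupt traceback
    # from core code.
    import sys

    try:
        def main():
            pass  # application code goes here

        if __name__ == "__main__":
            main()
    except KeyboardInterrupt:
        sys.exit(130)  # conventional exit status for SIGINT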

tldr; Do not use Python. It makes certain tasks seem easy, but all it truly does is make you look like an idiot developer. Pick literally any other language, as Python has screwed up too many basic concepts (related to POSIX and expectations from C) to be taken seriously.


comments like this make me want to ask for a downvote button.


So you are capable of writing a script in Python that always catches SIGINT and SIGTERM, exiting gracefully without an exception traceback being thrown by core code? No? Right… no. Python is literally the only language I've used that cannot do this. Even PHP can do it. For real, PHP is overall a shit language (it has zero async/threading support), but it still handles signals according to POSIX (eg. it doesn't override signal handlers unless userland code does it, preserving default SIG_DFL behavior), unlike Python which breaks that simple contract. The "KeyboardInterrupt" exception being thrown by core code before userland code is parsed and executed is such an extreme bastardization of POSIX standards that Python is simply, inexcusably, broken.

I stand by my opinion that Python is not well-suited to applications written by a professional entity, intended to be shipped to 3rd-party clients. Python plays too loose with the fundamentals provided by the operating system to be a professional language.


Deno (https://deno.land) solves this problem very well for JavaScript. I’d like to see more languages adopting the “dependencies are URLs” approach. No more package hell.


How does this solve the problem? As far as I can tell, it replaces centralized dependencies (requirements.txt) containing references to a known and trusted registry (pypi.org) with URLs in all files across your project (and 3rd-party libraries' files), making maintenance only harder and security updates even more so. And it makes things harder to debug, as you now need to keep in mind that multiple versions of the same packages can be used in your source. Also, requirements.txt already supports URLs if you want your sources from other places.

For me, most of the pain with Python packaging went away after I started using pip-tools[0]. It's just a simple utility to add lockfile capabilities to Pip. Nothing new to learn, no new philosophies or paradigms. No PEP waiting to be adopted by everyone. Just good old requirements.txt + Pip.

[0] https://github.com/jazzband/pip-tools


Like GoLang did?


One thing I found when converting our python application packaging from RPM to wheels is that wheels don't properly handle the data_files parameter in the setup call. That is, it places files under the python library directory instead of in the absolute path as specified. This means that sample configuration files and init scripts end up in the wrong place on the file system. In order to get around this, we had to upload the source distribution to our devpi instance and run pip install with the --no-binary option which would then place those files in the correct directories.

The other issue is that there's no equivalent of the %config(noreplace) RPM spec directive to prevent the config file from being overwritten if it already exists on the file system.

So, for libraries, wheels are a good cross-platform packaging solution, but not so much for applications that require configuration files and init scripts.
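
For context, this is the kind of setup() declaration in question; the names and paths below are illustrative, not our actual package:

    # Illustrative setup.py (hypothetical names/paths). With bdist_rpm the
    # absolute data_files paths were honoured; installed from a wheel, the
    # same files land under the Python install prefix instead.
    from setuptools import setup, find_packages

    setup(
        name="myapp",
        version="1.2.3",
        packages=find_packages(),
        data_files=[
            ("/etc/myapp", ["conf/myapp.conf"]),      # sample configuration
            ("/etc/init.d", ["scripts/myapp-init"]),  # init script
        ],
    )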


Wheels don't, and shouldn't, replace RPM or deb packages, and I don't think anyone advertised them as such. They replace eggs or installing from source .py files, you should still build RPMs or debs on top of that if you want to do any system-level stuff like set up systemd services, config files in /etc and so on.


> Wheels don't, and shouldn't, replace RPM or deb packages

Part of the motivation for using wheels in our case was to allow for pulling in later versions of certain python packages as part of the dependency resolution process when installing or updating the package. Using RPM while attempting to get the updated dependencies would mean having to repackage every single dependency (direct and indirect) from pypi as RPMs and upload them to our yum repo so that yum would be able to handle dependency resolution.

> I don't think anyone advertised them as such. They replace eggs or installing from source .py files,

The setup method from setuptools does have the necessary features for handling system-level files and the bdist_rpm command leveraged that when building RPMs. It just seems that if this was a bug rather than a feature, then it should have been noted when transitioning from egg based distributions to wheels. Nothing in the documentation for eggs or wheels states that they don't support deploying system files in absolute file paths.

The fact that doing something like:

pip install git+ssh://git@github.com/org/project.git@1.2.3

and

pip install project==1.2.3

have different results for where files listed in the data_files named parameter of the setup end up on the file system suggests that the lack of absolute path support when installing wheels is a bug rather than an intentional change.


That makes sense. What I did in the past was include wheels of the project itself and all dependencies (pip download) in the RPM/deb, then create the virtualenv and install dependencies as a post-install script. You could upgrade dependencies by calling pip install -U from the virtualenv, however any time you installed the RPM/deb it would "reset" to the packaged version (though there should be ways to prevent that, e.g. tell pip to never downgrade packages).


Agree.

I don't think the Wheel specification was intended to replace application distribution. It is aimed at the distribution of individual Python libraries.


What's the difference between "applications" and "libraries"? Why can't they be distributed in the same way?


Libraries are self-contained and can unpack into any prefix. Applications tend to have config files, service description files, documentation, etc. scattered all over the system at specific locations in order to comply with host system conventions and inter-operate with supervisors and other applications.

Wheel is an unpack-into-any-prefix (and stays in that prefix) binary distribution format.


Applications are for the end user. The end user doesn't need the source code, doesn't want to compile anything, doesn't want to bother with installing Python or even knowing it's written in Python, wants it to work with their system of choice (app store, package manager, simple exe...), wants OS integration (entry in menus, icons, etc.) and wants it fast.

Libs are for the dev: you want the source code, you want to manage versions, you want it to integrate with your dev env, you want a well delimited scope, a way to install it using your usual setup (pip/poetry/etc) and to isolate it from project to project.


As a "dev" who develops mostly for other "devs" this distinction is not so clear to me.


Do you provide your lib as a dmg for Mac users or through the Windows app store? Do you provide an entry in the start menu for it? An icon on the desktop?


Lol no I just send them an email with the url of a private git repo containing my code/experiments, and let them sort it out XD The pleasures of academia.


Python libraries are distributed to other Python developers. Applications are distributed to end-users, who may not even know what Python is, let alone details of any Python installation on their machine.


Is it desirable for a distributed Python package to write files to a hardcoded absolute path? Wheels seem to respect the `package_data` setup key, which is definitely inside the library path and seems to work pretty well for library data. I'd be a bit horrified if a package I installed from Pypi attempted to write stuff in `/etc/` or whatever. I guess I'm just asking, aren't these two separate problems (config management and Python packaging)?


> Is it desirable for a distributed Python package to write files to a hardcoded absolute path?

The established behavior of python setuptools when creating an RPM spec file via the bdist_rpm command was to create a spec file that would invoke the build command in the %build section and the install command in the %install section. The install command would place those files in the absolute paths specified in the data_files named parameter.

It has worked this way even back when distutils.core, the predecessor of setuptools, was used.

As for whether it's desirable, if you're missing the config file, init script, and other things necessary for an application to work, then you essentially have an incomplete installation and will have to copy those files from other sources and update them yourself via config management before you're able to run it.

> I guess I'm just asking, aren't these two separate problems (config management and Python packaging)?

In my view, config management should update a config file that's already there and also restart the service as needed (either when the package is updated, the config file is updated, or both). It shouldn't be responsible for placing the config file and init script itself.

For example, if I install an application like redis or rabbitmq-server, the package manager will place the included sample config file in the correct directory (e.g., /etc). I can further modify that file as necessary before starting the service. It will also include the scripts necessary to start the service as a daemon. Config management could be used to update the config file for environment specific concerns, but it doesn't need to essentially duplicate what the package manager has already done (create the directories, set permissions, place the file, etc). Python applications, regardless of how they're packaged, shouldn't really be different in this regard.


Yes, but python packages are not for distributing that kind of software.

You create an RPM _from_ a Python package. The RPM (and associated software) writes init files to the correct place. This bears no relevance to wheels, which absolutely should not be filled with distro-specific knowledge about init scripts and should definitely not write to random hard-coded paths.


> Yes, but python packages are not for distributing that kind of software.

Where is that documented? The fact that setuptools and distutils.core supported building RPM packages from a setup.py file indicates that python intended to support application distribution. Why remove that capability when moving to wheels?

> This has no bearing on wheels, which absolutely should not be filled with distro-specific knowledge about init scripts and should definitely not write to random hard-coded paths.

One thing I noted while figuring out this issue is that running:

pip install git+ssh://git@github.com/org/project.git@1.2.3

versus

pip install project==1.2.3

where project was packaged as a wheel on pypi had different results with regards to where files referenced by the data_files named parameter in the setup call were placed on the file system. Given that, was the change intentional, or is it a bug with how wheels handle package installation?
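
For reference, the install-scheme paths (the "data" root and the library directory) for a given interpreter or virtualenv can be inspected with something like:

    # print where the active interpreter's install scheme puts things
    import sysconfig
    paths = sysconfig.get_paths()
    print(paths["data"])     # scheme "data" root, e.g. the virtualenv prefix
    print(paths["purelib"])  # site-packages for pure-Python code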


Python has a couple of mechanisms for fetching files from inside the wheel files (or eggs or...) without you having to know or care about exactly where they’re located. See https://stackoverflow.com/questions/6028000/how-to-read-a-st... for examples.
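
A minimal sketch (package and file names are made up):

    # read a data file shipped inside an installed package, wherever it happens to live
    from importlib.resources import read_text  # Python 3.7+

    config_text = read_text("mypackage", "default.conf")
    print(config_text)

On older Pythons, pkgutil.get_data("mypackage", "default.conf") does roughly the same thing, returning bytes.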



> it places files under the python library directory instead of in the absolute path as specified

How is this supposed to work in virtualenv environments? What happens if you have two of them?


I don’t know a lot about Python tooling, but in general my experiences with pip have been pleasant, so I appreciate all the work done by the maintainers to make it pleasant.


Ah, so I've been confusing PyPI with PyPy. Someone made a great naming decision there.


A too little known fact: PyPI is pronounced Py-pee-ai, whereas PyPy is pronounced Py-py.

If everybody knew that, I think the confusion would mostly disappear.


> A too little known fact: PyPI is pronounced Py-pee-ai

So I was reading this comment and thinking to myself, how does one even pronounce "ai"? Language is not my strong suit. After searching the web, I found out that "ai" is a diphthong and it sounds like "eye" or the letter "i". Leaving that info here for others who may be confused. Sound: https://www.youtube.com/watch?v=uyKgPH0kmrU


Just goes to show how difficult conveying sounds and pronunciation can be in text!

I’ll keep this in mind; next time I want to write out the letter I, I’ll use "eye" instead.


PyPI: “pie-pee-eye”

PyPy: “pie-pie”


Glad I wasn't the only one who thought "how does 'ai' sound?"


I confuse PyPI with PyPA, the maintainers of pip, which allows you to interact with PyPI.


At one point in time I created a Python package to highlight this benefit of wheels: "Avoids arbitrary code execution for installation. (Avoids setup.py)" - https://github.com/mschwager/0wned

Of course Python imports can have side-effects, so you can achieve the same results with 'import malicious_package', but the installation avenue was surprising to me at the time so I created a simple demo. Also consider that 'import malicious_package' is typically not run as root whereas 'pip install' is often run with 'sudo'.
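
The whole point is that setup.py is ordinary Python executed at install time; a harmless sketch of the idea (package name is made up):

    # anything at module level runs during "pip install" of the sdist,
    # with whatever privileges pip was given
    import getpass
    from setuptools import setup

    print("setup.py executing as user:", getpass.getuser())

    setup(name="demo-package", version="0.1")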


I thought the rule was never run `sudo` with `pip install` or you'll screw up the permissions on your system.


For other reasons, it is just as bad when using python from Homebrew on macOS where sudo isn't necessary. `pip install` will install modules in `/usr/local` where they will get mixed with Homebrew-provided python packages. I was hoping there would be a way to make `pip install --user` the default, but I couldn't figure it out the last time I checked.


This is exactly why you want to do all (as in 100%) of your python work in a virtual environment, so the packages are completely isolated in your ~/.virtualenvs/[ENVNAME]/lib/pythonx.x/site-packages.

Never, ever, do a pip install in your non virtualenv environment.
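
A minimal sketch with the standard library's venv module (paths are just examples; virtualenvwrapper gives you the ~/.virtualenvs layout, but plain venv works too):

    python3 -m venv ~/.virtualenvs/myproject
    source ~/.virtualenvs/myproject/bin/activate
    pip install requests   # lands in the venv's site-packages, not /usr/local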


If you're using a *nix system, you could wrap pip in a shell function, since an alias can't match a two-word command like "pip install". Something like: pip() { if [ "$1" = install ]; then shift; command pip install --user "$@"; else command pip "$@"; fi; }


So far it's just a bit of smoke on the horizon, but I'm noticing some packages abandoning 'pip' installs entirely in favor of 'conda'. It's a bit early to tell if this trend will take off, but it does seem plausible.


I really hope conda stays on the user end.

Like, I understand it's great for managing package dependencies/setting up environments etc for a single project. But I've found it's an absolute nightmare for building docker images/generally doing build stuff when it's involved.

Not to mention conda installs seem to take waaaay longer than pip.

apt + pip works far more sensibly and reliably in my experience.


My experience has been the opposite: building Docker images is much easier with conda than it is with pip. With conda you can start from miniconda3, copy an environment.yml, and then conda create it. With pip, you might need to take additional steps to install system dependencies like build-essential first, and you'll need different tools to manage your virtual environment and Python installation too.
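
Roughly like this (image tag and environment name are placeholders; the env name has to match the name: field in environment.yml):

    FROM continuumio/miniconda3
    COPY environment.yml .
    RUN conda env create -f environment.yml
    # put the environment's binaries first on PATH for later RUN/CMD steps
    ENV PATH=/opt/conda/envs/myenv/bin:$PATH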


Yeah I know I'm on one side of the fence with this. The people I work with (not engineers) use conda all the time and love it.

A big thing for me is that apt + pip means I know more about what I'm installing, whereas conda seems like it'll go off and do what it thinks is best. If it's going into production then I want to know why we need package X and how it's been installed, rather than "conda says we need it".

Basically, I think conda encourages "install and forget" which means people don't really know or understand what they're putting into production. And that can cause a lot of problems further down the line.

Also, the fact that conda installs everything into its own sandbox causes its own issues. Suddenly I have two versions of a system package, and now I've got to do extra work to deal with that.

Then again, it could just be I never got over the time where uninstalling anaconda ripped apart the python install on my MacBook. That's a weekend I won't get back.

__

Also, Production shouldn't use virtual environments in my opinion. That's an additional deployment/build step which could fail one day. The container image is a virtual environment in and of itself anyway!


Sounds like you just need to train your colleagues to be a bit more disciplined with their package managing. It's not that hard to be clean about dependencies with conda. Maybe my take on it can inspire you here: https://haveagreatdata.com/posts/data-science-python-depende...

Re the extra "virtual environment", you can just use conda's base environment in prod. Here's how to do that: https://haveagreatdata.com/posts/step-by-step-docker-image-f...


Read your post. Yeah, conda works fine with pip stuff. And yes, what you've detailed is similar to pinning versions in requirements. Which I do as standard -- I even do it with apt installs sometimes.

But I often need to install via system package management for other dependencies. conda doesn't respect the base system package manager. That is what causes headaches.

If conda respected system package management first, then installed as necessary, I wouldn't have a problem with it as an admin. But it doesn't because it's not built for engineering/admins (want stability + efficiency), it's built for scientific projects (want to run code easily).

Also, I'm using the "royal" we. Like, we as in admins generally. I'm the only admin in my team (voluntarily), so I need to be ruthless with this type of stuff.

EDIT:

I think you missed my point about virtual environments.

The entire container is a virtual environment. Why would we want to use another virtual environment for no reason except the fact that conda wants us to?

It adds extra steps which we'd have to maintain. Which means more developer resource spent on maintenance. Which means less time spent on new features.

It's just another thing that could go wrong. Simpler systems break less often.


I see where you're coming from.

My use case is this: as a data scientist, I start new code bases all the time. Each project, simple experiment, data analysis, etc. needs its own cleanly separated dependency environment so I don't end up in dependency hell (I have 12 conda environments on my machine right now). Conda allows me to handle these environments with ease (one tool and a handful of commands -> as detailed in the article). With conda, I also have my data science Python cleanly separated from my system Python.

Of course there are other tools that can handle this use case. But pip alone won't do the trick. I don't like to have three separate tools for this (pip + venv + pyenv).

When I put something into production, I naturally want to keep using my conda environment.yml and have the same environment in dev and prod instead of switching to pip + requirements.txt, which might introduce inconsistencies.
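
For context, the environment.yml in question is just a small declarative file, something like (contents made up):

    name: analysis
    channels:
      - conda-forge
    dependencies:
      - python=3.8
      - pandas=1.0.3
      - pip
      - pip:
          - some-pypi-only-package   # hypothetical PyPI-only dependency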


I've built my project template around conda, primarily so that a new user can just do `make install` and have all environments created. The Docker portion also works fine, the only issue being that the images are perhaps a bit larger than I would like.

https://gitlab.com/mbarkhau/bootstrapit


conda seems to spend a lot of time in a SAT solver or something to deal with the known NP-complete problem of dependency resolution [1]. Maybe pip doesn't do the same thing for requirements.txt setups?

[1] https://research.swtch.com/version-sat


It can be pretty bad, yes. A primary issue seems to be that conda package authors don't always do their deps right.

Pip, on the other hand, often "wins" by not even bothering to notice the versions of low-level libraries, etc. If that doesn't work, you get to keep both pieces.


Indeed, pip has no dependency resolver as of now, although it is being actively worked on [1], and they did secure some funding [2].

[1] https://github.com/pypa/pip/issues/988

[2] https://pyfound.blogspot.com/2019/11/seeking-developers-for-...


Yeah, so while conda can be annoyingly slow, comparing it directly to pip in that department is unfair. A more fair comparison would be pip plus poetry or pipenv, which also have speed complaints :/


Conda does their own thing, so I hope that this doesn't become commonplace.

It's a nightmare of interop. Yes, it works for one person on a single laptop, but my experiences with conda outside of the happy path are universally terrible.

I'll stick with the standards.


Can you give an example of a case outside of the happy path?


Just one from my personal experience: Conda installed Python having different compilation flags than official and distro Python builds, breaking ABI and causing native extensions built against official Python to crash when loaded in Conda Python.


A few times a year our office loses power, which causes the machine we have running for data evaluation to crash. There seems to be about a one-in-three chance of that damaging my Anaconda installation.

Last time I could only fix it by uninstalling everything and reinstalling an updated version. It took ages to get all the packages running again. Luckily my code still works, but now I get deprecation warnings (and not from my code). Who knows what will happen next time. It also ate about a day that I needed for actually working with the data... urgh.


As I understand it, conda has more sophisticated handling of native components, and is largely seen in scientific environments where Python is just the frontend to libraries that are actually in C++ and Fortran.


Conda is "pip" for people who know _nothing_ about pip/operating-systems/IT. It is basically a way to have a working environment without understanding that maybe pandas and numpy and scikit are not Python built-in modules, and that maybe you'd need a working C/C++ (and even a Fortran compiler too) to install it.

It's great for that purpose, but - as far as I know - it has not much to do with general Python dependency management.


Conda does (in theory) handle python module dependencies well. In practice, like pip, sometimes you win and sometimes you lose.

Note that you can use 'pip' within a 'conda' environment as well, so you can sort of have it both ways.


I think this trend may be limited to fields relying heavily on natively compiled dependencies, which may not even have any python at all. Conda packages can be C++, Fortran, pretty much anything. That's the main draw for me -- otherwise, I'd stick with pip, which is definitely more mainstream.


325 of the top 360 Python packages are now distributed as Wheels.


For anyone working on python wheels requiring to compile c/cpp/fortran, this may be useful. See https://scikit-build.org

(Disclaimer: I am one of the maintainers)
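
For anyone curious, it's close to a drop-in for setuptools; a rough sketch (names are placeholders), with the native build itself described in an accompanying CMakeLists.txt:

    # setup.py using scikit-build instead of plain setuptools
    from skbuild import setup

    setup(
        name="hello_cmake",
        version="0.1.0",
        packages=["hello_cmake"],
    )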


Thanks for your work! I've used skbuild to package a couple of C library dependencies we have that would otherwise only be available to us as RPMs.


That doesn't seem like a particularly fast adoption. I remember seeing the first wheels when working at a job I quit in early 2010. So it must be over 10 years.

Edit: A web search points to 2012, so maybe it's "only" 8 years?

Edit 2: Pip came in 2008, so something must have changed somewhat before 2010, as I remembered. But what did it install if not wheels?


If you were using PyPI you would have encountered source archives or Wheels. If you were not using PyPI you could have encountered source archives or Eggs.


> If you were using PyPI you would have encountered source archives or Wheels.

Before 2010? That's what I believed to remember. But it looks like wheels did not exist before 2012, so that can't be true.


If you were using PyPI before Wheels were available, you would have encountered source distributions (sdist).


Eggs?


But snakes don't have wheels. Why not call it "skins?"


You get wheels of cheese from the Cheese Shop. https://www.youtube.com/watch?v=Hz1JWzyvv8A


I wonder how long until nobody remembers Python isn't named after the snake species?


But surely Monty Python was named after the snake species?


Aha! Thanks.


This seems like a really effective way to set up a page for shaming "laggards". It would be interesting to track over time how many github issues are just links to this page.


As noted on the page, this idea (at least in the Python world) started with a similar "wall of shame" for libraries that hadn't added Python 3 support. I'm not sure how much progress was directly attributable to that site, but it was fairly widely referenced.


Haha python is so authoritarian.


Since pip is used to install Wheels it would probably be best to have a new separate 3rd party meta tool to install package managers themselves to avoid the confusion. Preferably this should have its own additional PEP and integrate PyPI and PyPy along with other packages that could make life simpler for the (hopefully now happier) end user.


Have always wondered why PyPI doesn't generate whl files for pure-Python sdists

and why companies like travis / github aren't more active in language-level packaging work

github gives away so much free docker time -- faster installation would save them money directly


Because there is no way to know if something is pure-python or not.

Even more so with pep517 and being able to use different build systems.


Wait, then how does bdist_wheel know whether to make platform-dependent vs platform-independent whl files?


Your hunch is right, setuptools is able to determine if it's pure python or not (https://packaging.python.org/guides/distributing-packages-us...)

Maybe it's not reliable enough?

In any case, I can see why PyPI wouldn't want to increase the scope of their work. Sometimes it's super valuable to just be good at one thing (distribute what others give you).


I think the PyPI folks just don't have the resources / capacity to implement this. With appropriate funding this could have been done (the development effort wouldn't be that big, but there would probably be operational costs for the build farms). In the past I ran devpi-builder (https://github.com/blue-yonder/devpi-builder/) with devpi (https://pypi.org/project/devpi/) to build wheels for all my dependencies and keep them stored in a place I have control over.


That's a specific implementation that relies on setuptools. An sdist doesn't have to use setuptools to be an sdist.

Wheels can be generated out of thin air; there is no requirement that setuptools or any particular tool is ever involved. If you wanted, you could zip up the right directory structure yourself and it would be a wheel.


are you saying pure-python detection is possible for standard packaging formats like setuptools and pyproject.toml, but not for sdist generally?


pyproject.toml specifies how to build a project using a builder. You'd have to execute the builder and let it build the project to know if the wheel it generates is pure python or not.

As for setuptools:

There's a class that setuptools relies on to do the actual building of the Python project, distutils.dist.Distribution, which has a method named is_pure():

https://github.com/python/cpython/blob/e42b705188271da108de4...

It checks that the project doesn't declare any extension modules or C libraries.

So to detect whether an sdist is pure Python, you'd have to execute setuptools and setup.py, because setup.py isn't just configuration, it's full-on Python.

So just by looking at an sdist statically, you cannot know whether it's a pure-Python project or not.
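
A rough sketch of the "build it and see" approach (assumes the third-party build and packaging projects are installed; paths are examples):

    # build a wheel from the project in the current directory, then inspect its tags
    import pathlib
    import subprocess
    from packaging.utils import parse_wheel_filename

    subprocess.run(["python", "-m", "build", "--wheel", "--outdir", "dist", "."], check=True)
    wheel = next(pathlib.Path("dist").glob("*.whl"))
    _name, _version, _build, tags = parse_wheel_filename(wheel.name)
    pure = all(t.abi == "none" and t.platform == "any" for t in tags)
    print(wheel.name, "-> pure Python" if pure else "-> platform specific")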


underrated comment.

PyPI could probably generate a lot of wheels. In fact, `pip` generates wheels from packages it downloads before installing them, so wheels can already be generated for a lot of packages directly from the sdist.

Why isn't it done already? Good question. I guess it has to do with the majority of Python packages installing just fine from source (those aren't a problem for packaging anyway), while the packages that need wheels because they use native extensions are kind of high-maintenance to build and roll out, especially if upstream maintainers haven't invested in building their projects for that.

I would love to see auto-bundled wheels, but I guess for this to fly, package maintainers would need to start standardizing a bit better (that also involves tooling). Easy things are missing; for example, in setuptools you cannot even define smoke tests that distributors could use to verify a successful build.


Could someone give a beginner's tl;dr of wheels vs. eggs?

I use Python extensively but the "Advantages of wheels" section on this site is way over my head.

edit: Thanks everyone :)


Wheels are newer and have an official PEP (PEP 427)

Eggs contain compiled Python bytecode, which requires publishing an egg for every Python version, whereas a single wheel is a distribution format that can support more than one Python version.


This is a pretty short overview.

https://packaging.python.org/discussions/wheel-vs-egg/

Having packaged wheels but not eggs, my understanding is somewhat limited, but eggs being unversioned and having weaker filename conventions sounds like it would be a pain.

I mainly use binary wheels for projects that include compiled C extensions, so I need a different wheel for each supported platform. There's a straightforward filename convention for that, and all the tools support it, so I don't have to even think about that when generating or uploading wheels.


This page does a good job comparing them: https://packaging.python.org/discussions/wheel-vs-egg/

Is anything in that unclear?

In practice, I think this bit is pretty important:

> Wheel has a richer file naming convention. A single wheel archive can indicate its compatibility with a number of Python language versions and implementations, ABIs, and system architectures.
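
For example, a filename like mypkg-1.0-cp38-cp38-manylinux1_x86_64.whl (a made-up package) encodes the Python tag (cp38), the ABI tag (cp38), and the platform tag (manylinux1_x86_64), while a pure-Python wheel just ends in py3-none-any.whl.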



Is Python Wheels the equivalent of Ruby Gems? Or is it better?


Gems


Honest questions from non-user of Ruby:

* Do Ruby gems contain all binary dependencies statically linked into them?

* Is there a list of exceptions that are guaranteed to be present on the target platform (see manylinux [1])?

The fact that you answered "Gems" rather than "something better" means the answer to both must be "yes" but I wanted to check.

[1] https://github.com/pypa/manylinux


Apologies, I think the question was edited/clarified after my comment.

Gems are more similar to Python sdists. Native code is compiled on the installing host.

There is no specification like manylinux1/2010/2014 for Ruby Gems as there is for Python Wheels.

In this sense, Python Wheels can be considered more sophisticated than Ruby Gems.


That’s a great way to track tech adoption within a community.


Any news on supporting Alpine Linux with wheels?


Any idea if musl support is on the horizon?




