Tools of the Modern Python Hacker: Virtualenv, Fabric and Pip (clemesha.org)
157 points by iamelgringo on July 8, 2009 | 41 comments



As someone who has been exploring Python lately, but who has spent the past few years using mostly Ruby and JavaScript, I have really enjoyed observing how the Python community approaches problems ...

- Code readability is generally very important (unlike Ruby, where magic-fu is quite important)

- There is more heterogeneity

- There is less of an obsession with testing and boasting about it.

Most importantly, I haven't read any blog posts promoting automated code quality measurement tools (are any of your methods over 3 lines?) -- those things are incredibly backwards in my opinion. The gravitational pull of the Ruby community toward them is unfortunate, and suggests that too many Ruby programmers are squandering their productivity gains refactoring code into 3-line methods...


I know this may be hearsay, but I've been using pylint and have been generally pleased with the results. I don't take everything it says as the ultimate source of coding standards, but it will pick up on some silly mistakes one might make and generally make my code more maintainable.

Granted, I'm still in university, but I think your disdain for automated code quality tools may be misplaced. Instead of blaming the tools, why not blame the community as a whole for engaging in arcane coding practices?


You mean heresy. Hearsay is information you learned from a friend. :)


He may have learned that from a friend. Who here really remembers what they did during one of those all night coding sessions?


Could be, but sadly no. Chalk this up to a fatal reliance on Firefox spellchecker and laziness.


There are definitely legitimate uses of such tools... I myself occasionally use jslint, to minimize the chances of obscure browser compatibility issues.

However, some people run these "code quality" measures as part of a regular test suite, so if you write a method that has more than 3 lines you get a warning or failure.

In my opinion this leads to less readable code, as methods are prematurely refactored simply for size, and coders tend to do things like putting a hash literal on a single long line rather than formatting it for maximum readability.

This is just my opinion... I don't hold it against someone to use them, I just think they can make code less readable, not more.


I think testing is an OK thing to be obsessed about... Oddly, I wrote exactly that in my profile just a few minutes ago.

I like to view the Ruby vs. Python discussion as pointless. There's a lot the two communities can learn from each other. In my opinion those two languages are poised to lead web development into the future, if they don't already.


Totally. They are both awesome. It's fun to be a part of both "cultures" :)


Beware: talks about the benefits of using utility code to solve subproblems without even a nod to the complexity introduced by adding dependencies to a project and the costs thereof. This always tells me I'm reading someone who views software development as writing code only, and doesn't spend much time maintaining old stuff. All the "hacker" verbiage notwithstanding, I think the author is missing some important points about what makes a good hack.

That said, I don't know a thing about any of these tools. They might be fantastically worth it for all I know. But talk about the tools, not about how they make you a great hacker.


I do real-world deployment of Python applications. I use two of these three tools daily, and will be using the third once it hits API stability.

I use them because -- even though they represent added dependencies -- they deal with tough problems in clean ways.

For example, every platform has its equivalent of DLL hell; in Python it's not unheard-of for Application A to need version 1.0 of some library, while Application B needs version 2.0, they both have to be on the same server and the two library versions are incompatible.

You can solve this by careful manual management of import paths, or you can use virtualenv, which does it for you and ensures complete isolation of each application's dependencies. I opt for virtualenv.

Add pip and you get a reproducible build process: create a virtualenv, and install whatever's needed to turn it into a working environment for your application. Then 'pip freeze' and you have a requirements file which can reproduce that environment for you on-demand (and which is also human-readable documentation of the needed software).

Fabric's in the middle of a big rewrite and so I'm not presently using it, but I will be once it's stable, because it adds the final piece of the puzzle: deployment to multiple servers, with support for multiple targets (e.g., I can differentiate "deploy this to the staging area" from "deploy this to the production environment").
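A rough sketch of the kind of fabfile this enables, once the API settles (hostnames and paths here are made up, using the fabric.api style):

    # fabfile.py -- sketch only; hostnames and paths are made up
    from fabric.api import env, run

    def staging():
        env.hosts = ['staging.example.com']

    def production():
        env.hosts = ['www1.example.com', 'www2.example.com']

    def deploy():
        # assumes the virtualenv + requirements-file workflow described above
        run('cd /srv/myapp && git pull')
        run('cd /srv/myapp && env/bin/pip install -r requirements.txt')

Invoked as "fab staging deploy" or "fab production deploy", which is the kind of target differentiation I mean.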

Yes, there are tools on popular operating systems which can do these sorts of things, but you don't always have a homogeneous deployment target where you can rely on the same tool being available everywhere. These tools, on the other hand, are pure Python and so get to be cross-platform largely for free.


The tool he's missing that I end up using constantly is Memoize - http://www.eecs.berkeley.edu/~billm/memoize.html - it has enormously simplified my make processes, and works very well with Fabric and the others.


I am probably missing something about Fabric, but it seems to me it solves a problem already solved by most unix distributions.

Why not utilize the host packaging system for deploying your code and applications?


Because it's sometimes simpler not to, especially if you're planning to do more than just internal deployment.

You may end up dealing with multiple operating systems or package tools, for example, at which point Python's ubiquitous setup.py (and tools which work with it on every OS) look very attractive.

You may end up needing to install App A with Version 2.0 of Dependency B, and App C with Version 3.5, on the same server, at which point virtualenv starts looking really attractive.

These sorts of situations are extremely common in the real world, and "use the OS package tool" isn't a one-size-fits-all solution for them.
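For reference, the setup.py in question can be tiny; a sketch with a made-up project name and dependency:

    # setup.py -- minimal sketch; name, version and requirements are made up
    from setuptools import setup, find_packages

    setup(
        name='myapp',
        version='0.1',
        packages=find_packages(),
        install_requires=['simplejson>=2.0'],
    )

The same "python setup.py install" (or a pip install pointed at it) then works the same way on any OS with Python.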


There are classes of problems that can't be solved at the system level, like deploying to more than one system type or across multiple machines at the same time.


Sorry, but I have to wonder what you're talking about. Package managers are designed exactly for this purpose.

It's not even hard to leverage the power of apt (the most advanced of the bunch) for your own deployments.

In essence you set up a local mirror and add that to the sources.list of all your hosts. Then you learn how to roll your own debs and push them to the mirror. The actual deployment happens via apt-get - which can be triggered by cron, a shell script, puppet, a sweaty admin at 4am, or whatever fits your bill.

Working with the system this way, instead of alongside it (or even against it), has various advantages. Most importantly you get proper dependency management. Need to roll back to app version 2 from version 3, where version 2 depends on an older version of foo? No-brainer, apt takes care of that for you, both ways, even in much more complicated cases. Need a package or package version that's not in the official repos? No problem, roll your own and make your application package depend on that.

With a bit of elbow grease you can also have it mangle your database and other auxiliary infrastructure appropriately, within the respective pre-/post-install scripts.

Fabric and capistrano are just expressions of the old "if you don't understand you're doomed to reinvent, poorly" meme.


I agree with you. But now I want to launch and release to several EC2 and Rackspace machines, in parallel. apt doesn't help with that. It also doesn't help with releasing to multiple machines simultaneously (including different types).

If I have 5 debian machines that need to be updated, I should be able to do that with a single command and it should happen in parallel. The same applies if I have 5 debian machines and 5 red hat machines (etc...). I'm advocating a tool that is aware of the existing system specific package managers rather than a replacement of them.


I agree with you. But now I want to launch and release to several EC2 and Rackspace machines, in parallel. apt doesn't help with that.

Of course it does. What makes you think it doesn't?

If I have 5 debian machines that need to be updated, I should be able to do that with a single command and it should happen in parallel.

    reprepro -Vb . includedsc stage1 myapp_2.0-1.dsc

That drops a new pkg onto the mirror where the staging hosts pick it up within one minute, from cron. I could use the "live" distro instead of "stage1" to roll it out to production. We use sections if we want to limit the push to individual groups of hosts.

The same applies if I have 5 debian machines and 5 red hat machines (etc...)

If you mix linux distributions in a production environment then you have bigger problems to resolve first.

I'm advocating a tool that is aware of the existing system specific package managers rather than a replacement of them.

Those who don't understand are doomed to reinvent, poorly...


I think we mostly agree, we're just looking at the problem from different directions.

Of course it does.

Can apt launch EC2 instances and execute scripts (that are not part of the package) before and after installation? Can it update security group settings and request and assign static IP addresses? My understanding is that apt does not help with these problems, so we write scripts or use tools like Fabric to do this. These scripts/tools are aware of the package manager in that they call the commands to make things happen. This is the level I'm talking about, where there are still open problems.

If you mix linux distributions in a production environment then you have bigger problems to resolve first.

In an ideal world this is true, but it does happen. For example, one vendor may require a specific type or version of OS that differs from the rest. A business may also choose to change the OS from one release to the next.

It's important to be aware of what is possible and account for it ahead of time. Again, I'm not advocating not to use apt or yum or rpm. I'm suggesting that it's helpful to not tie your process to a specific one unless you have complete control over the environment, now and for the foreseeable future.


Can apt launch EC2 instances and execute scripts (that are not part of the package) before and after installation? Can it update security group settings and request and assign static IP addresses? My understanding is that apt does not help with these problems, so we write scripts or use tools like Fabric to do this.

Well, apt does not launch EC2 instances; you launch them, after you've defined their role in your central configuration server.

The first thing a launched instance does (in rc.local) is "apt-get install bootstrap". The bootstrap package contains everything a node needs to come alive. Ours consists of not much more than a script that immediately runs via the post-install hook. This script is where the magic happens: it connects to the "hivemind" and gathers the configuration data, based on the node name that the instance was parametrized with at startup. According to the role it is asked to assume, it will install the appropriate application packages (we call them "logic bombs"). For sanity it makes sense to just name the packages after the role. We have packages for "faceplate", "db", "queue" and such.

The packages will depend on other packages as needed and most of them contain pre-install hooks for initialization tasks (e.g. mount an EBS volume for a database node, claim an elastic IP, mangle DNS, etc.).

Well, long story short, I think the key mistake of capistrano and fabric is to assume Push where you really want Pull. Once that is realized life becomes much easier.

My understanding is that apt does not help with these problems, so we write scripts or use tools like Fabric to do this.

Apt is of course just one part of the toolchain and scripts will always be involved either way. My point is that a toolchain built around apt most likely has no need for something like fabric. Fabric is just not a very useful abstraction in a scenario involving more than a handful of hosts.

In an ideal world this is true, but it does happen. For example, one vendor may require a specific type or version of OS that differs from the rest. A business may also choose to change the OS from one release to the next.

Well, these are problems technology can't fix. These are problems only the HR department can fix.

I'm suggesting that it's helpful to not tie your process to a specific one unless you have complete control over the environment, now and for the foreseeable future.

There is a word for systems where nobody assumes "complete control": abandoned.


It's not the packaging, but Fabric (and Capistrano and friends) all look like shell scripts to me.


Think of them as shell-script frameworks. They include commonly used tasks/libraries that are useful for deployment and build tasks. Yes, you could use shell scripts, but these frameworks give you a more convenient environment. They are also easier to set up and are more portable, since they have fewer external dependencies.


They are also easier to set up and are more portable, since they have fewer external dependencies.

Sorry, but WTF? This made my toe-nails curl up.

You can't get a much easier setup than "already installed". You can't get much more portable than bash. And you can't get fewer dependencies than zero.

Seriously, sit down and write the equivalent shell script to whatever fabric/capistrano recipe you're currently using. I'm quite sure you'll be a bit baffled about why you bothered with them in the first place.

To me it seems like Fabric/Capistrano were invented by people, for people, who are afraid to learn bash syntax. This is unjustified; bash syntax is ugly but trivial.


Try getting a moderately complex shell script to run across different platforms. I dare you.

While it might be a safe bet these days to assume that bash exists (though there are no guarantees), you can't really do anything with the shell alone - you have to call external commands, and they vary from platform to platform. FreeBSD has all sorts of annoying small variations on the standard GNU utilities (or was it the other way around?). And Windows doesn't even have a standard shell.

> To me it seems like Fabric/Capistrano were invented by people, for people, who are afraid to learn the bash syntax.

To me it seems like you never actually used shell script for anything serious.


Try getting a moderately complex shell script to run across different platforms. I dare you.

That's a broken premise. Your deployment script doesn't need to (and should not) be complex by any metric. Your dependencies are ssh, tar, mv, cp, rsync/git/svn and a very small number of other utilities which are easily tested or wrapped for compatibility. If you think you need more then you're likely doing it wrong (e.g. trying to reinvent version control and package management at the same time).

and they vary from platform to platform

That's the other broken premise. You don't "build once, run anywhere". You build platform specific modules and only trigger them centrally. Puppet shows the way.

To me it seems like you never actually used shell script for anything serious.

Hm, let me think, I've created and managed a deployment of >20 racks. But yeah, nothing serious.


> Your deployment script doesn't need to (and should not) be complex by any metric.

I kind of agree, but complexity is a relative concept. mv and cp don't exist on Windows. And you really don't have to use exotic commands to run into compatibility issues between BSD and Linux. Not long ago I had an error reported because readlink doesn't behave the same on Linux and BSD. I don't think readlink is in the "too complicated" basket.

> That's the other broken premise. You don't "build once, run anywhere".

Why would I prefer to write three different deployment scripts, if I could write one? Am I missing something here?

> Puppet shows the way.

I don't know Puppet. I'll have a look at it.


I kind of agree, but complexity is a relative concept. mv and cp don't exist on Windows. And you really don't have to use exotic commands to run into compatibility issues between BSD and Linux.

Well, I have yet to see an application that spans Linux, BSD and wintendo at once. Such a setup obviously has much bigger problems than the deployment process ("broken beyond repair" comes to mind).

The real-world deployments I have encountered were, at most, Linux/Solaris, and even in those cases it was usually cheapest to just scrap one (guess which) and move on. I have never seen a case where maintaining a mixed cluster could have possibly been worth the maintenance overhead.

Why would I prefer to write three different deployment scripts, if I could write one?

Because 3 simple scripts trump one that is cluttered with conditionals.


> You can't get much more portable than bash.

I was with you up to this point, bash is not portable, and it is a hideous shell.

Any sane hacker writes all their scripts that are meant to be portable in standard Bourne shell. Bourne is to bash as C is to C++. And people who put #!/bin/bash at the top of scripts that can be run by any Bourne shell will burn in gnu/hell for the rest of eternity.


I was with you up to this point, bash is not portable

I have yet to encounter a system where bash wasn't available, so yes I'd say it's quite portable.

You are right, though, it would be more consistent to run with vanilla Bourne shell.

and it is a hideous shell.

Well, that's a holy war I'm not so interested in. In my opinion all shells are quite horrible. The idea is to pick the lowest common denominator and bash just happens to be the most popular of the bunch. Your chances of finding a working /bin/bash on any given system are still orders of magnitude higher than finding a working capistrano/fabric along with the corresponding ruby/python toolchain.

But again, I agree that if you're forced to deal with esoteric platforms then your chances of finding a working bourne shell are even higher than that.


> I have yet to encounter a system where bash wasn't available, so yes I'd say it's quite portable.

Most systems (other than Linux and OS X) do not include bash by default, and not everyone is able or willing to install it (and its dependencies) just to run a silly shell script. Sh, on the other hand, is almost as universal as ed(1), and implements the sanest subset of bash anyway.

Other systems (e.g., Plan 9) don't have bash available at all (this is considered by some as a feature), while Bourne is supported, if only for backwards compatibility.

> In my opinion all shells are quite horrible.

Most mainstream shells are indeed horrible (don't get me started on csh), but there are some quite sane shells out there (http://rc.cat-v.org).

> The idea is to pick the lowest common denominator and bash just happens to be the most popular of the bunch.

lowest common denominator != most popular of the bunch.

> Your chances of finding a working /bin/bash on any given system are still orders of magnitude higher than finding a working capistrano/fabric along with the corresponding ruby/python toolchain.

And your chances of finding a working /bin/sh on any given system are still orders of magnitude higher than finding a working /bin/bash


If you're distributing software to end users, a packaging system is probably the way to go, but if you're looking to deploy code from development environments to staging and production servers, then something like Fabric or Capistrano is the way to go.


Why? With the host packaging system I get versioned packages; through host management tools I can control which packages my production, staging, and development hosts should have; and I can describe my dependencies on host libraries and software through the host system's own tools.

Together with distribution systems like apt I can also significantly ease deployment.

I can see that executing some commands over a set of hosts at the same time could be useful, but it doesn't sound like a killer feature to me.

As for deploying from staging to production servers, it sounds more tidy to build proper packages to deploy in staging and test before deploying the same packages to production.


I can see that executing some commands over a set of hosts at the same time could be useful

for h in $hosts; do ssh $h "my command"; done

As for deploying from staging to production servers, it sounds more tidy to build proper packages to deploy in staging and test before deploying the same packages to production.

Amen.


dsh is worth looking at. It's essentially a for loop that runs ssh, but it can also get named groups of machines from config files, run them in parallel, and prefix output lines with the machine they came from.


I took a brief look at virtualenv, but decided not to use it for some reason that currently escapes me. The solution that I came up with instead has been working out quite well so far...

My main package has a sub-package called "external". When I run manage.py, it immediately imports external. External's __init__.py adds a bunch of things to sys.path.

Whenever I want to add a new package, I simply add the egg, or source, or whatever to the external folder, add an entry to external/__init__.py and check it in. This process can even selectively load packages by platform with a simple if-statement. Now when I check out a new enlistment from SVN, I immediately get the full set of dependencies at their exact versions.
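A minimal sketch of the kind of external/__init__.py described (the package directory names here are hypothetical):

    # external/__init__.py -- prepend bundled dependencies to sys.path
    import os
    import sys

    _HERE = os.path.dirname(os.path.abspath(__file__))

    # hypothetical entries; the real list would match whatever eggs or
    # source trees are checked into the external folder
    _PACKAGES = [
        'simplejson-2.0.9',
        'httplib2-0.4.0',
    ]

    if sys.platform == 'win32':
        _PACKAGES.append('pywin32-214')  # platform-specific package

    for _name in _PACKAGES:
        _path = os.path.join(_HERE, _name)
        if _path not in sys.path:
            sys.path.insert(0, _path)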

Simple, but effective.

Thoughts?


Repetition leads to boredom, boredom to horrifying mistakes, horrifying mistakes to God-I-wish-I-was-still-bored

    It is by will alone I set my mind in motion.
    It is by the juice of sapho that thoughts acquire speed,
    The lips acquire stains.
    The stains become a warning.
    It is by will alone I set my mind in motion.
    -- David Lynch (Piter de Vries from Dune movie)

"It is caffeine alone that sets my mind in motion. It is through beans of java that thoughts acquire speed, that hands acquire shakes, that shakes become a warning... I am... IN CONTROL... OF MY ADDICTION!" -- From the Minicon Graffiti Wall, 1989


>" [deploying production code from local dev] always involves several steps like packaging up the source from you source code management system, putting the source in the correct place remotely, and the restating the remote web server. This can be very tedious by hand, especially for a couple of frequent, small changes. "

Did I get this right? Commit/push local code, up/fetch on server, reload the web server.

Am I deploying wrong, or naively? You know how some people honestly don't see the point in version control? Yeah, I don't want to be that person.


I'm using Capistrano to deploy Python code and it works fine.

Can fabric get the latest release from a git repo?


Yes - to the extent that fabric is just a framework for building shell deployment commands, it could run that shell command.

I think there's a lot of (forgive me) synergy from pairing pip, virtualenv and fabric together, though: pip can easily install a project into a virtualenv by checking it out of a source code repository (currently git, bzr, svn, hg, I believe). Rebuilding the environment is then just a matter of "freezing" the current virtualenv, transferring the list to the host, and building an env with the appropriate packages (down to revision #). That's a "put" and a "run" - two lines in a fabfile.
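A sketch of that freeze/put/run flow (the hostname and paths are made up):

    # fabfile.py -- sketch of the freeze/put/run flow described above
    from fabric.api import env, local, put, run

    env.hosts = ['app.example.com']

    def deploy():
        local('pip freeze > requirements.txt')   # snapshot the local virtualenv
        put('requirements.txt', '/srv/myapp/requirements.txt')
        run('/srv/myapp/env/bin/pip install -r /srv/myapp/requirements.txt')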

The article links to one of my blogs which links to the slides from talks I did on pip, virtualenv, and fabric for Baypiggies a few months ago: http://simeonfranklin.com/blog/2009/mar/28/baypiggies-presen... if you're interested...


Fabric is in a wild state of flux right now. I used it a bit -- then a bit more -- then I backed off entirely. It has promise, but it seems a bit young at the moment.


I keep meaning to use virtualenv. Any good pointers to a howto for using it with mod_python?


Don't use mod_python. It's pretty obsolete.



