Repository Structure and Python (kennethreitz.com)
125 points by llambda on April 17, 2012 | hide | past | favorite | 36 comments



That's pretty much what I do, but I disagree about putting the tests in their own top-level directory. Instead, each package foo in my projects has a sub-package foo.test that contains tests for all the code in the parent package:

    foo/
     +- test/
     |   +- __init__.py
     |   +- test_bar.py
     |   +- test_qux.py
     +- __init__.py
     +- bar.py
     +- qux.py
This means that the test code is automatically part of the package, so you can easily run your tests in production (assuming they don't depend on services or modules only installed on your development machines) to check things during a rollout. Also, because the tests and the code are part of the same top-level Python project, you don't have to mess around with setting Python's import path to get the tests running; one "export PYTHONPATH=$PWD" in your terminal will get code and tests working in one go.
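A runnable sketch of that arrangement, using only the stdlib unittest runner and the hypothetical "foo" package from the tree above (built here in a temp dir so the example is self-contained):

```python
import os
import sys
import tempfile
import unittest

# Recreate the layout from the comment: package "foo" with an
# embedded "foo/test" subpackage (names are the hypothetical ones
# from the directory tree above).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "foo", "test"))
open(os.path.join(root, "foo", "__init__.py"), "w").close()
open(os.path.join(root, "foo", "test", "__init__.py"), "w").close()
with open(os.path.join(root, "foo", "test", "test_bar.py"), "w") as f:
    f.write(
        "import unittest\n"
        "class BarTest(unittest.TestCase):\n"
        "    def test_truth(self):\n"
        "        self.assertTrue(True)\n"
    )

# One path tweak (the 'export PYTHONPATH=$PWD' from the comment),
# then the stdlib runner can discover the embedded tests.
sys.path.insert(0, root)
suite = unittest.defaultTestLoader.discover(
    os.path.join(root, "foo", "test"), top_level_dir=root)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the tests live inside the package, the same discovery call works anywhere "foo" is importable, which is the point the comment is making about rollouts.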


I disagree. If your project ever gets packaged by a distribution, they will want to split the tests (and docs) into a separate sub-package with different package requirements, or even discard them altogether. This will make that job a little more difficult. And it's not only distributions that do this: there are organisations who choose to package and deploy projects inside rpms or even bash scripts. Keeping the tests separate from the code just makes things a lot clearer and easier.

Tests that need to be run in production are probably functional or integration tests (although even those should probably be run on a test system, not the actual production server). These should be separate from unit tests. Will you then have a whole tree of test subdirectories in each of your Python submodules? test/unit/, test/functional/, etc.?


  > so you can easily run your tests in production
I would be wary about running unit tests on my production box. More specifically, unit tests that muck with the database. Sure, you might have it set up to point to a test database, but is it really worth it to run the tests there on the off-chance that the config gets screwed up and your unit tests mess up your production data?

  > Also, because the tests and the code are part of
  > the same top-level Python project, you don't have
  > to mess around with setting Python's import path
  > to get the tests running
Within the source directory, there is no need to set $PYTHONPATH. You can just run:

  python setup.py test
You just have to point setup.py at your test packages. Note that this is only for the default unittest tests. I don't have experience with pytest or nose.
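A minimal sketch of that wiring, assuming setuptools is available; "foo" and "foo.test" are hypothetical names for the package and its test package:

```python
# setup.py — with test_suite set, "python setup.py test" discovers and
# runs the default unittest suite from the named package.
from setuptools import setup, find_packages

setup(
    name="foo",
    version="0.1",
    packages=find_packages(),
    test_suite="foo.test",
)
```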


> More specifically unit tests that muck with the database.

If you're doing a cross-process call then IMO they're no longer unit tests and have become integration tests. Unit tests test small pieces of code for validity and should run fast. You can't do that if you're hitting the database.


Then how do you 'unit test' the validity of something that is supposed to make changes to the database? How is this any different from unit testing something that is supposed to write to a file?


For database testing you'd mock[1] the connection in some way. I've done it before by swapping in an SQLite database with some sample data.

[1] http://en.wikipedia.org/wiki/Mock_object
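A minimal sketch of that swap-in, with hypothetical names (an add_user function and a users table): the code under test takes a connection parameter, and the test hands it an in-memory SQLite database instead of the real one.

```python
import sqlite3

def add_user(conn, name):
    # production code writes through whatever connection it is given
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()

# test setup: a throwaway in-memory database with just the schema,
# which vanishes when the connection is closed
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

add_user(conn, "alice")
assert conn.execute("SELECT name FROM users").fetchall() == [("alice",)]
```

The design point is that the code never hard-codes its connection, so the test can substitute a safe one.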


How is swapping in a SQLite database any different than separating development / testing / production databases? At work we use a test database that is schema-only. All tests populate the data during setup, and rollback on tear down.

The fear of running unit tests in production is that someone screwed up the testing configuration and your database writes are no longer happening on a mocked SQLite connection, but hitting the live database. Oops!


> How is swapping in a SQLite database any different than separating development / testing / production databases? At work we use a test database that is schema-only. All tests populate the data during setup, and rollback on tear down.

It's better for me because it works when I'm on an airplane, that's all. If you do all of your development where the test database is accessible, then there's little to no benefit.

> The fear of running unit tests in production is that someone screwed up the testing configuration and your database writes are no longer happening on a mocked SQLite connection, but hitting the live database. Oops!

I agree. Unit tests should never be run in or near production systems. The whole purpose of the tests is predicated on the assumption that your code is broken. Do you really want to bring broken code near customer data?


I would be wary about running unit tests on my production box. More specifically unit tests that muck with the database.

Certainly! There's a lot of code that doesn't work with persistent storage, though, or can easily have its side-effects limited to some safe space like files in /tmp.

You can just run "python setup.py test"

Is that a thing that the standard distutils package supports, or is it an extension from setuptools/distribute?


It's standard.


One of the things that I've seen bother people is that the source code ends up in a directory named PROJNAME/PROJNAME. Once you understand how the module system works in Python, it makes sense that things are this way, but it can be kind of a redundancy shock.


It's important to realize that the outermost dir is not generally the dir containing the package. In other words, the outermost dir is usually the dir that contains the project files, which also includes the package dir itself.

So for instance you might have some necessary build utilities, such as setup.py, a README, perhaps some tests, as well as a dir containing the actual package. Thought of like this, there's no actual redundancy, but it definitely is confusing until you make that connection.

Here's a simple example:

  project/
    README
    setup.py
    test_project.py
    project/
       __init__.py
       project.py
Here project/project/ is the actual package, everything above that dir is simply ancillary to the actual package itself.
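A runnable sketch of that point, with hypothetical names: recreate the nested layout in a temp dir and check that "import project" resolves to the inner package dir, not the outer project dir.

```python
import importlib
import os
import sys
import tempfile

# outer project/ dir holds the ancillary files; inner project/ is the package
outer = os.path.join(tempfile.mkdtemp(), "project")
inner = os.path.join(outer, "project")
os.makedirs(inner)
open(os.path.join(outer, "README"), "w").close()
open(os.path.join(inner, "__init__.py"), "w").close()

sys.path.insert(0, outer)
importlib.invalidate_caches()  # the files were just created
import project

# the import resolved to the inner package dir, not the outer project dir
assert project.__file__.endswith(os.path.join("project", "__init__.py"))
```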


For Django I like to do

    django-useful-app/
      setup.py
      ...
      useful_app/
        __init__.py
        ..


Github can make it seem even more redundant if you have an organization with the same name as your repository. For instance, django is essentially django/django/django when you think about the structure[1].

[1] https://github.com/django/django/tree/master/django



Experienced this exact thing after the Django 1.4 release; however, it makes sense now and it is not much of a nuisance any longer.


Is there a standard set of tools that are being advocated here, in addition to the repository structure?

I've been mostly working with Rails lately, and I'd like to continue using tests, mocks/stubs, and sensible build rules, but I'm not sure what the preferred Python tools are. What's the best Test::Unit equivalent? What's the best way to mock an object? Do people really use Makefiles rather than something else (like SCons)? Is there some way to use virtualenvwrapper without bash (e.g., M-x eshell)?


I use Make because it's fast, simple, always available, and bash has autocompletion for it, so I can type "make t<TAB>" to run my tests, for instance (or "make <TAB>" to see what commands I have available). I use make in my python projects for things like running tests, coverage, building the documentation (sphinx) and removing .pyc files.

For deployment I use fabric, but I have Make targets for the most used commands (again, it's nice to have completion). For example, these are two targets to deploy to my server and to my test machine:

    server-deploy:
            fab -f deployment/fabfile.py prod deploy

    test-deploy:
            fab -f deployment/fabfile.py test deploy
I use either pytest or nosetests to run my tests, mainly to have better and colored output.

I don't think you can use virtualenv(wrapper) without bash, but you can use it with M-x ansi-term. But I got tired of trying to configure emacs to run Python the way I wanted, and now I edit my code in emacs and run it in IPython in the terminal. IPython's autoreload [0] is a huge help.

[0] http://ipython.org/ipython-doc/dev/config/extensions/autorel...
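For reference, the autoreload extension is switched on with two magics at the start of a session (this is IPython input, not plain Python):

```
%load_ext autoreload
%autoreload 2
```

Mode 2 reloads all modules before executing each line of code, so edits saved in emacs take effect immediately in the running session.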


    export PYTHONDONTWRITEBYTECODE=1
:)


Why do you remove .pyc files?


When doing a large refactoring or removing modules altogether, leftover .pyc or .pyo files can cause some very hard-to-identify import errors.
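A small sketch of that cleanup step (a hypothetical helper, the Python equivalent of a "make clean" target):

```python
import os

def remove_bytecode(top):
    # walk the tree and delete stale compiled bytecode files
    for dirpath, _, filenames in os.walk(top):
        for name in filenames:
            if name.endswith((".pyc", ".pyo")):
                os.remove(os.path.join(dirpath, name))
```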


Make is better than SCons if you are not compiling C. Lots of people would use fabric instead. The stock unittest in the stdlib is quite good, especially in python 2.7+.

For mocks, wait until mock (http://www.voidspace.org.uk/python/mock/) is in the stdlib or download it yourself.


If you are on Python pre-2.7, the unittest2 (http://pypi.python.org/pypi/unittest2) library backports most of the unittest 2.7 features.


All activating a virtualenv does is set up a bunch of environment variables (PATH, PYTHONPATH, etc.). They provide activation scripts for bash and csh; it wouldn't be hard to set up the environment for M-x eshell, even if it's a slightly more manual process.
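A sketch of doing that by hand, roughly what the bash activate script does; the venv path and helper name are hypothetical:

```python
import os

def activate(venv_dir):
    # mirror the essentials of the bash "activate" script
    os.environ["VIRTUAL_ENV"] = venv_dir
    os.environ["PATH"] = (
        os.path.join(venv_dir, "bin") + os.pathsep + os.environ["PATH"]
    )
    os.environ.pop("PYTHONHOME", None)  # activate unsets this too

activate(os.path.expanduser("~/venvs/myproject"))
```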


I find it very hard to figure out the context of this article. First, what kind of repositories? After a while, I settled on "source code repositories". Now, given that a lot of importance is given to READMEs, maybe we're talking about GitHub repositories? Certainly the repositories at my company don't have READMEs, because we don't have something like GitHub. When you join a team, the rest of the team is the readme.

And so on, and so forth. Please, guys, if you write a blog post, add some context!


> Now, given that a lot of importance is given to readmes

README files are very, very old. As old as packaging software. To give you an idea of just how old they are, the file name is in all-capitals because when they were first used there were no lowercase letters in file names.

The file would be called README and referred to as a 'README' in docs. The name stayed all-uppercase even on modern operating systems with lowercase file names, because uppercase names sort to the top of directory listings.

GitHub picked it up as the default description page because these files had become standard in code packages; it definitely didn't happen the other way around, with README becoming popular because of GitHub.


> what kind of repositories

There's only one "kind" of repository generally referred to in programming topics.

> Surely, the repositories at my company don't have readmes

Good lord, why not?! Mine do!

> When you join a team, the rest of the team is the readme.

What nonsense. That is a horrible, inefficient use of resources. Stop it now, you're wasting time and, hence, money. There is no reason to keep walking new developers through the same steps and answering the same questions over and over again. Write a README! And a HACKING!


There is a common OO pattern called a Repository (you might know it as a DAO).


Overall this advice makes sense, but the relative import

   from .context import sample
is better avoided. Why not this?

   from tests.context import sample
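For reference, the context module under discussion (tests/context.py in the article's sample layout) is roughly this; "sample" is the article's package name:

```python
import os
import sys

# put the repository root (one level above tests/) on the import path,
# so the package is importable without being installed
sys.path.insert(
    0, os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
)

# import sample  # the package under test; tests then do
#                # "from tests.context import sample"
```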


Is there any evidence of the below-the-fold mantra working on GitHub as well? IMHO, looking at a bunch of folders doesn't tell me anything, so at the very least you have to scroll down and read the README if you have any interest in a project.

In fact, if people were to standardize their repo's folder structure to match a template, that would tell me even less about the project. Seems counterproductive to me.


It's not meant to tell you anything about the project. It's meant to stay out of your way and give you instant access to the areas you're interested in: code, docs, tests/example code etc.


I agree regarding the Django tip. Does anyone know why manage.py was moved outside of the main project directory?


It wasn't, the generated package is now wrapped in a directory. Before, it wasn't.


I agree regarding the license. It is much easier to know to always check LICENSE, rather than scanning the bottom of the README, the source, or the root directory (for a file with another name, like MIT-LICENSE).


What's she talking about with Django 1.4? Does it duplicate the directories for some reason?


Some pretty decent suggestions.

Another option is buildout (http://buildout.org) for more complex systems. It really makes it easy to go from cloning the repo to running/testing/debugging/hacking.



