The road to faster tests (37signals.com)
87 points by duck on Jan 19, 2011 | 35 comments



A 2x speed improvement just from better memory management. It's astounding how good Ruby feels to program in, and also how much the webapp bounce cycle can hide significant infrastructure problems.


Ruby’s Test::Unit library (which we use for Basecamp’s tests) creates a new instance of the test class for each test. Not only that, but it holds onto each of those instances, so if your test suite includes thousands of tests, then the test suite will have thousands of TestCase objects.

On the Java side, JUnit does exactly the same thing. However, I usually see this manifest as out-of-memory errors rather than as a time sink. Over the years I have gotten some VERY strange looks from clients when I null out instance variables in my tearDown methods. It usually takes a couple of hours of explanation to gain some level of acceptance, if not understanding.
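For what it's worth, the same workaround applies directly to Ruby's Test::Unit. A minimal sketch of the pattern (the fixture object here is invented purely for illustration):

    require 'test/unit'

    class ExpensiveFixtureTest < Test::Unit::TestCase
      def setup
        # Stand-in for an expensive fixture (records, parsed files, etc.)
        @huge_blob = "x" * 10_000_000
      end

      def test_blob_is_built
        assert_equal 10_000_000, @huge_blob.size
      end

      def teardown
        # Test::Unit keeps every TestCase instance alive for the whole run,
        # so anything left in an instance variable can never be collected.
        # Nil it out so the GC can reclaim the memory.
        @huge_blob = nil
      end
    end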


What are the benefits of holding onto each instance?


I don't believe Test::Unit does this intentionally; it's just a side-effect of the implementation (load all tests into an array, and iterate over the array).


JUnit stores all of the metadata about each test case in the test instance itself, so all of the reporting is based off of these instances. It has always puzzled me why it was done this way -- why not just store a collection of "metadata" objects and report off of them, allowing the test instances to be garbage collected? -- but I assume changing it now would break most of the reporting plugins/tools.

Note this is for the older 3.x versions of JUnit; I'm not 100% sure this is still true for the 4.x line.


As an aside, it would be fairly simple to decrease the time taken for a unit test suite to run by switching to incremental testing--that is, by not running tests for code that hasn't changed.

While I've never implemented something like this, I imagine it would be straightforward for compiled languages if you have a good linker:

1) compile each unit test as a statically linked executable

2) make sure that each unit test outputs to an individual file; e.g. ./test_a >> test_results_a.xml

3) jigger your build process such that the linker removes dead and unused code from each test executable. Now each test executable contains only code that is directly used by the test.

4) ensure that your build process only touches a statically linked test if a dependency has actually changed. This comes for free most of the time, but you may still have the linker touching files that it doesn't have to. You can remove these cases by staging the test files with a binary comparator tool like rsync. If you have a crappy linker, or there's something in your test executables which is always changing (like a build stamp), you might need to compare using more advanced tools (like binutils) or something like Google's Courgette.

5) Run your tests only when the test is newer than the results--that is, when test_a is newer than test_a_result.xml. This is straightforward to implement using most build systems, like make, or using a test runner script.

Bam--now a test only runs if some dependent code has changed. If your test also uses data, you should list it as a dependency in your build system--either manually or by detecting open() calls at runtime via a shim over libc, or via strace. Your test run takes much less time, and, best of all, developers get a lot less spam to read through. This is perhaps a much greater benefit than increased speed.
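For step 5, a small runner script is all you need. A rough sketch in Ruby (the build/tests and results/ paths are assumptions about your layout):

    #!/usr/bin/env ruby
    # Re-run a test executable only when it is newer than its recorded results.
    Dir.glob("build/tests/test_*").each do |test_bin|
      results = "results/#{File.basename(test_bin)}.xml"

      # Skip tests whose results are already newer than the (re-linked) binary.
      next if File.exist?(results) && File.mtime(results) >= File.mtime(test_bin)

      puts "running #{test_bin}"
      system("#{test_bin} > #{results}") or warn "FAILED: #{test_bin}"
    end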

Dynamic languages are a much harder nut to crack, as are integration tests that are loosely bound to the code.


I use autotest for this - it notices files being altered and runs relevant parts of your suite. If something goes wrong it'll keep trying that test every time you save a file, until it works. I think it's fantastic.


Do you have a link to autotest? I run something similar. My Ruby projects are all in git, and I have a pre-commit hook that, when file X is modified, runs the tests for X and only allows the commit if they pass. This happens with zero work on my part, so I can never forget to run the tests. While it doesn't run all of the tests, just running the associated test catches a very large number of accidental test failures for very little cost. Lastly, because the tests are being run all of the time, if one suddenly takes a long time to run I quickly profile it and speed it up. And --no-verify is always there to bypass the hook if need be.
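For the curious, roughly what such a hook can look like as a Ruby .git/hooks/pre-commit (the lib/foo.rb -> test/foo_test.rb mapping is an assumption about project layout):

    #!/usr/bin/env ruby
    # Run the test that corresponds to each staged lib file; block the commit on failure.
    changed = `git diff --cached --name-only`.split("\n")

    tests = changed.map { |path|
      path =~ %r{\Alib/(.+)\.rb\z} ? "test/#{$1}_test.rb" : nil
    }.compact.select { |t| File.exist?(t) }.uniq

    exit 0 if tests.empty?

    failed = tests.reject { |t| system("ruby", "-Itest", t) }
    abort("pre-commit: failing tests: #{failed.join(', ')}") unless failed.empty?

    # `git commit --no-verify` still bypasses this when needed.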


How do you handle the case where file X uses file Y, and you commit file Y? You could run the test for X when Y is committed in case Y broke X.


For most of my projects I don't. Some of them have rules saying test X depends upon Y, but for the vast majority of my projects, modify file X and only the test for X is run. This catches a significant number of regressions. Really it comes down to effort. I can either A) try to remember to always run tests before committing, or B) have a basic hook that takes minutes to write and install and that will always run at least one test, catching nearly all regressions.

I tried to do A for years. Most of the time you run the tests, but not always. And heaven help you if you are on a team: there will be someone who never runs tests, and you will end up having to schedule a chunk of time for regression fixing before every release. So to answer your question: who cares about the case where file X uses file Y?



As an experiment I just implemented the GC tweaks for our test suite (we're using RSpec), and it gave us roughly a 41% improvement in run time...

  Before: 914 seconds
  After:  538 seconds
Awesome! Thanks to Jamis for sharing!
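For anyone wondering what that kind of tweak can look like in RSpec, one common form is deferring GC between examples. A rough sketch (the 10-second interval is an arbitrary choice, and this isn't necessarily the article's exact code):

    RSpec.configure do |config|
      gc_interval  = 10.0        # seconds between forced collections (arbitrary)
      last_gc_time = Time.now

      config.before(:each) { GC.disable }

      config.after(:each) do
        # Only pay for a full collection every few seconds instead of constantly.
        if Time.now - last_gc_time > gc_interval
          GC.enable
          GC.start
          last_gc_time = Time.now
        end
      end
    end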


Would it not be possible to track which lines and branches get executed by each test (normally you'd track this for your code coverage report, yes?), then have your IDE only re-run the tests which would have been affected by any changes you've made?

You'd still want to run the full suite when you commit, as a sanity check, but otherwise this seems workable. Unless I'm overlooking something :)
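The selection half is small once you have a per-test coverage map from whatever coverage tool you already run. A hypothetical sketch (the coverage_map.yml name and format are made up for illustration):

    require 'yaml'

    # { "test/foo_test.rb" => ["lib/foo.rb", ...], ... }, recorded on a previous full run
    coverage_map = YAML.load_file("coverage_map.yml")
    changed      = `git diff --name-only HEAD`.split("\n")

    # A test is affected if any file it covered has changed since the map was built.
    to_run = coverage_map.select { |_test, sources| (sources & changed).any? }.keys

    if to_run.empty?
      puts "no affected tests"
    else
      to_run.each { |test| system("ruby", "-Itest", test) }
    end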


Yes -- in Ruby the tool for that is `autotest`. It watches your project directory and automatically runs relevant tests when you change application code or tests.


Autotest has been the single biggest instigator in my adopting a more rigorous testing procedure for AR than I had previously used. The number of bugs it catches, practically before I leave the method I'm tweaking, is astounding.

(I mostly do unit testing, but also do some testing against APIs to ensure that I'm passing them data in the format they expect. One of my external service providers had downtime yesterday, and autotest caught it about six hours before they did. I wasn't successful in waking them up, but at least I was able to push a hotfix to AR to minimize inconvenience while I waited for their API to come back.)

[Edit: I might consider, if tests were taking that long, spinning up a fleet of VPSes to munch through them for me, maybe with a continuous integration server. Unit tests should be disgustingly parallel, since they're small units of work and shouldn't depend on each other. That means if one Macbook can execute the suite in 15 minutes, then you should be able to have 30 Macbook-equivalents, running either in the server room or the cloud, munch through the same test suite in 30 seconds, give or take a little startup time.]


We do have a CI server, and as you said it works well for catching failing tests. However, it requires that you commit and push your changes in order to test them, which means you are effectively publishing untested changes to your entire team. The same goes for any kind of distributed testing, unless you are using a shared volume to host your sandbox.

I'm running a Mac Pro with 8 cores, so there is a fair bit of parallelization I can do locally too. Unfortunately, the tests all depend on the database, and while I can certainly use tools like deep-test to spin up separate DB's for each worker, I've found that doing so adds a full 60 seconds to the test run. I fear that until we eliminate the database from (most of) our tests, super-fast runs will continue to elude us.
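Roughly the shape of the separate-DB-per-worker idea, for anyone unfamiliar with it (just an illustration, not deep-test's actual API):

    require 'active_record'

    # Each parallel worker connects to its own test database so runs don't collide.
    # TEST_WORKER is a hypothetical env var set by whatever spawns the workers.
    worker = ENV.fetch("TEST_WORKER", "0")

    ActiveRecord::Base.establish_connection(
      adapter:  "mysql2",
      host:     "localhost",
      database: "app_test_#{worker}"
    )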

CI and distributed tests are good things, no question, but I'm still looking for ways to make it possible to run my tests locally in TDD-fashion. I'm far from out of ideas, it's just a matter of making time to experiment.


This is totally a personal/team comfort question, but is there any reason why you can't have two remotes? "git push jamistest" might use a few bits on a spinning platter somewhere, but that is cheap, and there is no reason your team has to see it if you don't push it to the master repo, any more than they see changes you keep on your local repo.


Aside from me simply wanting to be able to quickly run my tests locally, you mean? :) Mostly it's just an issue of configuring that so it works for all the programmers. Each would need their own remote, and each would need to be hooked into CI. Definitely possible, it just hasn't been a priority.


Most of the time you need only some of your tests (the ones you're working on currently). Then, preloading the framework becomes the real bottleneck. But that's solvable too, via spork (I believe there is a spork-testunit, but I've never tried it).

Most of my coding is done in minute-long loops (add a test / watch it fail / add a couple of lines / watch it succeed). YMMV.


spork is friggen awesome, and if you are on MRI/YARV you really should use it.

http://spork.rubyforge.org/


Hey Jamis,

Your CI server should be able to accept a patchset and run it without committing it. TeamCity does this (or a roll-your-own CI server can do it too!).


There's actually a gem for your edit: https://github.com/jasonm/parallel_specs

Then there's hydra: https://github.com/ngauthier/hydra

And the excellent presentation on speeding up your tests with hydra that I can't find. It was one of those showoff presentations on GitHub somewhere...


I believe you're looking for the presentation "Grease Your Suite" hosted here on Heroku: http://grease-your-suite.heroku.com/



I cannot recommend this video enough.

It is a step by step guide with code showing how the guy reduced his suite's running time from half an hour to 15 seconds.

A lot of it is avoiding creating so many DB objects, like in the OP, but it also discusses parallelizing the tests using hydra to make use of multiple cores. This ties into the DB access thing as well, though, because to run tests in parallel you have to make all the cases set up and tear down their own fixtures transactionally. Not doing this is a leading cause of slow tests anyway.
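In rspec-rails the usual switch for this is config.use_transactional_fixtures = true; under the hood it amounts to roughly this kind of per-example transaction (a sketch, not the talk's exact code):

    RSpec.configure do |config|
      config.before(:each) do
        # Open a transaction before every example...
        ActiveRecord::Base.connection.begin_db_transaction
      end

      config.after(:each) do
        # ...and roll it back afterwards, so no example (or parallel worker
        # pointed at its own database) ever sees another example's data.
        ActiveRecord::Base.connection.rollback_db_transaction
      end
    end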

He even has metrics for stuff like running the whole suite off a ramdisk, the effects of SSDs, etc etc. He worked on this at his company for weeks/months apparently.

It's too hard to explain here, just watch that video if you do ANY ruby testing!


Absolutely! Thank you!


I don't know if it's gotten more sophisticated recently, but the last time I played with autotest it had a very naive concept of "relevant tests". Certainly not the sort of runtime analysis the parent describes.

(Also, on our moderately large codebase it noticeably impaired the performance of my whole machine because of the way it constantly scanned the entire directory tree. I know there was some talk of adding edge triggering via kqueue or fsevents, but no one had done it at that time.)


Autotest depends on the rules you use.

Find autotest/discover.rb for your project to see something like:

    Autotest.add_discovery { "Rspec2" }
    Autotest.add_discovery { "Rails" }
    # both of the frameworks have their predefined rulesets
No one prevents you from adding your own discovery rules.

Also you're looking for autotest-fsevent if you're using a mac. Probably something similar exists for linux.


Autotest sounds like a great tool. Is there a Python equivalent?


I haven't used it, but sniffer sounds like it fits the bill.

http://pypi.python.org/pypi/sniffer/0.2.2


For a sufficiently large project, just starting autotest can be a drag. For projects with a large suite, I think the only approach that doesn't have diminishing returns is parallelizing the test suite.


You can safely do this as long as your tests are fully deterministic. If they use inputs which change over time, like a system clock, or a database, or share mutable state between threads, then you can't be sure that the code coverage does not change from run to run.

There's more to it than simply checking each line with a coverage tool--you also need to rerun tests when any implicitly referenced global data, strings, input files, command line parameters, etc. change, including calls to third party libraries for which the source is not available. Usually a build system and compiler can track some of these dependencies already, so you may be able to leverage those.

As long as you occasionally run the full test suite, though, you may not care to cover these cases for your incremental tests.


It is important to remember that you are changing the code (and probably the tests) at the same time, so your coverage data is getting stale. So you can only "mostly" predict which tests to run, which is somewhat useful but can also make these cut-down runs misleading.


My team (back at MSFT) did this on checkin. It'd look at the class you changed and what tests had code coverage that touched that class, and rerun just those tests before allowing the commit. Post-commit, the rolling builds would update the code coverage for all of the different build flavors (debug, release, etc).

One nice side-effect of this support is that it also gave us code coverage deltas from the last build, so the system would send out embarrassing numbers with your commit if you'd dropped the overall coverage of the module.


Interesting. Hopefully someone (preferably someone that already knows the library code) extracts a fix so that stuff like this doesn't happen.



