The End of Bugs? (heroku.com)
35 points by sant0sk1 on July 7, 2008 | 19 comments



Having done TDD a bunch over the last five or so years, I couldn't ever go back to a world without extensive unit tests.

That said, there are interesting scaling problems with tests that few people seem to write about or know how to deal with; all the TDD books and sites describe techniques that work on simple, small code-bases but often break down in the face of real-world problems. In particular, writing tests that only fail when the code is broken (and not just because the test is broken) is often incredibly difficult, and if you have (like we do) more than 40,000 unit tests, even a small false-positive rate like 0.1% means about 40 spurious failures per run: a simple check-in could require hours of test rewriting, not because the code is broken but because the tests are poorly written or make too many assumptions about implementation details. Even the best, most disciplined teams create some "bad" tests, and it doesn't take many to start killing you.

Similarly, in a real software environment where a code base lives for years and undergoes numerous changes, you often end up with stale tests that either no longer test anything useful or which enforce requirements that have since been changed.

So while the tests are invaluable and I can't imagine not having them, at the same time there are a lot of issues with test maintenance over the long term that become very difficult to deal with.


This and your other comment are obviously coming from real experience using these techniques on production systems. (In fact, they're so lucid that I went back and read all your previous HN comments and quite enjoyed them.) They are refreshing since most of the discourse on unit testing falls into binary for/against mode (q.v. the OP). I also agree with you about test maintenance. (That's one reason I believe we can do better than the object-model-plus-unit-tests approach.)

Here's one idea I've come to: test code and production code are different species that need to develop differently. Good production code is a well-factored, abstract machine. Good unit tests are concrete examples, each illustrating one thing and independent of the others. The same principles do not apply, much as a car's manual is not built like a car.

For example, I allow almost no duplicate code in production, but with unit tests my tolerance is much higher. Duplication is bad in either case, but trying to eliminate it from tests has worse consequences: it prevents them from being simple, self-contained, and independent. A good unit test reads like a story. There are many such stories you might want to tell about your system, and typically there is lots of overlap between them, working out different permutations of the same thing and so on (imagine an enormously complicated Venn diagram). Production code is not a story at all (except maybe inside the occasional well-defined function); it's a set of abstractions.
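To make that concrete, here's a minimal sketch (with an invented Customer/Order domain, stubbed only so the example compiles): each test repeats its own setup and reads top to bottom as one story, independent of every other test.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Hypothetical domain classes, stubbed here so the example compiles.
    class Customer {
        private boolean repeat = false;
        Customer(String name) {}
        void recordPurchase(double amount) { repeat = true; }
        boolean isRepeat() { return repeat; }
    }

    class Order {
        private final Customer customer;
        private final double subtotal;
        Order(Customer c, double s) { customer = c; subtotal = s; }
        double total() { return customer.isRepeat() ? subtotal * 0.9 : subtotal; }
    }

    public class DiscountTest {
        // Deliberate duplication: each test sets up its own little world
        // so it can be read, and can fail, entirely on its own.
        @Test
        public void repeatCustomerGetsTenPercentOff() {
            Customer alice = new Customer("Alice");
            alice.recordPurchase(100.00);
            Order order = new Order(alice, 50.00);
            assertEquals(45.00, order.total(), 0.001);
        }

        @Test
        public void newCustomerPaysFullPrice() {
            Customer bob = new Customer("Bob");
            Order order = new Order(bob, 50.00);
            assertEquals(50.00, order.total(), 0.001);
        }
    }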

When people treat tests like production code and try to factor out the common bits, they end up creating a whole new set of abstractions that sits between the production code and the tests. This layer soon becomes a sink for all kinds of bloat (ObjectMotherFactoryManagers, anyone?), eventually so thick that you can't even see the production code from the tests. You spend hours tracking down test failures only to find that the problem was in this test infrastructure. Regardless of what one thinks about unit testing in general, this clearly isn't a good way to scale it.
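For contrast, a sketch of the factored version of the same test, with invented helper names (it reuses the Customer/Order stubs from the previous example). Every fact the test depends on has migrated into the intermediate layer, so you can no longer tell from the test body who the customer is or why the answer should be 45:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // The intermediate layer the test now depends on (stubbed):
    class TestObjectMother {
        static Customer standardRepeatCustomer() {
            Customer c = new Customer("Standard");
            c.recordPurchase(100.00);
            return c;
        }
    }

    class TestOrderFactory {
        static Order standardOrder(Customer c) { return new Order(c, 50.00); }
    }

    class ExpectedTotals {
        static final double STANDARD_REPEAT_TOTAL = 45.00;
    }

    public class FactoredDiscountTest {
        @Test
        public void repeatCustomerGetsTenPercentOff() {
            // What did the "standard" customer buy? What rate applies?
            // You have to excavate the helper layer to find out.
            Customer c = TestObjectMother.standardRepeatCustomer();
            Order order = TestOrderFactory.standardOrder(c);
            assertEquals(ExpectedTotals.STANDARD_REPEAT_TOTAL, order.total(), 0.001);
        }
    }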

Here's an analogy that popped into my head one day when helping some people work on a large system that had this problem. In the 1920s it was fashionable to do formal analyses of fairy tales. People discovered all kinds of common patterns and overlaps ("Once upon a time", things come in threes, etc.). Yet each tale is also unique. Now imagine if someone said: All this duplication is bad. Instead of duplicating "Once upon a time" and "In a deep forest there lived...", let's refer the reader to a "setup" section. Of course we'll need to provide some parameters (how many boys, how many girls, etc) because they're not quite identical, only mostly. And witches always do such-and-such, so let's factor out a Witch and then we'll only have to write each witchy action once and can refer to these from different stories. My point is that, if you did this to a book of fairy tales, you'd no longer have a storybook. You'd have a weird assortment of meta fairy tale bits. Critically, you could no longer pick it up and read a chapter at a time. In fact, a person coming in cold would have trouble reading any of it.


Indeed, there's a real tension between treating tests as "real" code and letting them be separate; you can get bitten both ways, and we have been. Things can't all work the same: if TestA and TestB share a common setup routine or helper and TestC comes along and needs to tweak it, you don't want that tweak to hose TestA and TestB. On the other hand, if you have one production implementation of some interface and 50 different test mocks, you're in huge trouble when you need to change that interface; life will be better if the tests either use the real implementation or all share a single mock (though even then there's a danger of the mock diverging behaviorally from the real implementation).
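A sketch of the single-shared-mock approach (interface and names invented): the fake lives alongside the production implementation, every test that needs mail uses it, and an interface change breaks exactly one test double.

    import java.util.ArrayList;
    import java.util.List;

    // Production interface.
    interface MailSender {
        void send(String to, String subject, String body);
    }

    // The one shared fake. When MailSender's signature changes, this is
    // the only test double that has to change with it. The remaining
    // danger is behavioral: the fake quietly doing something the real
    // SMTP implementation would not.
    class FakeMailSender implements MailSender {
        final List<String> sent = new ArrayList<String>();

        public void send(String to, String subject, String body) {
            sent.add(to + " | " + subject);
        }
    }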

There's a similar problem with tests using their own code paths to set things up instead of the normal system path. For example, suppose you're testing a method that takes a User object. In your production system, users might only be created in a certain way: they might have certain required fields, they might have sub-objects attached, the database might enforce nullability or foreign-key constraints, there might be insert callbacks in the code or in the DB, etc. If you just hack up a User in memory and pass it to the method under test, outside the normal creation path, your test User might differ in important ways from actual Users in production. So changes in the production code, for example assuming a field isn't null or relying on a default value, might cause the tests to break erroneously even though the app is fine. Then you have to find all the places you create Users in tests and fix them, or centralize things so you only have to fix them in one place.
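The centralized fix looks roughly like this (all names hypothetical): one helper that every test must use, which creates Users through the same service the application uses, so required fields, defaults, constraints, and callbacks all apply.

    // Hypothetical production-side types, stubbed for illustration.
    class User {}

    interface UserService {
        User createUser(String email, String displayName);
    }

    class TestUsers {
        // The only place tests are allowed to create a User. Because it
        // goes through the real creation path, the test User matches
        // production Users; if that path changes, there is one place to fix.
        static User newPersistedUser(UserService users) {
            return users.createUser("test@example.com", "Test User");
        }
    }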

Some of those problems can be partially avoided by proper decomposition and decoupling of the code, though oftentimes you have to have the foresight to do that before the tests get out of hand (and having the tests get out of hand is a good canary in the coal mine that your code is in trouble).

We actually went all the way to one extreme, whereby we run our tests against as much of the production configuration as we can and avoid mocks and stubs. We're a Java servlet app (kinda the only way to do enterprise software these days, unfortunately), so we start up a Jetty server and an in-memory H2 database and go through the normal server-startup sequence before running most unit tests, which at least eliminates the test-setup problem and a lot of the test/production divergence. It comes at a huge cost in local test execution times, unfortunately.
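The bootstrapping amounts to something like this (a minimal sketch using current Jetty, H2, and JUnit 4 APIs, not our actual startup code):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.servlet.ServletContextHandler;
    import org.junit.AfterClass;
    import org.junit.BeforeClass;

    public class InServerTestBase {
        static Server server;
        static Connection db;

        @BeforeClass
        public static void startEverything() throws Exception {
            // In-memory H2 database; DB_CLOSE_DELAY=-1 keeps it alive
            // across connections for the whole test run.
            db = DriverManager.getConnection(
                    "jdbc:h2:mem:test;DB_CLOSE_DELAY=-1", "sa", "");

            // A real servlet container on an ephemeral port, run through
            // the normal startup sequence before any test executes.
            server = new Server(0);
            ServletContextHandler ctx = new ServletContextHandler();
            ctx.setContextPath("/");
            // ctx.addServlet(AppServlet.class, "/*");  // the app's real servlet
            server.setHandler(ctx);
            server.start();
        }

        @AfterClass
        public static void stopEverything() throws Exception {
            server.stop();
            db.close();
        }
    }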


Specs that are incorrect but still pass are also bugs. I'm surprised he hasn't run into this situation.



The distinction between accidental complexity and essential complexity is very relevant here. Test-driven design is not a perfect solution ("The End of Bugs" is pretty sensationalized), but it's good at reining in many kinds of bugs in dynamically typed languages, much as static type systems do in Haskell or OCaml.

Sure, it won't prevent all bugs, but I wouldn't write off testing so quickly. Consider Proebsting's Law. (http://research.microsoft.com/~toddpro/papers/law.htm)

(I'm responding to anti-testing backlash in general; I'm not clear what position, if any, you're taking by just dropping in the link. Also, akeefer makes some really good points here.)


You can't do exploratory programming writing unit tests first, can you?


My experience is that it's painful if you don't have any idea of what you're doing; it's not the testing itself so much as the fact that if you're iterating rapidly on the code, your tests will need to be rewritten over and over again. Large libraries of unit tests are invaluable for catching regressions, but they impose a certain amount of friction on code changes and refactorings. So personally, I tend to do TDD when I know where I'm going; for exploratory programming I write the minimal amount of high-level tests to exercise things end-to-end and make sure they work at all, then back-fill more detailed tests once the code has settled down a bit.


You can. In a way, writing tests first is all about exploring. You're only specifying what you're trying to achieve in the next little while. The test is a device to tell you when you've achieved it. Then you move to the next step. What you're trying to do overall still evolves.

There is a tradeoff. You end up with a longer program that does the same thing - that's bad. On the other hand it works better, and you're likelier to know if you break something - that's good. People argue about whether the bad exceeds the good or vice versa, but surely that can't be answered all one way or the other.

My experience is that this style of programming is well suited for the OO environments that it emerged from. OO object graphs get complex very quickly and you need a device for managing side effects (i.e. unexpected breakage). Unit tests, specifically TDD, are the best currently known device for that. But to me a better approach is to program functionally in a way that eschews (most) side effects and to test one's code by evaluating expressions in a REPL. That gives most of the benefits of unit testing in a much lighter-weight form. I still write the occasional unit test, but only in targeted places where the code has to do something tricky and I want to keep a record of good examples. All I do in that case is put the expressions I've been playing with in the REPL into a function somewhere.
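In Java terms (which admittedly has no REPL to speak of), that last move amounts to something like the sketch below, with an invented pure function: the expressions you'd been evaluating interactively get frozen into a test as a record of good examples.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class SlugTest {
        // A pure function: same input, same output, no side effects,
        // so a handful of saved examples pins down its behavior.
        static String slugify(String title) {
            return title.trim().toLowerCase()
                        .replaceAll("[^a-z0-9]+", "-")
                        .replaceAll("(^-|-$)", "");
        }

        // The expressions from the interactive session, kept as examples.
        @Test
        public void savedExamples() {
            assertEquals("the-end-of-bugs", slugify("The End of Bugs?"));
            assertEquals("hello-world", slugify("  Hello,   World!  "));
        }
    }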


I'm a TDD fanatic, but I abandon unit tests when writing exploratory code. The whole point when exploring is just to try ideas and follow inspirations. Unit testing forces you to stop every couple of minutes to think things through carefully and lock in what you've got. It messes up the exploratory flow.


Please can someone explain more about "BDD" here?



The whole time I read the post I was thinking, "Why didn't I install BDD earlier?"


Nice to hear that BDD/TDD is working well for you. I love RSpec's stories. Unfortunately, in a land where even using the Spring framework is considered a big risk (e.g. corporate software development), BDD is not going to be seriously considered, let alone widely adopted. Startups are for those who want to expand their technological comfort zone. Average companies, on the other hand, are fine with remaining average.


One hears the exact opposite argument too, which is that these techniques are suited for large/average teams in corporate environments and not at all for startups where people are talented and have to work quickly.


I work in a large corporate manufacturing environment that runs 24 x 7 with lots of money at stake, and we are actively moving to TDD/BDD.

I expected a much bigger fight from management, but TDD really aligns with the "Toyota Way"... not that I work for Toyota.


Can someone point out any compelling reasons to use RSpec over Test::Unit?


I've used both to a limited degree. The only thing I can say is that RSpec felt more "natural". I suppose it's up to you whether feeling natural is compelling or not.


No.



