The normal practice for large scale codebases in complex domains is "the code is the spec". That is, the only specification for how the system should work is how it worked yesterday.
In that case, unit tests serve as a great specification. Even tests that just duplicate the business code under test and assert that it's the same (a huge waste in normal cases) are useful, because a unit test is much better than a Word document at describing how code should behave. I very much prefer a massive, hard-to-maintain set of poorly written unit tests to a large number of outdated documents describing how every bit of the system should work.
So is a big/bad test suite a burden? Sure. But is it a burden compared to maintaining specifications of other kinds? Is it a burden compared to working in a system with neither type of specification?
Further, the people writing long articles like this are (or at least were) very good developers. There is often an element of "good developers write good code, so just use good developers" in them. But writing good software with good developers was never the problem. The problem is making software that isn't terrible, with developers ranging from good to terrible, and most being mediocre.
The article addresses your first point in a big section:
1.4 The Belief that Tests are Smarter than Code Telegraphs Latent Fear or a Bad Process
Your other point about only good programmers not needing unit tests is moot as you haven't followed it through to the conclusion. Namely, if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
> Namely, if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
Let's assume you hire good programmers, because otherwise you're doomed. But oftentimes, the "good" programmer and the "bad" programmer are the same person, six months apart:
1. John writes some good code, with good integration tests and good unit tests. He understands the code base. When he deploys his code, he finds a couple of bugs and adds regression tests.
2. Six months later, John needs to work on his code again to replace a low-level module. He's forgotten a lot of details. He makes some changes, but he's forgotten some corner cases. The tests fail, showing him what needs to be fixed.
3. A year later, John is busy on another project, and Jane needs to take over John's code and make significant changes. Jane's an awesome developer, but she just got dropped into 20,000 lines of unfamiliar code. The tests will help ensure she doesn't break too much.
Also, unit tests (and specifically TDD) can offer two additional advantages:
1. They encourage you to design your APIs before implementing them, making APIs a bit more pleasant and easier to use in isolation.
2. The "red-green-refactor-repeat" loop is almost like the "reward loop" in a video game. By offering small goals and frequent victories, it makes it easier to keep productivity high for hours at a time.
Sometimes you can get away without tests: smaller projects, smaller teams, statically-typed languages, and minimal maintenance can all help. But when things involve multiple good developers working for years, tests can really help.
Here's the crux of your argument - and I honestly think it's a flawed premise: The failure of unit tests indicates something other than "Something Changed".
Were the failing tests due to John/Jane's correctly coded changes, regressions, or bad code changes? The tests provide no meaningful insight into that - it's still ultimately up to the programmer to make that value judgement based on the understanding of what the code is supposed to do.
What happens is that John and Jane make a change, find the failing unit tests, and they deem the test failures as reasonable given the changes they were asked to make. They then change the tests to make them pass again. Again, the unit tests are providing no actual indication that their changes were the correct changes to make.
WRT the advantages:
1. "design your APIs before implementing them" - this only works out if we know our requirements ahead of time. Given our acknowledgement that these requirements are usually absent via Agile methodologies, this benefit typically vanishes with the first requirement change.
2. "makes it easier to keep productivity high for hours at a time" This tells me that we're rewarding the wrong thing: the creation of passing tests, not the creation of correct code. Those dopamine hits are pretty potent, agreed, but not useful.
>> Here's the crux of your argument - and I honestly think it's a flawed premise: The failure of unit tests indicates something other than "Something Changed".
Even if all you know is "something changed", that's valuable. Pre-existing unit tests can give you confidence that you understand the change you made. You may find an unexpected failure that alerts you to an interaction you didn't consider. Or maybe you'll see a pass where you expected failure, and have a mystery to solve. Or you may see that the tests are failing exactly how you expected they would. At least you have more information than "well, it compiled".
And if "the unit tests are providing no actual indication that their changes were the correct changes to make", even the staunchest proponents of TDD would probably advise you to delete those tests. If there's 1 thing most proponents and opponents of unit testing can agree on, it's probably that low-value tests are worse than none at all.
I have no idea if this describes you, but I've noticed a really unfortunate trend where experienced engineers decide to try out TDD, write low-value tests that become a drag on the project, and assume that their output is representative of the practice in general before they've put the time in to make it over the learning curve. People like to assume that skilled devs just inherently know how to write good unit tests, but testing is itself a skill that must be specifically cultivated.
Somebody could do the TDD movement a great big favour and write a TDD guide for people who actually already know how to write tests.
There probably are good resources about this somewhere, but compared to the impression of "TDD promotes oceans of tiny, low-value tests" they lack visibility.
Agreed, testing is an art that most developers have not mastered. It must be cultivated and honed. I have been programming for over 35 years, and writing tests that cover all your bases is painstakingly difficult.
> Here's the crux of your argument - and I honestly think it's a flawed premise: The failure of unit tests indicates something other than "Something Changed".
No, the failing tests indicate something changed, and where it changed - what external behavior of a function or class changed. Is the change the right thing, or a bug? Don't know, but it tells you where to look. That's miles better than "I hope this change doesn't break anything".
> "design your APIs before implementing them" - this only works out if we know our requirements ahead of time.
No. "API" here includes things as small as the public interface to a class, even if the class is never used by anything other than other classes in the module. You have some idea of the requirements for the class at the time when you're writing the class; otherwise, you have no idea what to write! But writing the test makes you think like a user of that class, not like the author of that class. That gives you a chance to see places where the public interface is awkward - places that you wouldn't see as the class author.
> This tells me that we're rewarding the wrong thing: the creation of passing tests, not the creation of correct code.
We're rewarding the creation of provably working code. I fail to see how that's "the wrong thing".
If we were discussing integration tests, where the interactions between different methods and modules are validated against the input - I'd agree.
But this is about unit tests, in which case the "where" is limited to the method you're modifying, since it's most likely mocked out in other methods and modules to avoid tight coupling.
> includes things as small as the public interface to a class
If we're talking about internal APIs as well, we can't forget that canonical unit testing and TDD frequently requires monkey patching and dependency injection, which can make for some really nasty internal APIs.
> I fail to see how [provably working code is] "the wrong thing".
So, thinking back a bit, I can recall someone showing me TDD, and they gave me the classical example of "how to TDD": Write your first test - that you can call the function. Now test that it returns a number (we were using Python). Now test that it accepts two numbers in. Now test that the output is the sum of the inputs. Congratulations, you're done!
Except you're not, not really. What happens when maxint is one of your parameters? minint? 0? 1? -1? maxuint? A float? A float near to, but not quite 0? infinity? -infinity?
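To make that concrete, a hedged sketch in Python (the language from that demo; add() is just the toy function, and deciding which of these behaviors you actually want is precisely the part the loop never forced):

    import math, sys

    def add(a, b):
        return a + b

    # The red-green loop above stopped at "returns the sum of the inputs".
    # These are the questions it never forced anyone to answer:
    def test_the_forgotten_cases():
        assert add(sys.maxsize, 1) == sys.maxsize + 1  # Python ints don't overflow; C ints would
        assert add(0.1, 0.2) != 0.3                    # floats near zero: the classic surprise
        assert math.isinf(add(math.inf, 1))            # infinity propagates silently
        assert math.isnan(add(math.inf, -math.inf))    # inf + -inf is nan: was that intended?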
Provably working code is meaningless. Code that can be proven to meet the functionality required of it - that's what you really want. But that's hard to encapsulate in a slogan (like "Red, Green, Refactor"), and harder to use as a source of quick dopamine hits.
> But this is about unit tests, in which case the "where" is limited to the method you're modifying...
Sure, but the consequences may not be. Here's a class where you have two public functions, A and B. Both call an internal function, C. You're trying to change the behavior of A because of a bug, or a requirement change, or whatever. In the process, you have to change C. The unit tests show that B is now broken. Sure, it's the change to C that is at fault, but without the tests, it's easy to not think of checking B. That's the "where" that the tests point you to.
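Sketched out (hypothetical names, Python for brevity):

    class Pricing:
        def quote_retail(self, base):      # public function "A"
            return self._with_tax(base) * 1.20

        def quote_wholesale(self, base):   # public function "B"
            return self._with_tax(base) * 0.95

        def _with_tax(self, base):         # internal function "C"
            return base * 1.08             # a change made "for A" also reaches B

    def test_wholesale_quote():
        # This pre-existing test for B fails after the change to C,
        # pointing you at the interaction you might not have checked.
        assert Pricing().quote_wholesale(100) == 100 * 1.08 * 0.95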
> Provably working code is meaningless. Code that can be proven to meet the functionality required of it - that's what you really want.
Um, yeah, of course that's what you really want. Maybe you should write your tests for that, and not for stuff you don't want? If you don't care about maxint (and you're pretty sure you're never going to), don't write a test for maxint. But it might be worth taking a minute to think about whether you actually do need to handle it (and therefore test for it).
> canonical unit testing and TDD frequently requires monkey patching and dependency injection
This is probably the root of most of our disagreement. I belong to the school of thought that says (in most cases) "Mocking is a Code Smell": https://medium.com/javascript-scene/mocking-is-a-code-smell-... And dependency injection (especially at the unit-test level) is a giant warning sign that you need to reconsider your entire architecture.
Nearly all unit tests should look like one of:
- "I call this function with these arguments, and I get this result."
- "I construct a nice a little object in isolation, mess with it briefly, and here's what I expect to happen."
At least 50% of the junior developers I've mentored can learn to do this tastefully and productively.
But if you need to install 15 monkey patches and fire up a monster dependency injection framework, something has gone very wrong.
But this school of thought also implies that most unit tests have something in common with "integration" tests—they test a function or a class from the "outside," but that function or class may (as an implementation detail) call other functions or classes. As long as it's not part of the public API, it doesn't need to be mocked. And anything which does need to be mocked should be kept away from the core code, which should be relatively "pure" in a functional sense.
This is more of an old-school approach. I learned my TDD back in the days of Kent Beck and "eXtreme Programming Explained", not from some agile consultant.
I feel like you're perhaps defining unit tests to exclude "useful unit tests" by relabeling them. Yes, if you exclude all the useful tests from unit testing, unit testing is useless. Does a unit test suddenly become a regression test if the method it tests contains, say, an unmocked sprintf - which has subtle variations in behavior depending on which standard library you link against? No true ~~scotsman~~ unit test would have conditions that would only be likely to fail in the case of an actual bug?
> But this is about unit tests, in which case the "where" is limited to the method you're modifying
That's still useful. C++ build cycles mean I could have easily touched 20 methods before I have an executable that lets me run my unit tests, telling me which 3 of the 20 I fucked up in is useful. Renaming a single member variable could easily affect that many methods, and refactoring tools are often not perfectly reliable.
Speaking of refactoring, I'm doing pure refactoring decently often - I might try to simplify a method for readability before I make behavior changes to it. Any changes to behavior in this context are unintentional - and if they occur, 9 times out of 10 I discover I have a legitimate bug. Even pretty terrible unit tests, written by a monkey just looking to increase code coverage and score that dopamine hit, can help here - to say nothing of unit tests that rise to the level of being mediocre.
Further, "where" is not limited to "the method you're modifying". I'm often doing multiplatform work - understanding that "where" is in method M... on platform X in build configuration Y is extremely useful. Even garbage unit tests with no sense of "correctness" to them are again useful here - they least tell me I've got an (almost certainly undesirable) inconsistency in my platform abstractions, likely to lead to platform specific bugs (because reliant code is often initially tested on only one of those platforms for expediency). This lets me eliminate those inconsistencies at my earliest convenience. When writing the code to abstract away platform specific details, these inconsistencies are quite common.
I recently started maintaining a decades old PHP application. I have been selectively placing in unit and functional tests as I go through the massive codebase. Whenever I refactor something I've touched before, I tend to get failing tests. If nothing else, this tells me that some other component I worked on previously is using this module and I should look into how these changes affect it.
Unfortunately, the existing architecture doesn't make dependencies obvious. So simply knowing that "something has changed" is very, very helpful.
I appreciate your answer, but the destructiveness of the reward loop is directly addressed in the PDF.
I also have no problems designing APIs; experience is what you need, and experience in asking the right questions. No amount of TDD will save you from getting halfway through a design and then finding you needed a many-to-many relationship because you misunderstood the requirements.
Other than that, API design is (fairly) trivial. I spend a tiny amount of my overall programming time on it. I will map out entire classes and components for large chunks of functionality, without filling in any of the code, in less than an hour or two and without really thinking about it. Then the hard part of writing the code begins, which might take a month or two, and those data structures and API designs will barely change. I see my colleagues doing similar check-ins: un-fleshed components with the broad strokes mapped out.
I'd say it's similar to how we never talk about SQL normal forms any more. 15 years ago the boards would discuss it at length. Today no-one seems to. Why? Because we understand it and all the junior programmers just copy what we do (without realizing it took us a decade to get it right). API design was hard, but today it's not, we know what works and everyone uses it. We've learnt from Java Date and the vagaries in .net 1.0 or PHP's crazy param ordering and all the other difficult API designs from the past and now just copy what everyone else does.
> I appreciate your answer, but the destructiveness of the reward loop is directly addressed in the PDF.
I've gone back through most of the PDF, and I can't figure out what section you're referring to. There's a bunch of discussion of perverse incentives (mostly involving incompetent managers or painfully sloppy developers, who will fail with any methodology), but I don't see where the author addresses what I'm talking about.
Specifically, I'm talking about the hour-to-hour process of implementing complex features, and how tests can "lead" you through the implementation process. It's possible to ride that red-green-refactor loop for hours, deep in the zone. If a colleague interrupts me, no problem, I have a red test case waiting on my monitor as soon as I look back.
This "loop" is hard to teach, and it requires both skill and judgment. I've had mixed results teaching it to junior developers—some of them suddenly become massively productive, and others get lost writing reams of lousy tests. It certainly won't turn a terrible programmer into a good one.
If anything, the biggest drawback of this process is that it can suck me in for endless productive hours, keeping me going long after I should have taken a break. It's too much of a good thing. Sure, I've written some of the most beautiful and best-designed code in my life inside that loop. But I've also let it push me deep into "brain fry".
> No amount of TDD will save you from getting halfway through a design and then finding you needed a many-to-many relationship because you misunderstood the requirements.
I've occasionally been lucky enough to work on projects where all the requirements could be known in advance. These tend to be either very short consulting projects, or things like "implement a compiler for language X."
But usually I work with startups and smaller companies. There's a constant process of discovery—nobody knows the right answer up front, because we have to invent it, in cooperation with paying customers. The idea that I can go ask a bunch of people what I need to build in 6 months, then spend weeks designing it, and months implementing it, is totally alien. We'd be out of business if it took us that long to try an idea for our customers.
It's possible to build good software under these conditions, with fairly clean code. But it takes skilled programmers with taste, competent management and a good process.
Testing (unit, integration, etc.) is a potentially useful part of that process, allowing you to adapt to changing requirements while minimizing regressions.
My point was that even bad unit tests are better than both good/bad Word docs and no unit tests. So even bad tests written by bad programmers have a value. They have a large COST too (which is what he's arguing); my argument was merely that the value might not be less than the cost, contrary to his assertion.
> Software engineering research has shown that the most cost-effective places to remove bugs are during the transition from analysis to design, in design itself, and in the disciplines of coding. It's much easier to avoid putting bugs in than to take them out.
Everyone knows that. It's just beating a dead horse. But companies demonstrably (a) want to ship buggy, feature-rich software fast rather than well-designed and lean software, and (b) don't want to pay for developers of the kind that write good code from the beginning. So with that out of the way, the whole point of real-world development methodology is to make sure that the feature-bloated code hastily written by average developers doesn't lack a specification, doesn't deteriorate over time into something that has to be abandoned, and doesn't carry such a high risk of modification/refactoring that it can't be maintained.
Now there are some good points where 1.4's "code is better than tests" actually applies: make asserts in the code rather than in the tests, where possible. Fully agree with that. Or even better, I'd say: don't "assert" things, just make the invalid state impossible. This is what types are for. If you can't take a null into a method, don't even accept an object that could be null by accident. Accept an Option type. Yes, even in Java. Even in bloody C. Anything a compiler could have caught should be neither a runtime assert nor a test.
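A rough Python rendering of that idea (hypothetical names; in Java or C you'd reach for an Option type, while here a small validated type plays that role):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class AccountId:
        value: str
        def __post_init__(self):
            if not self.value:
                raise ValueError("AccountId cannot be empty")

    def close_account(account_id: AccountId) -> None:
        # No assert, no null check: the type makes the invalid state
        # unrepresentable, and a type checker flags violations before runtime.
        print(f"closing {account_id.value}")

    def handler(raw: Optional[str]) -> None:
        if raw:  # the one place the maybe-missing value is unwrapped
            close_account(AccountId(raw))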
> if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
Well, firstly, they are usually better than no tests. And second, they tend to fail too often rather than too rarely. This is what causes the enormous cost incurred by these tests. But apart from that 90% unnecessary cost (the test failures are failures of the tests rather than of the code, so after changing the code you have to change the tests too), they still add a lot of confidence for refactoring. Because in my experience with a large but bad test suite, there are often false positives, but false negatives are much rarer.
The logical conclusion of this line of argument, which is dependent on the proposition that even bad unit tests have value, is that we could just automatically generate copious numbers of unit test cases for a given code base, and we would have something of value. In reality, this approach would be useless even for functionally-invisible refactoring or rewriting, precisely because it would be at the unit level. For that, let alone any substantive changes, Coplien's claim that higher-level tests are better stands, so this article cannot be dismissed as "beating a dead horse."
> The logical conclusion of this line of argument, which is dependent on the proposition that even bad unit tests have value, is that we could just automatically generate copious numbers of unit test cases for a given code base, and we would have something of value.
Actually, this is a surprisingly useful strategy! There are two common versions of this idea:
1. Invariant testing with random data. This usually goes by the name of "QuickCheck", and it allows writing tests that say things like, "If we reverse a random list twice, then it should equal the original list." In practice, this catches tons of corner case bugs.
2. Fuzz testing. In this case, the invariant is, "No matter how broken the input, my program should never crash (or corrupt the heap, or access uninitialized memory, or whatever)." Then you generate a billion random test cases and see what happens. This usually finds tons of bugs, and it's a staple of modern security testing.
So yeah, even random test cases are very valuable. :-)
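For a taste of point 1, here is roughly what the list-reversal invariant looks like in Python with the Hypothesis library (a QuickCheck descendant):

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_reverse_twice_is_identity(xs):
        # Hypothesis generates hundreds of random lists and shrinks any
        # failing case to a minimal counterexample.
        assert list(reversed(list(reversed(xs)))) == xs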
These are indeed useful strategies, but they do not invalidate the point I am making here. Fuzz testing is system-level testing, not unit tests, and so is along the lines Coplien is proposing (especially when combined with assertions in the code.) Invariant testing implies an invariant, i.e. an interface-level design contract, while the post I was replying to was predicated on systems where "the code is the spec", and my reply considers only automated unit-test generation that is based only on what the code does, not what it is supposed, in more abstract terms, to do. The "logical conclusion" is based on the possibility of automatically generating unit test cases that function as such, up to any given level of coverage, without adding value.
OK, just to back up a step, here's my best argument in favor of automated testing:
Virtually nobody just makes a change and ships it to users without some kind of testing. Maybe you run your program by hand and mess around with the UI. Maybe you call the modified function from a listener with a few arguments. Maybe you have sadistic paid testers who spend 8 weeks hammering on each version before you print DVDs. (And those testers will inevitably write a "test plan" containing things to try with each new release.)
The goal of automated testing is to take all those things you were going to do anyway (calling functions, messing with the UI, following a test plan), and automate them. Different kinds of testing result in different kinds of tests:
1. Manually calling a function can be replaced by unit tests, or even better, doc tests. Everybody loves accurate documentation showing how to use an API!
2. Manually messing around with the UI can be replaced by integration tests.
3. Certain things are much faster to test thoroughly in isolation, so you may mock out parts of your system while testing other parts in great detail, then tie everything together with a few system-wide tests.
People love to invent elaborate taxonomies of testing (unit, request, system, integration, property, etc.). But really, they're all just variations on the same idea: You'd never ship code without testing it somehow, and it's boring and error-prone to do all that testing by hand every time. So automate it!
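As an illustration of the doc-test idea from point 1, a small Python sketch (hypothetical function; running python -m doctest on the file executes the example as a test):

    def slugify(title):
        """Turn a title into a URL slug.

        The example below is executable documentation: doctest calls the
        function and compares the output.

        >>> slugify("Hello, World!")
        'hello-world'
        """
        import re
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")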
No-one is arguing against automated testing - in fact, I have said elsewhere that I think test automation is the best single thing most development organizations could do to improve their process. In the article, Coplien says that his position is often mistaken for an attack on test automation, which it is not - and it is certainly not an argument against testing, either; it is an argument about how to test effectively.
What I am doing here is using the idea of automatic unit test generation to show that Coplien's argument cannot be dismissed with the claim that all unit testing adds value.
> The logical conclusion of this line of argument, which is dependent on the proposition that even bad unit tests have value, is that we could just automatically generate copious numbers of unit test cases for a given code base, and we would have something of value.
I don't think that conclusion is any more valid than the proposition that if we just skipped the developer altogether and generated both the business code AND the test, we'd have something of value.
> Coplien's claim that higher-level tests are better stands, so this article cannot be dismissed as "beating a dead horse."
I think higher-level tests are good, but I don't think that makes lower-level tests bad. I think there should usually be a healthy mix, with systems tested at several levels.
> I don't think that conclusion is any more valid than the proposition that if we just skipped the developer altogether and generated both the business code AND the test, we'd have something of value.
That would have some value, as evidenced by all the useful spreadsheets produced by end users, but the value of the conjunction ("the business code AND the test") lies entirely in the 'business code' part, which is why we do not automatically generate unit tests for spreadsheet macros, or any other code for that matter.
> I think higher-level tests are good, but I don't think that makes lower-level tests bad.
That's not my point; my point is that unit tests are not automatically useful. Once we accept that, Coplien's argument that there are better ways to spend our limited resources cannot be dismissed (and resources are always limited, because of the combinatorial explosion, which is most evident at the unit level).
Any idiot can ship version 1.0. I get exasperated with people who try to act like they have some special wisdom to share about getting the first version out. Yes, there are some specific qualities you need that some of us lack, but you can find those in very nearly every self-help or motivational book.
It takes some proper skill, vision, and maturity to ship version 3.0. And you probably won’t get a big payday if you can’t.
> ... if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
Without getting into the greater unit testing debate... the point would be having a testable code base.
Those who test first guarantee that tests exist and that the code can be exercised in a test harness. Those who do not generally end up with code that cannot be exercised in a test harness. Granting bad developers who write bad code and bad tests: they're at the least writing tested bad code. That lowers the burden of fixing it, and ensures that any given issue can be solved in isolation instead of requiring a major, risky rewrite to get the system to the point of further maintenance.
Having that test apparatus puts your maintenance developers on a much nicer foot and provides the blueprints for migrating, abstracting, or replacing most any system component, along with meaningful quality baselines. It's a mitigation technique for the high-level big-bang rinse-and-repeat system cycles that pop up in the Enterprise space, but no silver bullet.
Bad tests can be deleted or improved with impunity, but a bad logical system core made with no eye towards verification is expensive and risky to even touch. For a small website that means nothing; for a complex legacy monster system about to be rewritten for the 4th time, it can make all the difference in the world.
Testable code is not a good thing in and of itself. Code should only be testable at its high level interfaces i.e. interfaces that are close to the requirements. So if the code is a library then the high level interface may be the public interface of a class. In that case you may need single class tests. But to do anything useful (in the sense of high level requirements) most code uses a combination of various different classes or modules. Write tests at this level because it gives you two advantages:
1) If you decide to change your implementation (refactor low-level code to make it more performant/re-usable/maintainable) then you won't have to change your tests.
2) It will simplify your implementation code.
The second point is as important as the first: In general your code should have the minimum number of abstractions needed to satisfy your requirements (including requirements for re-use/flexibility etc). Testable code often introduces new abstractions or indirections just for the sake of making it possible to plug in a test. One or two of these are OK. But when you add up all the plug points for low level tests in a system it can make the code a lot more complex and indirect than it needs to be. This actually makes it harder to change which was not the idea behind making it testable in the first place.
GUI code may be an exception here. Because GUIs are often slow and unstable to test, it is worth cleanly separating the GUI from the model just to make the system more testable, even if you never intend to use the model with any other kind of UI (the usual motivator for separating model and GUI). Any model-view system will help here. If you have a system where the GUI is more or less a pure function of the model, then you can write all your scenario/stateful tests at the model level and just write a few tests for the UI to verify that it reflects the model accurately.
Well-structured code is a good thing in and of itself, and more testable code is inherently more well-structured (or rather, untestable code is inherently badly structured; it's still possible to have testable but badly structured code, but if you insist your code be testable then you will avoid many of the pitfalls).
> Testable code is not a good thing in and of itself. Code should only be testable at its high level interfaces i.e. interfaces that are close to the requirements. So if the code is a library then the high level interface may be the public interface of a class.
Perhaps I misunderstand you, but I have to disagree with this.
Testable code is absolutely a good thing in and of itself.
When something breaks, being able to isolate each piece of the codebase and figure out why it's breaking is essential.
Far too often I've got bug reports from other developers saying "System X is broken, what changed?". When I ask for more detail, a test case proving it's broken, etc - they can't come up with that. Their test case is "our entire service is broken and I think the last thing that happened was related to X". Sometimes it is a problem with System X, but more frequently it's something else in their application.
The smaller you can make your test case, the better, and the easier it is to get a grasp of the actual problem.
In my experience, it would be pretty hard to consistently write unit tests that pass and at the same time don't test the code. They might not test the code completely enough, but they will test the code. That's still better than not testing at all, and can always be fixed by adding tests later.
Tests don't have to be smarter than the code. They just have to be different code. They're screening tests, not diagnostic tests - if the test and the code disagree, you might have a problem. If they don't, then hopefully you don't.
But exactly these tests tend to be the useless ones that just pass and don't find bugs.
On 2 separate occasions I had an app with extensive unit tests that seemed to work fine but also seemed to have strange rare bugs.
Both times I wrote a single additional "unit" test that fired up the environment (with mocks, same as the other unit tests) but then acted like a consumer of the API and spammed the environment with random (but not nonsense) calls for several minutes. These tests were quite complex, so basically the exact opposite of what you're suggesting.
Not only did I immediately find the bug, but in both cases I found like 10 bugs before I got the test to even pass for the first time.
At the same time all the small unit tests were happily passing. Because they didn't hit edge cases (both in data and in timing) that nobody had thought of.
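For the curious, the shape of such a test, heavily condensed (a toy queue stands in for the real mocked environment here; the real versions ran for minutes):

    import random
    from collections import deque

    class Queue:
        # stand-in for the system under test
        def __init__(self):
            self._d = deque()
        def push(self, x):
            self._d.append(x)
        def pop(self):
            return self._d.popleft()

    def test_random_consumer_walk():
        rng = random.Random(42)    # fixed seed, so failures reproduce exactly
        real, model = Queue(), []  # compare against a trivially-correct model
        for _ in range(100_000):
            if model and rng.random() < 0.5:
                assert real.pop() == model.pop(0)
            else:
                x = rng.randrange(1000)
                real.push(x)
                model.append(x)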
Yeah but the cases I had in mind, I probably coded correctly. The cases I forgot about are more likely to cause problems (weird cases in both data and especially timing / concurrency handling).
>, if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
I don't think one necessarily follows the other. Consider test pseudo-code along these lines (sketched here; image_to_text() is the hypothetical machine-learning-backed function discussed below):
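    # (pseudo-code, the gist of it)
    def test_image_to_text():
        assert image_to_text("photo1.jpg") == "cat"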
A novice bootcamp graduate could write that test harness code.
However, it requires an experienced programmer with a PhD in Machine Learning to write the actual neural net code for image_to_text(). (Relevant XKCD[1])
(Yes, there's also variability of skill in writing test code. An experienced programmer will think of extra edge cases that a novice will not and therefore, write more comprehensive tests.)
That still doesn't change the fact that writing code that checks for correct results is often easier than writing the algorithm itself.
It also contradicts the notion that refactoring code requires the test code to be refactored. That's not always true. Since the jpg file format came out in 1992, that test code could have been written in that same year and it would still be valid today, 25 years later. We still want to ensure the image_to_text() function returns "cat" for "photo1.jpg" even if we refactor the low-level code to use the latest A.I. algorithms.
Another example: a regression test for performance (again sketched):
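    # (pseudo-code; process_batch and load_fixture are hypothetical)
    import time

    def test_completes_within_500ms():
        start = time.monotonic()
        process_batch(load_fixture())
        assert time.monotonic() - start < 0.5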
Again, it's easy for a novice programmer to write a "performance timing specification: executes in less than 500ms" test. It's much harder to refactor the code from slow O(n^2) code to faster O(n log n) code.
To do a meta-analysis of the debate with Coplien: a big reason we're arguing his conclusion is that he complains about this abstract thing he called "unit tests". If he had copy-pasted concrete examples of the bad unit testing code into his PDF, maybe more of us would agree with him? In other words, I'd like to see and judge for myself whether Coplien's "unit tests" look like the regression tests that SQLite and NASA use to improve quality -- or -- whether they are frivolous tests that truly waste developers' time. Since he didn't do that, we're all in the dark as to the "test code waste" he's actually complaining about.
Yes, and furthermore, due to the combinatorial explosion, and as explained in the article, unit tests can only cover an infinitesimal fraction of all the possibilities at that level. The sort of interfaced-based testing favored in the article ameliorates the problem because it uses the inherent abstractions of the code to prune the problem space.
In addition, if you depend on unit tests, you cannot tell if a change has broken the system - you can expand or rewrite your tests to cover the change (if we put aside the question of what requirements these tests are testing), but that cannot tell you if you have, for example, violated a system-wide constraint. Only testing at higher levels of abstraction, up to the complete-system level, can help you with that.
Therefore, to the extent that the "the code is the specification" argument is a valid one, it actually leads to the conclusion that Coplien is right.
In my opinion, the way most unit tests are written is mostly a waste of effort, but there is a class of unit tests that is much better suited to the task: property-based tests.
The most famous property-based testing framework is QuickCheck, but there are plenty of others, e.g. FsCheck for C#/F#, ScalaCheck for Scala, Hypothesis for Python, etc...
The general idea being that you write out a specification that the code should meet, and the test parameters are determined at runtime, rather than being hardcoded. There's more to it than that, but that's the general idea. If you want a simple introduction, I'd recommend this video:
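To give the flavor in code, a tiny Hypothesis spec (real library calls, toy property):

    import json
    from hypothesis import given, strategies as st

    @given(st.dictionaries(st.text(), st.integers()))
    def test_json_roundtrip(d):
        # The specification is written down; the inputs are generated at runtime.
        assert json.loads(json.dumps(d)) == d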
That’s actually one of the reasons I like Elixir's approach to inline documentation so much. If you include a code example, the example is run as part of the test suite. Makes it dead simple to automatically include simple unit tests right where the code is, along with documentation that has to remain up to date.
This is also somewhat popular in Python, Haskell and Rust. Nice, but doesn't seem applicable to all situations. Works quite well for collections of simple independent functions, e.g. I did that for a regex library https://github.com/myfreeweb/pcre-heavy/blob/9873fb019873340... — but didn't use it much for other projects…
Yea, definitely not for everything. It's more of a complement to the rest of the test suite and also verifies that the documentation/examples in code comments are still accurate.
> That is, the only specification for how the system should work is how it worked yesterday.
You'd think, if this was the desired model, you could partially automate the "writing unit tests" part of this process. (Integration tests no, but unit tests yes.) The "spec" for the unit tests is already there, in the form of the worktree of the previous, known-good commit.
That means that, in a dynamic language, you'd just need a "test suite" consisting of a series of example calls to functions. No outputs specified, no assertions—just some valid input parameters to let the test-harness call the functions. (In a static language, you wouldn't even need that; the test-harness could act like a fuzzer, generating inputs from each function's domain automatically.)
The tooling would then just compare the outputs of the functions in the known-good-build, to the outputs of the same functions from your worktree. Anywhere they differ is an "assertion failure." You'd have to either fix the code, or add a pragma above the function to specify that the API has changed. (Though, hopefully, such pragmas would be onerous enough to get people to mostly add new API surface for altered functionality, rather than in-place modifying existing guarantees.) A pre-commit hook would then strip the pragmas from the finalized commit. (They would be invalid as of the next commit, after all.)
Interestingly, given such pragmas, the pre-commit hook could also automatically derive a semver tag for the new commit. No pragmas? Patch. Pragmas on functions? Minor version. Pragmas on entire modules? Major version.
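A toy sketch of that record/compare harness (hand-rolled here; real tooling would hook into the VCS and module loading):

    import json

    def record_golden(fn, calls, path):
        # On the known-good commit: run each example call and save the outputs.
        with open(path, "w") as f:
            json.dump([fn(*args) for args in calls], f)

    def check_against_golden(fn, calls, path):
        # On the current worktree: re-run the same calls and diff the outputs.
        with open(path) as f:
            golden = json.load(f)
        for args, expected in zip(calls, golden):
            actual = fn(*args)
            assert actual == expected, f"{fn.__name__}{args}: {expected!r} -> {actual!r}"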
Ah, but the spec is often really poorly defined. That is, all sorts of edge cases and uncommon code paths do not work as imagined. And, of course, it is full of bugs.
I have found countless "bugs" and issues when writing tests for existing code. I also often have to refactor the code to make it clearer what it's doing or to make it testable.
Unit tests are a development tool more than they are a testing tool. They are a means to produce correct, well-documented, well-specified, cleanly separated code.
> The normal practice for large scale codebases in complex domains is "the code is the spec".
“The code is the spec” is just a way of saying “we don't know what the code is supposed to do, and the intent was never, even initially, properly validated by domain experts, or, if it was, we tossed the knowledge generated in that process.”
This is especially the case if that phrase is used about a large system in a complex domain.
I don't see such a large overlap between code review and unit testing. Here's an example from one of my code reviews, where I discovered two issues:
1) A relaxed requirement was agreed to by the PO, but the remote developer wasn't aware of that and wrote code to fulfil the original complex requirement. The implementation was harder to understand and touched more areas of the code, leading to issue number 2.
2) By analysing the interactions between multiple units, I was able to determine that the code as implemented was in fact reading a setting too early, when it was not available, thereby having no effect at all compared to the already existing code. Ironically, this exact concern had prompted the negotiation with the PO which resulted in the relaxed requirement.
Interestingly the error was not caught by unit tests, because it didn't happen in a single unit. It wasn't caught by integration tests, because that particular online component was mocked and it passed code review by two other developers.
How many times do you need to repeat the review of a given unit of code? (actually, that would have more chance of catching additional bugs than would rerunning the same unit tests.) Once you change the code, the existing unit tests are, by definition, invalidated (i.e. there is no value to being able to rerun them), while most higher-level testing will retain their validity. You have actually created an argument for automated higher-level testing (the author mentions that his position is often mistaken for an attack on test automation.)
By definition, unit tests test the implementation. A test at the API design contract level is an integration test. The only place where these are the same thing are pure (side-effect-free) functions that do not call other functions.
... unit testing is automated, repeatable, code review!
Except when it isn't.
"Unit tests can be one form of code review (that also happens to have the advantage of being automated and repeatable). But should never be mistaken as substitute for actually understanding what the code does, why it does it and how it got that way. Which requires an incompressible amount of effort, analysis, and plain and pure grit", would be another take on the matter.
> Even tests that just duplicate the business code under test and assert that it's the same (a huge waste in normal cases) are useful.
Maybe, but there's no reason to write such tests by hand. Just automatically test each method against the previous version and see what changed.
I think the generated results would be too noisy and developers would stop paying attention to test failures, but maybe a strict enough code review process could prevent that.
Most unit tests don't test functionality at the business requirement level. Most unit tests I've seen are more fine-grained. I think keeping your requirement documentation as comments on unit tests that are a literal translation of that documentation is amazing and very helpful.
But I would argue that represents in my experience just 10% of the unit tests I see. Instead I see a lot more testing of implementation details.
I agree. Especially since most projects I worked on didn't have docs at all. When the docs existed, they were always obsolete. Unit tests are not a silver bullet, but if you require passing tests and coverage before shipping to prod, you basically get minimal, always-up-to-date documentation.
And even with all the (very real) downsides, it's still a huge plus.
Besides, a lot of people tend to see tests in a very rigid way. You don't have to do TDD. You don't have to have full coverage. You don't even need all your tests to be unit tests; you can mix end-to-end and functional tests in. You can have hacky mocks in there too. It doesn't need to be perfect for your team to benefit from it.
Actually, I would say that you probably want it not to be perfect, so that the benefits outweigh the costs. Because hey, tests have a huge cost.
People seem to have not read the article properly and are conflating unit testing with all testing. The article doesn't say to avoid all unit testing, but to restrict the test suite to that which can be validated by business logic or some formalized oracle - this would imply unit testing for critical systems like hardware drivers, banking/avionics, crypto, etc., where there are known and eternally consistent results that relate to the feature spec. But it also suggests that, in most cases, maintaining systems and integration tests is a more economical use of time, and further implies that they are more 'correct' in light of the fact that they directly relate to the feature spec.
I believe I came across this document (or something close to it) a few years ago, and it changed my life. I was able to get great velocity and low bug-per-feature count (1 or 2), using systems/integration testing combined with exploratory testing by a QA team. I was even able to find subtle bugs in the unit-tested back-end via integration tests from a mobile app. At the end of the day, if your testing isn't serving the business logic, it's wasted effort.
The most badass thing I remember was the part saying that if you couldn't elicit a code path using a systems test, that path is a good candidate for deletion.
These days, a large majority of the frontend tests in my codebase aren't tests that I've written myself. They're automatically generated Jest snapshot tests that capture the virtual DOM output of entire components in different states as dictated by product requirements, using the StoryShots plugin for React Storybook.
These tests prove to be extremely effective in catching bugs, because they directly reflect product requirements, and because they end up exercising a large majority of the codebase for very little cost compared to covering the same amount of code with individual unit tests.
The latter point is an important consideration too, because a lot of the code these integration-level tests exercise is code that I'd have considered too trivial to be worth the ongoing maintenance cost of building unit tests for, when considered in isolation.
This means it'd have been easy for me to be satisfied with selectively and arbitrarily deciding which pieces of code qualify for test coverage, leaving coverage at some arbitrary number. What I do now instead is strictly require 100% code coverage, but explicitly and deliberately exclude code that I have good reason not to test. That feels like a much more solid framework for ensuring code quality.
I still write unit tests for functionality that component snapshot tests can't reasonably cover, but usually only at a level of granularity where the tests themselves can map cleanly to product requirements as well, instead of painstakingly testing every little function in perfect isolation.
I'm afraid that the TDD movement went too far. The proponents became so convinced that they were right that they started comparing themselves to the people who argued that washing hands was important in early surgery, and saying that those who questioned it would soon be unemployable in the field. Not those who didn't practice it - those who questioned it.
I dealt with some of this, including a rather bullying type who tried to browbeat people into writing tests first. And while that may be a bad example, I'm afraid it really wasn't unrelated to the movement itself.
When people said "TDD is dead" a while ago, they weren't actually arguing that the technique isn't useful. They were saying that the whole "you're wrong, I'm right, if you want to stay employed you'll do as I say and practice TDD" attitude is no longer defensible.
It's probably time to leave that in the past, and hope that the people who promote a methodology this way have learned from the mistakes of aggressively cramming things down everyone's throat. Truth is, I was actually a frequent practitioner of TDD, and I was pretty appalled with how it was getting pitched to the programming community.
Now, let's leave that in the past and take a fresh look at whether TDD is a very beneficial practice in many contexts. I certainly agree it isn't crap.
As in many areas of thought, when the pendulum swings too far in one direction, it then swings too far in the other direction. Hopefully we'll arrive at a reasonable middle ground at some point, and unit tests will be valued (and prioritized) neither too much nor too little.
On a side note, unfortunately I think the software industry tends to be particularly bad about this pendulum swinging back and forth between extremes thing. There is so much emphasis on innovation that people are biased toward making radical changes and believing they are going to revolutionize everything. And the culture is such that the more you go out on a limb and push something radical, the more you are respected, because we often value guts more than good judgement.
> the pendulum swings too far in one direction, it then swings too far in the other direction
> Hopefully we'll arrive at a reasonable middle ground at some point, and unit tests will be valued (and prioritized) neither too much nor too little.
My coding style has very much gone through a similar evolution. When I first learned about TDD, I went in whole hog - test everything, ui tests, request tests, unit tests - tests for everything.
Then I started to notice that the development costs associated with extreme coverage did not, in fact, pay off.
The bugs found were vastly outweighed by the time wasted dealing with the peculiarities of various UI testing frameworks, let alone the amount of time wasted waiting for those tests to run.
My metric is now that I've written enough tests to feel confident that the code works. It's an extraordinarily qualitative metric, and I cannot find a way to objectively quantify it, yet it very much works for me.
Really, it all comes down to visibility. Is there somewhere that is up to date, that will tell you what the code is supposed to do? Do you have tooling in production that will alert you when it's not doing what it's supposed to do? Do you feel confident that the code works?
This. A thousand times this. The debate about testing should stop being about what type of testing is the best and start being about what type of test to use where.
"In my experience, unit tests are most valuable when you use them for algorithmic logic. They are not particularly useful for code that is more coordinating in its nature."
I agree in principle, but IME:
* This kind of algorithmic code often makes up a fairly small proportion (~5%) of code written for most business driven applications - it's usually mostly coordination. Some domains may be different (and for them, unit testing will appear to be much more effective), but I think this is the norm.
* Integration tests can test algorithmic code acceptably well, but unit tests do not test integration code in an acceptable fashion.
* Where you have algorithmic and integration code smushed together (this type of technical debt is, sadly, the norm 'in the wild'), again, integration tests work acceptably well whereas unit tests require a mess of mocks.
* Integration tests do not have to be high level or slow.
"Refactoring breaks tests. Sometimes when you refactor code, you break tests. But my experience is that this is not a big problem. For example, a method signature changes, so you have to go through and add an extra parameter in all tests where it is called. This can often be done very quickly, and it doesn’t happen very often. This sounds like a big problem in theory, but in practice it isn’t."
This is a big problem in practice when you are working on large scale code bases in which the developers have not been super strict about decoupling everything.
> Integration tests can test algorithmic code acceptably well, but unit tests do not test integration code in an acceptable fashion.
True, but you can have both types of tests, which lets you test algorithms (unit tests) and higher level combinations (integration tests) both in the best way. The right tool for the right job.
> Where you have algorithmic and integration code smushed together (this type of technical debt is, sadly, the norm 'in the wild'), again, integration tests work acceptably well whereas unit tests require a mess of mocks.
In many cases, you'd be better off separating the logic, which would let you have both types of tests, each for a different part of the logic.
> Integration tests do not have to be high level or slow.
True, but if your code is 95% coordination and 5% algorithmic, at most you'd want 5% unit tests.
>In many cases, you'd be better off separating the logic
Which is refactoring, and if you're refactoring, you need to have your code surrounded with tests to do it safely.
And, once you've done that, if you've already written integration tests for that logic and they perform acceptably well, there may be no point in rewriting the integration test as a unit test.
For well-defined algorithmic logic, I'd say one is better off mostly using existing code (usually via a library). If a library does not exist, write the function as stand-alone with tests exercising its API. Preferably release it as open source.
There are a number of points here that I do not find convincing - for example:
The combinatorial complexity argument does apply more (much more, in many cases) to unit testing than to testing at higher levels of abstraction, broadly for the same reasons that abstraction ameliorates the design-complexity problem. Consider a program using heuristic methods to find a near-optimal solution to an NP problem: not only would testing that it produces valid solutions at the program level be a lot simpler than unit-testing all its components, no amount of the latter would establish that any of its solutions are valid. The point is not that integration testing allows you to cover the whole state-space, but that it is a more effective way to select what you test.
The point of assertions is that they are tested in the execution of integration tests.
It has been my experience that most of the effort expended in the debug-fix cycle is a result of errors in decomposing the problem (or, equivalently, composing the units) - for example, an incorrect assumption about the data a unit uses, rather than the handling of the data in accordance with the assumptions made about it. Discrepancies between what is expected of a component and what it does also seem to be common, such as when one component is predicated on a constraint being respected by another, but the latter does not always do so.
I'd love to submit this document to HN, so it has a chance to get a wider audience than provided by this sub thread.
However, the title "Seque" is completely unspecific and hence totally useless. Given that title, I don't see how to submit this to HN in a way that it attracts readers.
The guidelines on titles allow for writing your own title as long as accurate and more informative than the original title.
The bigger problem, I think, is the velocity of articles appearing and disappearing from the front page makes follow-up articles have an inherently smaller starting audience.
> The guidelines on titles allow for writing your own title as long as accurate and more informative than the original title.
Mods appear really clear about this. They want you to use the original title unless it's clickbait, or inflammatory, or misleading. Only then can you change it, and they say an informative sentence from the article might do.
This means that articles with terrible, unclear, titles get posted to HN all the time.
Certainly mods actively edit useful, explanatory titles to replace them with the terrible, unclear original title (Stephen Hawking's PhD thesis is a good recent example).
Are you immortal or do you have a time machine? New people with new opinions mature every day. Pay gap was debunked long ago yet the dead horse gets beaten every month.
> Don’t underestimate the intelligence of your people, but don’t underestimate the collective stupidity of many people working together in a complex domain.
Sometimes I find that unit testing is an attempt to spend as much time as possible experiencing the first reality (you are a smart, competent programmer) as a relief from the pain of the second reality (you have a gnawing fear that your understanding of the context is fatally flawed.) Can't figure out if the work you're doing is constructive? Anxious about the lack of requirements? Go soothe yourself writing unit tests. Kick back and watch the integration tests run. You're doing your job, and the rest will take care of itself, right?
This is such a great document. I referred to it quite a lot in my last job as we were restructuring our tests. We mainly had unit tests that we'd just try to make green every time after changing code. It didn't sit well with me that we'd just rewrite the code in a different syntax (the RSpec DSL). You don't really test anything that way.
Just be prepared for a lot of shocked reactions when you say 'Most Unit Testing Is Waste' to people who have not been softened a bit to this concept.
I really like this way of thinking. A good next step into this subject is the series of live discussions (5 videos) between Kent Beck, David Heinemeier Hansson and Martin Fowler about TDD (which relies a lot on unit tests): https://martinfowler.com/articles/is-tdd-dead/ I enjoyed these a lot too, and the combination of these sources has really improved my testing.
This article is an anti-resume, and I recommend not reading it. I'll take a look at two claims.
> Unit tests are unlikely to test more than one trillionth of the functionality of any given method in a reasonable testing cycle. ... Trillion is not used rhetorically here, but is based on the different possible states given that the average object size is four words, and the conservative estimate that you are using 16-bit words
An int may contain "four billion states", but for the requirements, it's highly likely that we can classify the integer into three states: less than zero, zero, greater than zero. As a bank, I might not care how much money you have, only that you have more than zero. In a transaction, I don't care how much money changes hands, as long as no money is lost.
Pointing at memory-as-bits, as if we're still using punch cards, and then hand-waving "I can't possibly test this", ignores sixty years of progress. The refusal to imagine a class with range checking is a damning statement about the author's own ability as an engineer.
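For illustration, a class like this sketch (my names, not the article's) collapses four billion raw int states into the handful the requirements actually distinguish:

// A balance type that makes negative amounts unrepresentable, so tests
// only need to cover the equivalence classes that matter to the
// requirements: zero and greater than zero.
final class Balance {
    private final long cents;

    Balance(long cents) {
        if (cents < 0) {
            throw new IllegalArgumentException("balance cannot be negative");
        }
        this.cents = cents;
    }

    boolean isPositive() {
        return cents > 0;
    }
}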
> Programmers have a tacit belief that they can think more clearly (or guess better) when writing tests [than] when writing code, or that somehow there is more information in a test than in code.
Consider writing a sorting algorithm vs. testing a sorting algorithm. Would you feel more confident writing the test for a sorting algorithm than writing the algorithm itself? The test is simple: is every item in the list less than or equal to the next item? The code is far more complex. We're in the same realm as NP problems: I can write a test to verify that a graph is correctly 3-colored, but the code might be a bit harder.
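As a toy sketch of how much simpler the verifier is than the solver (my example, not the article's):

// The oracle for a sort is a few lines; the sort itself is far more
// intricate. (A complete oracle would also check that the output is a
// permutation of the input.)
static boolean isSorted(int[] xs) {
    for (int i = 1; i < xs.length; i++) {
        if (xs[i - 1] > xs[i]) {
            return false;
        }
    }
    return true;
}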
Perhaps, then, the author's experience of other developers believing they can "think more clearly" is actually his observation that the developers are solving simpler problems, and are thus more confident. And that is the point of tests: it is easier to verify than solve.
In short, every conclusion in this article begs the question, "Might there be another explanation?"
The best workflow I've found is to define an empty function, add a breakpoint, and code the function in a repl. At this point I'm pretty confident the code works for at least a 'happy path' input so I copy the code to the editor and add tests to call it with a variety of other inputs.
Being able to inspect the state of the program in real time is invaluable and gives me a lot of confidence that I understand how the code works. For some reason most programmers I see don't even run their code locally and just use log statements to guess at state when it invariably doesn't work as expected.
Another big problem is test data. I see way too much naive mocking. You really need to exercise your code with data that is as close to real-life input as possible; ideally it's a sanitized version of production data. Other tests are great too (e.g. large lists of "naughty strings"), but if you're manually specifying your test data you are a) spending a lot of time on something that should be automated and b) only exercising your code with what you think it might see, which is usually not good enough.
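The "naughty strings" case, at least, is easy to automate. A sketch, assuming JUnit 5 (Parser here is a hypothetical stand-in for whatever is under test):

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class ParserNaughtyInputTest {
    // Feed known-troublesome inputs to the code under test and assert
    // that it degrades gracefully instead of crashing.
    @ParameterizedTest
    @ValueSource(strings = { "", " ", "null", "\0", "'; DROP TABLE users;--" })
    void survivesNaughtyStrings(String input) {
        Assertions.assertDoesNotThrow(() -> { Parser.parse(input); });
    }
}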
Log data is how unit testing started, I think: people would log all the output and compare the sheet of actual log output to the sheet of expected log output.
Working in a GUI debugger / REPL is great for visualizing code, and I often code that way myself. But let's not fool ourselves: it still requires manually setting breakpoints and pressing keys to step through code. You can't do this for every method after every change, whereas unit testing has that advantage. I do agree with most things mentioned in the article, though. A lot of people end up writing tests that hit databases and are too slow, or fail randomly due to chained state, or are over-specified and end up just getting in the way. What it comes down to is that you can have good tests or bad tests, and it's still entirely subjective, just like whether the code itself is good or bad. I recommend the book xUnit Test Patterns; it's basically a bunch of "rules of thumb".
"my team told me the tests are more complex than the
actual code. (This team is not the original team that wrote the code and unit tests. Therefore some unit tests take them by surprise. This current team is more senior and disciplined.)"
Despite the inflammatory title, I believe the point the PDF is making is that unit tests should be kept short and simple, and should convey the intent of your code.
Intent is akin to the 'why' - written code explains the 'how' and 'what' extremely well, but without the 'why' it loses all meaning. Writing tests to convey intent is essential because no computer or programming language can do this for us currently, so it's left up to us.
Unit tests are great when you have a large code base with dozens of apps and need to modify a core library to remove side effects. How do you know you didn't break one of the apps?
But if we go deeper, why does the core library have side effects? Because it was a crummy, poorly designed piece of crap to begin with. If it was originally written with high quality it wouldn't need the refactor now. The author noticed this. He wrote good code to begin with so it didn't need a lot of testing.
When you have a large team of mediocre engineers you need unit tests to guard against more bad code from getting in. They might even be brilliant engineers, stuck in a horrible process of churning out features to unrealistic deadlines.
If you have a great team, a great process, and a great budget with realistic goals, you can crank out amazingly good software without the need for a lot of unit tests. But that's not the real world. In the real world we have lots of unit tests. They are a band-aid over the other problems without addressing them specifically.
It's easy to disagree with someone who portrays an enemy which doesn't exist.
I disagree with this piece everywhere it manages to land somewhere concrete, where its claims can be verified and assessed.
> But if we go deeper, why does the core library have side effects
Let's not get ahead of ourselves in theoretical, non-applied functional programming.
As even Haskellers recognize, all computing would be meaningless if there were ultimately no side effects. In fact, we use computers for their "side effects".
Sometimes you need side effects. And sometimes unit tests are a great way of automatically verifying that you have the right side effects.
class AccessCountedInteger {
    private final int value;
    private int accessCount = 0;

    AccessCountedInteger(int value) {
        this.value = value;
    }

    public int getValue() {
        accessCount++;
        return value;
    }

    public int getAccessCount() {
        return accessCount;
    }
}
This unit has a side effect: calling getValue() changes what getAccessCount() returns. While it is contrived, if you are building something following the builder pattern, you will in general have a lot of side effects from all of the methods called before the final build() method. This is quite amenable to unit testing.
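For instance, a unit test for the class above might look like this (assuming JUnit 5):

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class AccessCountedIntegerTest {
    @Test
    void accessCountTracksCallsToGetValue() {
        AccessCountedInteger n = new AccessCountedInteger(42);
        assertEquals(0, n.getAccessCount()); // no reads yet
        n.getValue();
        n.getValue();
        assertEquals(2, n.getAccessCount()); // the side effect is observable
    }
}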
That is not the kind of side effect that Haskellers would acknowledge as necessary (and I'd see it as a bad idea). We can call it a "side effect" but I think there's a qualitative difference between that and something like file I/O or async.
Just... small point, but: side effects can be encapsulated, abstracted, and injected to create testable code in "side-effect" heavy situations.
Complex running dialog with a hardware serial device, for example, could be rearchitected as a component that has a clear-text dialog with a serial device proxy. This would create functional, side-effect free, testable units of code while also maintaining rich side effects in production (handled by pure integration tests).
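A minimal sketch of that shape (all names hypothetical): the dialog logic becomes a pure function from an incoming line to a reply, and only a thin adapter touches the real port:

// Pure protocol logic: the same input line always yields the same
// reply, so unit tests need no hardware at all.
final class DeviceProtocol {
    static String replyFor(String incoming) {
        if (incoming.startsWith("PING")) {
            return "PONG";
        }
        return "ERR unknown command";
    }
}

// The side-effecting boundary, covered by integration tests against
// the real (or proxied) serial device.
interface SerialPort {
    String readLine();
    void writeLine(String line);
}

final class DeviceDriver {
    static void handleOne(SerialPort port) {
        port.writeLine(DeviceProtocol.replyFor(port.readLine()));
    }
}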
Essentially: if you _have_ to use an integration test you should almost always be refactoring into something that you handle with unit tests in conjunction with your integration tests. Otherwise you're leaving maintenance developers with too much risk when making changes and no clear separation of your domain model from your application code.
> side effects can be encapsulated, abstracted, and injected to create testable code in "side-effect" heavy situations.
Absolutely, but at that point you're moving the side effect out to the boundary and separating it from the logic. Then you can unit-test the logic away from the side effect (but the test of the side effect itself still has to be an integration test).
I think his point is that from the functional perspective I/O is a side effect, since you're mutating state. But for us normal beings the I/O is the thing that actually matters, not the twiddling of bits in computer memory.
Quite the opposite. In Haskell I/O is an explicit effect. In impure languages I/O is something that happens as a side effect of just calling a function.
Even the best code written by the best developers will eventually need to be refactored if the software lives long enough. Changes in requirements, environments, and tools will eventually invalidate the initial design. If the software will only be used for a couple years and then discarded then you can get away with minimal unit tests. But extensive unit tests are absolutely essential to successfully maintaining a system with a 10+ year lifecycle.
The kind of refactor you are talking about will require a rewrite of the unit tests. Unit tests are a burden in refactoring. They only stay the same if Liskov substitution holds, which is rare in any significant change.
"Unit tests are pointless" -- this is often said by developers who suffer Dunning-Kruger syndrome. Except the devs who are actually working on something more exotic that cannot be tested well automatically.
Yes, but devs who don't write unit tests are probably not going to write integration or acceptance tests either.
Maybe except for one guy I talked to a while ago. He does not write unit tests because static type checking in C++ takes care of everything that unit tests do (according to him), but I have actually seen his code, and it contained a few system tests, so there are exceptions.
I think it advocates the wrong attitude towards software development, and I think most points made in the article are just plain wrong and stupid.
I guess the main reason for my condescending sentiment is that the author gives zero examples of code that requires tests vs. code that does not, so the argument never becomes concrete.
> devs who don't write unit tests are probably not going to write integration or acceptance tests either
Why do you think they aren’t? I don’t write unit tests except when I’m specifically paid extra for that (yes, even when coding in strongly-typed languages), but I do write other kinds of tests as I see fit.
Doing TDD is a habit and if a developer does not find unit tests useful or practical, then I don't think they are going to write tests that are more complicated than unit tests.
Btw, what type of software do you write and how do you know if it actually works?
> I don't think they are going to write tests that are more complicated than unit tests
I find unit tests are useless for most cases. But this doesn’t stop me from writing more complicated tests. Even unit tests when they’re useful. E.g. for sufficiently complex low-level SIMD math routines: inputs & outputs are simple, no IO, no multithreading, no large dependencies, no side effects, just some computations, and tons of weird _mm256_verb_typesuffix intrinsics everywhere i.e. it’s very easy to make mistakes writing these.
> what type of software do you write
Lately Windows CAD and Linux embedded. Previously mobile software, PC and console videogames, CNC & robotics, GIS, WinCE/embedded, multimedia/video codecs, utilities for Windows administration, lots of other stuff.
> how do you know if it actually works?
I use strongly typed languages that catch 99% of stupid errors at compile time. I test it manually while using debugger at the same time to inspect internal state. I design my software to be testable, e.g. logging, asserts, configurable debug dumps, well-defined interfaces between components so they can be tested in isolation, etc.
P.S. Today I’ve fixed 3 bugs in my code.
[Windows] A user tried using my software on old AMD CPU, it didn’t work ‘coz SSE 4.1 instruction set is not supported.
[Windows] Under some conditions, my code reads value from a depth+stencil D3D texture while the GPU is still rendering previously submitted commands into the same texture, read fails.
[Linux] When both HDMI and DSI displays are connected, my DRM/KMS client code selects the wrong one.
All three are caused by environment or external hardware, and you can’t unit test these.
> In most businesses, the only tests that have business value are those that are derived from business requirements.
This is, imo, the most important takeaway and it is, or should be, obvious, but it often isn't (maybe because "common sense is the least common of senses" or something like that). It's also what I strive for, with one caveat[1].
> One is to use it as a learning tool: to learn more about the program and how it works.
Another great takeaway. Good tests should work as documentation, imo. I consider this another test quality metric, even: If looking at the tests only confuses me further, those tests need to be changed (and the code they're testing too, most likely).
[1]: Striving for this can turn code coverage into a useful metric. Not of correctness, of course, but of how much code we're writing that doesn't solve a business requirement. That code should be refactored away into separate libraries or replaced with third-party libraries that already do the job. I'm firmly in the "avoid NIH" camp.
> 1.4 The Belief that Tests are Smarter than Code Telegraphs Latent Fear or a Bad Process
That's not why you write unit tests. They are, in a roundabout way, programming's way of double-entry bookkeeping. Tests and code both are there to double check the other one. Neither one is "smarter" than the other.
Refactoring code usually means you have to refactor the unit tests as well, since they are so closely tied to the structure of the code.
Integration tests, however, should not have to change, since they are typically run against the external interface to the software, which shouldn't change if you're just doing a refactoring.
Funny thing. Unit tests force you to think about good code design. Integration tests don't. If you only have integration tests in your system, you'll most likely end up with a big ball of mud. I've seen this so many times in so many real world projects that it hurts.
Except that in my experience, unit tests just end up being a big ball of mud too. And at that point, they don't make refactoring easier, they actually make it harder.
Having good integration tests means that you're free to completely re-organize your code, and you can still test that it works as expected.
Given that requirements change over time, even a well thought out code structure will likely need to be refactored at some point, and being able to have a test suite that doesn't have to be re-written during a refactoring is extremely valuable.
I've also seen code bases become unnecessarily complex because so much thought was put into how to unit test it, that not enough thought was put into writing clear, easy to understand code.
> Funny thing. Unit tests force you to think about good code design. Integration tests don't.
They absolutely do, but at higher levels of abstraction, in terms of interfaces, interactions, constraints, obligations and responsibilities. In fact, unit and integration tests make you think about design in essentially the same way.
Yes, but broken tests at least give you hints about where you need to fix things. It can be hard to find every piece of code that relies on something you're changing.
As long as there are suitable integration and system tests, this is no problem at all. Bonus if there are defensive assertions as recommended in the paper (design-by-contract style). Sometimes it is easier to refactor a system without lots of unit tests, since otherwise you need to update/fix lots of tightly coupled tests.
What matters is what the system does (as seen from the outside), not how the underlying classes/functions work. They are mostly incidental.
Most proponents of unit tests use horrible programming languages; they are afraid to change code because anything could break at any time. Stop using those languages and most of the problems 'fixed' by unit testing just disappear.
A lot of us would consider SQLite to be high quality and relatively bug-free. The extensive test suite they've built up that exercises each release is a huge reason for it.
Of test code quantity, Coplien writes:
>If your coders have more lines of unit tests than of code, it probably means one of several things. They may be paranoid about correctness; paranoia drives out the clear thinking and innovation that bode for high quality.
> - Keep regression tests around for up to a year
> - Throw away tests that haven’t failed in a year.
SQLite appears to keep their tests. Somebody files a bug; SQLite writes a test that reproduces that bug; the test code remains long after the bug is fixed. This prevents the bug from reappearing. (Isn't that the main purpose of "regression" in the phrase "regression test"?)
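The pattern, sketched (assuming JUnit; the bug number and the Search class are hypothetical):

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class SearchRegressionTest {
    // Bug #1234: search returned -1 for a match at index 0. The repro
    // stays in the suite forever so the bug cannot quietly reappear.
    @Test
    void bug1234_matchAtStartOfString() {
        assertEquals(0, Search.indexOf("needle in haystack", "needle"));
    }
}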
I also think it's better to keep relevant regression tests for years. E.g. consider a piece of code that's been working correctly for 5 years, with 5 years of passing regression tests. Imagine a new programmer wants to rewrite the code to optimize it for speed and reduced memory usage. I think we'd feel much more confident if the new code passes those same regression tests that were accumulated over 5 years.
As for test code size ratio... A lot of good comprehensive tests will have LOC outnumbering the actual code being tested. This is especially true for library code that's used in many places up the stack. I wrote string parsing routines and a reverse Boyer-Moore search routine where the test code (test edge cases, test nulls, test string sizes at 2^32 boundaries, etc) was 10 times larger than the actual code.
Of testing's utility, Coplien writes:
- Testing can’t replace good development
- [...] Tests don’t improve quality: developers do
... which looks like a strawman and a false dichotomy. Can anyone cite a credible development philosophy that believes testing can replace bad developers or bad process?
We could say that about <ANYTECHNOLOGY> such that <ANYTECHNOLOGY> can't replace quality developers. Garbage Collection doesn't improve quality, developers do. Array boundary checking doesn't improve quality, developers do. And so on.
Or maybe there's a difference in terminology? I wonder if Coplien considers SQLite testing "system test" or a "unit test"? Does he consider SQLite "white box testing" or "black box testing"?
Is there a reason why someone would throw away unit tests? I've never understood this. Time was taken to write the test, verify it, and root out the bug, and now we want to remove the safeguards we spent time/money on? Leaving the possibility for the bug to resurface?
If you are testing at the wrong layer of abstraction (Your test is tightly coupled to the implementation of your class, as opposed to its observable behavior), then refactoring your code will require refactoring the test.
The correct solution to this is to not throw passing tests out, but to stop doing white-box testing.
(Not to mention that a test that may pass in your CI environment may fail - frequently - in your local workspace.)
> Can anyone cite a credible development philosophy that believes testing can replace bad developers or bad process?
None. But that's hardly the issue. I too often have seen people championing heavy testing as a way to deliver quality software, because it's the agile way. The problem is not with the philosophies but with the way people interpret or misinterpret them.
The sarcastic quote in the article sums my feeling on the behaviour I've seen in a lot of shops: "I find that weeks of coding and testing can save me hours of planning.".
I've had the hardest time convincing people to spend a week thinking about our approach and solution, but add months of development time to expand test coverage and suddenly everyone thinks it's worth it (and usually with nothing to back that up).
Management keeps asking when we will write unit tests, and I tell them I’m too busy fixing bugs, because each time I fix a bug I
1) examine the entire code base to find any similar problems
2) implement full parameter validation in that code to try to ensure it can’t happen again and
3) add exception logging code so that we will get an immediate error reported directly on the line it occurred if it does.
If I have any extra time left over, I refactor to eliminate code duplication, update/verify/add more parameter and input data validation, and increase code encapsulation.
Then I point out our crash rate has dropped by 70% in the 6 months I’ve been working on the project.
I'm curious what you consider examining the entire code base to find similar problems. If this is more than a 5-10 minute search then I would feel like it's a significant waste of time. And if it is more than that amount of time and you are continuing to find bugs then I would ask what is the problem that allows the same bug to surface in multiple places where it is easy to identify but hard to search for?
Here’s an example. We got a crash report with a stack trace that pinpointed the location in our code. The people who wrote the code used a design pattern called VIPER, which uses 5 classes instead of the classic 3 from MVC. Our code base is in Swift, which uses optional values to protect against nil. The protection is unwrapping with an if statement: if the value exists, the body of the if can execute.
The cause was that one of the VIPER classes force-unwrapped an optional reference to its view. Force unwrapping is without the if: just assume the value is always valid, and crash if it's not.
The problem was they made this assumption because the view is never nil except for one uncommon edge case: a network operation completing after the view was disposed. So the fix was easy: remove the force unwrap (you should never force unwrap in Swift; it's terribly bad).
But what about our other VIPER views? It took me a couple of minutes to review all 20, and they all had the same identical flaw. Fixing them took seconds, but individually testing the fixes took a few hours.
Should I have left those other 20 views alone so our app could mysteriously crash for some of our hundreds of thousands of users?
A more recent bug I found on my own was a memory leak caused by our main view's network closures holding strong references to the ViewController. This caused code to fail because it was still running in old, dead copies. So I fixed the closures and we no longer have old copies of the VC hanging around forever. I didn't have time to do a search at that moment, but I took notes in my dev log so I can review every view controller (50 or 60) for similar problems. Why wouldn't I? How many known bugs and random unduplicated problems would go away if we got our view controllers' memory management right? I'm betting at least a few, and that I'll eliminate some future problems before they can even be found.
Software engineering is usually 80% fixing bugs. Developing rigorous standards to prevent their formation can give you much more time to build new and improved features.
I strongly agree with many parts of this, and strongly disagree with many others.
One part I strongly disagree with, is this passage:
> Most programmers want to "hear" the "information" that their program component works. So when they wrote their first function for this project three years ago they wrote a unit test for it. The test has never failed. The question is: How much information is in that test? That is, if "1" is the passing of a test and "0" is the failing of a test, how much information is in this string of test results:
> 11111111111111111111111111111111
> There are several possible answers depending on which formalism you apply, but most of the answers are wrong. The naive answer is 32, but that is the bits of data, not of information.
Just because a test has passed in your Continuous Integration environment 100% of the time, doesn't mean that test is worthless. I have checked in many tests that have never failed in CI - but have failed when I was working on my code. However, since there's no shiny team-visible metrics with bar charts about how often a test failed in a local workspace, people can wrongly assume that a unit test is worthless.
> Now, how many bits of information in this string of test runs?
> 1011011000110101101000110101101
> The answer is... a lot more. Probably 32.
If I see that in our CI environment, the answer is 'this test is nearly-worthless.' The long answer is 'If it's a unit test, there's a race condition, if it's an integration test, there's a race condition, or it's flaking for reasons outside our control.'
> Another client of mine also had too many unit tests. I pointed out to them that this would decrease their velocity, because every change to a function should require a coordinated change to the test. They informed me that they had written their tests in such a way that they didn't have to change the tests when the functionality changed. That of course means that the tests weren't testing the functionality, so whatever they were testing was of little value.
... That's the whole point of blackbox testing. If observed behaviour is not expected to change, neither should the test. If a refactoring forces you to update the test, then, yes, you are testing at the wrong level of abstraction.
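For example, a black-box test like this toy sketch (Slug is hypothetical) pins observable behaviour only, so reimplementing the internals from scratch should leave it untouched:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class SlugTest {
    // Asserts only what the public API returns, not which private
    // helpers produced it.
    @Test
    void lowercasesAndHyphenates() {
        assertEquals("hello-world", Slug.of("Hello World"));
    }
}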
This piece could have saved a dozen pages if it just told us to stop testing private methods, and write more integration tests.
Why the concern that the volume of tests is greater than the volume of code? Why should that be a constraint, other than some vague sense of wasted time spent typing them out?
One of the best uses for Unit Tests and Integration Tests, in my experience, is for testing things that would be incredibly difficult to test through QA, or the UI.
This is clearly an opinion-piece and as such I guess it's not terrible, but it certainly has enough points to disagree with.
From quickly reading through it, for instance I found the following things directly objectionable:
> "Unit testing was a staple of the FORTRAN days".
Such an attempt at discrediting something can be applied to anything, and it comes off as disingenuous, dishonest.
> "Unit tests are unlikely to test more than one
trillionth of the functionality of any given method in a
reasonable testing cycle. Get over it." ... Trillion is not used rhetorically here, but is based on the
different possible states given that the average object size
Who says you have to test the method? Who says you're not allowed to test functions which operate on a known, closed subset of data?
False dichotomy and plain bad math.
> If you find your testers splitting up functions to support the testing process, you’re destroying your system architecture and code comprehension along with it.
That may be right. Or it may be possibly completely backwards. It's quite impossible to tell really, without asking why they are splitting up those functions.
If people do this for gaming some sort of system about "at least 80% coverage" or whatever, then clearly you should ask why people feel the need to game the system. Gaming the system, no matter what aspect, leads to bad choices.
I, however, am not splitting up my functions to game the system. I'm splitting up my system to separate data retrieval from data processing (so I can directly test processing without depending on whatever retrieval depends on).
I'm splitting up my functions to give them names and intents; this makes the code speak much more clearly about what it is doing and why. The function name should be the "what". The contents should be the "how".
Basically, I'm splitting up my functions for good reasons.
If a function is long enough to contain so many actions, intermediate variables, loops and conditionals that you need to stop and wonder what they do, and what their role is inside that function... that function should be split up, because you do not have a clear delimiter between "what" and "how".
And guess what? That also assists testability. You can test that the "how" correctly delivers the "what".
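Concretely, the retrieval/processing split might look like this minimal sketch (all names are mine; only the shape matters):

import java.util.List;

record Order(long priceCents) {}

// The "retrieval" boundary: side-effecting, covered by integration tests.
interface OrderSource {
    List<Order> fetchOrders();
}

final class Report {
    // The "processing": pure, directly unit-testable with in-memory lists.
    static long totalCents(List<Order> orders) {
        return orders.stream().mapToLong(Order::priceCents).sum();
    }

    // The thin coordinator that glues the two together.
    static long totalCents(OrderSource source) {
        return totalCents(source.fetchOrders());
    }
}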
This is not destroying your system.
Some of the functions you end up with may even turn out to be reusable across the class/system, meaning you increase consistency and correctness as a result too.
Again: This is not destroying your system. Quite the opposite.
I could go on, but I'm just at the 4th page of 21, and my comment is already the biggest in the thread, and I'm not planning on making a blog-post length response.
Just saying this document is severely biased, contains factual errors and is not a good thing to rely on to present an argument.
The author is correct, in the sense that over the lifetime of a software project 90% of the value from unit testing will come from 10% of the tests. The problem is that you won't know in advance which 10% that's going to be, so you have to write and maintain all of them.
I worked on projects that tested every getter and setter.
Many of these getters and setters existed only for other tests! Complete waste of time testing those...
I posted this; I have no connection to the consultancy. I thought there were some good arguments in the PDF which I agree with after 10+ years of dev experience.
The article mentions object orientation several times as a hindrance to unit testing. Perhaps that should be its biggest take-away?
Having written an accounting system, including web/user interface and asynchronous/deferred coordination (such is HTTP/browser programming), for the last three years, I can say that functional programming is increasingly helping my team stay sane.
We do TDD always; sometimes in the form of unit tests, sometimes in the form of integration tests, and we write as many of our tests as possible as random/generative tests, to avoid having to write large test suites by hand. I've spent the last two days making a piece of our domain monoidal, having defined the three laws as property/generative tests; the rest of the time is an interactive play with the generator, to see if it can come up with counter-examples to the code I just wrote.
I normally go about coding by writing a very high-level integration test (at the top-most layer that I still have code in the service/frontend); then I write a huge chunk of the code until I think it's correct and looks pristine and easy to maintain. Now I run the test. If it fails and the test is correct — it tests the right thing and is easy to read — then I start at the top (highest level) of the stack, and write down the assumptions I have thought about while designing the code — as unit tests. Until one passes (at some level n); at the higher level n+1, an assumption/unit test is now broken and I can divide-and-conquer until I find the line of code that doesn't work.
This, together with purification (the methodical extraction of pure functions: things that only ever have one output for a particular input), makes it possible to avoid testing any side-effect/async things; they simply flow non-async data between pure functions.
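In Java terms (the commenter's stack is different, but the shape is the same; the domain and names here are my guesses), purification means pulling the decision out of the effectful code so the async layer only routes data:

// A purified decision: exactly one output per input, so generative
// tests can hammer it directly, and no side effects ever need mocking.
enum Action { ACCEPT, REJECT }

final class Matcher {
    static Action classify(long invoiceCents, long receiptCents) {
        return invoiceCents == receiptCents ? Action.ACCEPT : Action.REJECT;
    }
}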
This, together with first-class values for control flow (i.e. not using exceptions) and generative/random testing, ensures that ALL input has valid output, i.e. that all functions are total functions. And this in turn makes the code uncrashable and bug-free (for the domains of bugs the above methodology removes).
The domains of bugs that the above doesn't eradicate, are primarily cross-browser bugs on the GUI-side, or UX bugs, where a feature is hard to use/understand. On the server-side we sometimes crash when our logging storage in ElasticSearch goes down and never comes up, the intermediate buffer (Logstash) fills up, and then the app buffer fills up and then the app livelocks, waiting for the logging to drain. (=> operations). The second most frequent reason we have any exceptions/errors/bugs is DNS not working.
The first year of writing the software, we still invoked libraries that threw exceptions, but now we've rewritten them all to do control flow with first-class values, so that is not an issue any longer.
I just wanted to share how we do stuff at qvitoo :), in case it helps anybody.
Not only is it often waste, but it can add tremendously to code bloat and "debt". I saw a dev spending 80% of his time "fixing test setup" instead of fixing code.
Unit tests are not a silver bullet, but if you require passing tests and coverage before shipping to prod, you basically get a minimal, always up-to-date form of documentation.
This is truly the dumbest thing I've read all year about software development. It isn't just dumb though, it's harmful. Now moron developers everywhere will hold it up as an excuse why their code is so good it doesn't need testing. Now, especially after Equifax, is a bad time to have this attitude. There are already journalists calling for developer licensure.
This is a critique of popular testing practices by an experienced engineer who is trying to improve the practice of testing and increase software quality. This is a big part of how engineering practices improve. A culture where we call something "dumb" without reading it because we only care what "moron developers" will make of it (presumably also without reading it) isn't going to increase software quality or protect against the next Equifax. It's easy to have a knee-jerk reaction to a title and rationalize the desire to respond on that basis. Surely nobody else will read the article, and therefore I have already determined its true significance by having a knee-jerk response to the title or a few haphazardly skimmed sections. If that reasoning prevailed, though, people should just write headlines and never anything more thoughtful, and we should be stuck forever with "UNIT TESTING GOOD" and "UNIT TESTING BAD" as the two competing pinnacles of software testing wisdom.
In case anyone was fooled by this comment into thinking the article takes a lazy approach to testing: it argues for keeping a certain class of unit tests; for another class, preferring system tests instead; and for a third class, turning them into assertions that ship with production code where possible. That's not lazy. Depending on how you currently test your code, it may actually be a harder standard to meet. It's possible you personally have nothing to learn from it, but I would wager there's something in it that will make you see your tests in a different way.
In my experience the usefulness of Unit Tests is very dependent on the quality of the team and its grasp on the business requirements. Some programmers can write high quality code that doesn't need unit/integration testing at all, some can't.
Legacy code = code without tests. You might write high-quality code today, but without tests it'll become technical debt within a month or even a week down the line. Programmers come and go; unit tests remain.
oh no! that consultant who I paid $100K to tell me that unit testing would solve all my problems didn't tell me that. Maybe it's because they have never written a line of code in their life, let alone delivered a successful large-scale software project, but who knows.
All code is legacy code as soon as you type the semi-colon. It is wishful thinking that a suite of unit tests will prevent a refactor or small change from producing technical debt.
In a sense you are right. That state where you have the complete program as a mental model goes away when you stop for that session. But if you pick it up the morning after, that mental model can be reconstructed in under an hour, and you can finish the feature and deliver it as value to a user. If you created good tests (unit/integration/regression) that capture the gist of what the code should do, two-years-older-you or another developer has a good chance of building a similar mental model and changing the code with confidence. So I consider "legacy code" a scale where on one end you go "The reqs have changed, better rewrite it all!" and on the other "The reqs have changed, but I can confidently add this functionality and keep the unchanged behavior!"