The normal practice for large scale codebases in complex domains is "the code is the spec". That is, the only specification for how the system should work is how it worked yesterday.
In that case, unit tests serve as a great specification. Even tests that just duplicate the business code under test and assert that it's the same (a huge waste in normal cases) are useful, because a unit test is much better than a Word document at describing how code should behave. I very much prefer a massive, hard-to-maintain set of poorly written unit tests to a large number of outdated documents describing how every bit of the system should work.
So is a big/bad test suite a burden? Sure. But is it a burden compared to maintaining specifications of other kinds? Is it a burden compared to working in a system with neither type of specification?
Further, the people writing long articles like this are (or at least were) very good developers. There is often an element of "good developers write good code, so just use good developers" in them. But writing good software with good developers was never the problem. The problem is making software that isn't terrible, with developers ranging from good to terrible, and most being mediocre.
The article addresses your first point in a big section:
1.4 The Belief that Tests are Smarter than Code Telegraphs Latent Fear or a Bad Process
Your other point, about only good programmers not needing unit tests, is moot, as you haven't followed it through to its conclusion. Namely, if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
> Namely, if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
Let's assume you hire good programmers, because otherwise you're doomed. But oftentimes, the "good" programmer and the "bad" programmer are the same person, six months apart:
1. John writes some good code, with good integration tests and good unit tests. He understands the code base. When he deploys his code, he finds a couple of bugs and adds regression tests.
2. Six months later, John needs to work on his code again to replace a low-level module. He's forgotten a lot of details. He makes some changes, but he's forgotten some corner cases. The tests fail, showing him what needs to be fixed.
3. A year later, John is busy on another project, and Jane needs to take over John's code and make significant changes. Jane's an awesome developer, but she just got dropped into 20,000 lines of unfamiliar code. The tests will help ensure she doesn't break too much.
Also, unit tests (and specifically TDD) can offer two additional advantages:
1. They encourage you to design your APIs before implementing them, making APIs a bit more pleasant and easier to use in isolation.
2. The "red-green-refactor-repeat" loop is almost like the "reward loop" in a video game. By offering small goals and frequent victories, it makes it easier to keep productivity high for hours at a time.
Sometimes you can get away without tests: smaller projects, smaller teams, statically-typed languages, and minimal maintenance can all help. But when things involve multiple good developers working for years, tests can really help.
Here's the crux of your argument - and I honestly think it's a flawed premise: The failure of unit tests indicates something other than "Something Changed".
Were the failing tests due to John/Jane's correctly coded changes, regressions, or bad code changes? The tests provide no meaningful insight into that - it's still ultimately up to the programmer to make that value judgement based on the understanding of what the code is supposed to do.
What happens is that John and Jane make a change, find the failing unit tests, and they deem the test failures as reasonable given the changes they were asked to make. They then change the tests to make them pass again. Again, the unit tests are providing no actual indication that their changes were the correct changes to make.
WRT the advantages:
1. "design your APIs before implementing them" - this only works out if we know our requirements ahead of time. Given our acknowledgement that these requirements are usually absent via Agile methodologies, this benefit typically vanishes with the first requirement change.
2. "makes it easier to keep productivity high for hours at a time" This tells me that we're rewarding the wrong thing: the creation of passing tests, not the creation of correct code. Those dopamine hits are pretty potent, agreed, but not useful.
>> Here's the crux of your argument - and I honestly think it's a flawed premise: The failure of unit tests indicates something other than "Something Changed".
Even if all you know is "something changed", that's valuable. Pre-existing unit tests can give you confidence that you understand the change you made. You may find an unexpected failure that alerts you to an interaction you didn't consider. Or maybe you'll see a pass where you expected failure, and have a mystery to solve. Or you may see that the tests are failing exactly how you expected they would. At least you have more information than "well, it compiled".
And if "the unit tests are providing no actual indication that their changes were the correct changes to make", even the staunchest proponents of TDD would probably advise you to delete those tests. If there's 1 thing most proponents and opponents of unit testing can agree on, it's probably that low-value tests are worse than none at all.
I have no idea if this describes you, but I've noticed a really unfortunate trend where experienced engineers decide to try out TDD, write low-value tests that become a drag on the project, and assume that their output is representative of the practice in general before they've put the time in to make it over the learning curve. People like to assume that skilled devs just inherently know how to write good unit tests, but testing is itself a skill that must be specifically cultivated.
Somebody could do the TDD movement a great big favour and write a TDD guide for people who actually already know how to write tests.
There probably are good resources about this somewhere, but compared to the impression of "TDD promotes oceans of tiny, low-value tests" they lack visibility.
Agreed, testing is an art that most developers have not mastered. It must be cultivated and honed. I have been programming for over 35 years, and writing tests that cover all your bases is still painstakingly difficult.
> Here's the crux of your argument - and I honestly think it's a flawed premise: The failure of unit tests indicates something other than "Something Changed".
No, the failing tests indicate something changed, and where it changed - what external behavior of a function or class changed. Is the change the right thing, or a bug? Don't know, but it tells you where to look. That's miles better than "I hope this change doesn't break anything".
> "design your APIs before implementing them" - this only works out if we know our requirements ahead of time.
No. "API" here includes things as small as the public interface to a class, even if the class is never used by anything other than other classes in the module. You have some idea of the requirements for the class at the time when you're writing the class; otherwise, you have no idea what to write! But writing the test makes you think like a user of that class, not like the author of that class. That gives you a chance to see places where the public interface is awkward - places that you wouldn't see as the class author.
> This tells me that we're rewarding the wrong thing: the creation of passing tests, not the creation of correct code.
We're rewarding the creation of provably working code. I fail to see how that's "the wrong thing".
If we were discussing integration tests, where the interactions between different methods and modules are validated against the input - I'd agree.
But this is about unit tests, in which case the "where" is limited to the method you're modifying, since it's most likely mocked out in other methods and modules to avoid tight coupling.
> includes things as small as the public interface to a class
If we're talking about internal APIs as well, we can't forget that canonical unit testing and TDD frequently requires monkey patching and dependency injection, which can make for some really nasty internal APIs.
> I fail to see how [provably working code is] "the wrong thing".
So, thinking back a bit, I can recall someone showing me TDD, and they gave me the classical example of "how to TDD": Write your first test - that you can call the function. Now test that it returns a number (we were using Python). Now test that it accepts two numbers in. Now test that the output is the sum of the inputs. Congratulations, you're done!
Except you're not, not really. What happens when maxint is one of your parameters? minint? 0? 1? -1? maxuint? A float? A float near to, but not quite 0? infinity? -infinity?
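For concreteness, here's roughly what that exercise looks like in Python; add() is the toy function, and the second test class holds the cases the happy-path loop never made us think about:

    import math
    import sys
    import unittest

    def add(a, b):
        return a + b

    class TestAddTheTddWay(unittest.TestCase):
        # The incremental tests from the classic exercise.
        def test_returns_a_number(self):
            self.assertIsInstance(add(1, 2), int)

        def test_sums_two_numbers(self):
            self.assertEqual(add(1, 2), 3)

    class TestAddEdgeCasesNobodyWrote(unittest.TestCase):
        # None of these were part of "congratulations, you're done".
        def test_floats_near_zero(self):
            self.assertAlmostEqual(add(0.1, 0.2), 0.3)   # plain == already fails here

        def test_infinities(self):
            self.assertTrue(math.isnan(add(math.inf, -math.inf)))  # inf + -inf is NaN

        def test_huge_integers(self):
            big = sys.maxsize
            self.assertEqual(add(big, 1), big + 1)        # overflows in many languages

    if __name__ == "__main__":
        unittest.main()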
Provably working code is meaningless. Code that can be proven to meet the functionality required of it - that's what you really want. But that's hard to encapsulate in a slogan (like "Red, Green, Refactor"), and harder to use as a source of quick dopamine hits.
> But this is about unit tests, in which case the "where" is limited to the method you're modifying...
Sure, but the consequences may not be. Here's a class where you have two public functions, A and B. Both call an internal function, C. You're trying to change the behavior of A because of a bug, or a requirement change, or whatever. In the process, you have to change C. The unit tests show that B is now broken. Sure, it's the change to C that is at fault, but without the tests, it's easy to not think of checking B. That's the "where" that the tests point you to.
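A toy Python version of that A/B/C situation, with invented names:

    import unittest

    class PriceCalculator:
        def quote_for_order(self, subtotal):       # public function "A"
            return self._apply_tax(subtotal) + 5   # plus flat shipping

        def quote_for_refund(self, subtotal):      # public function "B"
            return self._apply_tax(subtotal)

        def _apply_tax(self, amount):              # internal helper "C"
            return amount * 1.25                   # changed from 1.20 while "fixing" A

    class TestPriceCalculator(unittest.TestCase):
        def test_order_quote(self):                # updated along with the change to A
            self.assertEqual(PriceCalculator().quote_for_order(100), 130)

        def test_refund_quote(self):               # now fails, pointing straight at B
            self.assertEqual(PriceCalculator().quote_for_refund(100), 120)

    if __name__ == "__main__":
        unittest.main()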
> Provably working code is meaningless. Code that can be proven to meet the functionality required of it - that's what you really want.
Um, yeah, of course that's what you really want. Maybe you should write your tests for that, and not for stuff you don't want? If you don't care about maxint (and you're pretty sure you're never going to), don't write a test for maxint. But it might be worth taking a minute to think about whether you actually do need to handle it (and therefore test for it).
> canonical unit testing and TDD frequently requires monkey patching and dependency injection
This is probably the root of most of our disagreement. I belong to the school of thought that says (in most cases) "Mocking is a Code Smell": https://medium.com/javascript-scene/mocking-is-a-code-smell-... And dependency injection (especially at the unit test level) is a giant warning sign that you need to reconsider your entire architecture.
Nearly all unit tests should look like one of these two shapes (sketched below):
- "I call this function with these arguments, and I get this result."
- "I construct a nice a little object in isolation, mess with it briefly, and here's what I expect to happen."
At least 50% of the junior developers I've mentored can learn to do this tastefully and productively.
But if you need to install 15 monkey patches and fire up a monster dependency injection framework, something has gone very wrong.
But this school of thought also implies that most unit tests have something in common with "integration" tests—they test a function or a class from the "outside," but that function or class may (as an implementation detail) call other functions or classes. As long as it's not part of the public API, it doesn't need to be mocked. And anything which does need to be mocked should be kept away from the core code, which should be relatively "pure" in a functional sense.
This is more of an old-school approach. I learned my TDD back in the days of Kent Beck and "eXtreme Programming Explained", not from some agile consultant.
I feel like you're perhaps defining unit tests to exclude "useful unit tests" by relabeling them. Yes, if you exclude all the useful tests from unit testing, unit testing is useless. Does a unit test suddenly become a regression test if the method it tests contains, say, an unmocked sprintf - which has subtle variations in behavior depending on which standard library you link against? No true ~~scotsman~~ unit test would have conditions that would only be likely to fail in the case of an actual bug?
> But this is about unit tests, in which case the "where" is limited to the method you're modifying
That's still useful. C++ build cycles mean I could easily have touched 20 methods before I have an executable that lets me run my unit tests; being told which 3 of the 20 I fucked up is useful. Renaming a single member variable could easily affect that many methods, and refactoring tools are often not perfectly reliable.
Speaking of refactoring, I'm doing pure refactoring decently often - I might try to simplify a method for readability before I make behavior changes to it. Any changes to behavior in this context are unintentional - and if they occur, 9 times out of 10 I discover I have a legitimate bug. Even pretty terrible unit tests, written by a monkey just looking to increase code coverage and score that dopamine hit, can help here - to say nothing of unit tests that rise to the level of being mediocre.
Further, "where" is not limited to "the method you're modifying". I'm often doing multiplatform work - understanding that "where" is in method M... on platform X in build configuration Y is extremely useful. Even garbage unit tests with no sense of "correctness" to them are again useful here - they least tell me I've got an (almost certainly undesirable) inconsistency in my platform abstractions, likely to lead to platform specific bugs (because reliant code is often initially tested on only one of those platforms for expediency). This lets me eliminate those inconsistencies at my earliest convenience. When writing the code to abstract away platform specific details, these inconsistencies are quite common.
I recently started maintaining a decades old PHP application. I have been selectively placing in unit and functional tests as I go through the massive codebase. Whenever I refactor something I've touched before, I tend to get failing tests. If nothing else, this tells me that some other component I worked on previously is using this module and I should look into how these changes affect it.
Unfortunately, the existing architecture doesn't make dependencies obvious. So simply knowing that "something has changed" is very, very helpful.
I appreciate your answer, but the destructiveness of the reward loop is directly addressed in the PDF.
I also have no problems designing APIs; experience is what you need, and experience in asking the right questions. No amount of TDD will save you from getting half-way through a design and then finding you needed a many-to-many because you misunderstood the requirements.
Other than that, API design is (fairly) trivial. I spend a tiny amount of my overall programming time on it. I will map out entire classes and components for large chunks of functionality, without filling in any of the code, in less than an hour or two and without really thinking about it. Then the hard part of writing the code begins, which might take a month or two, and those data structures and API designs will barely change. I see my colleagues doing similar check-ins: un-fleshed components with the broad strokes mapped out.
I'd say it's similar to how we never talk about SQL normal forms any more. 15 years ago the boards would discuss it at length. Today no-one seems to. Why? Because we understand it and all the junior programmers just copy what we do (without realizing it took us a decade to get it right). API design was hard, but today it's not, we know what works and everyone uses it. We've learnt from Java Date and the vagaries in .net 1.0 or PHP's crazy param ordering and all the other difficult API designs from the past and now just copy what everyone else does.
> I appreciate your answer, but the destructiveness of the reward loop is directly addressed in the PDF.
I've gone back through most of the PDF, and I can't figure out what section you're referring to. There's a bunch of discussion of perverse incentives (mostly involving incompetent managers or painfully sloppy developers, who will fail with any methodology), but I don't see where the author addresses what I'm talking about.
Specifically, I'm talking about the hour-to-hour process of implementing complex features, and how tests can "lead" you through the implementation process. It's possible to ride that red-green-refactor loop for hours, deep in the zone. If a colleague interrupts me, no problem, I have a red test case waiting on my monitor as soon as I look back.
This "loop" is hard to teach, and it requires both skill and judgment. I've had mixed results teaching it to junior developers—some of them suddenly become massively productive, and others get lost writing reams of lousy tests. It certainly won't turn a terrible programmer into a good one.
If anything, the biggest drawback of this process is that it can suck me in for endless productive hours, keeping me going long after I should have taken a break. It's too much of a good thing. Sure, I've written some of the most beautiful and best-designed code in my life inside that loop. But I've also let it push me deep into "brain fry".
> No amount of TDD will save you from getting half-way through a design and then finding you needed a many-to-many because you misunderstood the requirements.
I've occasionally been lucky enough to work on projects where all the requirements could be known in advance. These tend to be either very short consulting projects, or things like "implement a compiler for language X."
But usually I work with startups and smaller companies. There's a constant process of discovery—nobody knows the right answer up front, because we have to invent it, in cooperation with paying customers. The idea that I can go ask a bunch of people what I need to build in 6 months, then spend weeks designing it, and months implementing it, is totally alien. We'd be out of business if it took us that long to try an idea for our customers.
It's possible to build good software under these conditions, with fairly clean code. But it takes skilled programmers with taste, competent management and a good process.
Testing (unit, integration, etc.) is a potentially useful part of that process, allowing you to adapt to changing requirements while minimizing regressions.
My point was that even bad unit tests are better than both good/bad word docs or no unit tests. So even bad tests written by bad programmers have a value. They have a large COST too (which is what he's arguing), my argument was merely that the value might not be less than the cost, which is his assertion.
> Software engineering research has shown that the most cost-effective places to remove bugs are during the transition from analysis to design, in design itself, and in the disciplines of coding. It's much easier to avoid putting bugs in than to take them out.
Everyone knows that. It's just beating a dead horse. But a) companies demonstrably want to ship buggy, feature-rich software fast rather than well-designed, lean software, and b) they don't want to pay for developers of the kind that write good code from the beginning. So with that out of the way, the whole point of real-world development methodology is to make sure that the feature-bloated code hastily written by average developers doesn't lack a specification, doesn't deteriorate over time into something that has to be abandoned, and doesn't carry such a high risk of modification/refactoring that it can't be maintained.
Now, there are some good points where 1.4's "code is better than tests" actually applies: make asserts in the code rather than in the tests, where possible. Fully agree with that. Or, even better, I'd say: don't "assert" things, just make the invalid state impossible. This is what types are for. If you can't take a null into a method, don't even accept an object that could be null by accident. Accept an Option type. Yes, even in Java. Even in bloody C. Anything that a compiler could have caught should be neither a runtime assert nor a test.
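A rough Python analogue of that idea, assuming a type checker such as mypy; the functions are invented for illustration:

    from typing import Optional

    def send_welcome_email(user_email: str) -> None:
        # The signature promises a real str, never None, so no runtime
        # assert and no unit test for the None case are needed here.
        print(f"Sending welcome email to {user_email}")

    def lookup_email(user_id: int) -> Optional[str]:
        # May legitimately find nothing.
        return {1: "alice@example.com"}.get(user_id)

    email = lookup_email(42)
    # send_welcome_email(email)    # rejected by the type checker: Optional[str] is not str
    if email is not None:          # the caller is forced to handle the missing case
        send_welcome_email(email)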
> if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
Well, firstly, they are usually better than no tests. And second, they tend to fail too often rather than too rarely. This is what causes the enormous cost incurred by these tests. But apart from that 90% unnecessary cost (the test failures are failures of the tests rather than of the code, so after changing the code you have to change the tests), they still add a lot of confidence for refactoring. Because in my experience with a large but bad test suite, there are often false positives, but false negatives are much rarer.
The logical conclusion of this line of argument, which is dependent on the proposition that even bad unit tests have value, is that we could just automatically generate copious numbers of unit test cases for a given code base, and we would have something of value. In reality, this approach would be useless even for functionally-invisible refactoring or rewriting, precisely because it would be at the unit level. For that, let alone any substantive changes, Coplien's claim that higher-level tests are better stands, so this article cannot be dismissed as "beating a dead horse."
> The logical conclusion of this line of argument, which is dependent on the proposition that even bad unit tests have value, is that we could just automatically generate copious numbers of unit test cases for a given code base, and we would have something of value.
Actually, this is a surprisingly useful strategy! There are two common versions of this idea (both sketched below):
1. Invariant testing with random data. This usually goes by the name of "QuickCheck", and it allows writing tests that say things like, "If we reverse a random list twice, then it should equal the original list." In practice, this catches tons of corner case bugs.
2. Fuzz testing. In this case, the invariant is, "No matter how broken the input, my program should never crash (or corrupt the heap, or access uninitialized memory, or whatever)." Then you generate a billion random test cases and see what happens. This usually finds tons of bugs, and it's a staple of modern security testing.
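A sketch of both flavours using Python's Hypothesis library (run under pytest; parse_age is an invented toy parser):

    # pip install hypothesis
    from hypothesis import given, strategies as st

    # 1. Invariant ("QuickCheck-style") testing: state a property and let
    #    the framework generate hundreds of random inputs for it.
    @given(st.lists(st.integers()))
    def test_reversing_twice_is_identity(xs):
        assert list(reversed(list(reversed(xs)))) == xs

    # 2. A fuzz-flavoured property: whatever bytes we throw at the parser,
    #    it must either return a value or raise ValueError - never anything worse.
    def parse_age(raw):
        text = raw.decode("utf-8", errors="replace").strip()
        if not text.isdigit():
            raise ValueError(f"not a number: {text!r}")
        return int(text)

    @given(st.binary())
    def test_parser_never_blows_up(raw):
        try:
            parse_age(raw)
        except ValueError:
            pass   # rejecting bad input is fine; any other exception is a bug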
So yeah, even random test cases are very valuable. :-)
These are indeed useful strategies, but they do not invalidate the point I am making here. Fuzz testing is system-level testing, not unit tests, and so is along the lines Coplien is proposing (especially when combined with assertions in the code.) Invariant testing implies an invariant, i.e. an interface-level design contract, while the post I was replying to was predicated on systems where "the code is the spec", and my reply considers only automated unit-test generation that is based only on what the code does, not what it is supposed, in more abstract terms, to do. The "logical conclusion" is based on the possibility of automatically generating unit test cases that function as such, up to any given level of coverage, without adding value.
OK, just to back up a step, here's my best argument in favor of automated testing:
Virtually nobody just makes a change and ships it to users without some kind of testing. Maybe you run your program by hand and mess around with the UI. Maybe you call the modified function from a listener with a few arguments. Maybe you have sadistic paid testers who spend 8 weeks hammering on each version before you print DVDs. (And those testers will inevitably write a "test plan" containing things to try with each new release.)
The goal of automated testing is to take all those things you were going to do anyway (calling functions, messing with the UI, following a test plan), and automate them. Different kinds of testing result in different kinds of tests:
1. Manually calling a function can be replaced by unit tests, or even better, doc tests. Everybody loves accurate documentation showing how to use an API!
2. Manually messing around with the UI can be replaced by integration tests.
3. Certain things are much faster to test thoroughly in isolation, so you may mock out parts of your system while testing other parts in great detail, then tie everything together with a few system-wide tests.
People love to invent elaborate taxonomies of testing (unit, request, system, integration, property, etc.). But really, they're all just variations on the same idea: You'd never ship code without testing it somehow, and it's boring and error-prone to do all that testing by hand every time. So automate it!
No-one is arguing against automated testing - in fact, I have said elsewhere that I think test automation is the best single thing most development organizations could do to improve their process. In the article, Coplien says that his position is often mistaken for an attack on test automation, which it is not - and it is certainly not an argument against testing, either; it is an argument about how to test effectively.
What I am doing here is using the idea of automatic unit test generation to show that Coplien's argument cannot be dismissed with the claim that all unit testing adds value.
> The logical conclusion of this line of argument, which is dependent on the proposition that even bad unit tests have value, is that we could just automatically generate copious numbers of unit test cases for a given code base, and we would have something of value.
I don't think that conclusion is any more valid than the proposition that if we just skipped the developer altogether and generated both the business code AND the test, we'd have something of value.
> Coplien's claim that higher-level tests are better stands, so this article cannot be dismissed as "beating a dead horse."
I think higher level tests are good, but I don't think that makes lower level tests bad. I think there should usually be a healthy mix, with systems tested at several levels.
> I don't think that conclusion is any more valid than the proposition that if we just skipped the developer altogether and generated both the business code AND the test, we'd have something of value.
That would have some value, as evidenced by all the useful spreadsheets produced by end users, but the value of the conjunction ("the business code AND the test") lies entirely in the 'business code' part, which is why we do not automatically generate unit tests for spreadsheet macros, or any other code for that matter.
>I think higher level tests are good, but I don't think that makes lower level tests bad.
That's not my point; my point is that unit tests are not automatically useful. Once we accept that, Coplien's argument that there are better ways to spend our limited resources cannot be dismissed (and resources are always limited, because of the combinatorial explosion, which is most evident at the unit level).
Any idiot can ship version 1.0. I get exasperated with people who try to act like they have some special wisdom to share about getting the first version out. Yes, there are some specific things you need to do that some of us lack, but you can find those in very nearly every single self-help or motivational book.
It takes some proper skill, vision, and maturity to ship version 3.0. And you probably won’t get a big payday if you can’t.
> ... if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
Without getting into the greater unit testing debate... the point would be having a testable code base.
Those who test-first guarantee that tests exist and that code can be exercised in a test-harness. Those who do not generally have code that cannot be exercised in a test-harness. Granting bad developers who write bad code and bad tests: they're at the least writing tested bad code. That lowers the burden to fixing it, and ensures that any given issue will be solved in isolation instead of requiring a major, risky, rewrite to get the system to the point of further maintenance.
Having that test apparatus puts your maintenance developers on a much nicer foot and provides the blueprints for migrating, abstracting, or replacing most any system component, along with meaningful quality baselines. It's a mitigation technique for the high-level big-bang rinse-and-repeat system cycles that pop up in the Enterprise space, but no silver bullet.
Bad tests can be deleted or improved with impunity, but a bad logical system core made with no eye towards verification is expensive and risky to even touch. For a small website that means nothing; for a complex legacy monster system about to be rewritten for the 4th time it can make all the difference in the world.
Testable code is not a good thing in and of itself. Code should only be testable at its high level interfaces i.e. interfaces that are close to the requirements. So if the code is a library then the high level interface may be the public interface of a class. In that case you may need single class tests. But to do anything useful (in the sense of high level requirements) most code uses a combination of various different classes or modules. Write tests at this level because it gives you two advantages:
1) If you decide to change your implementation (refactor low-level code to make it more performant/re-usable/maintainable) then you won't have to change your tests.
2) It will simplify your implementation code.
The second point is as important as the first: In general your code should have the minimum number of abstractions needed to satisfy your requirements (including requirements for re-use/flexibility etc). Testable code often introduces new abstractions or indirections just for the sake of making it possible to plug in a test. One or two of these are OK. But when you add up all the plug points for low level tests in a system it can make the code a lot more complex and indirect than it needs to be. This actually makes it harder to change which was not the idea behind making it testable in the first place.
GUI code may be an exception here. Because GUIs are often slow and unstable to test, it is worth cleanly separating the GUI from the model just to make the system more testable, even if you never intend to use the model with any other kind of UI (the usual motivation for separating model and GUI). Any model-view system will help here. If you have a system where the GUI is more or less a pure function of the model, then you can write all your scenario/stateful tests at the model level and just write a few tests for the UI to verify that it reflects the model accurately.
Well-structured code is a good thing in and of itself, and more testable code is inherently more well-structured (or rather, untestable code is inherently badly structured; it's still possible to have testable but badly structured code, but if you insist your code be testable then you will avoid many of the pitfalls).
> Testable code is not a good thing in and of itself. Code should only be testable at its high level interfaces i.e. interfaces that are close to the requirements. So if the code is a library then the high level interface may be the public interface of a class.
Perhaps I misunderstand you, but I have to disagree with this.
Testable code is absolutely a good thing in and of itself.
When something breaks, being able to isolate each piece of the codebase and figure out why it's breaking is essential.
Far too often I've got bug reports from other developers saying "System X is broken, what changed?". When I ask for more detail, a test case proving it's broken, etc - they can't come up with that. Their test case is "our entire service is broken and I think the last thing that happened was related to X". Sometimes it is a problem with System X, but more frequently it's something else in their application.
The smaller you can make your test case, the better, and the easier it is to get a grasp of the actual problem.
In my experience, it would be pretty hard to consistently write unit tests that pass and at the same time don't test the code. They might not test the code completely enough, but they will test the code. That's still better than not testing at all, and it can always be fixed by adding tests later.
Tests don't have to be smarter than the code. They just have to be different code. They're screening tests, not diagnostic tests - if the test and the code disagree, you might have a problem. If they don't, then hopefully you don't.
But exactly these tests tend to be the useless ones that just pass and don't find bugs.
On 2 separate occasions I had an app with extensive unit tests that seemed to work fine but also seemed to have strange rare bugs.
Both times I wrote a single additional "unit" test that fired up the environment (with mocks, same as the other unit tests) but then acted like a consumer of the API and spammed the environment with random (but not nonsense) calls for several minutes. These tests were quite complex so basically the exact opposite of what you're suggesting.
Not only did I immediately find the bug, but in both cases I found like 10 bugs before I got the test to even pass for the first time.
At the same time all the small unit tests were happily passing. Because they didn't hit edge cases (both in data and in timing) that nobody had thought of.
Yeah but the cases I had in mind, I probably coded correctly. The cases I forgot about are more likely to cause problems (weird cases in both data and especially timing / concurrency handling).
>, if bad programmers write bad code that needs unit tests, they're also going to write bad unit tests that don't test the code correctly. So what's the point?
I don't think one necessarily follows the other. Consider the examples of test pseudo-code:
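Roughly something like this sketch, where image_to_text() is the hypothetical function under test and the image files are placeholders:

    def test_image_to_text_recognizes_a_cat():
        assert image_to_text("photo1.jpg") == "cat"   # image_to_text is the function under test

    def test_image_to_text_recognizes_a_dog():
        assert image_to_text("photo2.jpg") == "dog"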
A novice bootcamp graduate could write that test harness code.
However, it requires an experienced programmer with a PhD in Machine Learning to write the actual neural net code for image_to_text(). (Relevant XKCD[1])
(Yes, there's also variability of skill in writing test code. An experienced programmer will think of extra edge cases that a novice will not and therefore, write more comprehensive tests.)
That still doesn't change the fact that writing code that checks for correct results is often easier than writing the algorithm itself.
It also contradicts the notion that refactoring code requires that the test code be refactored. That's not always true. Since the jpg file format came out in 1992, that test code could have been written in that same year and it would still be valid today, 25 years later. We still want to ensure the image_to_text() function will return "cat" for "photo1.jpg" even if we refactor the low-level code to use the latest A.I. algorithms.
Another example of a regression test to test performance:
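Say, something along these lines (search_products is just a placeholder for whatever is being timed):

    import time

    def test_search_completes_within_500ms():
        # Performance timing specification: executes in less than 500ms.
        start = time.perf_counter()
        search_products("wireless headphones")   # hypothetical function under test
        elapsed_ms = (time.perf_counter() - start) * 1000
        assert elapsed_ms < 500, f"search took {elapsed_ms:.0f}ms"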
Again, it's easy for a novice programmer to write a performance timing specification of "executes in less than 500ms". It's much harder to refactor the code from slow O(n^2) code to faster O(n log n) code.
To do a meta-analysis of the debate with Coplien: a big reason we're arguing about his conclusion is that he complains about this abstract thing he calls "unit tests". If he had copy-pasted concrete examples of the bad unit testing code into his PDF, maybe more of us would agree with him? In other words, I'd like to see and judge for myself whether Coplien's "unit tests" look like the regression tests that SQLite and NASA use to improve quality -- or -- whether they are frivolous tests that truly waste developers' time. Since he didn't do that, we're all in the dark as to the "test code waste" he's actually complaining about.
Yes, and furthermore, due to the combinatorial explosion, and as explained in the article, unit tests can only cover an infinitesimal fraction of all the possibilities at that level. The sort of interface-based testing favored in the article ameliorates the problem because it uses the inherent abstractions of the code to prune the problem space.
In addition, if you depend on unit tests, you cannot tell if a change has broken the system - you can expand or rewrite your tests to cover the change (if we put aside the question of what requirements these tests are testing), but that cannot tell you if you have, for example, violated a system-wide constraint. Only testing at higher levels of abstraction, up to the complete-system level, can help you with that.
Therefore, to the extent that the "the code is the specification" argument is a valid one, it actually leads to the conclusion that Coplien is right.
In my opinion, the way that most unit tests are written is mostly a waste of effort, but there are a class of unit tests that are much better suited to the task. The class of unit tests I'm referring to is property-based tests.
The most famous property-based testing framework is QuickCheck, but there are plenty of others, e.g. FsCheck for C#/F#, ScalaCheck for Scala, Hypothesis for Python, etc...
The general idea being that you write out a specification that the code should meet, and the test parameters are determined at runtime, rather than being hardcoded. There's more to it than that, but that's the general idea. If you want a simple introduction, I'd recommend this video:
That's actually one of the reasons I like Elixir's approach to inline documentation so much. If you include a code example, the example is run as part of the test suite. It makes it dead simple to automatically include simple unit tests right where the code is, along with documentation that has to remain up to date.
This is also somewhat popular in Python, Haskell and Rust. Nice, but it doesn't seem applicable to all situations. Works quite well for collections of simple independent functions, e.g. I did that for a regex library https://github.com/myfreeweb/pcre-heavy/blob/9873fb019873340... — but didn't use it much for other projects…
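In Python it looks roughly like this (a toy example using the standard doctest module):

    def make_slug(title):
        """Turn a title into a URL slug.

        The example below is documentation *and* a unit test:

        >>> make_slug("Hello World, Again")
        'hello-world-again'
        """
        cleaned = "".join(c for c in title if c.isalnum() or c.isspace())
        return "-".join(cleaned.lower().split())

    if __name__ == "__main__":
        import doctest
        doctest.testmod()   # fails loudly if the docstring example goes stale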
Yea, definitely not for everything. It's more of a complement to the rest of the test suite and also verifies that the documentation/examples in code comments are still accurate.
> That is, the only specification for how the system should work is how it worked yesterday.
You'd think, if this was the desired model, you could partially automate the "writing unit tests" part of this process. (Integration tests no, but unit tests yes.) The "spec" for the unit tests is already there, in the form of the worktree of the previous, known-good commit.
That means that, in a dynamic language, you'd just need a "test suite" consisting of a series of example calls to functions. No outputs specified, no assertions—just some valid input parameters to let the test-harness call the functions. (In a static language, you wouldn't even need that; the test-harness could act like a fuzzer, generating inputs from each function's domain automatically.)
The tooling would then just compare the outputs of the functions in the known-good build to the outputs of the same functions from your worktree. Anywhere they differ is an "assertion failure." You'd have to either fix the code, or add a pragma above the function to specify that the API has changed. (Though, hopefully, such pragmas would be onerous enough to get people to mostly add new API surface for altered functionality, rather than in-place modifying existing guarantees.) A pre-commit hook would then strip the pragmas from the finalized commit. (They would be invalid as of the next commit, after all.)
Interestingly, given such pragmas, the pre-commit hook could also automatically derive a semver tag for the new commit. No pragmas? Patch. Pragmas on functions? Minor version. Pragmas on entire modules? Major version.
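A bare-bones sketch of that harness idea in Python (make_slug and apply_tax stand in for real functions; real tooling would enumerate functions and generate inputs automatically):

    import json
    import pathlib

    # Toy functions standing in for the code base under test.
    def make_slug(title):
        return "-".join(title.lower().split())

    def apply_tax(amount):
        return round(amount * 1.2, 2)

    # The "test suite": just example calls, no expected outputs or assertions.
    EXAMPLE_CALLS = {
        "make_slug": (make_slug, ("Hello World",)),
        "apply_tax": (apply_tax, (100.0,)),
    }

    GOLDEN = pathlib.Path("golden_outputs.json")

    def record():
        # Run against the known-good build and store its outputs.
        results = {name: fn(*args) for name, (fn, args) in EXAMPLE_CALLS.items()}
        GOLDEN.write_text(json.dumps(results, indent=2))

    def check():
        # Run against the current worktree and diff against the recording.
        golden = json.loads(GOLDEN.read_text())
        for name, (fn, args) in EXAMPLE_CALLS.items():
            current = fn(*args)
            if current != golden[name]:
                print(f"ASSERTION FAILURE: {name}{args} -> {current!r}, "
                      f"known-good build returned {golden[name]!r}")

    if __name__ == "__main__":
        if not GOLDEN.exists():
            record()
        check()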
Ah, but the spec is often really poorly defined. That is, all sorts of edge cases and uncommon code paths do not work as imagined. And, of course, it is full of bugs.
I have found countless "bugs" and issues when writing tests for existing code. I also often have to refactor the code to make it clearer what it's doing or to make it testable.
Unit tests are a development tool more than they are a testing tool. They are a means to produce correct, well documented, well speced, cleanly separated code.
> The normal practice for large scale codebases in complex domains is "the code is the spec".
"The code is the spec" is just a way of saying "we don't know what the code is supposed to do, and the intent was never, even initially, properly validated by domain experts, or, if it was, we tossed the knowledge generated in that process."
This is especially the case if that phrase is used about a large system in a complex domain.
I don't see such a large overlap between code review and unit testing. Here's an example from one of my code reviews, where I discovered two issues:
1) A relaxed requirement was agreed to by the PO, but the remote developer wasn't aware of that and wrote code to fulfil the original complex requirement. The implementation was harder to understand and touched more areas of the code, leading to issue number 2.
2) By analysing the interactions between multiple units, I was able to determine that the code as implemented was in fact reading a setting too early, when it was not available, thereby having no effect at all compared to the already existing code. Ironically, this exact concern had prompted the negotiation with the PO which resulted in the relaxed requirement.
Interestingly the error was not caught by unit tests, because it didn't happen in a single unit. It wasn't caught by integration tests, because that particular online component was mocked and it passed code review by two other developers.
How many times do you need to repeat the review of a given unit of code? (Actually, that would have more chance of catching additional bugs than rerunning the same unit tests would.) Once you change the code, the existing unit tests are, by definition, invalidated (i.e. there is no value in being able to rerun them), while most higher-level testing will retain its validity. You have actually created an argument for automated higher-level testing (the author mentions that his position is often mistaken for an attack on test automation).
By definition, unit tests test the implementation. A test at the API design contract level is an integration test. The only place where these are the same thing are pure (side-effect-free) functions that do not call other functions.
... unit testing is automated, repeatable, code review!
Except when it isn't.
"Unit tests can be one form of code review (that also happens to have the advantage of being automated and repeatable). But should never be mistaken as substitute for actually understanding what the code does, why it does it and how it got that way. Which requires an incompressible amount of effort, analysis, and plain and pure grit", would be another take on the matter.
> Even tests that just duplicate the business code under test and assert that it's the same (A huge waste in normal cases) is useful.
Maybe, but there's no reason to write such tests by hand. Just automatically test each method against the previous version and see what changed.
I think the generated results would be too noisy and developers would stop paying attention to test failures, but maybe a strict enough code review process could prevent that.
Most unit tests don't test functionality at the business requirement level. Most unit tests I've seen are more fine-grained. I think keeping your requirement documentation as comments on unit tests that are literal translations of that documentation is amazing and very helpful.
But I would argue that represents in my experience just 10% of the unit tests I see. Instead I see a lot more testing of implementation details.
I agree. Especially since most projects I worked on didn't have a doc at all. When the doc existed, it was always obsolete. Unit tests are not a silver bullet, but if you require passing tests and coverage before going to prod, you basically require a minimal and always up-to-date form of documentation.
And even with all the (very real) downsides, it's still a huge plus.
Besides, a lot of people tend to see tests in a very rigid way. You don't have to do TDD. You don't have to have full coverage. You don't even need all your tests to be unit tests; you can mix end-to-end and functional tests in. You can have hacky mocks in there too. It doesn't need to be perfect for your team to benefit from it.
Actually, I would say that you probably want it not to be perfect, so that the benefits outweigh the costs. Because hey, tests have a huge cost.