Am I correct in reading that they performed this experiment only for two days, and entirely with graduate students?
If so, they have missed the point of TDD.
In the short term, TDD probably doesn't make a difference, one way or another.
But software as a business is not a short-term game.
I would love to see a study where the participants are, over a period of six months, given the same series of features (including both incremental improvements and major changes in direction).
In my experience, teams that don't test at all quickly get buried in technical debt.
Untested code is nigh impossible to refactor, so nobody ever does, and the end result is usually piles of hacks upon piles of hacks.
As far as testing after development goes, there are three problems that I see regularly:
One, tests just don't get written. I have never seen a TLD (Test Later Development) team that had comprehensive code coverage. If a push to production on Friday at 6pm sounds scary, then your tests (and/or infrastructure) aren't good enough.
Two, tests written after code tend to reflect what was implemented, not necessarily what was requested. This might work for open-source projects, where the developers are also the users, but not so much when building, say, software to automate small-scale farm management.
Three, you lose the benefit of tests as a design tool. Code that is hard to test is probably not well-factored, and it is much easier to fix that while writing the tests than it is to change the code afterward (a small sketch follows below).
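A minimal sketch of what I mean by the design-tool point, with hypothetical names (invoices, a payment gateway, a fixed clock): code whose collaborators are passed in as parameters is both easy to test and reasonably factored, and writing the test first tends to push you toward that shape.

    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class Invoice:
        due: datetime
        amount: int

    class FakeGateway:
        # Stand-in for a real payment provider; records what was charged.
        def __init__(self):
            self.charged = []

        def charge(self, invoice):
            self.charged.append(invoice)

    def charge_overdue_invoices(invoices, gateway, now):
        # All collaborators are parameters, so the test below needs no
        # database, no real payment provider, and no real clock.
        for invoice in invoices:
            if invoice.due < now:
                gateway.charge(invoice)

    def test_charges_only_overdue_invoices():
        now = datetime(2016, 1, 1)
        overdue = Invoice(due=now - timedelta(days=1), amount=100)
        not_due = Invoice(due=now + timedelta(days=1), amount=200)
        gateway = FakeGateway()
        charge_overdue_invoices([overdue, not_due], gateway, now)
        assert gateway.charged == [overdue]

    test_charges_only_overdue_invoices()

If the same function instead constructed its own database connection and called datetime.now() internally, the test would be painful to write, and that pain is the design feedback.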
The goal of the study was not to measure if testing is valuable at all, but to measure if there is any difference between TDD and TLD. I think no-one questions the value of testing per se.
As for your points on why TLD is theoretically worse:
1. "in TLD tests just don't get written" - this is a dual argument to "in TDD the code just doesn't get refactored". After tests are green, there is no motivation to do so and it feels like a waste of time (if it works, why change it, right?)". I think the latter is much, much worse for longevity of the project than low test coverage. Not enough coverage doesn't mean automatically bad structure of the code and you can always fix that by adding more tests later (and maybe fixing a few bugs detected by them). But writing code quickly to just make the tests green and then not doing refactoring quickly leads to bad structure of the code very quickly. Whether you chose TDD or TLD, you need to apply some discipline: in TDD to refactor often, in TLD to keep high test coverage.
2."tests written after code tend reflect what was implemented, not necessarily what was requested" - a dual argument for this exists as well: "code written after tests tends to reflect the tests on case-by-case basis, not necessarily the minimal general solution covering all possible inputs that should be the true goal of the implementation". Whenever I hear the argument that tests help write better code I always remind myself a famous Sudoku Solver written by Ron Jeffreys, TDD proponent: http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s...
I also saw this happen in a few real-world projects - the code written after the tests was just a giant if-else (or switch) ladder handling each test case separately. Bad, bad code, and it also missed a few important cases. The funniest part was that after spotting this in code review, I rewrote the implementation in a more general way, got two tests failing, and after investigating it turned out the tests were wrong and the code was right. Lol, verifying tests by implementation :D
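For illustration, a toy reconstruction of that antipattern (the shipping-cost rule and all names are invented, not from the actual project): the implementation mirrors the sampled test cases instead of the rule the tests were sampling.

    # Suppose the tests only checked: cost(1) == 5, cost(2) == 10, cost(3) == 15.

    def shipping_cost_case_by_case(items):
        # "Green" for exactly the three tested inputs, wrong for everything else.
        if items == 1:
            return 5
        elif items == 2:
            return 10
        elif items == 3:
            return 15
        return 0

    def shipping_cost_general(items):
        # The minimal general rule the tests were sampling: 5 per item.
        return 5 * items

    assert shipping_cost_general(1) == 5
    assert shipping_cost_general(4) == 20      # untested input still correct
    assert shipping_cost_case_by_case(4) == 0  # the ladder silently fails here

Both versions pass the original test suite, which is exactly the problem.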
3. Tests are not a design tool. Tests are for... testing. They are often a cause of over-engineering and over-abstraction that makes the code more complex and harder to read. See FizzBuzz Enterprise Edition - it is definitely testable to death.
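A toy sketch of the kind of indirection that "testability" can invite (all names invented, nothing to do with FizzBuzz Enterprise Edition itself): a strategy, a factory, and an injected collaborator where a one-line function would do. Every piece is trivially mockable in isolation, and the whole is harder to read than the plain version.

    class AdditionStrategy:
        def apply(self, a, b):
            return a + b

    class Calculator:
        def __init__(self, strategy):
            self._strategy = strategy

        def add(self, a, b):
            return self._strategy.apply(a, b)

    class CalculatorFactory:
        def create(self, strategy):
            return Calculator(strategy)

    # The version that needed no factory, no strategy, and no mocks:
    def add(a, b):
        return a + b

    assert CalculatorFactory().create(AdditionStrategy()).add(2, 2) == add(2, 2)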
"Branches aren't merged without peer approval" has sufficed plenty in my experience.
Whether people write tests before thinking about interfaces, before writing a prototype, before implementing, during the inevitable interface rewrites, after coding, after manually verifying that it seems to work, or right before submitting the branch for review doesn't matter, as long as someone on the team looks it over and sees that "yup, here are tests and they seem to cover the important parts, the interface makes sense, and the documentation is useful".
Then people can do TDD, TLD, TWD or whatever they personally feel most productive with. Developers being happy and feeling in control of their own work does more for quality than enforcing a shared philosophy.