Researchers have a problem recruiting subjects. There is often a trade-off: apply a rigorous, controlled experimental design by relying on convenience sampling (students), or recruit professionals and sacrifice the controlled environment (since that is often the only way professionals will join the study at all).
It's easy to condemn work like this, but there's often no other option. In this case the researchers chose to replicate a study (which itself often risks similar ire for telling us nothing new) with a commendable level of rigour, and they have provided more evidence that, within the scope of experiments we can actually construct, TDD is probably no different to test-last development (TLD) for a population of relatively inexperienced developers (students).
As to the problem being trivial, what else can be done? There's a finite amount of time you can ethically expect participants to give you, even if you pay them. If anything, the criticism of this work is better directed at the limitations academics are forced to bear.
Your argument reminds me of the joke about the man searching for his keys under the streetlamp.
"A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, 'this is where the light is.'" [https://en.wikipedia.org/wiki/Streetlight_effect]
I agree that this is a well-designed study given its constraints. And it's admirable that it's a replication study.
That doesn't change the fact that it's largely irrelevant to professionals. It doesn't test the claims made by TDD proponents (TDD leads to better design, reduces long-term maintenance, allows for team coordination, etc.), nor does it address any of the interesting questions about TDD:
* Is TDD more effective in a professional setting than commonly-used alternatives?
* Is a mock-heavy approach to TDD more effective than a mock-light approach?
* Do people using TDD refactor their code more or less than people using a different but equally rigorous approach?
* Is code written with TDD more maintainable than code written rigorously in another way?
* Is TDD easier or harder to sustain than equivalently-effective alternatives?
As a study, it's fine, if only of interest to academics. The problem isn't the study. It's the credulous response on the part of industry developers who then turn the false authority of the study into statements like "TDD doesn't lead to higher quality or productivity."
I skimmed the Feitelson paper you linked, and this jumped out at me:
"Students should generally not be used in studies that depend on specific expertise which requires significant experience and a long learning curve to achieve, or in studies of professional practices.
Such studies are best performed by observing and interviewing professionals, not by controlled experiments."
This seems directly relevant to the TDD study under discussion. Every one of those contraindications holds in this case.