Is TDD Dead? (2014) (martinfowler.com)
445 points by cik on Aug 26, 2020 | 476 comments



The best talk on this topic, IMO is Ian Cooper: "TDD, Where did it all go wrong?" https://www.youtube.com/watch?v=EZ05e7EMOLM

Couple of notes:

- TDD, much like scrum, got corrupted by the "Agile Consulting Industry". Sticking to the original principles as laid out by Kent Beck results in fairly sane practices.

- When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.

- What triggers writing a test is what matters. Overzealous testers test for each new public method in a class. This leads to testing on implementation details of the actual unit, because most classes are only consumed within a single unit.

- Behavior driven testing makes most sense to decide what needs tests. If it's required behavior to the people across the boundary, it needs a test. Otherwise, tests may be extraneous or even harmful.

- As such, a good trigger rule is "one test per desired external behavior of the unit, plus one test per bug fixed". The test for each bugfix comes from experience -- they delineate tricky parts of your unit and enforce working code around them.
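A minimal sketch of that trigger rule (the cart/Total names here are invented for illustration, not from the talk):

```go
package cart

import "testing"

// Total is a hypothetical stand-in for the unit under test: it sums item
// prices and applies a flat 10% tax.
func Total(prices []int) int {
	sum := 0
	for _, p := range prices {
		sum += p
	}
	return sum + sum/10
}

// One test per desired external behavior of the unit.
func TestTotalAppliesTax(t *testing.T) {
	if got := Total([]int{100}); got != 110 {
		t.Fatalf("Total = %d, want 110", got)
	}
}

// Plus one test per bug fixed, pinning down the tricky case that once broke.
func TestTotalOfEmptyCartIsZero(t *testing.T) {
	if got := Total(nil); got != 0 {
		t.Fatalf("Total = %d, want 0", got)
	}
}
```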


If I recall correctly, another very important point he makes in that talk is that it's fine to delete tests. TDD tends to result in a lot of tests being created as a sort of scaffolding to support initial development. The thing about scaffolding is, when you're done using it, you tear it down.

I don't think he mentions it during the talk, but the next step after deleting all those tests is a little bit more refactoring for maintainability. Now that you've deleted all the redundant tests, you can then guard against future developers unwittingly becoming tightly coupled to your implementation details, by taking all the members that used to only be exposed for testing purposes, and either deleting them or making them private.


Yes, you should delete tests for everything that isn't a required external behavior, or a bugfix IMO.

Otherwise you're implicitly testing the implementation, which makes refactoring impossible.

A big smell here is if the large majority of your tests are mocked. This might mean you're testing at too fine-grained a level.


> you should delete tests for everything that isn't a required external behavior

Wait, I'm terribly confused here.

Aren't a huge part of tests to prevent regression?

In attempting to fix a bug, that could cause another "internal" test to fail and expose a flaw in your bugfix that you wouldn't have caught otherwise. And it's not uncommon for your flawed bugfix to not cause an "external" test to fail, because it's related to the codepath there was never a good enough external test for in the first place -- hence why the bug existed.

I can't imagine why you would ever delete tests prematurely. I mean, running internal tests is cheap. I see zero benefit for real cost.

And not only that, when devs don't document the internal operation of a module sufficiently, keeping the tests around serves as at least a kind of minimal reference of how things work internally, to help with a future dev trying to figure it out.

If you're refactoring an implementation, then obviously at that point you'll delete the tests that no longer apply, and replace them with new ones to test your refactored code. But why would you delete tests prematurely? What's the benefit?


> In attempting to fix a bug, that could cause another "internal" test to fail and expose a flaw in your bugfix that you wouldn't have caught otherwise.

If an external test passes and an internal test fails, the external test isn't really adding any value, is it? And if the root of your issue is "What if test A doesn't test the right things", doesn't the whole conversation fall apart (because then you have to assume that about every test)?

IME this is a common path most shops take. "We have to write tests in case our other tests don't work." Which is a pretty bloated and wildly inefficient answer to "Our tests sometimes don't catch bugs." Write good tests, manage and update them often. Don't write more tests to accommodate other tests being written poorly.

> I mean, running internal tests is cheap.

Depends on your definition of cheap, I guess.

My last job was a gigantic rails app. Over a decade old. There were so many tests that running the entire suite took ~3 hours. That long of a gap between "Pushed code" and "See if it builds" creates a tremendous amount of problems. Context switching is cost. Starting and unstarting work is cost.

I'm much more of the "Just Enough Testing" mindset. Test things that are mission critical and complex enough to warrant tests. Go big on system tests, go small on unit tests. If you can, have a different eng write tests than the eng that wrote the functionality. Throw away tests frequently.


I understand what you're saying, but in my experience that's not very robust.

I've often found that an internal function might have a parameter that goes unused in any of the external tests, simply because it's too difficult to devise external tests that will cover every possible internal state or code path or race condition.

So the internal tests are used to ensure complete code coverage, while external tests are used to ensure all "main use cases" or "representative usage" work, and known frequent edge cases.

That doesn't mean the external tests aren't adding value -- they are. But sometimes it's just too difficult to set up an external test to guarantee that a deep-down race condition gets triggered in a certain way, but you can test that explicitly internally.

It's not that anyone is writing tests poorly, it's just that it simply isn't practically feasible to design external tests that cover every possible edge case of internal functionality, while internal tests can capture much of that.

And if your test suite takes 3 hours to run, there are many types of organizational solutions for that... but this is the first I've ever heard of "write fewer tests" being one of them.


> I've often found that an internal function might have a parameter that goes unused in any of the external tests,

It seems that you're still thinking about "code". What if you thought about "functionality"? If an external test doesn't test internal functionality, what is it testing?

> But sometimes it's just too difficult to set up an external test to guarantee that a deep-down race condition gets triggered in a certain way, but you can test that explicitly internally.

I would argue that if you're choosing an orders of magnitude worse testing strategy because it's easier, your intent is not to actually test the validity of your system.

> while internal tests can capture much of that.

We can agree to disagree.

> And if your test suite takes 3 hours to run, there are many types of organizational solutions for that... but this is the first I've ever heard of "write fewer tests" being one of them.

I was speaking about a real scenario that features a lot of the topics that you're describing. My point was not that it was good, my point was that testing dogmatism is very real and has very real costs. To describe writing/running lots of (usually unnecessary) tests as "cheap" is a big red flag.


Not the poster you replied to, but I've been thinking of it lately in a different way. Functional tests show that a system works, but if a functional test fails, the unit test might show where/why.

Yes, you'll usually get a stack trace when a test fails, but you might still spend a lot of time tracing exactly where the logical problem actually was. If you have unit tests as well, you can see that unit X failed, which is part of function A. Therefore you can fix the problem quicker, at least for some set of cases.


It is a combinatorial explosion problem.

Internal code A has 5 states, piece B has 8 states.

Testing them individually requires 13 tests.

Testing them from the outside requires 5x8=40 tests.

Now, if you think of it that way, maybe you _do_ want to test the combinations, because that might be a source of bugs. And if you do it well, you don't actually need to write 40 tests; you can have some mechanism to loop through them.
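A rough sketch of that looping mechanism in Go (Combined and its two state ranges are invented for illustration; the expected value would normally come from a spec or a table, not from re-stating the code):

```go
package combo

import (
	"fmt"
	"testing"
)

// Combined is a hypothetical stand-in for the externally visible behavior
// built from piece A (5 states) and piece B (8 states).
func Combined(a, b int) bool { return a < 3 || b%2 == 0 }

// One test function loops through all 5x8 = 40 combinations instead of
// writing 40 separate tests.
func TestCombinedAllStates(t *testing.T) {
	for a := 0; a < 5; a++ {
		for b := 0; b < 8; b++ {
			t.Run(fmt.Sprintf("a=%d/b=%d", a, b), func(t *testing.T) {
				want := a < 3 || b%2 == 0 // in real code: taken from the spec
				if got := Combined(a, b); got != want {
					t.Errorf("Combined(%d, %d) = %v, want %v", a, b, got, want)
				}
			})
		}
	}
}
```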

But the basic argument is that the complexity of the 40 test-cases is actually _more_ than the 13 needed testing the internal parts as units.

FWIW, my own philosophy is to write as much pure-functional, side-effect free code that doesn't care about your business logic as possible, and have good coverage for those units. Then compose them into systems that do deal with the messy internal state and business-logic if statements that tend to clutter real systems, and ensure you have enough testing to cover all branching statements, but do so from an external-to-the-system perspective.


I've got the impression that you are both talking slightly past each other.

At least my impression is that these "internal tests" you talk about are valid unit tests -- but not for the same unit. We build much of our logic out of building blocks, which we also want to be properly tested, but that doesn't mean we have to re-test them on the higher level of abstraction of a piece of code that composes them.

From that thought, it's maybe even a useful "design smell" you could watch out for if you encounter this scenario (in that you could maybe separate your building blocks more cleanly if you find yourself writing a lot of "internal" tests)?


Isn't the idea behind unit testing being forgotten here? The point is to validate the blocks you build and use to build the program. To make sure you've done each block right, you test them, manually or automated... automated testing is just generally soooo much easier. If you work like that, and don't only add tests after you've written large chunks of code, you should have constructed your program so that there's no overhead in the tests. An advanced test that does lots of setup and advanced calculations generally isn't the test's fault, but the fault of the code that requires that complexity to be tested.

Wanna underline here that system tests are slow, unit tests are fast.

This said, I agree that you should throw away tests in a similar fashion as you do code. When a test no longer makes sense, don't be afraid to throw it away, but keep enough left to define the function of the code, in a documenting way. Let the code/tests speak! :D


> Wanna underline here that system tests are slow, unit tests are fast.

System tests are slow but, in my experience, are far far far more valuable medium and long term. Unit tests are fast and relatively unhelpful.


Imo the value of unit tests is partially a record for others to see "hey look, this thing has a lot of its bases covered".

Especially if you're building a component that is intended to be reused all over the place, would anyone have confidence in reusing it if it wasn't at least tested in isolation?


If the test suite took hours, couldn't part of the problem be that a lot of those tests should have been more focused unit tests? With small unit tests and mocking, you could run millions of tests in 3 hours.


There were all kinds of problems with the test suite that could've been optimized. The problem was that there were too many to manage, and that deleting them was culturally unacceptable.

Lots of them made real DB requests. It's hard to get a product owner to justify having devs spend several months fixing tests that haven't been modified in 9 years.


If it can cause a regression, it's not internal. My rule of thumb is "test for regression directly", meaning a good test is one that only breaks if there's a real regression. I should only ever be changing my unit tests if the expected behavior of the unit changes, and in proportion to those changes.


This is wrong.

A well-known case is the Timsort bug, discovered by a program verification tool. Also well known is the JDK binary search bug that had been present for many years. (This paper discusses the Timsort bug, and references the binary search bug: http://envisage-project.eu/proving-android-java-and-python-s...)

In both cases, you have an extremely simple API, and a test that depends on detailed knowledge of the implementation, revealing an underlying bug. Obviously, these test cases, when coded, reveal a regression. Equally obviously, the test cases do test internals. You would have no reason to come up with these test cases without an incredibly deep understanding of the implementations. And these tests would not be useful in testing other implementations of the same interfaces, (well, the binary search bug test case might be).

In general, I do not believe that you can do a good job of testing an interface without a good understanding of the implementation being tested. You don't know what corner cases to probe.


Using implementation to guide your test generation ("I think my code might fail on long strings") is fine, even expected. Testing private implementation details ("if I give it this string, does the internal state machine go through seventeen steps?") is completely different.


Sure.


That's not what he's saying. He's saying the test should measure an externally visible detail. In this case that would be "is the list sorted". This way the test will still pass without maintenance if the sorting algorithm is switched again in the future. You can still consider the implementation to create antagonistic test cases.
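For instance, a hedged sketch of that approach for sorting: the assertions are only about externally visible properties, while the run-heavy input shape is chosen with a merge-sort-style implementation in mind.

```go
package sorttest

import (
	"math/rand"
	"sort"
	"testing"
)

func TestSortOnRunHeavyInput(t *testing.T) {
	// Adversarial input informed by the implementation: many pre-sorted runs,
	// the kind of shape a Timsort-like algorithm treats specially.
	in := make([]int, 0, 10000)
	for run := 0; run < 100; run++ {
		start := rand.Intn(1000)
		for i := 0; i < 100; i++ {
			in = append(in, start+i)
		}
	}

	got := append([]int(nil), in...)
	sort.Ints(got) // the unit under test; swap the algorithm and this test still applies

	// Assertions are on the external contract only: sorted output of the same
	// length (a fuller check would also verify it is a permutation of the input).
	if !sort.IntsAreSorted(got) {
		t.Fatal("output is not sorted")
	}
	if len(got) != len(in) {
		t.Fatalf("output has %d elements, want %d", len(got), len(in))
	}
}
```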


One of my colleagues helped find the Timsort bug and recently another such bug (might be the Java binary search, don't remember).

The edge case to show a straightforward version of that recent bug basically required a supercomputer. The artifact evaluation committee complained even.

So you can try to test for that only based on output. But it's gigantically more efficient to test with knowledge of internals.


this sounds like a case where no amount of unit testing ever would've found the bug. someone found the bug either through reasoning about the implementation or using formal methods and then wrote a test to demonstrate it. you could spend your entire life writing unit tests for this function and chances are you would never find out there was an issue. i'd say this is more of an argument for formal methods than it is for any approach to testing.


But once you've found the bug, you'd like to add a test case that prevents regression - a test case that doesn't require a supercomputer.

That might not always be possible - but if it is, the test would be based on implementation details.


One doesn't need detailed knowledge of the implementation: if a given initial state creates invalid output, then we can write a test for that. Though yes, having knowledge of the implementation allows you to define the state that produces the invalid result.


> If it can cause a regression, it's not internal

Fair enough. And how do you know, before causing a regression, whether your test could detect one? In other words, how can you tell beforehand whether your test checks something internal or external?


"External" functionality will be behavior visible to other code units or to users. If you have a sorting function, the sorted list is external. The sorting algorithm is internal. Regression tests are often used in the context of enhancements and refactorings. You want to test that the rest of the program still behaves correctly. Knowing what behavior to test is specific to the domain and to the technologies used. You can ask yourself, "how do I know that this thing actually works?"


Isn’t the point that internal functions often have a much smaller state space than external functions, so it’s often easier to be sure that the edge cases of the internal functions are covered than that the edge cases of the external function are covered?

So, having detailed tests of internal functions will generally improve the chances that your test will catch a regression.


> Isn’t the point that internal functions often have a much smaller state space than external functions

That's the general theory, and why people recommend unit tests instead of only the broader possible integration tests. But things are not that simple.

Interfaces do not only add data, they add constraints too. And constraints reduce your state space. You will want to cut your software at the smallest possible interface complexity you can find and test those pieces; those pieces are what people originally called "units". You don't want to test any high-complexity interface; those tests will harm development and almost never give you any useful information.

It's not even rare that your units end up being vertical cuts through your software, so you'll end up with only integration tests.

The good news is that this kind of partition is also optimal for understanding and writing code, so people have been practicing it for ages.


I agree that they would help in the regression testing process, especially in diagnosing the cause. However, I think those are usually just called "unit" tests, not "regression" tests. For instance, the internal implementation of a feature might change, requiring a new, internal unit test. The regression test would be used to compare the output of the new implementation of the feature versus the old implementation of the feature.


Having regression tests greatly improves your chances of catching a regression.


Worth noting that performance is an externally visible feature. You shouldn't be testing for little performance variations, but you probably should check for pathological cases (e.g. takes a full minute to sort this particular list of only 1000 elements).
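A hedged sketch of such a guard in Go; the input shape and the threshold are made up, and the bound is deliberately generous so that only a pathological regression (not normal timing variation) trips it:

```go
package sorttest

import (
	"sort"
	"testing"
	"time"
)

func TestSortPathologicalInputStaysFast(t *testing.T) {
	// The particular shape that once triggered the slowdown; reverse-sorted
	// here purely as an illustration.
	in := make([]int, 1000)
	for i := range in {
		in[i] = len(in) - i
	}

	start := time.Now()
	sort.Ints(in)

	// Generous bound: we only care about catching "a full minute for 1000
	// elements", not millisecond-level noise.
	if elapsed := time.Since(start); elapsed > time.Second {
		t.Fatalf("sorting 1000 elements took %v, expected well under 1s", elapsed)
	}
}
```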


> "how do I know that this thing actually works?"

Agreed, but how do you know your test tests this? Or to re-phrase it: why would you even write a test that doesn't test this?


For bugfixes, I just write a failing test.

For features, I need to take the time to think of required behavior. If I just focus on the implementation, the tests add no documentation and I'm not forced through the exercise of thinking about what matters.


> If I just focus on the implementation [...]

Agreed, but why would you even write those tests to begin with?


> Aren't a huge part of tests to prevent regression?

Just a quibble: I would argue that a huge benefit of tests is preventing regression, but that's a very small part of the value of tests.

The main value I get out of tests is informing the design of the software under test.

* Tests are your straight-edge as you try to draw a line.

* They're your checklist to make sure you've implemented all the functionality you want.

* They're your double-entry bookkeeping to surface trivial mistakes.

But I think I mostly agree with your point. I delete tests that should no longer pass (because some business logic or implementation details are intentionally changing). I will also delete tests that I made along the way when they're duplicating part of a better test. If a test was extremely expensive to run, I suppose I might delete it. But in that case I would look for a way to cover the same logic in tests of smaller units.


All legitimate tests are[0] regression tests. TDD, to the extent that it's actually useful, is the notion that sometimes the bug being regression-tested is a feature request.

Edit: 0: I guess "can be viewed as" if you want to be pedantic.


> Aren't a huge part of tests to prevent regression?

Depends on the kind of tests. Old school "purist" unit tests are meant to help you verify the correctness of the code as you're writing it. Preventing regressions is better left to integration tests and E2E tests, or smoke tests. Alternatively to "unit tests" if your definition of "unit" is big enough (in which case it only works within the unit).

It's totally fine and common to write unit tests that are not meant to catch bugs of significant refactors. If you do it right, they should be so easy to author that throwing them away shouldn't matter.


Integration, E2E, and smoke tests are generally slow, flakey, hard to write. They should not cover/duplicate all the cases your unit tests cover.

They are good at letting you know all your units are wired up and functioning together. In all the codebases I've ever worked in, I would feel way more comfortable deleting them vs deleting the unit tests.


> Integration, E2E, and smoke tests are generally slow, flakey, hard to write.

This is not really true anymore in a modern system.

I can spin up an entire cluster to mirror prod - including databases and all - and run approx 10k integration tests all in under 5 minutes.


Why would you want to, when the same unit test coverage will run in under 1 minute, with smaller, easier to understand/change tests, and can all be done on your laptop?

It all depends on your definition of unit/integration; what I am talking about as unit tests you may very well be talking about as integration tests...

one of the main points I was making is you shouldn't have significant duplication in test coverage and if you do, I'd much rather stick with the unit tests and delete the others.


> Why would you want to?

Because they catch more bugs than unit tests, are easier for our product team to understand, and rarely break when refactoring.

Even a simple business flow like registering a new user will touch half a dozen systems.

5 or 6 integration tests can cover this flow far better than 100 unit tests.

> and be smaller easier to understand/change tests

That’s not my experience at all.

Unit tests are generally much harder to understand and need to be changed much more frequently.

Where unit tests help in my experience is:

A) in pinpointing where in a complex bit of logic the bugs are.

B) for generic libraries and building blocks where you don’t know exactly how your users will actually use them.


> Unit tests are generally much harder to understand and need to be changed much more frequently.

Changed more frequently, yes.

Harder to understand is usually because they're not-quite-unit-tests-claiming-to-be.

Eg: a test for a function that mocks some of its dependencies but also does shenanigans to deal with some global state without isolating it. So you get a test that only tests the unit (if that), but has a ton of exotic techniques to deal with the globals. Worst of all worlds.

Proper unit tests are usually just a few lines long, with little to no abstraction, and test code you can see in the associated file without dealing with code you can't see without digging deeper.


Yes. I believe you shouldn't delete tests.

After all, one of the best reasons for tests is to be able to refactor code confidently.


If you can refactor (make a commit changing only implementation code, not touching any test code) and the tests still pass then you’re probably fine.

If you’re changing tests as you change the code you’re not refactoring. You have zero confidence that your changed behaviour and changed test didn’t introduce an unintended behaviour or regression.

So many developers miss this in my experience.


if you can refactor without touching your tests and your tests still compile afterwards either the refactor was extremely trivial and didn't change any interfaces or you only had end to end tests.


I think the point is that if you have to change a test to make it pass or run after refactoring, it is not useful as a regression test. By changing it you might have broken the test itself so you have less confidence.

There is also the question of what a unit is. If you test (for example) the public interface of a class as a black box unit, you can refactor your class internals as much as you want and your tests don't need to change. You have high confidence you've done it correctly. At this point adding more fine-grained tests inside the class seems like more of a compliance activity than one that actually increases confidence, since you probably would've had to change a bunch of them to make them work again anyway.


Personally the way I'd phrase it is you need to refactor your tests just like you'd refactor the app code, but even looking at doing that independent of any app code refactoring.


Agreed. I would take an even stronger position, and say that a high degree of mocking actually implies two things: First, yes, you're testing at too fine-grained a level. Second, it's a code smell that suggests you may be working with a fundamentally untestable design that relies overmuch on opaque, stateful behavior.


Mocks are worthwhile though. Otherwise you end up not being able to unit test anything which accesses an external API such as databases, REST services etc.


IMO, the database is often an integral part of the program and should be part of the test (a real database in a docker image).

For instance, if you are not relying on a unique constraint in the DB to implement idempotency you are probably doing something wrong, and if you are not testing idempotent behaviour you are probably doing something wrong.
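A hedged sketch of what testing that idempotent behaviour against a real database might look like; the table, DSN environment variable, and column names are all made up, and the test assumes a disposable Postgres instance (e.g. from a docker container) is reachable:

```go
package payments

import (
	"database/sql"
	"os"
	"testing"

	_ "github.com/lib/pq" // Postgres driver; any real database would do
)

func TestCreatePaymentIsIdempotent(t *testing.T) {
	db, err := sql.Open("postgres", os.Getenv("TEST_DATABASE_URL"))
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	// The idempotency guarantee comes from a UNIQUE constraint on
	// idempotency_key, so the same request arriving twice inserts one row.
	const insert = `INSERT INTO payments (idempotency_key, amount)
	                VALUES ($1, $2) ON CONFLICT (idempotency_key) DO NOTHING`
	for i := 0; i < 2; i++ {
		if _, err := db.Exec(insert, "key-123", 4200); err != nil {
			t.Fatal(err)
		}
	}

	var n int
	err = db.QueryRow(`SELECT count(*) FROM payments WHERE idempotency_key = $1`, "key-123").Scan(&n)
	if err != nil {
		t.Fatal(err)
	}
	if n != 1 {
		t.Fatalf("got %d rows for one idempotency key, want 1", n)
	}
}
```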


I have plenty of tests actually calling the database. Maybe it is not a proper unit test, but it is not a problem for me.

I want tests to protect me against regressions, and I do not really care how they are classified.


It really depends on your definition of unit. In the London school of TDD, no, a unit cannot extend across an I/O boundary. The classicist school takes a more flexible, pragmatic approach.


I'd love to know the story behind "the London school".



Intriguing. Thanks!


You mean fakes/stubs, right? Unless you're testing whether you're correctly implementing the protocol exchange with an external party, you don't need to record the API calls.


How do you test your mocks?


How do you test the tests that are testing your mocks? That said, verifying mocks are a great help - they won't let you mock methods that don't exist on the real object.

Some mocking libraries, like the VCR library in Ruby, can be turned off every now and then so your tests hit real endpoints. It is worth doing from time to time.


> A big smell here is if the large majority of your tests are mocked.

We fell into this hard for a few years at my work.

Going back to any of that code is a nightmare because the test suites are so fragile.

Testing behavior and setting up as much of the system as possible leads to much better results in my experience.


Bertrand Meyer had the right of it, but I had to figure this out myself before I ever saw him quoted on the subject.

Me:

Code that makes decisions has branches. Branches require combinatoric tests.

Code with external actions requires mocks.

Therefore:

Code that makes decisions and calls external systems requires combinatorics for mocks.

Bertrand, more (too?) concisely:

Separate code that makes decisions from code that acts on them.

Follow this pattern to its logical conclusions, and most of your mocks become fixtures instead. You are passing in a blob of text as an argument instead of mocking the code that reads it from the file system. You are looking at a request body instead of mocking the PUT function in the HTTP library.

The tests of external systems are much fewer, and tend to be testing the plumbing and transportation of data. If I give you a response body do you actually propagate it to the http library? And even here, spies and stubs are simpler than full mocks.
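A small sketch of what that separation tends to look like (the report/ParseTotal names are invented): the decision code takes a plain string, so the test feeds it a fixture instead of mocking the file system or the HTTP library.

```go
package report

import (
	"fmt"
	"strconv"
	"strings"
	"testing"
)

// ParseTotal is the "decision" code: pure, no I/O. The caller that reads the
// file or makes the HTTP request is a thin "action" layer tested separately.
func ParseTotal(data string) (int, error) {
	total := 0
	for _, line := range strings.Split(strings.TrimSpace(data), "\n") {
		fields := strings.Split(line, ",")
		if len(fields) != 2 {
			return 0, fmt.Errorf("malformed line %q", line)
		}
		n, err := strconv.Atoi(strings.TrimSpace(fields[1]))
		if err != nil {
			return 0, fmt.Errorf("bad amount in %q: %w", line, err)
		}
		total += n
	}
	return total, nil
}

// The test passes a blob of text as an argument; no mocks needed.
func TestParseTotal(t *testing.T) {
	fixture := "books,1\npens,2\npaper,3\n"
	got, err := ParseTotal(fixture)
	if err != nil {
		t.Fatal(err)
	}
	if got != 6 {
		t.Fatalf("ParseTotal = %d, want 6", got)
	}
}
```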


I used this strategy when developing a client library for a web socket API. It was hugely helpful. I could just include the string of the response in my tests, instead of needing a live server or even a mock server for testing. Tests were much simpler to write and faster to execute.


This is great until the API response changes and you have to painstakingly update all your string fixtures to match.


One would argue that you should change your string fixtures to match and verify that the new API response doesn't break anything with your existing API client. Then you change the API client and verify that all the old tests still work as expected.

Better yet is if you keep the old fixtures and the new fixtures and ensure that your API client doesn't suddenly throw errors if the API server downgrades to before the new field was added.


The mocks have the same fixtures, plus a bunch of plumbing you have to sort out with every major and some minor version number upgrades.

You pay a bigger tax on the mocks down the road.


> Yes, you should delete tests for everything that isn't a required external behavior, or a bugfix IMO.

For the edification of junior programmers who may end up reading this thread, I’m just going to come right out and say it: this is awful advice in general.

For situations where this appears to be good advice, it’s almost certainly indicative of poor testing infrastructure or poorly written tests. For instance, consider the following context from the parent comment:

> Otherwise you're implicitly testing the implementation, which makes refactoring impossible.

> A big smell here is if the large majority of your tests are mocked. This might mean you're testing at too fine-grained a level.

These two points are in conflict and help clarify why someone might just give up and delete their tests.

The argument for deleting tests appears to be that changing a unit’s implementation will cause you to have to rewrite a bunch of old unrelated tests anyway, making refactoring “impossible.” But indeed that’s (almost) the whole point of mocking! Mocking is one tool used for writing tests that do not vary with unrelated implementations and thus pose no problem when it comes time to refactor.

Now there is a kernel of truth about an inordinate amount of mocking being a code smell, but it’s not about unit tests that are too fine-grained but rather unit tests that aren’t fine-grained enough (trying to test across units) or just a badly designed API. I usually find that if testing my code is annoying, I should revisit how I’ve designed it.

Testing is a surprisingly subtle topic and it takes some time to develop good taste and intuition about how much mocking/stubbing is natural and how much is actually a code smell.

In conclusion, as je42 said below:

> Make sure your tests run (very) fast and are stable. Then there is little cost to pay to keep them around.

The key, of course, is learning how to do that. :)


Did you ever actually refactor code with a significant test suite written under heavy mocking?

The mocking assumptions generally end up re-creating the behavior that causes the ossification. Lots of tests simply mock 3 systems to test that the method calls the 3 mocked systems with the proper API -- in effect testing nothing, while baking lower-level assumptions into tests for people refactoring what actually matters.

You might personally be a wizard at designing code to be beautifully mocked, but I've come across a lot of it and most has a higher cost (in hampering refactoring, reducing readability) than benefit.


> Did you ever actually refactor code with a significant test suite written under heavy mocking?

I have. The assumptions you make in your code are there whether you test them or not. Better to make them explicit. This is why TDD can be useful as a design tool. Bad designs are incredibly annoying to test. :)

For example if you have to mock 3 other things every time you test a unit, it may be a good sign that you should reconsider your design not delete all your tests.


It sounds like your argument is “software that was designed to be testable is easy to test and refactor”.

I think a lot of the gripes in the thread are coming from folks who are in the situation where it’s too late to (practically) add that feature to the codebase.


Mocks allow you to test that a certain method was called, with certain parameters, and in a certain order.

That's extreme test implementation coupling.

Most don't use those features, but in my experience mocks indicate implementation coupling.


You seem to think the rationale is testing performance; but from the GP it seems that the rationale is avoiding tests that ossify implementation details and hinder refactoring, rather than protecting external behavior to support refactoring.


I think you wrote this before I finished elaborating on my comment. :)


> Mocking is one tool used for writing tests that do not vary with unrelated implementations

What if I chose the wrong abstractions (coupling things that shouldn't be coupled and splitting things in the wrong places) and have to refactor the implementation to use different interfaces and different parts?

All the tests will be testing the old parts using the old interfaces and will all break.


So today I have been writing a lexer and parser. The public interface is the parser, the lexer isn't exposed.

The problem is if I delete all the tests for the lexer then any bugs in the lexer will only get exposed through the parser's tests.

This makes no sense to me.


The lexer is a unit then.

The lexer has a clear boundary from the parser.

The issue that takes experience here is how to determine what's a unit. "The whole program" is obviously too big. "every public method or function" is obviously too small.

Just be pragmatic.


> "The whole program" is obviously too big.

Of course.

> "every public method or function" is obviously too small.

Why "obviously"? If it's public, someone outside the class can call it. That's an external behavior.


If the class is only consumed in the context of one code unit (module, service, whatever) then the class itself is an implementation detail.


"every" being the operative word.

The "feel" for good code that comes with experience is not reducible in practice to a set of black-and-white rules.


Ideally, the lexer should be a system in of itself, exposing a public interface that is consumed by its client, the parser.

Public doesn't necessarily mean "not you".


Even if your code never graduates to being used by multiple teams in your project or on others, “You” can turn into “you and your mentee” anyway, if you’re playing your cards right.


Or more trivially, "you and you half a year from now".


Every feature of the lexer should be testable through test cases written in the syntax of the language. That includes handling of bad lexical syntax also. For instance, a malformed floating-point constant or a string literal that is not closed are testable without having to treat the lexer as a unit. It should be easy to come up with valid syntax that exercises every possible token kind, in all of its varieties.

For any token kind, it should be easy to come up with a minimal piece of syntax which includes that token.

If there is a lexical analysis case (whether a successful token extraction or an error) that is somehow not testable through the parser, then that is dead code.
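A hedged sketch of that style of test: Parse is a hypothetical entry point for the combined lexer+parser, and each lexical error case is reached through the public parsing interface with a minimal piece of source text.

```go
// Assumes a package-level func Parse(src string) exposed by the parser that
// returns an error for syntactically invalid input; the lexer stays private.
func TestLexicalErrorsSurfaceThroughParser(t *testing.T) {
	cases := []struct {
		name string
		src  string
	}{
		{"unterminated string literal", `x = "abc`},
		{"malformed float constant", `x = 1.2.3`},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if _, err := Parse(tc.src); err == nil {
				t.Errorf("Parse(%q) succeeded, want a syntax error", tc.src)
			}
		})
	}
}
```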

The division of the processing of a language into "parser" and "lexer" is arbitrary; it's an implementation detail which has to do with the fact that lexing requires lookahead and backtracking over multiple characters (and that is easily done with buffering techniques), whereas the simplest and fastest parsing algorithms like LALR(1) have only one symbol of lookahead.

Parsers and lexers sometimes end up integrated, in that the lexer may not know what to do without information from the parser. For instance a lex-generated lexer can have states in the form of start conditions. The parser may trigger these. That means that to get into certain states of the lexer, either the parser is required, or you need a mock up of that situation: some test-only method that gets into that state.

Basically, treating the lexer part of a lexer/parser combo as public interface is rarely going to be a good idea.


> For any token kind, it should be easy to come up with a minimal piece of syntax which includes that token.

There is the problem: any test that fails because of the lexer now has to reach down through the parser to the lexer. The test is too far away from the point of failure. I'll now spend my time trying to understand a problem that would have been obvious when the lexer was being tested directly.

>Basically, treating the lexer part of a lexer/parser combo as public interface is rarely going to be a good idea.

This is part of the original point: the parser is the public interface, which is why the OP was suggesting it should be the only contact point for the tests.


When a test fails, your understanding is informed by the nature of the code change that is responsible.

If you keep code changes small, and keep tests working, you're good.


Lexer/Parsers are one of the few software engineering tasks I do routinely where it's self evident that TDD is useful and the tests will remain useful afterwards.


Indeed! I recall a lexer and parser built via TDD with a test suite that specified every detail of a DSL. A few years later, both were rewritten completely from scratch while all the tests stayed the same. When we got to passing all tests, it was working exactly as before, only much more efficiently.

From that experience, I would say that in some contexts, tests shouldn't be removed unless what it's testing is no longer being used.


So what?

If you have a good answer to that, then the lexer is separate (as others said). If you don't, then write parser tests for the lexer so that you can more easily refactor the interface between them.

There is no one right answer, only trade-offs. You need to make the right decision for you. (Though I will note that there is probably a good reason parse and lex are generally separated, and that probably means the best tradeoff for you is that they are separate. But if you decide differently you are not necessarily wrong.)


> So what?

Well, I was responding to "you should delete tests for everything that isn't a required external behavior".


If bugs in the lexer never cause the parser to fail for any possible input, does it really have bugs? ;-)

Or, as @VHRanger pointed out, the lexer can be considered a unit and be tested independently.


Sounds like your lexer has a public interface to you.


these are rules of thumb, not laws


The trouble is they get presented as laws and then some jobsworth will make damn sure you are following the rules.


Rules of thumb are just low-fidelity windows allowing glimpses of poorly researched, not yet understood laws.


I’ve watched this play out a few times with different teams and different code bases (eg, one team two projects).

Part of the reason existing tests lock in behavior and prevent rework/new features is that the tests are too complicated. Complicated tests were expensive to write. Expense leads to sunk cost fallacy.

I’ve watched a bunch of people pair up for a day and a half trying to rescue a bunch of big ugly tests that they could have rewritten solo and in hours if they understood them, learn nothing, and do the same thing a month later. The same people had no problem deleting simple tests and replacing them with new ones when the requirements changed.

Conclusions:

- the long term consequences of ignoring the advice of writing tests with one action and one assertion are outsized and underreported.

- change your code so your tests don’t need elaborate mocks

- choose a test framework that supports setup methods

- choose a framework that supports custom/third party assertions, sometimes called matchers. You won’t use this often, but when you do, you really do.


> Otherwise you're implicitly testing the implementation, which makes refactoring impossible.

Red-green refactoring isn't, and shouldn't be, a goal of unit testing. Integration and E2E tests provide that. Unit tests are mostly about making sure the individual pieces work as you author them, as well as implicitly documenting the intent of those individual pieces.

If done properly, they're always quick/easy/cheap to author, and thus are throwaway. When you refactor significantly (more than the unit), you just throw them away and write new ones (at which point their only goal is for you to understand the intent of the code you were shuffling around, and making sure you're breaking what you expected to break). Delete, rewrite.

People are resistant to getting rid of unit tests when they did complex integration tests that took forever to write instead. So the tests feel like they were wasted effort. Those tests are totally valuable, in this case for things such as red green refactoring, but then yes, you have to carefully pick and choose what you're testing to avoid churn.


I would also test implementation details that are legitimately complicated and might fail in subtle ways, or where the intended behavior isn't obvious.

If I've implemented my own B+ tree, for example, you better bet your butt I'll be keeping some property tests to document and verify that it conforms to all the necessary invariants.


+1

There are two audiences for tests.

1. Your current self, who is trying to ensure you're going in the right direction.

2. A future dev, who wants to understand the relevant info about what to expect from a module.


Tests took work to produce and provide some sort of information.

It seems foolhardy to start off a process by throwing away information which could inform it.

Not having tests which cover the implementation makes refactoring impossible if the goal of refactoring is to preserve certain salient aspects of the implementation, rather than uproot it entirely.

Why not just start refactoring first? Then see what breaks, and decide on a case-by-case basis who wins: do you keep the refactoring which broke the test, and delete the test? Or do you back out that aspect of the refactoring?


This is my preferred approach as well.

When one does this, one hopefully also gets a feel for which tests will be useful and which ones will be thrown out early, and starts writing more of the first kind.

Couple this with a well designed language and a good IDE that can do the trivial refactorings (method rename - including catching overload collisions etc) and it becomes easy to maintain tests.


You do not want to delete tests that provide insight into how your unit works internally.

If you were to delete these and you happen to have a regression, you need more time to analyse the faulty external behaviour and draw conclusions about how the inner parts work to produce that behaviour.

If you didn't write the code yourself, you might be in a situation where you will never be able to fix the issue fully within a reasonable time.

Side-note: I have seen these problems multiple times in production, where missing tests resulted in a large and expensive engineering effort to figure out the inner-mechanics of a particular piece of code.

Make sure your tests run (very) fast and are stable. Then there is little cost to pay to keep them around.


I agree with deleting tests. But when raising this with any team I've ever worked on, I might as well have said I was going to go drop the prod database. Deleting tests, in my experience, comes with a massive stigma that I am not sure how to surmount.


+1 especially to the final point.

If there was a bug, this bug should be replicated in the test. You then solve the bug, make sure the test (and the others) pass, and you'll be (relatively) sure the bug will not be reintroduced with a later change.

Every bug you find is an "edge-case" you didn't anticipate. Leave it in the test for the future. I find that the "table test" approach of Go works surprisingly well with this. You just add a case to the table, and often that's all you have to do.
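For example, a minimal Go table test along those lines (Discount is an invented stand-in for the unit under test): once a bug is fixed, it just becomes one more row.

```go
package price

import "testing"

// Discount is a hypothetical stand-in for the unit under test.
func Discount(total int, coupon string) int {
	if coupon == "HALF" {
		return total / 2
	}
	return total
}

func TestDiscount(t *testing.T) {
	cases := []struct {
		name   string
		total  int
		coupon string
		want   int
	}{
		{"no coupon", 100, "", 100},
		{"half-price coupon", 100, "HALF", 50},
		// A fixed bug is just one more case added to the table:
		{"half-price coupon on odd total (regression)", 101, "HALF", 50},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := Discount(tc.total, tc.coupon); got != tc.want {
				t.Errorf("Discount(%d, %q) = %d, want %d", tc.total, tc.coupon, got, tc.want)
			}
		})
	}
}
```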


> TDD, much like scrum, got corrupted by the "Agile Consulting Industry"

> Overzealous testers test for each new public method in a class.

The Agile Consulting Industry can be summed up as: know only one rigid way to do things, and if it doesn't work, blame the context.

In the case of unit testing, if my blind mechanical rules for unit testing don't apply sanely to your code, then your code is wrong.


This.

This is why I stopped following agile and other XP gurus who just live from consulting.

It's easy to have strict principles when you never need to apply them yourself.


Uncle Bob comes to mind


He's to blame for popularizing a lot of the unit testing insanity IMO, yes.


Agile is a religion.

Well-meaning, smart people wrote some good, general guidelines. Somehow it got turned into an industry of people who want money to tell me how I'm living my life wrong.


Yeah.

Agile can be summed up by making little releases which you show to the client and so the developers can make decisions on what to develop next quickly.

Everything else naturally derives from this. All the other stuff is just ways, which you can adopt or not, to accomplish that. You don't need to hire a consultant to tell you that you're not doing your story-point poker meeting wrong because of some metaphysical explanation.


It can also be summed up as: "we see there are managers with budget, and we have a solution to this problem..."

Any place you have money, you will find someone selling snake oil.


The problem with formulas is that they have a tendency to become formulaic, if successful.


Next introduce some metrics that measure how "well" you are doing. Then make those your primary goal.


This is an excellent point. I once had to deal with an external contract firm on a project that I was hired to fix. We had issues of production code breaking so badly that it brought down the entire server (N+1 query issue that triggered 50k queries).

The tests passed. When I emergency patched the issue and deployed it to production, the contract firm got mad at me for breaking their tests...to fix a production emergency.

It’s put me on guard against militant test ideologues with no concept of real priorities ever since.


We generally wouldn't allow that either. We've run into cases where emergency fixes cause even more damage (e.g., the system is up, but now it's processing payments wrong), so you have to prove beyond a shadow of a doubt that the test failures are irrelevant or less bad than the current incident.

Often times it's less effort/more expedient to make the change pass tests (or update the tests) than convince all of the stakeholders that what you're about to do is safe, but the break-glass is there if needed.

Maybe you'd call this a militant test ideology, but I think it's perfectly reasonable. Systems are complex, and people can get tunnel vision during a bad outage.


The way the tests were written in that case, they were hard-coded to how the work was being done and not to the result produced. Both the code and the tests were bad.

Normally, I’d agree with you though.


Be careful not to confuse militant ideology and local incentive structures.


Fair point. In this particular case the primary developer for them wanted to “publicly shame” me in Slack. Seemed much more ideology driven at the time.


Public shaming of a developer is rarely the right thing to do–and I would expect most design paradigms do not include a section on it ;)


That may have just been the rationalization for an ulterior motive.


> The tests passed. When I emergency patched the issue and deployed it to production, the contract firm got mad at me for breaking their tests...to fix a production emergency.

I get the point, but it depends on which tests failed.

Tests for unreleased features and trivial UX stuff are not the same as breaking a test that makes sure not every customer gets a 50% discount.


I think most disenchantment with TDD comes from the second point that you notice. If one attempts to test every method of every class one ends up testing implementation details that are very much subject to change. Also, one can easily end up testing trivial points like 'is the processor still capable of adding two integers'. As you note in your third point it seems much more productive to test properties of code that the customer could potentially recognize as something they value.

TDD really isn't dead for me. I do it pretty much every day. Both in work and in personal projects.

I am not sure talks/conversations like these are very valuable. In the end it turns out that every practical question has the answer 'it depends'. Maybe the important thing to realize is that most questions do have the answer 'it depends' and that one can never stop using one's brain.


> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.

This is a really important aspect, and I think is one of the key things that separates "journeyman" level from "master" on the subject of testing.

The most concise piece I've found on this is Bob Martin's "Testing Contra-Variance" (https://blog.cleancoder.com/uncle-bob/2017/10/03/TestContrav...), if you can look past the slightly forced "Socratic dialog" style of the article.

> The structure of the tests must not reflect the structure of the production code, because that much coupling makes the system fragile and obstructs refactoring. Rather, the structure of the tests must be independently designed so as to minimize the coupling to the production code.

The "one test class per class" or "one test file per file" approach is an extremely common anti-pattern, and it's insidious because a lot of engineers think it's the obviously correct way of writing tests.


> one test per bug fixed

this, and you should try to write the test before fixing the bug - you cannot trust a test that you haven't seen failing


I often find that before I've diagnosed and fixed a bug, I don't know what the best test is.

So I tend to: fix the bug; write the test; see the test fail on the old version of the code; see the test pass on the new version of the code.

Something else I'll throw in on tests. Many times I've caught a bug I would otherwise be introducing because an existing test — which wasn't written to catch my new mistake — fails.


Sometimes you’re in a hurry to fix a bug, so you write the functionality and ship it out fast after some manual testing, without an automated test reproducing the bug. That’s okay!

Write the test afterwards in some branch, and after committing the tests, make an extra commit that undoes the bug fix. Let your CI run it and confirm it fails there, then bring back the bug fix on the next commit.

That way, you can have flexibility to fix things fast, but still keep the regression test that’s proven to check for it around for the future.


> to fix things fast

Note that if you see _another_ developer taking the time to write a test when you wouldn't, it doesn't necessarily mean they are wasting time.

As I am debugging something, I need to write myself a very clear description of my hypothesis of the steps to reproduce the bug -- otherwise it is hard to see incremental progress. I work faster if I can have the machine execute those steps.


Fully agreed. I didn't intend to imply a developer writing a test is wasting time, either!

Our team has been in situations where we're highly confident of the root cause, but we know creating a test to duplicate this might take hours, if not days. It might even be a fairly finicky scenario to try and setup.

Rather than letting customers have to handle the negative consequence of our bug for hours, we'll make the change, run it through our existing test suite (to make sure we're not making yet more troubles for ourselves!), and then release it after another teammate reviews the change.

But a test certainly will help with the confidence that the right thing was changed. I would definitely encourage writing a test if it's easy for your group to handle whatever negative consequences the bug existing in the wild is producing for as long as it takes to write a test.


"That's okay!"

Why is that Ok? How does the dev know that they've not undone anything else? Also, how does the dev know that the fix is complete? Or that it caters to the defect?

This anxiety of pushing out a fix - at the risk of undoing other working functionality - ought to be addressed first. It is better (and safer) to get into the habit of writing a test to reproduce the defect and then write the fix. After all, if the fix appears trivial, then the test for the defect ought not to take too much time either. I have been able to get to this mindset with practice.


Because being dogmatic is exactly what causes people to start ignoring this sort of methodology. Pragmatism really does have to win out sometimes. Maybe you're in a hurry because the bug is causing active downtime. Some bugs really are "obvious" once they've failed and you're looking at the code. Maybe development of a proper test involves some test infrastructure work that's a larger undertaking than you have the opportunity for at the moment. Maybe you have a solid manual/QA testing system behind you, allowing you at least temporary assurance that your fix is valid.

No team does everything perfectly all the time, and that's fine. The real question is what gets done about it afterward: is the technical debt that you've incurred paid down in a reasonable time frame?


This nicely encapsulates what I was hoping to say, but didn't take the time to write out. Thank you!

> Maybe development of a proper test involves some test infrastructure work that's a larger undertaking than you have the opportunity for at the moment.

This is something I've encountered many times.

> Maybe you have a solid manual/QA testing system behind you, allowing you at least temporary assurance that your fix is valid.

I would hope most people are doing this anyway, especially when a big production bug has been found.


I've fixed plenty of bugs which I could not reproduce and thus could not test, simply based on the source code and the customer's description of the problem. Based on the symptoms it was clear what the code must be doing, studying the code revealed it was clearly wrong, and so the fix was obvious.

So make a patch, make a new build, send to customer and get a call "yeah that fixed it, thanks!".

Sometimes the issue is I simply don't have time to reproduce it, like that time a blocking bug had to be fixed within 30 minutes, or the customer would have had to charter their own helicopter to get a few packages to an offshore oil platform, rather than piggy-back on the worker transport heli.

Other times it's some combination of the customers system it's running on and configuration of our software that I can't reproduce.

Not saying it's ideal, but it's quite possible to successfully fix issues without being able to reproduce and test.


I once fixed a bug like that, then went wait a minute. Sure enough source control revealed I'd made exactly the same fix 2 years ago, and 2 years before that someone else had done exactly the same. In the odd years someone else undid that fix to fix a different bug that seemed unrelated. Once I figured out what the other problem was I was able to find the more complex fix for both situations.

I wished I had heard of automated tests then


Yeah I always check source control (blame/annotate), even if I wrote the code myself, just to be sure I'm not missing some context.

Automated tests is pretty great, but a lot of the stuff we do is difficult to test, mostly due to a lot of legacy code that's not well confined. As we work on a piece of code we try to clean that part up, but it takes time.


Maybe add a dated note as a code comment briefly describing the reason for the change.


I'd imagine stuff like race conditions would fit into this category nicely. Obvious upon inspection but annoying to test for.


>Why is that Ok? How does the dev know that they've not undone anything else?

If the bug is "I thought we were supposed to have a 10 minute timeout and we accidentally set the timeout to 10 seconds" it's pretty screamingly obvious that if you change the time from "10" to "600" the problem is now solved and you haven't broken anything else.

Religiously applying this rule of thumb as you describe causes ALL sorts of problems including the problem of people writing tests for the above kind of behavior. That test will fail when it's changed from 600 to 1800 deliberately and that will create a pointless waste of time for everybody.


Yeah, and then you find out that that constant was also used in another piece of code as number of minutes, so you've just changed another timeout in the system from 10 minutes to 10 hours.

Yes, of course that constant should never have been a naked integer in the first place, but we live in an imperfect world. One thing I like about Go's standard library is that almost everything takes time.Duration instead of a plain integer that's then interpreted to be milliseconds (or micro/nanoseconds in the most unexpected place, gotcha, developer, you should've read the docs!)
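A tiny illustration of that point (both function signatures below are made up for illustration): with time.Duration the unit travels with the value, so the 600-vs-10 class of mistake can't happen silently.

```go
package client

import "time"

// Ambiguous: is this seconds, milliseconds, minutes? Callers must read the docs.
func SetTimeoutMillis(timeoutMillis int) { /* ... */ }

// Self-describing: the unit travels with the value.
func SetTimeout(timeout time.Duration) { /* ... */ }

func example() {
	SetTimeoutMillis(600000)     // 10 minutes? 600 seconds? easy to misread
	SetTimeout(10 * time.Minute) // unambiguous
}
```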


> This anxiety of pushing out at fix - at the risk of undoing other working functionality - ought to be addressed first.

It really depends on how urgent the fix is. Naturally you want the fix to be as isolated as possible so it does not regress other functionality. But even non-urgent fixes can sometimes benefit from getting pushed out following quick manual testing. For example we use Sentry.io for capturing errors in our applications and LogDNA for logging.

Occasionally we'll encounter some kind of edge case where we see a spike in logged errors which blows us past our Sentry and LogDNA quotas. Pushing a fix out before the test is written can be beneficial in cases like this, although yes, it's worth avoiding if possible.


Sounds like I could have been more thorough in describing why I think "that's okay."

> How does the dev know that they've not undone anything else?

I didn't write it, but I was assuming a scenario where a system already has a comprehensive automated test suite. If you have most functionality under test, then hopefully you're pretty confident that it won't undo anything else.

> Also, how does the dev know that the fix is complete?

The same way a dev knows if one single automated test that addresses the one known failure scenario is a complete fix.

In other words: you don't know. You keep doing some manual testing, and watching whatever logs or status indicators, to see if things go back to normal after deployment.

> Or that it caters to the defect?

I also didn't write this, but I had in mind some level of manual testing before deploying the code change to ensure it caters to the defect.

> Why is that Ok?

Hopefully my answers above help explain why I said that's okay.

I'm not advocating for flippantly shipping code without having a variety of other guard rails in place; I was primarily talking about when a bug has really bad consequences for users, your team is confident that writing a test is going to take a long time, and you do some level of manual testing to confirm all seems generally well.


>How does the dev know that they've not undone anything else?

No one says that the rest of the test suite is not run.


Sometimes you’re in a hurry to fix a bug

Most of the time I find it faster to write the test first. If you haven't got a test what are you going to do? Manually test?


And since everything usually takes longer than expected, the benefit of automating that manual testing is usually greater than expected.


We tend to have a web focus on this site but not all IT is like that.

If I have a data processing job that takes 3 hours and it fell over at the 1 hour mark (and let's say whoever wrote it didn't have the foresight to make it resume neatly, because that's added complexity that never got budgeted), I'm going to fix the obvious bug and kick off reprocessing immediately. Possibly after some messing around in a REPL to confirm how the code acts.

While it's going, I can then do some manual tests and sanity checks and cancel/restart if necessary - but if not, I've gained a lot of time.


Nope. With experience comes wisdom. I've had the situation happen when I was sure I knew what the bug was. I wrote a test that I expected to fail—and it passed. My diagnosis was incorrect and the bug was somewhere else. Had I not done this, I might have not only not fixed the bug, I would have likely introduced new bugs.


> That’s okay!

The effort put into manual tests could be committed directly to adding a test case. Furthermore, a fix should be accepted only when the full test suite, including the new tests, passes. Relying on manual testing alone, depending on what the issue was, may let a regression slip into other parts of the software. Just a thought...


I agree with the last part, but the ordering isn’t crucial. For example you can stash your changes in the implementation to check that the test fails when the change is not present and passes when it is. For this (among other reasons) it is nice to have a test runner that re-runs when it detects file changes.

The ordering is one part of TDD that always bugged me: you have to write the tests first. But I often prefer to experiment and try a couple of approaches before deciding on one. Having tests first would add a lot more overhead for that way of working.


To be fair though, if your tests are at the right abstraction level, the specific approach you are choosing for the implementation shouldn't matter for the test.

Writing the test first also forces you to think about what API you actually want to expose. Once you've got the API right, there is still room for experimenting.


That’s only true when you have decided what the abstraction will be. That’s my point, a lot of times you don’t know yet!


In this approach you decide on the abstraction (i.e., the API) by writing example code (i.e., some tests) that uses the API. The tests are how you decide which abstraction seems to make the most sense.

It sounds like you actually implement the abstraction to decide whether it seems like the right one, which is a lot more work.


Yes, my position is that tests as client don't really tell you the truth about the abstraction because they don't represent a real usage of it.

It is better to write tests for code when you know what it is and what it should do. Tests also introduce a drag on changing strategies: if the choice you made when you wrote them is no longer the optimal one, you must now change your tests or convince yourself that you were actually right the first time.

If people like to work this way then great, I'm just explaining why for me it feels bad and runs counter to my instincts.


I think I understand what you mean. At the same time though, one crucial takeaway for me from Ian's talk is that my tests might be on a too small scale if they are not useful whilst I am changing the implementation strategy.

For example, I found it useful to ditch concepts like the testing pyramid and focus on writing e2e tests for my HTTP API instead of trying to cover everything with module or function level tests. That makes it much less likely that they need to change during refactorings and hence provide more value.

I generally think that "What is going to break this test?" is a really powerful question to ask to evaluate how good it is. Any answer apart from "a change in requirements" could be a hint that something is odd about the software design. But to ask this question, I need to write the test first or at least think about what kind of test I would write. At some point, writing the actual test might be obsolete as just thinking about it makes you realize a flaw in the design.

Other interesting questions I like to ask myself are: "How much risk is this test eliminating?" and "How costly is it to write and maintain?"


In reality I tend to do both: write example client code to think through the abstraction (some call this “README-driven development”) and then write tests once the implementation is under way. Though you can get the first as a side effect of the second, I find that good tests aren’t really good example code (too fragmented, focus on edge cases, etc.).


“TDD, much like scrum, got corrupted by the "Agile Consulting Industry". Sticking to the original principles as laid out by Kent Beck results in fairly sane practices.”

Totally agree. Somehow every good idea gets converted to a rigid ideology after a while. Same for OOP. It’s a solid idea but then the ideologues pushed it way too far. And instead of dialing back a little we see other ideologues declare “X is dead” and the pendulum swings to the other extreme.

My company is generally behind the curve so now people have been bitten by the REST, JSON, Microservice bug. They don’t know why or what it really is but things have to be done that way. That together with calling themselves “agile” without understanding what it means besides using JIRA and having fixed sprints.


> My company is generally behind the curve so now people have been bitten by the REST, JSON, Microservice bug. They don’t know why or what it really is but things have to be done that way.

This resonates with me. My first job out of college was with a big, very old insurance company. My team lead became obsessed with using microservices for some reason, even though we were only building internal web apps that would have about 1,000 users on a busy day. There would be no performance concerns whatsoever that would warrant "breaking up a monolith" to make it more scalable. But microservices were a great way for the team to feel like we were using trendy tech despite not having any idea how to really go about doing it or any particular reason for doing so.


> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.

Every example I've read pertaining to unit testing uses a function as the unit to test. The easiest functions to test are ones that don't have side-effects (network, I/O, disk, etc). Could you point me to an example where a unit test applies to something beyond a function?


This bugs me as well. The people who argue about whether unit tests works will often redefine "unit" to mean anything from "the entire application" to a single function.

Colloquially, however, anything above the level of a self contained function or class is called something else - typically an integration test.


Unit tests and integration tests were never defined that well. The definitions change based on who you talk to, especially for integration tests.

A better definition is the small, medium, and large tests from Google's testing blog.


I've used unit tests on attributes and methods of a mock class instance, in order to test the class's construction method.


>> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.

It's OK to dislike unit testing, but please don't redefine the term to avoid it. That's not helpful. Instead, try to find the papers (by NASA or IBM?) that show unit testing finds only very few actual bugs, making it low value.

That said, there are IMHO some units worth testing more.


They aren't redefining it though. The term has always been fuzzy. A common boundary of unit testing has always been a module's publicly exposed interface.


Wasn't the "clear boundary" definition the original, that was later interpreted as syntactic boundary (function/class) instead of semantic boundary (a chunk of business logic)?


Integration tests vs. unit tests.


It's worth noting that the greater the granularity of your unit, the more likely you will be able to write tests targeted at bugs.

For instance, if you are only testing through the API of the service, you may have a hard-to-impossible time confirming you recover gracefully from certain exceptions. You generally don't have service-level APIs to throw various exceptions intentionally.

Point being, the overzealous testers do have some good points, even if they miss the forest for the trees.


The bug triggered the exception. The test case encapsulates reproducing the bug. The bug is fixed. The bug can no longer be reproduced. As long as the bug remains fixed the test passes.

Another way of thinking about it. Unless your exceptions are a documented part of your API no one cares about them - they only care about the outcome they actually expect. If you construct tests that pass for positive outcomes or fail for any other outcome then your exceptions remain implementation details.


I think GP is referring to nondeterministic exceptions. For instance, if the service under test depends on some other service, then you may need to test the scenario where the other service is unavailable. The exception is not triggered by a bug, it is triggered by an operational externality.


For networking related problems you can deterministically control failures from the test using something like Toxiproxy. This can be especially useful if you’re working out a particular bug (e.g. gracefully handling a split brain situation or something).

A more general approach would be to just run your happy path tests while wrecking the environment (e.g. randomly killing instances, adding latency, dropping packets, whatever).

I’ve found that the latter often uncovers problems that you can use the former to solve.

Testing these sorts of things with unit tests can work, but I’m more confident in tests that run on a ‘real’ networking stack, instead of e.g. mocking a socket timeout exception.
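
Rough sketch of that 'real networking stack' flavour in Go, using net/http/httptest; fetchStatus is a made-up stand-in for the code under test:

    package svc

    import (
        "net/http"
        "net/http/httptest"
        "testing"
        "time"
    )

    // fetchStatus stands in for the code under test; in a real suite it
    // would live in the package being tested.
    func fetchStatus(url string) (string, error) {
        client := &http.Client{Timeout: 500 * time.Millisecond}
        resp, err := client.Get(url)
        if err != nil {
            return "degraded", nil // degrade gracefully instead of failing hard
        }
        defer resp.Body.Close()
        return resp.Status, nil
    }

    func TestFetchStatusSurvivesDeadBackend(t *testing.T) {
        srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK)
        }))
        url := srv.URL
        srv.Close() // simulate the backend going away: a real refused connection

        got, err := fetchStatus(url)
        if err != nil {
            t.Fatalf("expected graceful degradation, got error: %v", err)
        }
        if got != "degraded" {
            t.Fatalf("expected %q, got %q", "degraded", got)
        }
    }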


This is more integration testing than unit testing. Certainly valuable but it shouldn't replace your unit tests.


Yeah, reading it again I mixed up what you wrote with someone else.


Then test the externalities. They probably have healthcheck endpoints....

Don't overthink everything. KISS


Imagine I am implementing a service that queries 20 separate MySQL database servers to generate a report. (I'm not saying this is a good architecture, it's merely to illustrate the point.) I know that sometimes one of the MySQL instances might be down, e.g. due to a hardware failure. When this happens, my service is supposed to return data from the other 19 databases, along with a warning message indicating that the data is incomplete.

I would like to write a test to verify that my code properly handles the case where one of the MySQL instances has experienced a hardware failure. The point is that I can't do this as a strict black-box test where I merely issue calls to my service's public API.

[edit] And of course "testing the externalities" doesn't help here. I can test the MySQL instances and verify that they are all running, but that doesn't remove the need for my code to handle the possibility that at some point one of them goes down.


First. Don't do this!

Second. You've done this/someone else has done this and now you need to maintain it (we've all been there!). In this case my original post holds. Your test suite mocks the databases for unit tests anyway right? So write some test(s) checking that when the various databases are down appropriate responses are given by your service.
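
Something like this, roughly; all the names here (Source, BuildReport, fakeSource) are invented to show the shape of such a test, not taken from anyone's real code:

    package report

    import (
        "errors"
        "testing"
    )

    type Source interface {
        Rows() ([]string, error)
    }

    type fakeSource struct {
        rows []string
        err  error
    }

    func (f fakeSource) Rows() ([]string, error) { return f.rows, f.err }

    // BuildReport returns whatever data it can get, plus a warning if any
    // source was unavailable.
    func BuildReport(sources []Source) (rows []string, warning string) {
        for _, s := range sources {
            r, err := s.Rows()
            if err != nil {
                warning = "report is incomplete: one or more sources unavailable"
                continue
            }
            rows = append(rows, r...)
        }
        return rows, warning
    }

    func TestReportDegradesWhenOneSourceIsDown(t *testing.T) {
        sources := []Source{
            fakeSource{rows: []string{"a", "b"}},
            fakeSource{err: errors.New("connection refused")}, // the "dead" MySQL box
            fakeSource{rows: []string{"c"}},
        }
        rows, warning := BuildReport(sources)
        if len(rows) != 3 {
            t.Fatalf("expected 3 rows from the healthy sources, got %d", len(rows))
        }
        if warning == "" {
            t.Fatal("expected a warning about incomplete data")
        }
    }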


Yeah, sometimes for practical reasons you don't want to, or can't test directly across the API, as good testing practice would dictate.

Taken to the extreme, the philosophy I laid out leads to something that looks like "only integration and end-to-end tests" depending on your architecture.

So I try to be pragmatic whenever possible, but I think leaning towards BDD works better, after 18 months of doing it.


Corrupted is not the right way to describe it. Both TDD and agile provide something amazing to management: a way to make the black hole called "software engineering" into something tangible and quantifiable. This of course also makes it possible to execute some bad management practices on software engineering as well. People like to complain about agile (and apparently TDD) but I would argue that there are also huge success stories that don't make it to HN.


Let's keep in mind this Fowler post has no science in it. It's just some "lauded practitioners'" view of TDD. Our industry is driven by this kind of discourse. There's very little good scientific research to answer questions like these in software in general. How much of it addresses TDD and is any good? One paper? Two?


Replying to myself (facepalm): just to make sure people know what I think. I've been doing TDD since 2002. I'm of the "You will take this out of my cold, dead hands" variety on the usefulness of TDD. That doesn't mean that there's any science that proves it.


If you have a reference, I’d love to read about some of these success stories.


> - When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.

I don't understand how this matches to the idea of "don't write one line of code unless it's necessary to make a failing test to pass".


What is it that you think is contradictory?


All of the design advantages I've ever heard advertised for TDD come from writing a bit of test, a bit of code, a bit more test, a bit more code.

If instead you are writing a few tests and then an entire module, you're doing test-first development, but it's definitely not test-driven development as I've ever seen it presented by proponents.


You need to write a module. You write tests that test the functionality of the module. But to pass those tests you need a couple of classes. You write tests for the classes (or at least the one you intend to work on next). The class needs some methods, so you write tests for those. You write code for the methods until all the method tests pass. Hopefully the tests for the class then pass too; otherwise you might need to update the methods and their tests. And so on.


> plus one test per bug fixed

I used to be a big proponent of this, until I suggested it to my manager and he replied: "We ran the stats on our bug tracker, and bugs coming back are really rare, so we'd rather focus our effort on testing with a higher ROI".

And in the end I agreed with him on this.


Yeah, I never understood this safeguarding against bugs resurfacing after being fixed once. I only saw bugs coming back at a company that didn't use version control and instead copied source code back and forth with a USB stick.

I can understand something like test driven bug fixing, where you basically create a simple test to reproduce the bug quickly and then fix the bug using that. In many cases that is the most efficient workflow.

The test succeeding can then serve as evidence of the bugfix (though it might not be enough). So if you have already written the test, you might as well leave it in there, because it usually doesn't bother anyone, and the chance that someone breaks this exact same thing again, while tiny, isn't non-existent.

But fixing a bug and then putting extra work just for a test, if there is another easier way to prove that the bug is fixed? No, thanks.


> Yeah, I never understood this safeguarding against bugs resurfacing after being fixed once.

In my experience it's not infrequent for bugs to unknowingly only get half-fixed, not realizing that the true problem actually lies a level deeper, or has a mirror case, or whatever. Maybe a good example is that a parameter to a command is 0, the bugfix sets it to be 1, but a later bugfix changes it back to 0, when the correct bugfix would set it to be 0 in some cases and 1 in others.

And that if you fix the bug without a test, then the second related bug crops up a couple months later, and somebody else tries to fix it similarly naively, and can wind up re-introducing the first bug if there isn't a test for it.

Basically, in practice bugs have this nasty habit of clustering and affecting each other -- if the code was trickier than usual to write in the first place, it's going to be trickier than usual to fix, and more likely than usual to continue to have problems.

So keeping tests for fixed bugs is kind of like applying extra armor where your tank gets hit -- statistically, it's going to pay off.


I remember reading that half of all bug fixes introduce a new bug. I'm pretty sure it comes from these sort of scenarios.


I have a perfect real-world example. About four years ago some of my code broke in certain cases. I came up with a fix that relied on a case-sensitive regex to check for those cases. I think I made it case-sensitive because I wanted to make sure it didn't trigger accidentally on something added in the future. And these case names had never changed, right?

Yep, now that I've spelled it out, what happened is obvious. Three years later, I got ordered to change one letter in these case names from lower case to upper case. Of course I didn't remember that I'd used a case-sensitive test against the names three years before. And bam, the bug was back, and as there was no test for it, I shipped code with the bug.

The good news is the bug was obvious as soon as the customers tried to compile my code, so it didn't cause any harm but embarrassment on my part. Even so, it took me a while to track down what was going on. Imagine my shock when I got into the code and found the fix I thought I needed to make was already there... but itself needed to be fixed!
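
The regression test, had it existed, could have been tiny. A Go sketch with an invented pattern and case names, just to show how little it takes to pin the intent down:

    package names

    import (
        "regexp"
        "testing"
    )

    // The pattern and case names are invented; the point is that the test pins
    // the intent ("match regardless of case") so a later rename can't silently
    // resurrect the bug.
    var specialCase = regexp.MustCompile(`(?i)^legacy_mode$`)

    func TestSpecialCaseMatchesRegardlessOfCase(t *testing.T) {
        for _, name := range []string{"legacy_mode", "Legacy_mode", "LEGACY_MODE"} {
            if !specialCase.MatchString(name) {
                t.Errorf("expected %q to be treated as the special case", name)
            }
        }
    }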


I have seen bugs resurface regularly, but it was always due to a missing merge/cherry-pick.

Tests wouldn't help in that case.


Avoiding too many layers of useless indirection is not hard: avoid refactoring effort that looks like a line by line, linear normalization of syntax.

Target tests and refactor for the confusing, incomplete chunks according to the team if there is one.

Semantic understanding of the system will improve. Instead of fetishizing code patterns, fetishize systemic understanding.

Personally, that habit has made it so I write better code from the start. It’s acted like a forcing function to reconsider if a habit is useful or just a habit.

My code went from deep OOP hierarchies of indirection, to composable, more functional, chunks. I import less, define fewer objects to begin with, compute a larger variety of useful objects, and can pull together features faster.

Have standardized machine? Nursing my own symbol library is where most of the fun is. With respect to Martin Fowler and the rest, who are great engineers in their own right, but this all smells like pandering to efficiency, importing a shared model, which impacts resiliency.

We shouldn’t have people think within the same context box every day. Software philosophy has been taken over by the equivalent of popular bean counters, focused on minimizing the idea space for the perception of productivity gains. It’s cognitive indirection, IMO.


Tedu had an interesting blog post about testing recently:

https://flak.tedunangst.com/post/against-testing

What I got out of it was that tests for regressions are really good, but there are lots of considerations to make when determining which other tests to write, and why you are doing it. A good read nevertheless.


> When people talk about "unit tests", a unit doesn't refer to the common pattern of "a single class". A unit is a piece of the software with a clear boundary. It might be a whole microservice, or a chunk of a monolith that is internally consistent.

Yes, this. However, when we go in the direction of testing a complete service, is it not more convenient to call it an integration test?


My personal view is that it's an integration test only if the test actively involves external components (like a real database, some other api, etc..). I don't use integration test terminology if all those external components are mocked/stubbed out (even if it encompasses a large "unit").

I also call those mocked surface area tests "functional" tests, partially to differentiate them from non-mocked integration tests, and partially because people get too hung up on the "unit" term.


Often it's convenient to test the functions in your database access layer against a real or convincingly simulated database (e.g. sqlite). This is "like a unit test" in that it focuses on the details of individual functions, but "like an integration test" in that it crosses a system boundary rather than mocking it. I've not found it productive to use either term when talking about it.


The parent above you didn't necessarily imply that the whole microservice was being integrated before testing it. It is reasonable that a microservice can exist as a whole, yet not be integrated with its other deliverable components, and remain sufficiently unit-testable.

It's also extremely likely that the microservice needs to be integrated to test it :)


Test coverage is the most conveniently accessible numeric value in the neighborhood of software quality, therefore it is software quality. Merging a change that reduces test coverage is reducing software quality. That's okay, sometimes we all have to cut corners, but defending this choice means you don't value quality, and are therefore not a culture fit.

/s


This is a rampant problem in the enterprise world, and it drives me nuts. I regularly have to work for clients who mandate using SonarQube (and/or other SAST tools) with strict policies, and also require 85%-100% test coverage on all projects regardless of how much sense it makes.

Predictably, teams have to spend way too much energy getting "waivers" approved by some ridiculous group, and inevitably end up creating tests that don't actually test anything, just to get the coverage figures up.


Can you give us an example of where it doesn't make sense?


In Go, it turns testing into a game of “how do I make this stdlib function return an error”?


Error handling taking up 50% of the code is definitely a problem with Go itself. For each meaningful line, there's an accompanying "if err != nil {return err}", so if you want coverage you end up testing this kind of boilerplate.
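
The usual way out is to accept an interface where the code would otherwise call the stdlib directly, so the test can inject the failure. A sketch with made-up names (production function and test shown in one file for brevity):

    package parse

    import (
        "errors"
        "fmt"
        "io"
        "strings"
        "testing"
    )

    // countLines takes an io.Reader instead of opening the file itself, so the
    // error branch can be covered without fighting the filesystem.
    func countLines(r io.Reader) (int, error) {
        data, err := io.ReadAll(r)
        if err != nil {
            return 0, fmt.Errorf("reading input: %w", err)
        }
        return strings.Count(string(data), "\n"), nil
    }

    type failingReader struct{}

    func (failingReader) Read([]byte) (int, error) { return 0, errors.New("disk on fire") }

    func TestCountLinesPropagatesReadErrors(t *testing.T) {
        if _, err := countLines(failingReader{}); err == nil {
            t.Fatal("expected an error from the failing reader")
        }
    }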


A trivial example would be getters and setters for plain old classes.


Surely the getters and setters of POJOs would be exercised in the tests of other classes? If not, why are they there?


E.g. it’s public api in a library and they’re only used by other applications.


I've lost so much time making these arguments with people. Unfortunately the combination of dogma + an industry of sham consultants who have monetized it has created a monster.


I think I won an argument with an interviewer, but I lost any chance at the job ;)


>As such, a good trigger rule is "one test per desired external behavior of the unit, plus one test per bug fixed". The test for each bugfix comes from experience -- they delineate tricky parts of your unit and enforce working code around them.

So much this. The smoothest integration I've ever worked on was one where I owned the API and back-end and another team built the front end. I defined the API behaviors, built the tests to verify those behaviors, and created a mock API for the front-end folks to use while testing.

Over the course of building the API, I slowly got my test success rate from 0% to 100% and always immediately knew when I accidentally changed behaviors. When we finally integrated, there were literally 0 errors related to the client/server interface. Yes, there was some UI wonkiness and we discovered some issues of scale, but there were no issues with parameter values, HTTP response codes, error messages, etc. It was a) amazing and b) the only time I've had the luxury to build something in that manner.
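
Not claiming this is how the poster did it, but the mock-API half of that setup can be as small as a throwaway server with canned responses; the endpoints and payloads below are invented:

    package main

    import (
        "log"
        "net/http"
    )

    // A disposable mock of the agreed API contract, for the front-end team to
    // develop against. Paths and payloads are purely illustrative.
    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/api/orders/42", func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(http.StatusOK)
            w.Write([]byte(`{"id": 42, "status": "shipped"}`))
        })
        mux.HandleFunc("/api/orders", func(w http.ResponseWriter, r *http.Request) {
            if r.Method != http.MethodPost {
                http.Error(w, `{"error": "method not allowed"}`, http.StatusMethodNotAllowed)
                return
            }
            w.WriteHeader(http.StatusCreated)
        })
        log.Fatal(http.ListenAndServe(":8080", mux))
    }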


Did you build it around OpenAPI? I'm interested to try that approach.


No, this effort pre-dated the development of that spec.


The main issue apart from the "Consulting Industry" is that too many people try to follow a methodology to the letter, but superficially, without trying to understand the point and deeper meaning.

That's why people argue about rules, or whether this or that can or cannot be done, instead of trying to understand the aim first and then the cost/benefit balance.


Which is a problem when you see something that seems like it should help, so you hire a consultant who can push the letter of the rules but doesn't understand them well enough to figure out how they should work in your organization.

I wasted a lot of time trying to get acceptance tests in here. They seem like a good idea, but I couldn't get traction on them, and the consultants preached the rules rather than the how or why, so the rules got robotically followed with no helpful results. I'm still not sure if the concept is flawed or just the execution, but we threw it out.


Yes, this is the best talk so far. Right now I am at a company that has a strict class-method testing strategy. This makes refactoring a pain in the A and worsens the ratio of programmer time spent to quality assured by the tests.

In "unit testing", the unit doesn't mean "the smallest thing we test which we decided is a class even if doesn't make sense"; it simply means "on its own" so external dependencies are not in the way.

My heuristic for determining how much a test is worth writing is something like this:

(number of outbound dependencies * how hard it is for a human to understand) + degree of mutability + how much code coverage it adds.

This leads to a grouping of tests and code that naturally fit together.


>- TDD, much like scrum, got corrupted by the "Agile Consulting Industry". Sticking to the original principles as laid out by Kent Beck results in fairly sane practices.

Trying to explain this to people just gets exhausting after a while.


It would be great if it worked this way. What I see instead is that managers are complaining if coverage is less than 80% and you have to write tooling to compensate for generated code and all that crap. It is the 10th circle of hell.


Generated code is easy, just generate the tests.

I wish I was kidding.


> What triggers writing a test is what matters. Overzealous testers test for each new public method in a class.

I have seen this before. At one company I was forced by a couple of other developers to write tests for accessors (get/set) on model classes. They would reject my PR if I didn't do it. To leadership it looked like I was against automated testing.

To me it's more important to write tests for places where you're most likely to make mistakes: for example, calculations, business logic, or how the view renders your model as it changes. Not every single small method.


But how can I measure that with a red/yellow/green stoplight chart based on code coverage? I’m actually supposed to trust that the developers will do the right thing?!


Sorry. How is this hard? You write a function, you write a test for it, you run the test. Why is this on YouTube? This is nuts - this isn't an existential developer crisis.


Great advice. Most tests are too discrete.


The OOP class really devalued the idea of a module. Focusing on terminology and not the concept caused harm.


I've been reading "Large Scale C++" by Lakos. There are two kinds of code Lakos writes about:

1. Application code -- Fast changing, poorly specified code. You need to have a rapid development cycle, "discovering" what the customer wants. Your #1 job is pleasing the customer, as quickly, and as reliably, as possible.

2. Library code -- Slow changing, highly specified code. You have a long, conservative development cycle. Your #1 job is supporting application programmers.

TDD probably works for #2, less so for #1. Furthermore, we all dream of being library developers (that our code and specifications are stable, and that our code can last decades). Alas, most of us are Application developers, and the lifespan of our code isn't really that long.

Recognizing the lifespan of your code, as well as the innate goals of your team (quick and dirty application style, or slow and careful library style) is important.

------------

Mixing up the styles causes issues. If you write library-style code for an application, the specification will change under your feet and everything needs to be rewritten to the new whims of your new customers.

If you write application-style code for a library, you yourself won't be "stable enough" to support your peers, and no one will want to use your library.


This is something that isn't touched upon much. TDD fits library writing like a glove because the developer is naturally thinking of the public contract of its components.

This is not to say that application code doesn't need the same approach, but it's a lot harder for developers who don't have their design hats on to think of pieces of their functionality as contracts of behaviour that contribute to a feature or requirement, or, to your point, to think in terms of the lifespan of their code.

I think the top comment links to an excellent talk on TDD and how it is often misapplied.


Determining which is which is definitely a skill developed from experience. For example, being able to see that if every time you use library part Y, you have to add algorithm X around it, algorithm X should be a part of the library not the app.


I find this application/library distinction hard to reconcile with the many long-lived, stable applications that exist. AutoCAD, for example, is older than I am.


Any large scale program will have both "application" parts and "library" parts.

Let's take AutoCAD as a long-running example. Its first release was in 1982, and it had an update this year, in 2020. What's a good example of "application" vs "library" code here?

Let's take the user interface, which is almost always "application" style, with short lifespans. Back in 1982, you probably had to write a custom serial communicator to interface with the mouse (or trackball). Eventually, Windows is released and a standard mouse interface becomes common practice.

-------

A few decades later, the 3-button mouse with scrollwheel (with the wheel being the 3rd button) becomes popular. Adding new features to the application, and the overall design of the UI would have to change for modern sensibilities.

Then Ribbon happens. Hate it or love it, Windows programs are now Ribbon-based. Gotta overhaul the UI AGAIN to match the changing times.

This progression from 2-button mouse -> Windows driven mouse -> 3-button mouse with scrollwheel -> Ribbon -> touch-enabled (??) is "Application code", requirements that change with the times. Every few years, the code interfacing with the user was largely rewritten, to match the (then modern) expectations of its userbase.

-------

Of course, there's the "library chunk", which probably solves geometric constraints or something like that. That part may have never changed throughout the life of AutoCAD.

-------

Application code CHANGES. That's the important bit. You cannot expect applications to look the same over decades. I'd be very surprised if AutoCAD still had their DOS GUI laying around (https://www.scan2cad.com/wp-content/uploads/2016/06/autocad_...). That sort of code is thrown away when it is no longer fashionable.


AutoCAD is an interesting example because it is consumed by professionals much in the same way a library is used by programmers. I think you'd want to be a little slower moving in that space than you would for your average application.


But wouldn't AutoCAD be split into library and application parts internally? For instance, ShapeManager is the internal geometric modeling kernel used by AutoCAD.


Hasn't AutoCAD been around since before TDD became mainstream?


I think this distinction is harmful. Your application code should include areas that function as "library code." We do this at work with an explicit UI library that we package separately, but there's no reason it needs to be packaged separately, just create directories that represent some library-esque element of your application and heavily test it and write docs for it, like you would a library.


The distinction is helpful from an organizational perspective.

From a manager's point of view, they need to know where the money is going. If ProjectA is the "support" for LibraryA (and the managers don't know about it), it will look like ProjectA is costing more and behind-schedule.

But if you explicitly split into ProjectA + LibraryA, the managers can see where the money is going, and better allocate funds matching the business's priorities.

Financially: the goal of LibraryA is to outlive ProjectA. There's no financial incentive for ProjectA to think long-term. (And if ProjectA is thinking long-term, then STOP THINKING LONGTERM!! A huge amount of code is short-term glue logic with bad specifications). Aligning your technical goals with your soft-managerial goals is key to survival.


I'm not sure what you're finding harmful when it sounds to me like you're agreeing. Your "library code" (or library-esque elements) is the code that you find needs to be tested more heavily, because it is library code? I don't think whether it's packaged separately was key to that, but it can help.


Anyone can be a library developer! Unfortunately no one pays library developers, so nearly every programmer is paid to be an application developer.


The economics of application vs library programmers is a managerial decision.

Any application project that starts writing library-style code will slow down, and possibly gain the scrutiny of the managers / executives. In "Large Scale C++", Lakos notes the importance of manager-level buy in. You must have a library-team, with a separate stream of money, if you expect to be sustainable.

Application-style programmers trying to make a library on their own will lose the economic game.

--------

"Who pays for the library??". Well, if your organization has 10 teams, all using a specific library (or that could benefit from a hypothetical library), its actually a very difficult question. This is where hierarchy and management must step in to solve the issue. No sane manager wants to spend their own money helping other teams with no recognition!

Trying to write a library without solving the long-term economic and managerial issues is suicide. You'll only slow down your own application and make your own project look weaker compared to your peers (at least in the short term. In the long term, the library code will blossom and make your team more productive. But surviving the short term is a major concern in reality)


> Unfortunately no one pays library developers

Shameless plug: We're hiring at talkjs.com. Our entire product is essentially a library.

Also, back on topic, our tests never go stale because we promise full backward compat to our customers, no matter what. Solves a big part of the downsides of TDD or other approaches that yield lots of tests.


I was going to write something similar, but I think you have said it better than I would have.


Sadly, in this day and age of development and the need to constantly ship things, TDD has been dead for a long time. In my 12 year career, I have heard lots of people talk about test-driven development, but I've never seen it in a workplace (at least none I've worked at).

On my own personal projects I have dabbled with TDD and I've seen the benefits it can provide, but it does make even simple programming tasks take a lot longer. Sadly, companies these days (especially during the pandemic) can no longer afford the luxury of development taking longer, even if it does mean the end result will most likely be cleaner and have fewer bugs. The company I work for sees shipping potentially buggy code, and fixing it as bugs are reported, as an acceptable development practice.

With the advent of automated builds and deployment processes, it is way too easy to quickly ship code and roll back bad releases or push out emergency patches. Things don't have to be perfect the first or second time around. The optics for non-technical executives are a lot better when they see code go out and features released than when they see things take longer to develop.


That should itself be rather telling. TDD is extremely well known, in my experience. I bet if I stuck a microphone under the nose of random passersby at any development convention [1], 90%+ will be able to tell me what 'TDD' is short for, and even if they don't, they'll have heard of the concept at least. Hell, I bet most would tell me they 'aspire to do it'.

And yet, more or less nobody does.

So either it is next to impossible to begin doing it (seems like a bizarre conclusion), or, perhaps more likely, nobody wants to do it, and the few dev teams that do manage to do this have not managed to turn that into a competitive advantage. Which makes the value of TDD rather questionable based on simple evidence.

To explain this observed behaviour that TDD is clearly not a competitive advantage[2], I can name a million pet theories. But without going into any of those, the sheer fact that it's __this__ rare in practice says a lot, no?

[1] I mostly go to java related ones, maybe it's less well known amongst other communities.

[2] What other explanation is there? Clearly not 'ah, but, TDD is brand new and you have to give it some time for teams to get familiar with it, and for the concept to percolate through, maybe wait for tooling support to catch up' - TDD's quite an old concept!


One analogy that has come to mind for me is TDD is like eating a whole food diet with a ton of vegetables and no processed foods.

When you're doing it, you feel great, or at least you convince yourself that you feel great because you know you're supposed to. But ultimately it's hard to keep it up. It gets tedious, it requires a lot of willpower, and the benefits are a little too abstract or far removed.

At the end of the day, most people will be just as healthy with a more moderate approach. Just eat some vegetables, not too much sugar, and keep a reasonable limit on total calories. In the programming analogy - just write some tests for important functionality, and be careful that the tests you write are actually working, but you don't need to be fanatical about writing the tests before the functionality or having 100% coverage.

And then as with diet, it's also possible that if you're aiming for the moderate approach you end up slipping too much to the other extreme and writing almost no tests. For some people a structured diet is necessary to keep them on track, and I suspect for some people TDD can be a more effective tool than it is for others if their default coding style tends toward more sloppy/unstructured.


Exactly. I think most experienced developers know about TDD, maybe they have even tried it on a personal project. But selling it to a company that has commitments to investors, paying customers, and an executive team who might not all have technical backgrounds is hard. How do you explain to people who don't get it that things will take longer?

One of the biggest problems with TDD is that it kind of relies on having clearly defined specifications. I don't know about you or other people here, but I've worked in a lot of places (many even called themselves Agile) where the work was not properly scoped at all. If you start doing TDD and the scope isn't clear, the goal posts keep on moving and things just perpetually take longer.

I think it's all about cost in the end. It's cheaper for companies to ship buggy code and then iteratively patch the bugs. Unless they're massive showstopper bugs (which normal tests should be catching anyway), it probably still comparatively works out cheaper to fix bugs as you find them.


> One of the biggest problems with TDD is that it kind of relies on having clearly defined specifications

That's very true. I think it's an easier sell when you have a relatively stable set of specifications (accounting software core logic) where the rate of change is low but the cost of regressions is high. But I think you can tackle the same problem space with good unit tests instead of enforcing a test-first mindset.

In situations where work is vague, spending more upfront time thinking about architecture is much more useful (design docs?) imo. Having a set of units for a crappy, entangled system is pretty costly.


> I don't know about you or other people here, but I've worked in a lot of places (many even called themselves Agile) where the work was not properly scoped at all.

I can see this argument for large-scoped scenario based integration tests, definitely. But TDD unit testing operates on a much smaller scale, that of a single patch, or a Jira ticket.


"But, selling it to a company that have commitments to investors, paying customers and an executive team who might not all have technical backgrounds makes TDD a hard sell"

It's an incredibly easy sale. The whole basis of TDD is that it's an approach that makes your development efforts faster, with higher quality. A million graphics of the amount of time that development spends fixing errors are what sold TDD to the masses.

The theory of TDD is exactly what sells to the suits and the money counters.

The theory doesn't mesh with reality, though, and it's that engagement with the enemy (reality) where TDD falls down.

As an aside, in my own career I've seldom been able to incorporate TDD because each project has been novel enough that trying to define tests up front was just not possible. Yes, if I was implementing the re-invent-the-wheel "sum two numbers" type example, it's trivial. But most of the time it's a vague API for a vague need on an uncertain technical foundation, and until the clay had taken form we really weren't sure what we were dealing with.


> To explain this observed behaviour that TDD is clearly not a competitive advantage[2], I can name a million pet theories. But without going into any of those, the sheer fact that it's __this__ rare in practice says a lot, no?

Short answer: TDD is hard

Long answer: It takes a very experienced engineer to start and lay out a piece of software so that it's easy to write and maintain tests. IMO, this is one of the advantages of server-side rendered web applications; it's much easier to write a test that verifies HTML than a test that verifies the state of a Windows / Mac application, or even the state of the DOM in a rich in-browser application.

But, continuing on that train of thought: It's much easier, as a developer, to write a test that tests an API, (either programmatic API or web service) as a unit than a giant monolith.


The only circumstances in which I can see writing tests against HTML directly is if you're doing a move or a migration.

I did exactly this for a hosting company back when Windows Server 2003 was getting EOL'd. They setup on 2012, migrated the sites over, and then my script would pull from both versions and see how much variation there was. Even then it was mostly just a diff.

Otherwise that's the equivalent of testing implementation details.


Write a test that:

1: Stick an entity in the database
2: Try to get the page for the entity without logging in. Verify the error page
3: Log in as someone who doesn't have the entity. Try to get the page for the entity. Verify the error page
4: Log in as someone who can get the entity. Verify some basics of the page contents

Write a test that:

1: Posts the contents of the form without logging in. Verify the error.
...
4: Log in as someone who can post the form. Post the contents of the form. Verify the contents are in the database.

Tests like above meet the higher-level post's definition of a "unit test" and are rather easy to write. You don't even have to be in the same language as application. They're simple enough that developers can write them as part of their workflow, and simple enough that developers can run them to ensure no regressions.

Trying to do something like that with a thick UI application is much, much harder. Possible, but not easy.
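
For a rough idea of what those tests can look like in Go, here is a stripped-down sketch; the routes, auth check, and markup are all invented, and a real app would drive its own router and session handling:

    package web

    import (
        "net/http"
        "net/http/httptest"
        "testing"
    )

    // newApp stands in for however the application wires up its routes.
    func newApp() http.Handler {
        mux := http.NewServeMux()
        mux.HandleFunc("/entities/42", func(w http.ResponseWriter, r *http.Request) {
            if r.Header.Get("Authorization") == "" {
                http.Error(w, "forbidden", http.StatusForbidden)
                return
            }
            w.Write([]byte("<h1>Entity 42</h1>"))
        })
        return mux
    }

    func TestEntityPageRequiresLogin(t *testing.T) {
        req := httptest.NewRequest(http.MethodGet, "/entities/42", nil)
        rec := httptest.NewRecorder()
        newApp().ServeHTTP(rec, req)
        if rec.Code != http.StatusForbidden {
            t.Fatalf("expected 403 for anonymous request, got %d", rec.Code)
        }
    }

    func TestEntityPageRendersForAuthorisedUser(t *testing.T) {
        req := httptest.NewRequest(http.MethodGet, "/entities/42", nil)
        req.Header.Set("Authorization", "Bearer test-token")
        rec := httptest.NewRecorder()
        newApp().ServeHTTP(rec, req)
        if rec.Code != http.StatusOK {
            t.Fatalf("expected 200, got %d", rec.Code)
        }
    }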


Here's my concern with that approach.

It implies to me that the absence of an error is a success. In other words, if you don't find the HTML you're looking for that represents an error, then it's considered a success.

The problem is that at some point it's possible for the page to be redesigned (or the entire site) such that the HTML you're looking for that represents an error is no longer there for a legitimate error.

IOW, the tests are not future proof. But this is the general problem with looking at HTML, it's fragile.

That doesn't mean I don't see the need, I understand the goal, I just don't agree with this approach.

I think if it were me I'd do 1 of 2 things.

1. rely on HTTP codes (403 is a reasonable response for what you've described), or

2. rely on HTTP headers. And I would argue it's better to set them on success and consider it an error anytime the header is missing.

Obviously redesigns don't happen that often so it's not a super large issue, probably not worth rewriting anything but I do think if I were about to implement that functionality I would try and use HTTP codes or HTTP headers.


> Verify some basics of the page contents

Implies that you're looking for the contents that are supposed to be returned.

When there's an error, the contents you're looking for won't be there, therefore the test will fail.

Furthermore, HTTP testing libraries typically require you to specify required HTTP status codes.


- "you can, for example, examine HTML in your tests"

- "that sounds super fragile, there's only a few circumstances in which I would that"

- "Here's the circumstances in which I'm doing that"

- "Here are the problems with what you're doing, and here are alternatives that don't involve examining the HTML"

- "I'm not examining the HTML!?!? Whatever gave you that impression!"

---

Not worth my time.


Well I do TDD around 85% of the time and I'm doing pretty well financially as a dev. Sometimes it isn't possible to do TDD, like when I'm iterating on a data science algorithm or exploring data and relationships before settling on a final approach. But you're right, most devs I know don't do it, though I've noticed the ones that do generally make more.


> I’ve noticed the ones that do generally make more.

The market may be rewarding the TDD skill; that makes sense. But consider that the causality makes sense the other way around too: anyone with the patience, eye for detail, and organizational skills demanded by TDD is probably an above-average programmer anyway.

Personally I like a “don’t forget the tests” approach for larger units of business logic, but I’m against exhaustive micro-test suites that just turn into bespoke, buggy, and useless-at-compile-time type systems.


oh please, this is the programmer version of "if you're reading this blog you're already in the top X% of <insert industry here>".


The top 50% are capable of actually doing TDD? I don’t see why that’s impossible, especially if they are the ones implementing it, not just pushing tickets along within it.

And I don’t really follow TDD practices myself, so this would say nothing about me.


Careful with this reasoning. It could be that those who insist on TDD are simply the loudest or most confident or whatever, and therefore are more often in a position to advance themselves. "Their stuff works better" isn't necessarily the link between "TDD" and "make more".


Interesting. You're defending and advocating for TDD, and you use it 85% of the time. I've used TDD as well, and I notice it tends to work very well for certain types of workflows. Otherwise, I write a little code, test a little code, back and forth. Almost everyone does this (you write a method, you tend to write a little test to see if it does what it's supposed to). To get high test coverage, you just save those little tests that you naturally write anyway. It's a good way to get high testing coverage without imposing an odd-feeling methodology into what should be a creative workflow.

When the big "TDD is dead" debate hit the scene, I felt that DHH and the detractors won that debate big time - even though I use TDD and consider it to be a valuable tool.

Why would I think this - after all, if I like TDD and often employ it, why would I agree that the people who claim it is "dead" have won?

Probably because the debate had become relatively extreme:

https://blog.cleancoder.com/uncle-bob/2014/05/02/Professiona...

"If I am right… If TDD is as significant to software as hand-washing was to medicine and is instrumental in pulling us back from the brink of that looming catastrophe, then Kent Beck will be hailed a hero, and TDD will carry the full weight of professionalism. After that, those who refuse to practice TDD will be excused from the ranks of professional programmers. It would not surprise me if, one day, TDD had the force of law behind it."

Whoah! And keep in mind, people were acting on this. I read a blog (sorry no cite) from a highly regarded developer who worked as a contractor to restructure software dev teams, and he came flat out and said that if people don't adopt TDD, they won't be employed for much longer wherever he works. I got the feeling that even questioning it would be considered grounds for "no hire".

That mentality is what I think is "dead" - and this is evidenced in the fact that you and I (who do quite a bit of TDD) acknowledge that it is not something we always do and that it isn't always possible or desirable. I got in wicked arguments about this at work, and I'm almost certain I was dropped from consideration for jobs during interviews where I argued the now more moderate position which, ironically, makes me one of TDD's defenders in 2020!


Absolutely, but I still prefer the interfaces that come out of doing proper TDD. I minimize the use of mocks and try to pull in as much of the infrastructure as possible. I'm pragmatic though. I'll use fakeredis in tests instead of actual redis so the tests can run concurrently without quirks.

It's hard to communicate as a developer at times. We're all learning different things at different times and taking things to an extreme can cause bad outcomes for the rest of the team. One of the reasons I try to be accurate during online discussions. "I do TDD 85% of the time" is a better thing to add to the conversation than "I do TDD" since it gives people the idea of nuance in software development methods.


There's a distinction between TDD and writing tests.

I personally find TDD paradoxical in that it's often used as a way to gather your thoughts, yet the more exploratory the work the more TDD hurts.

Tests are very very useful when wrapped around the core logic and data structures of your application. But that doesn't mean they need to come first.


"But that doesn't mean they need to come first."

Curious - did you hold this opinion around 5-10 years ago, and did you voice it to any TDD advocates at the time? My experience was that a statement like this (which I consider immensely reasonable) could kick off really severe arguments, and could get you no-hired in interviews.

There may have been a bit of a Motte-and-Bailey argument going on, where TDD advocates would retreat to the value of tests when defending TDD, and then go back to insisting on TDD once the coast was clear.


There have been several phone screens in my life in which as soon as we started speaking about TDD and/or agile, they completely lost interest.

But as far as I'm concerned that's a good thing. I write tests where it makes sense, and I can think of 1 project where I eventually regretted the decision not to write tests (before or after), but even then it didn't really hurt; it just would have been a lot less scary to have a good suite of tests around some specific functionality (related to pulling emails, where doing the wrong thing could mean permanently losing emails without processing them).


>So either it is next to impossible to begin doing it (seems like a bizarre conclusion), or, perhaps more likely, nobody wants to do it, and the few dev teams that do manage to do this have not managed to turn that into a competitive advantage. Which makes the value of TDD rather questionable based on simple evidence.

Or the competitive advantages it offers do not align with the competitive advantages our society currently selects for. These days it seems that short term profits are the primary factor selected for and that users tend to go with the first to market until it has a significantly worse experience.

>But without going into any of those, the sheer fact that it's __this__ rare in practice says a lot, no?

It's a bit like how good security is a rare practice, as good enough security is much cheaper and the cost of security incidents is generally not as large as they should be (for example, data breaches have most of their costs pushed onto the victims of the data being leaked and not the collecting company that was breached).


> So either it is next to impossible to begin doing it (seems like a bizarre conclusion), or, perhaps more likely, nobody wants to do it

Or it is an incredibly good procedure for a tiny minority of the projects, but sucks for the vast majority of them.


Tests are like Haskell, it seems.


IMO, TDD really depends on which kind of code you're writing, which language, which tooling...

I just finished a quite large collection of parsers and did it entirely using TDD, got 100% coverage and it was much easier than when I did it without it in the past. I kinda HAD to get 100% coverage because how else would I test my code? I only wrote the part that read from files at the end of the project.

Also, when I was writing code that communicated with banks or telcos in arcane COBOL-era protocols, I didn't really have a way to "test" my code other than in production, so I relied on TDD for my day-to-day coding. It worked fine.

For GUI stuff, or web development? I used TDD in the past and didn't gain anything for it.

This should be obvious for everyone, but TDD only works when it works... it's not a silver bullet.


I think your post gets to the core point. My view is that 90% of the code is fine without tests (assuming a crud app and some way of catching type errors). It's just plumbing to get data from a to b. There's around 10% that it's important to test, the core logic.

I wrote a pdf parsing library and it's a joy to test and the tests give you a lot of certainty the code works because the unit is the library itself, the input is the file and the output is well defined.

But I really am beginning to dislike testing most code in a web app. If you have a pure calculation or some business logic, test away. For everything else some higher level tests give far more assurance, when you're moving fast those might even be primarily manual. A good rule of thumb I think is that a test using mocks is a bit of an anti pattern, they make you feel like you're testing code when generally you're testing your test.

The typical test pyramid is upside-down.


What people usually don't realize is that an automated test is a "contract", and it works best for something that has a specification.

Libraries usually have specifications. Web GUIs? Not so much.


> But I really am beginning to dislike testing most code in a web app. If you have a pure calculation or some business logic, test away. For everything else some higher level tests give far more assurance, when you're moving fast those might even be primarily manual.

Yes, I completely agree!

It took me a while to get used to writing code like this, but having the business logic completely unbraided from the interface/database code is probably the biggest productivity boon I've ever had, because my tests run blazingly fast. People don't think it matters, but instant feedback makes a lot of difference.

This is also how things like DDD, Hexagonal Architecture, Functional-Core-Imperative-Shell are structured, so we're not alone. It doesn't have to be as complex as some of those: it just has to be easily testable...
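For what it's worth, a minimal Python sketch of that separation (the pricing rule and names are invented for illustration):

    # Functional core: pure business logic, no database or framework in sight.
    def discounted_total(prices, loyalty_years):
        # Hypothetical rule: 5% off per loyalty year, capped at 25%.
        discount = min(0.05 * loyalty_years, 0.25)
        return round(sum(prices) * (1 - discount), 2)

    # This test needs no setup and runs in microseconds.
    def test_discount_is_capped_at_25_percent():
        assert discounted_total([100.0], loyalty_years=10) == 75.0

    # Imperative shell: the only part that knows about the database.
    def checkout(db, customer_id):
        customer = db.load_customer(customer_id)  # I/O stays at the edge
        return discounted_total(customer.prices, customer.loyalty_years)

Only the thin shell needs an integration test; the core can be covered exhaustively without ever touching I/O.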

-

> A good rule of thumb I think is that a test using mocks is a bit of an anti pattern, they make you feel like you're testing code when generally you're testing your test.

Oh, I couldn't agree more.

My personal pet peeve is having to write unit tests for thin-controllers or Spring-style services. I take an hour to carefully mock 10 deep dependencies, and in the end all I'm testing is if the language is able to call methods in other classes.

Instead, if I just write the silliest integration test, I'll probably get better coverage and it will be able to uncover more bugs.


I treat it like a tool. I use TDD when I encounter a problem that is complex enough that I cannot consider or remember all of the possible inputs/outputs during development. Getting all of those dumped into a test is the best way to reduce cognitive load.


> The company I work for see shipping potentially buggy code and fixing it as bugs are reported as an acceptable development practice.

That's what I've heard called "trading external quality" for time to market. That is exactly how you're supposed to do it, at least according to the dev coach I've been listening to (@GeePawHill).

The reason you write tests and refactor is to keep your code's internal quality up. Those are the things that customers can't see, the things that make it harder for you to change your code when you come back to do more work later.

The Correlation Principle states that internal software quality (ISQ) and developer productivity are directly correlated, and that you cannot trade internal quality for time to market without the trade-off quickly coming back to bite you in the form of lost productivity.

A feature that is only partially implemented, or an edge case that isn't tested properly and causes something to fail at runtime, are both examples of external quality. You can indeed come back and fix that later, and you can make this trade cautiously to get your work to production faster.

But if you write a long method, for example, that "just works" and ship it without further introspection, refactoring, or complete test coverage, you may never recover from that. It might cost devs an extra hour or more every time that method needs to change in the future. At some point it becomes so long and complicated that nobody can fully understand it; the time to stop digging is as soon as you notice you're inside that hole.

And according to Sandi Metz, those big long methods are almost always the ones that will need to change again: they are long and complicated precisely because they model the business concepts you care about. They should be well factored as early as possible to facilitate future changes (unless you have a crystal ball and can say for sure that won't ever need to happen!).


That seems to be a very simplified model of what actual bugs look like in the wild.

> You can indeed come back and fix that later, and you can make this trade cautiously to get your work to production faster.

I think that is only true in certain companies of a certain size. If you're a small startup trying to launch an MVP, yes, time to market will be prioritised over everything.

If you're in a regulated sector, or your company is any larger than e.g. 30 people, "going back and fixing it later" gets harder and more costly than shipping correct software in the first place; either you'll damage your reputation, you'll have a deadline to implement the fix (unnecessarily adding pressure on your own team), or other teams will start relying on the wrong behaviour and your fix is no longer trivial because it doesn't affect only a local scope.

I want to highlight this because there's a trend of thinking that any company and any project can (and should!) be managed as if everything being done is a prototype, with the perception that this is "agile", which ends up picking the worst set of trade-offs for the task at hand.


I've just been learning about Wardley Mapping, and Simon Wardley says you're right too. The point wasn't only that you CAN trade external quality for time to ship, but that anyone who tells you that you can trade internal quality the same way is probably wrong and should be disbelieved, as that trade isn't likely to work for long, if at all.


Wardley Mapping seems like an interesting idea I never heard about - thanks for sharing.


I took the $25 class that you can find easily, and I thought it was very worthwhile! It helped me understand exactly what you said, but with some formal structure. Some teams are mad scientists, some are pioneers, some are settlers, and some are town planners.

Some tasks are suited better for mad scientists, ... some for town planners. Some regulated environments are only suited to the type of work that town planners can do (6-sigma people, that's how it's also explained in the class.)

But the settlers won't have much to do if pioneers haven't done some heavy lifting first, and so on (thanks a lot, mad scientist,) and the town planners can't do the type of work that you need from them either, unless there have been many layers of groundwork laid before them.

Glad you enjoyed checking it out!


Factoring a long and complex method doesn't magically make the process it's doing less lengthy (unless there is a lot of repetition) or less complex. In fact it's likely to make it longer and more spread out so harder to understand from scratch. When I encounter such code it's far easier to change if you keep it together and document it well so future people can find out what it's doing and see the whole in context.

For future changes, factoring also requires a crystal ball since you have no idea how requirements will change or if your factoring will actually be useful for them. Better to keep things simple and contained until you actually need to make them more complex.


I agree! When the method is too hard to unit test (for an experienced unit tester) it's frequently a sign that it's not single responsibility anymore, or that there may be other problems in the design worth sussing out. But there is no substitute for experience and a sharp eye, when you know what you're doing and have confidence about what is most likely to come next, it's almost like having that crystal ball. Rules are also made to be broken.


This summarizes my experience to a tee. Early on I've found that little compromises on internal quality come back and bite you almost right away. Compromising on that only makes sense if you're 'building one to throw away', in which case rapidity of learning is probably the priority. The problem is that the decision makers almost never want to actually throw it away when the time comes.

I've been doing TDD a long time (18+ years), and I'm definitely pragmatic in the sense that I don't test everything, definitely don't aim for 100% coverage, and don't use mocks often, etc. I aim to extract most of the value from the tests without paying an absurdly high tax on them. Still learning in that regard, but I have had some real successes in testing.

I find this is key in our broader team's ability to keep the internal quality high. It's obviously easy to mess up a code base, but (high quality!) tests have been an important part of that, in addition to refactoring, code review, etc.


I recently tried TDD. It was fun for a while, but it took too much time writing tests for mundane things. I am not convinced about 100% coverage etc.

For me the biggest takeaway was the way I was organizing my code. I work on a 15 year old web application with lots of legacy stuff, so it isn't easy to make changes fast. New code that I write depends on old code, so I am constrained a bit. Even still, I found marked improvement in my code structure after trying TDD. I do not know whether I'll stick with TDD, but I definitely will remember the code organizing lessons it taught me


It's usually counterproductive to strive for 100% test coverage. You wind up with nonsense tests for default getters/setters, and occasionally you face things that simply CAN'T be 100% tested due to how a 3rd-party library works.

The most aggressive shop I've ever seen set their bar at 95% coverage.


In my last job we had 100% coverage, but if something was trivial we'd annotate so the code coverage tool would ignore it. It meant every decision not to cover something was documented, and visible to the code reviewer
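For example, with Python's coverage.py that kind of documented exclusion looks roughly like this (the class and method are made up):

    class Order:
        def __init__(self, order_id):
            self.order_id = order_id

        def __repr__(self):  # pragma: no cover
            # Trivial debug helper; deliberately excluded from coverage,
            # and the exclusion is visible to the reviewer right here.
            return f"<Order id={self.order_id}>"

The annotation keeps the report at 100% while leaving an audit trail of what was judged not worth testing.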


I don’t think it’s “this day and age”. I think the reality is that TDD has never been the norm (even the article we’re lamenting over is 6 years old), for the reasons you gave, but also because it’s just not that good of a pattern. Is it useful? Probably. Is it useful relative to the trade-off in shipping time? Less of an obvious answer.


To me, the value in TDD is the same as testing in general. It's not so much during initial feature development that you see the value, but rather later in a project or during application maintenance. I view it as amortizing the complexity over the life of the project.

Sure, it might take a bit longer to ship a simple feature, but bugs also take away from feature development, and in a complex enough application small changes might result in large bugs. I guess that's true of software testing in general, but I think most developers would say there's value in having tests.

Often, the tradeoff in shipping quickly is technical debt that impacts your ability to ship quickly in the long run. I think TDD can help manage that, especially if you're planning to write tests for a feature anyway.

It does depend on what you're working on, of course, but I think most companies over-estimate their need to rapidly ship features. In the grand scheme of things adding a week to a 6 week project isn't that big a deal and might even save time in the long run.


> In the grand scheme of things adding a week to a 6 week project isn't that big a deal

mmm maybe once. But that's a compounding delay. If every project is delayed by ~15% for TDD then over the course of years you can fall well behind your competitors.

> complex enough application small changes might result in large bugs

Totally agree - but on really tightly TDD-tested code bases, small changes can also result in huge test refactorings...many times to the point of just not doing something because the time to update the tests is prohibitively expensive.

There's a balance to all of this, and (empirically) TDD seems to be on the extreme end of the balance.


TDD is one of the many things this industry does to have the facade of rigor without actually developing the methodology sufficiently to move beyond a facade. It's driven largely by inertia and arguments from popular figures in the industry.


All of this is spot on, especially the second point. As someone who lives with ADHD, TDD is hell incarnate for my productivity. 'Let's write a test for what will be an Observable and account for failing, passing, bad data, empty data sets...oh wait, what was the original task?' It's so unbelievably monotonous that it's REALLY easy to lose track of the original task. Worse, sometimes I'll be doing something and think up an additional or better way to accomplish it, which creates real-time tech debt even faster. My SO has OCD, and seeing how differently we think in action makes it super clear to me that things like TDD are probably heaven for some people.


I have ADHD also. I find TDD to be very helpful, as it encourages me to break the problem into the smallest possible next step.

That said, I've found the most boost from the full XP menu -- pair programming, TDD etc. I can get a lot done with halved daily ritalin intake vs working alone.


The key to TDD is to write tests that evaluate on the order of 10ms. A whole test suite should take less than 1s.

Only this way does TDD not interfere with development velocity.


What a dream that would be. But unfortunately not super realistic on a project of any real size.


It's pretty easy to get those numbers if your business code isn't intertwined with your UI/database/MVC code, or if your core code can be tested in a pure manner (like the "core" of an image editing software, or parsers, or serializers/protocols, etc).

Of course, if all you have is a basic CRUD app, then there's not much testable "core" code that you can separate from your framework, so TDD and unit tests are probably not the best idea. IMO end-to-end and integration tests are the way to go anyway.


There are ways to structure your code so that this can be done. Of course, it would require a complete refactor to achieve full test coverage on an existing project, but if you build like this from the beginning, it's feasible.


Is there any literature on this? A test suite that runs in 1s is...kinda crazy sounding on any non-trivial codebase.


Gary Bernhardt has a pretty good talk: https://www.youtube.com/watch?v=RAxiiRPHS9k


Pretty sure the parent comment was talking about the time to write the tests, not the time to execute them.


The issue with slow execution is that you don't run the test suite often enough.

I like running my tests almost as often as I compile my code. If they run fast enough, it creates the most addictive feedback loop for TDD.

Besides, we should be measuring the time to write tests against the time it takes to manually test. If you don't execute tests every time you make a change, you're not really sure if your code works. And manual testing just means we have no idea whether or not our code works.


Test driven development is most useful when there are multiple contributors. The added work time doesn't pay off when I am the sole architect/implementor.

If I was to be an architect only for a system though, I'd likely appreciate TDD to ensure implementors do what I think I spec'd


TDD for a complex domain generally means (at least in an OO design) you cannot do exploratory coding over what objects you have and what decisions you will make.

After all, you have to write the tests for the classes, before the classes exist.

When you do so, then start to fill in the code, and then realise you need to refactor, you have a heap of tests you need to refactor too. It always appeared to me highly inefficient and predicated on the assumption that the author can express his class hierarchies well ahead of the coding.

In my experience, the class design is highly iterative, even the names of methods, what methods you have, etc - to have to write the tests before you've written the code just creates a huge amount of impediments to the flow of expressing a solution.

This is not a criticism of unit testing - but of TDD.


In the version of TDD I've been exposed to, you only write the minimum number of tests to cover your next feature. It's still iterative.

I do more TDD on exploratory work than something fully formed because I can work in a narrow scope without having to grasp the whole project.


If you only know what the feature is, you're prepared to write functional and/or integration tests, but not unit tests. Unit tests closely wrap the details of the implementation.


An important requirement of TDD is that you decouple the functionality from the implementation, for exactly this reason. You should be able to completely rewrite your implementation from scratch, and only minimally update your tests. And the ability to do that is exactly the benefit that TDD can bring.

I find red-green-refactor MOST helpful during exploratory coding. There's a principle that Uncle Bob calls "Don't go for the gold": "stay away from the center of the algorithm for as long as possible. Deal with the degenerate, trivial, and simple administrative tasks first." I build up to the complex final functionality one small step at a time. At no point do I need to architect the whole kit and caboodle at once, then re-architect, then re-architect. It's all just aggregate improvements and refactoring what's already there.


Your quotes from Uncle Bob to my ears sound like trite nonsense. Moreover, name-checking him and then giving quotations of what he said seems positively religious.

My basic argument with agile is this - can it be measured that these things are good? Or is it just a pile of piffle built on anecdotal evidence - that people get paid thousands of dollars a day as consultants to espouse?

The industry has a long legacy of this - e.g. designing your classes in UML will solve your problems, or the Rational Unified Process, or prior to that, in the early 90s, Rational Rose diagramming experience was the "must have" on your CV.

One of the more reasonable analysis tools was Use Cases in my view, but most of it is just a bunch of noise, busy-making tasks which Andersen Consulting were more than happy to provide 100 people doing these tasks and bilk some client 100m for the pleasure.

There's no substitute for high quality, intelligent people self-organising. People that want to proceduralise developer behaviour are basically people that want to steal our freedom to be creative and effective, in the manner which befits ourselves as individuals. And make themselves ghastly rich doing it.

I don't see Uncle Bob as all that different, having hung out in the Agile evangelist scene in London for a while - they are all on the lam.

In my opinion.


My experience as well. There is usually not enough information available to write truly useful tests at the point TDD wants you to.


TDD is meant to be done on public interfaces. So TDD is meant to let you design interfaces from the POV of a consumer of those public interfaces.


This is high falutin’ theory.

Many user interfaces don’t have easily testable abstract interfaces, and user features are first tackled by expressing changes to a user interface.

You really have to jump through hoops to make the reality of coding meet the theoretical model.


We're not talking about user interfaces. It's what the user interface calls.


Frankly good type systems displaced TDD for me. They do a better job getting at the goals of TDD than TDD does. Types in good systems (OCaml, Rust, Typescript, Scala, Haskell, others) are 100% laser focused on depicting good public interfaces and laying out their behavior.

Sometimes you need a test or two to really nail down this behavior. That'll give you a TDD-like flair. But it's not the same because you've already invested so much productive thinking time into the interface driven purely by the types. At that point, TDD is good hygiene, but not a transformative practice.


For js->ts, this is right on the money.

When I started writing unit tests way back when, half the tests were just checking what happens if you give the code strange arguments. TS now does that job for me.

Another thing that has cut down on the number of tests is switching to pure functions for most things, and trying to isolate side effects as much as possible, so you really don't need to write many tests around them. If the function is only one or two lines, types are typically enough to catch potential issues.
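A tiny sketch of that isolation in Python rather than TS (names invented): the side effect stays at the edge, and the typed, pure function barely needs tests beyond what the type checker already rules out:

    from datetime import date

    # Pure and typed: one or two tests (or none) are usually enough.
    def is_expired(expires_on: date, today: date) -> bool:
        return today > expires_on

    # The only side effect, reading the clock, lives at the call site.
    def check_subscription(expires_on: date) -> bool:
        return is_expired(expires_on, today=date.today())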


It's one of the things that really frustrates me with the ASP.net team. They're obsessed with unit testing, and have made really stupid decisions that mean tons of stuff now gets injected instead of just passed, all in the name of unit testing.

But C# is a typed language and generally doesn't need reams of unit tests, so all it does is make the code unnecessarily complicated.

My big bugbear was when they idiotically made the config injected. Of all the things that should be super simple to use, and definitely not injected, it's config. It only changes between environments.


> Frankly good type systems displaced TDD for me. They do a better job getting at the goals of TDD than TDD does.

I think it's a false dichotomy. Where you have types, use types. Where you don't, use tests. Either way, get a tool to guide you to the answer and then check the invariant during future changes.

In fact TDD made me more aware of and appreciative of types. Writing types-by-hand is very tedious and wasteful.


Writing types by hand is not any less wasteful than writing documentation.


There is a great talk by Gary Bernhardt exploring this exact view. It shows how types and tests actually do not solve the same problems and are not interchangeable.

https://www.destroyallsoftware.com/talks/ideology


I didn't RTFA from Gary Bernhardt, but this has been my experience. I love types, but it's not like good types replace good tests. They just eliminate some states from the search space.


Over the years I have some notes about TDD:

* Writing tests first is simply unnatural. Far more developers need an exploratory session just to discover what they are supposed to do.

* That's because our spec is always vague and no amount of Agile process can help it.

* If you have a bunch of idempotent functions, they are the easiest to write tests on. So I would start there.

* Some languages are easier to do unit tests on, for example: Go. It gives everything you need: test runner, benchmarker, data race checker, unit test struct is part of std library, etc.

* This DSL: `describe(){ it "should work" { result.should eq true } }`, is nuts and counterproductive (a plain-function alternative is sketched after this list). People are already not writing tests and you are adding more friction?

* To mock or not to mock? This is a controversial one, but in my opinion it is simply practical to test directly against the database you are using. Your CI/CD should just prep the database you need instead of mocking. Once again, people are already not writing tests, don't add more friction.
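For contrast, here is the plain-function style mentioned above, sketched with Python's pytest (the function under test is invented and defined inline to keep the sketch runnable): the test name carries the intent and there is no describe/it ceremony.

    # Hypothetical function under test.
    def apply_discount(total, returning):
        return total * 0.9 if returning else total

    def test_returning_customers_get_ten_percent_off():
        assert apply_discount(total=100, returning=True) == 90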


> * Writing tests first is simply unnatural. Far more developers need an exploratory session just to discover what they are supposed to do.

Fully agree with this. It has been my experience.

To quote Mat Ryer from Go Programming Blueprints

> The first time we write a piece of code, all we are really doing is learning about the problem and how it might be tackled as well as getting some of our thinking out of our heads and onto paper (or into a text editor). The second time we write it, we are applying our new knowledge to actually solve the problem.

https://neillyons.io/the-art-of-writing-is-rewriting/


> Your CI/CD should just prep the database you need instead of mocking.

This is a hard one. A lot of tests require setup steps even if you're using a real database, so it's not that different from a mock. And if you can't give each test its own database to work with, then you have to run your tests serially. I'm working on a codebase that spins up 4 separate database engines in Docker, and each test has quite a few setup steps. The result is brittle tests that take hours to run, so they're not very useful as a release gate. I've been bitten by mocks before too, though, so I don't have good answers.


We did that with transactions. Every test gets its own view of the database and rolls its changes back when it's done. These tests take much longer than unit tests, but it's worth it in the end: writing them is easy, there is no need to create mocks, and you also verify that your DB queries actually work.
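A minimal sketch of that pattern with pytest and Python's built-in sqlite3 (the schema is invented; a real setup would point the fixture at the same database engine the app uses):

    import sqlite3
    import pytest

    @pytest.fixture
    def db():
        conn = sqlite3.connect("test.db")  # stand-in for the shared test database
        conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
        yield conn                         # the test runs against real SQL
        conn.rollback()                    # undo whatever the test wrote
        conn.close()

    def test_inserting_a_user(db):
        db.execute("INSERT INTO users (name) VALUES ('ada')")
        assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

Each test sees its own uncommitted changes, and the rollback in the fixture wipes them before the next test runs.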


> Far more developers need an exploratory session just to discover what they are supposed to do.

to flip this: write tests to help discover what the thing is supposed to do


This idea doesn't make sense to me. Mind elaborating?


Common advice when designing an API is to experiment with using the planned API before you implement it.

TDD is one way of doing this exploration, where the exploration is codified into actual code using the API and including assertions about the behavior you intend to implement.


sure! write a few tests pretending your thing is already implemented, to capture what you want it to do. at this point it's a step beyond writing no test and just typing `YourThing.Do()` in a text editor. does it make sense? is it awkward? should it even be `YourThing` or `SomeOtherThing`? what the "unit" is of what you're testing might change, or its API might. you're basically just trying to get a sketch of what it's like for the user.

now, at the end of this, you'll have a clearer idea of the external API boundary, probably a clearer vision of how it should work, _and_ code you can test against. you've potentially just saved yourself the labour of writing the thing, realizing it needs to be redesigned, and rewriting it.


I guess the hesitation I have is: I do all that, and then find out the real implementation should have just been something like adding a new field to an enum and adding/adjusting some if statements.

It seems like a giant waste to build an API when there is one I could just extend. But to confirm I can just extend that API, I'd need to first implement the change to see that it works.


right, i'm talking about implementing something new. if you're trying to refactor or alter an existing codebase, it can be even easier: you add another test to an already existing suite.

i don't think i've ever sat down, written tests for a completely new implementation, only to find that i need to add a field somewhere. before i sit down to write tests, i do some preliminary thinking. i'm not saying, write tests without ever trying to think first. but do use tests to flesh out your change from outside of the black box.


> * This DSL: `describe(){ it "should work" { result.should eq true } }`, is nuts and counterproductive. People are already not writing tests and you are adding more friction?

Completely agree.

I have no idea why test libraries are so intent on making the test names readable as English sentences; it feels very awkward compared to just naming test methods.


While I'm not a fan of DSLs as seen in Ruby's RSpec, I do enjoy being able to write regular sentences as test names, provided you don't have to stick with the pattern "it is expected to ...".


Controversial Opinion

TDD for me has always been something to show off at an interview; performing a beautifully choreographed dance of "Red, Green, Refactor" when showing off your skills.

TDD is only useful for me when I know the structure of my code i.e. I am fixing a bug or adding a feature to an existing application.

However when you are "in the dark" or in the early days of developing a new service, TDD can definitely slow you down if you don't know what your architecture is going to look like (maybe that's my fault for not doing enough whiteboarding?)


Your last sentence is beautiful. You're on the journey.

If TDD is slowing a developer down because they don't know what the architecture is, it's a strong indicator that the plan isn't complete.

By the time TDD is being used, it should be as obvious what to do as unpacking a U-Haul truck full of boxes parked in front of an empty house with the door propped open.


> By the time TDD is being used, it should be as obvious what to do as unpacking a U-Haul truck full of boxes parked in front of an empty house with the door propped open.

<sarcasm>By the time TDD is being used, you should already have known way ahead of time which house you were going to move to in the future, and simply had your Amazon orders shipped to that address in the first place.

If you have to move your boxes at all, your design was clearly incomplete originally and you should not have started buying belongings until you knew the final house you were going to live in first.</sarcasm>

Real life is too messy for TDD 99% of the time I find. Unexpected things happen (e.g. you move house) that means you can't know everything ahead of time.


I hear you. But I feel that saying real life is too messy for TDD 99 out of 100 times is an appeal to fatalism.

I believe, in good faith, that more often than one percent of the time, the existence of a good plan is what enables TDD to be the successful demonstration it claims to be.

The trust that we grow as practitioners of good software architecture is what enables us to have these talking points. I believe that we should feel enabled to discuss how to succeed, and less so how to fail.


Yeah sure, you have your clean architecture laid out without having written a single line of code. Then you implement it and it all works out fine, on time and on budget. That's a nice fairy-tale.


Yep, the GP's statement is completely dissonant from reality.

Inverting the development process may be useful for some well understood domains, but certainly not a majority, let alone the amount TDD zealots proclaim.


I think you have an opportunity to reframe TDD in your last sentence.

TDD, like Agile and Scrum, is not a one-size-fits-all solution. These are tools with process that have to be adapted to the people and the organization. That’s why the creators of these practices are declaring them “dead”: people are blindly treating them as written-in-stone rules, and that misses the intention. We made these rules to help ourselves and our organizations... not to overly constrain ourselves... to add more discipline, but leave room for the flexibility that our jobs demand of us.

Have you ever had to write some calling code, to call some other code, to make sure it worked? Don’t call it a formal test, I mean, a chunk of code that calls your other code, and you verify that the expected output is what you thought it would be?

Coding is often a process of discovery. Architecture can be learned but applying it takes discovery and iteration. “Why do I need a certain architecture?”. Well, if you don’t know, code long enough, and do enough projects, and you might start discovering architecture all on your own. Then you might see how what you discover aligns with what others have discovered. Writing code can help you discover the right architecture, as long as you keep trying and keep iterating.

Many of us had to learn and discover everything on the job, because not everything was known - much of it was being invented and discovered while we coded. Is that as efficient as learning known patterns and applying them? Maybe not, but also I believe the only way you can truly internalize the knowledge of a pattern or architecture is by implementing it, even when you don’t fully understand it. Therefore, you need to “test” it out to explore and discover it, to internalize it, to really learn and understand it.

TDD in that respect can simply be seen as a helpful crutch to help you, as you explore, discover, iterate and learn. A “test”, whether it be a unit or integration test, is that chunk of code that calls some other code, so you can see if the other code works or not, especially in a mocked up context of how you think it should work later. TDD is just accepting at the beginning, “I only have a rough idea of what this should do, I’ll define a starting point to call into my code, even code I haven’t written yet, so I can explore and discover both that code’s external interface and also internal implementation, and see results sooner”.

Using TDD is like sketching the blueprint of how you think the architecture should work, or how you want it to work, and then trying to color in the code in the middle, to see if it really does work.

TDD isn’t an absolute. It’s just a helpful practice made by developers to try to help developers... and you will eventually do something like it, even if you don’t know what it is called... because that’s how we all code in the end.

Think, code, discover, test, integrate... and iterate.

TDD just suggests: start in a test, then start thinking, and then continue as usual... and let the testing framework assist you as you explore and learn.


If so, what replaced it?

What methods do teams use instead to ensure that:

1. software works as advertised

2. software can be refactored

If there's "no time" to write tests before code, it seems likely that there will also be "no time" to write them after.

If there are no tests, or test coverage is spotty, refactoring is going to be well-nigh impossible. If refactoring isn't done, the implementation is likely to be brittle. If implementation is brittle and no tests exist, few developers will want to touch the code for fear of it breaking. If no developers touch the code to keep it well-factored, the code will rot.

Now maybe this is the plan all along: code is written to decay and eventually disappear. But it doesn't sound like a recipe for long-term success.

So whenever these discussions about the use of TDD come up, I'm very curious about the specific ways teams address the two points I raised above.


Believe it or not, good typed languages with a lot of compile-time checks are probably one of the main reasons why TDD is not that big. Sure, these won't catch logic errors, but most developers don't write that much heavy logic code anyway. And senior developers are capable of writing mostly straightforward code that is easy to debug.


Type-checking, while helpful, can't always catch the logical programming errors that result in user-facing bugs. If it could, we would have observed a marked increase in software quality with the advent of typed languages, but we haven't. Quality depends on more than just a typed language, whether that be unit tests, integration tests, etc.


If you have a logic error what's to prevent you from making the same error when writing the test?


I strongly disagree, having years of experience in both Objective-C and Swift codebases/products.


Disagree on general software quality? I've been an iOS user for about a decade now. Seen a couple of 'bad' (extremely buggy) apps here and there, but the quality seems relatively constant over time (possible sample bias).


Great product teams will build great products independently of Objective-C or Swift, but you will need far fewer resources to deliver the same quality using Swift and modern frontend architectures.

I would also doubt the majority of apps in the App Store today are using Swift, at least in the sense that I mean it. It’s not enough to use Swift; you actually need to know how to model your problem well using its type system, and that takes experience. A lot of devs simply focus on solving narrow problems without giving much thought to how much more strict they can be with the language.


Manual testing and design patterns that isolate the effects of a change from the rest of the system.

I am going to manually test things anyways, so in practice, unit testing usually takes longer.

I am more likely to add robust tests to parts of the code that are harder to test manually, as it does save time in that instance.

Ironically, I find that well tested code gives me less incentive to refactor it. Refactoring is good for allowing future changes without breaking things. With good testing, I rely on tests to provide those guardrails instead of a clean code structure.


I guess it depends upon what kind of code you end up writing most. I tend to find lots of code I write has more complexity in the business logic sections than the UI itself - there are lots of corner cases with no impact on the UI.

I find automated testing (not necessarily TDD) adds a lot of value over manual testing in these scenarios. And the automated tests tend to pay off either at development time or after even a single change.


> 1. software works as advertised

Understanding (mutual) and communication seem to be the big & hard problems to tackle. Software reflects these issues clearly, even down to the detail. You can sometimes look at a particular chunk of code and understand how it got there from refining assumptions (etc.) or the lack thereof.

In terms of technical solutions, I think one could just look at modern, more powerful concepts, like writing proofs and advanced type systems in the static world, generative testing, controlling state via FP, gradual typing in the dynamic world, etc. A whole host of bugs can be found without writing over-specific handcrafted tests.


> what replaced it?

In my experience, TDD has been "replaced" with laborious, time-consuming, low-coverage manual testing - that is, what it was originally meant to replace.


From another comment (thread moves fast):

The next big thing is ATDD - Acceptance Test Driven Development. This is implemented using frameworks like Cucumber[1], in which test scenarios are:

1. Described at the feature level

2. Described together with, and signed off by, the business stakeholders

3. Described in a structured format, which can be implemented in code ('given ... when ... then' format; a small example scenario is sketched below)

The advantages this gives are huge. It is essentially the business requirements described as test scenarios, which can be executed in an automated fashion, (co)authored and owned by the business.

[1] https://cucumber.io/
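For readers who haven't seen the format, a scenario in that 'given ... when ... then' style looks roughly like this (the feature and numbers are invented):

    Feature: Order discounts
      Scenario: Returning customer gets a loyalty discount
        Given a customer who has ordered before
        When they check out a basket worth 100 EUR
        Then the total charged is 90 EUR

Each line is then bound to a small step definition in code, which is what makes the business-readable text executable.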


At my place, automated integration and end-to-end tests replaced TDD. There is an expanding number of unit tests, which usually aren't developed with TDD.


Over the years I've worked with many developers who don't write ANY unit tests, relying only on integration tests and this has caused severe bugs that could have easily been caught by unit tests. This has cost the companies they're working for a fair bit of money.

I've called these developers out, and they often seem to be against unit tests because they think writing them slows them down, when in reality the cost of cleaning up afterwards is greater.

When you start writing code with unit tests in mind, you generally follow best practices and start to realise when a "unit" is too big and needs to be split up into smaller units. That, and mocking: I've found that the anti-unit-test developers I've worked with commonly aren't keen on mocking stuff out either (but again, that's all anecdotal).

Generally I find Fowler's guidance on the test pyramid is always worth considering.

https://martinfowler.com/articles/practical-test-pyramid.htm...


As one of those developers, the problem I face is learning how and why to write tests. Most of the tutorials I've read on the matter test things that are too trivial and test too often. It also doesn't help that the people I've worked with don't have a clue how to do it either, and those that do think everybody should just get it and can't be bothered to explain it.


I recommend reading The Art of Unit Testing -> https://www.manning.com/books/the-art-of-unit-testing . At least this is the book I learned TDD from more than 10 years ago and the knowledge I gained from the book has proven to be timeless.


I see this problem a lot. TDD is a hard skill to learn. It took me 2 years of continuous practice to really get it, and by year 3 I was only just starting to get decent at it. People who try and do TDD for less than 6 months haven't even left the parking lot yet.

I try to shortcut that steep learning curve by mentoring other developers, but there aren't enough people around who have that experience. I've met lots of programmers, and fewer than 10% of them have ever even tried TDD, let alone done it enough to gain insight and mentor others.


I completely agree, I feel like I'm in the same boat. I do feel like I'm slowly making progress though - primarily by making small contributions to open source libraries that have tests, which requires updating and/or writing new tests - and then trying to replicate tests like that in my own small libraries.


It could also be due to the way the company incentivizes developers. Say a new system or tool needs to be shipped this quarter. They could incentivize that being delivered by offering bonuses to the team if they ship it on time. But if the bug is a minor issue, or will only happen 6 months after ship date (performance issue, leap year issue, etc), developers may elect to ship it as "broken" to meet their deadline. Then another "maintenance team" will be responsible for fixing the bug 6-12 months later when it surfaces in production.

Plus, if no one ever gets fired for shipping buggy code, why bother working so hard on bug-free code? It's a tradeoff.


> relying only on integration tests

You frame this as integration vs unit test, but from the sounds of:

> this has caused severe bugs

and

> aren't keen on mocking stuff out either

My question would be: were the integration tests any good either?


I think like 80% of people in the comments are trying to find refuge in the idea that "TDD is dead" to justify them not writing tests.

Not going to argue, I just hope you realize this will come back and bite you. This profession requires discipline, like any other.


It's okay. I'll be working somewhere else in two years and someone else can figure out how to fix the bug ridden code I wrote /s.


And that person, if they have any sense, will write tests.

Source: I am that person (in general, obviously not in specific).


I've always had trouble with TDD. I'm a mostly self-taught dev, and I had my first jobs in high-velocity startups that themselves were made up of engineers who'd prefer a quick-and-dirty approach.

Later in my career, I've worked for more traditional software businesses, but they didn't encourage testing either. It was always an afterthought, and time was better spent on building flashy and fancy new features to pacify customers.

When I founded my first bootstrapped businesses, it was the same thing: things had to be built quickly, almost always as easily scrapped experiments. A SaaS that is well-tested but missed its market opportunity window was something I didn't want to risk.

Even with the SaaS FeedbackPanda that finally worked out (and which I sold with my co-founder last year), testing was more or less non-existent. We truly tested in production, and trusted the infrastructure and framework choices we made to bear most of the burden. It worked out for us, and even our acquirer didn't expect much in terms of TDD in the properties they were looking for.

I have the feeling that the concept of rapid prototyping has developed into a cultural phenomenon that expanded into software engineering, and people have adapted to it.


s/expanded into software engineering/replaced software engineering/


In my experience:

libraries and frameworks - TDD is required and highly effective

for web apps, business CRUD apps in loosely typed languages like Javascript/Python - TDD can marginally help, especially with checking types, validations, etc.

for web apps, business CRUD apps in strongly typed languages like Java - TDD / Unit Testing does not add much value. skip it.

I get more confidence when I run tests for the deployed web page (or API) with actual test data in real time.


> loosely typed languages like Javascript/Python

Isn't Python strongly dynamically typed?


Yes, Python is strongly typed, which is much better than JavaScript's weak typing... but the dynamic typing still means that potential bugs are caught not by a compiler but at runtime (which can mean the live production environment!).

Static vs dynamic typing is about WHEN types are checked, i.e., at compile time or at runtime.

Strong vs weak typing is about HOW strict the type check is: weak typing makes many implicit assumptions and is very liberal in allowing different types to be used interchangeably or added together.
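Concretely (a throwaway example, not from any real codebase):

    def label(order_id):
        return "Order #" + order_id  # fine as long as order_id is a str

    label("42")  # works: "Order #42"
    label(42)    # TypeError at runtime: Python refuses to silently mix str and int
                 # (strong typing), but nothing flagged it before the program ran
                 # (dynamic typing). In JavaScript the same expression would quietly
                 # produce "Order #42" (weak typing).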


> apps in loosely typed languages like Javascript/Python

> Python is strongly typed

Another of Schrödinger's animals.


It seems pretty obvious that TDD was a good idea for some folks, but it became a meme long before most of us were ever exposed to it. By that point you couldn't just write tests and do dev work, you had to join the church of TDD if you wanted to play. Then some lazy ruby scripter like me gets told to write tests not code, and we end up with another layer of glorified middle management to make sure "everything was written with a test"

Eventually those of us who have work to do throw away the junk and get things done, or we move on. It's a pretty natural progression for most bureaucratic organizations.


I am surprised this article exists, but then I noticed it is from 2014. TDD is not a magic bullet, but the moment I have ventured into codifying a critical part of business logic, I have written test first. Not always unit tests, many times an integration test. But the confidence I get when adding/modifying things around that business logic is just mind-blowing.


Wouldn't you get that same confidence if you wrote verification tests instead of tests first?


Actually writing tests before means I'm asking myself what I want to achieve, while a verification test is more like saying that whatever the output is, that's the right thing.


The number of people admitting to not doing TDD surprises me. I wouldn’t dare to write any production code without a comprehensive suite of tests.

Most projects I work on have about 50% production code and 50% test code, especially if I have a say in it. I simply won’t take any responsibility for my code if I cannot test it.

Funniest thing is that the projects that are well tested are also the projects that just work, rarely trigger the error notifier and rarely get bug reports. The errors that are present are usually due to 3rd party API outages, database outages or abuse.


TDD != writing tests, but it means that you write tests before you code. Many people write tests for their production code, but they mostly add them after at least some of the code has been written.

The idea is that you'll understand the problem space better by writing the test first. Which is IMO a valid assumption. But, at least for me, the problem space (interfaces etc.) is often a bit too fuzzy initially, such that it's easier to just write the first draft, and THEN test that.


This! So many of the comments on this post are about writing tests, rather than writing tests first. I've tried TDD a couple of times and always hated it.

Maybe it's one of those things where you have to get over the hump. But to me it's much more enjoyable to write what I think is the perfect function and then try to break it with my tests.

I also don't understand how anyone could think that a strong typesystem is a replacement for tests. It definitely helps one write a lot fewer tests for the same level of quality, but you still need tests.


I know what TDD means. I write tests before I write my code.

Usually I write out a set of tests (this should do X, Y and Z but not H, I, J) and then I implement whatever thing I need to add.

Same with bug fixes or changes. Bug fix: reproduce in tests, then fix bug. Change: change or create new tests, then update code.


> The number of people admitting to not doing TDD surprises me. I wouldn’t dare to write any production code without a comprehensive suite of tests.

This suggests that you don't understand that people can achieve the stated goal of "a comprehensive suite of tests" without TDD.

Just to be clear, people can be passionate about testing without being passionate about TDD.


Or they can do testing, not do TDD, and be passionate about neither.


You’re right: I don’t understand how people can have proper tests without TDD. I’ve yet to see a project with meaningful tests without TDD


To me, your comment implied otherwise, sorry for assuming wrongly.


> I wouldn’t dare to write any production code without a comprehensive suite of tests.

If I'm starting from scratch, I always put together at-least decent code coverage. However, most of the time I'm not starting from scratch - I'm hired to work on a framework that's been around for a while and, invariably, is not only written without unit tests but written in a way that repels unit tests (every class depends on every other class, static initializers connect to live databases, etc.)


I think TDD is a specific reference to the approach where you always write the test first, i.e. 100% coverage

Could be wrong, but that's how I've always assumed it is intended.


Yes that’s how I’m doing my daily job. Not sure if my apps have 100% coverage but I’m pretty sure it will be pretty close.

Edit: I ran a coverage report on the app I’m working on and it’s 84% (4129 of 4936 LOC).

Untested parts are related to deleted features, dead code or some weird error conditions I didn’t care to test.

There are also some admin-only pages that are untested, and some stuff has its tests disabled because it communicates directly with production APIs, since those services don’t have test APIs.


My problem was learning how and why to do it. Most of the tutorials I've seen on the matter either test too trivially or too often.

But it also says a lot about the places that I've worked, where shipping code and carving out domain knowledge for job security was seen as more important than actually doing a good job.


I write tests in units of work. For example, if I need to add a contact form to a page, I’ll create a test that checks whether some form exists and matches some structure (names of fields), and I test that submitting the form performs some desired action. Usually I would replace the real mailer with a fake, test-specific mailer which has extra methods for keeping or inspecting sent mails. (There are libraries for this as well; it’s just an example.)

Testing the “real” mailer might never be done since it would involve checking if the mail was actually sent. In these cases I want to isolate that boundary so it is invoked as late as possible so I can use fake mailers for my tests.
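A stripped-down Python version of that fake-mailer idea (the form handler and mailer interface are invented for illustration):

    class FakeMailer:
        # Test double: records mails instead of sending them.
        def __init__(self):
            self.sent = []

        def send(self, to, subject, body):
            self.sent.append({"to": to, "subject": subject, "body": body})

    def submit_contact_form(form, mailer):
        # The unit of work under test: validate, then hand off at the mailer boundary.
        if not form.get("email"):
            raise ValueError("email is required")
        mailer.send(to="support@example.com", subject="Contact form", body=form["message"])

    def test_submitting_the_form_sends_one_mail():
        mailer = FakeMailer()
        submit_contact_form({"email": "a@example.com", "message": "hi"}, mailer)
        assert len(mailer.sent) == 1

The real mailer only has to satisfy the same send() signature, so it can be swapped in at the boundary without the tests caring.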

If the feature is mostly UI (like in this case) I usually test through the UI (using JSDOM and an HTTP client, or a virtual browser test runner like superagent).

If the feature is more complex like an API or something, I like to write a barebones integration test (does this HTTP endpoint actually invoke some injected module) and then I thoroughly test the underlying module which is HTTP agnostic or represents bare minimum of HTTP state without dependencies to a real HTTP server or client. That way I can test the module without performing HTTP requests or mocking them.

I usually start by writing lines like:

- It returns no posts when there are no posts
- It returns no deleted posts
- After creating a post, posts index returns my post
- Editing my posts updates my post
- Editing someone else’s post fails
- Deleting someone else’s post fails

Once I’ve written the outline of my module like this, I convert them to empty tests and implement them one by one, first implementing a test, then changing the code to make the test succeed. Repeat until done. Usually I come up with more happy or unhappy cases while implementing - I instantly write a new blank test for later in these cases, adding it to the list of tests to write.

I like to have separate modules when implementing features: one for the business logic (which is thoroughly tested) and one for connecting it with libraries or frameworks or whatever (with minimal tests, but at least one). This is a guideline and not a requirement: sometimes a dependency on a framework or library or external service is just part of the feature, so there's no point in keeping it separate.


You must not have inherited many ugly codebases.

I would love to write tests for the codebase I currently work in, but it just doesn't make sense. Everything works for the most part (not really, but the business people are happy), so there is no way we can convince the higher-ups that writing tests is beneficial.


Actually migrating legacy (ugly) codebases is where writing tests is helpful.

The tests are the invariants -- they help you preserve functionality while you shift the code underneath. You can rewrite ugly code without worrying about botching anything downstream.


Agreed, but... often these ugly codebases are also very difficult to write tests for, because the code isn't well-factored, there are large functions which combine lots of unrelated behaviors, etc. etc.

I inherited a codebase of this type two years ago. It was only 10k sloc of Python but some of the worst code I've ever worked with—of course it had no tests either. Retroactively writing tests for the existing code made no sense—half the behavior wasn't worth preserving.

What I wound up doing was slowly refactoring as I worked on things, writing a new component (with tests) and replacing bits of the legacy code. I also added linting and typechecking, and now the code is not gorgeous, but it's serviceable and a lot less buggy.

In retrospect it might have been better to just rewrite it from scratch, architected for testing, but it's difficult to be certain. I think it was probably better for the team to see the codebase improved piecemeal bit by bit, even if it took longer.


I hear you. Testing is but one of many tools needed to fix ugly codebases. It's no panacea for sure.


Not doing TDD isn't not testing. TDD is writing failing tests first and then writing the code to make the tests green (red, green, refactor).


With the speed at which management expect deliveries, TDD has been replaced with Tech Debt Development


ooo I like this.


The design damage from TDD is very real; I'm unsure if the added complexity is of any real value. But once you turn your mind to it, it becomes second nature. I wonder if people's woes about it are because it's hard to change?

Regardless, TDD doesn't work for my org; it's too expensive. Our business logic is very unstructured, and requirements are built up over years of projects being layered on top of each other. Decisions about business logic are made on a whim and need to be implemented quickly to support the rest of the organization. Given how quickly and randomly the work changes, I'm not tempted to implement anything further than automated smoke tests. Besides, I have a QA team that is triple the size of my development team; it's their responsibility to test the end-to-end solution.


> Our business logic is very unstructured, requirements are built up over years of projects being layered on top of each other.

That sounds to me like a codebase I'd be terrified to make changes in without extensive test coverage. (Whether the code was written with "TDD" or not is a slightly different question). But I guess it doesn't work out that way for you?

Ah:

> Besides, I have a QA team that is triple the size of my development team

Sure, I guess that is an alternate approach to tests. I have never worked with a formal QA team that extensive, but I'd guess they have to have various scripts and explicitly written out acceptance criteria and such? That's basically a form of 'tests', just in human language and human testers, not code. (I also wonder if the QA team is actually using some forms of automation that look a lot like tests, just they write/maintain them instead of "developers"?)

With a team three times the size of the dev team, it's definitely not a cheap alternative, but I guess I could believe it's cheaper/more effective than trying to have automated test coverage for your code (or a combo of much smaller QA team with some test coverage), like I could believe there is some context where that's true, it seems unlikely to me it will be widely true. But whatever works.

The number of software development projects that lack either sophisticated QA operations like that or good test coverage is probably bigger than those doing either though.


It is a fairly traditional way of doing it. TDD usually means you kinda know what is going on up front. If you work in an org where the sales guy can make up a feature and you need it done yesterday, it can be a tough sell that you need time to write tests ("that's what we've got QA for"). The refactor at this point is a daunting task. It would even be a decently expensive one. What comes out the other end is, from the end user's point of view, effectively the same.

I personally like working with a decent QA team. They challenge you to do better. They take a special glee in breaking your code in ways you did not think of. You can also use their test plans to write automated tests, so that they can go think of more devious ways to break your code. Also, sometimes developers can go off the deep end and overdo things. It is nice to have a semi-neutral third party saying what is important to test or not. One thing to keep in mind is that many of these integration testing frameworks are basically tedious coding exercises, especially if you are external-API heavy.

> it's definitely not a cheap alternative

You may have hit on why many orgs like the idea of TDD. It pushes the idea that if I have someone who can write code well, they can write the tests too. Skipping over the fact that this takes time and energy away from other things.


> They take a special glee in breaking your code in ways you did not think of.

God, this is so true! It's made me a better developer though; "You know this will break, make it better so Carly doesn't yell at you".


That's a good point - we're still doing TDD, just with "human code" instead of computer code. You're correct that they write out explicit test cases for each of the acceptance criteria.

It's not cheaper than developers doing the testing, but it's more wholesome. They will test exactly how a user operates, with no regard for the software's boundaries. If part of the requirement is fulfilled by another team's software, they will test that other team's work. This works great for my team since we are highly integrated with the rest of the enterprise.


The other amazing thing about proper QA testing that automated testing doesn't do so well is that they have the ability to go off script. Working in games the number of times I've built some feature or designed a level and thought I had tested it thoroughly only to have QA break it to pieces is very high! Your assumptions versus the assumptions of QA people and players are very different.

A single, technically proficient QA expert on a team can be an incredible asset. Back that with a larger team to regression, smoke and otherwise test things and you'll not only find a load of bugs but also get early feedback on design.

Automated testing definitely has a place. Libraries are a great example. There isn't really an end-user interface to test, it's not the conglomerate of much code and you basically need a test harness to even run your code. The downside is that you're typically back to only testing your own assumptions.


That's the only appeal of writing tests for non-critical functionality for me. If I know I have to write a test, I am likely to make a very clean method with a clear input / output. So it almost automatically follows SRP and is a pure function when I know it has to be tested.


It's a shame Kent used the words "test" and "development". Test Driven Design would have been better, but people would still misinterpret what is under "test". Yes, there's a side effect of asserting behavior in Kent's vision of TDD but it's a happy accident.

What's under test is the design. Way before TDD was a thing, when I worked at IBM, we used to call this "inverted design": write the calling code first to see what the API might look like and then make it work. In the late 80s it would have been considered a massive waste to assert behavior though; we'd just implement it.

Automated functional tests (from the outside in) are where the bulk of does-it-do-what-it-says-on-the-tin testing should happen.


> Way before TDD was a thing, when I worked at IBM, we used to call this "inverted design": write the calling code first to see what the API might look like and then make it work.

I really like the idea of this, and I very occasionally have the foresight and wherewithal to do this kind of "top-down" programming.

Maybe not surprisingly, this is how I sometimes end up with the much-criticized "Interface with only one implementer" design smell. I write the interfaces that I would like, right next to where I'm writing the calling code. The interface(s) evolve as the calling code is unfolding. Then, later, I make an impl for the Interface.

At that point I could just delete the interface and only use the concrete implementation, but... I don't. shrug.


I don't think so. If I didn't know who Ron Jeffries was, I would have sworn the following series was pure satire about TDD:

    https://ronjeffries.com/xprog/articles/oksudoku/
    https://ronjeffries.com/xprog/articles/sudoku2/
    https://ronjeffries.com/xprog/articles/sudokumusings/
    https://ronjeffries.com/xprog/articles/sudoku4/
    https://ronjeffries.com/xprog/articles/sudoku5/


Mods: (2014) would be a useful tag to have on the submission, especially given the topic and provocative title.


No.

If you have a large, critical system you need tests. Whether you write the tests first or not is largely immaterial, the tests and functionality get merged and deployed together.

Once your system is of sufficient complexity, you need those tests to prevent regressions unless you’re working on something very isolated.


TDD is mostly about writing tests first. I don't think this is stating that tests shouldn't be written, just that the TDD approach of writing tests then basing your code around fulfilling those tests is dying.


I've found that the strictness imposed by TDD (write test first) has been valuable in learning how to write good tests and understanding the concept of "testability".

If you've never tried TDD for a real project, I still highly recommend it for those reasons alone. It may not be your cup of tea and it may end up being extremely unproductive for you, but hey at least it'll give you some experience-backed opinions for this never-ending debate :)


The main benefit of TDD is that it forces you to code with testing in mind. This produces a decoupled architecture of easily testable system components...

However, once you learn the skill of making things testable, you can find that you don't really need to write tests first anymore... which makes TDD less useful

This is not necessarily rational, but it can explain why its popularity has diminished, even among people who know its benefits.


> However, once you learn the skill of making things testable, you can find that you don't really need to write tests first anymore... which makes TDD less useful

Exactly! Plus, I find the idea that TDD produces decoupled architecture to not be true. In fact, it's usually the exact opposite unless someone has a "testability" mindset, in which case they don't need to TDD in the first place.


Developers are too polarized about tests. I wish more projects had just enough tests.

Tests can be a nice entry path for new developers trying to understand how to use the code. Tests can be useful when solving tricky bugs. Also, it can be useful to make sure parsers and financial stuff work as expected under different scenarios. On the other hand, too much testing means you are wasting precious time not adding value.


Right, but the discussion is whether or not creating a test is the essential starting point for all development. It's important to not conflate testing with TDD. I don't think testing is polarizing. TDD, though, certainly has its advocates and detractors.

Plenty of others have noted the consultancy influence on the matter, so I'll skip that and just say that I've encountered this phenomenon several times. When the consultant or new VPE or DirE or whatever comes in and says it's TDD time, and when there is dissent, you see this "So you're against testing?" play. That's not useful.


Developers are too polarized about tests.

Developers are too polarized about everything.


Haven't watched yet but this should be a gem: "Test-induced design damage"

In my own company we saw TDD create tons of unnecessary indirection by introducing dependency injection all over the place. The only reason for that dependency injection was so the component could be sufficiently isolated for unit testing.

Although it could be argued that the components _should_ be highly compositional anyway. ¯\_(ツ)_/¯


TDD has nothing to do with Testing and everything to do with Development. It is a development methodology (test driven), not a testing methodology.

It is a means to focus development and structure code into isolated "units". And it facilitates wholesale, brutal refactoring and deletion of code because it provides confidence: tests pass (the external interface remains consistent), tests fail (you need to fix your refactor or update the external interface), or your tests are poor/incomplete.

By writing tests first-ish, you make it really hard/noticeable/laborious to write YAGNI, sprawling, or over-engineered code. If your tests are hard to write, it means your code is interconnected, doing too much, not layered properly, etc. Too many people don't realize this and struggle with tests / spend way too much time writing equally complex tests, when they should realize all this pain is the same pain that will occur when you try to fix/expand/maintain your code. Fix your code! Not the test.


Quite liked this view point on testing: https://eng.rekki.com/unit-testing-at-rekki/t.txt


In Microsoft, at least in Office division, TDD got some big pushes and serious traction within many of the teams working on Services, funnily enough back around the time of this article.

But... maybe relatively uniquely? We had a firm division between Dev and Test Engineers at the time. The person writing the tests was not the person developing the code.

The best success story I saw for this was a well defined feature, covered by tests written while the Test Engineer had time before a vacation, and then developed against by the Dev who didn't get time until the vacation.

Shortly afterwards Microsoft did away with the split between Test and Dev, and laid off many of the Test Engineers (while keeping QA in the case of Windows).

I haven't seen many Engineers these days, former Test or not, be enthusiastic to do TDD. It might be motivational: with the constant pressure to make progress and a shift to smaller check-ins, developing tests for TDD is maybe not observable progress.


I consider TDD as a tool rather than a methodology. Its utility is in exploration, when the problem at hand does not have a precedent or an already well-fitting solution.

Writing tests first serves more as guidance in fleshing out the approach. At the conclusion, seeing the 'emerged' approach, I always get an itchy feeling that it could be done more smoothly, now that I know it's doing what I need.

Sure, no one is going to rewrite it, so it goes out as done. The net result is perhaps more domain knowledge, and the tests flesh out the expectations of the behavior.

That's why writing meaningful tests (even by the name) makes TDD worthwhile.

But all in all, TDD is a kind of prototyping tool first. Ideally, someone has to analyze the bulk of the tests to better formulate the product and its resulting 'spec' such that this prototype could be further maintained, hopefully morphing into a better one at some future time.


For me personally, it's the story of the TDD luminary trying and giving up on implementing a Sudoku solver with TDD, and the equivalent elegance of Peter Norvig's solver, that led me to conclude test-driven may be good, but cargo-culting it is particularly bad...


I never "bought" TDD. Just like "microservices" it adds a ton of complexity and cost for something you probably don't need.

The TDD cult doesn't consider that "test everything" is the last thing you should try. If you need high reliability you should move to a safer language first. Dynamic typing to static. Unsafe memory to safe, like Rust or Go. Turn on a bunch of linters too. Move to a language that doesn't allow nulls.

Once you've got a bunch of linting and a static language that doesn't allow memory corruption, do you really need to test everything? Probably not. The Linux kernel is a good example as usual. Virtually no tests, and it's the backbone of the internet.


Relevant post I wrote back in 2012 --

XDDs: stay healthily skeptical and don’t drink the kool-aid

https://amontalenti.com/2012/02/12/xdds


Tests are often a substitute for compile-time guarantees.


True that. Without a decent amount of testing with good coverage I don't feel at all confident in the code (in Python) I write. Without a decent amount of tests, any migration from Python 2 to 3 is a cumbersome exercise.


Personally, I make sure I have some form of automated testing (normally unit tests and e2e tests), as I like to know my code works, but I can't write the tests first. It just doesn't work with how I approach problems.

The approach I have to many projects is:

1) Sketch out a rough idea of how I will build it.

2) Try to get the fundamentals working.

3) Start building and fixing any obvious defects in the design. Add tests as I go along to catch obvious defects.

4) Iterate from there to completion.

For most of the projects I am working on this seems to be fine.


Working in a codebase without tests is like taking a job at the top of a 4-story building that lacks an elevator.

For me, the biggest benefit of TDD is that it increases my individual velocity. When I forget what my task is in the middle of executing it, running a test enables me to return to the task within seconds. So personally, I refuse to work professionally in a codebase if I cannot write (even hacky shell-script-based) tests.

Also, tests enable me to better understand how someone else's code is supposed to be used.


surely "development with tests" is different from TDD, a very specific subset of the former.


Correct. But a codebase without tests is one where it is unreasonably hard to practice TDD.


TDD is one of these buzzword methodologies that is good to know exists but is inapplicable by itself.

In fact, TDD-like techniques have produced some of the worst code I've seen.

The trap with test driven development is that you write code to pass the tests, not code that solves a problem. It is easy to write code you don't understand. Ex: "is it +1 or -1? +1 passes, -1 fails, it must be +1". In the end you don't know why you wrote +1, and maybe it makes no sense but since it passed your tests...


I use TDD with integration tests. At first, my test case verifies a whole feature at a high level, then later when I have the main test case passing, I add more detailed integration test cases. Once I'm satisfied that the feature works under all the tricky possible edge cases, I merge the feature. I just run my test case to verify that the feature is working. It's way easier than manually testing using a browser or HTTP client to make requests; also it's nice that once I finished testing, I get to keep the test case and it serves to avoid regressions on that feature later.

Sometimes I do TDD with unit tests, but only if the specific component is complex enough to warrant a unit test. I don't write unit tests for a simple component where the correctness of the component can easily be inferred from the passing of integration test cases which rely on that component.

Trying to cover everything with unit tests from the beginning is a terrible idea and will lead to sub-par architecture. You have to start with tests that cover functionality at a high level first and work your way to the details later; top-down, not bottom-up.

The beauty of the integration test is that it forces you to modularize high level logic to make it testable. It forces you to think about the input and output boundaries of the code being tested under integration. Integration testing doesn't mean you need to have a database running during the tests; often you can mock out the database client with a dummy adapter (or an in-memory adapter which works in the same way as the real database for the purpose of testing).
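
To illustrate, here's a minimal sketch of that dummy-adapter idea in Rust (KvStore, InMemoryStore and save_preference are invented names for the sake of the example, not from any framework):

    use std::collections::HashMap;

    // The storage boundary the high-level logic depends on.
    trait KvStore {
        fn put(&mut self, key: String, value: String);
        fn get(&self, key: &str) -> Option<String>;
    }

    // Dummy adapter used only in tests; behaves like the real store for our purposes.
    struct InMemoryStore(HashMap<String, String>);

    impl KvStore for InMemoryStore {
        fn put(&mut self, key: String, value: String) {
            self.0.insert(key, value);
        }
        fn get(&self, key: &str) -> Option<String> {
            self.0.get(key).cloned()
        }
    }

    // High-level feature logic only sees the boundary, not the real database.
    fn save_preference(store: &mut dyn KvStore, user: &str, pref: &str) {
        store.put(format!("pref:{}", user), pref.to_string());
    }

    #[test]
    fn saving_a_preference_makes_it_readable() {
        let mut store = InMemoryStore(HashMap::new());
        save_preference(&mut store, "alice", "dark-mode");
        assert_eq!(store.get("pref:alice").as_deref(), Some("dark-mode"));
    }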


I quite like the distinction that I first saw mentioned at Google:

Is your test actually testing functionality, or is it merely a change detector?

For example, a unit test has a copy-pasted SQL query and breaks if you reformat it, etc. If it's a change detector, it should be deleted, because it's just a copy / different representation of the same information.

This is a very clear definition that helps in arguing and thinking about tests, especially about mythical numbers like "90% unit test coverage".


These discussions always surface something: The problem with the terminology.

If you look at the arguments (including my own posts in this thread), people are basically talking about different kind of tests, with different costs and value propositions, and dumping them all under the same umbrella. The term "unit test" has been badly overloaded to mean a whole lot of stuff.

Without first categorizing type of tests, and defining what we mean by unit test, we can't even have the discussion about if it's dead, useful, whatever.

Even after defining that, there are a couple of sub-groups that strictly define the value of testing based on specific criteria while ignoring others (e.g.: "the goal of tests is to know if you're breaking something when you refactor").

Tests have a LOT of benefits, and different types of tests get a different subset of these benefits, at varying cost. Without those definitions and associated tradeoffs, the discussion is basically a waste of time. You can see it in this thread: people are going "TDD is XYZ and the purpose is ABC". And you have about 6 variations of XYZ and ABC. Everyone is sure their version is the right one. Myself included.


Agree. Also it seems that some people think TDD is about "do you write tests at all" while others think "do you write tests before the implementation".


martin fowler discusses this in the linked video that no one in this whole thread watched


I just consider TDD as Test During Development. 'During' can be before or after the actual code is written. When I say development is done I mean that the code is written and has been tested.

I have seen mocking and code coverage work really well for a large codebase so I am in favour of those things too. I am not dogmatic about it though. I choose when to apply those things.


The "work outside in" strategy is ideal. Do this as much as possible.

Alas, TDD requires precognition.

Or such prior familiarity with the problem domain as to make the effort redundant.

Or performative. Which is OK if you're managing upwards, trying to impress those bozos who read the foreword of some Agile Methodology books and so now believe they are experts.


I'm going to go on a bit of a tangent.

I don't follow any particular testing religion, and I definitely find myself struggling to figure out what to test, how to test it, and have inflicted design damage in the name of testability...

BUT. I just want to give a shout out to the Rust language for allowing us to test private functions. Sometimes the "tricky" part of a "unit" is not directly in the public API. Maybe you have a fancy regex, string formatter, or sorting algorithm that's part of a larger API. The fact that you can test that part directly without the ceremony of constructing the rest of the unit and being forced to parse out subtle differences in the public API's output is REALLY refreshing when it's needed.

That's definitely not TDD. But it's related to unit testing, so I just felt the need to give a virtual high-five.


This is considered an anti-pattern for a lot of reasons. It bakes private implementation into the specification, making it harder to refactor. You can't reuse tests across different implementations. It breaks abstractions.

I wonder what the reasoning was for the Rust devs.


I'm not sure I follow. What do you mean by "bakes private implementation into the specification"?

In Rust, these tests live inside of the module where the functionality is defined. Nothing "leaks". You write your private function, then right next to it, you write some tests for it. Nobody ever has to know about the private code (unless your tests fail).

> Can't reuse tests across different implementations.

Huh? It's a private function. You aren't going to have multiple implementations...


When you test private methods they essentially become part of your API because you cannot remove that method without changing the tests. You increase the API surface because any observable behavior becomes part of the API, no matter what the docs say.

A test against an interface would be more reusable (and more clearly defined) than tests against a concrete class's private implementation. If you do make a new implementation, you have to pick apart what can be used and what can't.

Maybe YAGNI, but the idea is that it's a code smell that you can't easily test core functionality from the public API. Why is the API so subtle? Is this class doing too much, should you break it up? These are the questions I ask when a junior dev makes a method public just to test it.

But I don't know Rust so perhaps some of these concerns are not relevant to Rust.


Well, the way unit tests work in Rust is that you write a test module inside your module. So, in a case like I'm describing, the unit test is usually (always?) right under the private function in question.
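
Concretely, it looks something like this (a toy sketch; format_date is just an invented private helper):

    // Private helper; never exported from the module.
    fn format_date(year: u32, month: u32, day: u32) -> String {
        format!("{:04}-{:02}-{:02}", year, month, day)
    }

    #[cfg(test)]
    mod tests {
        // Because the test module is nested inside the module under test,
        // private items are visible here without being exposed anywhere else.
        use super::*;

        #[test]
        fn pads_single_digit_months_and_days() {
            assert_eq!(format_date(2020, 8, 6), "2020-08-06");
        }
    }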

So, technically, yes, that's "observable behavior" in the sense that a test will pass or fail. It's not really the same IMO as running e.g., jUnit, where all of your tests are in the same area away from the code they're testing.

The reason I say that is because if you refactor and change the function in question, the test for it is RIGHT next to it. It's just going to be part of the refactor. It's almost like saying that private methods shouldn't depend on each other because taking one away will break the other. I believe your IDE will even show you squiggles when your test gets messed up while you're editing.

Your suggestion to use an interface is exactly the kind of "test-induced design damage" that's referred to in the OP. Which is what made me think to leave my comment. If you have a one-off pure function that is not used anywhere but inside one module, why in the world would we want to create an interface, then make an impl, then inject that interface into our module? Rust code is also not as class-driven as some other languages, so often times a module is just a collection of free functions. You'd have to inject this interface into every function, when the thing is never going to be different.

> Maybe YAGNI, but the idea is that it's a code smell that you can't easily test core functionality from the public API. Why is the API so subtle? Is this class doing too much, should you break it up? These are the questions I ask when a junior dev makes a method public just to test it.

I don't disagree with that. It very well could/should be considered a smell. But sometimes a piece of smelly code checks out. And, while it should be used sparingly, testing private functions can sometimes save us from making our "actual" code more complicated than it needs to be, just for the sake of testability. Sometimes we can get good testability without adding extra ceremony.

EDIT: Also, one of the things that I sometimes struggle with when writing tests is: which public function's tests are responsible for testing the private functionality?

In other words, let's say I have a private function that formats a date a specific way. Two public functions depend on that functionality, plus do other things.

How do I verify that my date formatter is correctly implemented? I can do the whole interface+impl+injection dance so that I can test my formatter (which was supposed to just be a private implementation detail, but is now public so I can test it), or I can add some asserts to my tests for one of the public functions that depends on it. But which one? Do I test it in both places? Do I pick a favorite? Do I leave a note in one test explaining why it seems more involved than the sister function?


It’s more of a natural consequence of how the privacy rules work than it is an ideological commitment.


Was thinking about this the other day. Working with various teams, especially ones made up of developers early in their careers, I've recently seen a mashup of TDD disguised as reliance on CI/CD pipeline tooling.

I appreciate wanting coverage to see if something gets broken; however, when things break anyway and the root cause shows there are too many moving parts (test cases needing updates, dependency trees not getting bumped, an overly complicated build stage, etc.), it makes me question where the sweet spot is. Working code and broken pipeline? You may spend 2 days figuring out why, only to punt the issue as irrelevant.

I can't help but feel tooling and frameworks are becoming crutches and guardrails that developers lean on, contributing to fragmentation and wasted effort in the industry.


The most obvious case I have seen of this is the addiction to over-mocking everything and not writing any integration tests.

We got 90% coverage! Sure, all the mocks use 'any' and are not really relevant to real world returns, we don't test any part of the front end and the tests have never actually caught a failure of the many we have every quarter, but 90%!


To be dead, it would have needed to be alive.

Sure, on HN, you will sometimes find some people who report using it on real-life projects.

But in the last 10 years, I did missions for 50 or so companies, and none of their team members, NONE, used TDD.

Few even had tests at all.

I've seen companies try many things: agile, remote work, one team per microservice, etc.

I've seen many videos of people saying they applied TDD in their companies. I've even seen a few of them IRL at conferences or meetups.

But I've never worked with people using TDD in real life.

Not saying it doesn't happen, or that TDD is bad, just saying that I don't think it ever became a popular approach. Like Haskell or Nix, it's famous only in our bubble.


"Is X Dead?" -> No. Usually this mean the tech has finally left hype zone.


Three points about TDD.

1). For some people TDD helps them gather their thoughts and design. Gives them a place to start their coding. Some people, like me, would rather just start writing the code.

2). I think it is important to have failing tests. TDD is one way to arrive at this. One thing I like to do is, after writing my code, write the test, then comment out the code to create a failing test. I want to make sure that the code is passing because of the new code.

3). For fixing bugs, TDD is a must... IMHO. You want to make sure you are fixing the right thing, so create a failing test first.
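
As a rough sketch of that flow (last_index and its off-by-one bug are invented here purely for illustration): commit the failing test first, watch it go red, then make it green.

    // Step 1: a test that reproduces the reported bug (say, an off-by-one) and fails.
    #[test]
    fn last_index_points_at_the_final_element() {
        assert_eq!(last_index(&[10, 20, 30]), Some(2));
        assert_eq!(last_index::<i32>(&[]), None);
    }

    // Step 2: the fix that turns the test green.
    fn last_index<T>(items: &[T]) -> Option<usize> {
        items.len().checked_sub(1)
    }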


The next big thing is ATDD - Acceptance Test Driven Development.

This is implemented using frameworks like Cucumber[1], in which test scenarios are:

1. Described at the feature level

2. Described together with, and signed off by, the business stakeholders

3. Described in a structured format, which can be implemented in code ('given ... when ... then' format)

The advantages this gives are huge. It is essentially the business requirements described as test scenarios, which can be executed in an automated fashion, (co)authored and owned by the business.

[1] https://cucumber.io/


I think one of the main benefits of writing tests before implementation rather than after, is that it forces you to think about making a testable design. But if you are already in the habit of making testable designs, then IMO TDD isn't worth it, because front-loaded tests can slow down your iteration speed, and become a drag if you are trying out a couple different designs before settling on one. Once you know how to write testable code, you're just as well off writing the tests afterward - works fine for me at least.


Just skimmed the comments and didn't see any mention of the rise of typed languages, especially TS with its excellent toolchain. Sometimes I just code with vim and coc.vim (which brings VSCode's LSP to vim) for hours without ever running tsc. The live type-checking and other checks in the editor are so good. Same with Rust. I guess C# has had this for decades in Visual Studio (same with Java), but yeah, somehow I and many other people (from dynamic langs like Ruby) missed this and needed TDD even more.


TDD/unit tests are great in some cases:

- when testing algorithmic logic

- for rapid feedback

- for setting up good context (e.g. a pool that is full)

- helps get well-tested parts for when using integration tests

- to make sure your design is decoupled

More here: https://henrikwarne.com/2014/09/04/a-response-to-why-most-un...


TDD is most useful to me for algorithmic logic. I tend to pull a lot of that stuff out of my code, because it's not related to my domain.

TDD is a lifesaver for random one-off algorithmic problems. You can either write it in 5 minutes and spend the next 2 weeks fixing random bugs in your `includeRange` implementation, or you can spend 20 minutes to TDD the function and be done forever.
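
A sketch of what those 20 minutes might look like for such a helper (the signature just mirrors the `includeRange` example above; it's not from any library): pin down the edge cases first, then make them pass.

    // Tests written first, capturing the edge cases that usually bite later.
    #[test]
    fn include_range_is_inclusive_on_both_ends() {
        assert_eq!(include_range(2, 4), vec![2, 3, 4]);
        assert_eq!(include_range(5, 5), vec![5]);            // single-element range
        assert_eq!(include_range(6, 5), Vec::<i32>::new());  // empty when start > end
    }

    // The implementation written to satisfy the tests.
    fn include_range(start: i32, end: i32) -> Vec<i32> {
        (start..=end).collect()
    }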


One of the things I got out of learning Haskell was learning about QuickCheck.

Now, my preferred testing method is to write pure functions attach generators for the given input types, assert invariant properties about them, and then let QuickCheck fuzz my functions where it will try to find a minimal example that breaks the given invariants.
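
The same workflow carries over to other languages too; here's a rough sketch of the idea using Rust's quickcheck crate (a port of the Haskell library, assumed here as a dev-dependency):

    // A pure function plus an invariant: reversing twice is the identity.
    fn reverse<T: Clone>(xs: &[T]) -> Vec<T> {
        let mut ys = xs.to_vec();
        ys.reverse();
        ys
    }

    // quickcheck generates many random Vec<u32> inputs and shrinks any failing case.
    quickcheck::quickcheck! {
        fn prop_double_reverse_is_identity(xs: Vec<u32>) -> bool {
            reverse(&reverse(&xs)) == xs
        }
    }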

The remaining stateful portions of code are tested through integration tests.


Some of the literature I've read on Property Based Testing 'speaks to me' but not really in an epiphany sort of way, so I'm still trying to decide if I want to dive into that or not.

So far it's the only thing I've seen that really feels like it could carve out a big chunk of territory from TDD. But that would mean less TDD for me, not removing it.


I remember arranging a talk for my college’s AITP Club in 2012 which was sold to me as a trendy “How to XCode for iOS Blah Blah Blah.” After setting up the projector and giving a brief introduction the guy went headfirst into an evangelical tirade about TDD. No XCode, no iOS, no iPhone app. From that day forward I knew TDD was bullshit.


Was TDD ever really alive?


In my experience it never was a thing. 12 years writing code, and I've never seen people do TDD even once. You write tests, sure, but you don't TDD. All it was was an opportunity for shysters to squeeze money out of unsuspecting companies through training.

There's another gimmick out there these days, but I won't name and shame, people will get mad.


who is actually watching and discussing the linked videos in this thread rather than braindumping all their anecdotes?


It was never a thing to start with.

My fun question to TDD advocates was always how to do desktop applications the TDD way.


There's a book on TDD with an extended case study about developing a Swing application: http://www.growing-object-oriented-software.com/


If people want to learn how to do TDD well, check out the Codecraft playlist for your favorite language by Jason Gorman: https://www.youtube.com/c/Codemanship/playlists


I believe TDD is still a valid tool to develop well-tested software. Of course, don't take it to the extreme, where you mock everything and in the end test nothing, or create tests that have very little value. I find TDD works a lot better when it is applied as guidelines instead of rules.


I used TDD for unit tests, less so when I have to start mocking things, not at all for E2E tests.


Yesterday I made a glue program that imports my model code and http code and makes requests to my running server.

I’ve loaded it with assertions and anytime my server code is wrong—it’s just so easy to find out.

The biggest issue for me is trying to get XCode to launch my server first and my test program second every time I make a change.

Besides that—having the test program is a joy. Took me an hour to setup.

All this is to say—testing is very much alive and well in me.

The biggest mistake made is that these IDEs or programs usually make the testing command something you need to run besides the program. They really should run together. One click. Test everything.

PS — my favorite testing framework is MPWTest because it brings my code to life as a living document.

With HTTP or server code it hiccups of course—hence my need to roll my own.

https://blog.metaobject.com/2020/05/mpwtest-reducing-test-fr...


I definitely see a lot less of "TDD as religion", where all code is written in TDD, without exception, and no code exists or is added to a project without a failing test to demonstrate its need.

Most developers I work with (as well as myself) will use TDD as a technique that can be useful in certain circumstances.

When it's easy to use and reason about code as an isolated unit without any dependencies outside of straightforward libraries, TDD is usually useful and yields good results.

Things get uglier when there's more dependencies involved, which tends to lead to excessive mocking that results in your tests just testing a very specific implementation (expect method A to call method B on object X), or to alter your code in a way that's purely in service of the tests (the "test induced damage" that DHH talks about).

To take a Rails example - I think there's little to no value in doing TDD style testing for controllers. The tests will almost by definition test a specific implementation (since well-written controllers will often just connect various other models and service objects), and any efforts to get around this will just introduce unnecessary abstractions that make your codebase much more difficult to comprehend.


> I definitely see a lot less of "TDD as religion", where all code is written in TDD

In my experience it helps to drive a new technique or paradigm to its extreme for a while (for example by employing it religiously in a pet project). After that you can look back and see where a sensible boundary lies for applying said technique. And often it allows you to grasp the true essence, whereas otherwise (trying it just a little bit in your current job project) it doesn't get to bear its fruits and will be quickly forgotten or turn into an anti-pattern.


IMO fast software iterations, coupled with excellent observability tools and the ability to catch issues close to real time, killed TDD.

TDD became popular at a time when waterfall and slow shipping cycles were common. In this setting, writing well-tested code upfront had a large upside. This process also helped clarify the spec on the go.

However, as teams and companies move to deploying multiple times per day, adopting things like monitoring, alerting, and canarying, the value of having code that satisfies a spec upfront is lower than that of code that works as expected in the prod environment.

I see almost all tech companies use unit and integration tests extensively, often with high coverage - but sometimes retrofitting them, after validating that the code does what is expected, in a complex environment.


Visibility (logging, monitoring, etc) trumps testing IMO in terms of coding productivity, especially when you don’t understand the domain fully yet. Testing utility increases when code matures and becomes more stable.


TDD only works for extremely well-specified requirements.

E.g. you are writing a server for a very well-understood protocol, or writing some code to perform some very well-understood algorithm.

These are easy to test because there are very clear and totally unambiguous answers that the software has to produce. You can very easily write test cases then, because you already know precisely what the software should do.

In reality, for a lot of "enterprise" software I have been involved with, I have found that there is very rarely a specification that is well thought out enough that all of the answers (...or even the questions that need to be asked) are known before any code starts to get written. The specs are usually super high-level (e.g. "the user should be able to update their preferences"), so most of the details are left as an exercise for the developer implementing it.

I'd wager that if you are in the situation where you have such a clear and concise specification before any code is written, then you're not actually agile at all and instead you are in some lethargic ossified waterfall where it has taken 18 months for The Committee to sign off the specs for exactly what the widget will do when the user specifies that their preference is for home delivery but they have not entered a postal address yet but they do have a grandfathered-in address from the pre-acquisition database that can be used when the terms-of-service-acceptance bit has been flipped, but not if they are in the EU and have not flipped the GDPR-acceptance bit yet etc etc etc.

And if you are in this scenario, then some TDD ain't going to help your velocity. Real life is messy.

Don't get me wrong, I am 110% in favor of unit tests. I just don't think that for the vast majority of projects there is enough detail in the specs to write the tests first and then stop when all the tests pass.


I've found what really has helped me with TDD is a continuous test runner. Different platforms have different takes on them, but in my case it's NCrunch.

The fact that it's running the tests as I'm typing is incredibly powerful. It helps me keep to a rhythm, as I don't have to stop and get the test runner to run the unit tests.

When I'm refactoring and a test goes red when I'm not expecting it, that has saved me time. It also helps you question everything: why was this test needed, do we still need it?

That and (with NCrunch at least) the coverage dots. If I'm diving into some code without any test coverage, or code with gappy coverage, it lets me know I need to be cautious.


TDD is a lot like microservice adoption.

If you start your project planning on a microservice architecture, it is relatively simple to design it properly. If you are migrating your monolithic application to microservices, you are probably going to have a hard time.

TDD is similar. If you start the project planning on writing your application TDD style it will be simpler. Converting your large business critical, not written to be tested, application over to TDD is not impossible but it makes business sense to "ship it" and hope for the best. Hiring a QA engineer would probably be less expensive than your developers rewriting your application to be "testable".


It’s not dead, but it requires the developers to know what they’re building before it’s built.

“Walking on water and developing software from a specification are easy if both are frozen”


Depends on your org and its goals.

For orgs that are engineering oriented, with a strong business plan, strong requirements gathering, case studying, and minimal feature selection, it's a boon.

For orgs that aren't engineering oriented and use development as a tool to find holes in the business plan, testing anything is usually a waste of time. You're trying to see what the market responds to, rather than actually build something of quality.

It comes down to each org's goal, and usually the best answer will be somewhere between the two extremes.


It also depends on the individual engineer.

Some engineers (myself included) need to be able to write tests to make hour-by-hour progress towards the minimum needed to learn about the market.


Any developer worth their wage tests.

After writing any code I run the program to test the change I have made. I make sure the code is exercised either by logging, debugger, or clear UI change. If it's a browser app I use multiple browsers, if it's a rest service I make the rest call.

But I have known some developers that write a unit test, but never test the actual change! And without fail serious bugs appear. Like the application fails to start, or crashes when the new feature is invoked for the first time.


"Every great cause begins as a movement, becomes a business, and eventually degenerates into a racket." Eric Hoffer


To do TDD correctly, things have to be defined in great detail. 'Given X, I should get Y'.


That’s actually a benefit of TDD imo. One of the most difficult things about software development is specifying how it should work. Once you have that the rest is pretty easy.

So if you go to write a test case and don’t know what it should do, then that’s an early sign you need more time on the spec.


This is six years old. What is the value in rehashing the same arguments again in 2020?


Perhaps as a reminder of the fundamentalism and bullying that can happen in our field. Some methodologies almost seem to take on the quality of a moral panic. It's fascinating to me to see how they are eventually punctured and deflated.


Should have (2014) in title.


I worked on the proxy and DNS software that almost every ISP in the world uses today to transfer HTTP and TLS traffic. If a bug was put into the code it could take months during which subtle parts of the internet would be failing. This wasn't an option, so we had strict testing. I think for every line of code we had 2 to 3 lines of equivalent test code. From this experience I've found:

1) I found a natural way to write pseudo-TDD that I prefer. When I'm mocking up some code in a project, I write interface code to run what I'm writing. I run the code from time to time to make sure it's working. If I'm not doing that, how do I know my code is working? You get a sort of dopamine high when running the code and seeing everything come together.

What I do is take that interface code I wrote while creating the code and, instead of deleting it, create a test function (or multiple) and copy-paste it there. If I'm in a hurry I'll come back and write the assert_eq or equivalent later.

Writing a test is that quick. You're already doing 95% of it. Just copy paste it into a function. I know it's not always that simple, but often times it is. When approaching testing this way there is no reason not to write tests.
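
For instance, the scratch driver code and its eventual test might look something like this (parse_header is just a made-up function under development):

    // Hypothetical function being worked on.
    fn parse_header(line: &str) -> Option<(String, String)> {
        let (name, value) = line.split_once(':')?;
        Some((name.trim().to_string(), value.trim().to_string()))
    }

    // The throwaway driver code used while iterating...
    #[allow(dead_code)]
    fn scratch_driver() {
        let parsed = parse_header("Content-Length: 42");
        println!("{:?}", parsed); // eyeball the output while developing
    }

    // ...gets copy-pasted into a test function, with the println replaced by an assert.
    #[test]
    fn parses_content_length_header() {
        let parsed = parse_header("Content-Length: 42");
        assert_eq!(parsed, Some(("Content-Length".to_string(), "42".to_string())));
    }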

2) Systems tests catch far more than integration and unit tests, so if you're serious about testing, these should be considered. Systems tests go by many different names; by this I mean a testing ecosystem where the system is spun up and a mock client and mock servers are run. This simulates a real-world client connecting to the software and tests the real-world results they would get.

The reason system tests catch so much more than unit tests is that in a large system not everyone knows every piece of functionality and how it should work. It's easy to create a regression where a new feature is added but it changes the behavior of some old, previously unknown feature. This often results in going back to the drawing board, as it's a business issue more than just a software issue. These kinds of bugs can get pretty nasty in the enterprise space if unchecked, so it's a good idea to catch them.

Furthermore, systems tests catch nearly 100% of what unit tests and integration tests catch as well. This is because if there is a bug in a unit, it will propagate out to the client, affecting behavior. If it is caught in a unit test and not a systems test, then there might be a hole in the systems tests where they're not testing every scenario, or you might have dead/unused code in the code base or something similar.

Systems tests I've found are better at catching race conditions as well.

Systems tests act as a great source of documentation, because they document how every interface in the program is intended to act. You get a quick high level view and can learn the ecosystem quickly from it, where unit tests are down in the weeds.

And finally, you don't need to write anywhere as many systems tests as you do integration and unit tests. You can test a lot more with a lot less work.


(2014)


What's the tl;dr of this article? It's framed as a question so I assume the answer is "no" but that's not particularly useful information on its own.


Sorry, but I'm not a fan of DHH anymore... In my opinion he is too focused on self-marketing and always tries to show how differently he thinks. Dude is not Steve Jobs.


(2014)


Let me have my take on TDD. (I work with legacy corporate systems mostly so that's how I am biased)

TLDR: it never had a chance of working

The basic problem of TDD is that it has a huge cost to it and therefore you have to do it right to not only get any benefits but to just get even.

In a perfect TDD utopia, all developers would start with a set of clear, complete requirements, implement each requirement with a clean and concise test, and then ensure their code passes the test. Then later, with not much additional effort, these tests would be used to verify that whatever shenanigans you are doing to your project still cause functionality X to do Y when Z happens.

Unfortunately, this fails to capture the whole diversity of the development reality.

- It is rare that you get clear and concise requirements. Most corporate code is just something that a developer decided should happen in a situation, not the result of meticulous discussion and planning. As such, preserving these requirements in stone has much less value than one might think, because now, instead of preserving objective truths about how the application should work, you are spending time preserving what an intern named Joe decided this should be doing, probably with little regard for the entire system.

- There isn't a way to tell your tests are complete and of good quality. Whereas an application must implement functionality and must be doing (something) right, because otherwise your users would notice missing or faulty functionality (users are the ultimate verification of your app), tests can be as incomplete as you want and as pointless as you like and will still "pass". Thus, completeness and quality of tests is completely at the whim of the developer. Guess what: if they can't get the app right, will they be able to at least get the tests right? And if they can get the app right, the lack of tests is the least of their problems. By having feedback on how the application functions from their users, developers are "forced" to deliver something at some level of quality. By having no feedback on test quality, developers are not forced to deliver anything (other than to satisfy tests), and if they are not forced to do something no user or manager will ever look at, they will most likely do only what is absolutely necessary (to pass automated gates).

- Most developers want to be done with a task, and there is this bunch of stuff they have to do that is just annoying because it has no immediate bearing on how the application functions. That bunch consists of refactoring, documentation, code reviews and, guess what, tests. Do you know why documentation is typically poor, if it exists at all? Do you know why code reviews mostly focus on little stuff and rarely attack big problems? Because the overwhelming majority of developers don't put their heart into stuff that does not bring them closer to achieving the goal (except reading HN, that is, and other forms of procrastination).

- Management. While they will usually "require" tests, I have never in my life seen a development schedule extended to get more time to "get the tests right". Usually tests are something you have to do as quickly as possible to meet the requirements, rather than something that is seen as core to the project functioning.

- Developers. Most developers are already overwhelmed with stuff they need to learn to be able to function in an increasingly complex technological jungle. Microservices? Devops? Full stack? Given the choice between spending time learning the skill of doing tests right and learning the above, not many developers will choose tests. Making quality tests is a skill that, like any other skill, requires you to apply yourself.

- Refactoring functionality in a legacy system (which is theoretically exactly the reason you might want tests) requires that you know exactly what you are doing. This means understanding what changes are "safe". When I refactor a piece of code I try to form a hypothesis that the change is not going to break functionality and then prove it by following code leads (how the function is used, etc.) until I am satisfied. Doing hundreds or thousands of refactorings one by one requires that I am 100% certain I am not breaking anything. If I were to use tests to help me do refactorings I would also have to have close to 100% trust that they are correct. Unfortunately, there are no tests to test that tests are correct or complete. Therefore, I cannot use tests to replace my process of following code leads, and this makes tests useless for my needs.


This. It's not that tests have no benefit -- they do, especially for complex refactoring (though even for that, as you say, tests are rarely exhaustive enough to engender full trust).

But outside of the narrow confines of production programming, it's rare that the requirements can be specified completely up-front.

In the initial stages of development, coding is very much a discovery process. Code is often evolved rapidly and drastically, such that any test written is a throwaway. And some of these tests/mocks aren't easy to generate. In data-intensive applications, it's especially a chore to have to generate new mock database objects each time the code changes, and with completely different data characteristics to boot (in order to test different facets of the code).

The psychological tedium of doing this will put off most developers. The potential drag on productivity extends to more than just the time required to write test + code: the whole pace will feel sluggish and the dev is liable to feel demotivated.

I see tests being more useful as a retrospective addition, once the code has stabilized and the use-patterns established.


That is a good point. In the initial stages I like to explore the problem, and tests just get in the way of being able to refactor everything very quickly.

Maybe I want to put some dirty implementation first and then once I think I understand the problem better quickly fix it?

Maybe I want to start refactoring it without knowing where it exactly will lead me?

The problem is tests do not help with these initial stages; they only help if you know exactly what you are doing.

And once I am done with the initial stage I want to move to another problem. I don't like sticking with the same module to now do the documentation, tests, etc. Once something works and I am satisfied I move on.


yes


Betteridge's law of headlines applies.


When I hear "Kent Beck" I reach for my gun.

I don't have a problem with "Kent Beck" as an individual but more than anyone else in software "Kent Beck" is a brand like "Anthony Robbins". I care what you think but not what you think "Kent Beck" thinks.

For that matter the same thing is going on with Martin Fowler, who is using his software craftsmanship cred to legitimize a shop that ships work to the third world and turns programming from a white collar to a blue collar profession. There is no topic you could pick better if you wanted to have a clickbait discussion war over software.


> Martin Fowler, who is using his software craftsmanship cred to legitimize a shop that ships work to the third world and turns programming from a white collar to a blue collar profession.

Can you expand on that, or point out resources describing this phenomenon? Not as proof, but to explain to me what you're getting at. I hold the ThoughtWorkers I know in high regard, so I feel personally challenged in a bunch of ways by your point and worry I've missed something.


I laughed at this, and I agree that almost any sort of appeal to authority should be derided as the cheap trick it probably is. That said, Kent Beck says a lot of smart stuff in his books, and I don't think you should stop listening just because his name comes up.


It is the level of indirection about the problem.

Kent Beck said something good about problem A.

Now somebody else is talking about "Kent Beck said something good about problem A" and there is the risk that "Kent Beck said so" outweighs the "something good about problem A".

Baudrillard writes about this phenomenon as the "precession of simulacra", which gets you roughly to the place Girard warns about -- Baudrillard reminds us that there was something else in the past; Girard wouldn't care.

Stephen Hawking said some absurd things about black holes in the 1970s that essentially postulated no quantum gravity (e.g. that the propagator is not unitary, and operators being unitary is almost the only thing you need to do quantum mechanics). I think the "cult of personality" held back work in QG for at least two decades; it wasn't until some brave people tried to calculate things with radically different methods, and realized they were getting the same results and not by accident, that it became clear the "information loss" concept is absurd, as is the classical picture of a black hole interior.


> turns programming from a white collar to a blue collar profession

why would that be a bad thing?


We detached this subthread from https://news.ycombinator.com/item?id=24281805.



