The day I started believing in unit tests (mental-reverb.com)
155 points by sidpatil on Dec 19, 2023 | 258 comments



I was ambivalent on unit tests until I discovered how many bugs the mere act of writing them turned up.

I very vividly remember writing a test for a ~40 loc class of pure functions. I started out thinking the exercise was a waste of time. This class is simple, has no mutable state, and should have no reason to change. Why bother testing it?

By the time I was done writing the test I had found three major bugs in that 40 loc, and it was a major aha moment. Truly enlightening.


That reminds me of this time I wrote some code to add a method to Python string objects. The first reply to my issue on it in the bug tracker was "We shouldn't accept this, it's trivial to implement in your own code, see: XXXX". The second reply was "You have a bug in your implementation in the first reply."

It took a couple years to be accepted.


Sounds familiar. Was that str.removeprefix?


str.rsplit()


I bumped into so many corner cases and dumb bugs on a recent Python project that I'm even more of a unit testing enthusiast than before. Past a certain level of complexity they are definitely a net benefit.


You mentioned Python. I struggle with the weak(er) typing. It is a bottomless well of bugs. Did your unit tests find type issues or (business) logic / state issues?


It was more business logic issues relating to conditional logic which needed to factor in a lot of edge cases. I don't think strong typing would have helped much in this case.


I had this kind of thing when it came to property based testing.

I built a property based testing library for ActionScript 3 (a fun journey in itself, with full test case reduction).

I was testing my testing library, and tried one of the most basic tests:

    For any object A
        A == decode(encode(A))
And discovered the fun of floating point values not being perfectly representable as strings.

The more significant one came from testing a UI library we'd built for TVs (so you have up, down, left, right as movement). We had in the spec that if you moved focus by pressing right, pressing left would take you back to the thing you were on before. The test looked something like

    For an arbitrary series of API calls generating the UI:
        For an arbitrary list of movements the user makes:
            If the focus changes, pressing the opposite direction moves your focus back where you came from
Now, this was actually very easy to write as a test, but it's extremely powerful. It found a bug in an interesting corner case, so I fixed it. Fixing the bug broke an existing unit test. I checked and the unit test correctly tested something in the spec.

The spec was inconsistent, but because we'd tested explicit examples it had never been spotted. I've been a convert since.
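
For the curious, here's a much-simplified sketch of that focus property in Python with Hypothesis, using a toy one-dimensional focus model (the real test drove our actual UI library, so everything below is illustrative):

    from hypothesis import given, strategies as st

    OPPOSITE = {"left": "right", "right": "left"}

    def move(focus, direction, n_items):
        # toy focus model: a single row of n_items widgets, clamped at the edges
        delta = -1 if direction == "left" else 1
        return min(max(focus + delta, 0), n_items - 1)

    @given(st.integers(min_value=1, max_value=10),
           st.lists(st.sampled_from(["left", "right"]), max_size=20))
    def test_opposite_move_restores_focus(n_items, moves):
        focus = 0
        for direction in moves:
            new_focus = move(focus, direction, n_items)
            if new_focus != focus:
                # if the focus changed, pressing the opposite direction must take us back
                assert move(new_focus, OPPOSITE[direction], n_items) == focus
            focus = new_focus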

I've never implemented property tests in an existing project without finding some bug.

I don't recommend building your own property testing library unless you really want to. In Python I highly recommend Hypothesis: https://hypothesis.readthedocs.io/en/latest/
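
For example, here's a minimal Hypothesis sketch of the encode/decode property above. Python's own str() round-trips floats exactly these days, so the encode here is a deliberately lossy stand-in for the ActionScript string conversion; Hypothesis will report a minimal failing float:

    from hypothesis import given, strategies as st

    def encode(x):
        return f"{x:.6g}"    # lossy formatting, standing in for AS3's String(x)

    def decode(s):
        return float(s)

    @given(st.floats(allow_nan=False))
    def test_roundtrip(x):
        # fails: e.g. 1.2345678901 encodes to "1.23457" and doesn't come back
        assert decode(encode(x)) == x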


Floating point values are definitely perfectly representable as strings - worst case, you just output the binary as a string - but what you may be referring to is that many exact fractions can't be represented exactly as floats and/or that floating-point arithmetic doesn't obey "normal" rules of arithmetic (e.g. addition isn't associative).
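
In Python terms, for instance, a float's exact value round-trips losslessly through its hex representation:

    x = 0.1
    assert float.fromhex(x.hex()) == x   # exact round-trip, unlike short decimal formatting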


Sometimes a framework will do weird things, like convert it to scientific notation, or round it, or add a ton of zeroes. Like BigDecimal in Java and Decimal in C#. They can sneak up on you.


BigDecimals aren't floats, though. They're arbitrary-precision decimal numbers.


I'll clarify, as yes it's possible, but not using the built-in conversions between floats and strings. That is, the following is not true for all x: Number(String(x)) == x


NaNs also cause problems here, since a NaN is not equal to itself.


I think they are definitely valuable. My issue is I would have to disassemble most of the large legacy code base to be able to effectively test things, and a large part of it is UI (Windows Forms). I know you can do it, but just takes so much time and effort. We do have some though.


According to some studies it's around 50 bugs per 1,000 LoC.

So that puts it at about 1 bug per 20 LoC

Some estimates go as high as 75 bugs per 1,000 LoC, but most of those bugs don't make it out to customers because of QA / developer actions. So yeah, right on the money.


Was this statically or dynamically typed language?


I don't think it really matters. Major bugs are not "oh this can be null", major bugs are "this combination of preconditions yields a business logic edge case that wasn't accounted for". Static typing doesn't really help more than dynamic typing in these cases.


Oh it absolutely matters. Rails apps in particular are full of “tests” for things a basic compiler would catch.

So you save a “ton of time” writing the code without “cognitive overhead of types”, then you spend 3x as long writing the tests


>"this combination of preconditions yield a business logic edge case that wasn't accounted for". Static typing doesn't really help more than dynamic typing in these cases.

Depends on the language and the business logic. Types are a way of specifying preconditions and postconditions; a more expressive type system lets you rule out more edge cases by making illegal states unrepresentable.

In particular, I'm pretty sure it's not possible to have the thread bug from the article in Rust's type system.
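
As a small illustration of that idea in Python type hints (not Rust, and nothing to do with the article's specific bug): a connected state must carry a socket and a disconnected state cannot, so the illegal combinations simply have no representation. The names are made up for the example.

    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class Disconnected:
        pass

    @dataclass
    class Connected:
        socket_fd: int          # only a connected state carries a socket

    Connection = Union[Disconnected, Connected]

    def describe(conn: Connection) -> str:
        # a type checker (mypy/pyright) pushes you to handle both cases
        if isinstance(conn, Connected):
            return f"connected on fd {conn.socket_fd}"
        return "disconnected"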


Speaking only for myself, but one can get used to the limits of the system and adapt (put another way, I'd tend to introduce just as many logical bugs, but in different ways).

For instance, in a method that absolutely requires a specific type of object as a return value, setting up a sacrificial default value to keep the compiler happy and then building the innards of the function from there would be a normal course of action. That lets us run the code as we build it. But in the end, if you forgot a case, it will still be a bug, except instead of a wrong return type you get a wrong value. Whether that's better or not is up for debate.


It’s also not possible to write a non-trivial program in Rust.


> It’s also not possible to write a non-trivial program in Rust.

You should probably clarify that it’s not possible for you to write non-trivial programs in Rust.

And that’s okay! No one comes into this world knowing how to do any thing, but we can all learn if we choose to.

If you have a particular sticking point, I’d be glad to give advice.


"Servo is a web rendering engine written in Rust"

Is Servo non-trivial? I think yes.


> Major bugs are not "oh this can be null"

Not true. Even today the majority of security bugs are simple buffer overflows, for example. The subtle bugs are more memorable, but that doesn't mean they're actually more common.

> major bugs are "this combination of preconditions yield a business logic edge case that wasn't accounted for". Static typing doesn't really help more than dynamic typing in these cases.

It absolutely does, if you put a little bit of effort into actually using it. Represent your business logic invariants in the type system, then the compiler won't let you accidentally violate them.


It matters enough that the question "does static typing dramatically reduce the benefits of unit testing" is an open question or at least seriously discussed in the industry. All other replies are about dynamic languages.


Having static types is like having an automatic, exhaustively comprehensive, efficient unit test generator for the classes of bugs whose units tests are the most boring and annoying to write and maintain. Static types don't prevent you from needing unit tests, they free you up to focus on writing tests that are actually interesting.


This, but depending on how good the type system is and how well it's used, most of the remaining interesting tests may be integration tests, aside from legitimate cases like https://news.ycombinator.com/item?id=38692634


I agree with your post. To be more specific, you can focus more on logic and state bugs.


I'm writing unit tests in Rust and C++, and I'm in the same boat as the OP, often finding logical errors while writing the tests.

Not to mention peace of mind when you go and mess around with code you wrote 9 months ago - if you mess up or didn't think of a corner case, there's decent chance it'll get caught by existing tests.


A fully evolved type system (e.g. Coq) overlaps with unit testing. But the vast majority of languages people actually use have only partial type systems that require unit tests to stand in where the type system is lacking.

In practice, when you write those tests for where the type system is lacking, you end up also incidentally testing where there is type system coverage, so there likely isn't much for industry to talk about.


I don't think it is really unresolved. Static languages with sufficiently rich and modifiable type systems avoid a fraction of cases where you may well want a unit test, but it's not the overwhelming majority. Merely static helps too but not all that much. So while there is a reduction, it's a stretch to call it "dramatic".


> I don't think it is really unresolved

Well, you are currently participating in a thread that discusses this very question so there's that... and such threads are regular on HN.

I meant just that. People discuss it. What you want to say is that you have a strong opinion about it; that's OK, and still compatible with it being an open question.


People discuss lots of things that are pretty well solved; I wouldn't equate "open question" with "lots of discussion".

I guess in this context I mean that the question of static vs. dynamic in unit testing turns out to not be that hard, but the questions like "what is a unit test" and "should we unit test at all" are much muddier. Because people are confused or argumentative about the latter, they tend to pull the former into discussions that don't really have much to do with static vs. dynamic.


Why isn't the question "does adequate testing dramatically reduce the benefits of static typing" asked? Why is static typing privileged by default?


Probably because using a static type system gives you those benefits "for free," whereas unit tests are things you need to write and maintain.

("For free" in scare quotes because of course there are always tradeoffs between different programming languages.)


The counterargument is that for sufficiently reliable software extensive testing is needed anyway, and if you do that you find the type errors "for free".

If the testing is insufficient, as in practice it often surely is, then static typing would seem to be more valuable.


"[A]dequate testing": Woah, I love this term. It is judgmental right from the start. It's right up there with "convention over configuration" and "well, if you wear your face mask _correctly_..." I once saw a blog post from an embedded programmer talking about how difficult it is to write "adequate" unit tests for embedded code. If you are writing code that will run in a heart pace maker or aeroplane auto-pilot/lander, it needs to be insanely well tested.


It's not judgmental in some dubious way. "Adequate" here means adequate to ensure the software achieves a specified (and high) level of reliability.

If testing is enough to ensure the software is reliable, does the extra benefit of static typing make it worth the cost? This could be quite a lot of testing! The more testing that is done, the fewer type bugs remain that static typing would have found.

The argument you want to make against this is that static typing would have benefit beyond just finding bugs. The argument you don't want to make is that static typing reduces the need for testing.


I wouldn’t say static typing is privileged, but that testing is disadvantaged, because, in the words of Edsger Dijkstra, “Program testing can be used to show the presence of bugs, but never to show their absence!”

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/E...


We could paraphrase Dijkstra --- with some liberty -- to say, "Formal verification of programs can only show the specification is fulfilled, not that the specification is adequate."


It's not "privileged", it's better.


A bold claim, which is obviously false. To see that compare Clojure's dynamic type system with C's static type system. So perhaps a more nuanced stance is needed.


It is asked. A lot.


I've come to believe that "statically typed" is too large a category to be useful in types-vs-tests discussions. Type system expressiveness varies enormously by language. Better just to ask "what language, specifically?".


It was PHP, for what it's worth. I've had similar experiences in Go though.


I started believing in unit tests the day I finished my patch, ran the program and watched it work perfectly. I then grudgingly wrote a test, ran it and immediately observed it fail. One of the test inputs was some garbage input and that exposed a poorly written error handling path. Humbling!

I still hate writing them and it grates on my aesthetic sense to structure code with consideration to making it testable, but if we want to call ourselves engineers we need to hold ourselves to engineering standards. Bridge builders do not get to skip tests.


> if we want to call ourselves engineers we need to hold ourselves to engineering standards. Bridge builders do not get to skip tests.

Bravo. We need more of this mindset in the world, and also more collective will to encourage it in one another.

YOU are the kind of engineer I want writing the code that goes in my Dad's pacemaker or the cruise control in my wife's car.


If you had worked in places where safety is critical, you wouldn't say something so shallow. In those places they place human verification above all else. They have a thick book where you do a full run that is double-checked; they don't f around with unit tests and say this is good to go.


I don't think anyone is saying "unit tests and you're good to go", are they?

In any critical system work, there are multiple layers and you can't really skip any of them.

It's also sort of meaningless to talk about such testing without requirements and spec to test against. Traceability is as much a part of it as any of the testing.

By the time you get to the "thick book/full run" as you put it, there has typically been a metric crapload of testing done already.


For all the testing and paperwork, the code in safety-critical applications is still frequently awful and riddled with bugs. Following such a process does not actually guarantee good software; it mostly just means you need a lot of paper pushers.


Human verification is very expensive, compared to unit tests. It costs money to pay that human to do it, time for them to test it, time to describe issues found, time to send it back for a fix.

Unit tests - actually, all automated tests - are comparatively cheap. The developer can run them immediately.

All code will have bugs. The "trick" to building a productive development pipeline is to catch as many of those bugs as possible as early as possible, and thereby reduce both the temporal and monetary cost of resolving them.


Interesting take. I find that structuring code to be testable makes the code much clearer: mainly, by making dependencies explicit via dependency injection. I do that even if I don't end up testing the code.


I have an identical experience. What really made me understand dependency injection (in Java) was being forced to write 100% code coverage unit tests. To be clear: 100% code coverage was absolutely overkill for my domain, but it was a lesson about how to structure your code for dependency injection.
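
A rough sketch of the kind of structure being described (all the names here are illustrative, not from any real codebase): the collaborators are injected, so a test can hand in fakes instead of a real clock or database.

    from datetime import datetime

    class InvoiceService:
        def __init__(self, repo, clock=datetime.now):
            self.repo = repo        # injected: anything with a save() method
            self.clock = clock      # injected: any zero-arg callable returning a datetime

        def create(self, amount):
            invoice = {"amount": amount, "created_at": self.clock()}
            self.repo.save(invoice)
            return invoice

    class FakeRepo:
        def __init__(self):
            self.saved = []
        def save(self, invoice):
            self.saved.append(invoice)

    def test_create_stamps_time():
        repo = FakeRepo()
        fixed = datetime(2023, 12, 19, 12, 0, 0)
        service = InvoiceService(repo, clock=lambda: fixed)
        assert service.create(100)["created_at"] == fixed
        assert repo.saved[0]["amount"] == 100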


I work on a team where 75% of the developers don't write any tests. You never know what you're going to run into. Did I cause a new bug, or did I discover an old one? It's embarrassing when you discover completely non-working code paths.

I'm not even looking for a particularly high level of test coverage, just a basic "I wrote an API, here's a test (integration, unit, doesn't matter) for the happy path"-level of coverage would be great.

On the opposite end, I worked at places that wanted unit tests for every new function, even if it was something simple (like a getter or setter) used elsewhere. That's also terrible.


You could have watched the program and observed the failure, so why would you need to write a test to be “surprised” it failed?


A huge benefit of tests is their regression protection against future changes, often by other engineers. You don’t get that from ad hoc manual execution.


This. I like tests because it's hard to know if I accidentally broke something other than the thing I'm working on. Even software of modest size is at a level of complexity beyond what a human can QA in a reasonable amount of time for every revision. If you're at the point of needing a checklist with even 3 items on it, you're past the point of needing tests.


CPU cycles are so cheap these days that this is a gross waste of manpower.

Even better than manually written unit tests are automatically generated property-based tests (which can be unit or integration tests). One can literally run millions of tests in a day this way, far, FAR more than could ever be manually verified. All because computation is so darned cheap now.


The program worked! As I wrote, the test tried a rare error condition and the handling for that was faulty.

Updated the original to clarify. Hope that helps!


I still like one of the defining characteristics of Unit Tests (paraphrasing Michael Feathers from memory): they are fast and cheap to run. Sure, they might not perfectly simulate production like integration tests, but they also don’t take hours burning cash in cloud infrastructure while risking failure from unrelated races dealing with those dependencies. You can use Unit Tests to get to a place where you’re fairly confident that the integration tests will pass (making that whole expensive affair cheaper to run).


From Working Effectively With Legacy Code by Feathers, p. 14[0]:

Unit tests run fast. If they don’t run fast, they aren’t unit tests.

Other kinds of tests often masquerade as unit tests. A test is not a unit test if:

1. It talks to a database.

2. It communicates across a network.

3. It touches the file system.

4. You have to do special things to your environment (such as editing configuration files) to run it.

Tests that do these things aren’t bad. Often they are worth writing, and you generally will write them in unit test harnesses. However, it is important to be able to separate them from true unit tests so that you can keep a set of tests that you can run fast whenever you make changes.

[0]: https://www.google.com/books/edition/Working_Effectively_wit...


My "unit tests" do hit the database and file system, and I have found and fixed many many problems during testing by doing so. I have found many other problems with those calls in production when I didn't do so. Yes, they make testing a lot slower. Our main app takes around 40 minutes to build which isn't good. I'd like it to be faster. But writing a bunch of separate integration tests to cover those functions would be a steep price. I can understand reasonable people choosing either approach.


> My "unit tests" do hit the database and file system, and I have found and fixed many many problems during testing by doing so. I have found many other problems with those calls in production when I didn't do so.

No-one said that integration tests can't also be very valuable.

From the little context I get that you write integration tests, and that is fine. They are useful, valuable! But they are not unit-tests.

edit: on re-reading, I get the feeling that for you "integration tests" are a synonym for "end to end tests". But -at least in most literature- end-to-end tests are a kind of integration-test. But not all integration tests are end-to-end tests. In my software, I'll often have integration tests that swap out some adapter (e.g. the postgres-users-repository, for the memory-users-repository, or fake-users-repository. Or the test-payment for the stripe-payment) but that still test a lot of stuff stacked on top of each-other. Integration tests, just not integration tests that test the entire integration.


I find the easiest way (for me) to identify what type of test is running is by looking at responsibility.

- A unit test is single responsibility. It tests just that one bit of code with all dependencies stubbed, abstracted, mocked, or removed from consideration in some way.

- An integration test is multiple responsibility. It tests just one bit of functionality as a vertical slice through the stack (including [only] relevant dependencies) with all other aspects of the code base eliminated from consideration.

- An end to end test is full responsibility. It tests a complete path through all the functionality necessary to complete a 'journey' as a user/consumer of the app/tool.

So for example VAT calculation is unit tested as isolated code, invoicing is integration tested as a vertical slice including database etc, and order processing is end-to-end tested from placing the order through to its completion.

That's a simplified example and not always accurate depending upon the system and the team perspectives/opinions, but the principle of looking at responsibilities is a very useful rule of thumb.


>No-one said that integration tests can't also be very valuable.

Integration tests are a better kind of default test because they bring value under pretty much all circumstances.

Nobody said that unit tests can't also be valuable under just the right circumstances, i.e. complex stateless code behind a stable API.

Unit tests shine in that environment - they're not impeded by their crippling lack of realism because that stable abstraction walls off the rest of reality. And they're very fast.

Most code isn't parsers, calculation engines, complex string manipulation, etc. - but when it is, unit tests really do kick ass.

They just suck so badly at testing code that doesn't fit that mold. Which, to be fair, is most code. I don't write a lot of parsers at work. My job involves moving data into databases, calling APIs, linking up message queues, etc.


> Integration tests are a better kind of default test because they bring value under pretty much all circumstances.

I respectfully disagree. Not with the last part, that is true: they do bring value under pretty much all circumstances. But the first. Because integration tests come with (extremely) high costs.

They are expensive to run. They are much harder (costlier) to write. They are even harder (costlier) to maintain. The common pushback against tests -but they slow down our team a lot- applies to integration tests much more than to unit tests - factors more. And so on.

As with everything software-engineering, choosing what tests to write is a tradeoff. And taking all into consideration, e2e or integration tests are often not worth their investment¹. The testing pyramid fixes this, because testing always (well - it depends) is worth the investment. But when you skew the testing pyramid, or worse, make it a testing-ice-cream-cone, that ROI can and will often quickly become negative.

¹Edit: I meant to say that many of these e2e tests are not worth their investment. Testing edge-cases for example: if you need man-hours to write a regression test e2e style and then man-weeks to maintain and run that over coming years, it's often better ROI to just let that regression re-appear and have customers report it. Whereas a unit-test that captures this edge-case costs maybe an hour to write, milliseconds to run and hardly any time to maintain.


>Because integration tests come with (extremely) high costs.

Unit tests usually have lower capex and higher opex. It often takes less time and effort to write a single lower level unit test but that test will require more frequent maintenance as the code around it evolves due to refactoring.

Integration tests often have higher capex because they rely upon a few complex integration points - e.g. to set up a test to talk to a faux message queue takes time. Getting Playwright set up takes quite a chunk of up-front time. Building an integration with a faux SMTP endpoint takes time. What is different is that these tools are a lot more generic, so it's easier to stand on the shoulders of others; they are more reusable, and it's easier to leverage past integrations to write future scenarios. E.g. you don't have to write your own Playwright, somebody already did that, and once you have Playwright integrated into your framework any web-related steps on future scenarios suddenly become much easier to write.

Whereas with unit tests the reusability of code and fixtures written in previous tests is generally not as high.

You have to also take into account the % of false negatives and false positives.

I find unit tests often raise more false positives because ordinary legitimate refactoring that introduced no bugs is more likely to break them. This reduces the payoff because you will have more ongoing test failures requiring investigation and maintenance work to mitigate this.

I also find that the % of false negatives with integration tests is lower. This is harder to appreciate because you wouldn't ever expect, for instance, a unit test to catch that somebody tweaked some CSS that broke a screen or broke email compatibility with Outlook, but these are still bugs, and they are bugs that integration tests at a high level can catch with appropriate tooling but unit tests will never, ever, ever catch.

>But when you skew the testing pyramid, or worse, make it an testing-ice-cream-cone, that ROI can and will often quickly become negative.

The pyramid is an arbitrary shape that assumes a one-size-fits-all approach works for all software. I think it is one of the worst ideas to ever grace the testing community. What was particularly bad was Google's idea that flakiness should be avoided by avoiding writing such tests rather than by applying good engineering practices to root out the flakiness. It was an open advertisement that they were being hampered by their own engineering capabilities.

I do agree that this is a cost/benefit calculation and if you shift some variable (e.g. E2E test tooling is super flaky and you've got good, stable abstractions to write your unit tests against, you've got a lot of complex calculations in your code), then that changes the test level payoff matrix, but I find that the costs and benefits work out pretty consistently to favor integration tests these days.


> single lower level unit test but that test will require more frequent maintenance as the code around it evolves due to refactoring.

"more frequent" is not the same as "high maintenance costs" though.

Unit tests should only change when the unit-under-test (SUT) changes. Which, for many units is "never". And for some with high churn, indeed, a lot.

Actual and pure e2e tests should never have to change except when the functionality changes.

But all other integration tests most often change whenever one of the components changes. I've had situations where whenever we changed some relation, or added a required-field in our database, we had to manually change hundreds of integration tests and their helpers. "Adding a required field" then became a chore of days of wading through integration tests¹.

With the unit-tests, only one, extremely simple test changed in that case. With the end-to-end-tests, also, hundreds needed manual changes. But that was because they weren't actual end-to-end tests, and did all sorts of poking around in the database. Worse: that poking-around wasn't abstracted even.

What I'm trying to convey with this example, is that in reality, unit-tests change often if the SUT has a high churn, but that those changes are very local and isolated and simple. Yet, in practice, with integration-tests, the smallest unrelated change to a "unit" has a domino-effect on whole sections of these tests. (And also that in this example, our E2E were badly designed and terribly executed)

¹Edit: one can imagine the pressure of management to just stop testing.


And integration tests can also be fast


Test containers really help with this. Should still have the big system tests that run overnight but a set of integration tests using Test Containers to stand in for the infrastructure dependencies is awesome.

My team has a ton of those and they run inside a reasonable time frame (5min or so) but we still allow for excluding those from test runs so you can run just the unit tests.
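
A rough sketch of what that looks like with the testcontainers-python package (assuming Docker is available and SQLAlchemy is installed; the image tag and the trivial query are just placeholders):

    import sqlalchemy
    from testcontainers.postgres import PostgresContainer

    def test_select_one():
        # spins up a throwaway Postgres container for the duration of the test
        with PostgresContainer("postgres:16") as pg:
            engine = sqlalchemy.create_engine(pg.get_connection_url())
            with engine.connect() as conn:
                assert conn.execute(sqlalchemy.text("SELECT 1")).scalar() == 1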


I hadn’t heard of Test Containers[1], but it looks really useful - thanks for the rec.

[1] https://testcontainers.com/


Indeed! It's one of the reasons I like the adapter pattern (aka hexagonal architecture) so much.

Data flowing through some 100 classes and 300 conditionals then into a `memory-payments` and back takes mere milliseconds. "Memory payments" is then some silly wrapper around a hashmap with the same API as the full-blown production payments-adapter that calls stripe over HTTP. Or the same api as the "production adapter" that wraps some RDBMS running on the other end of the data-center.
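
Sketched out, a "memory payments" adapter really is just that - a thin wrapper around a dict exposing the same interface as a hypothetical Stripe-backed adapter (names here are illustrative):

    class MemoryPaymentsAdapter:
        """In-memory stand-in with the same API as the real payments adapter."""

        def __init__(self):
            self._charges = {}

        def charge(self, customer_id, amount_cents):
            charge_id = len(self._charges) + 1
            self._charges[charge_id] = {"customer": customer_id, "amount": amount_cents}
            return charge_id

        def get_charge(self, charge_id):
            return self._charges[charge_id]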


No one suggests discarding integration tests. The end of the quoted excerpt from Feathers above explicitly supports them.

Tests that do these things aren’t bad. Often they are worth writing, and you generally will write them in unit test harnesses.

You hinted at the value of separating unit tests and integration tests with your observation about 40-minute unit test runs being way too slow. The process friction it creates means people will check in “obviously correct” changes without running the tests first.

Feathers continues:

However, it is important to be able to separate them from true unit tests so that you can keep a set of tests that you can run fast whenever you make changes.

You want your unit tests to be an easy habit for a quick sanity check. For the situation you described, I’d suggest moving the integration tests to a separate suite that runs at least once a day. Ripping that coverage out of your CI may make you uncomfortable. That’s solid engineering intuition. Let your healthy respect for the likelihood of errors creeping in drive you to add at least one fast (less than one-tenth of a second to run is the rule of thumb from Feathers, p. 13) test in the general area of the slower integration tests.

The first one may be challenging to write. From here forward, it will never be easier than today. Putting it off is how your team got to the situation now of having to wait 40 minutes for the green bar. One test is better than no tests. Your first case with the fixture and mocks you create will make adding more fast unit tests easier down the road.

Yes, just as it’s possible to make mistakes in production code, it’s certainly possible to make mistakes in test code. Unit tests are sometimes brittle and over-constrain. Refactoring them is fair game too and far better than throwing them away.


What would "integration tests" (that you don't write) look like, then, in your opinion?

I ask because in my team we also for a long time made the distinction between unit/integration based on a stupid technicality in the framework we are using.

We stopped doing that and now we mostly write integration tests (which in reality we did for a long time).

Of course this is all arguing over definitions and kind of stupid, but I do agree with the definition of the parent commenter.


> What would "integration tests" (that you don't write) look then in your opinion?

In our local lingo, an integration test is one that also exercises the front-end, while hitting a fully functional back-end. So you could think of our "unit tests" as small back-end integration tests. If you think that way, we don't write very many pure unit tests, mostly just two flavors of integration tests. That works well for our shop. I'm not concerned about the impurity.


The "impurity" isn't the problem. The problem is that such integration tests take a longer time to run and in aggregate, it takes minutes to run your test suite. This changes how often you run your tests and slows down your feedback loop.

That's why you separate them: not because the integration test isn't valuable, but because it takes longer.


I've never liked the conflating of target size and test time constraints.

I very much agree that there's benefit to considering the pace of feedback and where it falls in your workflow; immediate feedback is hugely valuable, but it can come from unit tests, other tests, or things which are not tests.

Meanwhile, some tests of a single unit might have to take a long time. Exhaustive tests are rarely applicable, but when they are it's going to be for something small and it's likely to be slow to run. That should not be in your tightest loop, but it is probably clearer to evict it simply because it is slow, rather than because it is not a unit test for being slow.


I'd actually quibble a lot on this definition--if you want to unit test code that needs to do any of the first three things, well, you have to do them to test the code. I would say that a test for a network protocol that works by spinning up an echo server on a random port and having the code connect to that echo server is still a unit test for that network protocol.

In my definition, it would still be a unit test if it is fast and it talks to a database, a network server, or the file system, so long as such communication is done entirely in a "mock" fashion (it only does such communication as the test sets up, and it's done in such a fashion that tests can be run in parallel with no issues).
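
For example, a sketch of that echo-server approach in Python: bind to port 0 so the OS picks a free port, serve one request on a background thread, and point the code under test at it (here the "code under test" is just a raw socket round-trip for illustration):

    import socket
    import socketserver
    import threading

    class EchoHandler(socketserver.BaseRequestHandler):
        def handle(self):
            # echo back whatever the client sent
            self.request.sendall(self.request.recv(1024))

    def test_echo_roundtrip():
        with socketserver.TCPServer(("127.0.0.1", 0), EchoHandler) as server:
            threading.Thread(target=server.handle_request, daemon=True).start()
            host, port = server.server_address
            with socket.create_connection((host, port)) as sock:
                sock.sendall(b"ping")
                assert sock.recv(1024) == b"ping"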


I've dropped my archaic thinking on what constitutes a unit test or an integration test. I now seldom write what most people consider unit tests, in the "one class, one test" sense.

Instead, I classify units of logical coherence and write unit tests for those. I write financial trading systems - not hft, but still latency sensitive. I will test an order through our pipeline as a unit of work. This will necessarily touch multiple classes. So as an example, a test will cover orders which are accepted and then filled, orders which are rejected, and so on.

Many people would classify these as integration tests, and to be fair, I don't really care what you name them. To me these are much more valuable than the traditional "one class, one test" mechanism because it means I am free to refactor the internals of our pipeline as much as I want with very low impact on the test code.

One of the whole points of test code, that I think has been lost, is that it should be there to give you confidence in the correctness of your application under change. Writing "one class, one test" is a bad way to achieve this.


I haven't found the distinction between unit tests and non-unit tests to be that useful in practice. The important questions are:

1. Is it kinda slow? (The test suite for a single module should run in a few seconds; a large monorepo should finish in under a minute)

2. Is there network access involved? (Could the test randomly fail?)

3. Do I need to set up anything special, like a database? (How easy is it for a new developer to run it?)

If the answer to any of those is Yes, then your test might fall in the liminal space between unit tests and integration tests -- they're unit-level tests, but they're more expensive to run. For example, data access layer tests that run against an actual database.

On the other hand, even if a test touches the filesystem, then it's generally fast enough that you don't have to worry about it (and you did make the test self-cleaning, right?) -- calling that test "not a unit test" doesn't help you. Likewise, if the database you're touching is sqlite, then that still leaves you with No's to the three questions above.
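
For instance, a data-access test against in-memory SQLite stays fast, needs no special setup, and cleans up after itself (the schema and function are illustrative):

    import sqlite3

    def save_user(conn, name):
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

    def test_save_user():
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        save_user(conn, "Ada")
        assert conn.execute("SELECT name FROM users").fetchone() == ("Ada",)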


It is pretty excruciating (and IMO useless) to write true, DB-isolated unit tests for DB access layer code.


In other words, unit tests – unless, perhaps, you count an empty test as being a unit test – do not exist. "Talking" to structured sets of data, i.e. a database, is fundamental to computing.


>1. It talks to a database.

>3. It touches the file system.

These are BS. Maybe they made sense in the beforetimes when we didn't have Docker containers or SSDs, but nowadays there's no reason you can't stand up a mini test database as part of your unit test suite. It's way simpler than mocking.


100% this. A guy I work with just rebuilt our CI/CD pipeline and we're spinning up a database and all dependent services in containers. There are no mocks and it works great.

In previous lives, I worked on tests that mocked everything. We spent more time creating and maintaining mocks than writing the actual tests.


I think those are still considered "integration" tests in the traditional way of thinking. The problem is that a lot of applications don't do much other than interact with external resources, to the point where isolated unit tests are rare or only cover non-critical code, while just about any substantive test is an "integration" test.


Yes, you are correct. I never cared much for the traditional way of thinking about tests. I've had more arguments with people claiming this is an integration test, not a unit test, write some mocks, etc. Outside of very specific pieces of code, you generally get more value from integration tests.


That's a great rule of thumb.


They are fast and cheap WHEN YOU FIRST WRITE THE CODE. Or if you are the original author of the code.

The problem is, and there will be people who disagree with me, that unit tests make refactoring of other people's code a lot harder.

STAY WITH ME!

You'd think that if the unit tests were "good" and helped document what the code does, they'd help here, but they don't. You won't believe this, but in dogmatic high-breadth-coverage (low-depth-coverage) codebases, there are tons of test code SO TIED TO IMPLEMENTATION rather than interface that any monkeying with the presumed encapsulated logic breaks the unit tests, so you have double the things to fix.

You'll never believe what happens next. Some developer in some Agile thing that got assigned 2 unicorn shits for the task panics because the unit tests are SERIOUSLY slowing down his "velocity". So what does he do? Delete tests, change tests to make them work at any costs.


> they are fast and cheap to run.

But expensive to write. Especially if you want them to be fast and cheap to run.


That's exactly it; QA is a layered / tiered / pyramid shaped process, the more you catch lower down, the less reliance there is on the upper layers, and the faster the development iterations.


It's all degrees. Unit tests are great at finding examples of errors or correct behaviours. However they prove nothing and they definitely do not demonstrate the absence of errors.

They are often sufficient for a great deal of projects. If all it takes to convince you it's "good enough" is a handful of examples, then that's it. As much as you need and no less.

However I find we programmers tend to be a dogmatic bunch and many of us out there like to cling to our favoured practices and tools. Unit tests aren't the only testing method. Integration tests are fine. Some times testing is not sufficient: you need proof. Static types are great but fast-and-loose reasoning is also useful and so you still need a few tests.

What's important is that we sit down to think about specifying what it means for our programs to be "correct." Because when someone asks, "is it correct?" you need to ask, "with respect to what?" If all you have are some hastily written notes from a bunch of meetings and long-lost whiteboard sessions... then you don't really have an answer. Any behaviour is "correct" if you haven't specified what it should be.


The correct behavior is the behavior it has, of course! It is all the other programs that can't integrate with it that are wrong. /s

Unit tests or not, so much code I interact with is like this. This is part of why I love integration tests. It's usually at the point of integrating one thing with another that things go bad, where bugs occur, and where the intention of APIs are misunderstood.

I like unit tests for the way that they encourage composition and dependency injection, but if you're already doing that, then (unit tests or not) I prefer integration tests. They might not be as neat and tidy as a unit test OR as an e2e test, and they might miss important implementation edge cases, but well made integration tests can find all sorts of race conditions, configurations that we should help users avoid, and much much more, precisely because they are looking for problems with the side effects that no amount of pure-function unit-tested edge-cased code will make obvious or mitigate.

Integration tests are like the "explain why" comments that everyone clamors for, but in reproducible demo form. "Show me" vs "tell me"


> they prove nothing

If they fail, they prove there's a bug (in either the test or the code.)

This is like literally any other kind of test.


I meant "prove" as in, "mathematically proven." That is, for all possible inputs your theorem holds. A unit test is only an example of one such input. They don't prove there are no bad inputs.

There are many places in programming where you don't care to prove properties of your program to this level of rigor; that's fine -- sufficiency is an important distinction: if a handful of examples are enough to convince you that your implementation is correct with regards to your specifications, then it's good enough. Does the file get copied to the right place? Cool.

However there are many more places where unit tests aren't sufficient. You can't express properties like, "this program can share memory and never allows information to escape to other threads." Or, with <= 10 writers all transactions will always complete. A unit test can demonstrate an example of one such case at a time... but you will never prove things one example at a time.


Well that's silly. You're not arguing against unit tests, you're arguing against testing in general. But empirically testing does improve the reliability of software. You're tossing the baby out with the bathwater in the interest of an academic ideal of proved correctness.

I will add that if you are verifying the correctness of some code, you have a formal specification of what the code is supposed to do. That is, you have a description of valid inputs, and a formula determining if the output is correct, given those inputs. But if you have those, you can also do property-based testing: generate random inputs that satisfy the input properties, and check that the output satisfies the output condition. This is all easier than proving correctness (it requires little or no manual intervention) and gives much of the same benefit.


Maybe you need to re-read my original comment. I’m arguing for sufficient evidence of correctness. Unit tests often provide that for a good deal of software. I think people ought to write more of them.

However I think folks do get dogmatic about testing and will claim things like, “all you need is unit/integration/types.”


A bug in test code is not a real bug. It’s just a test that’s not giving you useful information. Lots of tests don’t give you useful information. Some that fail and some that pass.

It’s easy to write a test that doesn’t provide useful information across time. Harder to write a test that does.


There are times for constructive advice and there are times when '...so don't do that' is the right answer.

> It’s easy to write a test that doesn’t provide useful information across time.

I firmly believe this is one of those times.

(Currently my only issue with tests in the product I work on is that they take too long to run. Can't have it all.)


> A bug in test code is not a real bug.

The bug is in the library that the test code invokes. Tests themselves should be simple, if there are bugs in tests they are trivial once you have a framework figured out.


They can prove that by building out some new feature you implicitly break some existing functionality that didn't occur to the person writing the requirements.


Good story.

I for one do not believe in Unit Tests and try to get LLM tooling to write them for me as much as possible.

Integration Tests however, (which I would argue is what this story is actually praising) are critical components of professional software. Cypress has been my constant companion and better half these last few years.


Unit tests are useful for:

1) Cases where you have some sort of predefined specification that your code needs to conform to

2) Weird edge cases

3) Preventing reintroducing known bugs

In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does" and are a useless waste of time and effort.


> In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does" and are a useless waste of time and effort.

If you rephrase this as, "verifying that our code does what it did yesterday" these types of tests are useful. When I'm trying to add tests to previously untested code, this is usually how I start.

    1. Method outputs a big blob of JSON
    2. Write test to ensure that the output blob is always the same
    3. As you make changes, refine the test to be more focused and actionable
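
A minimal sketch of step 2, assuming a hypothetical build_report() method and a previously captured golden file checked into the repo:

    import json
    from pathlib import Path

    GOLDEN = Path(__file__).parent / "golden" / "report.json"

    def build_report():
        # stand-in for the real method that emits a big JSON-able blob
        return {"total": 3, "items": ["a", "b", "c"]}

    def test_report_matches_yesterday():
        expected = json.loads(GOLDEN.read_text())   # yesterday's captured output
        assert build_report() == expected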


The problem with this for me is that most of the time "verifying that our code does what it did yesterday" is not a useful condition: if you make no change to the code, it's going to do what it did yesterday. If you do make a change to the code, then you are probably intending for it to do something different, so now you have to change the test accordingly. It usually just means you have to make the same change in 2 different spots for every piece of unit-tested code you want to change.


> If you do make a change to the code, then you are probably intending for it to do something different, so now you have to change the test accordingly. It usually just means you have to make the same change in 2 different spots for every piece of unit-tested code you want to change.

Sure, but that's how unit-tested code works in general.


> then you are probably intending for it to do something different

If you have decided that your software is going to do something different, you probably want to deprecate the legacy functionality to give the users some time to adapt, not change how things work from beneath them. If you eventually remove what is deprecated, the tests can be deleted along with it. There should be no need for them to change except maybe in extreme circumstances (e.g. a feature under test has a security vulnerability that necessitates a breaking change).

If you are testing internal implementation details, where things are likely to change often... Don't do that. It's not particularly useful. Test as if you are the user. That is what you want to be consistent and well documented.


Then think of the unit test as the safety interlock.


I had to migrate some ancient VB.NET code to .NET 6+ and C#. The code outputs a text file, and I needed to make sure the new output matched the old output. I could have written some sort of test program that would have been roughly equal in length to what I was rewriting to verify that any change I made didn't affect the output, and to verify that the internal data was the same at each stage. Or... I could just output the internal state at various points and the final output to files and compare them directly. I chose the latter, and it saved me far more work than writing tests.

If I need to verify that my code works the same as it did yesterday, I can just compare the output of today's code to the output of yesterday's code.


I see two advantages in creating tests to check output

    1. You did the work to generate consistent output from the code as a whole, plus output intermediate steps. Writing those into a test lets future folks make use of the same tests.
    2. Having the tests in place prevents people from making changes that accidentally change the output
Don't get me wrong, tests that just compare two large blobs of output aren't fun to work with, but they _can_ be useful, and are an OK intermediate stage while you get proper unit tests written.


> In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does"

That’s my experience too, especially for things like React components. I see a lot of unit tests that literally have almost the exact same code as the function they’re testing.


I've often found that a little bit of code that helps you observe that your code is working correctly is easier than checking that your code is working in the UI. The tests are a great place to store and easily run that code.


> 3) Preventing reintroducing known bugs

When I was learning unit testing, my mentor taught me this strategy when fixing production bugs. First, write the unit test to demonstrate the bug. Second, fix the bug.


That's what you get when you don't write the tests first.


That's just doubling your work. If you don't already have a spec, your unit tests and actual code are essentially the same code, just written twice.


Determining which states are authentically hazardous and mocking data and adjacent services to make those states accessible at the press of a button is definitely not the same as writing code which handles those states appropriately.


You should try switching it up. Write the tests and then ask the LLM to write the code that makes them pass. I find I'm more likely to learn something in this mode.


I'd argue having useable LLMs kind of brings out how problematic TDD is.

Imagine the dumbest function you have to write: a product A and a street address as input, and the shipping cost as an output.

How many test cases would you write to be absolutely sure that function actually does what you want it to do, and be confident it doesn't have weird exceptions that the LLM injected randomly? I'd assume you'd still vet the code written by the LLM, but if it's hundreds of rambling lines doing weird stuff to get the right result, is it really faster than writing it yourself?
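
Concretely, even vetting that dumb function means something like this - a hypothetical shipping_cost and a handful of parametrized cases, nowhere near exhaustive:

    import pytest

    def shipping_cost(product, address):
        # illustrative stand-in: flat rate plus an oversized-item surcharge
        base = 5.00 if address["country"] == "US" else 15.00
        return base + (10.00 if product.get("oversized") else 0.00)

    @pytest.mark.parametrize("product,address,expected", [
        ({"sku": "A"}, {"country": "US"}, 5.00),
        ({"sku": "A", "oversized": True}, {"country": "US"}, 15.00),
        ({"sku": "A"}, {"country": "FR"}, 15.00),
    ])
    def test_shipping_cost(product, address, expected):
        assert shipping_cost(product, address) == expected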


If it's hundreds of rambling lines then I'm not going to be able to get it past my linter anyhow (complexity thresholds), nor am I going to be able to get it past my team when they review it. So yeah, that's a problematic case, but it's one I'm going to have to refactor to avoid with or without an LLM in the loop.


About the problems of TDD: Cedric Beust has a legendary blog post about it here: https://www.beust.com/weblog/the-pitfalls-of-test-driven-dev...


TDD works best if you default to testing at the outer shell of the app - e.g. translating a user story into steps executed by playwright against your web app and only TDDing lower layers once you've used those higher level tests to evolve a useful abstraction underneath the outer shell.

It seems to be taught in a fucked up way though where you imagine you want a car object and a banana object and you want to insert the banana into a car or some other kind of abstract nonsense.
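
As a sketch of what "translating a user story into steps executed by playwright" can look like with Playwright's sync Python API (the URL, selectors, and app are hypothetical):

    from playwright.sync_api import sync_playwright

    def test_user_can_sign_in():
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto("http://localhost:8000/login")      # hypothetical app under test
            page.fill("#email", "ada@example.com")
            page.fill("#password", "correct-horse")
            page.click("text=Sign in")
            assert page.inner_text("h1") == "Dashboard"
            browser.close()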


How effective is the LLM when used this way, compared to normally?


I don't know what normally is, but I'd say it works pretty well.

Often the challenge is that the context for what you're trying to do is sprawling. There are just too many files and they're all too long: you end up exceeding the context window or filling it with 99% irrelevant stuff. Typically the structures you build for tests are smaller and more focused on the particular instance you're worried about, which I think is a better way to talk to an LLM.

You don't have to explain, for instance, that there's data in production which doesn't match the schema in the code so it must be cautious to avoid running afoul of that difference. Instead you've mocked that data, so it's right there in the same code with the test that it's trying to make pass.


In reality, unit tests and integration tests are different names for the same thing. All attempts at post facto differentiation fall flat.

For example, the first result on Google states that a unit test calls one function, while an integration test may call a set of functions. But as soon as you have a function that has side effects, then it will be necessary to call other functions to observe the change in state. There is nothing communicated by calling this an integration test rather than a unit test. The intent of the test is identical.


No. Or maybe only if you also consider 'village' and 'city' to be the same thing.


That's a good example, because while they're clearly different things, any distinction you draw between them such as "population > 100k" or "has cathedral" is always going to be a bit arbitrary, and many cities grew organically from villages in an unplanned manner.


Is it? Kent Beck, coiner of the term "unit test", made it quite clear that a unit test is a test that is independent (i.e. doesn't cause other tests to fail). For all the ridiculous definitions I have come across, I have never once heard anyone call an integration test a test that is dependent (i.e. may cause other tests to fail). In reality, a unit test and an integration test are the same thing.

The post facto attempts at differentiation never make sense. For example, another comment here proposed that a unit test is that which is not dependent on externally mutable dependencies (e.g. the filesystem). But Beck has always been adamant that unit tests should use the "real thing" to the greatest extent possible, including using the filesystem if that's what your application does.

Now, if one test mutates the filesystem in a way that breaks another test, that would violate what Beck calls a unit test. This is probably the source of confusion in the above. Naturally, if you don't touch the file system there is no risk of conflicting with other tests also using the filesystem. But that really misses the point.


There are only two kinds of tests: ones you need and ones you don't. Splitting hairs over names of types of tests is only useful if you're trying to pad a resume.


Clusters of humans cohabiting a confined space? If you squint hard enough…


Implying that integration tests (or vice versa) are legally incorporated like cities, while unit tests are not? What value is there in recognizing a test as a legal entity? Does the, assuming US, legal system even allow incorporation of code? Frankly, I don't think your comparison works.


I think he's not implying a hard-line legal standard; rather, as connections and size increase, different properties start to emerge and humans start to differentiate things based on that. But there's a gradient, so we can find examples that are hard to classify.


What differentiates a city from a village is legal status, not size. If size means population, there are cities with 400 inhabitants, villages with 30,000 inhabitants, and vice versa. It is not clear how this pertains to tests.

When unit test was coined, it referred to a test that is isolated from other tests. Integration tests are also isolated from other tests. There is no difference. Again, the post facto attempts to differentiate them all fall flat, pointing to things that have no relevance.


> What differentiates a city from a village is legal status, not size

Fine. And legal status depends on location. There are many localities.


Yup, just like testing. Integration and unit tests depend on location as no two locations can agree on what the terms mean – because all definitions that attempt to differentiate them are ultimately nonsensical. At the end of the day they are the exact same thing.


You should not be downvoted as heavily as you are now.

I feel like we did testing a disservice by specifying the unit to be too granular. So in most systems you end up with hundreds of useless tests testing very specific parts of code in complete isolation.

In my opinion a unit should be a "full unit of functionality as observed by the user of the system". What most people call integration tests. Instead of testing N similar scenarios for M separate units of code, giving you NxM tests, write N integrations tests that will test those for all of your units of code, and will find bugs where those units, well, integrate.


I hate unit tests, though I am forced to write them to have my CI process not fail (I need 75% coverage or it won't build) - so I have written thousands and thousands of them in the last few years. The problem I have: not a single time has a unit test failure resulted in me finding a bug in my code - all I ever find are bugs in my unit test code - so it pretty much seems like a waste of time to me.

Either I am writing really good code so there are no bugs, or I am really bad at writing unit testing code to find those bugs.


> Either I am writing really good code so there are no bugs, or I am really bad at writing unit testing code to find those bugs.

Honestly, having literally had a scenario 20 minutes ago where I wrote a test for what I figured was absolutely trivial code, and having it _fail_ on me and pick up a bug that I hadn't considered (and this is not the first time this has happened), I would strongly suggest it's the latter.

Do your unit tests check the output and side effects exactly, or do they just make sure the function returned without error?

Just because a function/method/whatever has 100% coverage doesn't mean you have tested all the potential scenarios.
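
A minimal hypothetical sketch of that gap (not from the parent's code) - one test executes every line, so coverage reports 100%, yet whole scenarios go unchecked:

    def apply_discount(price_cents: int, discount_pct: int) -> int:
        # Integer cents to keep the example simple.
        return price_cents - price_cents * discount_pct // 100

    def test_apply_discount():
        # Executes every line, so line coverage is 100%...
        assert apply_discount(10_000, 10) == 9_000

    # ...but nothing exercises a discount over 100 (negative price) or a negative
    # discount (price increase), which the business rules may need to reject.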


A sort of non-judgmental question: in your mind, are you writing them to cover lines, or to exercise required behavior with an intent of proving the module is broken? I ask because it seems like requiring line coverage as a metric would have the effect you are describing.


I've seen the same thing with comments. My boss required us to add comments to our code, to make it easier to read. That was all he asked, please add comments. My co-worker added comments like "Increase variable i by 1", while completely ignoring the 8 lines of spaghetti business logic above.

Similarly I've seen people add tests that will ensure that code coverage doesn't go down but that don't actually do anything to help anyone. I'd argue that having arbitrary coverage goals is a problem on its own, but it's the only way to force some people to write even the most basic of tests.


I've thought for a long time we present coverage backwards. We shouldn't be highlighting what is covered and getting that metric up, we should highlight what isn't covered and focus on getting that metric down (like how linting is done). Present it like "Hey, here's something that no one has looked at in-depth! It's a great place for bugs to be hiding!"


Ah, Goodhart's law ruins everything.


You aren't writing those unit tests just for yourself. You're writing them to help the next developer who works on that code avoid regression defects. That has value to your employer even if it seems like a waste of your time.


They’re way more useful in languages without static typing, where it’s easier to write or edit-in stupid bugs and not notice until the code runs. They’re not not useful in statically typed languages, just far less so.


I don't particularly like writing unit tests either. However, one goal I set myself decades ago is less than one bug per kloc delivered. (I don't always achieve that.) If you seriously attempt to do that over many years unit tests become unavoidable. For me that path to unavoidable looked like this:

1. Decide that was the goal. Start measuring. Who knows, maybe I have to do nothing.

2. Discover that I seriously underestimated the number of bugs I produce. No one is available to review my code, changing languages (to something with a stronger type system) was out of the question. Only option appears to be methodical testing of every line of code.

3. Print (on dead trees - this was decades ago) all my code. Manually test all of it, running a highlighter down the listing. Continue until the entire code listing has a solid highlighted line down the left. It worked! Bugs per delivered lines of code dropped off a cliff. But geez it took a loooong time, longer than writing the code. Finding an input sequence that exercised some code was surprisingly hard. And it was boring. Still a success - and people who used it immediately noticed the improvement in quality and commented.

4. Then new features have to be added. Does this mean I have to test it all again? Surely not - I'll just test the bits I changed. Result: bugs per line of code rapidly start to ramp up again.

5. So I test everything again. That works, but it's horribly inefficient. I can spend days releasing a few-line change. I can't get small changes out in anything like a reasonable time frame.

6. The solution is obvious to a programmer: automate your work, which in this case translates to writing code to do the tests. So I write unit tests for new code. It ends up being slower than doing manual tests :( Code size doubles. It works in keeping the bug count down, but can I afford to keep doing this time wise?

7. Then I add features to new code with unit tests. Initially this is painful - I move at perhaps 1/2 the speed because now I have to change at least twice the amount of code (actual and unit-test). Still, it's success bug count wise, and running unit tests is much, much faster than manually testing.

8. Keep doing this, notice that despite me having to change twice the lines of code (actual code and tests) when I'm adding new features I'm producing more debugged lines of code than before. Even more interesting, I'm fearlessly making much larger changes now. Turns out I'm using the unit tests as guard rails. I no longer minimise my changes to reduce the odds of introducing a bug.

9. Finally, notice that unit testing has changed the way I write code. And it's for the better. Code that's easy to test is also easy to understand. For example, it's much easier to test a pure function than something with side effects, so you minimise side effects as much as possible. You make your interfaces (which are the thing you focus your testing on) as small as possible. Testing deep inside a complex module is difficult and the torturous unit test code you have to write to do that is hard to understand. So you split things into smaller modules, each of which has those clean interfaces, to give your tests greater visibility. Turns out writing code so unit tests can understand it is oddly similar to writing code so that humans can easily understand it.
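
A tiny sketch of that contrast (hypothetical Python, purely illustrative): the side-effecting version needs a real file and cleanup to test, while the pure version needs nothing but inputs and an assert.

    # Harder to test: touches the filesystem.
    def append_total_to_file(path, values):
        with open(path, "a") as f:
            f.write(f"{sum(values)}\n")

    # Easy to test: pure function, data in, data out.
    def total(values):
        return sum(values)

    def test_total():
        assert total([1, 2, 3]) == 6
        assert total([]) == 0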

So it turns out unit testing is a win in every way, when done well. Well, except for the "it's boring" bit. (Numerous comments here hint at copilot being a real help.)

But writing code that's amenable to unit tests isn't something you do naturally. Fortunately just getting practice at writing unit tests is enough to teach the skill. Sadly, that takes time and frustration. While you learn your productivity will drop for a while. And worse, when writing new code adding unit tests is always slower than the old way. The payback only comes when you later make changes.


I figure that the value in cases like this, is that you can have confidence things (even trivial things) will continue to work, when you decide to upgrade dependencies. Does that apply here? Would you feel more confident in that case than you would without the tests?


try writing the test first


We can't even have a consensus on what "unit" tests really are... Every company I have worked for has a different meaning for it. Some places consider a test "unit" when all the dependencies are mocked, some places consider a whole feature a "unit".


Kent Beck (the originator of test-driven development) defines a unit test as "a test that runs in isolation from other tests". This is very different from the popular and completely misguided definition of a unit test as "a test that tests a class/method in isolation from other classes/methods".

But it doesn't really matter if you want to call a given test an "integration" test or a "unit" test. The point of any test is to fail when something breaks and pass when something works, even if the implementation is changed. If it does the opposite in either of those cases, it's not a good test.


Kent Beck also said [0] "I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence"

[0] https://stackoverflow.com/a/153565


> The point of any test is to fail when something breaks and pass when something works

The point of any test is to document API expectations for future developers (which may include you).

That the documentation happens to be self-validating is merely a nice side effect.


I'd rather drop the useless prefix instead of trying to fix it.


I get the logic from the mocking camp - we're not here to test this dependency, we're just here to test this function/method/whatever - but when you mock you end up making assumptions about how that dependency works. This is how you end up with the case of "all my tests are green and production is broken".

I think it's hard to beat e2e testing. The thing is, e2e tests are expensive to write and maintain, and in my opinion you really need a software engineer to write them and write them well. Manual e2e testing, meanwhile, is cheap and can be outsourced. All the companies I've worked for in the US have had testing departments, and they did manage to write a few tests, but they weren't developers, and so to be frank they were really bad at writing them. They did probably 80 or 90% of their testing manually. At that point, who are we kidding? Just say you do manual testing, pay your people accordingly and move on.


So at work we would run tons of tests against the real service with a real database, seeding thousands of schemas to allow for parallel testing of tests that change state.

This takes 3 minutes, 1 if you use tmpfs. It only takes <10 seconds if you don't run the tests that write.

These actually cover most real world use cases for a query-engine we maintain.

Unit tests have their place for pieces of code that run based on a well defined spec, but all in all this integration or component-level testing is what consistently brings me the most value.


From research I've read, unit tests (whether automated or not) tend to catch around 30% of bugs whereas end to end testing and manual code review (believe it or not) each tend to catch around 80% of bugs.


The gem of this story is that the author is not running unit tests in the sense most folks understand them. As he also pointed out, he is executing the tests on target, so they are more integration tests than unit tests. The kind of testing he is doing brings in new categories of potential faults: scheduling issues, memory constraints, interrupt servicing, and so on.


It is a little sad to see so many be so dismissive of unit tests. They aren't a universal solution, which seems to be why they are written off in many cases, but they make your life so much easier in so many cases.

If you need to mock out 80% of a system to make your unit test work, then yes, it's potentially pointless. In that case I'd argue that you should consider rewriting the code so that it's more testable in isolation, that will also help you debug more easily.

What I like to do is write tests for anything that's even remotely complex, because it makes writing the actual code easier. I can continuously find mistakes by just typing "tox" (or whatever tool you use). Or perhaps the thing I'm trying to write functionality for is buried fairly deep in an application; then it's nice to be reasonably sure about the functionality before testing it in the UI. Unit tests just make the feedback loop much shorter.

Unlike others I'd argue that MOST projects are suited for unit testing, but there might be some edge cases where they'd provide no value at all.

One caveat is that some developers write pretty nasty unit tests. Their production code is nice and readable, but then they just went nuts in the unit tests and created a horrible unmaintainable mess. I don't get why you'd do that.


> If you need to mock out 80% of a system to make your unit test work, then yes, it's potentially pointless. In that case I'd argue that you should consider rewriting the code so that it's more testable in isolation, that will also help you debug more easily.

This is also where the dogma of “only test public methods” fails. If your public method requires extensive mocking but the core logic you need to protect is isolated in a private method that requires little mocking, the most effective use of developer resources may be to just test your private method.

> One caveat is that some developers write pretty nasty unit tests. Their production code is nice and readable, but then they just went nuts in the unit tests and created a horrible unmaintainable mess. I don't get why you'd do that.

I have also seen this a lot, and usually it's when people try to add too much DRY to their unit tests. As a junior dev I was told by our lead that boilerplate and duplication in tests is not strictly a bad thing, and I have generally found this to be true over the years. Tests are inherently messy and each one is unique. Trying to get clever with custom test harnesses to reduce duplication is more likely to lead to maintainability issues than to test nirvana. And if your code requires so much setup to test, that is an indicator of complexity issues in the code, not the test.


> If your public method requires extensive mocking but the core logic you need to protect is isolated in a private method that requires little mocking, the most effective use of developer resources may be to just test your private method.

You're looking at the tested code as immutable. If you're not allowed to touch the code being tested, then yes, you'll sometimes need to test private methods, and that is fine. "Don't test private methods" is actually more about how to architect the primary code, not a commandment on the test code. If you find that you're having to do extensive mocking to call a public method in order to test the functionality in some private method, that's a major smell indicating that your code could be organized in a better way.


> the most effective use of developer resources may be to just test your private method.

While there is nothing wrong with testing an internal function if it helps with development, so long as it is clearly identifiable as such, you still need the public interface tests to ensure that the documented API is still conformant when the internals are modified. Remember that public tests are not for you, they are for future developers.

This is where Go did a nice job with testing. It provides native language support for "public" and "private" tests, identifying to future developers which can be deleted as implementation evolves and which must remain no matter what happens to the underlying implementation.


> If your public method requires extensive mocking but the core logic you need to protect is isolated in a private method that requires little mocking, the most effective use of developer resources may be to just test your private method.

When I did unit tests in C++, I found a simpler (and better) solution: Shrink the class by splitting it up into multiple classes. Often the logic in the private methods could be grouped into 1-3 concepts, and it was quite logical to create classes for each of them, give them public methods, and then have an instantiation of that class as a private member.

Now all you need to do is write unit tests for those new classes.

Really, it led to code that was easier to read - the benefit was not just "easier to test". Not a single colleague (most of whom did not write unit tests) complained.

I've yet to run into a case where it was hard to test private behavior via only public methods that couldn't be solved this way.
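
A rough sketch of that kind of split in Python (hypothetical names; the idea carries over to the C++ case described above): the former private helpers become small classes with public methods, held as private members, and the tests target those classes directly.

    class LineParser:
        def parse(self, line: str) -> dict:
            key, value = line.split("=", 1)
            return {key.strip(): value.strip()}

    class Aggregator:
        def merge(self, records: list) -> dict:
            merged = {}
            for record in records:
                merged.update(record)
            return merged

    class ReportGenerator:
        def __init__(self):
            # Composition: the old private logic now lives in testable collaborators.
            self._parser = LineParser()
            self._aggregator = Aggregator()

        def generate(self, lines: list) -> dict:
            return self._aggregator.merge([self._parser.parse(l) for l in lines])

    def test_line_parser():
        assert LineParser().parse("a = 1") == {"a": "1"}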


What this guy said. Public APIs don't need to be public to everyone. They can be public but only visible internally within a package.

You can split things out and decompose some things, even if it's just some util functions, and start sending out chunks of code for review. It doesn't even necessarily have to be separate files.

Your reviews will be faster and smoother too.


I have tried to evangelize unit testing at each company I've worked at and most engineers struggle with two things.

The first is getting over the hurdle of trusting that a unit test is good enough, a lot of them only trust an end-to-end test which are usually very brittle.

The second reason is, I think, that a lot of them don't know how to systematically break down tests into pieces to validate, e.g. I'll do a test for null, then a separate test for something else _assuming_ not null because I've already written a test for that.

The best way I've been able to get buy-in for unit testing is giving a crash course on a new structure that has a test suite per function under test. This allows for a much lower loc per test that's much easier to understand.

When they're ready I'll give tips on how to get the most out of their tests with things like boundary value analysis, better mocking, IoC for things like date time, etc.
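
As a hedged illustration of two of those tips (hypothetical names, pytest-style): one small suite per function under test, with the current time injected rather than read inside the logic.

    from datetime import datetime, timezone

    def is_expired(expires_at: datetime, now: datetime) -> bool:
        # IoC for date/time: the caller supplies "now".
        return now >= expires_at

    class TestIsExpired:
        MOMENT = datetime(2024, 1, 1, tzinfo=timezone.utc)

        def test_exactly_at_expiry(self):
            assert is_expired(expires_at=self.MOMENT, now=self.MOMENT)

        def test_before_expiry(self):
            earlier = datetime(2023, 12, 31, tzinfo=timezone.utc)
            assert not is_expired(expires_at=self.MOMENT, now=earlier)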


I've evangelized against unit testing at most companies I work at, except in one specific circumstance. That circumstance is complex logic in stateless code behind a stable API where unit testing is fine. I find this usually represents between 5-30% of most code bases.

The idea that unit testing should be the default go to test I find to be horrifying.

I find that unit test believers struggle with the following:

1) The idea that test realism might actually matter more than test speed.

2) The idea that if the code is "hard to unit test" that it is not necessarily better for the code to adapt to the unit test. In general it's less risky to adapt the test to the code than it is the code to the test (i.e. by introducing DI). It seems to be tied up with some sort of idea that unit testability/DI just makes code inherently better.

3) The idea that integration tests are naturally flaky. They're not. Flakiness is caused by inadequate control over the environment and/or non-deterministic code. Both are fixable if you have the engineering chops.

4) The idea that test distributions should conform to arbitrary shapes for reasons that are more about "because google considered integration tests to be naturally flaky".

5) Dogma (e.g. uncle bob or rainsberger's advice) vs. the idea that tests are investment that should pay dividends and to design them according to the projected investment payoff rather than to fit some kind of "ideal".


> The idea that unit testing should be the default go to test I find to be horrifying.

Kent Beck, who invented the term unit test, was quite clear that a unit test is a test that exists independent of other tests. In practice, this means that a unit test won't break other tests.

I am not sure why you would want anything other than unit tests? Surely everyone agrees that one test being able to break another test is a bad practice that will turn your life into a nightmare?

I expect we find all of these nonsensical definitions for unit testing appearing these days because nobody is writing anything other than unit tests anymore, and therefore the term has lost all meaning. Maybe it's simply time to just drop it from our lexicon instead of desperately grasping at straws to redefine it?

> It seems to be tied up with some sort of idea that unit testability/DI just makes code inherently better.

DI does not make testing or code better if used without purpose (and will probably make it worse), but in my experience when a test will genuinely benefit from DI, so too will the actual code down the line as requirements change. Testing can be a pretty good place for you to discover where it is likely that DI will be beneficial to your codebase.

> The idea that test realism might actually matter more than test speed.

Beck has also been abundantly clear that unit tests should not resort to mocking, or similar, to the greatest extent that is reasonable (testing for a case of hardware failure might be a place to simulate a failure condition rather than actually damaging your hardware). "Realism" is inherent to unit tests. Whatever it is you are talking about, it is certainly not unit testing.

It seems it isn't anything... other than yet another contrived attempt to try and find new life for the term that really should just go out to pasture. It served its purpose of rallying developers around the idea of individual tests being independent of each other – something that wasn't always a given. But I think we're all on the same page now.


> Kent Beck, who invented the term unit test, was quite clear that a unit test is a test that exists independent of other tests

Kent Beck didn't invent the term "unit test", it's been used since the 70's (at minimum).

> I am not sure why you would want anything other than unit tests?

The reason is to produce higher quality code than if you rely on unit tests only. Generally, unit tests catch a minority of bugs, other tests like end to end testing help catch the remainder.


> other tests like end to end testing help catch the remainder.

End-to-end tests are unit tests, generally speaking. Something end-to-end can be captured within a unit. The divide you are trying to invent doesn't exist, and, frankly, is nonsensical.


> End-to-end tests are unit tests, generally speaking.

Generally, in the software industry, those terms are not considered the same thing, they are at opposite ends of a spectrum. Unit tests are testing more isolated/individual functionality while the end to end test is testing an entire business flow.

Here's an example of one end to end test (with validations happening at each step):

1-System A sends Inventory availability to system B

2-The purchasing dept enters a PO into system B

3-System B sends the PO to system A

4-System A assigns the PO to a Distribution Center for fulfillment

5-System A fulfills the order

6-System A sends the ASN and Invoice to system B

7-System B users process the PO receipt

8-System B users perform three way match on PO, Receipt and Invoice documents


> Here's an example of one end to end test

Bad example, perhaps, but that's also a unit test[1]. Step 8 is dependent on the state of step 1, and everything else in between, so it cannot be reduced any further (at least not without doing stupid things). That is your minimum viable unit; the individual, isolated functionality.

[1] At least so long as you don't do something that couples it with other tests, like modifying a shared database in a way that will leave another test in an unpredictable state. But I think we have all come to agree that you should never do that – going back to the reality that the term unit test serves no purpose anymore. For all intents and purposes, all tests now written are unit tests.


Every step updates shared databases (frequently plural). In the case of the fulfillment step, the following systems+databases were involved: ERP, WMS, Shipping.

Typically, in end to end testing, tests are run within the same shared QA system and are semi-isolated based on choice of specific data (e.g. customers, products, orders, vendors, etc.). If this test causes a different test to fail, or vice-versa, then you have found a bug.

If we call that entire sequence of steps a "unit" test, would you start with testing the entire sequence of steps, or would you recommend testing the individual steps first?

And if we did test the individual steps first, we would give that testing a different name? Like maybe "sub-unit" testing?


> Every step updates shared databases (frequently plural).

That's fine. It all happens within a single unit. A unit should mutate shared state within the unit. Testing would be pretty much useless without it.

> If we call that entire sequence of steps a "unit" test, would you start with testing the entire sequence of steps, or would you recommend testing the individual steps first?

For all intents and purposes, you can't test the individual steps. All subsequent steps are dependent on the change in inventory state in step 1. And the product of step one is undoubtedly internal state, so there is no way for the test to observe the state change in isolation (unless you do something stupid). You have to carry out the subsequent steps to be able to infer that the inventory was, in fact, updated appropriately.

After all, the whole reason you are testing those steps together is because you recognize that they represent a single instance of functionality. You don't really get to choose (unless you choose to do something stupid, I suppose).

> And if we did test the individual steps first, we would give that testing a different name?

If the individual steps can be tested individually (ignoring a case of you doing something stupid), it's not actually an end-to-end process, so your example would make no sense. Granted, we have already questioned if it is a bad example.


> For all intents and purposes, you can't test the individual steps.

Sure you can, and we did (that is a real example of an end to end test from a recent project) which also included testing the individual steps in isolation, which was preceded by testing the individual sub-steps/components of each step (which is the portion that is typically considered unit testing).

For example, step 1 is broken down into the following sub-steps which are all tested in isolation before testing the combined group together:

1.1-Calculate the current on hand inventory from all locations for all products

1.2-Calculate the current in transit inventory for all locations for all products

1.3-Calculate the current open inventory reservations by business partner and products

1.4-Calculate the current in process fulfillments by business partner and product

1.5-Resolve the configurable inventory feed rules for each business partner and product (or product group)

1.6-Using the data in 1.1 through 1.5, resolve the final available qty for each business partner and product

1.7-Construct system specific messages for each system and/or business partner (in some cases it's a one to one between business partner and system, but in other cases one system manages many business partners).

1.7.1-Send to system B

1.7.2-Send to system C

1.7.3-Send to system D

1.7.N-etc.

> And the product of step one is undoubtedly internal state, so there is no way for the test to observe the state change in isolation

The result of step 1 is that over in software system B (an entirely different application from system A) the inventory availability for each product from system A is properly represented in the system. Meaning queries, inquiries, reports, application functions (e.g. Inventory Availability by Partner), etc. all present the proper quantities.

To validate this step, it can be handled one of two ways:

1-Some sort of automated query that extracts data from system B and compares to the intended state from step 1 (probably by saving that data at the end of that step).

or 2-A user manually logs in to system B and compares to the expected values from step 1 (again saved or exposed in some way). This method works when the number of products is purposefully kept to a small number for testing purposes.

> If the individual steps can be tested individually (ignoring a case of you doing something stupid), it's not actually an end-to-end process, so your example would make no sense. Granted, we have already questioned if it is a bad example.

Yes, the individual steps can be tested individually. Yes, it is an end to end test.

> Granted, we have already questioned if it is a bad example.

It's a real example from a real project and it aligns with the general notion of an end to end test used in the industry.

More importantly, combined with the unit tests, functional tests, integration tests, performance tests, other end to end tests and finally user acceptance tests, it contributed to a successful go-live with very few bugs or design issues.


>Kent Beck, who invented the term unit test, was quite clear that a unit test is a test that exists independent of other tests

I vaguely remember him also complaining that there were too many conflicting definitions of unit tests.

Maybe that can be solved with another definition?

https://xkcd.com/927/

or maybe not.

I don't know many people who would describe a test that uses playwright and hits a database as a unit test just because it is self contained. If Kent Beck does then he has a highly personalized definition of the term that conflicts with its common usage.

The most common usage is, I think, an xUnit style test which interacts with an app's code API and mocks out, at a minimum, interactions with systems external to the app under test (e.g. database, API calls).

He may have coined the term but that does not mean he owns it. If I were him I'd pick a different name for his idiosyncratic meaning than unit test - one that isn't overburdened with too much baggage already.


> He may have coined the term but that does not mean he owns it.

Certainly not, but there is no redefinition that is anything more than gobbledygook. Look at the very definition you gave: That's not a unique or different way to write tests. It's not even a testing pattern in concept. That's just programming in general. It is not, for example, unusual for you to use an alternative database implementation (e.g. an in-memory database) during development where it is a suitable technical solution to a technical problem, even outside of an automated test environment. To frame it as some special unique kind of test is nonsensical.

If we can find a useful definition, by all means, but otherwise what's the point? There is no reason to desperately try to save it with meaningless words just because it is catchy.


The definition I gave is the one people use. Hate it or love it, you're not going to change it to encompass end to end tests, and neither will Kent Beck. It's too embedded.


> you're not going to change it

I might. I once called attention to the once prevailing definition of "microservices" also not saying anything. At the time I was treated like I had two heads, but sure enough now I see a sizeable portion (not all, yet...) of developers using the updated definition I suggested that actually communicates something. Word gets around.

Granted, in that case there was a better definition for people to latch onto. In this case, I see no use for the term 'unit test' at all. Practically speaking, all tests people write today are unit tests. 'Unit' adds no additional information that isn't already implied in 'test' alone and I cannot find anything within the realm of testing that needs additional differentiation not already captured by another term.

If nothing changes, so what? I couldn't care less about what someone else thinks. Calling attention to people parroting terms that are meaningless is entirely for my own amusement, not some bizarre effort to try and change someone else. That would be plain weird.


Well, I don't regard unit tests as the one true way. I don't enforce people on my team do it my way. When I get compliments on my work, I tend to elaborate and spread my approach. That's what I mean by evangelize, not necessarily advocating for a specific criteria to be met.

I find that integration tests are usually flaky; it's my personal experience. In fact, at my company, we just decided to completely turn them off because they fail for many reasons and the usual fix is to adjust the test. If you have had a lot of success with them, great. Just for the record, I am not anti-integration or end-to-end test. I think they have a place, and just like unit tests shouldn't be the default, neither should they.

Here are the two most common scenarios where I find integration tests (usually end-to-end tests called integration) become flaky:

1) DateTime, some part of business logic relies on the current date or time and it wasn't accounted for.

2) Data changes, got deleted, it expired, etc. and the test did not first create everything it needed before running the test.

Regarding your points,

1) "realism" that is what I referred to as trusting that a unit test is good enough. If it didn't go all the way to the database and back did it test your system? In my personal work, I find that pulling the data from a database and supplying it with a mock are the same thing. So it's not only real enough for me, but better because I can simulate all kinds of scenarios that wouldn't be possible in true end-to-end tests.

2) These days the only code that's hard to test is from people that are strictly enforcing OOP. Just like any approach in programming, it will have its pros and cons. I rarely go down that route, so testing isn't usually difficult for me.

3) It's just been my personal experience. Like I said, I'm not anti-integration tests, but I don't write very many of them.

4) I didn't refer to google, just my personal industry experience.

5) Enforcing ideal is a waste of time in programming. People only care about what they see when it ships. I just ship better quality code when I unit test my business logic. Some engineers benefit from it, some harm themselves in confusion, not much I can do about it.

Most of this is my personal experience, no knock against anyone and I don't force my ideals on anybody. I happily share what and why things work for me. I gradually introduce my own learning over time as I am asked questions and don't seek to enforce anything.

Happy coding!


> I'll do a test for null, then a separate test for something else _assuming_ not null because I've already written a test for that.

Honestly, this pedantry around "unit tests must only test one thing" is counter-productive. Just test as many things as you can at once; it's fine. Most tests should not be failing. Yes, it's slightly less annoying to get 2 failed tests instead of 1 fail that you fix and then another fail from that same test. But it's way more annoying to have to duplicate entire test setups to have one that checks null and another that checks even numbers and another that checks odd numbers and another that checks near-overflow numbers, etc. The latter will result in people resisting writing unit tests at all, which is exactly what you've found.

If people are resisting writing unit tests, make writing unit tests easier. Those silly rules do the opposite.
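
As a hypothetical illustration of the trade-off, a single test with one setup and several related checks is often plenty; a failing assert still points at the exact case that broke:

    def normalize(s: str) -> str:
        return " ".join(s.split()).lower()

    def test_normalize():
        assert normalize("  Hello   World ") == "hello world"
        assert normalize("") == ""
        assert normalize("ALREADY lower") == "already lower"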


Just to clarify, I am not advocating for tests to only test one thing, rather that after you have tested for one scenario you don't need to rehash it again in another test.

Breaking a test down helps to clarify what you're testing and helps to prevent 80 loc unit tests. When I test for multiple things, I look for the equivalent of nunit's assert.multiple in the language that I'm in.

The approach I advocate for typically simplifies testing multiple scenarios with clear objectives and tends to make it easier when it comes time to refactor/fix/or just delete a no longer needed unit test. The difference I find, is that now you know why, vs having to figure out why.


I agree! I see a lot of stuff like "static typing is better than tests", "tests don't prove your code is bug free" etc as if tests somehow have to be a silver bullet to justify their existence.

I definitely think it's ok for the overall standard of test code to be lower than production code though (I guess horrible unmaintainable tests is maybe a bit much). A few reasons I can think of off the top of my head:

- You can easily delete and rewrite individual tests without any risk

- You don't ship your tests; bugs and errors in test suites have a way smaller chance of causing downstream issues for customers (not the same as no chance but definitely a lot smaller)

- I'd rather have a messy, hard to understand test than no test at all in most cases. That isn't true of production code at all, there are features that if they can't be produced in a coherent way with the rest of the codebase just don't have the value add to justify the maintenance burden.


I often think of unit tests as being programmable types, like Eiffel pre/post conditions or functional languages with types like Even and Odd.

For example, in double(x) -> y you can use types to say x belongs to the set of all integers and y must also be in that set, but that's about all you can say in Python.

Unit testing lets you express that y must be an even number with the same sign as x. It is like formal verification for the great unwashed, myself included.
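
For example, the "programmable type" for double(x) might look something like this sketch (plain asserts; a property-based library could generate the inputs instead):

    def double(x: int) -> int:
        return x * 2

    def test_double_is_even_and_sign_preserving():
        for x in (-7, -1, 0, 1, 8, 123456):
            y = double(x)
            assert y % 2 == 0                                  # y is even
            assert (y > 0) == (x > 0) and (y < 0) == (x < 0)   # same sign (zero stays zero)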


But you literally cannot possibly test that assertion for all x. Let's take a slightly harder problem:

prove (or at least test conclusively) that for all integer x, the output y of the following function is always even:

    y = x^2 + x + 2
There is essentially no way to prove this for all x by simply testing all integers. If your integers are 64-bit, you don't have enough time in the lifespan of the universe.

On the other hand, you could simply reason through the cases: if x is even, then all terms are even. If x is odd, then x^2 is also odd, and x^2 + x = odd + odd = even. So you're done.

This is what people mean when they say "tests don't prove your code is correct" -- it's almost always better to be able to read code and prove (to some degree) that it's correct. It's really nothing like static types, which are also constructive proofs that your code is not incorrect in specific ways. (That is: it proves that your code is not [incorrect in specific ways], not that your code is [not incorrect].)

Once you prove your code correct, you can often write efficient tests with cases at the correct boundary points to make sure that proof stays correct as the code changes.
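
A small sketch of what those boundary-point tests might look like for the example above - the proof covers all integers, and the tests just re-check a few representative cases as the code changes:

    def f(x: int) -> int:
        return x * x + x + 2

    def test_f_is_always_even():
        for x in (0, 1, -1, 2, -2, 2**31 - 1, -(2**31)):
            assert f(x) % 2 == 0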


Could you do this in Python (using positive integers as an example of a type rather than even numbers)?:

  class N(int):
    def __new__(cls, z: int):
      assert z > 0
      return super().__new__(cls, z)

  def log2(n: N) -> float:
    …

  log2(N(32))   # 5.0
  log2(32)      # type error
  log2(N(-32))  # runtime error
You are still relying on the runtime to detect errors and it’s annoying to have to cast all ints to Ns, but you at least won’t ever take the log of a negative number.


That's a cool trick! Depending on your use case though it might not go far enough since the new "type" is really just an integer with an assert. It won't be picked up by static type checkers and there's nothing stopping you doing this:

  x = N(2)
  x -= 10

I think x still winds up being less than 0 here?

It's probably easier just to add an "assert" with a friendly message at the start of the log function.


> But you literally cannot possibly test that assertion for all x.

Hence why he states it is formal verification for the "unwashed masses". The "washed" will use a language with a type system that is advanced enough to express a formal proof, but most people can't hack it, and thus use languages with incomplete type systems and use testing to try and fill in the gaps.


> If you need to mock out 80% of a system to make your unit test work, then yes, it's potentially pointless. In that case I'd argue that you should consider rewriting the code so that it's more testable in isolation, that will also help you debug more easily.

demands that people rewrite all their production code in service of unit tests are probably a big reason why a lot of programmers don't unit test.

> One caveat is that some developers write pretty nasty unit tests. Their production code is nice and readable, but then they just went nuts in the unit tests and created a horrible unmaintainable mess. I don't get why you'd do that.

probably they write bad unit tests because they can't rewrite all their code but they have a mandate that all changes must be unit tested.

if strict purity could be relaxed and programmers were allowed to write more functionalish unit tests with multiple collaborators under test then there would likely be less resistance to testing and there shouldn't be any mocking-hell tests written.

higher level functional/integration tests also shouldn't be missed since your unit tests are only as good as your understanding of the interfaces of the objects and people write buggy unit tests that allow real bugs to slip between the cracks.


Dismissive of unit tests or TDD? I don't know any peer developer who is dismissive of any form of unit tests. But there are plenty who are dismissive of TDD.

As for the quality of tests, that's usually a combination of factors and capacity is one of them. In the end, if POs don't see business value in tests, they won't be prioritized.


For me, the biggest point is that unit tests are not a stand-in for understanding your code. It's like that quote about driving by just crashing into the guardrails all the time. Most unit-testing evangelists sound to me like they're using (or even advocating for) unit testing instead of thinking deeply about their code. Slow down and understand your code.

If you're finding more mistakes by running unit tests than by thinking through and re-reading your code, you're not finding most of your mistakes. Because you're not understanding your own code. How can you even write great unit tests if you don't understand what you're doing?

There are, of course, times when writing the tests first can help you think through a problem -- great! Especially when thinking through how some API would look. But TDD as a methodology gets a hard reject from me.

I certainly reject the argument "unit testing is too hard" -- then your code is bad and you should focus on fixing it. Well-written code is automatically easy to unit test, among 60 other benefits. That's not a reason to avoid unit testing.


unfortunately legitimate use cases for unit tests (like this) are pretty rare

in corporate codebases, overwhelmingly, unit tests are just mocked tests that enforce a certain implementation at the class or even individual method/function level and pretend that it works, making it impossible to refactor anything or even fix bugs without breaking tests

such tests are not just useless, they're positively harmful

https://gist.github.com/androidfred/501d276c7dc26a5db09e893b...


Well-written breaking tests represent something changing in a code base. You can be intentional about breaking a test, but then at least you can be very explicit about what you are changing.

All too many times I've broken a unit test in a code base that I did not intend to break, just to have an aha moment that I would have introduced a bug had that test not been present.

Unit tests are a trade off between development speed and stability (putting aside other factors, such as integration tests, etc). In large corporate settings, that stability could mean millions of dollars saved per bug.

That example you provided is a poor one and not really consistent with your point that unit tests are useless - the point is being made that that specific test of UserResource is useless, which I also agree with. Testing at the Resource level via integration test and Service level via unit test is probably sufficient.


Especially true if you get emergent side-effects from non-obvious shared state dependencies in large projects.

Nightmares... =)


Yes sir :)

And pragmatically - this always happens at some point. Something something about deadlines and need to get this out yesterday.


If maintained right, unit tests at edge conditions can quickly diagnose system and runtime state health.

If you work with malicious or incompetent staff at times (over 100 people there is always at least 1)... it is the only way to enforce actual accountability after a dozen people touch the same files over years.

"The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material." ( Michelangelo )

Admittedly, in-house automated testing for cosmetic things like GUI or 3D rendering pipelines is still nontrivial.

Best of luck =)


I've been on several projects where we had a significant number of unit tests that would only fail when requirements would change and we had to change the code.

"Look, if it only fails with requirement changes, then maybe we're better off not having them."

This only made people uncomfortable. They just don't like walking down the mental path that leads them to the conclusion that their high coverage unit tests are not worth the tradeoff. Or even that there is a tradeoff present at all.

Meanwhile, PRs constantly ask for more coverage.

Not very often, but sometimes someone will mention: "but unit tests are the specification of the code"

  test ShouldReturnOutput {
    _mock1.setup( /* complicated setup code returning mock2 */ );
    _mock2.setup( /* even more complicated setup code */ );
    
    let output = _obj.Method( 23.5 );
    
    Assert( output == 0.7543213 );
  }

  /* hundreds of lines above this test case */
  setup {
    if ( _boolean ) {
      _obj = new obj(_mock1);
    }
    else {
      _obj = new obj(new mock());
    }
  }
I'm just not sure I can get there.


The point of unit tests is not to CYA during refactors, but to confirm that the implementation is consistent between small changes without weird side effects.

A coworker once thought unit tests were dumb, and ended up writing code that repeated the call to an application 10x for the same info. This didn’t result in a changed UI because it was a read, but it’s not good to just suddenly 10x your reads for no good reason.

TFA also describes discovering weird side effect race conditions as a result of unit tests.


> confirm that the implementation is consistent between small changes without weird side effects

not sure what this is referring to, but I'll give an example

say you have a requirement that says if you call POST /user with a non-existing user, a user should be created and you should get a 2xx response with some basic details back

you could test this by actually hitting the endpoint with randomly generated user data known to not already exist, check that you get the expected 2xx response in the expected format, and then use the user id you got back to call the GET /user/userId endpoint and check that it's the same user that was just created

this is a great test! it enforces actual business logic while still allowing you to change literally everything about the implementation - you could change the codebase from Java Spring Boot to Python Flask if you wanted to, you could change the persistence tech from MySQL to MariaDB or Redis etc etc - the test would still pass when the endpoint behaves as expected and fail when it doesn't, and it's a single test that is cheap to write, maintain and run

OR

you could write dozens of the typical corporate style unit test i'm referring to, where you create instances of each individual layer class, mocks of every class it interacts with, mocked database calls etc etc which 1) literally enforce every single aspect of the implementation, so now you can't change anything without breaking the tests 2) pretend that things work when they actually don't (eg, it could be that the CreateUserDAO actually breaks because someone stuffs up a db call, but guess what, the CreateUserResource and CreateUserService unit tests will still pass, because they just pretend (through mocks) that CreateUserDao.createUser returns a created user)
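
A rough sketch of that round-trip test in Python (the endpoint shapes, field names and use of the requests library are assumptions, not something prescribed here):

    import uuid
    import requests

    BASE_URL = "http://localhost:8080"  # assumed running test instance

    def test_create_user_roundtrip():
        # Randomly generated data so the user cannot already exist.
        email = f"user-{uuid.uuid4()}@example.com"

        created = requests.post(f"{BASE_URL}/user",
                                json={"email": email, "name": "Test User"})
        assert created.status_code in (200, 201)
        user_id = created.json()["id"]

        fetched = requests.get(f"{BASE_URL}/user/{user_id}")
        assert fetched.status_code == 200
        assert fetched.json()["email"] == email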


To be fair to unit tests, I really like them for making sure that complicated code gets tested thoroughly. However, very often the complicated code isn't isolated to a single unit. It instead lives distributed amongst multiple objects that are reused for several competing aspects and all have their own undocumented assumptions about how the world works.

Now maybe this implies that we need a wide scale change in coding methodology such that the complicated parts are all isolated to units. But pending that, I'm not sure that the answer is a bunch of static mocks pretending to be dynamic objects with yet another set of undocumented assumptions of how the world works.

The unit tests that have made me happiest have been unit tests on top of a very complicated library that had a very simple api.

And on the other hand, the tests that make me most believe that the projects I'm working on are correct have been integration tests incorporating a significant part of the application AND the QA team's very thorough test plan.


Unit and integration testing are not strategies to be used exclusively.

Integration tests relying on interlocking behavior are, by their nature, complicated. Unit tests are there to test what can be tested simply, and are cheaper to write, so your test structure should be pyramid shaped, with hopefully fewer tests as complexity increases.


I literally gave examples in my example.

As a general example, a unit test is great for things like:

Your ExampleFactory calls your ExampleService exactly once, not more, not less, so you can check that side effects don't result in unnecessary extra calls and more load.

This is particularly relevant in a language like Java; modern Java style is functional, but the old style relied heavily on side effects, and they’re still possible to write unintentionally.
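
A hedged sketch of that call-count check with unittest.mock (the classes are hypothetical stand-ins):

    from unittest.mock import Mock

    class ExampleFactory:
        def __init__(self, service):
            self._service = service

        def build(self, key):
            # A bug here (e.g. fetching once per field) would silently multiply reads.
            return {"key": key, "data": self._service.fetch(key)}

    def test_factory_calls_service_exactly_once():
        service = Mock()
        service.fetch.return_value = {"value": 42}

        ExampleFactory(service).build("abc")

        service.fetch.assert_called_once_with("abc")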


I understand your example and I agree that in that particular case, maybe a unit test to ensure a given service calls a dao once and only once or whatever is justified

but I don't think the hypothetical risk of someone needlessly calling the db ten times is a good reason to justify adding that style of unit test to everything by default - if it happens, sure, add one for that particular call and call it a day


I believe the kind of scenario that bedobi is referring to is something like this (using your example):

Unit test exists, ExampleFactory only calls ExampleService once.

Hmm, it turns out that ExampleUIButton calls ExampleProxyNavigator more than one time if the text in ExampleUIButton happens to be wider than the default width for the button.

What does ExampleProxyNavigator do? Oh, it calls ExampleFactory which calls ExampleService. But only when the whole system is wired up for deployment.

The unit tests indicate that the system should be functioning okay, but when you put everything together you find out that the system does not function okay.


Using mockserver etc. you can cover for these things in component-test cases even more easily through your whole application while being more flexible with bigger code changes than unit tests allow.


A lot of places on the internet treat component testing and unit testing as synonyms. I've never heard of the former and it basically sounds like the unit tests we write.


I barely ever have unit tests flagging real issues. It's always a chore to update them. Feature/end-to-end tests though... Plenty of real issues flagged.


> I barely ever have unit tests flagging real issues

That sounds like you work alone and haven't worked for a long time on a code base with unit tests. Or the unit tests are bad.


Don't take this the wrong way, but this is the answer I would get from enterprise devs usually when pointing this out.

Then I would realize that their definition of a real issue was completely removed from any business or user impact, but geared more towards their understanding of the process detail in question.

I would argue that there certainly are some good places for unit tests, like if you have some domain-driven design going and can have well defined unit-tests for your business logic, but this usually is the smallest part of the codebase.

Mocking things that talk to databases etc. usually gives a false sense of security while that thing could break for a whole number of reasons in the real world. So just dropping the mock here and testing the whole stack of the application can really do wonders here in my experience.


> this is the answer i would get from enterprise devs usually when pointing this out

Yes, exactly what I thought, that's what you would hear from somebody who has experience working on large code bases with many contributors.


Ironically my experience has been that these responses came from people working in enterprise silos with few collaborators. Your mileage may vary.


Not disagreeing with your points. One thing mocks can be good at is to simulate errors that would be difficult to reproduce in an actual stack. For example, maybe you want to try and handle transient network issues or db connection failures, a mock could throw the correct exception easily, making your full stack do this would be challenging.
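
For instance, with unittest.mock a simulated outage is one line (the code under test and its fallback behaviour are hypothetical):

    from unittest.mock import Mock

    class DbUnavailable(Exception):
        pass

    def load_user(db, user_id):
        try:
            return db.get(user_id)
        except DbUnavailable:
            return None  # assumed fallback behaviour under test

    def test_load_user_handles_db_outage():
        db = Mock()
        db.get.side_effect = DbUnavailable("connection refused")
        assert load_user(db, 42) is None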


> It's always a chore to update them

Or he is actually not realizing unit tests bring to attention code that is impacted by the change... Or his tests just do for a dynamically typed language whatever static typing does on compilation :)


I have an unrelated (and most likely dumb) question about the article. When they talk about the inheritance relationship between 'Thread' and 'MyThread' in the example code in reference to the destructor methods, particularly here:

> Now, what happens when MyThread::singlepassThreadWork() uses a member variable of MyThread like foobar and we delete the MyThread object while the thread is still running? The destruction sequence is such that MyThread is deleted first and after that, the destructor of its parent object Thread runs and the thread is joined. Thus, there is a race condition: We risk accessing the vector foobar in singlepassThreadWork() after it was already deleted. We can fix the user code by explicitly stopping the thread in its destructor

What does it mean when they say 'the destructor of its *parent* object Thread runs'? I've always thought that when you inherit from one class to another and then instantiate an object of said class, they're just one object, so what do they mean when they make the distinction between 'parent' and 'child' object? When you have inheritance of say two classes, those would be two distinct objects instantiated in memory? Is there something I'm missing?


You're right, the wording is confusing. It should be "parent class". There is only one object, a MyThread object. In C++ when an object is destroyed, all the destructors in the hierarchy run, from bottom to top. So first ~MyThread and then ~Thread.

Anyway I think it is odd design to stop the thread in the destructor. You'd normally stop the thread first and then destroy the object, not the other way around?


They might be trying to encapsulate things so that they are sure threads get stopped when the objects go out of scope.

But, I would probably do that by having a class that contains both the thread and the data that the thread needs to access. Then its destructor could first join the thread and then clean up the data. For example, instead of a WorkerThread that contains a vector of WorkItem, have a BackgroundWorker that contains a Thread and a vector of WorkItem.


I see now, thank you very much


https://en.cppreference.com/w/cpp/language/destructor

Take a look at "Destruction sequence" but basically the destructors are chained together and called one after another to free all resources rather than forming one destructor for the derived object. That being said it is still effectively one object in memory.


Thank you for the explanation and reference.


I think about unit tests being useful for getting more confidence that some deterministic, pure (mathematically speaking) and stateless piece of code that's data in data out actually works, particularly when you change it.

If any of those conditions doesn't hold, the cost/benefit certainly goes way down, and sometimes even the absolute utility does.

If I have to mock anything, in particular, or more generally care at all about any implementation details (ie side effects) then I just think might as well make this a full on automated functional test then.

As soon as fake code is introduced into the test its utility rapidly decays in time as the things it fakes themselves change.


I agree that mocks are brittle and nearly useless.

If you follow SOLID principles to the extreme, you'll find that your code is separated into logic code that is pure and easy to unit test, and IO code that is very simple and can be covered by a relatively small number of integration tests.


To some extent this is pretty much the same as mocking. You are still injecting fake data into your pure logic functions, whether it's through their parameters or by them calling a mock.

I agree it's preferable, but sometimes you want to test the logic of the code that's actually making decisions about how and when the IO is called.

You can do it with integration tests of course, but in more complex environments with lots of complex IO dependencies mocking is cheaper. It's also hard to simulate specific failures in integration tests, like a specific request failing. Pretty much mocking with extra steps.

So mocking has its place as well.


How do you all feel about the need to rewrite a unit test when code gets refactored or business logic changes, isn’t that like a huge pita?


I treat unit tests like double-entry bookkeeping; I wouldn't describe it as a particular pain and consider it more of a matter of due diligence.

Not everything needs this level of rigor but there are plenty of cases where the tests are very cheap to write and reason about (for many pure functions) or are worth the cost as they validate critical behavior. Unit tests also add some design pressure to keep more logic pure/side-effect free; sure, it may take a bit more work to factor your code accordingly to keep i/o interactions separated to the shell of the application but I find this to be a useful pressure.

I've found that if I'm encountering pain when writing unit tests, then the pain is due to one of the following things:

1. The code is growing too complex and I need to decompose the logic or refactor the tests

2. The code has grown too many unintentional side effects and I need to move those side effects to discrete components

3. The code under test has fundamental side effects and those side effects require testing, thus the unit tests need to be converted to integration tests

4. The code under test is sufficiently complex that it demands full system/acceptance testing

There are some cases where refactoring the tests is generally too painful and I'll throw away all the tests entirely, maybe sprinkle in a few tests for logic that seems critical, and move on. Tests can accumulate technical debt, but in contrast to implementing code it's pretty cheap to cut your losses on tests and wipe them out.

I see a lot of people conflating unit testing with the idea that all code must have tests, and there's a ton of code that's phenomenally painful to test and can be easily checked by the developer. Tests should be a supporting tool and an augment to developer practices; it's better to have some tests that work well and throw out the ones that are miserable to write than to require 95% test coverage, drown in testing, and throw out all tests entirely.


Thanks for that response, it was helpful, especially the part about side effects.


Generally refactoring is where I find tests to be super valuable. If it’s a pure refactor then the existing tests shouldn’t break. If they start failing, then you have done something that has changed the expected behavior.

For business logic changes I would change the tests first so that they represent the new expected result. Then you refactor the code until the tests pass.


My experience is that tests should either cover functions that are small and do one thing (i.e. sorts, maps with some logic; basically places where you want to test edge cases and sanity-check), in which case there is very little reason for that code to change, or, if you are testing something larger, be integration tests where you test the full business logic flow. That makes the code less of a PITA to change while still giving you confidence.

If the business logic actually changes, the tests should break IMO because they are there to ensure that the business logic remains consistent. When you test the business logic (without testing the implementation) the code becomes much safer to modify and refactor.


If you're testing implementation details rather than contracts, you're susceptible to this. Make sure the unit you are testing is the thing you want to observe the behavior of.


> It is a little sad to see so many be so dismissive of unit tests.

You're preaching to the choir. The overwhelming majority of people worship unit tests like dogma. There's almost no point in saying the above. It's like saying it's a little sad to see some people who are so dismissive about eating and breathing to stay alive.

Your next part is the one that's interesting: mocking 80 percent of a system to get unit tests to work. I've seen so much of this from developers who don't even realize the pointlessness of what they're doing that it's nuts. They worship tests so much that they can't see the nuance and the downside.

Take this article. This article is literally presenting evidence for why unit tests are bad. He literally created an error that would not have existed in the first place were it not for his tests. Yet he has to spin it in such a strange way to make it support the existing dogma of test, test, test.


Sitting on a call right now where a guy is going on about how excited he is to mock out the entirety of a large e-commerce vendor's platform. It's maddening.


To me an interesting distinction is not between unit and integration tests, but between tests that are run quickly as part of a gate on commits in CI, vs. tests that are run more asynchronously searching for bugs.

The former must run quickly, and it's ok if the exact same test is run over and over. The latter need not run quickly, but benefits if new tests can be created and run, or if the tests incorporate randomness so they don't do the same thing each time they are run.

Here, it seems he was using tests intended for the first purpose for the second purpose instead. That can work, as it did here, but I don't think it's optimal. Better to have more exploratory, randomized, property-based tests chugging away in the background to find weird new ways the code can fail.


I like pasting code into ChatGPT and then saying "Write unit test(s) that demonstrate the bug(s) in this code". I have pre-instructions that say "Show code only. Be concise" to keep it simple. This has taught me a lot.


I believe in them. But unit tests are useless around useless humans. And there’s lots of those. A fine example is that time I wrote a test suite for a domain specific language parser. Someone wanted to break the language so they deleted the tests. New stuff was added without tests.

They confidently broke everything historically and looking forward. Then blamed it on me because it was my test suite that didn’t catch it. The language should not have been broken.

Everything only works if you understand what you are doing, so every argument should be presented from both sides.


Whenever I read, "when my code breaks the tests, I delete the tests", this is what I picture in my head.

Their code changed behavior, and good unit tests catch changes in behavior. Someone somewhere is probably depending on that behavior.


In my opinion this is because we don't teach Chesterton's Fence early enough (or often enough) to internalize it at a societal level.


Most software doesn't work the moment you stray from the expected path.

Whether that's because most software isn't tested competently or because software testing practices don't deliver robust software is not yet clear.

I suspect that unit tests, and tests in general, will be considered a historical artifact from the time before we worked out how to write software properly.

For example, we don't generally unit test things that a static type system checks for us. Maybe good-enough type systems will remove the need for the rest of them.


I think it’s a little over optimistic to think that we will ever work out how to properly write software. Some new patterns may help, but we will always have a need for unit tests and other tests.

Wrt typing, that’s a very narrow set of errors, and I would dare say even a small minority of the things that can and do go wrong in software are type related. That said, effective typing is another orthogonal tool to unit tests that can help create robust software. On that front, what we are missing is a language with robust typing that catches these type errors, but also gets out of developers way the rest of the time.


I don't see how you can either believe or not believe in a unit test. A unit test is what it is. It's a real thing. It exists. Use it, or don't.

How this topic can sometimes be about belief is beyond me. It's as if a person found a screwdriver and said, "I now believe in screwdrivers."

The topic of how people believe in unit tests is, to me, proof that the world is screwed. We're all screwed and everything is a screwdriver.


I suppose this is sort of the complement of https://xkcd.com/169/

Pretending to misunderstand clear communication and then making smug points about it isn't clever either.

https://www.merriam-webster.com/dictionary/believe%20in definition 2, "to have trust in the goodness or value of (something)".

Words (and phrases) in English usually have more than one meaning. Ranting about the correct use of a phrase while pretending the only extant meaning is a different one is not clever.


Unit tests are great, but the way we do them is often as simple change detectors, which don't say so much about correctness as about the code still doing what the programmer thinks it is doing (and the tests often need to change when the code changes).

It would be nice if unit tests were more like interlocking evidence of system correctness, but right now we just have integration tests with poorer coverage for that.


Similar experience to this guy - didn't believe in them initially, but now I'm a believer.

For what it's worth, I find Copilot to be an exceptional help in writing unit tests! A real game changer for me. Not only does it take care of most boilerplate code, it also kind of 'guesses' what case I'm about to write, and sometimes even points me in a direction I would otherwise miss.


Unit tests are like buying insurance, except you don't know how much the insurance has paid out when things go wrong. You spend a lot of time and effort making your code testable, figuring out what the useful tests are, and changing the unit tests when you refactor, all in the hope that it speeds up your project, but you cannot really know whether there was a net gain in speed/reliability versus proper QA and other techniques.


I like them because they help me to partition my code into units that are easier to write and test. Once they're working, assembling the parts generally leads to a working project.

It also motivates me to get small pieces working and tested before I get to the finish line. Each successful test is a victory!


Said it before and will say it again: there is no replacement for unit tests - they are the only thing that will give you flawless deployments. Not MIT degrees, not process, not managers - tests are literally the only thing I've seen consistently produce flawless production deployments. It's not a discussion.


Another thing unit tests are is focused. If you change just a small part of your code, you should only need to run a small fraction of your unit tests. Your unit test framework should support this selective execution of tests.
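
For example (a sketch assuming Google Test; the suite, file, and function names are made up), most frameworks let you filter by test name, so touching only the parser means re-running only the parser tests:

    // parser_test.cpp
    #include <gtest/gtest.h>

    int parse_digit(char c) { return c - '0'; }  // toy function under test

    TEST(ParserTest, ParsesSingleDigit) {
        EXPECT_EQ(7, parse_digit('7'));
    }

    // After changing the parser, run just this suite:
    //   ./tests --gtest_filter='ParserTest.*'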


Unit tests saved me many times.

I'm happy that this article praises unit tests without forcing a TDD perspective on the reader. It presents them as a tool, not a religion, and that's very refreshing.


I am troubled by the word belief, not just in the title but in the comments here. Unit tests should not be doctrine; there is a time and a place. And I feel that, more often than not, they are warranted.

We can argue about what granularity they should be, talk about functional programming, debate whether they should hit the database or not, but IMO all of those things miss the point. For me, in order of priority, unit tests provide the following benefits:

1) Make me write better, more decoupled code

2) Serve as documentation as to the intent of the code, and provide some expected use cases

3) Validate the code works as expected, (especially when "refactoring", which is basically how I write all my code even from the start)

4) Help me when deleting code by exposing unexpected dependencies

You can argue against all of those points, and I often will myself. Whether I write unit tests depends on the scale, importance, and lifetime of the project. But as soon as I think someone else will work on the code, I will almost always provide unit tests. In that scenario, they:

- Provide a way to quickly validate setup and installation was correct and the application functions

- Signal that the code was "curated" in some way. Someone cared enough to set up the test environment and write some tests, and that gives me a certain comfort in proceeding to work on the code.

- Provide a gateway into understanding why the application exists, and what some of the implementation details are.

So, thinking about the advantages I've outlined above, for me it would be very hard to say I don't "believe" in unit tests. I just don't always use them.


I don't believe in unit tests as they are practiced. This, unfortunately, is the kind of thing that can work in principle, but the realities make it unusable.

There are multiple problems with unit tests as they are implemented in the industry, and to make unit tests usable you need to make them productive enough to offset those problems.

First of all, for unit tests to work everybody has to contribute quality unit tests. One team member writing unit tests well for his part of functionality is not going to move the needle -- everybody has to do this.

Unfortunately, it is rarely the case that all team members are able to write quality code, and the same is true for unit tests.

Usually, the reality is that, given deadlines and scope, some developers will deprioritize writing good unit tests to instead deliver what business people really care about -- functionality. Give it enough time and the unit tests can no longer be trusted to do their job.

Second, it is my opinion that refactoring is extremely important. Being able to take some imperfect code from somebody else and improve it should be an important tool in preventing code rot.

Unfortunately, unit tests tend to calcify existing code, making it more expensive to change the functionality. Yes, more, not less expensive. Moving a lot of stuff around, changing APIs, etc. will usually invalidate all of the unit tests around that code. And fixing those unit tests, in my experience, takes more effort than refactoring the code itself.

Unit tests are good for catching errors AFTER you have made the error. But my personal workflow is to prevent the errors in the first place. This means reading the code diligently, understanding what it does, figuring out how to refactor code without breaking it. Over the years I invested a lot of effort into this ability to the point where I am not scared to edit large swaths of code without ever running it, and then have everything work correctly on the first try. Unit tests are usually standing in the way.

I think where unit tests shine is small library code, utilities, where things are not really supposed to change much. But on the other hand, if they are not really supposed to change much there also isn't much need to have unit tests...

The most paradoxical thing about unit tests is that teams that can write unit tests well can usually produce code of good enough quality that they have relatively little use for unit tests in the first place.

What I do instead of unit tests? I do unit tests. Yes, you read that correctly.

The trouble with unit tests is that everybody gets the "unit" part wrong. A unit does not have to mean "a class". Units can be modules or even whole services.

What I do is test functionality that matters to the client -- things I would have to renegotiate with the client anyway if I were ever to change them. These tests make sense because once they are written, they do not need to change even as the functionality behind them is completely rewritten. They test what clients really care about, and for that they bring a lot of bang for the buck.


> But my personal workflow is to prevent the errors in the first place.

Too many times I’ve made “simple” changes that were “obviously correct” and whose effects were “completely localized” only to wind up eating healthy servings of crow. If correct up-front analysis were possible to do reliably, we would have no need for profilers to diagnose hotspots, debuggers, valgrind, etc., etc.

So I enlist cheap machine support to check my work.


Sure. Only it is a lie that it is cheap.

Maybe CPU cycles are cheap, but writing that code is not. Which is exactly the point of my rant.

My position is that it makes much more sense to focus on tests that test observable behaviour that is not supposed to change a lot because it is a contract between the service and whoever the client is.

Writing this code is still expensive, but at least now it is much easier to make sure the return is higher than the investment.


The classic test rookie mindset is to test the functionality of the whole system, because that's what really matters.

But in reality, unit testing every single function and method is where the vast majority of the benefit lies. Details really matter.

It took me some time to learn this, even after being told. It's the same for most people. This little post will probably convince no one.

But maybe remember it when you finally get there yourself :)


> every single function and method

Very much no, that's the bad kind of unit test that locks your code into a specific structure and makes it a pain to update because you also have to change all the related tests even if the actual interface used by the rest of the codebase didn't change. I would call this the rookie mistake of someone new to unit tests.

You want to encapsulate your code behind some sort of interface that matches the problem space, then test to that interface. How its internals are broken down doesn't matter: it could be one big function, it could be a dozen functions, it could be a class; as long as the inputs and outputs match what the test is looking for, you can refactor and add/remove features without having to spend extra time changing the tests. That makes it much less of a pain to work with in general.

One way of looking at it I've used before with coworkers: For this new feature you're writing, imagine a library for it already exists. What is the simplest and most straightforward way to use that library? That's your interface, the thing you expose to the world and what you run your tests against.
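
As a sketch of that framing (everything here is made up for illustration): the "library" is a single entry point, and the tests only ever touch that entry point, so the internals can be reorganized freely without breaking them.

    #include <cassert>
    #include <vector>

    struct Item { double price; int quantity; };

    // The interface the rest of the codebase (and the tests) see.
    // Internally this could stay one function or grow a dozen helpers.
    double cart_total(const std::vector<Item>& items) {
        double total = 0.0;
        for (const auto& it : items) total += it.price * it.quantity;
        return total;
    }

    int main() {
        assert(cart_total({}) == 0.0);
        assert(cart_total({{2.0, 3}}) == 6.0);
    }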

This is what unit testing originally meant: semantic units, not code units.

It's like Apps Hungarian notation vs. Systems Hungarian notation: the original idea got overtaken by people who didn't understand it and only mimicked the surface-level appearance.


> But in reality, unit testing every single function and method is where the vast majority of the benefit lies. Details really matter.

To me, this is the actual rookie mentality. You end up testing the same thing multiple times across different lines of code, mocking and providing various sets of test data... when you could just test the specified and/or observable behaviour of your system and achieve exactly the same result with fewer tests.


Well, I certainly don't end up doing that.

There are many ways of doing things, and I guess we do unit tests differently.

> When you could just test specified and/or observable behaviour of your system, and achieve the exactly same result with fewer tests.

In my experience, it turns out to be very difficult to test a specific behaviour 5-10 layers deep from an external interface. Also, when one of those intermediate layers changes, you tend to have to rewrite many of those tests.


> Well, I certainly don't end up doing that.

How else are you "unit testing every single function and method"?

> it turns out to be very difficult to test a specific behaviour 5-10 layers deep from an external interface.

If you don't test it, how do you know your system works for that specific behaviour? Just because you've tested every single function and method in isolation doesn't mean they actually work with each other, or produce the responses the way you need them to.

> Also, when one of those intermediate layers changes, you tend to have to rewrite many of those tests.

As you should. Otherwise how do you know that your system still works?


> How else are you "unit testing every single function and method"?

There are many ways to write unit tests, as well as to write code that is easy to test. I don't know how my way differs from yours, but I don't have many of the problems you mention.

> Otherwise how do you know that your system still works?

We do have some integration tests, of course. But they're a small part of the total test suite.


> I don't know how my ways differs from yours, but I don't have much of the problems you mention.

You haven't answered "How else are you "unit testing every single function and method"?"

Given a medium-sized project and at least one passing and one failing test case for each function and method, you end up with dozens, if not hundreds, of tests largely doing the same thing.

> We do have some integration test, of course. But it's a small part of the total test suite.

So what does your test suite contain? Lots of unit tests for each method and function. What else?


> Given a medium-sized project and at least a passing and a failing test case for each function and method, you end up if not with hundreds, but with dozens of tests largely doing the same thing.

I don't understand this comment.

If I have one unit test for each function/method, that's just one test doing the same thing.


1. Your unit test probably shouldn't be testing several conditions at once [1]

2. Even if it's just one unit test per function/method, in a medium-sized project that's dozens of tests, many of them overlapping, with no idea whether those functions/methods even work together correctly

[1] Depends on function/test


I'll mirror a sibling comment. For me your mentality is the rookie mentality. I too once believed in strict unit tests, as well as a strict differentiation between them and other kinds of tests (end to end, integration, etc).

Then I joined a project where they were just starting to add tests to an existing project, and the lead developer was adamant on the following philosophy: "Virtually all tests will run the full program, and we'll mock out whatever slows the tests down (e.g. network access, etc)". I whined but I had to comply. After a year of this, I give him credit for changing my mindset. The majority of bugs we found simply would not have been found with unit tests. On the flip side, almost none of the unit test failures were false alarms (function signature changed, etc).

Since then, I've dropped categorizing tests as unit tests vs. other types of tests. Examine the project at hand and write automated tests - preferably fast ones. Focus on testing features, not functions.


I think he should have credited Tom Van Vleck with the "three questions" idea. It was published in ACM SIGSOFT Software Engineering Notes, vol 14 no 5 July 1989, pages 62-63, and you can read the whole thing here:

https://multicians.org/thvv/threeq.html

I hope he got permission to reproduce the comic.


Unit tests are not even well defined. What is a unit?


Something needn't be well-defined to be valuable. :)

If it helps, think of "unit tests" as "atomic tests". Your goal in writing a unit test is to test the smallest possible amount of logic at a time, with the least possible overhead (i.e., mocking).
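
A minimal sketch of what "atomic" looks like in practice (clamp_percentage is a made-up function; plain asserts stand in for a test framework): one tiny piece of logic, plain values in and out, no mocks.

    #include <cassert>

    // Toy function under test: clamp a value into the 0..100 range.
    int clamp_percentage(int v) { return v < 0 ? 0 : v > 100 ? 100 : v; }

    int main() {
        assert(clamp_percentage(-5)  == 0);
        assert(clamp_percentage(42)  == 42);
        assert(clamp_percentage(150) == 100);
    }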

The advantages of this approach are many: it helps keep the level of complexity of individual methods low enough to be quickly understandable, documents the interface provided by your methods, ensures that the tests run quickly, and allows new tests to be written with minimal effort.

Obviously there are disadvantages, too. Unit tests - any tests - take time to write. This is sometimes offset by the time saved by catching issues as early in the development cycle as possible, but not always.

For "greenfield" projects especially, I tend to take a different approach than in my other work. For those, I start by "writing the README". It doesn't matter if it's an actual README.md; the point is to write down some examples showing how you think the new functionality should be used. Once that's done, I'll stub out an implementation of that, then refining it with increasing granularity until the overall architecture of the project begins to be defined. Sometimes, that architecture is complex enough that it's worthwhile to break it into smaller pieces and start the process over for those. Other times, I get to a working "happy path" pretty quickly.

Once I have a minimally working feature, I write tests for the public-facing interface. Then the interfaces between domains inside the project. Then unit tests for individual methods. I mostly work in Python, so this is also the point where I pause and apply type annotations, write/expand my docstrings, ensure that my `__all__` objects are set properly, make sure any "internal use" methods of publicly exported types are prefixed with `_`, etc.

On the other hand, when I'm writing a feature or making a change to a more mature codebase, I often _start_ by writing tests. Sometimes that's a new interface that I'll be using elsewhere, so I'll write tests defining that. Sometimes it's a change in behavior on an existing implementation, so I'll write tests for that. Either way, from that point on I repeatedly run _only_ the new tests that I've written as I build out the feature. Only once the feature works and those tests pass do I re-run the whole test suite to check that I've not broken something I hadn't considered. When those pass, I'll go back over my code one more time to make sure that I've added tests for all of the relevant internal stuff before submitting the patch.



