The day I started believing in unit tests (mental-reverb.com)
155 points by sidpatil on Dec 19, 2023 | 258 comments



I was ambivalent on unit tests until I discovered how many bugs the mere act of writing them turned up.

I very vividly remember writing a test for a ~40 loc class of pure functions. I started out thinking the exercise was a waste of time. This class is simple, has no mutable state, and should have no reason to change. Why bother testing it?

By the time I was done writing the test I had found three major bugs in that 40 loc, and it was a major aha moment. Truly enlightening.


That reminds me of this time I wrote some code to add a method to Python string objects. The first reply to my issue on it in the bug tracker was "We shouldn't accept this, it's trivial to implement in your own code, see: XXXX". The second reply was "You have a bug in your implementation in the first reply."

It took a couple years to be accepted.


Sounds familiar. Was that str.removeprefix?


str.rsplit()


I bumped into so many corner cases and dumb bugs on a recent Python project that I'm even more of a unit testing enthusiast than before. Past a certain level of complexity they are definitely a net benefit.


You mentioned Python. I struggle with the weak(er) typing. It is a bottomless well of bugs. Did your unit tests find type issues or (business) logic / state issues?


It was more business logic issues relating to conditional logic which needed to factor in a lot of edge cases. I don't think strong typing would have helped much in this case.


I had this kind of thing when it came to property based testing.

I built a property based testing library for ActionScript 3 (a fun journey in itself, with full test case reduction).

I was testing my testing library, and tried one of the most basic tests:

    For any object A
        A == decode(encode(A))
And discovered the fun of floating point values not being perfectly representable as strings.

The more significant one came from testing a UI library we'd built for TVs (so you have up, down, left, right as movement). We had in the spec that if you moved focus by pressing right, pressing left would take you back to the thing you were on before. The test looked something like

    For an arbitrary series of API calls generating the UI:
        For an arbitrary list of movements the user makes:
            If the focus changes, pressing the opposite direction moves your focus back where you came from
Now, this was actually very easy to write as a test, but it's extremely powerful. It found a bug in an interesting corner case, so I fixed it. Fixing the bug broke an existing unit test. I checked and the unit test correctly tested something in the spec.

The spec was inconsistent, but because we'd tested explicit examples it had never been spotted. I've been a convert since.
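
For the curious, here's a much-simplified sketch of that focus property in Python with Hypothesis, using a toy one-dimensional focus model (the real test drove our actual UI library, so everything below is illustrative):

    from hypothesis import given, strategies as st

    OPPOSITE = {"left": "right", "right": "left"}

    def move(focus, direction, n_items):
        # toy focus model: a single row of n_items widgets, clamped at the edges
        delta = -1 if direction == "left" else 1
        return min(max(focus + delta, 0), n_items - 1)

    @given(st.integers(min_value=1, max_value=10),
           st.lists(st.sampled_from(["left", "right"]), max_size=20))
    def test_opposite_move_restores_focus(n_items, moves):
        focus = 0
        for direction in moves:
            new_focus = move(focus, direction, n_items)
            if new_focus != focus:
                # if the focus changed, pressing the opposite direction must take us back
                assert move(new_focus, OPPOSITE[direction], n_items) == focus
            focus = new_focus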

I've never implemented property tests in an existing project without finding some bug.

I don't recommend building your own property testing library unless you really want to. In Python I highly recommend Hypothesis: https://hypothesis.readthedocs.io/en/latest/
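
For example, here's a minimal Hypothesis sketch of the encode/decode property above. Python's own str() round-trips floats exactly these days, so the encode here is a deliberately lossy stand-in for the ActionScript string conversion; Hypothesis will report a minimal failing float:

    from hypothesis import given, strategies as st

    def encode(x):
        return f"{x:.6g}"    # lossy formatting, standing in for AS3's String(x)

    def decode(s):
        return float(s)

    @given(st.floats(allow_nan=False))
    def test_roundtrip(x):
        # fails: e.g. 1.2345678901 encodes to "1.23457" and doesn't come back
        assert decode(encode(x)) == x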


Floating point values are definitely perfectly representable as strings - worst case, you just output the binary as a string - but what you may be referring to is that many exact fractions can't be represented exactly as floats and/or that floating-point arithmetic doesn't obey "normal" rules of arithmetic (e.g. addition isn't associative).
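
In Python terms, for instance, a float's exact value round-trips losslessly through its hex representation:

    x = 0.1
    assert float.fromhex(x.hex()) == x   # exact round-trip, unlike short decimal formatting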


Sometimes a framework will do weird things, like convert it to scientific notation, or round it, or add a ton of zeroes. Like BigDecimal in Java and Decimal in C#. They can sneak up on you.


BigDecimals aren't floats, though. They're arbitrary-precision decimal numbers.


I'll clarify, as yes it's possible, but not using the built-in conversions between floats and strings. That is, the following is not true for all x: Number(String(x)) == x


NaNs also cause problems here, since a NaN is not equal to itself.


I think they are definitely valuable. My issue is I would have to disassemble most of the large legacy code base to be able to effectively test things, and a large part of it is UI (Windows Forms). I know you can do it, but just takes so much time and effort. We do have some though.


According to some studies it's around 50 bugs per 1,000 LoC.

So that puts it at about 1 bug per 20 LoC

Some estimates go as high as 75 bugs per 1,000 LoC, but most of those bugs don't make it out to customers because of QA / developer actions. So yeah, right on the money.


Was this statically or dynamically typed language?


I don't think it really matters. Major bugs are not "oh this can be null", major bugs are "this combination of preconditions yields a business logic edge case that wasn't accounted for". Static typing doesn't really help more than dynamic typing in these cases.


Oh it absolutely matters. Rails apps in particular are full of “tests” for things a basic compiler would catch.

So you save a “ton of time” writing the code without “cognitive overhead of types”, then you spend 3x as long writing the tests


>"this combination of preconditions yield a business logic edge case that wasn't accounted for". Static typing doesn't really help more than dynamic typing in these cases.

Depends on the language and the business logic. Types are a way of specifying preconditions and postconditions; a more expressive type system lets you rule out more edge cases by making illegal states unrepresentable.

In particular, I'm pretty sure it's not possible to have the thread bug from the article in Rust's type system.
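
As a small illustration of that idea in Python type hints (not Rust, and nothing to do with the article's specific bug): a connected state must carry a socket and a disconnected state cannot, so the illegal combinations simply have no representation. The names are made up for the example.

    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class Disconnected:
        pass

    @dataclass
    class Connected:
        socket_fd: int          # only a connected state carries a socket

    Connection = Union[Disconnected, Connected]

    def describe(conn: Connection) -> str:
        # a type checker (mypy/pyright) pushes you to handle both cases
        if isinstance(conn, Connected):
            return f"connected on fd {conn.socket_fd}"
        return "disconnected"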


Speaking only for myself, but one can get used to the limits of the system and adapt (put another way, I'd tend to introduce just as many logical bugs, but in different ways).

For instance, in a method that absolutely requires a specific type of object as a return value, setting up a sacrificial default value to keep the compiler happy and then building the innards of the function from there would be a normal course of action. That lets us run the code as we build it. But in the end, if you forgot a case, it will still be a bug, except instead of a wrong return type you get a wrong value. Whether that's better or not is up for debate.


It’s also not possible to write a non-trivial program in Rust.


> It’s also not possible to write a non-trivial program in Rust.

You should probably clarify that it’s not possible for you to write non-trivial programs in Rust.

And that’s okay! No one comes into this world knowing how to do any thing, but we can all learn if we choose to.

If you have a particular sticking point, I’d be glad to give advice.


"Servo is a web rendering engine written in Rust"

Is Servo non-trivial? I think yes.


> Major bugs are not "oh this can be null"

Not true. Even today the majority of security bugs are simple buffer overflows, for example. The subtle bugs are more memorable, but that doesn't mean they're actually more common.

> major bugs are "this combination of preconditions yield a business logic edge case that wasn't accounted for". Static typing doesn't really help more than dynamic typing in these cases.

It absolutely does, if you put a little bit of effort into actually using it. Represent your business logic invariants in the type system, then the compiler won't let you accidentally violate them.


It matters enough that the question "does static typing dramatically reduce the benefits of unit testing" is an open question or at least seriously discussed in the industry. All other replies are about dynamic languages.


Having static types is like having an automatic, exhaustively comprehensive, efficient unit test generator for the classes of bugs whose units tests are the most boring and annoying to write and maintain. Static types don't prevent you from needing unit tests, they free you up to focus on writing tests that are actually interesting.


This, but depending on how good the type system is and how well it's used, most of the remaining interesting tests may be integration tests, aside from legitimate cases like https://news.ycombinator.com/item?id=38692634


I agree with your post. To be more specific, you can focus more on logic and state bugs.


I'm writing unit tests in Rust and C++, and I'm in the same boat as the OP, often finding logical errors while writing the tests.

Not to mention peace of mind when you go and mess around with code you wrote 9 months ago - if you mess up or didn't think of a corner case, there's decent chance it'll get caught by existing tests.


A fully evolved type system (e.g. Coq) overlaps with unit testing. But the vast majority of languages people actually use have only partial type systems that require unit tests to stand in where the type system is lacking.

In practice, when you write those tests for where the type system is lacking, you end up also incidentally testing where there is type system coverage, so there likely isn't much for industry to talk about.


I don't think it is really unresolved. Static languages with sufficiently rich and modifiable type systems avoid a fraction of cases where you may well want a unit test, but it's not the overwhelming majority. Merely static helps too but not all that much. So while there is a reduction, it's a stretch to call it "dramatic".


> I don't think it is really unresolved

Well, you are currently participating in a thread that discusses this very question so there's that... and such threads are regular on HN.

I meant just that. People discuss it. What you want to say is that you have a strong opinion about it; that's OK, and still compatible with it being an open question.


People discuss lots of things that are pretty well solved; I wouldn't equate "open question" with "lots of discussion".

I guess in this context I mean that the question of static vs. dynamic in unit testing turns out to not be that hard, but the questions like "what is a unit test" and "should we unit test at all" are much muddier. Because people are confused or argumentative about the latter, they tend to pull the former into discussions that don't really have much to do with static vs. dynamic.


Why isn't the question "does adequate testing dramatically reduce the benefits of static typing" asked? Why is static typing privileged by default?


Probably because using a static type system gives you those benefits "for free," whereas unit tests are things you need to write and maintain.

("For free" in scare quotes because of course there are always tradeoffs between different programming languages.)


The counterargument is that for sufficiently reliable software extensive testing is needed anyway, and if you do that you find the type errors "for free".

If the testing is insufficient, as in practice it often surely is, then static typing would seem to be more valuable.


"[A]dequate testing": Woah, I love this term. It is judgmental right from the start. It's right up there with "convention over configuration" and "well, if you wear your face mask _correctly_..." I once saw a blog post from an embedded programmer talking about how difficult it is to write "adequate" unit tests for embedded code. If you are writing code that will run in a heart pace maker or aeroplane auto-pilot/lander, it needs to be insanely well tested.


It's not judgmental in some dubious way. "Adequate" here means adequate to ensure the software achieves a specified (and high) level of reliability.

If testing is enough to ensure the software is reliable, does the extra benefit of static typing make it worth the cost? This could be quite a lot of testing! The more testing that is done, the fewer type bugs remain that static typing would have found.

The argument you want to make against this is that static typing would have benefit beyond just finding bugs. The argument you don't want to make is that static typing reduces the need for testing.


I wouldn’t say static typing is privileged, but that testing is disadvantaged, because, in the words of Edsger Dijkstra, “Program testing can be used to show the presence of bugs, but never to show their absence!”

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/E...


We could paraphrase Dijkstra --- with some liberty -- to say, "Formal verification of programs can only show the specification is fulfilled, not that the specification is adequate."


It's not "privileged", it's better.


A bold claim, which is obviously false. To see that compare Clojure's dynamic type system with C's static type system. So perhaps a more nuanced stance is needed.


It is asked. A lot.


I've come to believe that "statically typed" is too large a category to be useful in types-vs-tests discussions. Type system expressiveness varies enormously by language. Better just to ask "what language, specifically?".


It was PHP, for what it's worth. I've had similar experiences in Go though.


I started believing in unit tests the day I finished my patch, ran the program and watched it work perfectly. I then grudgingly wrote a test, ran it and immediately observed it fail. One of the test inputs was some garbage input and that exposed a poorly written error handling path. Humbling!

I still hate writing them and it grates on my aesthetic sense to structure code with consideration to making it testable, but if we want to call ourselves engineers we need to hold ourselves to engineering standards. Bridge builders do not get to skip tests.


> if we want to call ourselves engineers we need to hold ourselves to engineering standards. Bridge builders do not get to skip tests.

Bravo. We need more of this mindset in the world, and also more collective will to encourage it in one another.

YOU are the kind of engineer I want writing the code that goes in my Dad's pacemaker or the cruise control in my wife's car.


If you had worked in places where safety is critical, you wouldn't say something so shallow. In those places they place human verification above all else. They have a thick book where you do a full run that is double-checked; they don't f around with unit tests and say this is good to go.


I don't think anyone is saying "unit tests and you're good to go", are they?

In any critical system work, there are multiple layers and you can't really skip any of them.

It's also sort of meaningless to talk about such testing without requirements and spec to test against. Traceability is as much a part of it as any of the testing.

By the time you get to the "thick book/full run" as you put it, there has typically been a metric crapload of testing done already.


For all the testing and paperwork, the code in safety-critical applications is still frequently awful and riddled with bugs. Following such a process does not actually guarantee good software; it mostly just means you need a lot of paper pushers.


Human verification is very expensive, compared to unit tests. It costs money to pay that human to do it, time for them to test it, time to describe issues found, time to send it back for a fix.

Unit tests - actually, all automated tests - are comparatively cheap. The developer can run them immediately.

All code will have bugs. The "trick" to building a productive development pipeline is to catch as many of those bugs as possible as early as possible, and thereby reduce both the temporal and monetary cost of resolving them.


Interesting take. I find that structuring code to be testable makes the code much clearer: mainly, by making dependencies explicit via dependency injection. I do that even if I don't end up testing the code.


I have an identical experience. What really made me understand dependency injection (in Java) was being forced to write 100% code coverage unit tests. To be clear: 100% code coverage was absolutely overkill for my domain, but it was a lesson about how to structure your code for dependency injection.
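
A rough sketch of the kind of structure being described (all the names here are illustrative, not from any real codebase): the collaborators are injected, so a test can hand in fakes instead of a real clock or database.

    from datetime import datetime

    class InvoiceService:
        def __init__(self, repo, clock=datetime.now):
            self.repo = repo        # injected: anything with a save() method
            self.clock = clock      # injected: any zero-arg callable returning a datetime

        def create(self, amount):
            invoice = {"amount": amount, "created_at": self.clock()}
            self.repo.save(invoice)
            return invoice

    class FakeRepo:
        def __init__(self):
            self.saved = []
        def save(self, invoice):
            self.saved.append(invoice)

    def test_create_stamps_time():
        repo = FakeRepo()
        fixed = datetime(2023, 12, 19, 12, 0, 0)
        service = InvoiceService(repo, clock=lambda: fixed)
        assert service.create(100)["created_at"] == fixed
        assert repo.saved[0]["amount"] == 100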


I work on a team where 75% of the developers don't write any tests. You never know what you're going to run into. Did I cause a new bug, or did I discover an old one? It's embarrassing when you discover completely non-working code paths.

I'm not even looking for a particularly high level of test coverage, just a basic "I wrote an API, here's a test (integration, unit, doesn't matter) for the happy path"-level of coverage would be great.

On the opposite end, I worked at places that wanted unit tests for every new function, even if it was something simple (like a getter or setter) used elsewhere. That's also terrible.


You could have watched the program and observed the failure, so why would you need to write a test to be “surprised” it failed?


A huge benefit of tests is their regression protection against future changes, often by other engineers. You don’t get that from ad hoc manual execution.


This. I like tests because it's hard to know if I accidentally broke something other than the thing I'm working on. Even software of modest size is at a level of complexity beyond what a human can QA in a reasonable amount of time for every revision. If you're at the point of needing a checklist with even 3 items on it, you're past the point of needing tests.


CPU cycles are so cheap these days that this is a gross waste of manpower.

Even better than manually written unit tests are automatically generated property-based tests (which can be unit or integration tests). One can literally run millions of tests in a day this way, far, FAR more than could ever be manually verified. All because computation is so darned cheap now.


The program worked! As I wrote, the test tried a rare error condition and the handling for that was faulty.

Updated the original to clarify. Hope that helps!


I still like one of the defining characteristics of Unit Tests (paraphrasing Michael Feathers from memory): they are fast and cheap to run. Sure, they might not perfectly simulate production like integration tests, but they also don’t take hours burning cash in cloud infrastructure while risking failure from unrelated races dealing with those dependencies. You can use Unit Tests to get to a place where you’re fairly confident that the integration tests will pass (making that whole expensive affair cheaper to run).


From Working Effectively With Legacy Code by Feathers, p. 14[0]:

Unit tests run fast. If they don’t run fast, they aren’t unit tests.

Other kinds of tests often masquerade as unit tests. A test is not a unit test if:

1. It talks to a database.

2. It communicates across a network.

3. It touches the file system.

4. You have to do special things to your environment (such as editing configuration files) to run it.

Tests that do these things aren’t bad. Often they are worth writing, and you generally will write them in unit test harnesses. However, it is important to be able to separate them from true unit tests so that you can keep a set of tests that you can run fast whenever you make changes.

[0]: https://www.google.com/books/edition/Working_Effectively_wit...


My "unit tests" do hit the database and file system, and I have found and fixed many many problems during testing by doing so. I have found many other problems with those calls in production when I didn't do so. Yes, they make testing a lot slower. Our main app takes around 40 minutes to build which isn't good. I'd like it to be faster. But writing a bunch of separate integration tests to cover those functions would be a steep price. I can understand reasonable people choosing either approach.


> My "unit tests" do hit the database and file system, and I have found and fixed many many problems during testing by doing so. I have found many other problems with those calls in production when I didn't do so.

No-one said that integration tests can't also be very valuable.

From the little context I get that you write integration tests, and that is fine. They are useful, valuable! But they are not unit-tests.

edit: on re-reading, I get the feeling that for you "integration tests" are a synonym for "end to end tests". But -at least in most literature- end-to-end tests are a kind of integration-test. But not all integration tests are end-to-end tests. In my software, I'll often have integration tests that swap out some adapter (e.g. the postgres-users-repository, for the memory-users-repository, or fake-users-repository. Or the test-payment for the stripe-payment) but that still test a lot of stuff stacked on top of each-other. Integration tests, just not integration tests that test the entire integration.


I find the easiest way (for me) to identify what type of test is running is by looking at responsibility.

- A unit test is single responsibility. It tests just that one bit of code with all dependencies stubbed, abstracted, mocked, or removed from consideration in some way.

- An integration test is multiple responsibility. It tests just one bit of functionality as a vertical slice through the stack (including [only] relevant dependencies) with all other aspects of the code base eliminated from consideration.

- An end to end test is full responsibility. It tests a complete path through all the functionality necessary to complete a 'journey' as a user/consumer of the app/tool.

So for example VAT calculation is unit tested as isolated code, invoicing is integration tested as a vertical slice including database etc, and order processing is end-to-end tested from placing the order through to its completion.

That's a simplified example and not always accurate depending upon the system and the team perspectives/opinions, but the principle of looking at responsibilities is a very useful rule of thumb.


>No-one said that integration tests can't also be very valuable.

Integration tests are a better kind of default test because they bring value under pretty much all circumstances.

Nobody said that unit tests can't also be valuable under just the right circumstances, i.e. complex stateless code behind a stable API.

Unit tests shine in that environment - they're not impeded by their crippling lack of realism because that stable abstraction walls off the rest of reality. And they're very fast.

Most code isn't parsers, calculation engines, complex string manipulation, etc. - but when it is, unit tests really do kick ass.

They just suck so badly at testing code that doesn't fit that mold. Which, to be fair, is most code. I don't write a lot of parsers at work. My job involves moving data into databases, calling APIs, linking up message queues, etc.


> Integration tests are a better kind of default test because they bring value under pretty much all circumstances.

I respectfully disagree. Not with the last part, that is true: they do bring value under pretty much all circumstances. But the first. Because integration tests come with (extremely) high costs.

They are expensive to run. They are much harder (costlier) to write. They are even harder (costlier) to maintain. The common pushback against tests -but they slow down our team a lot- applies to integration tests much more than to unit tests - factors more. And so on.

As with everything software-engineering, choosing what tests to write is a tradeoff. And taking all into consideration, e2e or integration tests are often not worth their investment¹. The testing pyramid fixes this, because testing always (well - it depends) is worth the investment. But when you skew the testing pyramid, or worse, make it a testing-ice-cream-cone, that ROI can and will often quickly become negative.

¹Edit: I meant to say that many of these e2e tests are not worth their investment. Testing edge-cases for example: if you need man-hours to write a regression test e2e style and then man-weeks to maintain and run that over coming years, it's often better ROI to just let that regression re-appear and have customers report it. Whereas a unit-test that captures this edge-case costs maybe an hour to write, milliseconds to run and hardly any time to maintain.


>Because integration tests come with (extremely) high costs.

Unit tests usually have lower capex and higher opex. It often takes less time and effort to write a single lower level unit test but that test will require more frequent maintenance as the code around it evolves due to refactoring.

Integration tests often have higher capex because they rely upon a few complex integration points - e.g. to set up a test to talk to a faux message queue takes time. Getting Playwright set up takes quite a chunk of up-front time. Building an integration with a faux SMTP endpoint takes time. What is different is that these tools are a lot more generic, so it's easier to stand on the shoulders of others; they are more reusable, and it's easier to leverage past integrations to write future scenarios. E.g. you don't have to write your own Playwright, somebody already did that, and once you have Playwright integrated into your framework any web-related steps on future scenarios suddenly become much easier to write.

Whereas with unit tests the reusability of code and fixtures written in previous tests is generally not as high.

You have to also take into account the % of false negatives and false positives.

I find unit tests often raise more false positives because ordinary legitimate refactoring that introduced no bugs is more likely to break them. This reduces the payoff because you will have more ongoing test failures requiring investigation and maintenance work to mitigate this.

I also find that the % of false negatives with integration tests is lower. This is harder to appreciate because you wouldn't ever expect, for instance, a unit test to catch that somebody tweaked some CSS that broke a screen or broke email compatibility with Outlook, but these are still bugs, and they are bugs that integration tests at a high level can catch with appropriate tooling but unit tests will never, ever, ever catch.

>But when you skew the testing pyramid, or worse, make it an testing-ice-cream-cone, that ROI can and will often quickly become negative.

The pyramid is an arbitrary shape that assumes a one-size-fits-all approach works for all software. I think it is one of the worst ideas to ever grace the testing community. What was particularly bad was Google's idea that flakiness should be avoided by avoiding writing such tests rather than by applying good engineering practices to root out the flakiness. It was an open advertisement that they were being hampered by their own engineering capabilities.

I do agree that this is a cost/benefit calculation and if you shift some variable (e.g. E2E test tooling is super flaky and you've got good, stable abstractions to write your unit tests against, you've got a lot of complex calculations in your code), then that changes the test level payoff matrix, but I find that the costs and benefits work out pretty consistently to favor integration tests these days.


> single lower level unit test but that test will require more frequent maintenance as the code around it evolves due to refactoring.

"more frequent" is not the same as "high maintenance costs" though.

Unit tests should only change when the unit-under-test (SUT) changes. Which, for many units is "never". And for some with high churn, indeed, a lot.

Actual and pure e2e tests should never have to change except when the functionality changes.

But all other integration tests most often change whenever one of the components changes. I've had situations where whenever we changed some relation, or added a required-field in our database, we had to manually change hundreds of integration tests and their helpers. "Adding a required field" then became a chore of days of wading through integration tests¹.

With the unit-tests, only one, extremely simple test changed in that case. With the end-to-end-tests, also, hundreds needed manual changes. But that was because they weren't actual end-to-end tests, and did all sorts of poking around in the database. Worse: that poking-around wasn't abstracted even.

What I'm trying to convey with this example, is that in reality, unit-tests change often if the SUT has a high churn, but that those changes are very local and isolated and simple. Yet, in practice, with integration-tests, the smallest unrelated change to a "unit" has a domino-effect on whole sections of these tests. (And also that in this example, our E2E were badly designed and terribly executed)

¹Edit: one can imagine the pressure of management to just stop testing.


And integration tests can also be fast


Test containers really help with this. Should still have the big system tests that run overnight but a set of integration tests using Test Containers to stand in for the infrastructure dependencies is awesome.

My team has a ton of those and they run inside a reasonable time frame (5min or so) but we still allow for excluding those from test runs so you can run just the unit tests.
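
A rough sketch of what that looks like with the testcontainers-python package (assuming Docker is available and SQLAlchemy is installed; the image tag and the trivial query are just placeholders):

    import sqlalchemy
    from testcontainers.postgres import PostgresContainer

    def test_select_one():
        # spins up a throwaway Postgres container for the duration of the test
        with PostgresContainer("postgres:16") as pg:
            engine = sqlalchemy.create_engine(pg.get_connection_url())
            with engine.connect() as conn:
                assert conn.execute(sqlalchemy.text("SELECT 1")).scalar() == 1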


I hadn’t heard of Test Containers[1], but it looks really useful - thanks for the rec.

[1] https://testcontainers.com/


Indeed! It's one of the reasons I like the adapter pattern (aka hexagonal architecture) so much.

Data flowing through some 100 classes and 300 conditionals then into a `memory-payments` and back takes mere milliseconds. "Memory payments" is then some silly wrapper around a hashmap with the same API as the full-blown production payments-adapter that calls stripe over HTTP. Or the same api as the "production adapter" that wraps some RDBMS running on the other end of the data-center.
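
Sketched out, a "memory payments" adapter really is just that - a thin wrapper around a dict exposing the same interface as a hypothetical Stripe-backed adapter (names here are illustrative):

    class MemoryPaymentsAdapter:
        """In-memory stand-in with the same API as the real payments adapter."""

        def __init__(self):
            self._charges = {}

        def charge(self, customer_id, amount_cents):
            charge_id = len(self._charges) + 1
            self._charges[charge_id] = {"customer": customer_id, "amount": amount_cents}
            return charge_id

        def get_charge(self, charge_id):
            return self._charges[charge_id]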


No one suggests discarding integration tests. The end of the quoted excerpt from Feathers above explicitly supports them.

Tests that do these things aren’t bad. Often they are worth writing, and you generally will write them in unit test harnesses.

You hinted at the value of separating unit tests and integration tests with your observation about 40-minute unit test runs being way too slow. The process friction it creates means people will check in “obviously correct” changes without running the tests first.

Feathers continues:

However, it is important to be able to separate them from true unit tests so that you can keep a set of tests that you can run fast whenever you make changes.

You want your unit tests to be an easy habit for a quick sanity check. For the situation you described, I’d suggest moving the integration tests to a separate suite that runs at least once a day. Ripping that coverage out of your CI may make you uncomfortable. That’s solid engineering intuition. Let your healthy respect for the likelihood of errors creeping in drive you to add at least one fast (less than one-tenth of a second to run is the rule of thumb from Feathers, p. 13) test in the general area of the slower integration tests.

The first one may be challenging to write. From here forward, it will never be easier than today. Putting it off is how your team got to the situation now of having to wait 40 minutes for the green bar. One test is better than no tests. Your first case with the fixture and mocks you create will make adding more fast unit tests easier down the road.

Yes, just as it’s possible to make mistakes in production code, it’s certainly possible to make mistakes in test code. Unit tests are sometimes brittle and over-constrain. Refactoring them is fair game too and far better than throwing them away.


What would "integration tests" (that you don't write) look like, then, in your opinion?

I ask because in my team we also for a long time made the distinction between unit/integration based on a stupid technicality in the framework we are using.

We stopped doing that and now we mostly write integration tests (which in reality we did for a long time).

Of course this is all arguing over definitions and kind of stupid, but I do agree with the definition of the parent commenter.


> What would "integration tests" (that you don't write) look then in your opinion?

In our local lingo, an integration test is one that also exercises the front-end, while hitting a fully functional back-end. So you could think of our "unit tests" as small back-end integration tests. If you think that way, we don't write very many pure unit tests, mostly just two flavors of integration tests. That works well for our shop. I'm not concerned about the impurity.


The "impurity" isn't the problem. The problem is that such integration tests take a longer time to run and in aggregate, it takes minutes to run your test suite. This changes how often you run your tests and slows down your feedback loop.

That's why you separate them: not because the integration test isn't valuable, but because it takes longer.


I've never liked the conflating of target size and test time constraints.

I very much agree that there's benefit to considering the pace of feedback and where it falls in your workflow; immediate feedback is hugely valuable, but it can come from unit tests, other tests, or things which are not tests.

Meanwhile, some tests of a single unit might have to take a long time. Exhaustive tests are rarely applicable, but when they are it's going to be for something small and it's likely to be slow to run. That should not be in your tightest loop, but it is probably clearer to evict it simply because it is slow, rather than because it is not a unit test for being slow.


I'd actually quibble a lot on this definition--if you want to unit test code that needs to do any of the first three things, well, you have to do them to test the code. I would say that a test for a network protocol that works by spinning up an echo server on a random port and having the code connect to that echo server is still a unit test for that network protocol.

In my definition, it would still be a unit test if it is fast and it talks to a database, a network server, or the file system, so long as such communication is done entirely in a "mock" fashion (it only does such communication as the test sets up, and it's done in such a fashion that tests can be run in parallel with no issues).
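
For example, a sketch of that echo-server approach in Python: bind to port 0 so the OS picks a free port, serve one request on a background thread, and point the code under test at it (here the "code under test" is just a raw socket round-trip for illustration):

    import socket
    import socketserver
    import threading

    class EchoHandler(socketserver.BaseRequestHandler):
        def handle(self):
            # echo back whatever the client sent
            self.request.sendall(self.request.recv(1024))

    def test_echo_roundtrip():
        with socketserver.TCPServer(("127.0.0.1", 0), EchoHandler) as server:
            threading.Thread(target=server.handle_request, daemon=True).start()
            host, port = server.server_address
            with socket.create_connection((host, port)) as sock:
                sock.sendall(b"ping")
                assert sock.recv(1024) == b"ping"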


I've dropped my archaic thinking on what constitutes a unit test or an integration test. I now seldom write what most people consider unit tests, in the "one class, one test" sense.

Instead, I classify units of logical coherence and write unit tests for those. I write financial trading systems - not hft, but still latency sensitive. I will test an order through our pipeline as a unit of work. This will necessarily touch multiple classes. So as an example, a test will cover orders which are accepted and then filled, orders which are rejected, and so on.

Many people would classify these as integration tests, and to be fair, I don't really care what you name them. To me these are much more valuable than the traditional "one class, one test" mechanism because it means I am free to refactor the internals of our pipeline as much as I want with very low impact on the test code.

One of the whole points of test code, that I think has been lost, is that it should be there to give you confidence in the correctness of your application under change. Writing "one class, one test" is a bad way to achieve this.


I haven't found the distinction between unit tests and non-unit tests to be that useful in practice. The important questions are:

1. Is it kinda slow? (The test suite for a single module should run in a few seconds; a large monorepo should finish in under a minute)

2. Is there network access involved? (Could the test randomly fail?)

3. Do I need to set up anything special, like a database? (How easy is it for a new developer to run it?)

If the answer to any of those is Yes, then your test might fall in the liminal space between unit tests and integration tests -- they're unit-level tests, but they're more expensive to run. For example, data access layer tests that run against an actual database.

On the other hand, even if a test touches the filesystem, then it's generally fast enough that you don't have to worry about it (and you did make the test self-cleaning, right?) -- calling that test "not a unit test" doesn't help you. Likewise, if the database you're touching is sqlite, then that still leaves you with No's to the three questions above.
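
For instance, a data-access test against in-memory SQLite stays fast, needs no special setup, and cleans up after itself (the schema and function are illustrative):

    import sqlite3

    def save_user(conn, name):
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

    def test_save_user():
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        save_user(conn, "Ada")
        assert conn.execute("SELECT name FROM users").fetchone() == ("Ada",)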


It is pretty excruciating (and IMO useless) to write true, DB-isolated unit tests for DB access layer code.


In other words, unit tests – unless, perhaps, you count an empty test as being a unit test – do not exist. "Talking" to structured sets of data, i.e. a database, is fundamental to computing.


>1. It talks to a database.

>3. It touches the file system.

These are BS. Maybe they made sense in the beforetimes when we didn't have Docker containers or SSDs, but nowadays there's no reason you can't stand up a mini test database as part of your unit test suite. It's way simpler than mocking.


100% this. A guy I work with just rebuilt our CI/CD pipeline and we're spinning up a database and all dependent services in containers. There are no mocks and it works great.

In previous lives, I worked on tests that mocked everything. We spent more time creating and maintaining mocks than writing the actual tests.


I think those are still considered "integration" tests in the traditional way of thinking. The problem is that a lot of applications don't do much other than interact with external resources, to the point where isolated unit tests are rare or only cover non-critical code, while just about any substantive test is an "integration" test.


Yes, you are correct. I never cared much for the traditional way of thinking about tests. I've had more arguments with people claiming this is an integration test, not a unit test, write some mocks, etc. Outside of very specific pieces of code, you generally get more value from integration tests.


That's a great rule of thumb.


They are fast and cheap WHEN YOU FIRST WRITE THE CODE. Or if you are the original author of the code.

The problem is, and there will be people who disagree with me, that unit tests make refactoring of other people's code a lot harder.

STAY WITH ME!

You'd think that if the unit tests were "good" and helped document what the code does, they'd help here, but they don't. You won't believe this, but in dogmatic high-breadth-coverage (low-depth-coverage) codebases, there are tons of test code SO TIED TO IMPLEMENTATION rather than interface that any monkeying with the presumed encapsulated logic breaks the unit tests, so you have double the things to fix.

You'll never believe what happens next. Some developer in some Agile thing that got assigned 2 unicorn shits for the task panics because the unit tests are SERIOUSLY slowing down his "velocity". So what does he do? Delete tests, change tests to make them work at any costs.


> they are fast and cheap to run.

But expensive to write. Especially if you want them to be fast and cheap to run.


That's exactly it; QA is a layered / tiered / pyramid shaped process, the more you catch lower down, the less reliance there is on the upper layers, and the faster the development iterations.


It's all degrees. Unit tests are great at finding examples of errors or correct behaviours. However they prove nothing and they definitely do not demonstrate the absence of errors.

They are often sufficient for a great deal of projects. If all it takes to convince you it's "good enough" is a handful of examples, then that's it. As much as you need and no less.

However I find we programmers tend to be a dogmatic bunch and many of us out there like to cling to our favoured practices and tools. Unit tests aren't the only testing method. Integration tests are fine. Some times testing is not sufficient: you need proof. Static types are great but fast-and-loose reasoning is also useful and so you still need a few tests.

What's important is that we sit down to think about specifying what it means for our programs to be "correct." Because when someone asks, "is it correct?" you need to ask, "with respect to what?" If all you have are some hastily written notes from a bunch of meetings and long-lost whiteboard sessions... then you don't really have an answer. Any behaviour is "correct" if you haven't specified what it should be.


The correct behavior is the behavior it has, of course! It is all the other programs that can't integrate with it that are wrong. /s

Unit tests or not, so much code I interact with is like this. This is part of why I love integration tests. It's usually at the point of integrating one thing with another that things go bad, where bugs occur, and where the intention of APIs are misunderstood.

I like unit tests for the way that they encourage composition and dependency injection, but if you're already doing that, then (unit tests or not) I prefer integration tests. They might not be as neat and tidy as a unit test OR as an e2e test, and they might miss important implementation edge cases, but well made integration tests can find all sorts of race conditions, configurations that we should help users avoid, and much much more, precisely because they are looking for problems with the side effects that no amount of pure-function unit-tested edge-cased code will make obvious or mitigate.

Integration tests are like the "explain why" comments that everyone clamors for, but in reproducible demo form. "Show me" vs "tell me"


> they prove nothing

If they fail, they prove there's a bug (in either the test or the code.)

This is like literally any other kind of test.


I meant "prove" as in, "mathematically proven." That is, for all possible inputs your theorem holds. A unit test is only an example of one such input. They don't prove there are no bad inputs.

There are many places in programming where you don't care to prove properties of your program to this level of rigor; that's fine -- sufficiency is an important distinction: if a handful of examples are enough to convince you that your implementation is correct with regards to your specifications, then it's good enough. Does the file get copied to the right place? Cool.

However there are many more places where unit tests aren't sufficient. You can't express properties like, "this program can share memory and never allows information to escape to other threads." Or, with <= 10 writers all transactions will always complete. A unit test can demonstrate an example of one such case at a time... but you will never prove things one example at a time.


Well that's silly. You're not arguing against unit tests, you're arguing against testing in general. But empirically testing does improve the reliability of software. You're tossing the baby out with the bathwater in the interest of an academic ideal of proved correctness.

I will add that if you are verifying the correctness of some code, you have a formal specification of what the code is supposed to do. That is, you have a description of valid inputs, and a formula determining if the output is correct, given those inputs. But if you have those, you can also do property-based testing: generate random inputs that satisfy the input properties, and check that the output satisfies the output condition. This is all easier than proving correctness (it requires little or no manual intervention) and gives much of the same benefit.


Maybe you need to re-read my original comment. I’m arguing for sufficient evidence of correctness. Unit tests often provide that for a good deal of software. I think people ought to write more of them.

However I think folks do get dogmatic about testing and will claim things like, “all you need is unit/integration/types.”


A bug in test code is not a real bug. It’s just a test that’s not giving you useful information. Lots of tests don’t give you useful information. Some that fail and some that pass.

It’s easy to write a test that doesn’t provide useful information across time. Harder to write a test that does.


There are times for constructive advice and there are times when '...so don't do that' is the right answer.

> It’s easy to write a test that doesn’t provide useful information across time.

I firmly believe this is one of those times.

(Currently my only issue with tests in the product I work on is that they take too long to run. Can't have it all.)


> A bug in test code is not a real bug.

The bug is in the library that the test code invokes. Tests themselves should be simple, if there are bugs in tests they are trivial once you have a framework figured out.


They can prove that by building out some new feature you implicitly break some existing functionality that didn't occur to the person writing the requirements.


Good story.

I for one do not believe in Unit Tests and try to get LLM tooling to write them for me as much as possible.

Integration Tests however, (which I would argue is what this story is actually praising) are critical components of professional software. Cypress has been my constant companion and better half these last few years.


Unit tests are useful for:

1) Cases where you have some sort of predefined specification that your code needs to conform to

2) Weird edge cases

3) Preventing reintroducing known bugs

In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does" and are a useless waste of time and effort.


> In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does" and are a useless waste of time and effort.

If you rephrase this as, "verifying that our code does what it did yesterday" these types of tests are useful. When I'm trying to add tests to previously untested code, this is usually how I start.

    1. Method outputs a big blob of JSON
    2. Write test to ensure that the output blob is always the same
    3. As you make changes, refine the test to be more focused and actionable
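
A minimal sketch of step 2, assuming a hypothetical build_report() method and a previously captured golden file checked into the repo:

    import json
    from pathlib import Path

    GOLDEN = Path(__file__).parent / "golden" / "report.json"

    def build_report():
        # stand-in for the real method that emits a big JSON-able blob
        return {"total": 3, "items": ["a", "b", "c"]}

    def test_report_matches_yesterday():
        expected = json.loads(GOLDEN.read_text())   # yesterday's captured output
        assert build_report() == expected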


The problem with this for me is that most of the time "verifying that our code does what it did yesterday" is not a useful condition: if you make no change to the code, it's going to do what it did yesterday. If you do make a change to the code, then you are probably intending for it to do something different, so now you have to change the test accordingly. It usually just means you have to make the same change in 2 different spots for every piece of unit-tested code you want to change.


> If you do make a change to the code, then you are probably intending for it to do something different, so now you have to change the test accordingly. It usually just means you have to make the same change in 2 different spots for every piece of unit-tested code you want to change.

Sure, but that's how unit-tested code works in general.


> then you are probably intending for it to do something different

If you have decided that your software is going to do something different, you probably want to deprecate the legacy functionality to give the users some time to adapt, not change how things work from beneath them. If you eventually remove what is deprecated, the tests can be deleted along with it. There should be no need for them to change except maybe in extreme circumstances (e.g. a feature under test has a security vulnerability that necessitates a breaking change).

If you are testing internal implementation details, where things are likely to change often... Don't do that. It's not particularly useful. Test as if you are the user. That is what you want to be consistent and well documented.


Then think of the unit test as the safety interlock.


I had to migrate some ancient VB.NET code to .NET 6+ and C#. The code outputs a text file, and I needed to make sure the new output matched the old output. I could have written some sort of test program that would have been roughly equal in length to what I was rewriting to verify that any change I made didn't affect the output, and to verify that the internal data was the same at each stage. Or... I could just output the internal state at various points and the final output to files and compare them directly. I chose the latter, and it saved me far more work than writing tests.

If I need to verify that my code works the same as it did yesterday, I can just compare the output of today's code to the output of yesterday's code.


I see two advantages in creating tests to check output

    1. You did the work to generate consistent output from the code as a whole, plus output intermediate steps. Writing those into a test lets future folks make use of the same tests.
    2. Having the tests in place prevents people from making changes that accidentally change the output
Don't get me wrong, tests that just compare two large blobs of output aren't fun to work with, but they _can_ be useful, and are an OK intermediate stage while you get proper unit tests written.


> In actual practice, about 99% of unit tests I see amount to "verifying that our code does what our code does"

That’s my experience too, especially for things like React components. I see a lot of unit tests that literally have almost the exact same code as the function they’re testing.


I've often found that a little bit of code that helps you observe that your code is working correctly is easier than checking that your code is working in the UI. The tests are a great place to store and easily run that code.


> 3) Preventing reintroducing known bugs

When I was learning unit testing, my mentor taught me this strategy when fixing production bugs. First, write the unit test to demonstrate the bug. Second, fix the bug.


That's what you get when you don't write the tests first.


That's just doubling your work. If you don't already have a spec, your unit tests and actual code are essentially the same code, just written twice.


Determining which states are authentically hazardous and mocking data and adjacent services to make those states accessible at the press of a button is definitely not the same as writing code which handles those states appropriately.


You should try switching it up. Write the tests and then ask the LLM to write the code that makes them pass. I find I'm more likely to learn something in this mode.


I'd argue having useable LLMs kind of brings out how problematic TDD is.

Imagine the dumbest function you have to write: a product A and a street address as input, and the shipping cost as an output.

How many test cases would you write to be absolutely sure that function actually does what you want it to do, and be confident it doesn't have weird exceptions that the LLM injected randomly? I'd assume you'd still vet the code written by the LLM, but if it's hundreds of rambling lines doing weird stuff to get the right result, is it really faster than writing it yourself?
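
Concretely, even vetting that dumb function means something like this - a hypothetical shipping_cost and a handful of parametrized cases, nowhere near exhaustive:

    import pytest

    def shipping_cost(product, address):
        # illustrative stand-in: flat rate plus an oversized-item surcharge
        base = 5.00 if address["country"] == "US" else 15.00
        return base + (10.00 if product.get("oversized") else 0.00)

    @pytest.mark.parametrize("product,address,expected", [
        ({"sku": "A"}, {"country": "US"}, 5.00),
        ({"sku": "A", "oversized": True}, {"country": "US"}, 15.00),
        ({"sku": "A"}, {"country": "FR"}, 15.00),
    ])
    def test_shipping_cost(product, address, expected):
        assert shipping_cost(product, address) == expected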


If it's hundreds of rambling lines then I'm not going to be able to get it past my linter anyhow (complexity thresholds), nor am I going to be able to get it past my team when they review it. So yeah, that's a problematic case, but it's one I'm going to have to refactor to avoid with or without an LLM in the loop.


About the problems of TDD: Cedric Beust has a legendary blog post about it here: https://www.beust.com/weblog/the-pitfalls-of-test-driven-dev...


TDD works best if you default to testing at the outer shell of the app - e.g. translating a user story into steps executed by playwright against your web app and only TDDing lower layers once you've used those higher level tests to evolve a useful abstraction underneath the outer shell.

It seems to be taught in a fucked up way though where you imagine you want a car object and a banana object and you want to insert the banana into a car or some other kind of abstract nonsense.
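
As a sketch of what "translating a user story into steps executed by playwright" can look like with Playwright's sync Python API (the URL, selectors, and app are hypothetical):

    from playwright.sync_api import sync_playwright

    def test_user_can_sign_in():
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto("http://localhost:8000/login")      # hypothetical app under test
            page.fill("#email", "ada@example.com")
            page.fill("#password", "correct-horse")
            page.click("text=Sign in")
            assert page.inner_text("h1") == "Dashboard"
            browser.close()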


How effective is the LLM when used this way, compared to normally?


I don't know what normally is, but I'd say it works pretty well.

Often the challenge is that the context for what you're trying to do is sprawling. There are just too many files and they're all too long: you end up exceeding the context window or filling it with 99% irrelevant stuff. Typically the structures you build for tests are smaller and more focused on the particular instance you're worried about, which I think is a better way to talk to an LLM.

You don't have to explain, for instance, that there's data in production which doesn't match the schema in the code so it must be cautious to avoid running afoul of that difference. Instead you've mocked that data, so it's right there in the same code with the test that it's trying to make pass.


In reality, unit tests and integration tests are different names for the same thing. All attempts at post facto differentiation fall flat.

For example, the first result on Google states that a unit test calls one function, while an integration test may call a set of functions. But as soon as you have a function that has side effects, then it will be necessary to call other functions to observe the change in state. There is nothing communicated by calling this an integration test rather than a unit test. The intent of the test is identical.


No. Or maybe only if you also consider 'village' and 'city' to be the same thing.


That's a good example, because while they're clearly different things, any distinction you draw between them such as "population > 100k" or "has cathedral" is always going to be a bit arbitrary, and many cities grew organically from villages in an unplanned manner.


Is it? Kent Beck, coiner of the term "unit test", made it quite clear that a unit test is a test that is independent (i.e. doesn't cause other tests to fail). For all the ridiculous definitions I have come across, I have never once heard anyone call an integration test a test that is dependent (i.e. may cause other tests to fail). In reality, a unit test and an integration test are the same thing.

The post facto attempts at differentiation never make sense. For example, another comment here proposed that a unit test is that which is not dependent on externally mutable dependencies (e.g. the filesystem). But Beck has always been adamant that unit tests should use the "real thing" to the greatest extent possible, including using the filesystem if that's what your application does.

Now, if one test mutates the filesystem in a way that breaks another test, that would violate what Beck calls a unit test. This is probably the source of confusion in the above. Naturally, if you don't touch the file system there is no risk of conflicting with other tests also using the filesystem. But that really misses the point.


There are only two kinds of tests: ones you need and ones you don't. Splitting hairs over names of types of tests is only useful if you're trying to pad a resume.


Clusters of humans cohabiting a confined space? If you squint hard enough…


Implying that integration tests (or vice versa) are legally incorporated like cities, while unit tests are not? What value is there in recognizing a test as a legal entity? Does the, assuming US, legal system even allow incorporation of code? Frankly, I don't think your comparison works.


I think he's not implying a hard-line legal standard; rather, as connections and size increase, different properties start to emerge and humans start to differentiate things based on that. But there's a gradient, so we can find examples that are hard to classify.


What differentiates a city from a village is legal status, not size. If size means population, there are cities with 400 inhabitants, villages with 30,000 inhabitants, and vice versa. It is not clear how this pertains to tests.

When unit test was coined, it referred to a test that is isolated from other tests. Integration tests are also isolated from other tests. There is no difference. Again, the post facto attempts to differentiate them all fall flat, pointing to things that have no relevance.


> What differentiates a city from a village is legal status, not size

Fine. And legal status depends on location. There are many localities.


Yup, just like testing. Integration and unit tests depend on location as no two locations can agree on what the terms mean – because all definitions that attempt to differentiate them are ultimately nonsensical. At the end of the day they are the exact same thing.


You should not be downvoted as heavily as you are now.

I feel like we did testing a disservice by specifying the unit to be too granular. So in most systems you end up with hundreds of useless tests testing very specific parts of code in complete isolation.

In my opinion a unit should be a "full unit of functionality as observed by the user of the system". What most people call integration tests. Instead of testing N similar scenarios for M separate units of code, giving you NxM tests, write N integrations tests that will test those for all of your units of code, and will find bugs where those units, well, integrate.


I hate unit tests, though I am forced to write them to have my CI process not fail (I need 75% coverage or it won't build) - so I have written thousands and thousands of them in the last few years. The problem I have: not a single time has a unit test failure resulted in me finding a bug in my code - all I ever find are bugs in my unit test code - so it pretty much seems like a waste of time to me.

Either I am writing really good code so there are no bugs, or I am really bad at writing unit testing code to find those bugs.


> Either I am writing really good code so there are no bugs, or I am really bad at writing unit testing code to find those bugs.

Honestly, having literally had a scenario 20 minutes ago where I wrote a test for what I figured was absolutely trivial code, and having it _fail_ on me and pick up a bug that I hadn't considered (and this is not the first time this has happened), I would strongly suggest it's the latter.

Do your unit tests check the output and side effects exactly, or do they just make sure the function returned without error?

Just because a function/method/whatever has 100% coverage doesn't mean you have tested all the potential scenarios.
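
A minimal hypothetical sketch of that gap (not from the parent's code) - one test executes every line, so coverage reports 100%, yet whole scenarios go unchecked:

    def apply_discount(price_cents: int, discount_pct: int) -> int:
        # Integer cents to keep the example simple.
        return price_cents - price_cents * discount_pct // 100

    def test_apply_discount():
        # Executes every line, so line coverage is 100%...
        assert apply_discount(10_000, 10) == 9_000

    # ...but nothing exercises a discount over 100 (negative price) or a negative
    # discount (price increase), which the business rules may need to reject.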


A sort of non-judgmental question: in your mind, are you writing them to cover lines, or to exercise required behavior with an intent of proving the module is broken? I ask because it seems like requiring line coverage as a metric would have the effect you are describing.


I've seen the same thing with comments. My boss required us to add comments to our code, to make it easier to read. That was all he asked, please add comments. My co-worker added comments like "Increase variable i by 1", while completely ignoring the 8 lines of spaghetti business logic above.

Similarly I've seen people add tests that will ensure that code coverage doesn't go down but that don't actually do anything to help anyone. I'd argue that having arbitrary coverage goals is a problem on its own, but it's the only way to force some people to write even the most basic of tests.


I've thought for a long time we present coverage backwards. We shouldn't be highlighting what is covered and getting that metric up, we should highlight what isn't covered and focus on getting that metric down (like how linting is done). Present it like "Hey, here's something that no one has looked at in-depth! It's a great place for bugs to be hiding!"


Ah, Goodhart's law ruins everything.


You aren't writing those unit tests just for yourself. You're writing them to help the next developer who works on that code avoid regression defects. That has value to your employer even if it seems like a waste of your time.


They’re way more useful in languages without static typing, where it’s easier to write or edit-in stupid bugs and not notice until the code runs. They’re not not useful in statically typed languages, just far less so.


I don't particularly like writing unit tests either. However, one goal I set myself decades ago is less than one bug per kloc delivered. (I don't always achieve that.) If you seriously attempt to do that over many years unit tests become unavoidable. For me that path to unavoidable looked like this:

1. Decide that was the goal. Start measuring. Who knows, maybe I have to do nothing.

2. Discover that I seriously underestimated the number of bugs I produce. No one is available to review my code, changing languages (to something with a stronger type system) was out of the question. Only option appears to be methodical testing of every line of code.

3. Print (on dead trees - this was decades ago) all my code. Manually test all of it, running a highlighter down the listing. Continue until the entire code listing has a solid highlighted line down the left. It worked! Bugs per delivered lines of code dropped off a cliff. But geez it took a loooong time, longer than writing the code. Finding an input sequence that exercised some code was surprisingly hard. And it was boring. Still a success - and people who used it immediately noticed the improvement in quality and commented.

4. Then new features have to be added. Does this mean I have to test it all again? Surely not - I'll just test the bits I changed. Result: bugs per line of code rapidly start to ramp up again.

5. So I test everything again. That works, but it's horribly inefficient. I can spend days releasing a few-line change. I can't get small changes out in anything like a reasonable time frame.

6. The solution is obvious to a programmer: automate your work, which in this case translates to writing code to do the tests. So I write unit tests for new code. It ends up being slower than doing manual tests :( Code size doubles. It works in keeping the bug count down, but can I afford to keep doing this time wise?

7. Then I add features to new code with unit tests. Initially this is painful - I move at perhaps 1/2 the speed because now I have to change at least twice the amount of code (actual and unit-test). Still, it's success bug count wise, and running unit tests is much, much faster than manually testing.

8. Keep doing this, notice that despite me having to change twice the lines of code (actual code and tests) when I'm adding new features I'm producing more debugged lines of code than before. Even more interesting, I'm fearlessly making much larger changes now. Turns out I'm using the unit tests as guard rails. I no longer minimise my changes to reduce the odds of introducing a bug.

9. Finally, notice that unit testing has changed the way I write code. And it's for the better. Code that's easy to test is also easy to understand. For example, it's much easier to test a pure function than something with side effects, so you minimise side effects as much as possible. You make your interfaces (which are the thing you focus your testing on) as small as possible. Testing deep inside a complex module is difficult and the torturous unit test code you have to write to do that is hard to understand. So you split things into smaller modules, each of which has those clean interfaces, to give your tests greater visibility. Turns out writing code so unit tests can understand it is oddly similar to writing code so that humans can easily understand it.
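
A tiny sketch of that contrast (hypothetical Python, purely illustrative): the side-effecting version needs a real file and cleanup to test, while the pure version needs nothing but inputs and an assert.

    # Harder to test: touches the filesystem.
    def append_total_to_file(path, values):
        with open(path, "a") as f:
            f.write(f"{sum(values)}\n")

    # Easy to test: pure function, data in, data out.
    def total(values):
        return sum(values)

    def test_total():
        assert total([1, 2, 3]) == 6
        assert total([]) == 0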

So it turns out unit testing is a win in every way, when done well. Well, except for the "it's boring" bit. (Numerous comments here hint at copilot being a real help.)

But writing code that's amenable to unit tests isn't something you do naturally. Fortunately just getting practice at writing unit tests is enough to teach the skill. Sadly, that takes time and frustration. While you learn your productivity will drop for a while. And worse, when writing new code adding unit tests is always slower than the old way. The payback only comes when you later make changes.


I figure that the value in cases like this, is that you can have confidence things (even trivial things) will continue to work, when you decide to upgrade dependencies. Does that apply here? Would you feel more confident in that case than you would without the tests?


try writing the test first


We can't even have a consensus on what "unit" tests really are... Every company I have worked for has a different meaning for it. Some places consider a test "unit" when all the dependencies are mocked, some places consider a whole feature a "unit".


Kent Beck (the originator of test-driven development) defines a unit test as "a test that runs in isolation from other tests". This is very different from the popular and completely misguided definition of a unit test as "a test that tests a class/method in isolation from other classes/methods".

But it doesn't really matter if you want to call a given test an "integration" test or a "unit" test. The point of any test is to fail when something breaks and pass when something works, even if the implementation is changed. If it does the opposite in either of those cases, it's not a good test.


Kent Beck also said [0] "I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence"

[0] https://stackoverflow.com/a/153565


> The point of any test is to fail when something breaks and pass when something works

The point of any test is to document API expectations for future developers (which may include you).

That the documentation happens to be self-validating is merely a nice side effect.


I'd rather drop the useless prefix instead of trying to fix it.


I get the logic from the mocking camp - we're not here to test this dependency, we're just here to test this function/method/whatever - but when you mock you end up making assumptions about how that dependency works. This is how you end up with the case of "all my tests are green and production is broken".

I think it's hard to beat e2e testing. The thing is, e2e tests are expensive to write and maintain, and in my opinion you really need a software engineer to write them and write them well. Manual e2e testing, meanwhile, is cheap and can be outsourced. All the companies I've worked for in the US have had testing departments, and they did manage to write a few tests, but they weren't developers, and so to be frank they were really bad at writing them. They did probably 80 or 90% of their testing manually. At that point, who are we kidding? Just say you do manual testing, pay your people accordingly and move on.


So at work we would run tons of tests against the real service with a real database, seeding thousands of schemas to allow for parallel testing of tests that change state.

This takes 3 minutes, 1 if you use tmpfs. It only takes <10 seconds if you don't run the tests that write.

These actually cover most real world use cases for a query-engine we maintain.

Unit tests have their place for pieces of code that run based on a well defined spec, but all in all this integration or component-level testing is what consistently brings me the most value.


From research I've read, unit tests (whether automated or not) tend to catch around 30% of bugs whereas end to end testing and manual code review (believe it or not) each tend to catch around 80% of bugs.


The gem of this story is that the author is not running unit tests in the sense most folks understand them. As he also pointed out, he is executing the tests on target, so they are more integration tests than unit tests. The kind of testing he is doing brings in new categories of potential faults: scheduling issues, memory constraints, interrupt servicing, and so on.


It is a little sad to see so many be so dismissive of unit tests. They aren't a universal solution, which seems to be why they are written off in many cases, but they make your life so much easier in so many cases.

If you need to mock out 80% of a system to make your unit test work, then yes, it's potentially pointless. In that case I'd argue that you should consider rewriting the code so that it's more testable in isolation, that will also help you debug more easily.

What I like to do is write tests for anything that's even remotely complex, because it makes writing the actual code easier. I can continuously find mistakes by just typing "tox" (or whatever tool you use). Or perhaps the thing I'm trying to write functionality for is buried fairly deep in an application; then it's nice to be reasonably sure about the functionality before testing it in the UI. Unit tests just make the feedback loop much shorter.

Unlike others I'd argue that MOST projects are suited for unit testing, but there might be some edge cases where they'd provide no value at all.

One caveat is that some developers write pretty nasty unit tests. Their production code is nice and readable, but then they just went nuts in the unit tests and created a horrible unmaintainable mess. I don't get why you'd do that.


> If you need to mock out 80% of a system to make your unit test work, then yes, it's potentially pointless. In that case I'd argue that you should consider rewriting the code so that it's more testable in isolation, that will also help you debug more easily.

This is also where the dogma of “only test public methods” fails. If your public method requires extensive mocking but the core logic you need to protect is isolated in a private method that requires little mocking, the most effective use of developer resources may be to just test your private method.

> One caveat is that some developers write pretty nasty unit tests. Their production code is nice and readable, but then they just went nuts in the unit tests and created a horrible unmaintainable mess. I don't get why you'd do that.

I have also seen this a lot, and usually it's when people try to add too much DRY to their unit tests. As a junior dev I was told by our lead that boilerplate and duplication in tests is not strictly a bad thing, and I have generally found this to be true over the years. Tests are inherently messy and each one is unique. Trying to get clever with custom test harnesses to reduce duplication is more likely to lead to maintainability issues than to test nirvana. And if your code requires so much setup to test, that is an indicator of complexity issues in the code, not the test.


> If your public method requires extensive mocking but the core logic you need to protect is isolated in a private method that requires little mocking, the most effective use of developer resources may be to just test your private method.

You're looking at the tested code as immutable. If you're not allowed to touch the code being tested, then yes, you'll sometimes need to test private methods, and that is fine. "Don't test private methods" is actually more about how to architect the primary code, not a commandment on the test code. If you find that you're having to do extensive mocking to call a public method in order to test the functionality in some private method, that's a major smell indicating that your code could be organized in a better way.


> the most effective use of developer resources may be to just test your private method.

While there is nothing wrong with testing an internal function if it helps with development, so long as it is clearly identifiable as such, you still need the public interface tests to ensure that the documented API is still conformant when the internals are modified. Remember that public tests are not for you, they are for future developers.

This is where Go did a nice job with testing. It provides native language support for "public" and "private" tests, identifying to future developers which can be deleted as implementation evolves and which must remain no matter what happens to the underlying implementation.


> If your public method requires extensive mocking but the core logic you need to protect is isolated in a private method that requires little mocking, the most effective use of developer resources may be to just test your private method.

When I did unit tests in C++, I found a simpler (and better) solution: Shrink the class by splitting it up into multiple classes. Often the logic in the private methods could be grouped into 1-3 concepts, and it was quite logical to create classes for each of them, give them public methods, and then have an instantiation of that class as a private member.

Now all you need to do is write unit tests for those new classes.

Really, it led to code that was easier to read - the benefit was not just "easier to test". Not a single colleague (most of whom did not write unit tests) complained.

I've yet to run into a case where it was hard to test private behavior via only public methods that couldn't be solved this way.
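
A rough sketch of that kind of split in Python (hypothetical names; the idea carries over to the C++ case described above): the former private helpers become small classes with public methods, held as private members, and the tests target those classes directly.

    class LineParser:
        def parse(self, line: str) -> dict:
            key, value = line.split("=", 1)
            return {key.strip(): value.strip()}

    class Aggregator:
        def merge(self, records: list) -> dict:
            merged = {}
            for record in records:
                merged.update(record)
            return merged

    class ReportGenerator:
        def __init__(self):
            # Composition: the old private logic now lives in testable collaborators.
            self._parser = LineParser()
            self._aggregator = Aggregator()

        def generate(self, lines: list) -> dict:
            return self._aggregator.merge([self._parser.parse(l) for l in lines])

    def test_line_parser():
        assert LineParser().parse("a = 1") == {"a": "1"}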


What this guy said. Public APIs don't need to be public to everyone. They can be public but only visible internally within a package.

You can split things out and decompose some things, even if it's just some util functions, and start sending out chunks of code for review. It doesn't even necessarily have to be separate files.

Your reviews will be faster and smoother too.


I have tried to evangelize unit testing at each company I've worked at and most engineers struggle with two things.

The first is getting over the hurdle of trusting that a unit test is good enough, a lot of them only trust an end-to-end test which are usually very brittle.

The second reason is, I think, that a lot of them don't know how to systematically break down tests into pieces to validate, e.g. I'll do a test for null, then a separate test for something else _assuming_ not null because I've already written a test for that.

The best way I've been able to get buy-in for unit testing is giving a crash course on a new structure that has a test suite per function under test. This allows for a much lower loc per test that's much easier to understand.

When they're ready I'll give tips on how to get the most out of their tests with things like boundary value analysis, better mocking, IoC for things like date time, etc.
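
As a hedged illustration of two of those tips (hypothetical names, pytest-style): one small suite per function under test, with the current time injected rather than read inside the logic.

    from datetime import datetime, timezone

    def is_expired(expires_at: datetime, now: datetime) -> bool:
        # IoC for date/time: the caller supplies "now".
        return now >= expires_at

    class TestIsExpired:
        MOMENT = datetime(2024, 1, 1, tzinfo=timezone.utc)

        def test_exactly_at_expiry(self):
            assert is_expired(expires_at=self.MOMENT, now=self.MOMENT)

        def test_before_expiry(self):
            earlier = datetime(2023, 12, 31, tzinfo=timezone.utc)
            assert not is_expired(expires_at=self.MOMENT, now=earlier)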


I've evangelized against unit testing at most companies I work at, except in one specific circumstance. That circumstance is complex logic in stateless code behind a stable API where unit testing is fine. I find this usually represents between 5-30% of most code bases.

The idea that unit testing should be the default go to test I find to be horrifying.

I find that unit test believers struggle with the following:

1) The idea that test realism might actually matter more than test speed.

2) The idea that if the code is "hard to unit test" that it is not necessarily better for the code to adapt to the unit test. In general it's less risky to adapt the test to the code than it is the code to the test (i.e. by introducing DI). It seems to be tied up with some sort of idea that unit testability/DI just makes code inherently better.

3) The idea that integration tests are naturally flaky. They're not. Flakiness is caused by inadequate control over the environment and/or non-deterministic code. Both are fixable if you have the engineering chops.

4) The idea that test distributions should conform to arbitrary shapes for reasons that are more about "because google considered integration tests to be naturally flaky".

5) Dogma (e.g. uncle bob or rainsberger's advice) vs. the idea that tests are investment that should pay dividends and to design them according to the projected investment payoff rather than to fit some kind of "ideal".


> The idea that unit testing should be the default go to test I find to be horrifying.

Kent Beck, who invented the term unit test, was quite clear that a unit test is a test that exists independent of other tests. In practice, this means that a unit test won't break other tests.

I am not sure why you would want anything other than unit tests? Surely everyone agrees that one test being able to break another test is a bad practice that will turn your life into a nightmare?

I expect we find all of these nonsensical definitions for unit testing appearing these days because nobody is writing anything other than unit tests anymore, and therefore the term has lost all meaning. Maybe it's simply time to just drop it from our lexicon instead of desperately grasping at straws to redefine it?

> It seems to be tied up with some sort of idea that unit testability/DI just makes code inherently better.

DI does not make testing or code better if used without purpose (and will probably make it worse), but in my experience when a test will genuinely benefit from DI, so too will the actual code down the line as requirements change. Testing can be a pretty good place for you to discover where it is likely that DI will be beneficial to your codebase.

> The idea that test realism might actually matter more than test speed.

Beck has also been abundantly clear that unit tests should not resort to mocking, or similar, to the greatest extent that is reasonable (testing for a case of hardware failure might be a place to simulate a failure condition rather than actually damaging your hardware). "Realism" is inherent to unit tests. Whatever it is you are talking about, it is certainly not unit testing.

It seems it isn't anything... other than yet another contrived attempt to try and find new life for the term that really should just go out to pasture. It served its purpose of rallying developers around the idea of individual tests being independent of each other – something that wasn't always a given. But I think we're all on the same page now.


> Kent Beck, who invented the term unit test, was quite clear that a unit test is a test that exists independent of other tests

Kent Beck didn't invent the term "unit test", it's been used since the 70's (at minimum).

> I am not sure why you would want anything other than unit tests?

The reason is to produce higher quality code than if you rely on unit tests only. Generally, unit tests catch a minority of bugs, other tests like end to end testing help catch the remainder.


> other tests like end to end testing help catch the remainder.

End-to-end tests are unit tests, generally speaking. Something end-to-end can be captured within a unit. The divide you are trying to invent doesn't exist, and, frankly, is nonsensical.


> End-to-end tests are unit tests, generally speaking.

Generally, in the software industry, those terms are not considered the same thing, they are at opposite ends of a spectrum. Unit tests are testing more isolated/individual functionality while the end to end test is testing an entire business flow.

Here's an example of one end to end test (with validations happening at each step):

1-System A sends Inventory availability to system B

2-The purchasing dept enters a PO into system B

3-System B sends the PO to system A

4-System A assigns the PO to a Distribution Center for fulfillment

5-System A fulfills the order

6-System A sends the ASN and Invoice to system B

7-System B users process the PO receipt

8-System B users perform three way match on PO, Receipt and Invoice documents


> Here's an example of one end to end test

Bad example, perhaps, but that's also a unit test[1]. Step 8 is dependent on the state of step 1, and everything else in between, so it cannot be reduced any further (at least not without doing stupid things). That is your minimum viable unit; the individual, isolated functionality.

[1] At least so long as you don't do something that couples it with other tests, like modifying a shared database in a way that will leave another test in an unpredictable state. But I think we have all come to agree that you should never do that – going back to the reality that the term unit test serves no purpose anymore. For all intents and purposes, all tests now written are unit tests.


Every step updates shared databases (frequently plural). In the case of the fulfillment step, the following systems+databases were involved: ERP, WMS, Shipping.

Typically, in end to end testing, tests are run within the same shared QA system and are semi-isolated based on choice of specific data (e.g. customers, products, orders, vendors, etc.). If this test causes a different test to fail, or vice-versa, then you have found a bug.

If we call that entire sequence of steps a "unit" test, would you start with testing the entire sequence of steps, or would you recommend testing the individual steps first?

And if we did test the individual steps first, we would give that testing a different name? Like maybe "sub-unit" testing?


> Every step updates shared databases (frequently plural).

That's fine. It all happens within a single unit. A unit should mutate shared state within the unit. Testing would be pretty much useless without it.

> If we call that entire sequence of steps a "unit" test, would you start with testing the entire sequence of steps, or would you recommend testing the individual steps first?

For all intents and purposes, you can't test the individual steps. All subsequent steps are dependent on the change in inventory state in step 1. And the product of step one is undoubtedly internal state, so there is no way for the test to observe the state change in isolation (unless you do something stupid). You have to carry out the subsequent steps to be able to infer that the inventory was, in fact, updated appropriately.

After all, the whole reason you are testing those steps together is because you recognize that they represent a single instance of functionality. You don't really get to choose (unless you choose to do something stupid, I suppose).

> And if we did test the individual steps first, we would give that testing a different name?

If the individual steps can be tested individually (ignoring a case of you doing something stupid), it's not actually an end-to-end process, so your example would make no sense. Granted, we have already questioned if it is a bad example.


> For all intents and purposes, you can't test the individual steps.

Sure you can, and we did (that is a real example of an end to end test from a recent project) which also included testing the individual steps in isolation, which was preceded by testing the individual sub-steps/components of each step (which is the portion that is typically considered unit testing).

For example, step 1 is broken down into the following sub-steps which are all tested in isolation before testing the combined group together:

1.1-Calculate the current on hand inventory from all locations for all products

1.2-Calculate the current in transit inventory for all locations for all products

1.3-Calculate the current open inventory reservations by business partner and products

1.4-Calculate the current in process fulfillments by business partner and product

1.5-Resolve the configurable inventory feed rules for each business partner and product (or product group)

1.6-Using the data in 1.1 through 1.5, resolve the final available qty for each business partner and product

1.7-Construct system specific messages for each system and/or business partner (in some cases it's a one to one between business partner and system, but in other cases one system manages many business partners).

1.7.1-Send to system B

1.7.2-Send to system C

1.7.3-Send to system D

1.7.N-etc.

> And the product of step one is undoubtedly internal state, so there is no way for the test to observe the state change in isolation

The result of step 1 is that over in software system B (an entirely different application from system A) the inventory availability for each product from system A is properly represented in the system. Meaning queries, inquiries, reports, application functions (e.g. Inventory Availability by Partner), etc. all present the proper quantities.

To validate this step, it can be handled one of two ways:

1-Some sort of automated query that extracts data from system B and compares to the intended state from step 1 (probably by saving that data at the end of that step).

or 2-A user manually logs in to system B and compares to the expected values from step 1 (again saved or exposed in some way). This method works when the number of products is purposefully kept to a small number for testing purposes.

> If the individual steps can be tested individually (ignoring a case of you doing something stupid), it's not actually an end-to-end process, so your example would make no sense. Granted, we have already questioned if it is a bad example.

Yes, the individual steps can be tested individually. Yes, it is an end to end test.

> Granted, we have already questioned if it is a bad example.

It's a real example from a real project and it aligns with the general notion of an end to end test used in the industry.

More importantly, combined with the unit tests, functional tests, integration tests, performance tests, other end to end tests and finally user acceptance tests, it contributed to a successful go-live with very few bugs or design issues.


>Kent Beck, who invented the term unit test, was quite clear that a unit test is a test that exists independent of other tests

I vaguely remember him also complaining that there were too many conflicting definitions of unit tests.

Maybe that can be solved with another definition?

https://xkcd.com/927/

or maybe not.

I don't know many people who would describe a test that uses playwright and hits a database as a unit test just because it is self contained. If Kent Beck does then he has a highly personalized definition of the term that conflicts with its common usage.

The most common usage is, I think, an xUnit style test which interacts with an app's code API and mocks out, at a minimum, interactions with systems external to the app under test (e.g. database, API calls).

He may have coined the term but that does not mean he owns it. If I were him I'd pick a different name for his idiosyncratic meaning than unit test - one that isn't overburdened with too much baggage already.


> He may have coined the term but that does not mean he owns it.

Certainly not, but there is no redefinition that is anything more than gobbledygook. Look at the very definition you gave: That's not a unique or different way to write tests. It's not even a testing pattern in concept. That's just programming in general. It is not, for example, unusual for you to use an alternative database implementation (e.g. an in-memory database) during development where it is a suitable technical solution to a technical problem, even outside of an automated test environment. To frame it as some special unique kind of test is nonsensical.

If we can find a useful definition, by all means, but otherwise what's the point? There is no reason to desperately try to save it with meaningless words just because it is catchy.


The definition I gave is the one people use. Hate it or love it, you're not going to change it to encompass end to end tests, and neither will Kent Beck. It's too embedded.


> you're not going to change it

I might. I once called attention to the once prevailing definition of "microservices" also not saying anything. At the time I was treated like I had two heads, but sure enough now I see a sizeable portion (not all, yet...) of developers using the updated definition I suggested that actually communicates something. Word gets around.

Granted, in that case there was a better definition for people to latch onto. In this case, I see no use for the term 'unit test' at all. Practically speaking, all tests people write today are unit tests. 'Unit' adds no additional information that isn't already implied in 'test' alone and I cannot find anything within the realm of testing that needs additional differentiation not already captured by another term.

If nothing changes, so what? I couldn't care less about what someone else thinks. Calling attention to people parroting terms that are meaningless is entirely for my own amusement, not some bizarre effort to try and change someone else. That would be plain weird.


Well, I don't regard unit tests as the one true way. I don't enforce people on my team do it my way. When I get compliments on my work, I tend to elaborate and spread my approach. That's what I mean by evangelize, not necessarily advocating for a specific criteria to be met.

I find that integration tests are usually flaky; it's my personal experience. In fact, at my company, we just decided to completely turn them off because they fail for many reasons and the usual fix is to adjust the test. If you have had a lot of success with them, great. Just for the record, I am not anti-integration or end-to-end test. I think they have a place, and just like unit tests shouldn't be the default, neither should they.

Here are the two most common scenarios where I find integration tests (usually end-to-end tests called integration) become flaky:

1) DateTime, some part of business logic relies on the current date or time and it wasn't accounted for.

2) Data changes, got deleted, it expired, etc. and the test did not first create everything it needed before running the test.

Regarding your points,

1) "realism" that is what I referred to as trusting that a unit test is good enough. If it didn't go all the way to the database and back did it test your system? In my personal work, I find that pulling the data from a database and supplying it with a mock are the same thing. So it's not only real enough for me, but better because I can simulate all kinds of scenarios that wouldn't be possible in true end-to-end tests.

2) These days the only code that's hard to test is from people that are strictly enforcing OOP. Just like any approach in programming, it will have its pros and cons. I rarely go down that route, so testing isn't usually difficult for me.

3) It's just been my personal experience. Like I said, I'm not anti-integration tests, but I don't write very many of them.

4) I didn't refer to google, just my personal industry experience.

5) Enforcing ideal is a waste of time in programming. People only care about what they see when it ships. I just ship better quality code when I unit test my business logic. Some engineers benefit from it, some harm themselves in confusion, not much I can do about it.

Most of this is my personal experience, no knock against anyone and I don't force my ideals on anybody. I happily share what and why things work for me. I gradually introduce my own learning over time as I am asked questions and don't seek to enforce anything.

Happy coding!


> I'll do a test for null, then a separate test for something else _assuming_ not null because I've already written a test for that.

Honestly, this pedantry around "unit tests must only test one thing" is counter-productive. Just test as many things as you can at once; it's fine. Most tests should not be failing. Yes, it's slightly less annoying to get 2 failed tests instead of 1 fail that you fix and then another fail from that same test. But it's way more annoying to have to duplicate entire test setups to have one that checks null and another that checks even numbers and another that checks odd numbers and another that checks near-overflow numbers, etc. The latter will result in people resisting writing unit tests at all, which is exactly what you've found.

If people are resisting writing unit tests, make writing unit tests easier. Those silly rules do the opposite.
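
As a hypothetical illustration of the trade-off, a single test with one setup and several related checks is often plenty; a failing assert still points at the exact case that broke:

    def normalize(s: str) -> str:
        return " ".join(s.split()).lower()

    def test_normalize():
        assert normalize("  Hello   World ") == "hello world"
        assert normalize("") == ""
        assert normalize("ALREADY lower") == "already lower"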


Just to clarify, I am not advocating for tests to only test one thing, rather that after you have tested for one scenario you don't need to rehash it again in another test.

Breaking a test down helps to clarify what you're testing and helps to prevent 80 loc unit tests. When I test for multiple things, I look for the equivalent of nunit's assert.multiple in the language that I'm in.

The approach I advocate for typically simplifies testing multiple scenarios with clear objectives and tends to make it easier when it comes time to refactor/fix/or just delete a no longer needed unit test. The difference I find, is that now you know why, vs having to figure out why.


I agree! I see a lot of stuff like "static typing is better than tests", "tests don't prove your code is bug free" etc as if tests somehow have to be a silver bullet to justify their existence.

I definitely think it's ok for the overall standard of test code to be lower than production code though (I guess horrible unmaintainable tests is maybe a bit much). A few reasons I can think of off the top of my head:

- You can easily delete and rewrite individual tests without any risk

- You don't ship your tests; bugs and errors in test suites have a way smaller chance of causing downstream issues for customers (not the same as no chance but definitely a lot smaller)

- I'd rather have a messy, hard to understand test than no test at all in most cases. That isn't true of production code at all, there are features that if they can't be produced in a coherent way with the rest of the codebase just don't have the value add to justify the maintenance burden.


I often think of unit tests as being programmable types, like Eiffel pre/post conditions or functional languages with types like Even and Odd.

For example, in double(x) -> y you can use types to say x belongs to the set of all integers and y must also be in that set, but that's about all you can say in Python.

Unit testing lets you express that y must be an even number with the same sign as x. It is like formal verification for the great unwashed, myself included.
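
For example, the "programmable type" for double(x) might look something like this sketch (plain asserts; a property-based library could generate the inputs instead):

    def double(x: int) -> int:
        return x * 2

    def test_double_is_even_and_sign_preserving():
        for x in (-7, -1, 0, 1, 8, 123456):
            y = double(x)
            assert y % 2 == 0                                  # y is even
            assert (y > 0) == (x > 0) and (y < 0) == (x < 0)   # same sign (zero stays zero)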


But you literally cannot possibly test that assertion for all x. Let's take a slightly harder problem:

prove (or at least test conclusively) that for all integer x, the output y of the following function is always even:

    y = x^2 + x + 2
There is essentially no way to prove this for all x by simply testing all integers. If your integers are 64-bit, you don't have enough time in the lifespan of the universe.

On the other hand, you could simply reason through the cases: if x is even, then all terms are even. If x is odd, then x^2 is also odd, and x^2 + x = odd + odd = even. So you're done.

This is what people mean when they say "tests don't prove your code is correct" -- it's almost always better to be able to read code and prove (to some degree) that it's correct. It's really nothing like static types, which are also constructive proofs that your code is not incorrect in specific ways. (That is: it proves that your code is not [incorrect in specific ways], not that your code is [not incorrect].)

Once you prove your code correct, you can often write efficient tests with cases at the correct boundary points to make sure that proof stays correct as the code changes.
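
A small sketch of what those boundary-point tests might look like for the example above - the proof covers all integers, and the tests just re-check a few representative cases as the code changes:

    def f(x: int) -> int:
        return x * x + x + 2

    def test_f_is_always_even():
        for x in (0, 1, -1, 2, -2, 2**31 - 1, -(2**31)):
            assert f(x) % 2 == 0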


Could you do this in Python (using positive integers as an example of a type rather than even numbers)?:

  class N(int):
    def __new__(cls, z: int):
      assert z > 0
      return super().__new__(cls, z)

  def log2(n: N) -> float:
    …

  log2(N(32))   # 5.0
  log2(32)      # type error
  log2(N(-32))  # runtime error
You are still relying on the runtime to detect errors and it’s annoying to have to cast all ints to Ns, but you at least won’t ever take the log of a negative number.


That's a cool trick! Depending on your use case though it might not go far enough since the new "type" is really just an integer with an assert. It won't be picked up by static type checkers and there's nothing stopping you doing this:

  x = N(2)
  x -= 10

I think x still winds up being less than 0 here?

It's probably easier just to add an "assert" with a friendly message at the start of the log function.


> But you literally cannot possibly test that assertion for all x.

Hence why he states it is formal verification for the "unwashed masses". The "washed" will use a language with a type system that is advanced enough to express a formal proof, but most people can't hack it, and thus use languages with incomplete type systems and use testing to try and fill in the gaps.


> If you need to mock out 80% of a system to make your unit test work, then yes, it's potentially pointless. In that case I'd argue that you should consider rewriting the code so that it's more testable in isolation, that will also help you debug more easily.

demands that people rewrite all their production code in service of unit tests are probably a big reason why a lot of programmers don't unit test.

> One caveat is that some developers write pretty nasty unit tests. Their production code is nice and readable, but then they just went nuts in the unit tests and created a horrible unmaintainable mess. I don't get why you'd do that.

probably they write bad unit tests because they can't rewrite all their code but they have a mandate that all changes must be unit tested.

if strict purity could be relaxed and programmers were allowed to write more functionalish unit tests with multiple collaborators under test then there would likely be less resistance to testing and there shouldn't be any mocking-hell tests written.

higher level functional/integration tests also shouldn't be missed since your unit tests are only as good as your understanding of the interfaces of the objects and people write buggy unit tests that allow real bugs to slip between the cracks.


Dismissive of unit tests or TDD? I don't know any peer developer who is dismissive of any form of unit tests. But there are plenty who are dismissive of TDD.

As for the quality of tests, that's usually a combination of factors and capacity is one of them. In the end, if POs don't see business value in tests, they won't be prioritized.


For me, the biggest point is that unit tests are not a stand-in for understanding your code. It's like that quote about driving by just crashing into the guardrails all the time. Most unit-testing evangelists sound to me like they're using (or even advocating for) unit testing instead of thinking deeply about their code. Slow down and understand your code.

If you're finding more mistakes by running unit tests than by thinking through and re-reading your code, you're not finding most of your mistakes. Because you're not understanding your own code. How can you even write great unit tests if you don't understand what you're doing?

There are, of course, times when writing the tests first can help you think through a problem -- great! Especially when thinking through how some API would look. But TDD as a methodology gets a hard reject from me.

I certainly reject the argument "unit testing is too hard" -- then your code is bad and you should focus on fixing it. Well-written code is automatically easy to unit test, among 60 other benefits. That's not a reason to avoid unit testing.


unfortunately legitimate use cases for unit tests (like this) are pretty rare

in corporate codebases, overwhelmingly, unit tests are just mocked tests that enforce a certain implementation at the class or even individual method/function level and pretend that it works, making it impossible to refactor anything or even fix bugs without breaking tests

such tests are not just useless, they're positively harmful

https://gist.github.com/androidfred/501d276c7dc26a5db09e893b...


Well-written breaking tests represent something changing in a code base. You can be intentional about breaking a test, but then at least you can be very explicit about what you are changing.

All too many times I've broken a unit test in a code base that I did not intend to break, just to have an aha moment that I would have introduced a bug had that test not been present.

Unit tests are a trade off between development speed and stability (putting aside other factors, such as integration tests, etc). In large corporate settings, that stability could mean millions of dollars saved per bug.

That example you provided is a poor one and not really consistent with your point that unit tests are useless - the point is being made that that specific test of UserResource is useless, which I also agree with. Testing at the Resource level via integration test and Service level via unit test is probably sufficient.


Especially true if you get emergent side-effects from non-obvious shared state dependencies in large projects.

Nightmares... =)


Yes sir :)

And pragmatically - this always happens at some point. Something something about deadlines and need to get this out yesterday.


If maintained right, unit tests at edge conditions can quickly diagnose system and runtime state health.

If you work with malicious or incompetent staff at times (over 100 people there is always at least 1)... it is the only way to enforce actual accountability after a dozen people touch the same files over years.

"The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material." ( Michelangelo )

Admittedly, in-house automated testing for cosmetic things like GUI or 3D rendering pipelines is still nontrivial.

Best of luck =)


I've been on several projects where we had a significant number of unit tests that would only fail when requirements would change and we had to change the code.

"Look, if it only fails with requirement changes, then maybe we're better off not having them."

This only made people uncomfortable. They just don't like walking down the mental path that leads them to the conclusion that their high coverage unit tests are not worth the tradeoff. Or even that there is a tradeoff present at all.

Meanwhile, PRs constantly ask for more coverage.

Not very often, but sometimes someone will mention: "but unit tests are the specification of the code"

  test ShouldReturnOutput {
    _mock1.setup( /* complicated setup code returning mock2 */ );
    _mock2.setup( /* even more complicated setup code */ );
    
    let output = _obj.Method( 23.5 );
    
    Assert( output == 0.7543213 );
  }

  /* hundreds of lines above this test case */
  setup {
    if ( _boolean ) {
      _obj = new obj(_mock1);
    }
    else {
      _obj = new obj(new mock());
    }
  }
I'm just not sure I can get there.


The point of unit tests is not to CYA during refactors, but to confirm that the implementation is consistent between small changes without weird side effects.

A coworker once thought unit tests were dumb, and ended up writing code that repeated the call to an application 10x for the same info. This didn’t result in a changed UI because it was a read, but it’s not good to just suddenly 10x your reads for no good reason.

TFA also describes discovering weird side effect race conditions as a result of unit tests.


> confirm that the implementation is consistent between small changes without weird side effects

not sure what this is referring to, but I'll give an example

say you have a requirement that says if you call POST /user with a non-existing user, a user should be created and you should get a 2xx response with some basic details back

you could test this by actually hitting the endpoint with randomly generated user data known to not already exist, check that you get the expected 2xx response in the expected format, and then use the user id you got back to call the GET /user/userId endpoint and check that it's the same user that was just created

this is a great test! it enforces actual business logic while still allowing you to change literally everything about the implementation - you could change the codebase from Java Spring Boot to Python Flask if you wanted to, you could change the persistence tech from MySQL to MariaDB or Redis etc etc - the test would still pass when the endpoint behaves as expected and fail when it doesn't, and it's a single test that is cheap to write, maintain and run

OR

you could write dozens of the typical corporate style unit test i'm referring to, where you create instances of each individual layer class, mocks of every class it interacts with, mocked database calls etc etc which 1) literally enforce every single aspect of the implementation, so now you can't change anything without breaking the tests 2) pretend that things work when they actually don't (eg, it could be that the CreateUserDAO actually breaks because someone stuffs up a db call, but guess what, the CreateUserResource and CreateUserService unit tests will still pass, because they just pretend (through mocks) that CreateUserDao.createUser returns a created user)
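
A rough sketch of that round-trip test in Python (the endpoint shapes, field names and use of the requests library are assumptions, not something prescribed here):

    import uuid
    import requests

    BASE_URL = "http://localhost:8080"  # assumed running test instance

    def test_create_user_roundtrip():
        # Randomly generated data so the user cannot already exist.
        email = f"user-{uuid.uuid4()}@example.com"

        created = requests.post(f"{BASE_URL}/user",
                                json={"email": email, "name": "Test User"})
        assert created.status_code in (200, 201)
        user_id = created.json()["id"]

        fetched = requests.get(f"{BASE_URL}/user/{user_id}")
        assert fetched.status_code == 200
        assert fetched.json()["email"] == email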


To be fair to unit tests, I really like them for making sure that complicated code gets tested thoroughly. However, very often the complicated code isn't isolated to a single unit. It instead lives distributed amongst multiple objects that are reused for several competing aspects and all have their own undocumented assumptions about how the world works.

Now maybe this implies that we need a wide scale change in coding methodology such that the complicated parts are all isolated to units. But pending that, I'm not sure that the answer is a bunch of static mocks pretending to be dynamic objects with yet another set of undocumented assumptions of how the world works.

The unit tests that have made me happiest have been unit tests on top of a very complicated library that had a very simple api.

And on the other hand, the tests that make me most believe that the projects I'm working on are correct have been integration tests incorporating a significant part of the application AND the QA team's very thorough test plan.


Unit and integration testing are not strategies to be used exclusively.

Integration tests relying on interlocking behavior are, by their nature, complicated. Unit tests are there to test what can be tested simply, and are cheaper to write, so your test structure should be pyramid shaped, with hopefully fewer tests as complexity increases.


I literally gave examples in my example.

As a general example, a unit test is great for things like:

Your ExampleFactory calls your ExampleService exactly once, not more, not less, so you can check that side effects don't result in unnecessary extra calls and more load.

This is particularly relevant in a language like Java; modern Java style is functional, but the old style relied heavily on side effects, and they’re still possible to write unintentionally.
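
A hedged sketch of that call-count check with unittest.mock (the classes are hypothetical stand-ins):

    from unittest.mock import Mock

    class ExampleFactory:
        def __init__(self, service):
            self._service = service

        def build(self, key):
            # A bug here (e.g. fetching once per field) would silently multiply reads.
            return {"key": key, "data": self._service.fetch(key)}

    def test_factory_calls_service_exactly_once():
        service = Mock()
        service.fetch.return_value = {"value": 42}

        ExampleFactory(service).build("abc")

        service.fetch.assert_called_once_with("abc")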


I understand your example and I agree that in that particular case, maybe a unit test to ensure a given service calls a dao once and only once or whatever is justified

but I don't think the hypothetical risk of someone needlessly calling the db ten times is a good reason to justify adding that style of unit test to everything by default - if it happens, sure, add one for that particular call and call it a day


I believe the kind of scenario that bedobi is referring to is something like this (using your example):

Unit test exists, ExampleFactory only calls ExampleService once.

Hmm, it turns out that ExampleUIButton calls ExampleProxyNavigator more than one time if the text in ExampleUIButton happens to be wider than the default width for the button.

What does ExampleProxyNavigator do? Oh, it calls ExampleFactory which calls ExampleService. But only when the whole system is wired up for deployment.

The unit tests indicate that the system should be functioning okay, but when you put everything together you find out that the system does not function okay.


Using mockserver etc. you can cover for these things in component-test cases even more easily through your whole application while being more flexible with bigger code changes than unit tests allow.


A lot of places on the internet treat component testing and unit testing as synonyms. I've never heard of the former and it basically sounds like the unit tests we write.


I barely ever have unit tests flagging real issues. It's always a chore to update them. Feature/end-to-end tests though... Plenty of real issues flagged.


> I barely ever have unit tests flagging real issues

That sounds like you work alone and haven't worked for a long time on a code base with unit tests. Or the unit tests are bad.


Don't take this the wrong way, but this is the answer I would get from enterprise devs usually when pointing this out.

Then I would realize that their definition of a real issue was completely removed from any business or user impact, but geared more towards their understanding of the process detail in question.

I would argue that there certainly are some good places for unit tests, like if you have some domain-driven design going and can have well defined unit-tests for your business logic, but this usually is the smallest part of the codebase.

Mocking things that talk to databases etc. usually gives a false sense of security while that thing could break for a whole number of reasons in the real world. So just dropping the mock here and testing the whole stack of the application can really do wonders here in my experience.


> this is the answer i would get from enterprise devs usually when pointing this out

Yes, exactly what I thought, that's what you would hear from somebody who has experience working on large code bases with many contributors.


Ironically my experience has been that these responses came from people working in enterprise silos with few collaborators. Your mileage may vary.


Not disagreeing with your points. One thing mocks can be good at is to simulate errors that would be difficult to reproduce in an actual stack. For example, maybe you want to try and handle transient network issues or db connection failures, a mock could throw the correct exception easily, making your full stack do this would be challenging.
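
For instance, with unittest.mock a simulated outage is one line (the code under test and its fallback behaviour are hypothetical):

    from unittest.mock import Mock

    class DbUnavailable(Exception):
        pass

    def load_user(db, user_id):
        try:
            return db.get(user_id)
        except DbUnavailable:
            return None  # assumed fallback behaviour under test

    def test_load_user_handles_db_outage():
        db = Mock()
        db.get.side_effect = DbUnavailable("connection refused")
        assert load_user(db, 42) is None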


> It's always a chore to update them

Or he is actually not realizing unit tests bring to attention code that is impacted by the change... Or his tests just do for a dynamically typed language whatever static typing does on compilation :)


I have an unrelated (and most likely dumb) question about the article. When they talk about the inheritance relationship between 'Thread' and 'MyThread' in the example code in reference to the destructor methods, particularly here:

> Now, what happens when MyThread::singlepassThreadWork() uses a member variable of MyThread like foobar and we delete the MyThread object while the thread is still running? The destruction sequence is such that MyThread is deleted first and after that, the destructor of its parent object Thread runs and the thread is joined. Thus, there is a race condition: We risk accessing the vector foobar in singlepassThreadWork() after it was already deleted. We can fix the user code by explicitly stopping the thread in its destructor

What does it mean when they say 'the destructor of its *parent* object Thread runs'? I've always thought that when you inherit from one class to another and then instantiate an object of said class, they're just one object, so what do they mean when they make the distinction between 'parent' and 'child' object? When you have inheritance of say two classes, those would be two distinct objects instantiated in memory? Is there something I'm missing?


You're right, the wording is confusing. It should be "parent class". There is only one object, a MyThread object. In C++ when an object is destroyed, all the destructors in the hierarchy run, from bottom to top. So first ~MyThread and then ~Thread.

Anyway I think it is odd design to stop the thread in the destructor. You'd normally stop the thread first and then destroy the object, not the other way around?


They might be trying to encapsulate things so that they are sure threads get stopped when the objects go out of scope.

But, I would probably do that by having a class that contains both the thread and the data that the thread needs to access. Then its destructor could first join the thread and then clean up the data. For example, instead of a WorkerThread that contains a vector of WorkItem, have a BackgroundWorker that contains a Thread and a vector of WorkItem.


I see now, thank you very much


https://en.cppreference.com/w/cpp/language/destructor

Take a look at "Destruction sequence" but basically the destructors are chained together and called one after another to free all resources rather than forming one destructor for the derived object. That being said it is still effectively one object in memory.


Thank you for the explanation and reference.


I think about unit tests being useful for getting more confidence that some deterministic, pure (mathematically speaking) and stateless piece of code that's data in data out actually works, particularly when you change it.

If any of those conditions doesn't hold, the cost/benefit certainly goes way down, and sometimes even the absolute utility does.

If I have to mock anything, in particular, or more generally care at all about any implementation details (ie side effects) then I just think might as well make this a full on automated functional test then.

As soon as fake code is introduced into the test its utility rapidly decays in time as the things it fakes themselves change.


I agree that mocks are brittle and nearly useless.

If you follow SOLID principles to the extreme, you'll find that your code is separated into logic code that is pure and easy to unit test, and IO code that is very simple and can be covered by a relatively small number of integration tests.


To some extent this is pretty much the same as mocking. You are still injecting fake data into your pure logic functions, whether it's through their parameters or by them calling a mock.

I agree it's preferable, but sometimes you want to test the logic of the code that's actually making decisions about how and when the IO is called.

You can do it with integration tests of course, but in more complex environments with lots of complex IO dependencies mocking is cheaper. It's also hard to simulate specific failures in integration tests, like a specific request failing. Pretty much mocking with extra steps.

So mocking has its place as well.


How do you all feel about the need to rewrite a unit test when code gets refactored or business logic changes, isn’t that like a huge pita?


I treat unit tests like double-entry bookkeeping; I wouldn't describe it as a particular pain and consider it more of a matter of due diligence.

Not everything needs this level of rigor but there are plenty of cases where the tests are very cheap to write and reason about (for many pure functions) or are worth the cost as they validate critical behavior. Unit tests also add some design pressure to keep more logic pure/side-effect free; sure, it may take a bit more work to factor your code accordingly to keep i/o interactions separated to the shell of the application but I find this to be a useful pressure.

I've found that if I'm encountering pain when writing unit tests, then the pain is due to one of the following things:

1. The code is growing too complex and I need to decompose the logic or refactor the tests

2. The code has grown too many unintentional side effects and I need to move those side effects to discrete components

3. The code under test has fundamental side effects and those side effects require testing, thus the unit tests need to be converted to integration tests

4. The code under test is sufficiently complex that it demands full system/acceptance testing

There are some cases where refactoring the tests is generally too painful and I'll throw away all the tests entirely, maybe sprinkle in a few tests for logic that seems critical, and move on. Tests can accumulate technical debt, but in contrast to implementing code it's pretty cheap to cut your losses on tests and wipe them out.

I see a lot of people conflating unit testing with the idea that all code must have tests, and there's a ton of code that's phenomenally painful to test and can be easily checked by the developer. Tests should be a supporting tool and an augment to developer practices; it's better to have some tests that work well and throw out the ones that are miserable to write than to require 95% test coverage, drown in testing, and throw out all tests entirely.


Thanks for that response, it was helpful, especially the part about side effects.


Generally refactoring is where I find tests to be super valuable. If it’s a pure refactor then the existing tests shouldn’t break. If they start failing, then you have done something that has changed the expected behavior.

For business logic changes I would change the tests first so that they represent the new expected result. Then you refactor the code until the tests pass.


My experience is that tests should either cover functions that are small and do one thing (i.e. sorts, maps with some logic; basically places where you want to test edge cases and sanity-check), in which case there is very little reason for that code to change, or, if you are testing something larger, be integration tests where you test the full business logic flow. That makes the code less of a PITA to change while still giving you confidence.

If the business logic actually changes, the tests should break IMO because they are there to ensure that the business logic remains consistent. When you test the business logic (without testing the implementation) the code becomes much safer to modify and refactor.


If you're testing implementation details rather than contracts, you're susceptible to this. Make sure the unit you are testing is the thing you want to observe the behavior of.


> It is a little sad to see so many be so dismissive of unit tests.

You're preaching to the choir. The overwhelming majority of people worship unit tests like dogma. There's almost no point in saying the above. It's like saying it's a little sad to see some people who are so dismissive about eating and breathing to stay alive.

Your next part is the one that's interesting: mocking 80 percent of a system to get unit tests to work. I've seen so much of this from developers who don't even realize the pointlessness of what they're doing that it's nuts. They worship tests so much that they can't see the nuance and the downside.

Take this article. This article is literally presenting evidence for why unit tests are bad. He literally created an error that would not have existed in the first place were it not for his tests. Yet he has to spin it in such a strange way to make it support the existing dogma of test, test, test.


Sitting on a call right now where a guy is going on about how excited he is to mock out the entirety of a large e-commerce vendor's platform. It's maddening.


To me an interesting distinction is not between unit and integration tests, but between tests that are run quickly as part of a gate on commits in CI, vs. tests that are run more asynchronously searching for bugs.

The former must run quickly, and it's ok if the exact same test is run over and over. The latter need not run quickly, but benefits if new tests can be created and run, or if the tests incorporate randomness so they don't do the same thing each time they are run.

Here, it seems he was using tests intended for the first purpose for the second purpose instead. That can work, as it did here, but I don't think it's optimal. Better to have more exploratory, randomized, property-based tests chugging away in the background to find weird new ways the code can fail.


I like pasting code into ChatGPT and then saying "Write unit test(s) that demonstrate the bug(s) in this code". I have pre-instructions that say "Show code only. Be concise" to keep it simple. This has taught me a lot.


I believe in them. But unit tests are useless around useless humans. And there’s lots of those. A fine example is that time I wrote a test suite for a domain specific language parser. Someone wanted to break the language so they deleted the tests. New stuff was added without tests.

They confidently broke everything historically and looking forward. Then blamed it on me because it was my test suite that didn’t catch it. The language should not have been broken.

Everything only works if you understand what you are doing, so every argument should be presented from both sides.


Whenever I read, "when my code breaks the tests, I delete the tests", this is what I picture in my head.

Their code changed behavior, and good unit tests catch changes in behavior. Someone somewhere is probably depending on that behavior.


In my opinion this is because we don't teach Chesterton's Fence early enough (or often enough) to internalize it at a societal level.


Most software doesn't work the moment you stray from the expected path.

Whether that's because most software isn't tested competently or because software testing practices don't deliver robust software is not yet clear.

I suspect that unit tests, and tests in general, will be considered a historical artifact from the time before we worked out how to write software properly.

For example, we don't generally unit test things that a static type system checks for us. Maybe good-enough type systems will remove the need for the rest of them.


I think it’s a little over optimistic to think that we will ever work out how to properly write software. Some new patterns may help, but we will always have a need for unit tests and other tests.

Wrt typing, that’s a very narrow set of errors, and I would dare say even a small minority of the things that can and do go wrong in software are type related. That said, effective typing is another orthogonal tool to unit tests that can help create robust software. On that front, what we are missing is a language with robust typing that catches these type errors, but also gets out of developers way the rest of the time.


I don't see how you can either believe or not believe in a unit test. A unit test is what it is. It's a real thing. It exists. Use it, or don't.

How this topic can sometimes be about belief is beyond me. It's as if a person found a screwdriver and said, "I now believe in screwdrivers."

The topic of how people believe in unit tests is, to me, proof that the world is screwed. We're all screwed and everything is a screwdriver.


I suppose this is sort of the complement of https://xkcd.com/169/

Pretending to misunderstand clear communication and then making smug points about it isn't clever either.

https://www.merriam-webster.com/dictionary/believe%20in definition 2, "to have trust in the goodness or value of (something)".

Words (and phrases) in English usually have more than one meaning. Ranting about the correct use of a phrase while pretending the only extant meaning is a different one is not clever.


Unit tests are great, but the way we do them is often as simple change detectors, which don't say so much about correctness as about the code still doing what the programmer thinks it is doing (and the tests often need to change when the code changes).

It would be nice if unit tests were more like interlocking evidence of system correctness, but right now we just have integration tests with poorer coverage for that.


Similar experience to this guy - didn't believe in them initially, but now I'm a believer.

For what it's worth, I find Copilot to be an exceptional help in writing unit tests! A real game changer for me. Not only does it take care of most boilerplate code, it also kind of 'guesses' what case I'm about to write, and sometimes even points me in a direction I would otherwise miss.


Unit tests are like buying insurance, except you don't know how much the insurance has paid out when things go wrong. You spend a lot of time and effort making your code testable, figuring out what the useful tests are, and changing the unit tests when you refactor, all in the hope that it speeds up your project, but you cannot really know whether there was a net gain in speed/reliability versus proper QA and other techniques.


I like them because they help me to partition my code into units that are easier to write and test. Once they're working, assembling the parts generally leads to a working project.

It also motivates me to get small pieces working and tested before I get to the finish line. Each successful test is a victory!


Said it before and will say it again: there is no replacement for unit tests - they are the only thing that will give you flawless deployments. Not MIT degrees, not process, not managers - tests are literally the only thing I've seen consistently produce flawless production deployments. It's not a discussion.


Another thing unit tests are is focused. If you change just a small part of your code, you should only need to run a small fraction of your unit tests. Your unit test framework should support this selective execution of tests.
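
For example (a sketch assuming Google Test; the suite, file, and function names are made up), most frameworks let you filter by test name, so touching only the parser means re-running only the parser tests:

    // parser_test.cpp
    #include <gtest/gtest.h>

    int parse_digit(char c) { return c - '0'; }  // toy function under test

    TEST(ParserTest, ParsesSingleDigit) {
        EXPECT_EQ(7, parse_digit('7'));
    }

    // After changing the parser, run just this suite:
    //   ./tests --gtest_filter='ParserTest.*'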


Unit tests saved me many times.

I'm happy that this article praises unit tests without forcing a TDD perspective on the reader. It presents them as a tool, not a religion, and that's very refreshing.


I am troubled by the word belief, not just in the title but in the comments here. Unit tests should not be doctrine; there is a time and a place. And I feel that, more often than not, they are warranted.

We can argue about what granularity they should be, talk about functional programming, debate whether they should hit the database or not, but IMO all of those things miss the point. For me, in order of priority, unit tests provide the following benefits:

1) Make me write better, more decoupled code

2) Serve as documentation as to the intent of the code, and provide some expected use cases

3) Validate the code works as expected, (especially when "refactoring", which is basically how I write all my code even from the start)

4) Help me when deleting code by exposing unexpected dependencies

You can argue against all of those points, and I often will myself. Whether I write unit tests depends on the scale, importance, and lifetime of the project. But as soon as I think someone else will work on the code, I will almost always provide unit tests. In that scenario, they:

- Provide a way to quickly validate setup and installation was correct and the application functions

- Signal that the code was "curated" in some way. Someone cared enough to set up the test environment and write some tests, and that gives me a certain comfort in proceeding to work on the code.

- Provide a gateway into understanding why the application exists, and what some of the implementation details are.

So, thinking about the advantages I've outlined above, for me it would be very hard to say I don't "believe" in unit tests. I just don't always use them.


I don't believe in unit tests as they are practiced. This, unfortunately, is the kind of thing that can work in principle, but the realities make it unusable.

There are multiple problems with unit tests as they are implemented in the industry, and to make unit tests usable you need to make them productive enough to offset those problems.

First of all, for unit tests to work everybody has to contribute quality unit tests. One team member writing unit tests well for his part of functionality is not going to move the needle -- everybody has to do this.

Unfortunately, it is rarely the case that all team members are able to write quality code, and the same is true for unit tests.

Usually, the reality is that, given deadlines and scope, some developers will deprioritize writing good unit tests to instead deliver what business people really care about -- functionality. Give it enough time and the unit tests can no longer be trusted to do their job.

Second, it is my opinion that refactoring is extremely important. Being able to take some imperfect code from somebody else and improve it should be an important tool in preventing code rot.

Unfortunately, unit tests tend to calcify existing code, making it more expensive to change the functionality. Yes, more, not less expensive. Moving a lot of stuff around, changing APIs, etc. will usually invalidate all of the unit tests around that code. And fixing those unit tests, in my experience, takes more effort than refactoring the code itself.

Unit tests are good for catching errors AFTER you have made the error. But my personal workflow is to prevent the errors in the first place. This means reading the code diligently, understanding what it does, figuring out how to refactor code without breaking it. Over the years I invested a lot of effort into this ability to the point where I am not scared to edit large swaths of code without ever running it, and then have everything work correctly on the first try. Unit tests are usually standing in the way.

I think where unit tests shine is small library code, utilities, where things are not really supposed to change much. But on the other hand, if they are not really supposed to change much there also isn't much need to have unit tests...

The most paradoxical thing about unit tests is that teams that can write unit tests well can usually produce code of good enough quality that they have relatively little use for unit tests in the first place.

What I do instead of unit tests? I do unit tests. Yes, you read that correctly.

The trouble with unit tests is that everybody gets the "unit" part wrong. A unit does not have to mean "a class". Units can be modules or even whole services.

What I do is test functionality that matters to the client -- things I would have to renegotiate with the client anyway if I were ever to change them. These tests make sense because once they are written, they do not need to change even as the functionality behind them is completely rewritten. They test what clients really care about, and for that they bring a lot of bang for the buck.


> But my personal workflow is to prevent the errors in the first place.

Too many times I’ve made “simple” changes that were “obviously correct” and whose effects were “completely localized” only to wind up eating healthy servings of crow. If correct up-front analysis were possible to do reliably, we would have no need for profilers to diagnose hotspots, debuggers, valgrind, etc., etc.

So I enlist cheap machine support to check my work.


Sure. Only it is a lie that it is cheap.

Maybe CPU cycles are cheap, but writing that code is not. Which is exactly the point of my rant.

My position is that it makes much more sense to focus on tests that test observable behaviour that is not supposed to change a lot because it is a contract between the service and whoever the client is.

Writing this code is still expensive, but at least now it is much easier to make sure the return is higher than the investment.


The classic test rookie mindset is to test the functionality of the whole system, because that's what really matters.

But in reality, unit testing every single function and method is where the vast majority of the benefit lies. Details really matter.

It took me some time to learn this, even after being told. It's the same for most people. This little post will probably convince no one.

But maybe remember it when you finally get there yourself :)


> every single function and method

Very much no, that's the bad kind of unit test that locks your code into a specific structure and makes it a pain to update because you also have to change all the related tests even if the actual interface used by the rest of the codebase didn't change. I would call this the rookie mistake of someone new to unit tests.

You want to encapsulate your code behind some sort of interface that matches the problem space, then test to that interface. How its internals are broken down doesn't matter: it could be one big function, it could be a dozen functions, it could be a class; as long as the inputs and outputs match what the test is looking for, you can refactor and add/remove features without having to spend extra time changing the tests. That makes it much less of a pain to work with in general.

One way of looking at it I've used before with coworkers: For this new feature you're writing, imagine a library for it already exists. What is the simplest and most straightforward way to use that library? That's your interface, the thing you expose to the world and what you run your tests against.
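
As a sketch of that framing (everything here is made up for illustration): the "library" is a single entry point, and the tests only ever touch that entry point, so the internals can be reorganized freely without breaking them.

    #include <cassert>
    #include <vector>

    struct Item { double price; int quantity; };

    // The interface the rest of the codebase (and the tests) see.
    // Internally this could stay one function or grow a dozen helpers.
    double cart_total(const std::vector<Item>& items) {
        double total = 0.0;
        for (const auto& it : items) total += it.price * it.quantity;
        return total;
    }

    int main() {
        assert(cart_total({}) == 0.0);
        assert(cart_total({{2.0, 3}}) == 6.0);
    }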

This is what unit testing originally meant: semantic units, not code units.

It's like Apps Hungarian notation vs. Systems Hungarian notation: the original idea got overtaken by people who didn't understand it and only mimicked the surface-level appearance.


> But in reality, unit testing every single function and method is where the vast majority of the benefit lies. Details really matter.

To me, this is the actual rookie mentality. You end up testing the same thing multiple times across different lines of code, mocking and providing various sets of test data... when you could just test the specified and/or observable behaviour of your system and achieve exactly the same result with fewer tests.


Well, I certainly don't end up doing that.

There are many ways of doing things, and I guess we do unit tests differently.

> When you could just test specified and/or observable behaviour of your system, and achieve the exactly same result with fewer tests.

In my experience, it turns out to be very difficult to test a specific behaviour 5-10 layers deep from an external interface. Also, when one of those intermediate layers changes, you tend to have to rewrite many of those tests.


> Well, I certainly don't end up doing that.

How else are you "unit testing every single function and method"?

> it turns out to be very difficult to test a specific behaviour 5-10 layers deep from an external interface.

If you don't test it, how do you know your system works for that specific behaviour? Just because you've tested every single function and method in isolation doesn't mean they actually work with each other, or produce the responses the way you need them to.

> Also, when one of those intermediate layers changes, you tend to have to rewrite many of those tests.

As you should. Otherwise how do you know that your system still works?


> How else are you "unit testing every single function and method"?

There are many ways to write unit tests, as well as to write code that is easy to test. I don't know how my way differs from yours, but I don't have many of the problems you mention.

> Otherwise how do you know that your system still works?

We do have some integration tests, of course. But they're a small part of the total test suite.


> I don't know how my ways differs from yours, but I don't have much of the problems you mention.

You haven't answered "How else are you "unit testing every single function and method"?"

Given a medium-sized project and at least one passing and one failing test case for each function and method, you end up with dozens, if not hundreds, of tests largely doing the same thing.

> We do have some integration test, of course. But it's a small part of the total test suite.

So what does your test suite contain? Lots of unit tests for each method and function. What else?


> Given a medium-sized project and at least a passing and a failing test case for each function and method, you end up if not with hundreds, but with dozens of tests largely doing the same thing.

I don't understand this comment.

If I have one unit test for each function/method, that's just one test doing the same thing.


1. Your unit test probably shouldn't be testing several conditions at once [1]

2. Even if it's just one unit test per function/method, in a medium-sized project that's dozens of tests, many of them overlapping, with no idea whether those functions/methods even work together correctly

[1] Depends on function/test


I'll mirror a sibling comment. For me your mentality is the rookie mentality. I too once believed in strict unit tests, as well as a strict differentiation between them and other kinds of tests (end to end, integration, etc).

Then I joined a project where they were just starting to add tests to an existing project, and the lead developer was adamant on the following philosophy: "Virtually all tests will run the full program, and we'll mock out whatever slows the tests down (e.g. network access, etc)". I whined but I had to comply. After a year of this, I give him credit for changing my mindset. The majority of bugs we found simply would not have been found with unit tests. On the flip side, almost none of the unit test failures were false alarms (function signature changed, etc).

Since then, I've dropped categorizing tests as unit tests vs. other types of tests. Examine the project at hand and write automated tests - preferably fast ones. Focus on testing features, not functions.


I think he should have credited Tom Van Vleck with the "three questions" idea. It was published in ACM SIGSOFT Software Engineering Notes, vol 14 no 5 July 1989, pages 62-63, and you can read the whole thing here:

https://multicians.org/thvv/threeq.html

I hope he got permission to reproduce the comic.


Unit tests are not even well defined. What is a unit?


Something needn't be well-defined to be valuable. :)

If it helps, think of "unit tests" as "atomic tests". Your goal in writing a unit test is to test the smallest possible amount of logic at a time, with the least possible overhead (i.e., mocking).
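
A minimal sketch of what "atomic" looks like in practice (clamp_percentage is a made-up function; plain asserts stand in for a test framework): one tiny piece of logic, plain values in and out, no mocks.

    #include <cassert>

    // Toy function under test: clamp a value into the 0..100 range.
    int clamp_percentage(int v) { return v < 0 ? 0 : v > 100 ? 100 : v; }

    int main() {
        assert(clamp_percentage(-5)  == 0);
        assert(clamp_percentage(42)  == 42);
        assert(clamp_percentage(150) == 100);
    }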

The advantages of this approach are many: it helps keep the level of complexity of individual methods low enough to be quickly understandable, documents the interface provided by your methods, ensures that the tests run quickly, and allows new tests to be written with minimal effort.

Obviously there are disadvantages, too. Unit tests - any tests - take time to write. This is sometimes offset by the time saved by catching issues as early in the development cycle as possible, but not always.

For "greenfield" projects especially, I tend to take a different approach than in my other work. For those, I start by "writing the README". It doesn't matter if it's an actual README.md; the point is to write down some examples showing how you think the new functionality should be used. Once that's done, I'll stub out an implementation of that, then refining it with increasing granularity until the overall architecture of the project begins to be defined. Sometimes, that architecture is complex enough that it's worthwhile to break it into smaller pieces and start the process over for those. Other times, I get to a working "happy path" pretty quickly.

Once I have a minimally working feature, I write tests for the public-facing interface. Then the interfaces between domains inside the project. Then unit tests for individual methods. I mostly work in Python, so this is also the point where I pause and apply type annotations, write/expand my docstrings, ensure that my `__all__` objects are set properly, make sure any "internal use" methods of publicly exported types are prefixed with `_`, etc.

On the other hand, when I'm writing a feature or making a change to a more mature codebase, I often _start_ by writing tests. Sometimes that's a new interface that I'll be using elsewhere, so I'll write tests defining that. Sometimes it's a change in behavior on an existing implementation, so I'll write tests for that. Either way, from that point on I repeatedly run _only_ the new tests that I've written as I build out the feature. Only once the feature works and those tests pass do I re-run the whole test suite to check that I've not broken something I hadn't considered. When those pass, I'll go back over my code one more time to make sure that I've added tests for all of the relevant internal stuff before submitting the patch.



