Losing faith in testing (thorstenball.com)
192 points by ben_s 9 months ago | 148 comments



>I get paid for code that works, not for tests

A blog post could be written about just this statement and how it contributes to a low trust workplace where those who cut corners are favored by stakeholders and everyone else is left scrambling to clean up the messes left in their wake. If you're writing code for yourself, sure, be targeted and conservative with your tests. But when you're working with others, for goodness sake, put the safety nets in place for the next poor soul that has to work on your code.


That quote is totally true though.

Ultimately, tests are there to make sure code works, not for tests' sake. The rest of the sentence you're quoting is: "so my philosophy is to test as little as possible to reach a given level of confidence."

Overall the approach in the OP looks to me like a decently balanced take, trying to aim for enough tests without excess.


> tests are there to make sure code works, not for tests' sake.

"It works" is awfully imprecise. Are you talking about perfect code? Some specific use-case? Some project-specific level of quality?

Ordinary (non-fuzzing) tests are generally there to help protect against regressions when changing existing functionality, or to offer baseline quality assurance for new functionality. They can also be helpful in, say, determining whether the code behaves as expected when using a different compiler. They aren't normally how you discover a long-standing bug. There's always more to software quality than just testing.


Such a reply is at best bad faith. Why would tests not be about ensuring working software? Why is the other side doing-it-for-itself while you care about the real goals?

We could continue down this line: it’s not about working software; it’s about software that is useful for the user. Because it’s better to deliver slightly wrong results sometimes, as long as it is what the user expects and wants. But wait. Why did I assume that you want working-software-for-itself at the expense of what the user wants? Just a bad faith assumption, again.

And we continue: it’s not about software that is useful for the user; it’s about software that is useful for the business.


Sadly, writing tests to have tests is totally a thing. A tech lead can come into a new team and ask for 80% or more code coverage and no one will bat an eye. Will the software work better for it? That's up for debate, and it will depend on the quality of the tests.

I get your point of view; it feels completely absurd. The same way writing JIRA tickets just to have tickets, or writing dumb comments because the CI will yell at you for not having comments, feels absurd. And it's a reality in more places than I wish.

> And we continue: it’s not about software that is useful for the user; it’s about software that is useful for the business.

If we look at the number of failed businesses, that's a lesson that needs to be relearned again and again.


I completely take your point and agree with you.

I’d just like to add a nuance/complication: sometimes it makes perfect sense for a lead to come in and insist on the test coverage metric hitting X%. Not because it’s a good idea on its own or in general, but probably because of very context-specific reasons. It’s not uncommon for engineers not to think about how their code will be tested at all (which is related to the interfaces they design), so something as arbitrary as this can be a forcing function to bring a baseline of awareness and competence to a team. In that sort of scenario, ideally the norms change as the team's problems evolve, and so that arbitrary test coverage goal would ideally become irrelevant over time as the team improves and matures.

Think back to your first weeks / months of programming: this sort of reductionist and over-simplistic “rule” was probably very formative in your early years, even if you had to unlearn it over time. Real teams have people with a spectrum of skills, and sometimes you have to set crazy-ish rules to level-up the baseline.


> Sadly, writing tests to have tests is totally a thing. A tech lead can come into a new team and ask for 80% or more code coverage and no one will bat an eye. Will the software work better for it? That's up for debate, and it will depend on the quality of the tests.

This is true but not the bad faith part. The bad faith part is assuming—based on nothing—that the other interlocutor is doing things because of cargo-cult/silly reasons.


It's impossible to prove that code works. But tests are a strong indicator and at least put bounds on where the program does work.

If you're paid to write code that works, you're paid to write tests. This is fairly standard in every other engineering field.


You can at least try to prove the code works, and you can get a lot further than people seem to bother. Hell: even just using a language with types--which many people don't do--is a form of invariant that proves away a lot of potential bugs you would otherwise have to test.

And like, we absolutely have proof assistants and model checkers and more advanced languages with dependent types (even C++ can do a lot more than many languages due to how it can template over values, which includes stuff like the sizes of buffers on which it can do math at compile time, but we should be spending more time coding in the likes of Coq/Idris/Lean)... let's not normalize a world in which people entirely give up on formal correctness.

The problem with tests is that people use them as a crutch to not have to even just prove to themselves in their head -- much less to someone else or in a way that can be checked by a machine -- why their code does or doesn't work... they just throw together a ton of tests, hammer the code with them, and when the tests stop failing they proudly announce "I guess it works now" and move on.

My challenge: try to code for a while with the mentality that every time you stop typing to test or run your code and it doesn't work (including "my tests failed"), that is a serious problem that should be avoided. Instead, try your best to make it so the first time you get around to testing your code it works, because you are that confident in how it works.


This is the kind of shortcut that gets easily forgotten after a while IMHO.

Why you write tests is important, and coverage numbers, for instance, are not that reason. Most automated coverage assessments still won't guarantee you're testing all the critical patterns (you just need enough tests to touch all the paths), and a low number doesn't always mean it's not enough.

I understand the use as a heuristic, but as it gets widely adopted it also becomes more and more useless. I mean, today we see people eyeing LLMs to boost their coverage numbers automatically, and that trend of writing low-effort tests has been going on for a while IMO.


It's like that common misconception about the testing pyramid.

The reason it's smaller at the top isn't because you should have numerically more tests at the bottom than at the top. It just shows that if you're doing a higher-level test, you're also testing the layers below it. As an unrealistic example: if you had 1 IT and 1 UT, you'd still have double coverage at the bottom. You're probably still gonna create more UTs than ITs though, as they're easier to write... so this is probably more academic pedantry than anything insightful.


> I understand the use as a heuristic, but as it gets widely adopted it also becomes more and more useless.

I too am a big fan of Goodhart's Law. It seems many took this as advice and not a warning.


At my current company we have pretty high test coverage. The test suite also happens to be full of mocks plus some cargoculted antipatterns that make many tests useless. The engineers who actually test their stuff can be counted with one hand.


I see this so often.

Mocks and stubs get used all over the place because nobody understands what it means to write code that’s easily testable. They’re great when used correctly (e.g., remote services or inherently stateful APIs like time). But they almost never are.

You end up with tests that ensure one and only one thing: the code is written the way it’s currently written.

Tests should do two things: find unexpected out-of-spec behavior, and prevent regressions during the course of editing and refactoring. These overly-mocked tests by definition can’t do the first one and they actively inhibit the second. They have negative value insofar as they constantly trigger failure while making completely benign edits.
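To make the failure mode concrete, here is a minimal Python sketch of that anti-pattern (charge_order, gateway, and ledger are hypothetical names, not from any codebase discussed here): the test simply restates the implementation, call for call.

    from unittest.mock import Mock

    def charge_order(order, gateway, ledger):
        gateway.charge(order.total)
        ledger.record(order.id, order.total)

    def test_charge_order():
        order = Mock(id=1, total=100)
        gateway, ledger = Mock(), Mock()

        charge_order(order, gateway, ledger)

        # These assertions only prove the code is written the way it is
        # currently written: rename a collaborator method or reorder the
        # calls and the test breaks, while a real bug inside the gateway
        # or ledger is never exercised.
        gateway.charge.assert_called_once_with(100)
        ledger.record.assert_called_once_with(1, 100)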


It feels like

- First write the needed change

- Now assert that we just wrote the needed change


This is why I tend to lean toward the approach of only using mocks, stubs, etc. when absolutely needed. That tends to be at the places where the code interacts with external components like databases or web services, rather than with components within the current layer, such as the connections between controllers and services.

It's also why I don't like the philosophy of unit testing being about only testing a specific class -- you end up in situations where you convince yourself you have to mock the helper classes it uses, so you end up with disconnected, pointless tests.

Instead, each test should be testing as much real code as possible.
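As a rough sketch of that preference (WelcomeService and FakeEmailClient are made-up names): the formatting logic is real code exercised by the test, and only the external boundary, which would normally be an HTTP-backed email API, is replaced with a fake.

    class FakeEmailClient:
        # Stands in for the real network-backed client; records instead of sending.
        def __init__(self):
            self.sent = []

        def send(self, to, body):
            self.sent.append((to, body))

    class WelcomeService:
        def __init__(self, email_client):
            self.email_client = email_client

        def welcome(self, name, address):
            # Real logic under test; only the external edge is faked.
            body = f"Hello {name.strip().title()}, thanks for signing up!"
            self.email_client.send(address, body)

    def test_welcome_sends_formatted_greeting():
        client = FakeEmailClient()
        WelcomeService(client).welcome("  ada lovelace ", "ada@example.com")
        assert client.sent == [
            ("ada@example.com", "Hello Ada Lovelace, thanks for signing up!")
        ]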


The best definition for "unit" in unit testing is the test itself. The test is the unit - as in, each test is testing a unit of code and doesn't interfere with other tests or rely on the external environment.

Too many times have I been told a class or a method in a class is a unit. It makes no sense and is the reason for excessive mocking, because they end up trying to isolate that class as an arbitrary unit.


Yeah that's the biggest issue in testing software.

In a perfect world someone would stop and refactor the entire codebase to make every object perfectly unit testable.

In the real world, you have to pick between expanding the system under test (SUT) to include multiple real collaborator objects or creating a violent mess of mocks.


It's not the right thing to do in a perfect world either. I've written quite a lot of software without any dependency injection and with 0 unit tests that is fully covered by reliable, more-than-fast-enough integration tests.

Some people think that if I did dependency injection so I could unit test all of that same code, that code would be "better" in some indefinable way. In reality the code would just be longer (dependency injection increases SLOC). All other things being equal - longer = worse.

DI-all-the-things is a dogma hangover from the late 1990s when integration testing and end to end testing tooling was all poor, computers were all slow and people tended to write more of their own algorithms (rather than importing them). DI is now something that should be approached on a case by case basis.


Yep, and I've seen code shipped that passed all its unit tests but the interface between them was subtly wrong so it violently failed to work at all after being shipped. Had to write functional tests to make sure it all worked together.


I'm really interested in this. Most of my systems use DI containers to orchestrate any system-level fakes (e.g. the file system) I might need, but I have minimal mocking or fakes otherwise.

Do you have a blog or article that goes into your design more?


No, I haven't read or written about it I'm afraid.

System-level fakes are how I achieve it, though. They're often more costly to build up front, but they're extremely reusable between projects and you can use them to construct better, more realistic fixtures.

I suppose it is still dependency injection of a sort - but it's dependency injection that is A) stack-agnostic, B) invisible to the app, and C) connects much better to the outside world (e.g. real databases or filesystems).


I wrote configuration management software. Mocking the filesystem was a disaster. It worked much better to take the whole system that was responsible for modifying the filesystem and just "dependency inject" a real file or directory and do real POSIX operations on it and include the POSIX filesystem as the "system under test".

So as a concrete example if you have a system for editing /etc/{passwd,shadow} on Solaris you can create a tmpfile (even on Linux) that looks like a Solaris /etc/{passwd,shadow} file (a 'fixture') and "DI" that tmpfile into your system and then do operations on that file and check the result. This is testing the "update the passwd file on solaris" system in isolation, so it isn't a proper full integration test of the entire configuration management run, but there's a lot of moving parts under test. It also allows devs to run it on Linux and while you still have to test it in CI on Solaris before shipping, the likelihood of any incompatibility between Solaris and Linux filesystem semantics is pretty low. And it doesn't mess with the actual /etc/{passwd,shadow} file on your CI server. You can worry about coverage of "okay it edits the tmpfile but can it edit the real file" and we had some tests that would really add a fake user and really delete a fake user on the CI boxes as well. In practice I don't think that was necessary since we just required root.

So you're testing everything under the "please create a user" API down to the hardware and not having to insert mocks into your own code or to attempt to mock the entire POSIX filesystem API (I did that at one point much earlier in my career and it still lives on to this day as the biggest mess of a test I've ever constructed -- unfortunately it works and doesn't change much at all and would have taken way too long to rewrite it).
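A stripped-down Python sketch of the same idea (the remove_user function and the fixture contents are hypothetical, and pytest's built-in tmp_path fixture supplies the temporary directory): the "injected" dependency is just a real file path, and the test does real POSIX I/O against a passwd-style fixture.

    PASSWD_FIXTURE = (
        "root:x:0:0:root:/root:/bin/sh\n"
        "daemon:x:1:1:daemon:/:/usr/bin/false\n"
    )

    def remove_user(passwd_path, username):
        # System under test: edits a passwd-format file in place.
        with open(passwd_path) as f:
            lines = f.readlines()
        kept = [line for line in lines if not line.startswith(username + ":")]
        with open(passwd_path, "w") as f:
            f.writelines(kept)

    def test_remove_user_edits_real_file(tmp_path):
        passwd = tmp_path / "passwd"
        passwd.write_text(PASSWD_FIXTURE)

        remove_user(str(passwd), "daemon")

        contents = passwd.read_text()
        assert "daemon:" not in contents
        assert contents.startswith("root:x:0:0:")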


This assumes useful tests.

There's a false equivalence drawn that writing tests means you wrote good code. In reality, those who can write good tests can also write good code.

My rules of thumb: "Be a goldfish" - forget everything you know about your project: is it complicated, non-intuitive? Tests + clear documentation.

But don't test for stupid stuff.

AIs already generate + test the stupid stuff for us anyway... why are we writing it?


Exactly this. Most of the tests reimplement the implementation using mocks. Such tests are useless, as they always prove the code is correct. Worse, such tests make refactoring much slower. On a low level, only black-box interface tests make sense, and on a high level, use scenario testing. The implementation has to be tested indirectly, otherwise, it leaks.


I think another blog post could similarly be written on "don't fix it if it ain't broken." Fixing things before they are broken is substantially cheaper and takes far less time, but it is far easier to push off. Maintenance and fixing things BEFORE they are broken is key. Of course, not everything needs to be fixed. But many sayings are often taken too literally.


See also “let’s be pragmatic”. “Pragmatic” can be used to short-circuit any debate when the interlocutor arbitrarily decides that you care too much about whatever subject. Like, do you think that adding tests in this PR will give us concrete, tangible results? Well no... but in the long term it will make the application more robust and review more streamlined and— and now the other guy has rhetorically won because he has a snippy “pragmatic” argument while you replied with a whole paragraph.

Let’s be pragmatic. I care about working code. I care about the bottom-line of the business. On and on and on.


“that works” is doing a lot of the heavy lifting. I take it to mean some point in the quality space where the number of bugs (noisy crashes or silent inaccuracies) is almost zero.

“that works” can also mean demo to pointy haired boss and it works one time on a developers laptop. But that really means “that seems to work”.

There are different concerns at play:

Are you writing redundant tests?

Could tests be made redundant by, say, making better tests, more sophisticated type systems, or better architecture? “Make illegal state impossible” kind of things.

For example, in C# the private keyword is an assurance of a constraint. Your compiler will check that the illegal state cannot be created by an external class corrupting it, reducing the scope of code that needs to be checked.

Modularizing code well can also help. Making it easier for developers to understand how things fit together, further improving quality.

“10000 tests + shit architecture < 1000 tests and good architecture” will often be true for the real definition of “that works”.
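As a rough illustration of the "make illegal state impossible" point in Python terms (assuming a static type checker such as mypy runs over the code; the Ok/Err names exist only for this sketch): the two legal shapes are separate types, so the states you would otherwise have to write tests for cannot even be constructed.

    from dataclasses import dataclass
    from typing import Union

    @dataclass(frozen=True)
    class Ok:
        value: int

    @dataclass(frozen=True)
    class Err:
        message: str

    Result = Union[Ok, Err]

    def describe(r: Result) -> str:
        if isinstance(r, Ok):
            return f"got {r.value}"
        return f"failed: {r.message}"

    # There is no test for "value and error both set" or "neither set":
    # those states cannot be expressed, which shrinks the test surface.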


I'm shocked by how many developers check in code that passes the tests but they have not actually tested to make sure it works.

Also, I'm tired of people not factoring in (and not honestly reporting) the time and frustration spent yak shaving to keep testing infrastructures working. I believe it's because folks convince themselves that acknowledging time lost to the ritual preparation for testing is a kind of weakness - surely other people aren't having those problems, or you'd hear more about it.

In reality, you can only write tests to cover the cases you anticipate. Correlating test coverage with reliability can be deadly; instead of losing sleep, periodically make sure that you can restore your backups to a production state and maybe even run some drills to see how your team responds when an unanticipated problem arises.


Tests are unbelievably useful for updating libraries. Every time I update rails I see a ton of specs fail all over the app highlighting breaking changes not mentioned in the docs. Stuff that is impossible to anticipate otherwise.


This is where a nice type system and compilation checks in CI are very useful to have.


I agree a type system does wipe out 80% of the tests you need, but I still feel like the 20% is useful. You can just write tests that verify the output looks right without having to run every single line of code to make sure there are no typos or type issues.


Yeah, I think that's one of the driving forces away from high code coverage / dogmatic TDD: dynamically typed languages used to be a lot more popular, and you really do need incredible amounts of testing to keep those working reliably in the long run. Now that typed languages are more popular — while some of the tricky logic parts are still worth testing — you can wipe out a lot of the rest because it simply won't compile when, say, you passed a nullable variable to a function that didn't expect it.


Types don't replace tests.

Tests don't replace types.

For either statement to be true, they would have to be entirely equivalent, at which point it would be a distinction without a difference to complain about.

And last I checked no type system automatically generates and minimizes failure cases.

And last I checked no test system can be used to formally prove anything about the behavior of code.


The two are definitely not equivalent, but they overlap.

Types can replace tests - if an edge case exists but I can define it out of existence with my type system, then I no longer need tests for that case. And likewise, tests can replace types - I can use dependent type systems to define invariants in my system but at a certain point it becomes so unwieldy that it's easier to use tests to define those invariants less formally.

In the end, they're just two different tools that both serve the overall aim of software correctness. Demonstrating correctness with one tool may be easier but less thorough, or more complicated but more precise, or whatever else. In this context, if you can get the type system to help you, you'll be able to get away with a lot fewer tests because you don't need to test things like "does this function call a method that actually exists?" or "what happens when this function is passed bad data?".


Why does the statement need to be either true or false, and why would tests versus a type system have to be entirely equivalent? The parent comment mentioned that 80% of tests can be eliminated with a good type system, not that a good type system can entirely replace tests.

IMO working with a good type system for a decent length of time should absolutely make it apparent that a whole host of basic but pervasive errors are practically eliminated, or at the very least heavily mitigated. Silly errors like thinking an array of numbers is just a number, or that an Id is a string instead of an int. Every developer I have ever worked with makes these mistakes all the time, especially the developers who say they have never needed a type system or that type systems somehow limit the expressiveness of their code.

I would really like to know if there is an efficient way to write tests that can adequately cover these silly errors without overburdening the codebase with heaps of extra test code that causes rigidity and friction for refactoring.
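For illustration, a tiny sketch of the static alternative, using Python's NewType with a checker such as mypy (UserId and load_user are made-up names): the "Id is actually a string" class of mistake is flagged at every call site without any test code.

    from typing import NewType

    UserId = NewType("UserId", int)

    def load_user(user_id: UserId) -> None:
        ...

    load_user(UserId(42))   # accepted
    load_user("42")         # rejected by the type checker: a str is not a UserId
    load_user(42)           # rejected: a plain int is not a UserId either

    # Covering the same mistakes with tests would mean hand-writing a
    # "wrong type passed" case for every call site.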


> Why does the statement need to be either true or false,

They never said that, btw. They said that for a logical system you'd need "types replace tests <=> tests replace types." While I don't agree from a mathematical point of view, their meaning is clear enough. It's clear that you can have typed systems and still need tests, and well... tests obviously don't enforce types. Plus, how does everyone in CS not know that logic systems are not binary but ternary?


> passes the tests but they have not actually tested

Surely this is a problem with the tests then? Whatever manual steps you're referring to with "actually tested" should be automated and part of the test suite.

Much easier said than done in some cases obviously, but I've seen some cases where people were manually doing things to "actually test" their code that could obviously and easily be automated.


> Surely this is a problem with the tests then? Whatever manual steps you're referring to with "actually tested" should be automated and part of the test suite.

As far as I know there is no real way to automate ensuring that the code being submitted is tested.

Coverage gates are the closest, and coverage is routinely decried on this here site. The only thing that actually comes close is a very strong type system obviating most tests… but not all, and we’re back to square one.

Then there’s the issue of tests testing something useful, yet not over-testing.


I can relate to the issue: tests work with specific check conditions, and there might be other side effects that are not critical to the test but still impact the user.

For instance, a page that loads properly, but surprisingly slowly. If you had no requirement for speed and it still loads within the test timeouts, it will pass fine, but a manual check would have raised the issue and perhaps surfaced well-hidden underlying bugs.


It amazed me when a dev on one of my teams put up a code review with no tests (to be added "during the QE pass later" (a practice that itself has issues but in this case was at least plausible)), but which, from my quick inspection, couldn't possibly work. So I downloaded the patch, built it into my local app, tested it... and yeah, it didn't work. At that point it was faster for me to actually make it work and just submit the new diff instead of doing the back-and-forth dance over code review.

My team did pretty good on reporting the time spent (sunk) to BS like infrastructure, flaky tests (especially those involving complex selenium automation which additionally suck for speed because they require the whole app to be running), and out-of-our-team's-hands problems in the wider company. I think it helped that the most senior devs ran into the same problems and felt fine complaining about them as a way to explain why something is taking longer than might otherwise be expected. We also had interns every summer to run into problems that hadn't been fixed and we had just grown accustomed to or found our own mitigations. Sometimes it's just a knowledge issue, being honest about it means someone might be able to help. Even new members with "lead engineer" in their title can learn new things that were taught to the last crop of interns and be more productive. There is a downside risk in that the excuses can be taken advantage of and someone can get away with doing pretty much no code work for days. The excuses can also be taken advantage of to get some various work in that doesn't nicely fit into its own work ticket, though... Some slack in orgs is important and comes in various forms.

Property testing helps cover some cases you don't fully anticipate. It's worth integrating a library (https://hypothesis.works/articles/quickcheck-in-every-langua... is old but at least points to a bunch for different languages) into the automation infrastructure even if it's not used all that often.
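A minimal example of what that looks like with Hypothesis in Python (the run-length encoder here is a made-up function, not from any project in the thread): the property "decode inverts encode" covers inputs nobody would think to write by hand.

    from hypothesis import given, strategies as st

    def encode(s):
        # Run-length encode: "aaab" -> [("a", 3), ("b", 1)]
        out = []
        for ch in s:
            if out and out[-1][0] == ch:
                out[-1] = (ch, out[-1][1] + 1)
            else:
                out.append((ch, 1))
        return out

    def decode(pairs):
        return "".join(ch * n for ch, n in pairs)

    @given(st.text())
    def test_decode_inverts_encode(s):
        # Hypothesis generates empty, unicode, and very long strings,
        # and shrinks any failure down to a minimal reproducing input.
        assert decode(encode(s)) == s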


I'm with you in spirit but I'm not shocked given how many people say that they don't need to write docs because their code is so well written that it is self documented. Which tells me your code is any combination of 1) spaghetti 2) highly unoptimized 3) a holy piece of text. What's the saying? 30 years of improvements in programming has completely undone 30 years of improvements in hardware? Moving fast and breaking things is great to get going, but eventually someone has to come around and clean up all the mess.


I never heard people say their code was well written (I wouldn't personally), but many of us have stopped reading docs and comments outside of ones specifically added in surprising locations and vendor-provided docs.

Many orgs have mandatory docs and comments, yet the devs find nothing specific to write about (they already wrote design documents), and deep down in their hearts don't want to maintain them either. So you get bland and sometimes inane docs and comments, with a bunch of it going stale for a combination of reasons. To the point where, 9 times out of 10, you feel you wasted your time reading that prose, when it wasn't straight-up misleading or factually wrong.

It takes real dedication, discipline and talent to have good documentation, and I wouldn't expect any random dev org to be able to pull it off.


> I never heard people say their code was well written

It's shockingly common...[0]

TBH there are very few people I've worked with who document, comment, or provide any other form of help. You can see in that thread that much of my frustration at work (including in big tech) is that people hand me aliases and call them scripts, or hand me hard-coded spaghetti and expect me to figure it out without handing over any of the files. But I'm also a researcher, though I have worked on some projects in big tech, and this happens more.

> It takes real dedication, discipline and talent to have good documentation, and I wouldn't expect any random dev org to be able to pull it off.

I don't disagree, but I think writing docs makes a better programmer. You have to think more about your code and methods. It's like the Feynman technique of learning: teach others. I generally write docs for myself because I need to remember. Because writing comments helps me not rush and think a bit more about what I'm doing and where I'm going or could go. As a researcher I'm changing what I'm doing all the time so the latter is quite useful, thinking what I'm going to come back and hack on and making it easier or providing myself hints to come back and rethink. You're right, it takes more time, but often it is faster too. I definitely think it has made me a substantially better programmer and why shouldn't it? A professional athlete reviews their performance, analyzes, criticizes, and rethinks their plan of attack. Are we not doing the same to better ourselves? I hope the pay we get is enough to not just check out and coast but I don't want to be average. I want to be better than I was a month ago.

[0] https://news.ycombinator.com/item?id=39476290


> I'm shocked by how many developers check in code that passes the tests but they have not actually tested to make sure it works.

That's actually one of the other topics I wanted to write about yesterday. Chose to write the article above instead.

Yes, 100%. I've seen it many times: manually testing reveals more in 1min than hours of previous discussions/reviews/test-writing.


Manual testing runs into the same problems: just like coupled, messy code needs to be mocked to high heaven, to the point that it's hard to be completely sure you're testing the functionality, messy code is also really hard to test manually.

I worked at a place that had a 100% test coverage requirement and the codebase was highly coupled so there were mocks everywhere and tests were ridiculously brittle. Manual testing was nearly a non-starter because it was a microservice architecture with a lot of moving pieces and non-deterministic docker builds failed with surprising regularity (to the point that people shipped development VMs around much to the chagrin of the project "architect"). Getting the stars to align for docker-compose to result in a working test system, then populating it with whatever specific data was needed for that test case so you could actually walk through the flow was a minor miracle.


They've written tests, but not verified that the tests work. Specifically, not verified that the tests can detect the failure mode they claim to.


You must always verify that the tests pick up whatever it is they're testing. You do that by intentionally asserting a wrong value, or by running the tests before the bug was fixed, for example... it's not hard.


> passes the tests but they have not actually tested

Maybe that's where TDD should be mandatory? Before you commit a fix, you should commit a test that fails without your fix.

CI could check that, somewhere along the branch created for solving the issue, tests were added that fail without the fix.


I have maintained a widely used (250-500k+ unique IP downloads/month consistently over the last 6 years) interactive terminal program that had essentially no tests of its interactive behavior. Building and restarting the program took a minimum of 15 seconds no matter how trivial the change. It became an absolute nightmare and I eventually stopped contributing because it was so frustrating to work with.

Writing good automated tests for interactive programs has to be done from the beginning or it will be almost impossible to effectively add later. To test an interactive program, you need to be able to simulate input and analyze the output stream. This can be done efficiently and effectively but again only if the program is designed this way from the beginning.
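A bare-bones sketch of that design in Python (the command loop is hypothetical, not from any real editor or terminal): the program reads from and writes to injected streams rather than touching the terminal directly, so a test can drive it with plain strings.

    import io

    def repl(stdin, stdout):
        # The loop never touches sys.stdin/sys.stdout directly,
        # so tests (and scripts) can substitute in-memory streams.
        for line in stdin:
            cmd = line.strip()
            if cmd == "quit":
                stdout.write("bye\n")
                return
            stdout.write(f"unknown command: {cmd}\n")

    def test_repl_quits_cleanly():
        out = io.StringIO()
        repl(io.StringIO("help\nquit\n"), out)
        assert out.getvalue() == "unknown command: help\nbye\n"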

At some point, if they continue on this trajectory, I would expect zed will end up in a similar state to the program I described above. It will become extremely difficult to reason about the effects of ui changes and very painful to debug and troubleshoot. Regressions will happen because relatively common cases (say 1-5% of users) that the developers themselves don't regularly use will not be noticed. Bug fixing becomes a game of whack a mole as manual changes to fix one case break another without your noticing.

I hope they're able to avoid this fate because it was hellish for me.


One of the scariest stories I've read on HN - the amount of technical debt must be staggering...

see https://news.ycombinator.com/item?id=18442941 - a day in the life of an Oracle DB developer.


[flagged]


> multi-billion dollar per year marketshare

At some point you have to wonder if it's worth it.


Indeed, as we usually focus on economy of scale and neglect the law of diminishing returns.


>Building and restarting the program took a minimum of 15 seconds no matter how trivial the change. It became an absolute nightmare and I eventually stopped contributing because it was so frustrating to work with.

Wait. You’re complaining about a 15 second compilation and startup loop?

Don’t take this the wrong way, but I don’t think compiled languages are for you.


> Don’t take this the wrong way, but I don’t think compiled languages are for you.

You just need an adequate build system that can perform incremental compilation and does not run a whole lot of unnecessary steps on every build no matter what changes.

Where I work, our system is huge and includes code written in multiple languages. A change to a "leaf module" (one which is not depended on by many modules) takes a second or two. You only get into 10s of seconds if you change code that affects the external API of a module which triggers the re-build of many other modules (a "core" module) - because in such case there's no escape and many LoC must be re-compiled to verify they still work.

To keep it this way is not easy: you must constantly fix mistakes people often make, like adding a new build step that runs unconditionally - everything should run conditionally (normally based on whether its inputs or outputs were touched) - and optimise things that take too long (e.g. slow tests, or moving code that changes often from core to a leaf module).


That's very unhelpful.

I abhor long compile times and I exclusively use statically typed, compiled/transpiled languages. The solution isn't to just shrug it off, but to seriously evaluate how difficult it would be to refactor into smaller modules and whether the benefits would be worth it. Sometimes it's not worth the hassle. But if it's getting to a point where velocity is a concern, and a potential major version is on the horizon, a refactor to reduce code debt and make things modular can be a real possibility if explained correctly to the right stakeholders.


It’s not meant to be helpful. It’s simply the truth. They’re literally talking about 15 seconds.

I’m sorry, but if you’re looking for subsecond compile times, you’re simply not going to get it in C, C++, Java, or really any statically typed compiled language for any project that isn’t trivial, no matter how many dynamically linked libraries you break your project up into.

They want a REPL, and they’re just not going to get one while dealing with these technologies.

Even your idea of creating a million tiny libraries to achieve less than 15 second compilation and launch time is insane, because now you’ve just “solved” developer efficiency by creating a deployment and maintenance nightmare.

It’s not a serious solution.


> I’m sorry, but if you’re looking for subsecond compile times, you’re simply not going to get it in C, C++, Java,

I always get subsecond compile times when actively working in C, until headers are changed. Single files take milliseconds to compile, and you ordinarily aren't changing multiple files at a time.

A 2000 line module I was working with was compiling in about 20ms, with a clean performed beforehand.

My last project, around 10k sloc in about 6 modules compiled and linked in under 1s.

The exception to this is embedded environments which run cmake more often than required (specifically the ESP32 IDF) or similar.


This is not true at all in any of those languages. For example, in C, with a good build system setup, modifying just one .c file means only that file needs to be recompiled; then you only need to run the linker.

Even if the program is very large and linking everything takes many seconds, breaking up the program into dynamic libraries should get you sub-second compiles.


Agreed. Writing a core system component that isn't going to change and will be called from thousands of different places? By all means, heavily test and fuzz and cover the entire API surface. Writing some product feature that is probably going to be changed 1000 times and is at the edge of the system? Waste of time. You'd be better off just having really good metrics and alerting for system degradation that measure actual business metrics, staggered canary roll-outs, and easy rollbacks.


> Both are among the highest-quality software I have ever used and hacked on.

> Both have less tests than I expected.

Interesting bit of context: Zed's Nathan Sobo and Max Brunsfeld are both alumni of Pivotal Labs, a firm _notorious_ for its zealous adherence to, among other things, TDD. I don't, therefore, find it surprising that "neither codebase has tests, for example, that take a long-ass time to run," because the more stringently one test-drives, the less patiently one tolerates slow test suites. Besides that, after a serious investment in test-driving, one starts to learn which sorts of tests have paltry or even negative ROI; "tests that click through the UI and screenshot and compare" were right there at the top of the chopping block for most Pivotal devs, and tests that "hit the network" were explicitly taboo!

I think that Kent Beck quote is great, but it's good advice for people who test too much, rather than devs in general or, god forbid, junior developers!

The way I think about TDD, it's just like how after a while, one gets tired of copy+pasting code into the terminal, and reluctantly writes it to a file. One gets tired of emailing files, and checks them into version control. One gets tired of manipulating state--of clicking through the UI, of newing up a bunch of collaborators in the REPL, of smashing tab while blanking on the name of the method one has only just written--and writes a test.

And maybe one who wakes up tired writes the test first. ;)


I dunno, in my career I’ve found that UI e2e tests were the ones that actually found bugs. All the unit tests that I’ve written were mostly ceremony, to increase code coverage, etc, and were the first to need refactoring after some code change. A lot of refactoring.

The tests that stayed true were the UI tests that almost always pointed to real problems with real code.

It took a while to figure out how to write them though - trying to rely as little as possible on the internal ids / html / css and writing them in terms of what the user sees - e.g. instead of clicking on the button with “testid=login” we would “click on the button with the text login in it”. Identifying fields by the labels attached to them, or tables by their column headers. It was inspired by Rails’ Capybara testing lib.
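For a rough idea of the style (this sketch uses Playwright's Python API rather than Capybara, and the URL, labels, and button text are invented): selectors key off what the user sees, not internal ids or CSS.

    from playwright.sync_api import sync_playwright

    def test_login_by_visible_text():
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto("https://example.com/login")

            # Locate fields by their labels and the button by its text,
            # so markup and CSS can change without breaking the test.
            page.get_by_label("Email").fill("user@example.com")
            page.get_by_label("Password").fill("hunter2")
            page.get_by_role("button", name="Log in").click()

            assert page.get_by_text("Welcome back").is_visible()
            browser.close()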

And making sure the tests were not flaky, fast to execute and isolated took some time.

But it was so worth the investment - it’s surprising how little those tests would change, as it allowed us to fearlessly refactor stuff without changing any tests. They felt a lot more like a friendly QA helping out rather than an annoyance you had to deal with after you’ve finished writing your code.

And writing them was actually fun, since you didn’t have to understand how the app worked, fiddle with brittle css identifiers etc, you just wrote the steps you thought the user should do and saved it into a file.

Being UI tests kinda meant they tested the whole system with the various micro services involved, databases and other infra. And I think this is where most problems in software arise, at the edges of systems when they try to interact with each other.

Static types, immutability, automatic API schemas and validators usually make sure the code one writes executes reasonably well; it’s where that code starts interacting with all the other systems that people usually can’t anticipate things. And that’s where the integration / e2e / UI tests help the most, I think.


I was a big fan of the unit testing done at JP Morgan. I found 1 exploitable bug when writing Unit Tests, for an untested swath of code that was of minor concern. Ofc this was one day out of a ticket that took a few days. After a year, when I left, it was the only bug that I found with a unit test, there. I wrote unit tests for everything during my time, and I found it to be a good practice. Do I always do it at other workplaces? No. Do I do it for my own projects? Yes. It's not that I don't care as much at work, but the time pressure is constant and I sometimes skip them if I can.


This mirrors my experience - tests working with real software running on real infrastructure uncover real bugs, and are easier to write and maintain in the long run. You still need unit tests to verify edge cases such as the database going down or throwing a constraint violation.


There’s a flip side. A culture that thinks end-to-end testing is the only legitimate way to test a system and that TDD is an unnecessary expense that is neither necessary nor sufficient for success.

That culture is right, but it’s also wrong [1]. Although it is true that you can spend most of your time testing while providing little value, it’s equally true that, by developing any software system, you will break it in subtle ways; ways in which you could spend weeks fixing bugs you introduced and had fixed before, therefore creating little value.

Unfortunately there is no substitute for critical thinking when engineering. This piece says we need to practice critical thinking with tests and that having a test that moves a mouse and clicks to confirm functionality, with suites that take 40 minutes to run, is going too far. Ball cites two examples of projects that strike a balance in testing speed and coverage. But how do we achieve this?

I used “culture” in a specific way to describe working environments. Culture, not process, politics, or incentives, drives how you do testing. A culture is an environment that has preferences. A culture that will generate good tests is a culture that values technical rigor. Rigor is important for forging tests, but it’s also important for removing tests. To make these good test suites, we need to be comfortable with removing tests, but more importantly understand why a test is needed and remove it when we can’t explain it.

[1] I’m not going to say there’s a “balance” to be struck, because that’s the language of people who say we should not write any tests.


Testing is wonderful - if one starts with the goal of delivering the highest quality per unit of time spent.

If you start with a goal of coverage, or of every little thing having X unit tests, it becomes a pain for everyone.

Integration tests are the hardest. They test a huge surface area. They are slow, flaky and hard to debug when something is wrong.

But they offer the biggest bang for buck since they test it end to end from the user’s perspective.

Unit tests are wonderful for libraries with well crafted and stable interfaces.

The bottom line is always making a pragmatic decision - for the unit of time spent, where can I add guards with the highest bang?

SQLite is SQLite because they are obsessed with making it battle-ready.

Most great quality software has a team that takes the quality bar seriously and has tests to hold that bar.

Some things don’t need that quality bar because it’s not proven anyone will use them.


Tests are very much a "if you liked it then you shoulda put a test on it" thing. If some end-user property is desirable to you, make sure a test covers it so that you can be sure it still works that way. This is a godsend when doing extensive refactors on the codebase, which lets you move fast.

https://twitter.com/simonw/status/1701764953114546664:

"The single biggest productivity enhancement I've ever found for my own personal projects is writing comprehensive tests for them. I don't mean TDD - I rarely write tests first - I mean trying to never land a feature or fix a bug without a test that proves that it works"

> No tests that click through the UI and screenshot and compare and hit the network.

That's fair, I think testing UI is just hard in general and it's easier to rely on users to submit bug reports especially if the UI rarely breaks.


1. There is no substitute for simplification and good design. You cannot test yourself out of a mess.

2. What you actually want is confidence in the code, and your ability to make changes. Sometimes tests are a good tool to achieve that, sometimes they are not.

3. The previous two points can and will be used to justify sloppy code by bad programmers, but it doesn't make them less true.


> You cannot test yourself out of a mess.

My dev shop has taken over two dumpster-fire web app projects, both many years old, having passed through many unskilled/inexperienced hands, with atrocious architecture and implementation, and no tests. But both are actively being used, and a full rewrite isn't in the cards for some time.

The very first thing we did for both was start writing integration tests at the http (wsgi client) layer to cover the majority of client-facing functionality. We learned a lot about the apps in the process, fixed bugs where we could, documented and fixed lots of security issues, but generally didn't refactor anything that wasn't very broken.

Once that was done, we could start refactoring which, for one project, included a migration off Mongo to Postgres tables. The refactoring included significant unit and other testing so that we had very good and helpful test coverage when we were finished.

I do believe there are times when testing yourself out of a mess is the only reasonable option.


Fair enough, and I have had similar experiences. I would note a few things:

- Sounds like a major undertaking.

- You note this was done in service of refactoring. So you didn't test yourself out of the mess so much as use testing to enable the simplification that got you out of the mess.

- I would argue that what is really happening here is that, by spending the time to create these tests and refactor, the current team is creating a shared mental model of the messy codebase -- they are making it understandable, at least in large part. So, you might amend my original statement "there is no substitute for simplification and good design" to "there is no substitute for making the code comprehensible". While a simple, good design is the best way to do this... it can also be achieved by having everyone expend more effort to understand a bad design well enough to work with it. And this latter strategy is, in practice, the far more common one.


Editorializing the title to something grammatically-incorrect? Please don't.

I consider it significant that both of the examples he cited are interactive programs. Thorough testing is low-payoff for those, for two reasons: it's common to tweak behaviors a bit, and if something is broken, you'll notice in the process of using it. Not no tests, but fewer tests, makes sense.

At the opposite end, I'm working on a VM, and you better believe it's got tests. Not enough, it needs more, it always needs more, but they're a godsend. When I add an optimization to the compiler, I want confidence that it hasn't broken other behaviors, which frequently it does at first. They've enabled several refactors, with at least one more big one on the roadmap. Change one end of the pipeline, change the middle, change the end: half the tests fail, figure out why, the tests are green, all is well.

It's almost a tautology but: write tests for a reason. Good reasons change over the lifecycle of a program. Early on, fixing simple invariants and preventing regression is a good motive, but with the recognition that things are going to change. Dogmatic TDD can lock in a design too early, if literally everything has a test right from the beginning the effort of every change is multiplied.

For a mature systems program designed to be robust and load-bearing, complete coverage could be a good goal. For something like a text editor or terminal emulator, that's probably overkill outside of the core components. Tests aren't free, but no tests can get pretty expensive too.


Speaking of text editors and tools like that, you can often avoid having tests (or postpone adding them for a long time) if the logic is on the main execution path, meaning you'll execute it every time you run the program, and whatever failures can happen are reasonably easy to pinpoint (i.e. the program shows error backtraces or otherwise traces problems).

This is from my experience hacking on Emacs, naturally.

At the same time, projects that you might ship for an employer or a client are more critical to check for correctness before deploying, and are often more complex to run and check manually on the regular than it is to write at least one "happy path" integration test for the main scenario (or several).


An interesting correlation that I have observed over the years is that the more one is "religious" about writing tests, the less actual understanding of the code one seems to have.

"Beware of bugs in the above code; I have only proved it correct, not tried it." - Donald Knuth


I've faced that problem before: someone claimed their autoscaling algorithm had to be working correctly, as he had written a proof and had a working simulator behaving perfectly. And yes, the code matched the algorithm as written, and the proof was correct... as long as latency was zero. Once I added lags to the right places in the simulator, we got the exact same problems as in production.


I've seen a correlation between caring about the application code and the test code. That makes sense because writing, reading and running specs help you think about what you're intending to do and what you've accidentally done.


If you have good tests, you don't need to know the code as well.


The more I program, the more I feel this way. Do I really care what the code looks like or how it does something? Not really; all I know is this set of specifications is held true as I make changes, and as long as I'm happy with those specs, I'm happy with the code.


On the small I wholly agree: how much energy do some teams or even companies still waste on style-guide types of disagreements? But in other respects, I do often care how the code "looks" or, more precisely, "does" something. Like, it shouldn't be needlessly wasteful of resources, but I don't want to pigeonhole myself as "the performance guy". It shouldn't be using under-educated idioms that increase the likelihood that someone's gotta come back to this later to fix something stupid like a null pointer exception. At the same time, certain things that sometimes get derided as ivory-tower complex (or just "clever") code constructions shouldn't necessarily be avoided all the time, but should be tastefully balanced with an aim towards broader understanding and not showing off cleverness for the sake of cleverness. I've replaced so many hundreds of lines of code with some relatively simple tens-of-lines type-theory constructions in plain old Java 8; just because a dev who stopped learning around Java 1.4/5 doesn't understand them doesn't make them "complex" or "clever". I don't even particularly like static typing, but a tool is a tool.

But often I just don't care about even those details, and sometimes feel guilty about it. Carmack says to fill your products with give-a-damn, but I'm sorry, for so many things, a lot of the time I just don't/didn't. Work must be done anyway though, and not just by me. So practices that are broader, like enforcing tests, do help deliver a good enough product even under the guidance of devs and management and management's management etc. who seem to care even less than I do and don't even feel bad about it. It's particularly crazy when you talk to a customer who is gushing about something you know could have been even better with slightly different prioritization and tradeoffs; an important lesson is that many people are stoked merely that something exists.

It's sad when tests catch something that really should have been caught, if not during code authoring time by an author who actually cares a bit more than just doing the job however and going home, then by code review time, but in large companies you sometimes have to just accept things for long periods and at least with more tests we can be more likely to catch things at all. (Not to mention they're pretty valuable when the original author who best understood the code is long gone, they help you make minor tweaks without having to tradeoff new feature development time with time getting a deep enough understanding of that old code that still mostly works most of the time.)


> have been caught, if not during code authoring time by an author who actually cares a bit more than just doing the job however and going home,

I have to ask, what's the alternative type of author to one who does the job however and goes home?

Because doing the job properly takes more time, where does this extra time come from? In any agile setup the dev who goes slower but better is going to get dinged in every single standup and metric, so the only other alternative to doing the job however is unpaid overtime, which I think is even worse for a Dev team than shitty code.

IOW... Good, cheap, fast... Pick two and don't judge those who don't give you all three.


Edit: Seems you're based in South Africa? My thoughts are biased from the US perspective, so take them with a huge pile of salt.

Surely you've worked on something with passion before? At least for me, there's additional energy, focus, and clarity of thought when I'm really engaged with something; this counts for more than just extra hours of detached professionalism (let alone negative mindsets). The quality of work and the hours put into it aren't so tightly intertwined. The "how" can matter quite a bit, and even if I don't get around to implementing it until the next day at the office, I don't mind thinking about something over dinner or in the shower when I'm technically not "working". (Salaried employees don't have to "clock in" / "clock out" anyway.) I'm additionally much more motivated to engage in self-learning of new things that I may be able to put into practice on the work later. A single tiny example is just knowing about regex can save tremendous amounts of work. Same with parser generators. Softer things like the "Mikado Method" are more ehhh for whether they're worth it.

Quite a bit further down the scale, for people who don't care at all how something gets done, simply not half-assing things goes a long way. With even a slightly more sophisticated look into productivity, it also becomes clear that you end up going faster when you're not having to go back and half-assedly fix the flaws of the half-assed thing done a few months ago. It takes time to do code review? Yes, and? That time pays for itself; no one needs to start working more than contractual minimums in order to get planned features out on a good schedule because of code review. In fact, given it's the only practice that consistently replicates in studies on quality, introducing it if it's absent will probably speed things up over longer time scales even if some people get mad they can't just merge their diff into master right this second. (Code review is very much a "we care at least a bit about how the job is done, not just whether it's 'done'" thing.)

Though sure, you can also put in more time than the contractually obligated minimum. Sometimes that's fine. But it's not required. There are plenty of devs out there who do a fine job with professional detachment and make sure to be done and gone by a certain hour. But that requires caring at least a bit about professionalism, and not just hacking crap together to increase your ticket close rate.

> In any agile setup the dev who goes slower but better is going to get dinged in every single standup and metric

Not my experience, nor that of many other engineers I've talked to who have worked at a lot more places than I have. I'm sure it happens, maybe I'm lucky. Admittedly I don't know anyone who has worked for Amazon.

If it's a concern, I'll say it's helpful when you get team buy-in to care at all about quality, then expectations are easier to set and maintain even in the face of re-orgs. If you're the only one who cares, or wants to care, yeah, it'll be a struggle.

There are occasionally stories, and I've seen it a few times, of devs who just go slow, they don't actually do anything better. Sometimes those "dings" are to some degree valid, but there's usually more going on than apparent speed in closing tickets and even a toxic PIP process where someone's really just out to get someone else for personal reasons will involve more "evidence gathering" than just such a metric. Anyway, when a worker is actually just slow and not better, then instead of trying to pick two of good/cheap/fast you get 0. (Since unless you're dealing with a lot of offshoring, a lot of orgs and even startups have already abandoned cheap.) We should all strive to be good and fast regardless.


You've got a long post, so I am going to work through it point by painful point, while still being as brief as possible :-)

To recap, the question I posed is

     I have to ask, what's the alternative type of author to one who does the job however and goes home?
> Surely you've worked on something with passion before?

I don't think that is relevant to the question. It's unreasonable to expect people to actually be passionate about work. Engaged while on the clock? Professional? Competent? Studious? Yes to all of the above.

I would answer the same to someone who says "I always put in an extra 2 hours each day. Surely you've done the same too?": Just because you do it, does not make it reasonable to ask of employees.

> I don't mind thinking about something over dinner or in the shower when I'm technically not "working".

That's also not relevant to the question. Just because you don't mind doing legwork during dinner does not make it a reasonable ask of someone else. Maybe they want to talk to their kids? Maybe they want to watch TV?

It is beyond unreasonable to expect that staff will forgo their private time to think about work problems.

> I'm additionally much more motivated to engage in self-learning of new things that I may be able to put into practice on the work later. A single tiny example is just knowing about regex can save tremendous amounts of work. Same with parser generators.

Once again, this is irrelevant to the question I posed. I don't get the point you are trying to make with (what I construe as) irrelevant anecdotes.

...

Your next paragraph, using code reviews as an example, I feel can be addressed by "The Company Pays For That Because It Is Done On Company Time". IOW, the company values it enough to make time for it.

> Though sure, you can also put in more time than the contractually obligated minimum. Sometimes that's fine. But it's not required. There are plenty of devs out there who do a fine job with professional detachment and make sure to be done and gone by a certain hour. But that requires caring at least a bit about professionalism, and not just hacking crap together to increase your ticket close rate.

Here's the meat of the argument, I think: just hacking crap together to increase the ticket close rate.

My original point is that, when this is done, this is not the fault of the employee! If the company incentivises that behaviour, then you shouldn't be judging your coworkers for following that incentive.

After all, as you've demonstrated with the code-review example, if the company wants a particular output, they can so very easily get it! If the company wants non-"hacked together crap", they can incentivise that.

IOW, you are negatively judging some person for doing exactly what is wanted by the company! What you want is not relevant to the company.

The unfortunate reality is that "hacked together crap" is more often than not more valuable to the company than quality software as you define it, because if it wasn't, the company would stop incentivising "hacked together crap".

When the company removes ticket-closing rates in its metrics, and replaces it with "quality" (however you define it) then you won't see "hacked together crap".

Until then, it's almost certainly bad form to judge people who are simply optimising for the metrics that they are being measured on.[1]

The TLDR would be "You're blaming the players when it is the game that is rigged". Your ire is terribly misplaced.

[1] I worked, once, at a place that prioritised quality over development velocity. Four years of writing code for embedded devices in C, and I never once shipped a bug! _Not_ _Even_ _Once_ ... go on and guess what my bonus for that feat was? At a different place, I hacked together some spaghetti in a language I barely knew, within a week, with no tests and 10% of the happy paths broken. The customer saw it, signed on, I got a nice bonus for ensuring that potential customer turned into an actual customer, and some other poor shlub was given the project to move forward with the customer.

IOW, I am both the person you say you are and the person you say you dislike, and having both perspectives, vs your single perspective, makes me a lot more sympathetic to those who do an okay-not-bad-but-not-actually-that-good job.

I hope that you, one day, can enjoy both perspectives too.


I appreciate the thorough response, and concede my communication hasn't been very clear. I have been both types as well, I thought that could be inferred from my first comment you replied to.

I think you have a Deming perspective, which I agree with, that employees work within the system which the company has created. Or more specifically, management has created, because management is responsible for the system. Where our disagreement lies is probably how much employee responsibility and output variance that actually bounds. Maybe we also disagree on how much influence non-management employees, particularly software engineers, have in inducing management to change the system, and perhaps even whether they should bother to try and what the range of likely outcomes is when people do try.

It is in those things that I think there is a lot of opportunity for alternatives an individual has to just doing the job "however", or doing the job "while just following the incentives", that don't require working overtime.

I also still see many alternatives to "however" that boil down not to increased time but just to increased thought or knowledge. Maybe you'll argue there's often no incentive for acquiring or utilizing such thought or knowledge, or maybe not if you think it's reasonable to expect competence and studiousness. But whichever, fine; still, if I already have it, well, I can use it or not. My irrelevant examples were trying to get at this. I'll try once more with something rather trivial: there is no time difference, really, in whether you return null or an empty collection from some function when there are no things. There's no time difference really when writing the calling side to do the null check before you loop vs. just going into the loop. There's no time difference really to write the arrange-act-assert unit test on this behavior, if unit testing is your thing. So do whatever, right? There's nothing in the system to incentivize either option? Regardless of what you do, you'll go home by 5, and you haven't had to rob the story of time (aka delaying it) in order to do something in a slower but correct way. I submit though that one of these options is "the correct way", and will flag it in a review if I see the wrong one done.
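To make that concrete, here's a minimal sketch of the example (hypothetical names, JUnit 5 assumed):

    import java.util.List;

    import static org.junit.jupiter.api.Assertions.assertTrue;

    import org.junit.jupiter.api.Test;

    class OrderServiceTest {
        // Hypothetical service: returning an empty list instead of null costs
        // nothing extra to write, and every caller can loop without a null check.
        static class OrderService {
            List<String> pendingOrderIds() {
                return List.of(); // not: return null;
            }
        }

        // The arrange-act-assert test on this behavior is equally cheap to write.
        @Test
        void pendingOrderIdsIsEmptyWhenThereAreNoOrders() {
            // Arrange
            OrderService service = new OrderService();
            // Act
            List<String> ids = service.pendingOrderIds();
            // Assert
            assertTrue(ids.isEmpty());
        }
    }

Neither version takes longer to type; one of them just leaves fewer ways for callers to blow up.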

I didn't say anything about expecting extra things of individual employees -- management should only expect what they incentivize. Though thinking about it, I will say now that if we're to call ourselves engineers, it's reasonable to expect an engineer to have certain professional ethics and standards, something that makes them say "actually, as an economic free agent, I'll go find a different game to play." No, I don't think something like "doesn't or isn't allowed or isn't incentivized to write unit tests" rises to that level of standard, but I can see others disagreeing (some think no one should ever write in a non-statically typed language), and if they want to leave a place because of that, especially after trying and failing to spearhead a change, more power to them.

Anyway, my point was not that something like "passion" is expected, but that when it's there, it's yet another alternative way besides staying past 5 that can lead to actually getting more and/or higher quality work done. If "passion" is present, or can be incentivized somehow, I care enough to make use of it.

> When the company removes ticket-closing rates in its metrics, and replaces it with "quality" (however you define it) then you won't see "hacked together crap".

All I'll say to this is: "Were it so easy."


I was never a huge TDD believer and I’m not one today. It’s not about tests but about managing complexity.

When working with highly dynamic languages like JavaScript, Python, Ruby (at least couple years ago) tests were the only tools we had to handle it.

Today some of the most common issues are caught by popular statically typed languages (Rust, TypeScript). There are some very smart tools like prop tests, and if someone is really deep into modeling it's also possible to test concepts with TLA+ (fun if you need to explore infinite possibilities of reality-bending scenarios). Qualities of certain languages also make code easier and stabler in certain domains - e.g. Erlang/Elixir, Clojure or Haskell (and there are many more, but those are in my monkey zone).
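As a tiny illustration of what a prop test buys you, here's a sketch using jqwik in Java (the framework and names are just for illustration; any property-based tool works the same way):

    import net.jqwik.api.ForAll;
    import net.jqwik.api.Property;

    class AbsProperties {
        // A property states an invariant over *all* inputs; the framework
        // generates many cases, including edge values. This one actually
        // fails: Math.abs(Integer.MIN_VALUE) overflows and stays negative,
        // exactly the kind of case a hand-written example test tends to miss.
        @Property
        boolean absIsNeverNegative(@ForAll int x) {
            return Math.abs(x) >= 0;
        }
    }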

But in the end, for me testing is just that: managing ever-growing complexity. And since tracking and maintaining change due to distributed development effort is hard, the lowest common denominator I know is… write it twice.


This is a great post. I feel the same on many topics.

> I was never a huge TDD believer and I’m not one today.

I'll never forget this savage takedown of TDD by Cedric Beust: https://www.beust.com/weblog/the-pitfalls-of-test-driven-dev...

The best rebuttal to TDD that I ever heard from a developer that I respected: "I don't care if you use TDD or whatever. When you commit code, tests need to be included." That's it. And yet, the TDD evangelicals are like the vegans of diet or calisthenicians of fitness -- always annoying, no matter what they say. (Side note: I was a vegan for many years, and I still thought the loud ones were annoying!)

> write it twice

This is what I hate so much about unit tests -- you chisel the statue once from stone, then, by writing unit tests, you essentially chisel the inverse to fit your new statue. So exhausting.


I use TDD in two cases:

1) when I develop something very new to me and learn as I go. I build tests as guardrails, then move along them, fix code, fix tests, fix my assumptions. Usually it lets me move faster and be more confident about what I'm doing.

2) when there's a bug reported. I write a test that reproduces the bug first, then iterate on a fix. Helps immensely to move fast.
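Roughly like this (a made-up sketch; the helper stands in for real production code):

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class SlugTest {
        // Hypothetical helper standing in for the real production code.
        static String slugOf(String title) {
            return title.trim().toLowerCase().replaceAll("\\s+", "-");
        }

        // Regression test written straight from the bug report:
        // "titles with trailing spaces produce slugs ending in '-'".
        // It failed before trim() was added, and it keeps the bug from coming back.
        @Test
        void trailingWhitespaceDoesNotLeaveDanglingDash() {
            assertEquals("hello-world", slugOf("Hello World  "));
        }
    }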

All other cases, I have the same feelings as Thorsten a lot of the time.


And yet this is one of the best methods known to man (see double-entry accounting, which has been used with great success in finance for hundreds of years).

An interesting approach I’ve seen was - instead of writing tests - describing a step-by-step scenario for a fellow engineer to run during review. When it broke, it was due to implicit assumptions or unclear instructions. I believe it was the right solution, as the system input was triple digits of free-form documents against hundreds of configurations and rulesets. I couldn’t imagine a dataset for those. The effect was a more concise, stable test suite and a rather boring system.


Over the lifetime of a code base, most of the value of a test is not in verifying functionality. It's keeping you or someone else from accidentally breaking it when you change something.


At home, I only test parts of code where I think I need it, like regexes.

At work, as a web developer in Java Spring, I am required to have at least 80% coverage, enforced by Jacoco and tracked in Sonar. This means that if I write a method in an adapter that accepts a single parameter and only makes a single call to another method on some repository passing the parameter to it without transformation and returning the result, I must write a test for it, with Mockito.

At this point I'm not testing my code, I'm testing the Java API itself and the JVM it's running on, which I find hilarious. I'm hoping one day this kind of test fails, which will mean that Java does not guarantee that reaching a method call will result in the method being called.
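For illustration, this is roughly the kind of test the policy forces (made-up names, assuming JUnit 5 + Mockito):

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.when;

    import org.junit.jupiter.api.Test;

    class CustomerAdapterTest {
        interface CustomerRepository {
            String findNameById(long id);
        }

        // The "adapter" under test: a single pass-through call.
        static class CustomerAdapter {
            private final CustomerRepository repository;
            CustomerAdapter(CustomerRepository repository) { this.repository = repository; }
            String findNameById(long id) { return repository.findNameById(id); }
        }

        @Test
        void delegatesToRepository() {
            CustomerRepository repository = mock(CustomerRepository.class);
            when(repository.findNameById(42L)).thenReturn("Ada");

            CustomerAdapter adapter = new CustomerAdapter(repository);

            // The assertions merely restate the one line of production code.
            assertEquals("Ada", adapter.findNameById(42L));
            verify(repository).findNameById(42L);
        }
    }

It bumps the coverage number and pins the pass-through in place, but it has no interesting way to fail.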


Test coverage requirements are such a band-aid on bad engineering culture. Just instruct reviewers to look at the coverage during code review and flag uncovered important business logic, *BOOM* problem solved. If you can't trust your code reviewers to do this, that's a sign they're either lazy or incompetent.


Also, Goodhart's law.


I'm not going to comment on unit tests (of which we have many), as that seems to be where the biggest divide is, but I will say that our integration tests are worth their weight in gold.


The whole article can be boiled down to tests are an investment and you should make sure your investments have a return.

It sounds obvious but it's a valid point. It's more common for people to have a dogma-based attitude to testing than an investment-based one.


> In both codebases I’ve merged PRs without any tests and frequently see others do the same. And the world didn’t end and no one shed any tears and the products are still some of best I’ve ever used and the codebases contain some of the most elegant code I’ve ever read.

Yeah if someone actually has 10 years of experience in the product and can tell that the code is obviously correct and tests would be annoying for anyone to write, and the contributor will go radio silent if you ask for the tests, and that it makes the product better and it is unlikely to bite you in the ass, then you can ship it. It is vital to have that well-built mental model of what in the codebase is likely to bite you in the ass though.

This doesn't necessarily lead to a slippery slope where every PR is full of missing or shitty tests and you slide right into game development. The slippery slope argument is a logical fallacy.

Similarly, the perfect process isn't one where every PR is cut down to 100 lines or less, with unit tests, and two developers sit down to do code review, nitpick every other line, and then both flip the switches like they're sitting in a Minuteman silo starting WWIII.


> maybe there’s no correlation between software quality and tests

Bingo. The underlying assumption that tests are some god-given faultless spec is flawed. In fact, the tests themselves shouldn't be considered to be of any higher quality than the poor code they test.

Good teams produce good software, regardless of the approach or ideology.


Perhaps the key with testing strategy is to define what to test.

As pointed out, code coverage is not a good metric when followed blindly.

I believe there are at minimum 2 levels that need testing. First is unit testing. It should ensure complex functions work as intended as a unit.

Second, for your core functionalities, have e2e test cases. This ensures your product actually works for the end user.

Unit testing should make up the biggest part of the test suites. E2e tests should be kept to a minimum while still satisfying the quality requirements.

A few years back I wrote a bit about this based on the AAA method: https://cinaq.com/blog/2019/05/05/simple-high-value-tests-wi...

The core idea is to determine what are high value tests.


I like the way that testing as I go alters my code base, making each bit of code runnable in isolation and so easy to understand and fix.

I hate flaky tests and I hate slow tests. Figuring out how to eliminate both takes care, but it gets rid of two of the three main negatives of testing.

The third main negative for me is that testing can prematurely add friction around changing things like function signatures. Mitigating this is addressed well in my current favorite essay on effective testing:

https://matklad.github.io/2021/05/31/how-to-test.html


Tests are valuable, but of course they’re never perfect. Certain kinds of programs require them more than others. And no development approach, method, or tool is the best in all cases.

The program with the most tests that I’m aware of is SQLite. I have a feeling Hipp and his team know their code extremely well and that they do a lot of thinking before they change anything. Then they run their massive test suite, which they built because they’re good enough to know that they’re not good enough to think everything through without making errors.

Tests have saved me from a few blunders over the years.


I think our options basically come down to types, tests, and syntax.

Syntax is the least explored because it's expensive to iterate on and teams can't agree on it, so we sit around hoping that the next generation of languages will let us consume a better syntax instead of daring to build one. But a syntactical approach leads towards writing the specification in terms of the high level description, folding up lower level pieces of the problem into bits of syntax, and then having them meet in the middle with "enough" compilation. You end up with a program that is right because it literally can't be written wrong.

Types do some specification, but they work in a more bureaucratic sense of "you have to do what you mean" by checking to see if you filled out the right form. Of course teams love doing that, because they need a little bit of bureaucracy. Everyone seems to agree on some amount of type discipline being needed at scale.

Tests exist at the far end of specification, where the program is just "what it's been tested to be." This is most useful when you don't have a clear specification and you're mostly looking at the integration side of things, not deeply examining the behavior. Teams can add lots of tests, but the tests are a scattershot and don't always do anything specific. They do have a tendency to tell you how a change will break an existing specification, though.


Man I don't love these unspecified, context-free takes on "building software". The author lists a lot of tech stack, but no scope, no context, no size estimates about the software he's built.

Now we have a thread where the gal writing FORTRAN for legacy nuclear plants is telling the guy doing IT for a mom and pop how he's approaching tests all wrong. Meanwhile, the third person working in an expanding startup who has "ninja" in his job description understands neither and looks down on both.

Not very productive.


To make an even broader point—

I used to be very impressed with people that wrote programming books, spoke at conferences, and so on. Decades on, I look at the best programmers and managers I’ve ever worked with and notice they’ve never done these things. On the flip side, people I’ve worked with that were always off doing this kind of thing never seemed all that impressive when it came down to building our products and business.

There’s only so many hours in a day—-are you using them to build software or your personal brand?

One engineering leader I worked with lost out in a corporate power struggle and left to “spend more time with his family.” While he was doing that, he produced a series of blog posts. Once he found his next role the posts petered out. This strikes me as a publication model for actual high performers.


It’s tricky to balance Doing Stuff and Broadcasting Stuff. You need both. If you spend 100% of your time heads down, even delivering huge value, nobody will notice and your career will suffer. If you spend 100% of your time bloviating, you’ll never build anything of value. As so often in life, I find the best solution is to team up. Partnerships between an 80 build / 20 tell and a 20 Build / 80 Tell change the world. Woz + Jobs is just the best known example. The book Rocket Fuel calls these “Integrators” and “Visionaries” https://www.eosworldwide.com/rocket-fuel-book


Unless you are building tdd software, you don’t need anyone telling the world your team’s thoughts on tdd. It’s sold as part of the recruiting effort, but I’m dubious as to the ROI.


Viaweb/Yahoo! Shopping co-inventor and HN host and inventor Paul Graham wrote a few books on Lisp (including On Lisp).

OTOH, even Graham would say that his partner Robert Morris was the technical leader of the pair.


The broader point seemed to be that tests should be treated like an investment that ought to pay off and you shouldn't do it if it doesn't.

I think the author didn't make this point particularly well, but I agree with it, and I think it applies equally to building both space rockets and mom-and-pops.

It's also something most people don't really do.


I think that’s a good point.

I often run into folks that tell me their software is “perfect,” because their unit tests have 100% code coverage.

As I tend to write a lot of GUI stuff, unit tests are not that useful. I wrote an essay about how I tend to use test harnesses, over unit tests[0].

[0] https://littlegreenviper.com/miscellany/testing-harness-vs-u...


I think everyone has something in common though - they all think testing is a waste of time until, all of a sudden, things start not working at all and it isn't!


I don't find this attitude to be common. People always agree that testing has value at least in theory. The wrongheaded attitudes I do see a lot are:

1) Testing is a moral imperative rather than an economic investment that should be paying dividends (I think this is what the article is getting at)

2) Unit test all the things.

3) Your tests should follow a shape (like a pyramid)

4) TDD doesn't really work (for my circumstances)

5) "I don't have time to write tests."

6) Test speed is paramount

7) Flakiness is not a bug to be fixed, flakiness is inevitable


I would consider myself a TDD type of dude, but not in a dogmatic way; to be honest, I find the intent and execution to be self-evident and have never really looked into doing it “properly”.

For me, the #1 reason I lean so heavily on tests is that the software I write rarely behaves exactly as I expect and intend. And beyond that, I will not remember how a piece of code works 6 months or a year from now.

Having tests is an easy way to say “ah, this is how that actually works”. Which is relevant whether you’re trying to identify why an edge case is behaving so weird, or because you forgot or never knew how some bit of code worked.

As a side benefit, I work with some renegades whose bugs are only stopped because CI begs them not to merge their code, which has introduced a sea of red failures. If we didn’t have tests, we’d find out about these issues in a much different way.


I'm anti-dogma for everything in software, but I have been using TDD more than 90% of the time for the last 3 years. I think this is rare, and it works mostly because I don't do it the orthodox way most of the time.

I think orthodox unit test driven development works and works really well about 15-20% of the time - under scenarios where you have stateless code, a simple API with simple inputs and simple outputs and some complex logic underneath it. If you're writing something like a parser, this might be all you need.

I use a more unorthodox version of TDD the other 70-80% of the time, constructing what most other people would regard as a combination of a snapshot test and integration test. It's still TDD, but the test might be something like "load web page, enter text in textbox, click button, fix snapshot of the outcome, faking the REST API that gets called". I find this practice to be vanishingly rare, but I also find it pays large dividends.

The orthodox way to do that other 70-80% of code is to create a mess of mock objects. I don't think the people who came up with TDD necessarily intended this (with the exception of uncle bob), but I think it's responsible for most of why it doesn't work for most people most of the time.


which should be an experience almost any software developer has within their first year of developing software


I've worked in start ups where they don't have automated tests.

The code quality is not as good. But that imo is not a symptom of lack of tests. It's a symptom of being a start up and the desire to move fast.

As for mysterious errors or bugs, the rate at which we get them in the code is no higher than in code with tests.

It turns out that manual testing and static checking is mostly enough.

I think for most stuff testing is an illusion. It's one of those faith based mantras developers follow without any basis in science. There are a few applications where I feel automated tests are required but they are not the majority of software projects.


I've worked in plenty too. They start out faster. Delivery and quality both decrease almost imperceptibly for a long period. When somebody who understood how everything was built leaves because they got offered 20% more elsewhere, delivery and quality both plunge quite a lot and the decline accelerates.

It's incredibly unusual for them to recover delivery speed and quality after not doing automated testing for so long. It's incredibly hard to "bolt on" testing afterwards - both because the culture and context isn't there but also because the tooling and infrastructure can't be built up in a week. Most companies in this state look for quick fixes like:

* Elaborate branching strategies

* Entirely different versions of the software for different clients (because you're too afraid to upgrade them all).

* Ever more infrequent releases.

The next step is technical bankruptcy. That's when your devs start whingeing about wanting to do a full rewrite. That's usually the point where you've probably ended up losing money overall by dodging tests.

It can work without tests if you manage to hit product/market fit before the decline sets in, but I find that companies in this situation tend to struggle with development and often stagnate. 1/50 might tap into some undiscovered new market opportunity and hit it out of the park either way, but it's rare.


> It's a symptom of being a start up and the desire to move fast.

My take is that you can only move fast when you are not in a hurry. If you are in a hurry, you need to work slower, or things will be messed up with no time to fix it.

Thus, usually, there is no need to move fast, since if you can, you are not in a hurry anyway.


To me, TDD is about software architecture and design, not testing.

Bob Martin said it best - TDD forces you to create a system that's testable. You're not going to get to the end and scratch your head and wonder how are we ever going to test this thing? That's not gonna happen.

What TDD isn't about, or at least shouldn't be anyway, is testing every last little detail. Like Kent Beck said, if there are classes of errors I simply don't make, then why am I creating a test to see if I made that error?

That's why I coined the term Maintenance Driven Development (https://taylodl.wordpress.com/2012/07/21/maintenance-driven-... - wow, it's been 12 years now!). Test for the types of mistakes you and your team typically make. Make your tests productive. When bugs arise, create a test to re-create the bug, and then fix it. Your test will prove you've fixed the bug. Over time your test cases will grow and they'll be concentrated around the areas in your software where you're having actual problems. This is what makes testing effective.


The problem with automated testing is that the people who would best benefit from having a very detailed and comprehensive test suite to verify their work are precisely the same people who cannot write that test suite effectively.

Someone who half asses the business code is absolutely going to less than half ass the test code, and now you have no real verification that what they did was right.... And now likely have some shitty tests to take care of too!


There are good and bad tests.... ;) Good = actually tests something, precisely, properly, doesn't take an age to run. Enough of these in CI, and a deploy is far less likely to break prod. Bad = author didn't really know what they were trying to test, mocks fall out of date with reality (if they ever reflected it in the first place), other problems such as repetitive code and anything which tempts people to short-circuit rather than keeping tests up to date, e.g. tests that take far longer to write than the code they test. Shout out for pytest-cases as a library that can reduce repetition and save time. I've been in this game long enough to remember the days before massive automated test suites. Quality software did indeed get shipped in those days too. It took longer, with less regular releases. It took an army of skilled QA people with thorough test plans. You can do good or bad testing whether it's today's TDD or yesterday's manual QA.


TDD was born when people moved from properly typed languages to JavaScript, where the compiler doesn't check anything and a test suite compensates for that. Some MBA-educated managers probably believed 100% coverage = no bugs. Real 100% coverage means making sure every possible input produces an expected output, which is impossible to do except for the simplest systems.


> First, my credentials. More than half of all the code I wrote in my life is test code. My name is attached to hundreds of pages of TDD.

Am I supposed to trust you more or less from this introduction?

> But neither codebase has tests, for example, that take a long-ass time to run.

Is this for real? Any hint of tongue-in-cheek? The two projects have tests. But they don’t have long-ass-running tests? It sounds like they might test the core data structures and core logic. But maybe they don’t “integrate” by setting up and tearing down all application state and friends. So maybe what is ostensibly missing are those integration tests to complement the lean core-logic tests.

And it’s surprising that the heavy integration tests leave little return? Why? They are long-ass-running, they are hard and messy to set up, and in the worst case they just work as a smoke test, which you can do manually once in a while anyway.

Losing faith in testing for what? They already do testing.


I think most people believe in tests based off of anecdotal data and gut feelings.

This post is saying that it is possible that much of those anecdotes and gut feelings may be erroneous. Similar to how believers of religion show faith in belief despite clear lack of evidence.

All humans are capable of falling for faith-based tropes... as shown by basically every variation of religion in the world. It's quite possible that if we had some way to measure the quantitative cost of tests vs. no tests, those results would form a definitive basis against testing.

Think about it. Like a believer your automatic reaction may be to counter this argument. But think, does your belief in testing have any basis in science at all? Likely not.


There was this sailor who always did extra knots and shit on his ropes and lines. Because he wanted it to be extra safe.

Then in a storm the ship sank because he couldn't undo that shit quickly enough.

I guess you can tank your software project too, with too much or the wrong kind of testing.


I don't do TDD. Tests help me write code. I write the code, iterate on it, then write tests to confirm the assumption I made in the code. The act of writing then uncovers misunderstandings I had, careless bugs etc. When the tests are complete, they protect my code from someone else changing the code such that the things I wanted the code to do no longer happen.

Is every test I've ever written useful? No, absolutely not. But I don't have time/am not best placed to determine which would be useful at the point of writing, so I write them all.


In the team I lead, we haven't really shipped tests in over a year.

We've also been in discovery and proof-of-concept phase, and where I do agree with the author is it's not worth it to spend tireless hours on having green pipelines at this stage.

However, we are now starting to move our platform towards what will become a production state, and it's clear that without some level of testing, the confidence in its stability is too low.

(Thankfully our codebase is lightweight and mostly integration between systems, with some components for a third-party tool)


Write the tests you need to sleep at night and avoid you and others you care about burning out when everything starts falling apart and even the smallest changes inspire dread.


> Enter Ghostty and Zed.

You're taking two desktop applications, written in Zig and Rust respectively, and extrapolating their lack of testing rigor as evidence that "there’s no correlation between software quality and tests"? Really?

Have you tried maintaining quality in enterprise software without tests? If you ever manage to do so, I'd be very interested to read about it.


Hey, author here. Yeah, I worked at Sourcegraph and I do think we built some high-quality stuff and we did write tests. I also think a lot of them were necessary. But, like I wrote here, I think that maybe we/I sometimes overdid it with tests and I'm not so sure about the use of /some/ of them anymore.


There was an interesting discussion recently on a (somewhat) opposite perspective, https://antithesis.com/blog/is_something_bugging_you/ - how a good testing system makes everything way faster


> people who give a damn.

That’s the money shot, in my own opinion.

I write a lot of tests, but I am also constantly testing my work, even after shipping (in my latest app, it has been out since January, and is already at 1.2.0, mostly due to fixes for corner cases and UI improvements).

I don’t think there’s any substitute for Giving A Damn.


This blog post makes a valid point. The problem is testing is a religion. People believe in it via faith. Also like religion, such faith is almost impossible to fully remove. Not without hard evidence.

The actual way forward is data. Data-driven metrics that can show the cost of tests vs. no tests.


I don't know. When working in an unfamiliar codebase, tests are very helpful (even when some percentage of them will always be annoying). The problem is, the industry has focused on having lots of tests instead of having good code.


One problem I don't see talked about enough - to the point where I don't even have a name for it - goes something like this,

You have a giant, long-lived system, a decade plus old. You've got a huge number of both unit and integration tests. You're cruising along, implementing a new feature. Everything's looking good from your local manual tests and the tests you wrote over the new behavior. Then you ship the branch up to the test runner to let the full suite cook, and you get it back, and there are like 9 random test failures that aren't in master.

Has your change had unintended consequences? Is the test just out of date? If the test is out of date, how do we update it such that it both passes and we are at least reasonably confident that it is still actually testing what it was intended to test? Where it gets real obnoxious is when the test isn't testing anything even vaguely related to your change, so it can turn into a real snipe hunt. Five minutes here, ten minutes there, it adds up. What's frustrating is the number of times this has found an actual issue that I can personally remember... I won't swear it's zero, but I'm struggling to recall a specific example.


I guess it all depends on the value of developer time vs the cost of releasing with an undiscovered bug.

In any case, I'm glad the industry is starting to realize that tests are code, code can have bugs and code can be stupid.


What’s kinda funny is that there have been a few where we’ve had old code tests written sort of in the form of “try this thing which this system should reject as impossible” and then you realize that after your change that in fact the business rules actually handle this corner case gracefully instead of exposing…

I’ll admit this trade off is probably helped by our clients being mostly state govt, so they’re often grateful to get anything at all functional.

One never wants to ship bugs, but we’ve actually had plenty of positive client interactions along the lines of “Hey, we’re just impressed you caught it and are in the process of pushing a fix before we even noticed”.


This is a nitpick, but I wish the phrase "business logic" would go away. Are we not just testing logic? It makes all software sound like a tax preparation program.


I believe that the feeling for when tests should be added to assert a certain feature comes only with experience.

You just know which logic is not trivial and where it makes sense to test it.


I have more faith in 'fail fast/early'.

It's impossible to test everything. But if you fail fast at least you know something is broken.
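For example (a made-up sketch): validate at the boundary and blow up immediately, instead of letting bad state drift deeper into the system:

    // Hypothetical value object: reject bad input at construction time, so a
    // misconfiguration fails loudly at startup instead of silently corrupting
    // data three layers further down.
    final class Percentage {
        private final double value;

        Percentage(double value) {
            if (value < 0.0 || value > 100.0) {
                throw new IllegalArgumentException("percentage out of range: " + value);
            }
            this.value = value;
        }

        double value() { return value; }
    }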


Our paying customers don't appreciate bugs. A layer of testing _is_ fail fast/early because it prevents an issue before it gets to the customer.

One gig I had: several times per week the site would go down and nobody could log in. People were just pushing code and yolo'ing into prod. Paying customers were churning. The solution? A single cypress test on a canary node that logged in and edited one setting. 2 or 3 minutes of barrier to production and now the site doesn't go down periodically with deploys.


I am not saying you should not test. To me it seems obvious that you test what you create. But it is impossible to test everything.

When you fail fast you are quicker to notice it yourself. And even when it was deployed without you noticing the bug, it will still surface very early because it breaks your app.

Broken gets fixed, shitty lasts forever.


> Broken gets fixed, shitty lasts forever.

Love it. More succinct than "devs need to feel the pain to fix it."


Rich Hickey once said something in the midst of my “TypeScript, Haskell, la la la la la” phase:

> I like to ask this question, what's true of every bug ever found in the field? (It got written?) Pff. It got written, yes. What's a more interesting fact? (pause) It passed the type checker! What else did it do? (the tests?) It passed all the tests. So now what do you do?

The basic problem is that we really want tests to somehow specify the contract of the software, but when you are writing “given, when, then” the “givens” pin down too much of how it is done and the “when/then” is scoped to distinct sub-contracts of different parts so it pins down how the work was broken-down...

It hits like a syllogism, right? All tests are software, all software has scope creep, scope creep in testing is contract creep.


Bugs all passed the tests sounds like survivor bias. Like how all the fighter planes that returned with damage had damage to the wings. The point is, damage anywhere else was fatal, so the takeaway should not have been to armor the wings more, which was the first thing that was tried. Likewise, thinking about bugs this way discounts potential bugs that _were_ stopped by tests.


It's not like the planes that have returned. A bug reported in the field could well have been a crash. A costly, all-hands-on-deck incident kind of crash. And that bug still passed your own tests, compiled well, and went into production.

Occam's razor tells me that it's more indicative of test harness being not comprehensive enough. Fix it to reproduce the bug/crash, then fix the code, then ship the fix, then tentatively pat yourself on the back for potential bugs that were stopped by the test. Until the next crash.


Oh, I agree with that. I thought you were arguing that tests were less valuable because all production bugs passed the tests.


tl;dr: be careful with your testing strategy, it might break your company. There are some aspects that stand to gain from automated testing, and it's important that developers test their work, but in my opinion manual testing is super valuable, especially for an early-stage start-up.

I was working for a startup that first had the strategy of testing everything, at a point when we hadn’t even started working on the product. We spent a horrible amount of time getting the test system up and running (microservices, browser add-ons), with the UI testing being the most challenging.

But then they also wanted to "move fast and break things", which we did, so we spent an annoying amount of time fixing breaking tests. This is when the “let’s just delete the test” expression started popping up.

So we ended up with an unmaintained testing system that no one cared about anymore, keeping our build on red because we always had the testing system in the backlog so it was going to be done at one point.

Then, the quality of the codebase started degrading to the point where we’d wake up with features that have been broken for a while and no one noticed it, even though each individual developer was testing their own flows. We had no one / nothing testing the entire system.

At this point, I suggested we hire 1 – 2 manual testers, as our testing strategies are obviously failing, the product is suffering in terms of quality and we could get them for relatively cheap compared to dev-time. I’ve had great success working with manual testers for very complex products with real world repercussions, compared to this tiny start up in dev tooling.

They refused. So they decided we’d do cross-functional testing and then test the entire system whenever we’d do a merge. So developer velocity fell off a cliff because we became the manual testers. We still had no customers at this point. The runway got shorter and shorter. And the start-up became a statistic.


The counterfactual may have been the same. There is a cost to having tons of tests too.


> In both codebases I’ve merged PRs without any tests and frequently see others do the same.

I would never accept such a PR. This kind of policy ends up biting you in the long run (experience talking)


>> Maybe the tests are only a symptom. A symptom of something else that causes the quality.

Love this comment!


Not convinced at all.

Clearly a quick and poorly written article.

Short and not elaborated.

Clickbait.


> was 100% sure that I know how the code works and that this can’t happen again.

No, no, no and nope.



