- Prefer real objects, or fakes, over mocks. It will usually make your tests more robust.
- Use mocks when you must: to avoid networking, or other flaky things such as storage.
- Use mocks for “output-only objects”, for example listeners, or when verifying the output of some logging. (But prefer a good fake.)
- Use mocks when you “need to get shit done”: they're the easiest way to add tests in an area that has almost none and where the code isn't designed to be easily testable. But remember this is tech debt, and try to migrate towards real objects over time.
That's my short advice, which I've given many times, so I might as well post it here.
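A minimal Go sketch of the fake-vs-mock distinction behind that advice (Notifier, fakeNotifier and Greet are hypothetical names, not from the thread): a fake is a tiny working implementation you assert against, while a mock only records which calls it was told to expect.

    package notify

    import "testing"

    // Notifier is a hypothetical dependency used by the code under test.
    type Notifier interface {
        Notify(user, msg string) error
    }

    // fakeNotifier is a fake: a small working implementation that stores
    // notifications in memory so tests can assert on observable state.
    type fakeNotifier struct {
        sent map[string][]string
    }

    func newFakeNotifier() *fakeNotifier {
        return &fakeNotifier{sent: map[string][]string{}}
    }

    func (f *fakeNotifier) Notify(user, msg string) error {
        f.sent[user] = append(f.sent[user], msg)
        return nil
    }

    // Greet is the "real object" under test; it only needs the interface.
    func Greet(n Notifier, user string) error {
        return n.Notify(user, "welcome, "+user)
    }

    func TestGreetUsesFake(t *testing.T) {
        fake := newFakeNotifier()
        if err := Greet(fake, "ada"); err != nil {
            t.Fatal(err)
        }
        // Assert on the outcome, not on which calls were made in which order.
        if got := fake.sent["ada"]; len(got) != 1 || got[0] != "welcome, ada" {
            t.Fatalf("unexpected notifications: %v", got)
        }
    }

A mock-based version of the same test would instead verify that Notify was called exactly once with exactly those arguments, coupling the test to the call sequence rather than to the observable outcome.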
I think the original ideas of mocks (if you go and read Growing Object-Oriented Software, Guided by Tests) had some merit: In that style of TDD, mocks are used to discover (hopefully somewhat stable) interfaces between components, and in theory, it fits with the idea that OOP is about "objects sending messages to each other". I can believe that it's possible to write good systems with this kind of approach.
Unfortunately, in practice, mocks are rarely used like that and most "OOP" designs have horrible boundaries and are really not much about message passing anymore. That leads to brittle mocks where you constantly have to change tests when you change implementation details.
I have also gravitated away from classic OOP and much more towards the "functional core, imperative shell" concept as outlined in the article (although it's difficult to keep this pattern throughout a codebase, especially if you have team members). In such a system you really rarely need mocks.
Agreed that fakes, when you have them, are nicer than mocks, especially when the system to be faked has a large API (i.e. use a redis fake, instead of checking the exact commands you send to redis).
However, for some outside systems, writing a fake can be a lot of effort. In such a case, I think it's totally valid to write a "gateway class" that isn't unit tested (you can cover it with integration tests instead) which exposes a nice API (e.g. "storeFile(...)") and then to use mocks of that class in other tests.
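A hedged sketch of that gateway idea in Go (FileStore, blobGateway and memFileStore are illustrative names): the gateway hides the external system behind one narrow interface, is covered by integration tests rather than unit tests, and the rest of the code is unit tested against a trivial in-memory stand-in (or a mock, if you prefer).

    package storage

    import "context"

    // FileStore is the narrow gateway API the rest of the code depends on.
    type FileStore interface {
        StoreFile(ctx context.Context, name string, data []byte) error
    }

    // blobGateway wraps the real external system. It is deliberately thin,
    // is NOT unit tested, and is covered by integration tests instead.
    type blobGateway struct {
        endpoint string // the real object-store URL, for example
    }

    func (g *blobGateway) StoreFile(ctx context.Context, name string, data []byte) error {
        // The real network call to the external storage service would go here.
        return nil
    }

    // memFileStore is the double other unit tests use in place of the gateway.
    type memFileStore struct {
        files map[string][]byte
    }

    func newMemFileStore() *memFileStore { return &memFileStore{files: map[string][]byte{}} }

    func (m *memFileStore) StoreFile(_ context.Context, name string, data []byte) error {
        m.files[name] = append([]byte(nil), data...)
        return nil
    }

Code that needs storage then takes the FileStore interface; unit tests pass newMemFileStore(), while a small number of integration tests exercise blobGateway against the real service.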
In test cases where you extensively involve mocks, more often than not in my experience, you end up testing that your mocks do the thing you told them to.
Yeah, particularly the case with a lot of "glue" type code that's really just passing stuff back and forth and not really making any decisions with it. I've always struggled to feel that mock-based testing in this scenario was anything other than busy-work.
The proper metric is tests/feature - most test code bases are so bad you can't even pick out what feature is being tested, due to a blind adherence to 'code coverage'
It is the equivalent of spot checking a road 100 times for cracks but forgetting to make sure you put up stop signs at the intersections - I honestly believe if you are a tech lead or boss advocating this you should be fired on the spot for encouraging dangerous and malignant practices...
Most of the bugs are in the joints of the system, not in the components.
It's much easier to write modules that are internally consistent, and much harder to be globally consistent across modules. Mocks ensure you only test for internal consistency.
Wouldn't the integration (or global consistency, as you call it) of objects be exactly what's worth testing, since that's the hard part?
Generally objects are simple enough that I can reason about them in my head. That's the whole point of encapsulating the state after all. That means tests are really less critical, since a thorough inspection should do. Between components it's MUCH harder to get any sort of coverage in your internal model, so tests that can be repeated become more useful.
I want the tests to fail if the software doesn't work. Not if some object doesn't do what it says it will in a way that doesn't matter to the system.
The hard part is exactly what I want tests to cover.
Mocks don't test the joints, which you have just suggested are where the bugs are.
Fakes get you a lot further without needing more comprehensive integration testing.
Fakes simulate dependencies - so you can test the joints - but can be tested themselves, and most importantly can have conformance tests that validate that the fake acts like what it is faking.
Fakes also make integration testing easier too - if you want to focus on say, making sure your payments always make it into the database correctly, integration testing across the medley becomes (relatively) trivial, even automatable.
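One way to get those conformance tests is to write the suite once against the interface and run it over every implementation, fake and real alike. A hedged Go sketch, with KV, memKV and the redis-backed variant as made-up names:

    package kv

    import "testing"

    // KV is the behavior both the real client and the fake must satisfy.
    type KV interface {
        Set(key, value string) error
        Get(key string) (string, bool, error)
    }

    // memKV is the fake used by fast unit tests.
    type memKV struct{ m map[string]string }

    func newMemKV() *memKV { return &memKV{m: map[string]string{}} }

    func (s *memKV) Set(k, v string) error { s.m[k] = v; return nil }

    func (s *memKV) Get(k string) (string, bool, error) {
        v, ok := s.m[k]
        return v, ok, nil
    }

    // conformanceTest is written once and run against every implementation,
    // so the fake cannot silently drift from the real thing's semantics.
    func conformanceTest(t *testing.T, open func(t *testing.T) KV) {
        t.Run("get returns what was set", func(t *testing.T) {
            store := open(t)
            if err := store.Set("a", "1"); err != nil {
                t.Fatal(err)
            }
            v, ok, err := store.Get("a")
            if err != nil || !ok || v != "1" {
                t.Fatalf("got %q, %v, %v", v, ok, err)
            }
        })
        t.Run("missing key reports absence", func(t *testing.T) {
            store := open(t)
            if _, ok, err := store.Get("nope"); err != nil || ok {
                t.Fatalf("expected absent key, got ok=%v err=%v", ok, err)
            }
        })
    }

    func TestMemKVConformance(t *testing.T) {
        conformanceTest(t, func(t *testing.T) KV { return newMemKV() })
    }

A TestRedisKVConformance would wire the same suite to the real client, typically guarded so it only runs when a real instance is available; that shared suite is what keeps the fake honest.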
I could mock all day, but it would not be any help, since the APIs we use are brittle. It's much more efficient to fail in a graceful way you can recover from than to plaster the codebase with mocks of happy paths.
I have not found storage to be flaky and so I don't mock it. Tmpfile always gives me a unique file, and that is all the fake I need. I don't even look up the various forms of temp file to see which ones don't have a race condition, as in practice they never do (if I were writing encryption or other such production code I would, but for a unit test the odds of a race causing a failure are low enough to ignore).
At minimum, you need to be able to choose where the files are stored in order to use temp files.
I find there is still the problem of complicated setup. Also, if you are going for particular semantics (e.g. cross-process interactions), isolating that so it can be tested more specifically can be helpful.
For me, I look for higher-level abstractions and mock when possible, and don't sweat testing against temp files otherwise. I've had several TDD people jump to wanting to mock each filesystem call. One was for a cross-process storage API. I was trying to get them to just have a Backend interface for reading and writing instead, with tests just using an InMemory implementation as a Double.
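For the filesystem case specifically, Go's testing package already hands each test an isolated temporary directory, so often no double is needed at all; and when you do want the Backend-style seam described above, the in-memory version is a few lines. A hedged sketch of both (Backend and InMemory are illustrative names):

    package store

    import (
        "os"
        "path/filepath"
        "testing"
    )

    // Option 1: just use a real, isolated temp directory per test.
    func TestWriteAndReadBack(t *testing.T) {
        dir := t.TempDir() // cleaned up automatically when the test ends
        path := filepath.Join(dir, "data.txt")
        if err := os.WriteFile(path, []byte("hello"), 0o600); err != nil {
            t.Fatal(err)
        }
        got, err := os.ReadFile(path)
        if err != nil || string(got) != "hello" {
            t.Fatalf("got %q, %v", got, err)
        }
    }

    // Option 2: a Backend seam with an in-memory double for higher-level code.
    type Backend interface {
        Write(key string, data []byte) error
        Read(key string) ([]byte, error)
    }

    type InMemory struct{ m map[string][]byte }

    func NewInMemory() *InMemory { return &InMemory{m: map[string][]byte{}} }

    func (b *InMemory) Write(key string, data []byte) error {
        b.m[key] = append([]byte(nil), data...)
        return nil
    }

    func (b *InMemory) Read(key string) ([]byte, error) {
        data, ok := b.m[key]
        if !ok {
            return nil, os.ErrNotExist
        }
        return data, nil
    }

The tmpfile point above corresponds to option 1; the Backend/InMemory pair is the seam the commenter was pushing for.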
We've been running a business-critical system for several years now; creating simulators has been one of our "secrets" that contributed to the success.
Not only do our tests run in as close to a production environment as possible, we also use the simulators for local development, where developers can spin up a functioning system on their own dev machine.
We spin up a temporary and local MySQL instance per test run, with helpers to generate necessary data on-demand. All our tests therefore use real queries on a real db. Due to speed constraints, this db is shared between all the tests you run in that session, so it's possible for your test to influence tests that run after it in CI. The reality is that that is pretty easy to detect, and it's caused us to write less brittle code.
We're using MSSQL. On bootstrap we run all migrations, seed the database, then back it up; each test file restores its own database from that backup, which is much faster than the alternatives. E2E tests run against a simulated environment that has a single database, and they are split into concurrent runs with distinct sets of tests to speed things up.
SQLite is a great simulation for SQL. You need to limit your SQL to the subset that is supported by both it and your target database though, which might be a problem.
In a world where Docker exists, there's rarely a reason not to just use your target database - I can have a non-production-grade Postgres instance up and running in less time than it took me to write this message.
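A hedged sketch of the "just use the target database" approach in Go: the test reads a DSN from an environment variable (TEST_DATABASE_URL is a made-up name) that CI or a local docker run provides, and skips when it's absent; github.com/lib/pq is assumed as the driver.

    package db

    import (
        "database/sql"
        "os"
        "testing"

        _ "github.com/lib/pq" // Postgres driver; registers itself under the name "postgres".
    )

    func openTestDB(t *testing.T) *sql.DB {
        t.Helper()
        dsn := os.Getenv("TEST_DATABASE_URL") // e.g. set by CI or a local docker container
        if dsn == "" {
            t.Skip("TEST_DATABASE_URL not set; skipping database-backed test")
        }
        db, err := sql.Open("postgres", dsn)
        if err != nil {
            t.Fatal(err)
        }
        t.Cleanup(func() { db.Close() })
        if err := db.Ping(); err != nil {
            t.Fatalf("database not reachable: %v", err)
        }
        return db
    }

    func TestRealQuery(t *testing.T) {
        db := openTestDB(t)
        var one int
        // Trivial query, just to show the tests run against the real engine.
        if err := db.QueryRow("SELECT 1").Scan(&one); err != nil || one != 1 {
            t.Fatalf("got %d, %v", one, err)
        }
    }

Locally the DSN can point at something as simple as `docker run --rm -p 5432:5432 -e POSTGRES_PASSWORD=pw postgres`.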
It means external services your project depends on in production should:
1. be used directly if possible/practical, if not...
2. ...then it's likely better to write simulators for them as opposed to mocking individual methods on the client in individual tests
There are several factors to consider. External services need to be able to run in isolated, temporary environments with a short bootstrap time; if that's not possible, it's better to write a simulator that provides this. Services must also be deterministic; if they aren't, the simulator should probably be written with control APIs that provide determinism, etc.
In general the idea can be summarized as opting for the highest level of functionality available, so that tests capture as wide a code surface (as used in production) as possible.
It complements, rather than replaces, lower-level methods - i.e. it still makes perfect sense to structure code as a composition of pure functions with unit tests, to rely on static analysis, and not to test in unit tests what is already guaranteed by the static type system, etc.
A side effect of having simulators is that you can bootstrap your project locally in a lightweight fashion for development, with all the simulated functionality available.
For example, if you're working on a trading application, instead of mocking low-level prices, order-creation calls, etc., it's better to write an exchange simulator - where the full lifecycle of an order works as expected - and use it both in tests and locally when developing the application.
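A toy illustration of that simulator idea in Go (all names are hypothetical): instead of stubbing "create order returns id 42" per test, the simulator keeps enough state that an order placed through it can later be found filled, so the full lifecycle is exercised both in tests and during local development.

    package exchange

    import (
        "fmt"
        "sync"
    )

    type OrderStatus string

    const (
        StatusOpen   OrderStatus = "open"
        StatusFilled OrderStatus = "filled"
    )

    type Order struct {
        ID     int
        Symbol string
        Qty    int
        Status OrderStatus
    }

    // Simulator is an in-memory exchange with a real (if simplified) order lifecycle.
    type Simulator struct {
        mu     sync.Mutex
        nextID int
        orders map[int]*Order
    }

    func NewSimulator() *Simulator {
        return &Simulator{orders: map[int]*Order{}}
    }

    // PlaceOrder behaves like the real venue's "create order" call.
    func (s *Simulator) PlaceOrder(symbol string, qty int) (int, error) {
        if qty <= 0 {
            return 0, fmt.Errorf("quantity must be positive")
        }
        s.mu.Lock()
        defer s.mu.Unlock()
        s.nextID++
        s.orders[s.nextID] = &Order{ID: s.nextID, Symbol: symbol, Qty: qty, Status: StatusOpen}
        return s.nextID, nil
    }

    // FillAll is a control API only the simulator has: tests use it to advance
    // the world deterministically ("all resting orders get filled now").
    func (s *Simulator) FillAll() {
        s.mu.Lock()
        defer s.mu.Unlock()
        for _, o := range s.orders {
            o.Status = StatusFilled
        }
    }

    func (s *Simulator) Order(id int) (Order, bool) {
        s.mu.Lock()
        defer s.mu.Unlock()
        o, ok := s.orders[id]
        if !ok {
            return Order{}, false
        }
        return *o, true
    }

An application-level test can then place an order through whatever client interface the app defines, call FillAll(), and assert on fills or positions, instead of scripting each client method per test.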
It all sounds great. I agree totally in principle! I am finding that testing my fairly small Go project (static site generator, because the world definitely needs another one) chews up massive amounts of time. So I tend to avoid the testing pass for longer than I should. Any thoughts on that issue?
It isn't immediately obvious to me why a small Go static site generator would require "massive amounts of time" to run its tests, so it's hard to answer what you're doing wrong.
Are your tests perhaps just too darned big? You don't, in general, need to render 5000 pages of something to test your template doesn't crash or something.
It could also just be disk access. Consider trying an in-memory file system, or if you're on linux, look at using /dev/shm which is a RAM disk.
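On the in-memory filesystem suggestion: Go's standard library already ships one for read-side code, testing/fstest.MapFS, which satisfies fs.FS, so a generator that reads its sources through fs.FS needs no disk at all in tests. A small sketch (RenderTitles is a made-up stand-in for "walk every page"):

    package site

    import (
        "io/fs"
        "testing"
        "testing/fstest"
    )

    // RenderTitles is a stand-in for "read every page and do something with it";
    // it takes fs.FS so tests can feed it an in-memory tree.
    func RenderTitles(fsys fs.FS) (int, error) {
        count := 0
        err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
            if err != nil {
                return err
            }
            if !d.IsDir() {
                count++
            }
            return nil
        })
        return count, err
    }

    func TestRenderTitlesInMemory(t *testing.T) {
        // No disk I/O: the whole "site" lives in memory.
        fsys := fstest.MapFS{
            "index.md":   &fstest.MapFile{Data: []byte("# home")},
            "posts/a.md": &fstest.MapFile{Data: []byte("# a")},
            "posts/b.md": &fstest.MapFile{Data: []byte("# b")},
        }
        n, err := RenderTitles(fsys)
        if err != nil || n != 3 {
            t.Fatalf("got %d pages, err %v", n, err)
        }
    }

If the generator also needs to write output in memory, a third-party filesystem abstraction such as afero is a common choice; otherwise the /dev/shm suggestion above works fine.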
It is also possible you've snuck in a quadratic or worse time algorithm. There's nothing fundamental to the problem of a static site generator that would require such algorithms, but speaking from experience it is an environment where it's easy to loop over the return value of one function, which itself loops over some other function (quite likely over the same data), which itself loops over the same structure, and it's easy to end up with O(n^3) or O(n^4) without realizing it. It's especially easy to end up with that being a "for each page" type of loop. Static site generation should be O(n), give or take small factors (maybe O(n log n) for some things technically, but at a scale where O(n log n) is practically O(n) anyhow).
Make it CI's problem. You should only really be running tests regularly for the component you're currently working on, with CI making sure you don't have accidental regressions.
If you don't mind spending the effort, do a quick profile to see where the tests are taking a long time - is it initializing the real components that could be mocked?
Or are the tests themselves taking a long time because of other factors, such as IO?
For IO, maybe there should be an abstraction over those IO APIs, so you can use an in-memory option for testing instead.
If IO is the problem, I generally solve that by testing smaller data sets. I have not found IO to the local disk to be slow. Of course network IO is bad, but local disks are fine for the size of data in my tests.
I try to write the test first, or at least stub out in the test how I think it should work - or at least enough notes to pick it up "tomorrow". Then I have a reference for usage I can build with that in mind.
As a potential counter-argument, the use of mocks can enable testing of functionality that the current concrete implementation doesn't exercise. It's easier than one would think to accidentally rely on implementation details rather than coding just to the interface (and optionally any documented restrictions to that interface).
They explicitly call out clocks as a source of non-determinism that probably should be mocked, but I'll reuse them as an example anyway because everyone is familiar with them: it's extraordinarily useful for the tests to execute nearly immediately rather than actually waiting on a clock, and rare behavior like a clock running backward, two consecutive readings being identical within the clock's resolution, or whatever other weird artifacts your code should handle is definitely better tested explicitly than left to whatever a real clock happens to do. Other domain-specific interfaces are often similarly able to exhibit a weird edge case that ought to be explicitly tested (rather than accidentally relying on a "nice" implementation) if you really want to unit test the callers and not integration test the coupled system.
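A hedged sketch of what that looks like in Go: a hand-rolled Clock interface (not a standard library type) whose fake can be stepped forward, held still, or moved backward, so the "clock ran backwards" branch is actually exercised.

    package ratelimit

    import (
        "testing"
        "time"
    )

    // Clock is the seam; production code uses realClock, tests use *fakeClock.
    type Clock interface {
        Now() time.Time
    }

    type realClock struct{}

    func (realClock) Now() time.Time { return time.Now() }

    type fakeClock struct{ t time.Time }

    func (f *fakeClock) Now() time.Time { return f.t }

    // Advance moves the fake clock; a negative d moves time backward.
    func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

    // Elapsed reports a non-negative duration since `since`, clamping if the
    // clock appears to have gone backwards (the edge case worth testing).
    func Elapsed(c Clock, since time.Time) time.Duration {
        d := c.Now().Sub(since)
        if d < 0 {
            return 0
        }
        return d
    }

    func TestElapsedClampsBackwardClock(t *testing.T) {
        clk := &fakeClock{t: time.Unix(1000, 0)}
        start := clk.Now()
        clk.Advance(-5 * time.Second) // simulate the clock running backwards
        if got := Elapsed(clk, start); got != 0 {
            t.Fatalf("expected clamped 0, got %v", got)
        }
    }

Libraries such as jonboulle/clockwork or benbjohnson/clock provide ready-made versions of this interface if you'd rather not hand-roll it.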
I think you have missed the point of fakes versus mocks (as this article puts it) based on what you said - fakes should be able to do everything mocks can, more, even.
But I hear you complain, "that's so much work to maintain!!" - mocks are more work to test less; if you often need a better fake, write a library which has them. What you are referring to is known as fuzzing, not unit testing, and fuzzing generally has a better track record than straight-up "testing" for finding issues. Fakes can have deterministic or indeterminate behavior; the case the OP makes is for deterministic "unit" testing with more real behaviors than not. This doesn't preclude good fuzzing...
"It's easier than one would think to accidentally rely on implementation details"
To be somewhat philosophical: what if that 'implementation detail' is the feature? No amount of testing will save you from incompatible feature sets, and by 'enabling testing of functionality that the current concrete implementation doesn't exercise' you run the very real risk of baking incompatible sets of features into the core of your app. You shouldn't be designing your app at the unit level...
It is different. You tell the mock to give you a specific error with specific information, while you tell the fake to give you the error for a specific condition.
It is subtle and may not be worth the time, but then again, it may be.
The heart of the argument presented is that using mocks in unit tests is problematic because if an interface is changed, that will possibly break every test that involves mocking that interface and that's friction to making changes. This is silly. If you use real implementations and change the interface, you have the exact same problem except that your configuration/setup is also likely to have to change in subtle ways.
Let's just assume the absurd notion that a choice of creating tech debt or changing a common interface is a real choice. If there was a serious change that could not be accounted for with easy test changes, I'm not the only one to see the tests commented out with a "TODO: fix these". Developers tend to be pragmatic.
If you want to change code you will always measure the effect it will have on the code and tests are incidental, not the primary concern. Make it work. Make it right. Make it fast. I want to be able to trigger all code paths and exceptions (make it good). Using a real dependency, I would be left in the unfortunate situation of depending on knowledge of the internals of that dependency. It may not allow me to execute specific paths via pure configuration at all.
I don't think using real dependencies is a good idea at all for unit tests. Integration tests are a different thing, and I do fear that the two are being confused.
Write as much code as you can that has no dependencies. Unit test that code exhaustively. Fake all inputs that don't contain behavior. Mock all interactions that do. Then write functional tests that check that the glue and state management actually work with the real things.
The plumber is still going to run water and check for leaks before they leave, no matter how many certifications the copper piping came with. But that's only at the end of a long process of work and inspections.
Nothing pisses me off like finding a suite of tests that has fakes with logic in them. By the time I find them, the fakes are longer than the tests. Often the commit history shows that this accumulated by accretion, and nobody ever pulled the emergency stop lever. Other times it's people who are wrong-headed about what problems tests are trying to solve (coverage chasers are but one category).
> The heart of the argument presented is that using mocks in unit tests is problematic (...) If you use real implementations
No. As you say, the moment you have a real out-of-process dependency, you no longer have a unit test, but an "integration-with-external world" test.
The heart of the argument is to not mock out internal (in-process) business logic. Instead:
a) do not use mocks (e.g. leveraging Moq); use fakes (aka "simulators"), i.e. proper classes having in-memory implementations of the out-of-process dependencies and crucially
b) replace only out-of-process dependencies, which usually are flaky/nondeterministic. Never replace internal business logic.
If you feel you need to replace internal business logic, you are not following the functional core architectural pattern. After you refactor to functional core you will no longer need to replace any dependencies in your unit test, as there won't be any - you will just call your tested pure method and make assertions on the returned value.
A special case here is testing "top-level/controller" logic. Here you need to use the in-memory fakes, but there will be only a few, reused across all tests, and such a test will be an "end-to-end internal-business-logic integration test" - but it will still have all the properties of a unit test: it will be fast and deterministic, and you will be able to run it as part of a unit test suite out-of-the-box, with no environment setup necessary.
> If you want to change code you will always measure the effect it will have on the code and tests are incidental, not the primary concern
Exactly. Mocks are very brittle and break all the time, causing unnecessary rework. If you instead rely on a small set of fakes of out-of-process dependencies, you will drastically reduce test suite rework and improve signal-to-noise ratio.
> Using a real dependency, I would be left in the unfortunate situation of depending on knowledge of the internals of that dependency.
And what happens if the real dependency changes behavior, but you forget to update your mock? You will have a test running against a mock that simulates obsolete behavior, no longer present in production. In the worst-case scenario the test will be green/passing while there is a bug in production. Will you remember to always comb over your entire test suite and review all mocks to faithfully simulate actual production behavior?
> It may not allow me to execute specific paths via pure configuration at all.
If you write fakes, you will have full control over how to configure them.
> Will you remember to always comb over your entire test suite and review all mocks to faithfully simulate actual production behavior?
This doesn’t make sense. If I write unit tests for my ‘FunctionExecutor’, and it has a dependency with a function ‘shouldExecute’, I don’t care how that function is implemented, only that it returns a boolean. If the implementation of my ‘shouldExecute’ function changes, there is no need to update any other test as long as it still returns a boolean.
I do agree that what you call fakes are better, because you get the extra guarantee of them implementing the same interface (e.g. your compiler will scream bloody murder if you don’t update both).
> I don’t care how that function is implemented, only that it returns a boolean
Let's take your example of "shouldExecute". I assume your unit test operates on some inputs (with the values provided inside the unit test itself, naturally), and "shouldExecute" has potentially some nontrivial logic in it. Say it reads the value of some environment variable and, if it is right, returns "true".
Now there are two possibilities:
a) your test inputs are made up. For example, you never set the environment variable value, and coerce "shouldExecute" to always return "true" anyway. The problem with such a test is that it is fiction, not a test of actual production behavior. Sure, you will test what would happen if the logic determined it should execute even though no environment variable is set, but this will never happen in production. In production, the lack of the environment variable would result in "shouldExecute" returning "false", and you should test for *that*. So you do care about the details of "shouldExecute", because you need to be aware that it returns "true" only if the appropriate environment variable is set. And if you don't care whether "shouldExecute" returns true or false, then why do you call it in the first place? What are you even testing? I hope your "shouldExecute" doesn't have any side effects you depend on, and I do hope you do not use code coverage as a goal unto itself.
Plus, such a test cannot be used as "executable documentation", because the inputs are simplified to the point of irrelevance, and cannot help in understanding actual program behavior.
b) your test inputs are realistic, and reflect actual production behavior. This means you will have properly set up the environment variable value so that "shouldExecute" returns true. With that, the test is similar to actual production behavior, has good bug-catching ability, and can serve as executable specification. But here again you will have to worry about "shouldExecute" implementation.
---
Let me offer you another example. Imagine we want to test:
    Compile(SourceCode sourceCode)
    {
        ValidateSyntax(sourceCode);
        /* complex logic post-processing the sourceCode here */
    }
I could say here "I am testing Compile, and I do not care if ValidateSyntax throws an exception; I will just coerce it to return without throwing an exception".
And then I can write a test that takes as input some simple sourceCode like "blablabla" and claim I have somehow tested the "/* complex logic post-processing the sourceCode here */". But this is silly: in reality such fake input would never survive the validation, and thus testing what happens after it is just a waste of time. Hence I need to ensure the sourceCode passes the *actual* validation, and hence I need to understand the logic inside the "ValidateSyntax" method.
It gets worse. Now imagine we have thoroughly tested the complex logic while using a mocked-out ValidateSyntax, but now the syntax has changed and thus the validate method behaves differently. If I have a mock of the "ValidateSyntax" method I might still feel good - the test is green, the coverage is still high. Except it is all a lie. I run "Compile" on some production data and it blows up. Why? Because the test was working against a "ValidateSyntax" mock that was mocking the result of validating obsolete syntax, which, now, with the changed behavior, would actually not pass validation and would blow up. Basically my test was telling me my "Compile" method works IF I assume the syntax is still the obsolete one, but it doesn't tell me how my code behaves with the current syntax.
So every time program behavior changes, I need to go through all my mocks that were duplicating that behavior, and see if they are still faithfully reflecting it. Otherwise I risk ending up with a green test suite that tests nonexistent, impossible program executions.
I’m not quite sure what you are trying to say here.
In both of the examples you give you seem to be assuming that ‘not caring about the functionality of function X in function Y’ means ‘not caring about function X at all’.
This is untrue. You should test both.
If you want shouldExecute to depend on an env variable, you have a separate test for that - one each for the positive and negative scenarios.
In the same sense you have a separate test for Compile and Validate, but in the Compile test you may not care to do the validation. At the very least you should have a test for Validate separate from Compile.
Naturally, I agree that the shouldExecute / ValidateSyntax methods should be tested too, but this is not the whole story.
Let me clarify my point with a more precise example.
Let's say we want to test this:
    ParseUrlDomainAndPath(string url)
    {
        string validatedUrl = Validate(url);
        (domain, path) = /* inline logic to extract domain and path from the validatedUrl */;
        return (domain, path);
    }
Now I have few options to unit test it:
1. I could mock out "Validate" to return the "url" passed as input, call in the test ParseUrlDomainAndPath("___:||testDomain|testPath"), and assert it returns (testDomain, testPath). Such a test might even pass, if the inline logic for extracting domain and path is not too fussy about the delimiters. In that case I will end up with a test telling me "if you call ParseUrlDomainAndPath with a URL that has | instead of / and some weird schema of ___, it will successfully return". This is a lie. If you call it in production, it will fail validation. So the test gives you a false impression of how the system behaves. You are testing how parsing of domain and path works on a URL that has | instead of /, but this won't happen in production. Thus, you are testing made-up behavior. Waste of time.
2. As 1., but instead I could mock out Validate to return whatever validatedUrl I want, completely disregarding url. In such case, what is the point of even having Validate involved in the test here? Instead, let's refactor to functional core - let's take the inline logic, capture it in a method called "ExtractDomainAndPathFromUrl(string validatedUrl)" and pass validatedUrl to it directly. No need to deal with Validate at all, no need to mock anything, no need to fixup any broken mocks. Great!
3. As 1., but as input I pass "foo". The mocked-out Validate returns "foo", and we are now trying to extract domain and path from it. This will either throw an exception or return garbage. So our test has now failed. But we don't care about this failure at all. In actual production behavior the domain and path extraction logic would never even execute, because Validate would fail beforehand. So here we have the reverse situation of 1.: in 1. we have a test telling us production will work while in reality it won't (as Validate will throw), and here we "found a bug" (the test fails) that doesn't matter, as it is impossible for it to happen in production. Again, a waste of time.
So with 1. and 3. being a waste of time, the only option that is left is 2. You have one test that a) checks that Validate behaves correctly, b) checks that ExtractDomainAndPath behaves correctly, and c) checks that both of these methods collaborate with each other correctly (aka a "mini integration test").
You could now argue that this is wrong, that I should have one test for Validate, one test for ExtractDomainAndPath and one test for ParseUrlDomainAndPath that mocks out Validate and mocks out ExtractDomainAndPath. But let me ask: why? In such a case you lose the benefit of having the "mini-integration" test. When you test ParseUrlDomainAndPath with everything mocked out, you test an empty husk of logic - only whether the calls are made in the proper sequence. You cannot even really assert anything meaningful! (aka the "mockery" anti-pattern). You end up with 3 tests instead of 1 and a cr*pton of unreadable, brittle mock logic. And chances are, the one test of ParseUrlDomainAndPath that doesn't use any mocks of internal business logic will already cover significant parts of Validate and ExtractDomainAndPath, and so will reduce the need for additional "corner case" tests testing these methods directly. Having 1 test with proper in-process dependencies instead of mocks and 3 tests is just a win all over the place: less testing logic, better ability to catch bugs (due to the bonus mini-integration testing and realistic data), executable specification aiding program comprehension, less brittle tests, no misleading green tests, and no made-up, irrelevant failing tests.
There is one more, very important benefit: you can step such test with a debugger and see how all the components collaborate with each other, on real data. But if you use mocks and fake oversimplified data, you get very shallow slices of code and cannot reason about anything relevant.
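A Go transcription of option 2 from the comment above, as a hedged sketch: the inline extraction becomes a pure function that is trivially testable on its own, and the composing function gets one small "mini integration" test with realistic input and no mocks.

    package urls

    import (
        "fmt"
        "strings"
        "testing"
    )

    // Validate rejects anything that is not an http(s) URL (simplified on purpose).
    func Validate(url string) (string, error) {
        if !strings.HasPrefix(url, "http://") && !strings.HasPrefix(url, "https://") {
            return "", fmt.Errorf("invalid url: %q", url)
        }
        return url, nil
    }

    // ExtractDomainAndPathFromUrl is the functional core: pure, no dependencies.
    func ExtractDomainAndPathFromUrl(validatedUrl string) (domain, path string) {
        rest := validatedUrl[strings.Index(validatedUrl, "://")+3:]
        if i := strings.Index(rest, "/"); i >= 0 {
            return rest[:i], rest[i:]
        }
        return rest, "/"
    }

    // ParseUrlDomainAndPath composes validation and extraction; no seams to mock.
    func ParseUrlDomainAndPath(url string) (domain, path string, err error) {
        validated, err := Validate(url)
        if err != nil {
            return "", "", err
        }
        domain, path = ExtractDomainAndPathFromUrl(validated)
        return domain, path, nil
    }

    func TestParseUrlDomainAndPath(t *testing.T) {
        // Realistic input: exercises Validate, the extraction, and their collaboration.
        domain, path, err := ParseUrlDomainAndPath("https://example.com/a/b")
        if err != nil || domain != "example.com" || path != "/a/b" {
            t.Fatalf("got %q %q %v", domain, path, err)
        }
        // The invalid case fails in Validate, exactly as it would in production.
        if _, _, err := ParseUrlDomainAndPath("___:||testDomain|testPath"); err == nil {
            t.Fatal("expected validation error")
        }
    }

Note that there is no mock anywhere: the invalid-input case is covered by letting the real Validate reject it, which is exactly the point being made above.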
What you are testing is that the results of shouldExecute() are acted on.
This is verified either by checking that your passed-in mock function got called with the right parameters, or by checking that the return value shows the function was executed.
If your shouldExecute function is bad, its tests should fail, not some other file's.
In other words, shouldn't we prefer integration testing, or perhaps partial integration testing? This requires an initial effort of setting up your test framework/environment, but in my experience integration tests provide good value for the time you put into testing.
Rather than a granular test on a single class, test the orchestration of many classes. You end up hitting a big % of the code. As always, it depends on the project. If we're building a rocket ship, you need both granular testing and coarse testing.
This is exactly how I feel as well! Units in isolation certainly deserve testing, but the actual long-lived value comes from testing the public interface of the module/program/etc. In one compiler project I maintain, I stopped writing unit tests years ago in favor of just feeding it input like users would and asserting on the outputs. It gives me both great freedom to refactor quickly, and great confidence that I'm not regressing.
As you say, depends on the project. What I described above is entirely free of side effects - I wouldn't dream of testing a web service this way.
Of course integration is never in conflict with unit testing - they're different and can happily coexist.
People keep saying this all the time, but apart from the fact that nobody can agree on what an "integration test" is (because there's almost always some part of the application flow that you're stubbing out), it just becomes immediately apparent in a code base of sufficient size that "just use integration tests for everything" is only possible if you severely under-test (which usually includes things like not properly testing for error conditions etc.).
What? Nobody is advocating for integration tests to the exclusion of unit tests.
> but apart from the fact that nobody can agree on what an "integration test" is
Not being precise doesn't invalidate a guideline. The ideas that "stubbing less is better" and "testing functionality end to end is good bang-for-buck" aren't crappy because people don't agree on the details.
> it just becomes immediately apparent in a code base of sufficient size that "just use integration tests for everything" is only possible if you severely under-test
Your parent comment said "If we're building a rocket ship, you need both granular testing and coarse testing." Nobody is advocating for integration tests to the exclusion of unit tests. If you write a hash table or a CSV parser, yes you should unit test it.
But for most application-ish functionality you should reach for integration tests first. For example, testing direct message functionality in an app, checking that after a send there's a notification email queued and the recipient inbox endpoint says there's 1 unread will get you really far in 20 lines of code. Is it exhaustive? Of course not. But the simplicity is a huge virtue.
> if you severely under-test (which usually includes things like not properly testing for error conditions etc.)
I advocate "default to integration testing for application functionality". Those focused on unit tests often mock exactly the things most likely to break: integration points between systems. "Unit" tests of systems are often really verbose, prescriptive about internal state, and worst of all don't catch the bits that actually break.
> Nobody is advocating for integration tests to the exclusion of unit tests.
Oh, I got here just now, but well, let me advocate it. (Well, not all unit tests, but most of them.)
The ideal layer to test is the one that gives you visible behavior. You should test there, and compare the behavior with the specification.
Invisible behavior is almost never well defined, and as a consequence any test of it has a very high maintenance cost and gives low-confidence results. Besides, it has a huge test area, which comes with the large freedom of choice there. It is a bad thing to test in general.
Now, of course there are exceptions where the invisible behavior is well defined, or where it has a smaller test area than the visible one. In that case it's well worth testing there. But those are the exceptions.
> I advocate "default to integration testing for application functionality". Those focused on unit tests often mock exactly the things most likely to break: integration points between systems. "Unit" tests of systems are often really verbose, prescriptive about internal state, and worst of all don't catch the bits that actually break.
You're writing this as a comment to an article that explains exactly how to write unit tests that aren't brittle and avoid mocking.
"Only integration tests" vs. "brittle unit tests" is a false dichotomy.
I wholeheartedly agree that people can't agree on what "unit test" means precisely, either (even though I think your specific examples are a bit disingenuous). In particular, classical and London-school / mockist TDD have rather different definitions of it.
That's why it's important to have a well-rounded test strategy with different types of tests that have different purposes, instead of using some blanket approaches.
Great article; this isn't the first time I've read this argument against the use of mocks. I've never understood, though, how to solve the problem of the explosion of paths that must be tested (usually exponential).
As an example, consider a single API that uses ~3 services, and these 3 services have underneath from 1 to 5 other internal or external dependencies (such as time, an external API service, a DB repository, and so on).
How can I test this API, the 3 services, and their underlying dependencies without an exponential number of paths to test? I want to be able to cover all the paths of my code, and ensure that each test only tests a single thing (either the API, the service, or the dependency interface); otherwise, it is not a unit test.
I've always felt that this kind of test without mocks works super-nicely in situations without any external (or even complex internal) dependency; otherwise, it becomes very, very hard to test ONLY what I want, and not all the dependencies underneath.
Mocks allow me to stub the behaviour of a service/dependency that I can test in a separate fashion, covering all the paths, and ensuring that each unit test covers a single unit of my code, and not the integration of all my components.
Your dependencies do need to work reliably if you're not going to use mocks. But if they don't, then mocks might be necessary to ensure test reliability. That said if the interaction between your system and its deps is sufficiently unreliable that you need mocks for test reliability, how are you going to have any confidence in the system's production behavior?
Like, if I test one of the workflows of my service, it might involve executing queries and transactions on an underlying database. But these operations have essentially 100% reliability, so the fact that these operations are being "tested" at the same time as my service do not impact test reliability. I gain nothing by mocking them out.
Well, I strongly believe in testing all the queries/transactions (for example, the repository pattern is super helpful to separate concerns and allow you to test only what concerns your queries and the DB); but those are tested separately from the workflow of your service, if we are talking about unit tests. Why? Because otherwise either you write a gazillion tests, or they are unreliable. As an example, if the service you want to test has various code paths, and your queries have paths of their own (what happens if you don't retrieve any object? if some property is null? and so on), I just think it gets messy quickly.
My solution is to test things separately, at least in unit tests. Obviously, this has pitfalls (it's easy and fun to write green tests, so tests don't always respect the interface of the components, and they don't go red even when something is wrong); but that's where integration testing comes into play.
Have a few code paths where you touch multiple external services, and you want to test that everything actually works? Create integration tests that use either a fake or the actual service in a `staging` environment; they take 10x the time to run, but they test the path that is most important for your logic. Obviously, if you have many unit tests, you will have far fewer integration tests, but they serve different scopes, and one cannot substitute for the other!
> I gain nothing by mocking [database queries and transactions] out.
What about the fact that you will need to spin up, migrate and possibly seed a database in your CI pipeline? What about the toll this will take on the execution speed of your test suite? Additionally, consider that you also need to test the behavior of your system when the query fails, and using a mock implementation that always throws an exception is a trivial and reliable way of achieving this.
> That said if the interaction between your system and its deps is sufficiently unreliable that you need mocks for test reliability, how are you going to have any confidence in the system's production behavior?
Sometimes your codebase depends on external services which are flaky for reasons beyond your control, just the way it be sometimes. Mocks are useful to ensure the system behaves a certain way when everything goes right as well as when everything goes wrong.
Ultimately the article raises many good points about avoiding mocks if it can be helped, but don't forget a test that only tests the happy path of your system is not very useful. Mock an error in that dependency you expect to always work and understand what would happen, make the necessary provisions.
> What about the fact that you will need to spin up, migrate and possibly seed a database in your CI pipeline?
The database system I use can be configured to start up reasonably quickly, and can be configured to operate on memfiles to reduce io pressure on the CI system. In fact, testing against the full scale local database is the only supported methodology for this particular rdbms.
> Sometimes your codebase depends on external services which are flaky for reasons beyond your control, just the way it be sometimes.
No doubt no doubt. As I mentioned, mocks or fakes might be necessary in a condition like this.
> Ultimately the article raises many good points about avoiding mocks if it can be helped, but don't forget a test that only tests the happy path of your system is not very useful.
My team uses interception and error injection for this case. We still have the real backend, but requests can be forced to fail either before or after executing on the backend.
> My team uses interception and error injection for this case. We still have the real backend, but requests can be forced to fail either before or after executing on the backend.
Really cool. What technologies are you using to achieve this? We've also had to tackle stuff like this before, but I'm not sure of the optimal way of doing it.
To be honest, this part is handled manually via a test value injector. There's a global map of names to "adjusters", which are either values or callbacks that are allowed to manipulate a value. If there's a particular request we want to fail after making, we do something like this:
    // System under test code:
    auto status = DoRequest();
    TestManipulate("after_request", status);
    if (IsError(status)) {
      // Handle errors.
    }

    // Test code
    SetTestValue("after_request", MyErrorStatus());
    RunSystemUnderTest();
The module that handles test value manipulation is typically disabled by a global variable, and lives in the cold section under FDO and opt compilation, so it costs essentially nothing. It cannot be enabled in release builds due to the code that enables it being compiled out.
We find in many cases this isn't really necessary, as the error handling code in most cases just bubbles up the error, and so it is not particularly interesting to test. There are a few cases where we do this where the error handling is more complex than "return same error to user" or "retry after delay."
This type of thing can't always solve the problem, but it's often good enough to get things done.
I don’t get it. When I unit test, I want to only test what I’m testing. A dependency or the result of a dependency is not what I’m testing. So I mock the result of that dependency to unlink the test from the dependency. If an interface changes I generally want to know anyway, as the test may be different or even obsolete depending on the change.
I hear you, and that's how things work at my present shop.
However, let me give a strong defense of the point.
It turns out that the only thing you can test is a pure function. This is just the nature of a test: we pin down the inputs to a piece of code and we see what outputs it produces. All of the mocking that you are doing is an attempt to turn an impure function into a pure function. Even more extreme setups, where you connect your container to a container running postgres, are trying to turn that container into a pure function in a different way. You don't need to do any of this if the function is pure in the first place.
Once we've established that “functional core” is a lazier means to the same ends, the question of dependencies still comes up, and the “should I mock dependencies” question becomes “should I promote this internal function call to the main I/O section and pass in its result as an argument?”... And the answer is that that changes the language with which this outermost level is written. And probably this outermost shell should be written to sound like the business logic that you are implementing, and you should never do that.
If that's the logic then it means that the sort of testing you're doing is extremely particular and fussy. Because it suggests that every test should really be constructed at a business level, it is a story about your product that you want to make sure holds fast even when the internals are changed. This works really well with domain driven design, because that says that your modules should be also business level entities, so each module comes with some tests that say, here's what this sort of person interacts with the system like. So you are always testing integration of the pure functions, and why would you not. If those pure functions do not integrate together, you want to know about it, and you do not want to know about it through a persnickety test which just fixes the inputs and outputs of something that has no business relevance, because you know what happens in those cases: the developer just rewrites the test to say the opposite of what it used to say, so that it passes now. There is no semantic check on the test output because there cannot be if it is scoped too small.
I think you can quibble a lot with those details but I think that's the strongest case you can make for it?
I feel like you are still making the case for it. “Should I promote the logic of the dependency to the calling module?” No, I shouldn't. That's why I made them a dependency. They could be depended on by many modules, or there could be many implementations wired up via IoC, or some random factory or strategy pattern. Or maybe I do not control the dependency, as it is external to our platform. I can understand all too well the engineering bias to do less work. “But now all my tests are broken” may be the correct result.
With that said, without concrete examples to argue about it's hard to say if we are even disagreeing. The article's examples were wanting, and to what level you break up your code is a hard-fought learning exercise. Some don't care, some say no more than fits on a screen, and on the other end some people use an NPM package to find out if a number is even.
When I unit test, there is normally not much going on in the “unit” besides its use of dependencies, so the tests are largely circular (asserting things about mock expectations).
I still have to write them, because Thou Shalt Have Unit Tests. I can’t consolidate the overly-trivial units, because that would not be Architecture Best Practices, and might even be Spaghetti Code.
Isn't that sort of the point of the unit test in that case? You test the use of those dependencies:
Write a test where a mocked dependency returns something unexpected and see how your 'unit' responds.
This is why mocks are useful: often you rely on implementation details of your dependencies and only test the happy cases. With a mock you can return whatever edge cases you come up with and ensure your unit handles everything as expected.
No? The point of a unit test is to tell me something non-obvious about how the function behaves in certain circumstances. When unit testing glue code there is no information in the test result that is not also literally written out in the unit under test (it makes these calls in this order).
The way I think about tests is: what’s the cost of failure here vs the cost of testing?
Sometimes, bugs in this part of the codebase would not be a showstopper, so why test as heavily?
The most important tests by far are smoke-style integration tests for critical paths through the system. I tend to care much less about other tests in many cases.
Obviously, if I was writing a compiler or database or medical software, that’d be different. But I’m generally writing web applications where if the entire application were to fail for a day, we probably wouldn’t even lose a customer.
This article espouses what is known as the "classicist" school of testing and rejects "mockist"/London-style testing. I wholeheartedly agree with it. I have been championing it in my team of 20+ devs for many years now, to great effect. You can read more about it in the excellent book by Vladimir Khorikov from January 2020 titled "Unit Testing Principles, Practices, and Patterns".
For fakes, spin up the real thing. If you're not able to model your database transactions deterministically, then your transactions could themselves be flawed, and tests are a great way to catch that.
Deterministic tests are not a goal in and of themselves. Controlled non-determinism is valuable. This is popularized in various frameworks under the names of property checking and fuzzing, which, for example, will tell you the seed to use to reproduce a failure - so while the runs don't have the exact same input/output for every invocation, you get better coverage of your test space and can revisit the problematic points at any time. If you're doing numeric simulation, make sure you are using a PRNG that's seedable and that you log the seed at the start (and make sure time is an input parameter). Why is this technique valuable? You transitively get increasing code coverage for free through CI/coworkers running the tests AND you have a way to investigate issues sanely.
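One common way to get that "controlled non-determinism" in Go, as a hedged sketch: seed from the clock by default, log the seed so any failure is reproducible, and let an environment variable (TEST_SEED here, a made-up name) pin it when investigating.

    package sim

    import (
        "math/rand"
        "os"
        "strconv"
        "testing"
        "time"
    )

    // newSeededRand returns a PRNG whose seed is always logged, so a failing
    // run can be replayed exactly by exporting TEST_SEED=<logged value>.
    func newSeededRand(t *testing.T) *rand.Rand {
        t.Helper()
        seed := time.Now().UnixNano()
        if s := os.Getenv("TEST_SEED"); s != "" {
            parsed, err := strconv.ParseInt(s, 10, 64)
            if err != nil {
                t.Fatalf("bad TEST_SEED %q: %v", s, err)
            }
            seed = parsed
        }
        t.Logf("using seed %d", seed)
        return rand.New(rand.NewSource(seed))
    }

    func TestAbsIsNeverNegative(t *testing.T) {
        rng := newSeededRand(t)
        abs := func(x int64) int64 {
            if x < 0 {
                return -x
            }
            return x
        }
        // Different inputs every run (more coverage over time), but any failure
        // reports the seed needed to reproduce it.
        for i := 0; i < 1000; i++ {
            if v := abs(rng.Int63() - rng.Int63()); v < 0 {
                t.Fatalf("abs returned negative value %d", v)
            }
        }
    }

Go 1.18+ also has built-in fuzzing (go test -fuzz) for the heavier-weight version of this, which is best run as a separate CI job, as the reply below suggests.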
"Non-deterministic tests" usually refers to a test whose output depends on the execution environment in a way that cannot be controlled. For example, a multithreaded test with a race condition, or a test that uses the real-time clock.
This is fundamentally different from a test that uses random numbers. Another way to look at it: Deterministic tests can be used in `git blame`. A unit test that uses a specific PRNG algorithm and sets the seed is fine. A test with a race condition is not.
Fuzzing is very useful, but it's not unit testing. If you want to run a fuzzer or some other kind of endless randomized testing in CI, it should be a separate job from the unit tests (IMO).
If you are testing the externalities, they aren't unit tests, but integration tests.
The better advice is: continue to isolate unit tests away from real dependencies, and ALSO have integration tests that test the way the package connects to dependencies (and the other packages in the software dependent on it).
I have two services that communicate over the network. Neither does anything useful without data from the other. So I write my tests like this article suggests and inject at the boundaries of the services - the network boundary in this case. I've built tests that now only test my idea of what the services will return. IMO these aren't particularly useful tests: what happens when the remote service starts returning unexpected data, or errors due to load problems? I guess my point is just that this article is good advice - prefer testing your actual code and dependencies - but unit testing isn't a replacement for integration testing.
I worked somewhere with a hard rule that you are not allowed to test internal classes.
So if I write my own sort algorithm I am not allowed to unit test that class unless I make it public.
But if the sort algorithm is to solve a specific sub problem in a library there is no need to make it public - it may not make much sense.
So I had to test it “through” its consumer class(es) that are public. For illustration's sake, let's say an MVC controller mocked up to the eyeballs with ORM mocks, logging mocks, etc.
This bugged me because, having direct access to a functional core, I can quickly amplify the number of test cases against it and find hidden potential bugs much quicker.
I would say that testing only the public contract is generally very helpful.
When the logic gets super complex, I would think that a good compromise is to modularize your code and test the public API of each module. A good heuristic is to see whether those modules are (or could be) helpful by themselves in the future.
You then write tests for a module assuming its dependents cover their edge cases.
As I wrote it I thought the same thing! I think some of the problem was the org friction of adding a new module and getting approval to do so. This was a place with about 400 devs, so they couldn't really let just anyone freely add modules, and waiting for architecture approval would take too long for a typical ticket.
> So if I write my own sort algorithm I am not allowed to unit test that class unless I make it public.
I think there's a slow movement away from this kind of straightjacket. The compiler sees it all anyway. Class accessors serve 2 purposes.
1. They are a social tool; and this is only if you believe that developers who work on source code, which they can read, cannot be trusted to call methods to do what they need.
2. They are a convenience for opaque modules, so that users who may not have access to the source code, can avoid using APIs that may have no effect or problematic side effects, while also decluttering the API.
Python allows methods that are "private" only by naming convention (e.g. _backdoor()) to be called as if they were public. Go allows any code in the same package - like the tests for that package - to access any other method regardless of accessors. In JavaScript, you can use rewire.js to mock methods in imported modules (which are effectively methods in closures).
These solutions are an improvement to the classic rigid class access pattern that many languages are stuck with and forces the kind of theatre you have experienced.
In database land you can even freely bring up a number of _proprietary_ databases without needing an account or license. I spin up MySQL, Postgres, Oracle and SQL Server databases in containers in Github Actions for tests. Unfortunately IBM DB2 seems to require a license key to spin up their container. :(
I have used this approach and now I perceive mocks as a bad code smell. It's worth noting that long and complicated flows need to be wrapped in sagas (which are just imperative shells) and things can get not so easy. I would love to hear what are other clean alternatives in such cases. Still, "functional core, imperative shell" is the way to go, code really fits in the head and tests actually make sense.
When you build and sell a network security appliance, the appliance has to work properly all the time. You have to give guarantees of how it will perform, guarantee that its functions do what you claim they do, and eliminate all possible bugs. So you have to test it in every configuration possible.
So you make models and build automation test frameworks and labs full of prototype equipment so you can run 10,000 tests an hour, 24/7. You automate setting up networks and nodes and passing live traffic to test detectors and rule sets. You use custom hardware to emulate ISP-levels of real traffic.
You can't really mock anything. You have to be sure each function will perform as you describe. So most of this testing is end-to-end testing. Using mocks wouldn't tell you whether the millions of code paths are working correctly; they'd mostly just tell you whether the syntax/arguments/etc of a call were correct. Unit tests are basically one step above verifying that your code compiles; they're not testing that the code works as expected. End-to-end tests are what you can rely on, because it's real traffic to real devices.
That's gonna look different for a lot of your apps, but for web apps that means getting real familiar with synthetic tests and how to test with different browsers. For SDKs it means end-to-end tests for the whole stack of functions, which is a lot more testing than you may have expected. For APIs it means getting the consumers of your APIs involved in end-to-end testing. It also means "testing in production", in the sense of spinning up new production deployments and using those as your end-to-end testing targets (rather than having a dedicated "dev/test/cert" environment which is always deviating from production). This can take a significant effort to adapt in legacy systems, but for greenfield IaC projects it's not very hard to set up.
I 100% agree, but rarely have found much to put in a functional core. Most everything these days is distributed microservices talking to each other (different argument) and having more than a couple lines in a row that don't call some other service, even in a well-factored app, is almost a cause for celebration.
I've spent the last few years working on test infrastructure for blockchain projects, so this question is regularly on my mind, and I've found the best solution is to just include the dependencies. Compute and storage are cheap enough that you can spin up these external tools, and the extra level of assurance makes me sleep easier. While flaky tests are theoretically an issue, I rarely find them (and they tend to be programmatically fixable).
If a unit test requires an external dependency then just use an integration test (or check it’s covered by system tests) and leave it at that.
I prefer mocks over real services because I onboard many people to a repo. Too many services attached to unit tests turns into fatigue until eventually, I've found, no one runs the tests any more.
I agree that fakes are very good for testing, with one caveat: the fakes must be owned by the same folks that own the real implementation.
When fakes are owned by anyone else, e.g. the folks writing the system under test, there's a high risk that the semantics of the fake and the real implementation will diverge, rendering the tests that use the fakes much less useful.
Ian Cooper talked about this (and more) 5 years ago: https://www.youtube.com/watch?v=EZ05e7EMOLM Well worth listening to this talk. It completely changed the way I did testing.
Agreed. Push all external dependencies (DBs, APIs, etc.) to the edge. That's it!
Why do you need a functional core, though?
You can use decoupled classes and an IoC container that does the wiring for you.
If some method has several dependencies that are tangentially related to what I'm testing, I'm going to mock the shit out of them, and let people know we should do something about it.
Testing "the real thing" sounds fun until the budget for new features skyrockets, because several man-weeks are needed to get decent coverage. Your client will hate it, and your client's clients will hate it even more.
So... in order to do unit testing, you should shift your entire system to be written in functional style, and then write integration tests instead of unit tests? Because that's what you get when you use the real components.