Prefer table driven tests (2019) (cheney.net)
28 points by PaulHoule 8 months ago | 64 comments



I split tests into two categories. One is "I'm testing a bunch of different variations of the same thing." Table tests are great for this: they make the code pretty readable and, perhaps more importantly, the process of writing the inner part of the table test seems to nudge you to think through all the permutations and combinations of what you're testing. It leads to good tests and helps you reason about your inputs.
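
Something like this minimal sketch (Abs and its cases are made up; the shape is the point):

  import "testing"

  // Abs is an assumed function under test; the cases are illustrative.
  func TestAbs(t *testing.T) {
    tests := map[string]struct {
      in, want int
    }{
      "positive": {2, 2},
      "zero":     {0, 0},
      "negative": {-2, 2},
    }
    for name, tc := range tests {
      t.Run(name, func(t *testing.T) {
        if got := Abs(tc.in); got != tc.want {
          t.Fatalf("Abs(%d) = %d, want %d", tc.in, got, tc.want)
        }
      })
    }
  }

Filling in the table is what forces you to ask which rows are missing.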

The other kind of test is what I call a "storybook test." Table tests are less useful here. This is usually for testing CRUD in a coherent manner, but it can apply to any kind of object with a lifecycle that you need to transition through various states. You may argue: aha! This is a state machine and you ought to simply test each state transition independently! Fair enough, I say, and yes, in some cases I might actually do so, if the exposed API lends itself to having its internal state set so explicitly. But more often than not I argue that the story itself is more valuable than exactly which permutations of inputs and state get tested at each call. What the test represents is that some flow is important: it gives an example of that flow and helps show that this general use case is what we want to keep working, not necessarily these specific state transitions.

Mind you, reality is messy, so tests often blur and blend and so on. But when I sit down to write a new test, I'm usually imagining it as one of these two archetypes.


You might like Hypothesis' approach to stateful testing. See https://hypothesis.readthedocs.io/en/latest/stateful.html I think what they call stateful testing is close to what you describe as a storybook test.

Hypothesis is a property based testing library.


> [T]he process of writing the inner part of the table test seems to nudge you to think through all the permutations and combinations of what you're testing. It leads to good tests and helps you reason about your inputs.

I think this is not inherent. If you need to test all permutations and combinations, do exhaustive testing or property testing instead. Table tests (ideally) represent a pruned tree of possibilities that makes intuitive sense to humans, but that pruning itself doesn't always fit neatly into tables. But otherwise I generally agree.

When I write a specific set of tests I first start with representative positive cases, which I think you would call storybook tests (because they serve both as tests and examples). Then I map out edge and negative cases, but with knowledge of some internal workings in order to prune them. For example, if some function `f(x, y)` requires both `x` and `y` to be a square number, I can reasonably expect that check to happen before anything else, so once we are confident the checks themselves are correct, we can feed only square numbers from then on. And then I optionally write more expensive tests to fill any remaining gaps---exhaustive testing, randomized testing, property testing and so on.

Note that these tests are essentially sequential, because each test relies on assumptions that are thought to be verified by earlier tests. Some of them can make use of tables, but the entire suite should look like a gradient of increasing complexity and assurance, in my opinion. Table tests are only a good fit for some portion of that gradient. And I don't like the per-case subtests suggested in the OP---it just feels like a workaround for Go's inability to provide more context on panic.


Btw, those tables making intuitive sense to humans sounds like a strength, but it's also a big weakness: humans aren't always good at thinking about corner cases.

Property based testing is one way to get past this limitation. (As you also imply in the later part of your comment.)


One of the difficulties with property testing is that humans are notably bad at specifying good-enough properties. A canonical example is sorting: if your property is that every pair of consecutive elements should be ordered, an implementation that duplicates some element in place of others won't be caught. We can't always come up with a complete property, but intuitive cases can improve otherwise incomplete properties.
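
To make the sorting example concrete in Go: the ordering check alone passes a "sort" that overwrites elements, so the property needs a multiset check alongside it.

  // The weak property: consecutive elements are ordered.
  func isSorted(xs []int) bool {
    for i := 1; i < len(xs); i++ {
      if xs[i-1] > xs[i] {
        return false
      }
    }
    return true
  }

  // The missing half: the output is a permutation of the input.
  func sameElements(a, b []int) bool {
    counts := make(map[int]int)
    for _, x := range a {
      counts[x]++
    }
    for _, x := range b {
      counts[x]--
    }
    for _, c := range counts {
      if c != 0 {
        return false
      }
    }
    return true
  }

  // A broken sort such as func(xs []int) []int { return make([]int, len(xs)) }
  // satisfies isSorted but fails sameElements.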


Yes, coming up with good properties is a skill that requires practice.

I find that in practice, training that skill also helps people come up with better designs for their code. (And people are even more hopeless at coming up with good example-based test cases.)

Of course, property-based testing doesn't mean you have to swear off specifying examples by hand. You can mix and match.

When you are just starting out with property based testing, at a minimum you can come up with some examples, but then replace the parts that shouldn't matter (eg that your string is exactly 'foobar') with arbitrary values.

That's only slightly more complicated than a fixed example, and only slightly more comprehensive in testing; but it's much better in terms of how well your tests are documenting your code for fellow humans. (Eg you have to be explicit about whether the empty string would be ok.)
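
In Go that first step could look like this sketch, using pgregory.net/rapid (a Hypothesis-style library that comes up later in the thread) against the article's Split; the regex just keeps the separator out of the generated input:

  import (
    "testing"

    "pgregory.net/rapid"
  )

  // Was: a fixed example asserting Split("foobar", "/") == []string{"foobar"}.
  // The exact string shouldn't matter, only that it contains no separator.
  func TestSplitNoSeparator(t *testing.T) {
    rapid.Check(t, func(t *rapid.T) {
      s := rapid.StringMatching(`[^/]*`).Draw(t, "s")
      got := Split(s, "/")
      if len(got) != 1 || got[0] != s {
        t.Fatalf("Split(%q, %q) = %v, want [%q]", s, "/", got, s)
      }
    })
  }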


>The other kind of test is what I call a "storybook test." Table tests are less useful here

Storybook tests lend themselves well to forked tests. E.g.

  steps:
  ...
  - click: shopping basket

  variations:
    Empty basket:
      following steps:
      - click: empty basket
      - items in basket: 0

    Purchase basket:
      following steps:
      - click: purchase
      ...


Are you dogmatic about things such as only one assert per test case or similar?


Not at all! Quite the opposite. I encourage people to write tests that make sense, are readable, and are a good use of their time. I encourage people NOT to write tests if they don't actually believe the tests improve their confidence that the code works in prod, or if they think they're not worth the LOC.


In Python land this is called "parameterized tests": https://docs.pytest.org/en/7.1.x/example/parametrize.html

I think the author might want to look into mutation testing or property based testing to get to something closer to fully testing stuff.


Yes, and going further, I think you'll enjoy the amazing lib pytest-cases[1], which lets you name parametrized tests and list them not as decorators but as separate functions/classes.

See the EuroPython video intro [2], or the slides [3].

[1] https://smarie.github.io/python-pytest-cases/ [2]: https://www.youtube.com/watch?v=QTyfr2UNR98 [3]: https://ep2021.europython.eu/talks/649sqwq-powerful-tests-an...


> Please don’t email me to argue that map iteration order is random. It’s not.

He definitely knows about this, but to state it more clearly: the behavior of map iteration order is unspecified, with a clarification that it can vary from one iteration of the same map to the next. The intent, though, was that it be explicitly unpredictable [1], and the main Go implementation has always behaved that way since 1.0, so such an argument is not exactly wrong. (I don't know whether this distinction is intentional. Most likely a simple oversight.)

[1] https://go.dev/doc/go1#iteration


As usual, Gophers might have some good ideas here and there, but they are stuck in their own world...

The obvious-in-hindsight next step up from table driven tests is property based testing. That's where you ask the computer to generate the tables for you, and also to automatically shrink any counterexamples it finds to be as simple as possible.

See https://fsharpforfunandprofit.com/series/property-based-test... for an introduction.

(One big benefit you get from property based testing is that the computer only needs to be taught once to always remember to also test edge cases like empty strings, zero-length arrays, and NaN and infinity for your floats, etc., instead of a human having to remember for every test that zero is also a number.

Of course, property-based-testing libraries also allow you to exclude those values, when they are not applicable. Eg not everything that consumes floats needs to handle NaNs.

And explicitly filtering NaNs and infinities from your test case generation serves as good documentation for those kinds of assumptions.)
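
For a Go flavour, a sketch with pgregory.net/rapid: the string generator produces empty strings and awkward Unicode without anyone having to remember to ask, and a generator's Filter method lets you exclude values that don't apply, which doubles as documentation. The round-trip property below genuinely holds for the standard library:

  import (
    "strings"
    "testing"

    "pgregory.net/rapid"
  )

  // Splitting on a separator and joining with it again is a no-op.
  func TestSplitJoinRoundTrip(t *testing.T) {
    rapid.Check(t, func(t *rapid.T) {
      s := rapid.String().Draw(t, "s")
      if got := strings.Join(strings.Split(s, "/"), "/"); got != s {
        t.Fatalf("round trip of %q gave %q", s, got)
      }
    })
  }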


The problem with property-based tests is that for non-trivial input data, like e.g. source code for a compiler, it's hard to implement shrinking, since it's hard to tell in general what it means to simplify a program (or another piece of non-trivial input). Another issue is that randomisation introduces flakiness when the input space is large. I prefer to write tests by writing a small number of simple base cases, plus code which produces increasingly complex combinations of them up to a certain size. It's completely deterministic and the order goes from simpler to more complex, so when the test fails, it fails at the simplest case it could find and doesn't need to go from more complex back to simpler, which is much more complex to implement. By moving from property-based tests to this approach I've sped up my tests and made them more reliable. Previously they weren't as effective at finding bugs, and when they did find them, it was hard to reproduce the failure other than by copying the random generator seed. And the shrinking code was very complex.

So my approach intuitively is a bit like writing down a small set of simple axioms and then generating their more complex consequences. The key being the order.


You are right that shrinking is a problem for QuickCheck inspired property based testing libraries. Please do have a look at Python's Hypothesis.

Hypothesis is perhaps the most advanced property based testing library. Or at least: miles ahead of QuickCheck. Hypothesis 'solved' shrinking by integrating it with generation. See https://hypothesis.works/articles/integrated-shrinking/ for an overview. And https://hypothesis.works/articles/compositional-shrinking/ for some more.

I am glad that you found a way to (manually) generate useful test cases. I would like to learn more, and I wonder how we can perhaps automate your technique to make property based testing better!

(David R. MacIver, the author of Hypothesis, has lots of interesting ideas. He says that property based testing is such a great concept that QuickCheck, despite making all the wrong design choices whenever they had any choice, was still a revolutionary library.)


> Or at least: miles ahead of QuickCheck

I'm not so sure about this. The blog post saying that it's better to tie size to generation is dated 2016.

It looks like QuickCheck was tying size to generation in 2008: https://hackage.haskell.org/package/QuickCheck-2.1.0.1/docs/...

EDIT: I take it back, it looks like there is shrinking in modern QuickCheck https://hackage.haskell.org/package/QuickCheck-2.14.3/docs/T...


The Hypothesis approach looks interesting. I'll look into it.

As for my approach to generation, there's not much to it. Instead of generating arbitrarily complex values and then trying to shrink them to simpler ones, I start with simple values and, if they work, proceed to more complex ones. In every test run I use the same values. It's basically brute force that omits values sufficiently similar to those already tested.

The omission is achieved by having a base set of simple values, each judged by me to be sufficiently different from the others (I basically copy the tricky numbers/unicode values that quickcheck implementations use inside their generating code), with the base set ordered from simplest to less simple. Then I run a loop that goes over all elements of the base set and runs the test, then takes a function which produces more complex values based on the base set and runs the test on those, etc. So the earliest failure is going to be the simplest value it can generate.
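
If I've understood the approach, a runnable toy version in Go would be something like this (check is the hypothetical property under test; combine builds more complex inputs out of already-tested ones, here by concatenation):

  import "testing"

  func combine(values []string) []string {
    var out []string
    for _, a := range values {
      for _, b := range values {
        out = append(out, a+b)
      }
    }
    return out
  }

  func TestByIncreasingComplexity(t *testing.T) {
    values := []string{"", "a", " ", "\t", "안"}
    for depth := 0; depth < 3; depth++ {
      if depth > 0 {
        values = combine(values)
      }
      for _, v := range values {
        if err := check(v); err != nil {
          // Generation is deterministic and ordered from simple to
          // complex, so this is already the simplest failure found;
          // no shrinking needed.
          t.Fatalf("depth %d, value %q: %v", depth, v, err)
        }
      }
    }
  }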


> Another issue is that randomisation introduces flakiness when the input space is large.

But that is precisely the point of property-based testing - as opposed to hard-coded inputs. The flakiness is in your code, not your test. Change your code until the PBT stops flaking.


Door A: tests find bugs sometimes (property-based)

Door B: tests find bugs almost always with a fraction of the code (my approach)

I choose Door B.

I still test properties tbh. I just don’t let randomness ruin it and don’t need to implement any shrinking.


Python's Hypothesis automatically saves the random seeds that find counterexamples, so the next time you run your test suite, you can reproduce that bug.


If your code does the same thing every time it's run.


Sure. But that's a problem with running any test suite (property based or otherwise).


In other words:

Python's Hypothesis automatically saves the random seeds that find counterexamples, so the next time you run your test suite, you may or may not be able to reproduce that bug.


This is not an exclusive choice. Choose all doors, as each will give you something the others don't. "All" includes things not yet invented.


I have a finite budget of code and time I can afford to spend. One of the choices gives me a much better RoI in my experience. I prefer to spend the rest of the budget on the main code.


> and also to automatically shrink any counterexamples it finds to be as simple as possible.

This actually is a problem with most quickcheck clones (and quickcheck itself); that's why it's better to use a library that is inspired by Python's Hypothesis (combining generation and shrinking, in a nutshell). So Hedgehog for Haskell and F# (instead of FsCheck), and Rapid for Go: https://github.com/flyingmutant/rapid

But tables and property tests complement each other, even if you can get rid of some tables by using property testing.

Example article about the shrinking of Hypothesis vs. Quickcheck https://hypothesis.works/articles/integrated-shrinking/


Yes, Hypothesis is great! See also my response at https://news.ycombinator.com/item?id=39215177

I was fortunate enough to have met the author of Hypothesis, David R. MacIver, in London a few years ago.

In addition to better shrinking, Hypothesis also has a few more tricks up its sleeves compared to QuickCheck.


I've recently started using property based testing in C# using FsCheck. The FsCheck docs are pretty unclear, with poor examples, and the C# snippets are tiny and barely explained.

I want to use it more but it’s actually quite painful to even understand how to achieve the thing I want. I briefly looked at Hedgehog which seems to have even less documentation let alone anything helpful for C#.

Just started reading that link and I agree; I've found it confusing how FsCheck has Arbitrary and Gen. Are there any other differences?


Yes, sadly the documentation of both could be way better. If somebody doesn't know property testing already (for example by using Hypothesis in Python), it is really hard to both learn the concept and the actual usage of the libraries (and their concepts) at the same time.

But it is quite strange that there is no C# alternative to FsCheck; I would have thought there would be. I'm sorry I can't help much more.


There are utilities for property-based testing in the standard library https://pkg.go.dev/testing/quick@go1.21.6. There's also fuzzing since Go 1.18 https://go.dev/doc/security/fuzz/.
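
A minimal testing/quick example, reusing the sorting property discussed upthread (incomplete as a property, but it shows the API):

  import (
    "sort"
    "testing"
    "testing/quick"
  )

  // quick generates the int slices via reflection; the function under
  // Check must return bool.
  func TestSortProducesSorted(t *testing.T) {
    sorted := func(xs []int) bool {
      sort.Ints(xs)
      return sort.IntsAreSorted(xs)
    }
    if err := quick.Check(sorted, nil); err != nil {
      t.Error(err)
    }
  }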


Yes, but using quick is just a PITA, I'd recommend using Rapid https://github.com/flyingmutant/rapid instead


> As usually, Gophers might have some good ideas

More like: as usual, gophers rediscover old ideas and pass them off as innovations. Pytest added pytest.mark.parametrize back in 2011, JUnit 4 had first-class parametrized test support (I don't know if it was in 4.0, released circa 2006, but it was definitely in there no later than 2009), and data-driven testing was a thing in the late 90s.


I see no place in the linked article where this is passed off as an innovation. I see no claim that it is impossible in any other language or that Go is uniquely good at table based testing. All there is is a claim that in the Go community specifically this practice has been getting more popular, and this is a relatively old article now, dating back to when the community was pretty young, so it was a sensible claim at the time.

I am also not aware of anywhere where the Go designers try to pass anything off as an "innovation". To my eyes they've always been very up front that they're not looking to innovate and that Go was always about recombining established ideas.

There is only one thing I am aware of that I have seen from the Go designers where they claimed actual innovation, and it was down in some grotty details about how garbage collection works in Go, and even that was more from the perspective of answering the FAQ "Why don't you just do exactly what Java does?" and giving a detailed answer about relevant differences in the runtime than any sort of "look what we've discovered".

If you are reading claims of "innovation" where none exist, I guess this makes sense of your hostility, but are you sure people are actually claiming what you think they're claiming?


Downvoters care to comment what's wrong with the information?

The sentiment also is the impression I got so far from the Golang community. Their own flavor for many things, but not much new. I think that is sort of the philosophy behind Golang, isn't it? They wanted a simple language, that many people could easily switch to, without learning new concepts. I mean, for 12 years of its existence Golang didn't even have generics, making it superficially even simpler. That is what you get with Golang. Not too surprising really. Some people like it, some don't.


Technically there's nothing wrong with it. That's the problem. These "gotcha" takes, that some random language from 20 years ago already had feature x so talking about it now makes no sense, are as common as they are misguided.

A simple example is Go's autoformatter `gofmt`. In my opinion an earth-shattering invention. But to see that you need to go a bit beyond the basic, binary thinking that only true novelty has any value.

There's a fundamental difference between a language like Python that has 2-3 competing autoformatters used by <5% of Python programmers [1] and Go's single autoformatter used by >95% of Go programmers.

Technically Python and Go are equivalent here, but in reality the gap between them is enormous. That's why pointing out that feature x has existed since the 70's or that every other language has x too isn't a useful insight.

[1]: I'm of course talking about the state of Python and Go around the time Go was released.


Table-based testing is property-based testing. Nothing in property testing says who has to choose the values used.

There are libraries that automatically test valid/invalid inputs for types. But automatic case generation is not required for property based testing.

A refinement is pairwise testing, where the machine chooses a minimal subset of cases that spans the test space, to combat combinatorial explosion and run in a reasonable amount of time.

https://www.pairwise.org

Further refinement is possible to cut the run time by pruning the cases-to-be-run based on previous failures. Unfortunately, things then start to seem even more "magical", there is no set number of test cases always run, and non-technical (or non-testing) management tends to have problems with that. One day we'll have nice things...
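
A toy illustration of the pairwise idea: for three boolean parameters the full cross product is 8 cases, but the 4 rows below already cover every value pair for every pair of parameters. Real tools compute such covering arrays; checkConfig is hypothetical.

  import "testing"

  var pairwiseCases = []struct{ cache, retry, tls bool }{
    {false, false, false},
    {false, true, true},
    {true, false, true},
    {true, true, false},
  }

  func TestConfigPairwise(t *testing.T) {
    for _, c := range pairwiseCases {
      if err := checkConfig(c.cache, c.retry, c.tls); err != nil {
        t.Errorf("cache=%v retry=%v tls=%v: %v", c.cache, c.retry, c.tls, err)
      }
    }
  }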


I've never heard of parameterised tests being called property based testing - they don't define a property, just a list of cases. At that point we may as well call all unit tests property based tests and lose any useful distinction.


I agree that calling table based testing 'property based testing' is stretching the definition.

> Nothing in property testing says who has to choose the values used.

I take issue with that. Properties of interest are some variant of 'for all x from some set s, some property holds.'

Then instead of testing all possible x, we randomly sample. But it is important that our tests define the set s.

When we do example based testing (whether traditional unit testing or table based testing), the set s is usually only implied. So you might be testing your functionality for the strings 'foo' and 'bar', and perhaps 'foo bar'. But it's not clear whether you also mean it to pass for the string '' or '안녕하세요' or '\t\n'. That's at best implied, or perhaps vaguely described in a human-readable comment.


There are some efforts to guide test generation for property based testing to make the instruction pointer explore as large a space as possible.

This effort is more mature in the fuzzing community. See eg American Fuzzy Lop https://github.com/google/AFL


And mutation testing... which in my opinion is just way easier to do and more mentally straightforward. Plus you get the satisfaction of actually knowing you're DONE, unlike PBT where you just give up after a certain amount of CPU time has passed.


There's no silver bullet, not even mutation testing. For example, if the code for a certain case is straight up missing, there is nothing to mutate!

I also consider mutation testing to be in a different domain from property based testing, as mutation testing is some kind of metatesting instead. PBT is a 'regular' testing technique. This means that they are in fact complementary, for example by using PBT to check mutants. (Although mutation testing does seem to assume a sort-of 'fixed' test set, but you can also fix a test set with PBT by e.g. explicitly setting a seed and test count.)


> Plus you get the satisfaction of actually knowing you're DONE

Have you somehow managed to overcome the maxim that "Testing can show that defects are present, but cannot prove that there are no defects" ?


No. But with mutation testing you can know that the test suite tests all behavior of the program.

Whether that behavior is right or wrong, it can't say. For example, it can't detect that the code needs MORE behavior! Some if-statement or loop that was never written.


How you write the tests that you're checking with mutation does not matter. Or, to rephrase: you need both if you want to do mutation testing (which is really slow).


I'm fond of taking this a step further and actually offloading the test cases to a separate language-agnostic file, then having implementations' test harnesses read cases from that file. The Ruby gem for Base32H¹ does exactly this to test the encoder/decoder logic, pulling the test cases from CSV files in a separate repo² included as a Git submodule. One of these days I need to port the other reference implementations over to using the tests repo, but once that's done I'll be able to update the test cases in one place and they'll automatically work everywhere.

----

¹ https://github.com/Base32H/base32h.rb/tree/master/spec

² https://github.com/Base32H/base32h-tests


What do you gain from having a separate language agnostic file?

If you are going to have a table, it seems to make sense to have the table next to the code that's using it. (Though I prefer property based testing.)


> What do you gain from having a separate language agnostic file?

Right now, nothing.

In the future, when I overcome my laziness, it's a single place to define tests that multiple projects in multiple languages (in this case, encoding/decoding libraries) need to pass in an identical manner. If I ever identify more edge cases that I need to test, I can add them in one place and all existing implementations would automatically test them, and for future implementations I only need to implement enough to read and loop through the test files and then I'm done.

Another advantage comes about in the corporate world (where most of my programming energy has less-than-fortunately gone): having test cases defined in a format like CSV that Excel can edit makes it easier for non-programmers (i.e. the vast majority of stakeholders for the average corporate software project) to read and write the test cases.
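
In Go such a harness stays small; a sketch (the file path, column layout, and Decode32H are made up for illustration):

  import (
    "encoding/csv"
    "os"
    "testing"
  )

  func TestDecodeSharedCases(t *testing.T) {
    f, err := os.Open("testdata/decode_cases.csv")
    if err != nil {
      t.Fatal(err)
    }
    defer f.Close()
    rows, err := csv.NewReader(f).ReadAll()
    if err != nil {
      t.Fatal(err)
    }
    for _, row := range rows {
      input, want := row[0], row[1]
      t.Run(input, func(t *testing.T) {
        if got := Decode32H(input); got != want {
          t.Errorf("Decode32H(%q) = %q, want %q", input, got, want)
        }
      })
    }
  }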


Table driven tests are fine for more than, let's say, 6 test entries, but otherwise a set of assert or require calls is much more pleasant to the eye. AND if you want to debug, just click on the stack trace in your IDE and set the breakpoint for that failing test. Way faster.

Again, table driven tests definitely have their place, but in the spirit of XP, it is not the most simple thing out there. Do not use by default.


You can target specific table driven tests by name in order to run/debug just a single case.
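
E.g. for the article's example (spaces in subtest names become underscores):

  go test -run 'TestSplit/trailing_sep' -v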


Wow… this can’t possibly be how you write tests in Golang?!

Code: https://hastebin.com/share/puciyelayi.go

This is just insane: the ceremony is twice the length of the actual testing code. I seriously hope this is not representative of the language as a whole.

In Java I get the same outcome with the @ParameterizedTest annotation.


> this can’t possibly be how you write tests in Golang?!

Only for very simple functions like in the example. Most of the time it’s much worse.


The "ceremony" is just the for loop and the t.Run( ... ) call. Everything else you'd need to do in Java too.


Decidedly no. Unless you don’t know how to use JUnit. See my example below.


So where does the testSplitArguments method come from?


Can you write up the Java version too? I don't think it would be significantly more terse.

Maybe you'd lose the:

    for name, tc := range tests {
        t.Run(name, func(t *testing.T) {
But gain:

    @ParameterizedTest


Spock is the most readable framework for testing Java IMO, and its data driven testing is great, but it might not be a fair comparison because it's written in Groovy:

  def 'split #input with sep #sep should return #expected'() {
    expect:
    Split(input, sep) == expected

    where:
    testCase      | input   | sep || expected
    "simple"      | "a/b/c" | "/" || ['a', 'b', 'c']
    "wrong sep"   | "a/b/c" | "," || ['a/b/c']
    "no sep"      | "abc"   | "/" || ['abc']
    "trailing sep"| "a/b/c/"| "/" || ['a', 'b', 'c']
  }


Alright, it's not quite as concise as I remember, because you can't use arrays in a @CsvSource. But I still argue that it is a vast improvement. You have your data + test condition and almost nothing on top.

Code: https://pastebin.com/14hLe6Fz

And now all of the horribleness of requiring a struct definition + defining the data inside your method + a for loop is gone. The intent is clear, and depending on your liking you can move the test data into a separate class, to the end of the file, or next to the test case.

PS: I just realized you can't edit your post after 1+ hours have passed!


Placeholder

Will do once I’m back from work, ~ 3h


I was getting the same feeling.


I'm not convinced this is better than just using functions.

Change TestSplit() into DoTestSplit(), give it function parameters (input, sep, and want), and let it do all the work of calling Split(), comparing actual against expected, and failing with a good error message.

Then call it from four one-liner functions TestSplitSimple(), TestSplitWrongSep(), TestSplitNoSep(), and TestSplitTrailingSep().

The test framework can then have the right number of test cases and report failures the right way because there's still a separate function for each test case. (Or I assume so. I don't know Go.)

And to me, this is less cognitive load than, "Oh, here's a map literal, and the name passes through from here to t.Run()." When reading this, I'd rather focus on what I'm testing than how the table-based testing is doing it.

I think tables and sub tests would be useful in certain specific situations, though. I'm sure they're more concise if you have 25 cases instead of 4. In some cases, you might have two or more tables and want to test all possible combinations.
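
Concretely, that looks something like this (Split as in the article; t.Helper() makes failures report at the caller's line, so each one-liner shows up as its own test case):

  import (
    "reflect"
    "testing"
  )

  func DoTestSplit(t *testing.T, input, sep string, want []string) {
    t.Helper()
    if got := Split(input, sep); !reflect.DeepEqual(got, want) {
      t.Fatalf("Split(%q, %q) = %v, want %v", input, sep, got, want)
    }
  }

  func TestSplitSimple(t *testing.T)      { DoTestSplit(t, "a/b/c", "/", []string{"a", "b", "c"}) }
  func TestSplitWrongSep(t *testing.T)    { DoTestSplit(t, "a/b/c", ",", []string{"a/b/c"}) }
  func TestSplitNoSep(t *testing.T)       { DoTestSplit(t, "abc", "/", []string{"abc"}) }
  func TestSplitTrailingSep(t *testing.T) { DoTestSplit(t, "a/b/c/", "/", []string{"a", "b", "c"}) }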


Table driven tests are a lot better than a bunch of imperative tests but they rapidly become unwieldy to debug, maintain, and evolve. Their readability often isn’t great.

If you’re using go, check out https://github.com/cockroachdb/datadriven. It takes a little bit of effort to craft a testing dsl, but it is so worth it.

Also, snapshot style testing where the test writes out its expectations and you just inspect it and save it (part of datadriven) is wonderful.

I've been using insta in Rust lately and it's some of what I want, but not quite datadriven.


I also can't stand it when I want to debug a single table test with the VS Code debugger + dlv and I have to comment out hundreds of lines of table tests.


The goal of this incredibly long-winded detour seems to be to avoid having something reasonably nice like:

https://www.kylheku.com/cgit/txr/tree/tests/015/split.tl


Test how it makes sense for your use case. Table driven tests are fine, and I use them for cases that call for it.



