
As usual, Gophers might have some good ideas here and there, but they are stuck in their own world.

The obvious-in-hindsight next step up from table driven tests is property based testing. That's where you ask the computer to generate the tables for you, and also to automatically shrink any counterexamples it finds to be as simple as possible.

See https://fsharpforfunandprofit.com/series/property-based-test... for an introduction.
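For a concrete flavour, here is a minimal sketch with Python's Hypothesis (the round-trip property is just an illustration):

    from hypothesis import given, strategies as st

    # Hypothesis generates the "table" (by default 100 lists per run) and
    # shrinks any failing input to a minimal counterexample before reporting.
    @given(st.lists(st.integers()))
    def test_reversing_twice_is_identity(xs):
        assert list(reversed(list(reversed(xs)))) == xs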

(One big benefit you get from property based testing is that the computer only needs to be taught once to always remember to also test edge cases like empty strings, zero-length arrays, and NaN and infinity for your floats, etc., instead of a human having to remember for every test that zero is also a number.

Of course, property-based-testing libraries also allow you to exclude those values when they are not applicable. E.g. not everything that consumes floats needs to handle NaNs.

And explicitly filtering NaNs and infinities from your test case generation serves as good documentation for those kinds of assumptions.)
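In Hypothesis, for example, that exclusion is a keyword argument on the strategy, and it reads as documentation:

    from hypothesis import given, strategies as st

    # floats() generates NaN and infinity by default; opting out is explicit
    @given(st.floats(allow_nan=False, allow_infinity=False))
    def test_square_is_nonnegative(x):
        assert x * x >= 0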




The problem with property-based tests is that for non-trivial input data, like e.g. source code for a compiler, it’s hard to implement shrinking, since it’s hard to tell in general what it means to simplify a program (or another piece of non-trivial input). Another issue is that randomisation introduces flakiness when the input space is large.

I prefer to write tests by writing a small number of simple base cases and then code which produces increasingly complex combinations of them, up to a certain size. It’s completely deterministic, and the order goes from simpler to more complex, so when a test fails, it fails at the simplest case it could find and doesn’t need to work its way back from a complex case to a simpler one, which is much more complex to implement.

By moving from property-based tests to this approach I’ve sped up my tests and made them more reliable. Previously they weren’t as effective at finding bugs, and when they did find them, it was hard to reproduce other than by copying the random generator seed. And the shrinking code was very complex.

So my approach intuitively is a bit like writing down a small set of simple axioms and then generating their more complex consequences. The key being the order.


You are right that shrinking is a problem for QuickCheck inspired property based testing libraries. Please do have a look at Python's Hypothesis.

Hypothesis is perhaps the most advanced property based testing library. Or at least: miles ahead of QuickCheck. Hypothesis 'solved' shrinking by integrating it with generation. See https://hypothesis.works/articles/integrated-shrinking/ for an overview. And https://hypothesis.works/articles/compositional-shrinking/ for some more.
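A sketch of what 'integrated' means in practice (the ordered-pair strategy here is just an illustration): generators compose, and shrinking falls out of generation instead of being a separate function you maintain.

    from hypothesis import strategies as st

    @st.composite
    def ordered_pairs(draw):
        a = draw(st.integers())
        b = draw(st.integers(min_value=a))  # b depends on a
        return (a, b)

    # usage: @given(ordered_pairs())
    #
    # A QuickCheck-style shrinker for this type would have to preserve the
    # a <= b invariant by hand. Hypothesis shrinks by re-running the
    # generation with simpler choices, so every shrunk pair still satisfies
    # the invariant by construction.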

I am glad that you found a way to (manually) generate useful test cases. I would like to learn more, and I wonder how we can perhaps automate your technique to make property based testing better!

(David R. MacIver, the author of Hypothesis, has lots of interesting ideas. He says that property based testing is such a great concept that QuickCheck, despite making all the wrong design choices whenever they had any choice, was still a revolutionary library.)


> Or at least: miles ahead of QuickCheck

I'm not so sure about this. The blog post saying that it's better to tie size to generation is dated 2016.

It looks like QuickCheck was tying size to generation in 2008: https://hackage.haskell.org/package/QuickCheck-2.1.0.1/docs/...

EDIT: I take it back, it looks like there is shrinking in modern QuickCheck https://hackage.haskell.org/package/QuickCheck-2.14.3/docs/T...


The Hypothesis approach looks interesting. I’ll look into it.

As for my approach to generation, there’s not much to it. Instead of generating arbitrarily complex values and then trying to shrink them to simpler ones, I start with simple values and, if they work, proceed to more complex ones. In every test run I use the same values. It’s basically a brute force which omits values that are sufficiently similar to values already tested.

The omission is achieved by having a base set of simple values, each judged by me to be sufficiently different from the others (I basically copy tricky numbers/unicode values from the generating code inside QuickCheck implementations), with the base set ordered from simplest to less simple. Then I run a loop: go over all elements of the base set and run the test; then take a function which, based on the base set, produces more complex values, and run the test on those; and so on. So the earliest failure is going to be the simplest value it can generate.
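If I've understood correctly, a minimal sketch of the idea in Python (all names and the combining function here are my own invention, not the commenter's code):

    # Deterministic, simplest-first enumeration: a hand-picked base set
    # ordered by simplicity, plus a function that builds the next, more
    # complex level of values from the previous one.
    BASE = ["", "a", " ", "\t\n", "안녕하세요"]  # ordered simplest-first

    def next_level(values):
        # one possible combiner: concatenate pairs of previous-level values
        # (a real version would also skip values already tested)
        return [x + y for x in values for y in values]

    def check(prop, max_levels=3):
        level = BASE
        for _ in range(max_levels):
            for v in level:  # the earliest failure is the simplest one found
                assert prop(v), f"failed on {v!r}"
            level = next_level(level)

    # usage: check(lambda s: s == s[::-1][::-1])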


> Another issue is that randomisation introduces flakiness when the input space is large.

But that is precisely the point of property-based testing - as opposed to hard-coded inputs. The flakiness is in your code, not your test. Change your code until the PBT stops flaking.


Door A: tests find bugs sometimes (property-based)

Door B: tests find bugs almost always with a fraction of the code (my approach)

I choose Door B.

I still test properties tbh. I just don’t let randomness ruin it and don’t need to implement any shrinking.


Python's Hypothesis automatically saves the random seeds that find counterexamples, so the next time you run your test suite, you can reproduce that bug.
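Concretely: the failing example lands in a local .hypothesis example database and is retried first on the next run. You can also pin a counterexample permanently as a regression test (encode/decode below are hypothetical placeholders for the code under test):

    from hypothesis import example, given, strategies as st

    @given(st.text())
    @example("")  # a previously-found counterexample, pinned forever
    def test_roundtrip(s):
        assert decode(encode(s)) == s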


If your code does the same thing every time it's run.


Sure. But that's a problem with running any test suite (property based or otherwise).


In other words:

Python's Hypothesis automatically saves the random seeds that find counterexamples, so the next time you run your test suite, you may or may not be able to reproduce that bug.


This is not an exclusive choice. Choose all doors, as each will give you something the others don't. "All" includes things not yet invented.


I have a finite budget of code and time I can afford to spend. One of the choices gives me a much better RoI in my experience. I prefer to spend the rest of the budget on the main code.


> and also to automatically shrink any counterexamples it finds to be as simple as possible.

This actually is a problem with most QuickCheck clones (and QuickCheck itself); that's why it's better to use a library that is inspired by Python's Hypothesis (combining generation and shrinking, in a nutshell). So Hedgehog for Haskell and F# (instead of FsCheck), and Rapid for Go https://github.com/flyingmutant/rapid

But tables and property tests complement each other, even if you can get rid of some tables by using property testing.

Example article about the shrinking of Hypothesis vs. QuickCheck: https://hypothesis.works/articles/integrated-shrinking/


Yes, Hypothesis is great! See also my response at https://news.ycombinator.com/item?id=39215177

I was fortunate enough to have met the author of Hypothesis, David R. MacIver, in London a few years ago.

In addition to better shrinking, Hypothesis also has a few more tricks up its sleeves compared to QuickCheck.


I’ve recently started using property based testing in C# using FsCheck. The FsCheck docs are pretty unclear to begin with, with poor examples, and the C# snippets are tiny and barely explained.

I want to use it more but it’s actually quite painful to even understand how to achieve the thing I want. I briefly looked at Hedgehog which seems to have even less documentation let alone anything helpful for C#.

Just started reading that link, and I agree; I’ve found it confusing how FsCheck has Arbitrary and Gen. Are there any other differences?


Yes, sadly the documentation of both could be way better. If somebody doesn't already know property testing (for example from using Hypothesis in Python), it is really hard to learn the concept and the actual usage of the libraries at the same time.

But it is quite strange that there is no C# alternative to FsCheck; I would have thought there would be. I'm sorry I can't help much more.


There are utilities for property-based testing in the standard library https://pkg.go.dev/testing/quick@go1.21.6. There's also fuzzing since Go 1.18 https://go.dev/doc/security/fuzz/.


Yes, but using quick is just a PITA; I'd recommend using Rapid https://github.com/flyingmutant/rapid instead


> As usual, Gophers might have some good ideas

More like: as usual, gophers rediscover old ideas and pass them off as innovations. Pytest added pytest.mark.parametrize back in 2011, JUnit 4 had first-class parametrized test support (don’t know if it was in 4.0, released circa 2006, but it was definitely in there no later than 2009), and data-driven testing was a thing in the late 90s.
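For reference, the pytest version of a table-driven test:

    import pytest

    @pytest.mark.parametrize("a, b, expected", [
        (1, 2, 3),
        (0, 0, 0),
        (-1, 1, 0),
    ])
    def test_add(a, b, expected):
        assert a + b == expected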


I see no place in the linked article where this is passed off as an innovation. I see no claim that it is impossible in any other language, or that Go is uniquely good at table-based testing. All there is is a claim that, in the Go community specifically, this practice has been getting more popular. And this is a relatively old article, dating back to when the community was pretty young, so it was a sensible claim at the time.

I am also not aware of anywhere where the Go designers try to pass anything off as an "innovation". To my eyes they've always been very up front that they're not looking to innovate and that Go was always about recombining established ideas.

There is only one thing I am aware of that I have seen from the Go designers where they claimed actual innovation, and it was down in some grotty details about how garbage collection works in Go, and even that was more from the perspective of answering the FAQ "Why don't you just do exactly what Java does?" and giving a detailed answer about relevant differences in the runtime than any sort of "look what we've discovered".

If you are reading claims of "innovation" where none exist, I guess this makes sense of your hostility, but are you sure people are actually claiming what you think they're claiming?


Downvoters, care to comment on what's wrong with the information?

The sentiment also matches the impression I've gotten so far from the Golang community. Their own flavor for many things, but not much new. I think that is sort of the philosophy behind Golang, isn't it? They wanted a simple language that many people could easily switch to, without learning new concepts. I mean, for 12 years of its existence Golang didn't even have generics, making it superficially even simpler. That is what you get with Golang. Not too surprising really. Some people like it, some don't.


Technically there's nothing wrong with it. That's the problem. These "gotcha" takes (some random language from 20 years ago already had feature X, so talking about it now makes no sense) are as common as they are misguided.

A simple example is Go's autoformatter `gofmt`. In my opinion an earth-shattering invention. But to see that you need to go a bit beyond the basic, binary thinking that only true novelty has any value.

There's a fundamental difference between a language like Python that has 2-3 competing autoformatters used by <5% of Python programmers [1] and Go's single autoformatter used by >95% of Go programmers.

Technically Python and Go are equivalent here, but in reality the gap between them is enormous. That's why pointing out that feature X has existed since the '70s, or that every other language has X too, isn't a useful insight.

[1]: I'm of course talking about the state of Python and Go around the time Go was released.


Table-based testing is property-based testing. Nothing in property testing says who has to choose the values used.

There are libraries that automatically test valid/invalid inputs for types. But automatic case generation is not required for property based testing.
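In that reading, the same property can be checked over a hand-written table or over generated inputs; only the source of the values changes. A Python sketch:

    from hypothesis import given, strategies as st

    def prop_sort_is_idempotent(xs):
        assert sorted(sorted(xs)) == sorted(xs)

    def test_with_table():  # human-chosen values
        for xs in [[], [1], [2, 1], [1, 1, 1]]:
            prop_sort_is_idempotent(xs)

    @given(st.lists(st.integers()))  # machine-chosen values
    def test_with_generation(xs):
        prop_sort_is_idempotent(xs)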

A refinement is pairwise testing, where the machine chooses some minimal subset of cases that spans the test space, to combat combinatorial explosion and run in a reasonable amount of time.

https://www.pairwise.org

Further refinement is possible: cut the run time by pruning the cases-to-be-run based on previous failures. Unfortunately, then things start to seem even more "magical", there is no fixed set of test cases that always runs, and non-technical (or non-testing) management tends to have problems with that. One day we'll have nice things...
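A toy greedy version of the pairwise idea (real tools are much smarter; this only shows the shape of it):

    from itertools import combinations, product

    def pairwise_cases(params):
        # cover every pair of (parameter, value) choices at least once
        names = list(params)
        uncovered = {((a, va), (b, vb))
                     for a, b in combinations(names, 2)
                     for va in params[a] for vb in params[b]}
        cases = []
        for combo in product(*params.values()):
            case = dict(zip(names, combo))
            hit = {((a, case[a]), (b, case[b]))
                   for a, b in combinations(names, 2)} & uncovered
            if hit:  # keep only cases that cover a not-yet-seen pair
                cases.append(case)
                uncovered -= hit
            if not uncovered:
                break
        return cases

    # 3 parameters x 3 values each: the full product is 27 cases, but all
    # pairs are covered by a much smaller subset.
    print(len(pairwise_cases({
        "os": ["linux", "mac", "win"],
        "browser": ["ff", "chrome", "safari"],
        "locale": ["en", "de", "ko"],
    })))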


I've never heard of parameterised tests being called property based testing - they don't define a property, just a list of cases. At that point we may as well call all unit tests property based tests and lose any useful distinction.


I agree that calling table based testing 'property based testing' is stretching the definition.

> Nothing in property testing says who has to choose the values used.

I take issue with that. Properties of interest are some variant of 'for all x from some set s, some property holds.'

Then instead of testing all possible x, we randomly sample. But it is important that our tests define the set s.

When we do example based testing (whether traditional unit testing or table based testing), the set s is usually only implied. So you might be testing your functionality for the strings 'foo' and 'bar', and perhaps 'foo bar'. But it's not clear whether you also mean it to pass for the string '' or '안녕하세요' or '\t\n'. That's at best implied, or perhaps vaguely described in a human-readable comment.
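A property-based version forces that choice into the open; the strategy is the set s (slugify below is a hypothetical function under test):

    from hypothesis import given, strategies as st

    # st.text() is the explicit set s: all unicode strings, so '',
    # '안녕하세요' and '\t\n' are in scope unless excluded.
    @given(st.text())
    def test_slugify_returns_a_string(s):
        assert isinstance(slugify(s), str)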


There are some efforts to guide test generation for property based testing so as to make the instruction pointer explore as large a space as possible.

This effort is more mature in the fuzzing community. See e.g. American Fuzzy Lop https://github.com/google/AFL
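Hypothesis has a lightweight version of this built in: targeted property-based testing, where the test reports a score and generation is steered toward maximizing it (approximate/exact below are hypothetical functions under test):

    from hypothesis import given, target, strategies as st

    @given(st.floats(0, 100))
    def test_error_stays_small(x):
        err = abs(approximate(x) - exact(x))
        target(err)  # steer generation toward inputs that maximize err
        assert err < 1e-6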


And mutation testing... which in my opinion is just way easier to do and more mentally straightforward. Plus you get the satisfaction of actually knowing you're DONE, unlike PBT where you just give up after a certain amount of CPU time has passed.


There's no silver bullet, not even mutation testing. For example, if the code for a certain case is straight up missing, there is nothing to mutate!

I also consider mutation testing to be in a different domain from property based testing, as mutation testing is some kind of metatesting instead. PBT is a 'regular' testing technique. This means that they are in fact complementary, for example by using PBT to check mutants. (Although mutation testing does seem to assume a sort-of 'fixed' test set, but you can also fix a test set with PBT by e.g. explicitly setting a seed and test count.)


> Plus you get the satisfaction of actually knowing you're DONE

Have you somehow managed to overcome the maxim that "Testing can show that defects are present, but cannot prove that there are no defects" ?


No. But with mutation testing you can know that the test suite tests all behavior of the program.

Whether that behavior is right or wrong, it can't say. For example, it can't detect that the code needs MORE behavior! Some if-statement or loop that was never written.
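To make the mechanism concrete, a hand-rolled toy version of what a mutation tool does (real tools like mutmut for Python generate such mutants automatically from the source):

    def clamp(x, lo, hi):
        return max(lo, min(x, hi))

    def clamp_mutant(x, lo, hi):  # one boundary constant mutated by "the tool"
        return max(lo, min(x, hi + 1))

    def weak_suite(f):
        assert f(5, 0, 10) == 5   # interior point
        assert f(-3, 0, 10) == 0  # lower bound

    def strong_suite(f):
        weak_suite(f)
        assert f(42, 0, 10) == 10  # upper bound

    weak_suite(clamp_mutant)  # passes: the mutant survives, exposing a gap
    try:
        strong_suite(clamp_mutant)  # fails: the mutant is "killed"
    except AssertionError:
        print("mutant killed by the upper-bound test")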


How you write the tests that you're checking with mutation does not matter. Or, to rephrase: you need both if you want to do mutation testing (which is really slow).



