Hacker News new | past | comments | ask | show | jobs | submit login

The problem with property-based tests is that for non-trivial input data, such as source code for a compiler, it’s hard to implement shrinking, since it’s hard to say in general what it means to simplify a program (or any other complex input). Another issue is that randomisation introduces flakiness when the input space is large. I prefer to write tests by hand-picking a small number of simple base cases, plus code which produces increasingly complex combinations of them up to a certain size. It’s completely deterministic, and the order goes from simpler to more complex, so when the test fails, it fails at the simplest case it could find; it never needs to work backwards from a complex case to a simpler one, which is much harder. By moving from property-based tests to this approach I’ve sped up my tests and made them more reliable. Previously they weren’t as effective at finding bugs, and when they did find one, it was hard to reproduce other than by copying the random generator’s seed. And the shrinking code was very complex.

So my approach is intuitively a bit like writing down a small set of simple axioms and then generating their more complex consequences. The key is the order.




You are right that shrinking is a problem for QuickCheck-inspired property-based testing libraries. Please do have a look at Python's Hypothesis.

Hypothesis is perhaps the most advanced property-based testing library. Or at least: miles ahead of QuickCheck. Hypothesis 'solved' shrinking by integrating it with generation. See https://hypothesis.works/articles/integrated-shrinking/ for an overview. And https://hypothesis.works/articles/compositional-shrinking/ for some more.
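For the curious, here's a toy, stdlib-only sketch of the core idea (this is not Hypothesis's actual implementation, and all the names are made up): integrated shrinking operates on the underlying choice sequence and re-runs the generator, so every shrunk candidate automatically satisfies the generator's invariants — unlike value-level shrinking, which can produce values the generator could never have made.

```python
# Toy illustration of integrated shrinking (hypothetical, not Hypothesis itself):
# shrink the choice sequence, then replay the generator over it.

def gen_sorted_list(choices):
    """Interpret a sequence of ints as a sorted list of non-negative ints.

    Built from cumulative deltas, so the result is sorted by construction.
    """
    out, acc = [], 0
    for c in choices:
        acc += c % 100
        out.append(acc)
    return out

def prop(xs):
    # A deliberately buggy property: fails once the list has >= 3 elements.
    return len(xs) < 3

def shrink_choices(choices):
    """Yield simpler choice sequences: drop one element, or halve one value."""
    for i in range(len(choices)):
        yield choices[:i] + choices[i + 1:]
    for i, c in enumerate(choices):
        if c > 0:
            yield choices[:i] + [c // 2] + choices[i + 1:]

def minimize(choices):
    """Greedily shrink the choice sequence while the property still fails."""
    current, progress = choices, True
    while progress:
        progress = False
        for cand in shrink_choices(current):
            if not prop(gen_sorted_list(cand)):  # replay generator on candidate
                current, progress = cand, True
                break
    return current

failing = [57, 3, 91, 40, 12]          # pretend random generation found this
assert not prop(gen_sorted_list(failing))
minimal = minimize(failing)
print(gen_sorted_list(minimal))        # -> [0, 0, 0]; still sorted: the invariant survives shrinking
```

Because shrinking never touches the generated value directly, the shrunk counterexample is always one the generator could have produced, which is the property value-level shrinkers struggle to guarantee.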

I am glad that you found a way to (manually) generate useful test cases. I would like to learn more, and I wonder how we can perhaps automate your technique to make property based testing better!

(David R. MacIver, the author of Hypothesis, has lots of interesting ideas. He says that property based testing is such a great concept that QuickCheck, despite making all the wrong design choices whenever they had any choice, was still a revolutionary library.)


> Or at least: miles ahead of QuickCheck

I'm not so sure about this. The blog post saying that it's better to tie size to generation is dated 2016.

It looks like QuickCheck was tying size to generation in 2008: https://hackage.haskell.org/package/QuickCheck-2.1.0.1/docs/...

EDIT: I take it back, it looks like there is shrinking in modern QuickCheck https://hackage.haskell.org/package/QuickCheck-2.14.3/docs/T...


The Hypothesis approach looks interesting. I’ll look into it.

As for my approach to generation, there’s not much to it. Instead of generating arbitrarily complex values and then trying to shrink them to simpler ones, I start with simple values and, if they pass, proceed to more complex ones. Every test run uses the same values. It’s basically brute force that skips values sufficiently similar to ones already tested.

The omission is achieved by having a base set of simple values, each judged by me to be sufficiently different from the others (I basically copy tricky numbers/Unicode values from QuickCheck implementations, which are used inside their generating code), with the base set ordered from simplest to less simple. Then I run a loop which goes over all elements of the base set and runs the test, then takes a function which produces more complex values from the base set and runs the test on those, and so on. So the earliest failure is the simplest value it can generate.
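In case it helps to see it concretely, here is a hypothetical reconstruction of that scheme, specialized to integers (the base set, the combining function, and the buggy property under test are all invented for illustration):

```python
from itertools import product

# Hand-picked "tricky" base values, ordered simplest first.
BASE = [0, 1, -1, 2**31 - 1, -2**31]

def combine(values):
    """Produce the next, more complex round of values from the current ones."""
    for a, b in product(values, repeat=2):
        yield a + b
        yield a * b

def enumerate_inputs(max_rounds):
    """Yield test inputs deterministically, simplest first, skipping repeats."""
    seen = set()
    frontier = list(BASE)
    for _ in range(max_rounds):
        fresh = []
        for v in frontier:
            if v not in seen:
                seen.add(v)
                yield v
                fresh.append(v)
        frontier = list(combine(fresh))

def check(prop, max_rounds=3):
    """Run `prop` on every generated input; return the first failure, if any."""
    for v in enumerate_inputs(max_rounds):
        if not prop(v):
            return v    # earliest failure == simplest counterexample found
    return None

# Example: a property that (by construction) fails on the most negative int32.
def abs_is_nonnegative(x):
    return abs(x) >= 0 if x != -2**31 else False    # simulated overflow bug

print(check(abs_is_nonnegative))    # -> -2147483648, found with no shrinking
```

Every run visits the same values in the same order, so a failure reproduces without saving a seed, and there is no shrinking phase because the search is simplest-first by construction.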


> Another issue is that randomisation introduces flakiness when the input space is large.

But that is precisely the point of property-based testing - as opposed to hard-coded inputs. The flakiness is in your code, not your test. Change your code until the PBT stops flaking.


Door A: tests find bugs sometimes (property-based)

Door B: tests find bugs almost always with a fraction of the code (my approach)

I choose Door B.

I still test properties tbh. I just don’t let randomness ruin it and don’t need to implement any shrinking.


Python's Hypothesis automatically saves the random seeds that find counterexamples, so the next time you run your test suite, you can reproduce that bug.


If your code does the same thing every time it's run.


Sure. But that's a problem with running any test suite (property based or otherwise).


In other words:

Python's Hypothesis automatically saves the random seeds that find counterexamples, so the next time you run your test suite, you may or may not be able to reproduce that bug.


This is not an exclusive choice. Choose all doors, as each will give you something the others don't. "All" includes things not yet invented.


I have a finite budget of code and time I can afford to spend. One of the choices gives me a much better ROI in my experience. I prefer to spend the rest of the budget on the main code.



