What Is Fuzz Testing?

nsajko · on April 7, 2021

> Any part of your software that consumes data across a trust boundary is a perfect candidate for fuzzing.

Sure, but one thing I don't understand is why fuzzing is not used more often for testing basically any pure function (whose output depends only on its input, and which has no side effects, or whose side effects are easy enough to rollback during fuzzing).

This is the method: take multiple distinct implementations of the same function/algorithm and give them all the same data. Usually, you'd just be checking for them crashing or tripping up some sanitizer; but now you can check if each implementation's output matches the outputs of all the other implementations, and crash if any of the outputs doesn't match (can be accomplished with __builtin_trap(); in C/C++). The fuzzer will register this crash like any other failure, and then you know you have a bug, and with which input the bug manifests.

> Current fuzzing tools (open source or otherwise), aren’t very developer friendly, often forcing users to learn completely new testing paradigms, work with low-level structures they don’t understand, and significantly modify their application to get any results at all.

This list of issues seems sort of manufactured, and I doubt this FuzzBuzz product can improve the situation, as the article doesn't give any information on what the product actually is.

> Fuzzbuzz uses automation and intelligence to make fuzz testing as developer friendly as possible [...]

Advertising a product without saying anything about it is off-putting to me. I know it works sometimes, but I think that's usually when you're "big" already, like Coca Cola.

andrei · on April 8, 2021

>Sure, but one thing I don’t understand is why fuzzing is not used more often for testing basically any pure function [...]

Agreed 100%, and is actually what we encourage people to do. Since this is an intro article though, we wanted to keep things simple, and everyone understands the danger in accepting inputs over a trust boundary. Your suggested method is what the fuzzing community calls differential fuzzing [0]. It’s been incredibly effective at finding bugs in crypto libraries [1], and is currently being used to fuzz different Ethereum node implementations [2]. There are other ways you can fuzz functions, and we sort of hint at this in the post when we say:

“If you can define a property that must hold true for any given input (also called an invariant), then the fuzzer will look for inputs that break your invariant”.

Usually this translates into writing assertions the same way you might when you’re writing property-based tests [3]. In fact, I think the fuzzing community has a lot to learn from property based testing. These are more advanced topics though, that we hope to cover in a later post, and why we omitted these details from this one.

>This list of issues seems sort of manufactured [...]

Developer friendliness means different things to different people depending on their area of expertise, years of experience, or interest levels. While the list may seem manufactured, we’ve found that unfriendly tooling and uncertainty about what to tackle first can turn developers off even trying to write a fuzz test at all. Understanding what makes a good fuzz test, instrumenting your code properly, running many fuzz tests at scale, and triaging and interpreting the results of a fuzzing run can make fuzzing prohibitively difficult for a new engineer to set up. This is what we’re focused on solving.

>Advertising a product without saying anything about it is off-putting to me [...]

Fair enough. Fuzzbuzz isn’t quite ready for public access yet, so that’s why we’re a bit vague here, but the intention was not to advertise our product (and is why we only wrote a couple paragraphs at the bottom). We were just excited to write a post about fuzz testing, and figured anyone who’s interest was really piqued could get in touch. We hope to expand this post and use this as an educational resource long-term.

[0]: https://en.wikipedia.org/wiki/Differential_testing

[1]: https://guidovranken.com/2019/05/14/differential-fuzzing-of-...

[2]: https://github.com/sigp/beacon-fuzz

[3]: https://hypothesis.readthedocs.io/en/latest/stateful.html?hi...

nsajko · on April 8, 2021

Wow, thank you so much for these links, I'm ashamed of not knowing about differential fuzzing before!

andrei · on April 8, 2021

No reason to be ashamed, it's still a fairly niche concept. We're huge fans of it though. We'll definitely be writing an in-depth post about it in the coming weeks/months.

jononor · on April 8, 2021

This kind of fuzzing-but-for-correctness is more commonly called property based testing. In that world, the use of a reference implementation as you suggest is often called an "oracle" - and is a good and simple example of a very effective property to test (agreeing with oracle X).

Property based testing is not as popular as it should though. But for most programming languages there are decent libraries/frameworks for it, and generally a much more ergonomic choice than using a fuzzing library for this. Most frameworks will do automatic minimization of failure cases, for example. In Python one has Hypothesis as a great implementation. It also has tools for testing stateful code, using a state machine specification.

AzzieElbab · on April 8, 2021

I thought so too after reading the article. Different type of fuzzing is fuzzing the tests, messing around with assumptions while making sure they do not hold when they shouldnt

dogma1138 · on April 8, 2021

Microsoft’s OneFuzz is tackling some of these issues

https://github.com/microsoft/onefuzz

The biggest problem with fuzzing when it comes to “developer friendliness” isn’t just how to setup the fuzzer and the fact that you need to often write quite a bit of additional code to support fuzzing but that the results aren’t easily consumable.

Getting a fuzzer to cause a crash or some unhandled exception isn’t particularly difficult understanding the actual implication of such crash is where these tools “fail”.

SAST / DAST tools with all their issues such as false positives and relatively limited coverage at least provide actionable results.

Fuzzing not only requires a much higher understanding of the code itself and of its execution to get it working in the first place but the results are often useless for many if not most developers.

Basically it doesn’t help you bridge the gap between seeing a BSOD or a kernel panic and getting a working zero day.

rmasters · on April 9, 2021

The link below is a relatively simple example of differential fuzzing between implementations in different programming languages using AFL. It works by reading and writing to a second process it spawns and aborting on differences. Before writing this, I could not find any working examples of this technique, although I'm sure they are out there, somewhere.

https://github.com/ironmeld/doubleback/blob/main/src/c/tests...

pydry · on April 8, 2021

>Sure, but one thing I don't understand is why fuzzing is not used more often for testing basically any pure function

IME bugs pure functions are responsible only for a fraction of bugs in most code. The vast majority of bugs I see are requirements/communication related or due to the interactions between different sub systems.

YMMV and it's obviously heavily project dependent (e.g. if your job is writing parsers).

Rendello · on April 8, 2021

Property-based testing libraries are good for this. They're usually not /quite/ fuzzing, as they're not as advanced, but they generate millions of tests are reduce failing examples to their simplest forms.

I talk about it in a few of my recent posts, if anyone's looking into it.

2OEH8eoCRo0 · on April 7, 2021

I wrote a rudimentary fuzzer at work for the sole purpose of generating core dumps. It worked so well that I got a stern talking to about not helping them release the software.

andrei · on April 7, 2021

Basic fuzzers can surprisingly go a long way. Barton Miller (the professor who first coined the term fuzzing), actually wrote a paper last year [0] where he just ran a very basic fuzzer against a bunch of common UNIX tools. Even after all these years of testing/usage, they still managed to find a ton of issues.

[0]: https://arxiv.org/pdf/2008.06537.pdf

Forge36 · on April 7, 2021

Isn't that the 1995 paper? The original study was 1990.

He was my teacher, and he taught this in class :)

I'm pretty sure his fuzzer was "for each file in system send as input to program X" That was the original tool, revisited was repeating the original test (might have tested against more programs which had become common since then)

andrei · on April 7, 2021

Argh, this is the paper I meant to link: https://arxiv.org/pdf/2008.06537.pdf

(Updated the above comment as well)

Forge36 · on April 7, 2021

Did they expect to release in a non working state?

kelnos · on April 7, 2021

Most companies, for better or worse, value time to market over fixing every last bug.

neolog · on April 8, 2021

Sounds like it didn't work very well for the task of finding significant bugs?

Ekaros · on April 7, 2021

At simplest and most straight forward level fuzz testing is pretty simple to get started with. Collect some input(API calls, files, etc.), pass it to fuzzer(for example radamsa[0]), throw it at program and observe...

Ofc, depending on system collecting input and sending it to system might be bit more complicated. Hardest part is often the observing and finding that an error happens.

Not that this gets you full coverage, for more complex things like protocols something custom that takes lot more effort is probably needed.

[0] https://gitlab.com/akihe/radamsa

cjlovett · on April 8, 2021

Nice article. In a way you could think of the following as "concurrency fuzzing": [0]: https://news.ycombinator.com/item?id=26718273

tommek4077 · on April 8, 2021

Great joke to not showing any content if you dont allow Javascript. Is this the first Fuzztest itself? ;)