Differ: Tool for testing and validating transformed programs (trailofbits.com)
120 points by ingve 7 months ago | 8 comments



Differential fuzzing is woefully underutilized -- our experience is that it consistently[1] finds[2] bugs that "traditional" fuzzing techniques struggle to discover, and that the primary obstacles to its adoption are harness and orchestration complexity. DIFFER goes a long way towards overcoming those obstacles!

(FD: My company.)

[1]: https://github.com/trailofbits/mishegos

[2]: https://x509-limbo.com/
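
To make the "harness complexity" point concrete: the core differential loop itself is tiny. A minimal sketch in Python, where ./impl_a and ./impl_b are hypothetical binaries that read stdin and should agree on every input:

    import os
    import subprocess

    def run(binary, data):
        # Run one implementation on `data`; capture exit code and output.
        proc = subprocess.run([binary], input=data, capture_output=True, timeout=5)
        return proc.returncode, proc.stdout

    for _ in range(10_000):
        data = os.urandom(64)  # naive random input; real fuzzers mutate a corpus
        if run("./impl_a", data) != run("./impl_b", data):
            print("divergence on input:", data.hex())

Everything hard lives in what this sketch leaves out: input generation, timeout and crash handling, triaging divergences, and deduplicating results.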


Thanks for your (company's) work on both this and graphtage!

Out-there question:

Do you guys have any plans on releasing a tool which could be used for diffing filesystems?

e.g. Run the tool, run an arbitrary program that modifies the filesystem, run the tool again, and it reports what has changed?

Using existing diff tools for this right now is clunky. I'm surprised there's not a go-to tool used by security researchers (or perhaps there is and I don't know about it).


For Windows I used to use a tool called Regshot for this. Its main use case is Windows Registry diffing (hence the name), but it also supports filesystem diffing.

https://github.com/Seabreg/Regshot


> Do you guys have any plans on releasing a tool which could be used for diffing filesystems?

I don't think we have any immediate plans for this, but it's an interesting idea. I think you could use graphtage for this (since it performs an IR transformation to do diffs against arbitrary graph-shaped inputs), provided you're able to transform your FS changes/metadata into a graph-shaped object (even just a JSON array of events).

More specifically, assuming you have a journaling FS, I think you could do something like "record the journal for the period you're interested in, and then build a graph of (event, inode) items during that period." But there might be edge cases I'm not thinking of.
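
For the non-journaling case, a before/after snapshot also works: dump the tree's metadata to JSON and let graphtage diff the two documents. A rough sketch (the snapshot format here is made up, not anything graphtage prescribes):

    import json
    import os
    import sys

    def snapshot(root):
        # Map every file under `root` to the metadata we care about.
        out = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                out[path] = {"size": st.st_size, "mtime": st.st_mtime}
        return out

    if __name__ == "__main__":
        json.dump(snapshot(sys.argv[1]), sys.stdout, indent=2, sort_keys=True)

Run it before and after the program under test, then something like `graphtage before.json after.json` reports the added, removed, and modified entries.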


A big problem is that proving that transformations preserve semantics is very hard. Formal methods have huge potential, and I believe they will be a big part of the future, but they haven't become mainstream yet. A big reason is probably that they're simply not practical right now: the things you can prove are much more limited than the things you can do, and it's a lot less work to just build a large test suite.

Example: CompCert (https://compcert.org/), a formally verified compiler, AKA a formally verified sequence of semantics-preserving transformations from C code to assembly. It's a great accomplishment, but few people actually compile their code with CompCert, because GCC and LLVM generate faster code[1] and have been used so widely that >99.9% of code is going to be compiled correctly, especially code that isn't doing anything extremely weird.

But no matter how large the test suite or how many tools like these exist, there may always be bugs. Formal verification provides guarantees that tests can't, and as we come to rely on software more, it will become more important.

[1] From the CompCert site: "Performance of the generated code is decent but not outstanding: on PowerPC, about 90% of the performance of GCC version 4 at optimization level 1"


If I understand correctly, CompCert doesn't promise to compile correct programs correctly. Rather, it promises that if compilation succeeds, the output is correct. This means it suffices to have (for example) a proved-correct check that a graph-coloring register allocator produced a valid allocation, a check that's allowed to abort compilation if the coloring is incorrect. The register allocator itself need not be proved correct.

In practice, this is probably fine, since you won't run into bugs where the compiler aborts very often (and if you do, you know that you did.)
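
A toy version of that "untrusted solver, trusted checker" split, using a hypothetical interference graph given as edge pairs; only check_coloring would need to be formally verified:

    def check_coloring(edges, assignment):
        # Trusted checker: interfering variables must not share a register.
        return all(
            assignment[u] != assignment[v]
            for u, v in edges
            if u in assignment and v in assignment
        )

    def allocate(edges, registers):
        # Untrusted allocator: naive greedy coloring, free to be buggy.
        neighbors = {}
        for u, v in edges:
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
        assignment = {}
        for var in neighbors:
            taken = {assignment.get(n) for n in neighbors[var]}
            free = [r for r in registers if r not in taken]
            if not free:
                raise RuntimeError("spill needed")
            assignment[var] = free[0]
        return assignment

    edges = [("a", "b"), ("b", "c")]
    coloring = allocate(edges, ["r0", "r1"])
    assert check_coloring(edges, coloring)  # compilation aborts if this fails

Proving check_coloring correct is far easier than proving allocate correct, and a bug in allocate can only cause an abort, never a miscompile.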


Differential testing is wonderful on compilers. You can compare different compilers, the same compiler with different flags, or a compiler on code that has undergone transformations that render its behavior on specific inputs unchanged.
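
The flags variant is the easiest to set up. A bare-bones sketch, assuming gcc is on PATH and prog.c is a deterministic test program (both stand-ins here):

    import subprocess

    def compile_and_run(opt):
        # Build prog.c at one optimization level and capture its output.
        subprocess.run(["gcc", opt, "-o", "prog", "prog.c"], check=True)
        return subprocess.run(["./prog"], capture_output=True, timeout=5).stdout

    baseline = compile_and_run("-O0")
    for opt in ["-O1", "-O2", "-O3"]:
        if compile_and_run(opt) != baseline:
            print("possible miscompilation at", opt)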

William McKeeman started the current wave of this in 1998 at DEC; John Regehr and coworkers at the University of Utah followed with Csmith, and so on. I used it in 2005 for Common Lisp and continue to use it today for testing SBCL.


This is super interesting! Sort of like property-based testing, but the "property" is that the new code and the old code should have the same output given the same input.
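
In Hypothesis terms, the whole test collapses to a single property; a sketch, where old_impl and new_impl are hypothetical stand-ins for the original and transformed code:

    from hypothesis import given, strategies as st

    def old_impl(s):
        return s.strip().lower()  # stand-in: the legacy behavior

    def new_impl(s):
        return s.lower().strip()  # stand-in: the rewritten version

    @given(st.text())
    def test_agrees_with_old(s):
        # The only "property" is agreement with the old implementation.
        assert new_impl(s) == old_impl(s)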

I bet that fuzzing old, unmaintained programs leads to many, many crashes. (If the old code crashed on a certain input, should the transformed code crash too, to maintain "compatibility"?)



