Why wouldn't you have an LLM write some code that uses something like libfuzzer ...

moyix · 2024-03-10T21:32:17 1710106337

They're actually orthogonal approaches – from what I've seen so far, the LLM-written fuzzer generates much higher-quality seeds than you'd get even after fuzzing for a while (in the case of the VRML target, even if you start with some valid test files found online), but it's not as good at generating broken inputs. So the obvious thing to do is have the LLM's fuzzer generate initial seeds that get pretty good coverage, and then hand those to a traditional coverage-guided fuzzer to mutate further.
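To make the hand-off concrete, here's a rough sketch of the "generate seeds, then let a coverage-guided fuzzer take over" step. It is not the actual LLM-written generator: `make_gif_seed` and `write_corpus` are hypothetical names, and the GIF layout is a simplified stand-in (the image data is random bytes, not valid LZW), just enough to show structurally plausible seeds landing in a corpus directory.

```python
import os
import random
import struct


def make_gif_seed(rng):
    """Build a small, structurally plausible GIF89a file (hypothetical generator)."""
    width = rng.randint(1, 64)
    height = rng.randint(1, 64)
    # Header plus logical screen descriptor; 0x80 sets the global color table flag
    # with the smallest table size (2 entries).
    out = b"GIF89a" + struct.pack("<HHBBB", width, height, 0x80, 0, 0)
    # Global color table: two random RGB entries (6 bytes).
    out += bytes(rng.randrange(256) for _ in range(6))
    # Image descriptor covering the whole canvas, no local color table.
    out += b"," + struct.pack("<HHHHB", 0, 0, width, height, 0)
    # LZW minimum code size, one data sub-block of random bytes, block terminator.
    data = bytes(rng.randrange(256) for _ in range(rng.randint(1, 32)))
    out += bytes([2, len(data)]) + data + b"\x00"
    # Trailer byte.
    return out + b";"


def write_corpus(directory, count, seed=0):
    """Write `count` generated seeds into `directory` and return their paths."""
    os.makedirs(directory, exist_ok=True)
    rng = random.Random(seed)  # fixed seed so the corpus is reproducible
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"seed_{i:04d}.gif")
        with open(path, "wb") as f:
            f.write(make_gif_seed(rng))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # "seeds" is an arbitrary directory name chosen for this example.
    print(len(write_corpus("seeds", 100)), "seeds written")
```

The resulting directory can then be used as the initial corpus for a coverage-guided fuzzer, e.g. as the `-i` input directory for AFL++ or as the corpus directory argument of a libFuzzer binary, which takes over the job of producing broken variants.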

These are still pretty small-scale experiments on essentially toy programs, so it remains to be seen whether LLMs stay useful on real-world programs, but so far it looks pretty promising – and it's a lot less work than writing a new libfuzzer target, especially when the program is one that's not set up with nice in-memory APIs (e.g., that GIF decoder program just uses read() calls distributed all over the program; it would be fairly painful to refactor it to play nicely with libfuzzer).
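For a program like that, the low-effort route is to leave it untouched and drive it from the outside. A minimal sketch, assuming the target reads its input from stdin and that a crash shows up as death by signal (POSIX behavior); `run_on_input`, `is_crash`, and `FAKE_TARGET` are names made up for this example, and `FAKE_TARGET` is only a stand-in for a real decoder binary:

```python
import subprocess
import sys


def run_on_input(cmd, data, timeout=5.0):
    """Run `cmd` with `data` on stdin; return its exit code, or None on timeout."""
    try:
        proc = subprocess.run(
            cmd,
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # treat as a hang rather than a crash
    return proc.returncode


def is_crash(returncode):
    """On POSIX, a negative return code means the process was killed by a signal."""
    return returncode is not None and returncode < 0


# Stand-in target: a tiny Python process that aborts on inputs starting with b"BAD".
# A real setup would put the path of the unmodified decoder binary here instead.
FAKE_TARGET = [
    sys.executable,
    "-c",
    "import os, sys\n"
    "data = sys.stdin.buffer.read()\n"
    "if data.startswith(b'BAD'):\n"
    "    os.abort()\n",
]

if __name__ == "__main__":
    for sample in (b"GIF89a", b"BAD input"):
        code = run_on_input(FAKE_TARGET, sample)
        print(sample, "crash" if is_crash(code) else "ok")
```

No refactoring of the target is needed because the harness only touches its stdin and exit status. The trade-off is a process spawn per input, which is much slower than an in-process libfuzzer target.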

fooker · 2024-03-10T23:01:36 1710111696

Because you want to fuzz more effectively, not write a fuzzer more effectively.