Automatically fixing bugs in C programs with Genetic Algorithms (unm.edu)
140 points by peterhunt on July 25, 2011 | 43 comments



Their research is certainly cool and novel. On the surface, this is a straightforward application of genetic programming (evolving abstract syntax trees):

1. Take a function which fails some test case(s)

2. Parse AST

3. Find a bunch of other lines of code in the program and use those as possible mutations

4. Evolve until test performance is improved

The trick is #3. They are going on the hypothesis that the fix for a bug is often already written in another part of the program. For instance, you pass in a variable and forget to check it for null. It's likely that you have a check for that somewhere else in the program, and if so, you can add that (templated) line of code into your buggy function and it will now pass that test case.
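Here's a toy sketch of that loop in Python (the buggy function, donor statements, and test harness are all invented for illustration; the real system operates on C ASTs, not on source lines like this):

    import random

    # Toy "program": a buggy function that crashes on None, plus statements
    # harvested from elsewhere in the (imaginary) codebase. The hypothesis is
    # that the missing null check has already been written somewhere else.
    BUGGY_BODY = ["    return len(x)"]
    DONOR_STATEMENTS = [
        "    if x is None:\n        return 0",  # the fix, borrowed from another function
        "    x = list(x)",                      # a harmless but unhelpful statement
    ]
    TESTS = [(None, 0), ([1, 2, 3], 3), ("ab", 2)]

    def compile_variant(body_lines):
        src = "def f(x):\n" + "\n".join(body_lines)
        scope = {}
        try:
            exec(src, scope)
            return scope["f"]
        except SyntaxError:
            return None

    def fitness(body_lines):
        """Coarse-grained fitness: how many test cases the variant passes."""
        f = compile_variant(body_lines)
        if f is None:
            return 0
        passed = 0
        for arg, expected in TESTS:
            try:
                passed += (f(arg) == expected)
            except Exception:
                pass
        return passed

    def mutate(body_lines):
        """Insert a statement borrowed from elsewhere at a random position."""
        lines = list(body_lines)
        lines.insert(random.randrange(len(lines) + 1), random.choice(DONOR_STATEMENTS))
        return lines

    def repair(pop_size=20, generations=50):
        population = [list(BUGGY_BODY) for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=fitness, reverse=True)
            if fitness(population[0]) == len(TESTS):
                return population[0]  # all tests pass: candidate patch found
            parents = population[:pop_size // 2]
            population = parents + [mutate(random.choice(parents)) for _ in parents]
        return None

    patch = repair()
    print("\n".join(patch) if patch else "no repair found")

After a few generations the null check borrowed from "elsewhere" lands in front of the return and all three tests pass.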

It's certainly not a panacea, but it does work remarkably well for many bug cases.


Actually, I tried to implement something like this as an undergrad, and I found that the search space was simply intractable and the fitness function too coarse-grained (i.e., binary). What I really like about this approach is that they use a trace of the program execution as a heuristic to guide the evolutionary strategy. That's awesome.
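The weighting idea is compact enough to sketch (the weight values and data shapes here are my own illustrative assumptions, not taken from the paper):

    def statement_weights(coverage_passing, coverage_failing):
        """Weight statements by suspiciousness: executed only by failing test
        runs -> full weight; executed by passing runs too -> small weight;
        never executed by a failing run -> ignored. Each coverage_* argument
        maps a test id to the set of statement ids seen in an instrumented run."""
        on_failing = set().union(*coverage_failing.values()) if coverage_failing else set()
        on_passing = set().union(*coverage_passing.values()) if coverage_passing else set()
        return {stmt: (0.1 if stmt in on_passing else 1.0) for stmt in on_failing}

    # Mutation sites are then sampled in proportion to these weights, shrinking
    # the effective search space from "the whole program" to the statements
    # actually touched by the failing execution.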

As a side note, a few months ago I reached out to this team to ask them a few questions. They are super nice.


How readable/maintainable is the resulting code?


That's the next genetic algorithms problem :P


John Regehr has comments on when this should or shouldn't be used: http://blog.regehr.org/archives/383


Yep, an evolutionary algorithm will take advantage of any shortcut and any loophole in your fitness function. They're best for optimization, not for ensuring correctness :)


If you care to read the output, that seems like a good way to find errors in your fitness function.


That's a great point. Specifically, it initially seems like it would be a bad idea to have an automated system repair code that humans will need to maintain. However, I've looked at some of the examples of fixes this project has created and many of them are relatively small, and perhaps handing them to humans for code review, cleanup and "humanizing" might be a valid way to go.


I've often thought about evolutionary engineering like this, comparing it to biological systems. The human body (and really all biological life) is a remarkable machine -- it runs for decades, much longer than just about anything people build, and under most circumstances can repair itself quite impressively. But if it needs repair beyond what it can do itself, you run into a harrowing mess, with difficulty swapping in new parts, some parts doing double duty, difficulty separating various systems, etc.

It's also not extensible at all, which is a big problem for software engineering. An evolutionarily derived codebase might do really well at the problem that provided the parameters for its initial evolution, but trying to get it to do something else might be difficult or impossible. I suppose you could use more evolutionary iterations to get it to do something else, but maybe it'd be easier to start from scratch at that point?


There was a fascinating example of strong dependence on the initial problem specification in an early hardware-evolution experiment in the 90s. The evolved chip for an audio-processing task was massively smaller than the best human-designed chip, which turned out to be because rather than sticking to digital-logic design, the evolutionary process had taken advantage of electromagnetic interference between circuit components as a core aspect of the design (this was possible because the evolution was done on FPGAs, not in a simulator). But as a result, it only worked at the temperature it was evolved at!

One of the papers: http://www.informatics.sussex.ac.uk/users/adrianth/ices96/pa...


Yeah, I remember hearing about this! Wasn't there also a creepy case where an evolved circuit involved FPGA cells that weren't attached to anything else in the circuit, and yet were still needed for the thing to work properly? I seem to recall it similarly involved electromagnetic interference.


I think that's the same one.


Yep, see the discussion around Figure 7 in this section of the paper: http://www.informatics.sussex.ac.uk/users/adrianth/ices96/no...

They drew a graph of any units on any connected path from input to output, and then ran a search to see if any others had any effects, by clamping random units' values to 0 or 1 and seeing if that degraded the result. They found that one unconnected unit in particular degraded performance significantly (the bottom/right gray-colored one in Fig. 7): it had several active connections routed through it, but its own output went nowhere. So they hypothesized that it was modulating the signal in a way not captured by the mathematical abstraction of the FPGA as a digital circuit.


One has to wonder, then: what if there were a spec requiring it to work at any temperature? Would it not then attempt to evolve to fit the spec, perhaps fixing the temperature problem?


Yeah, that's what some of the follow-up work does. Since it's evolving physical hardware, though, temperature is a real physical quantity that affects performance, not a spec it takes as input. So they experiment with varying the temperature during the evolution process, which should weed out members of the GA population that were only fit at very specific temperatures. That does complicate the setup a bit, because you have to actually vary the temperature with a heater and air conditioning.


It's funny, this is exactly how lots of lower-division undergrads at my school approach writing code! I've heard professors refer to it as "debugging via Brownian motion"


You have very persistent undergrads. Most would skip the "check whether the tests that passed previously still pass" step.


Their grade is (sometimes) largely based on how well they pass unit tests.


Is it just me or is this a horrible idea? Surely the GA will evolve in ways that fix the problem but introduce some new problems that you haven't even considered/tested for?


Sure, if you intend to just run the algorithm and check in the changes proudly proclaiming "it's fixed!"

A better use for it would be as an assistant to help you track down the root cause of stubborn bugs.

It's kind of like git-bisect in reverse.


I've used a similar technique to this called Mutation Testing to find code that wasn't tested by a unit test.

It doesn't use a GA, but it's still pretty neat. It parses the code into an AST and then mutates each node in the AST, rerunning the tests to see if they pass or fail. They are supposed to fail; if they continue to pass, it means that behavior isn't being tested.

The computer acts as my assistant tracking down cases I forgot to test, so that I can write the test to catch the mutation. Sometimes it even finds sections of code that are impossible to reach normally, allowing me to remove the dead code paths that I might otherwise have missed.
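For anyone curious, here's a minimal Python sketch of the idea (the example function, the constant-bumping mutation operator, and the deliberately thin test "suite" are all invented for illustration):

    import ast, copy

    SOURCE = """
    def total(price, is_member):
        if is_member:
            return price * 0.9
        return price + 2.5
    """

    # A deliberately thin test "suite": the member branch is never exercised.
    def run_tests(ns):
        total = ns["total"]
        return total(100, False) == 102.5

    def mutants(tree):
        """Yield one mutant per numeric constant, with that constant bumped by one."""
        for node in ast.walk(tree):
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                original = node.value
                node.value = original + 1
                yield copy.deepcopy(tree)
                node.value = original  # restore before producing the next mutant

    tree = ast.parse(SOURCE)
    survivors = []
    for i, mutant in enumerate(mutants(tree)):
        ns = {}
        exec(compile(ast.fix_missing_locations(mutant), "<mutant>", "exec"), ns)
        if run_tests(ns):  # the tests *should* fail against a mutant
            survivors.append(i)

    print("surviving mutants:", survivors)  # the 0.9 mutant survives: untested branch

The mutant that bumps 0.9 survives because nothing ever exercises the member branch, which is exactly the kind of gap this technique surfaces.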


On the other hand, a sizable percentage of developers already use this technique, informally, for bugfixing. My intuition is that replacing those developers with this algorithm would only improve code quality.


A more tractable problem is tackled by Acovea (Analysis of Compiler Options via Evolutionary Algorithm), a genetic algorithm that finds the "best" optimization CFLAGS for compiling programs with gcc. Unfortunately, the Acovea project is no longer maintained.

http://web.archive.org/web/20101223023921/http://www.coyoteg...

I've been told that Intel's compiler performance team has investigated similar genetic algorithms.
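The core of such a search is small. A rough sketch (the flag list, the bench.c benchmark, and the GA parameters are placeholders, not Acovea's actual configuration):

    import random, subprocess, time

    FLAGS = ["-funroll-loops", "-fomit-frame-pointer", "-ftree-vectorize",
             "-ffast-math", "-finline-functions", "-fno-strict-aliasing"]

    def build_and_time(flag_mask, source="bench.c"):
        """Compile with the chosen subset of flags and time one benchmark run."""
        chosen = [f for f, on in zip(FLAGS, flag_mask) if on]
        subprocess.run(["gcc", "-O2", *chosen, source, "-o", "bench"], check=True)
        start = time.perf_counter()
        subprocess.run(["./bench"], check=True)
        return time.perf_counter() - start  # lower is fitter

    def evolve(pop_size=16, generations=20):
        population = [[random.random() < 0.5 for _ in FLAGS] for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=build_and_time)        # fastest individuals first
            parents = population[:pop_size // 2]
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, len(FLAGS))
                child = a[:cut] + b[cut:]              # one-point crossover
                if random.random() < 0.2:              # occasional bit-flip mutation
                    i = random.randrange(len(FLAGS))
                    child[i] = not child[i]
                children.append(child)
            population = parents + children
        return [f for f, on in zip(FLAGS, population[0]) if on]

    print(evolve())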


Such a hugely promising area of research. Also consider something even finer-grained: adjusting optimization options on individual blocks of code at run time, or through instrumented binaries, using genetic algorithms. It could revolutionize optimization.


I really enjoy the idea that one day people won't need to program their machines anymore but can instead discuss with them what they actually want. Yes, there will still be a need for good old coding (maybe even more, because someone needs to write these eloquent programs, right?). And yes, the programs resulting from these discussions will be a lot fuzzier (and thus prone to all kinds of bugs). But why would your grandfather need a 100% correct program? He needs some food (no matter whether carrots or beans), some clean clothes (no matter whether the red or the blue shirt), and someone who reminds him to take his pills and tells him what the weather tomorrow might be.

I wonder why we don't read more about these topics, and not just because of the opportunities for less educated people and the chance to automate repeatable tasks we can't automate just yet. There are so many things where we are actually a lot more flexible than we think. For example, when I prototype my new Android game, I don't really care what pathfinding routines the game characters use. In the beginning I just want to put together something really fast to see if my game idea is worth anything. I would really like an IDE to just fill in the empty spots itself with anything that might work. And over time, using my more concrete programming input and the data from the test runs that every coder does while developing, the IDE could improve the code itself.

I imagine an IDE that I can tell: "I want to code a computer game. It should be roguelike. Very roguelike games are Rogue, NetHack, ADOM. A little roguelike are Diablo, Dungeon Siege, Baldur's Gate, Oblivion, World of Warcraft. Not roguelike are Counter-Strike, Doom, SimCity. Not computer games are Vim, Firefox, Word. Make prototype!"


Vi is actually quite nethack-like in its interface.


1. The search space is all possible abstract syntax trees. That is a large search space.

2. The correct solution must also pass all the unit tests. Basically, the bug fixing is driven by a fitness function that determines whether the bug is fixed and the program still satisfies the specification, i.e. the unit tests.

If you have a complex program the challenge will be coming up with a complete specification of the program's behavior.


One of the problems with machine-generated code is that it's rarely readable by human beings, so you'd still have to take a lot of time to see what the algorithm changed. The other huge problem is false positives (although OpenBSD made lint work for them, so maybe it's not that bad).


Fascinating concept. What if humans wrote the tests, and a genetic system like this used a specialized language (machine code or JVM bytecode, for example) to write a solution? What level of sophistication could a system like this attain?


That's been an idea since the 60s in various forms, but tractability tends to be a big problem. The classic approach is to randomly generate Lisp code trees, guided by various heuristics or randomized-optimization algorithms (e.g. genetic programming). There have been some successes, but typically smallish algorithms for well-defined problems, rather than "normal" full-sized programs.

Some other attempts include: inductive logic programming (automatically infer logic programs from desired example output), partial programming (use statistical machine learning to fill in incompletely specified behavior in a program), and renewed attempts to use exhaustive or heuristic search where the search space is constrained by strong modern type systems (e.g. MagicHaskeller).


I assume it would be like finding the interpolating polynomial of a bunch of points in a row. (Like this: . . . . . .) The resulting polynomial would go through all of the points, but have very large peaks/valleys in between.

Essentially, you'd get a system that passed all of your tests, but produced garbage for anything not covered by the tests.


Your post got me excited because it sounds like you're describing Runge's Phenomenon ( http://en.wikipedia.org/wiki/Runge%27s_phenomenon ).

The wikipedia article doesn't have the best illustration, but I think the inherent idea is really wise: in trying too hard to meet your initial constraints, you can come up with a solution that's only useful at those constraint points.


I'd never heard of Runge's phenomenon, but it does indeed sound like I was describing it. Thanks for giving me a name for it.


Interesting, I thought that was exactly what you were talking about :)

I learned about it in numerical analysis. It illustrates the downside of trying to be too precise--you can make a polynomial that will go through an arbitrary number of data points.

As the number of points increases, your function will look less like a line and more like a magnitude 9 earthquake on a seismograph. The function will pass through all the points used to define it. However, it'll be useless for predicting the original data's behavior, as it changes too quickly between the data points.

Instead, mathematicians find more useful functions by relaxing the conditions so that the model function only has to come 'near' the data points.
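A quick numerical illustration of the difference, as a minimal NumPy sketch (the degrees and number of sample points are arbitrary choices for the demo):

    import numpy as np

    # Runge's function, sampled at 11 equally spaced points on [-1, 1].
    runge = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)
    xs = np.linspace(-1, 1, 11)

    interp = np.polyfit(xs, runge(xs), deg=10)   # degree 10: hits every point exactly
    relaxed = np.polyfit(xs, runge(xs), deg=4)   # degree 4: only comes "near" the points

    # Evaluate both fits away from the sample points.
    x_fine = np.linspace(-1, 1, 400)
    err_interp = np.max(np.abs(np.polyval(interp, x_fine) - runge(x_fine)))
    err_relaxed = np.max(np.abs(np.polyval(relaxed, x_fine) - runge(x_fine)))

    print("max error, exact interpolation:", round(err_interp, 2))   # big swings near +/-1
    print("max error, relaxed fit:        ", round(err_relaxed, 2))  # substantially smaller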


That's true, but I think this problem will eventually be tractable. We have tools like QuickCheck that can generate randomized test cases and ensure certain properties are maintained. While this probably isn't enough today, I could imagine this becoming a good starting point to improve this technology.
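QuickCheck itself is a Haskell tool, but the idea ports; here's a tiny property-based test using Hypothesis, a Python analogue (the clamp function and the property are invented for illustration):

    from hypothesis import given, strategies as st

    def clamp(x, lo, hi):
        return max(lo, min(x, hi))

    @given(st.integers(), st.integers(), st.integers())
    def test_clamp_stays_in_range(x, lo, hi):
        if lo <= hi:  # the property only makes sense for a valid range
            assert lo <= clamp(x, lo, hi) <= hi

    test_clamp_stays_in_range()  # Hypothesis generates and checks many random cases

A property like this is a much denser specification than a handful of hand-picked examples, which is what a program-evolving fitness function would need.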


I think if you took this technique plus input from compiler warnings and static code analysis and rolled them all together into a real-time IDE advisor you would have a nice product.

I don't think fixing programs in post-processing is such a great idea, but if the feedback can be used to make me a better programmer, I'm all for it.

Scanning neighboring code is a really good idea; when I'm working on someone else's code, I'll try to follow the existing practices instead of inserting bits and pieces in my own style.


That's an interesting concept, but it seems like they are not alone in this field. Just recently I stumbled upon a web page by Moshe Sipper, who seems to work on a very similar topic "Darwinian Software Engineering":

    http://www.moshesipper.com/finch/
He makes a rather grandiose claim -- “We believe that in about fifty years' time it will be possible to program computers by means of evolution. Not merely possible but indeed prevalent.”


This just looks so cool. I'm sure this would be very useful on giant legacy codebases where the author is long gone but the bugs remain, and I just like the idea of using AI techniques to refactor code. Seems all futuristic and stuff ;)


Fortunately, giant legacy codebases are well known for their excellent automated test coverage; otherwise this comment would sound ridiculous.


I agree. Fun to imagine the software bugfixing future versions of itself.


To some, self-improving software is frightening.


Watch out with this. If you auto-fix a Java program, the result is that none of the original code survives; it's rewritten from scratch in Scala!


Reminds me of I, Robot. Someday these C programs will group together and develop the ability to think independently.



