When I saw this title, I expected something about using a computer to detect things like "use after free" errors. Skimming the paper, however, it appears to be about altering source code to remedy compile-time errors rather than finding flaws in program logic. The authors' idea is to repair broken inputs so that code containing the programmer's syntax errors compiles correctly.
I don't think this is a particularly useful thing to do. Essentially it is somewhat like making a compiler that accepts broken inputs, using heuristics to guess what the programmer intended to write. That will not result in better programs being written; rather, it will result in programmers writing worse code while not even being aware of the flaws in it.
I think what was attractive here is that the training set is very easy to generate: you just take a bunch of existing code, break it at random places, and feed the network the original code as the ground truth.
Creating a big enough corpus of "use after free" bugs would take forever.
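To make that concrete, here is a minimal sketch (in Python, with hypothetical names, not code from the paper) of how such (broken, correct) training pairs could be generated by corrupting code that already compiles:

import random

# Ways to corrupt a single token; dropping one is the most relevant case here
# (a missing ';' or '}').
MUTATIONS = [
    lambda tok: "",           # delete the token entirely
    lambda tok: tok + tok,    # duplicate it
    lambda tok: tok[:-1],     # truncate it
]

def make_training_pair(source):
    """Return (corrupted_source, source) after one random corruption."""
    tokens = source.split()   # crude whitespace tokenization; real work would use a C lexer
    i = random.randrange(len(tokens))
    tokens[i] = random.choice(MUTATIONS)(tokens[i])
    return " ".join(tokens), source

sample = "int main ( ) { int x = 1 ; return x ; }"
# Each pass over a compilable file yields another (broken input, ground truth) pair.
pairs = [make_training_pair(sample) for _ in range(100)]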
> You use this to get suggested corrections when things don't compile.
The paper repeatedly says that the aim is to fix errors. Supposing instead that the goal is to give better error messages, here is the output of Clang on the C program given as an example:
$ cc gupta.c
gupta.c:3:5: warning: incompatible redeclaration of library function 'pow'
[-Wincompatible-library-redeclaration]
int pow(int a, int b);
^
gupta.c:3:5: note: 'pow' is a builtin with type 'double (double, double)'
gupta.c:14:23: error: function definition is not allowed here
int pow(int a, int b){
^
gupta.c:18:14: error: expected ';' at end of declaration
return res;}
^
;
gupta.c:18:14: error: expected '}'
gupta.c:4:11: note: to match this '{'
int main(){
^
1 warning and 3 errors generated.
Clang identifies the problematic line of the program better than gcc: the function definition inside main is not legal. I tried the same code with gcc and did not get that error. Running "indent" on the code would also have revealed the problem:
$ indent < gupta.c
/**INDENT** Error@18: Stuff missing from end of file */
A simple count of braces, { and }, would also have revealed the problem.
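The brace count is a few lines in any language; a sketch in Python (nothing tool-specific about it):

def brace_balance(path):
    """Return None if '{' and '}' balance, otherwise a rough diagnostic."""
    depth = 0
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            depth += line.count("{") - line.count("}")
            if depth < 0:
                return f"{path}:{lineno}: unmatched '}}'"
    return None if depth == 0 else f"{path}: {depth} unclosed '{{'"

# Caveat: braces inside string literals or comments will throw the count off;
# it is good enough for the kind of problem in the example, though.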
A simple way to get suggestions is to take your finite state machine (that consumes tokens) to a special error state and just analyze what got you there. You'll pick up quite a lot of errors.
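A sketch of that, with a made-up transition table (the states and tokens are illustrative, not any real C grammar): missing entries send the machine to the error state, and the state it came from tells you what to suggest.

# Token-consuming state machine for a toy statement grammar.
TRANSITIONS = {
    ("need_stmt",    "ident"): "have_expr",
    ("need_stmt",    "}"):     "done",
    ("have_expr",    "op"):    "need_operand",
    ("have_expr",    ";"):     "need_stmt",
    ("need_operand", "ident"): "have_expr",
}

# What each state was expecting -- this is the suggestion text.
EXPECTED = {
    "need_stmt":    "an identifier or '}'",
    "have_expr":    "';' or an operator",
    "need_operand": "an identifier",
}

def check(tokens):
    state = "need_stmt"
    for tok in tokens:
        nxt = TRANSITIONS.get((state, tok))
        if nxt is None:   # this is the transition into the error state
            return f"unexpected {tok!r}; expected {EXPECTED[state]}"
        state = nxt
    return "ok" if state == "done" else f"unexpected end of input; expected {EXPECTED.get(state, 'more tokens')}"

print(check(["ident", "op", "ident", "}"]))   # missing ';' before '}' -> suggests it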
There was similar work by W. Zaremba (at OpenAI, I think): an LSTM trained to interpret Python source code. It took a ton of hackery to keep it from falling apart on simple operations, and it still kept making indexing errors. I wouldn't read too much into these results.
I keep wondering why people don't mine open-source commit histories for bug fixes, train a classifier to recognize buggy vs. non-buggy code, and use it to draw more review attention to code the classifier thinks smells buggy.
> Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes, or clean changes. In this manner, change classification predicts the existence of bugs in software changes. The classifier is trained using features (in the machine learning sense) extracted from the revision history of a software project, as stored in its software configuration management repository.
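A rough sketch of that pipeline, assuming scikit-learn and leaving the step that reads the repository history abstract (keyword labelling of commit messages and bag-of-tokens features are simplifications for illustration, not the method from the quoted work):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Crude labelling: a commit whose message mentions a fix is treated as
# evidence that the code it touched was buggy.
BUG_WORDS = ("fix", "bug", "crash", "leak", "overflow")

def label(message):
    return int(any(w in message.lower() for w in BUG_WORDS))

def train(commits):
    """commits: iterable of (commit_message, changed_code) pairs from the history."""
    texts  = [code for _, code in commits]
    labels = [label(msg) for msg, _ in commits]
    model = make_pipeline(CountVectorizer(token_pattern=r"\w+"), MultinomialNB())
    model.fit(texts, labels)
    return model

# model.predict_proba([new_change])[0][1] then gives a rough "smells buggy"
# score to sort incoming changes by when allocating review attention.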
This is depressing, and typical of academic research that apparently has an immediately useful application. They seem to have done it just to publish a paper and will abandon it now. Nobody else will have the motivation to turn their work into a usable piece of software; it's just one more unfinished project on the massive heap of unrealized and forgotten academic work.
Alas, spending time on writing production-ready software is career suicide in academia. Conferences and journals value novelty, and are not interested in incremental improvements.
If you're really lucky (EvoSuite [1] / Google), BigCompany will take interest in your project and give you some money. The expectation is that you'll hire students to improve your software, and then these students will go on to work for BigCompany.
It isn't clear that this has immediately useful applications, so probably nothing is lost. Automatic-fix projects are scary in industry. What if you had a co-worker who had 80% of their contributions rejected during code review? That co-worker would be DeepFix.