Vellvm: Verified LLVM

nickpsecurity · on Nov 20, 2015

Those liking this will probably like the work below. They did a verified validator for micro-optimizations (around 400) in LLVM's instruction combine pass.

http://rosaec.snu.ac.kr/publish/2014/techmemo/ROSAEC-2014-00...

Another for translation validation:

https://cseweb.ucsd.edu/~lerner/papers/cav2011.pdf

devit · on Nov 20, 2015

Is it feasible to prove the correctness of all existing LLVM passes this way?

What about the backends and Clang and Rust frontends?

gsnedders · on Nov 20, 2015

The frontends are hard because you need to start with a formal definition of the language, really. Rust doesn't really have anything resembling a spec at the moment. C/C++ you still have the first problem of creating a formalization of the spec before you can verify the front-end.

kachnuv_ocasek · on Nov 21, 2015

C actually has multiple formal semantics. The most recent I'm aware of is Robbert Krebbers' (almost) complete formalizatiom of C11 in the CH2O project.[1] See also his upcoming PhD thesis[2] for more details.

[1] http://robbertkrebbers.nl/research/ch2o/

[2] http://robbertkrebbers.nl/thesis.html

gsnedders · on Nov 22, 2015

Arguably C doesn't: C is just what the spec says. You can try and formalise what the spec says (as, rightly, many have done), but it's hard to guarantee correctness of that formalisation.

nickpsecurity · on Nov 20, 2015

https://news.ycombinator.com/item?id=10604379

Work in progress for correctness that's being hit from many, different directions. I don't know of anyone doing the front-ends, though. Not for LLVM, anyway.

gamesbrainiac · on Nov 20, 2015

Unfortunately, I couldn't decipher the significance of this from the first paragraph. What does this mean for LLVM? How does it make working with LLVM better?

nickpsecurity · on Nov 20, 2015

It's part of verification work in general. One must have a formal way of representing a program before proving key properties (eg safety, security) about it. Formal verification also often uses refinement to move from abstract to concrete with intermediate forms in between. See CompCert compiler's stages for how that might apply to LLVM (front ends especially). Also, static analysis tools like Astree Analyzer or SPARK Ada's Examiner have some internal model of the language to analyze to prove absence of bugs. Formal semantics helps with them, too.

So, it's kind of a foundation to build other work on that will let one rigorously verify the correct use, optimization, or compilation of LLVM programs. The link below has lots of examples of awesome work in security or verification that builds on LLVM. Some use Vellvm already with most able to benefit somehow if work is put in.

http://llvm.org/pubs/

geofft · on Nov 21, 2015

It uses Coq, a language for proving things about code (and writing code in a way you can prove things about), to interpret LLVM IR and verify the correctness of transformations on LLVM IR.

IR ("intermediate representation") is the form of code inside the compiler, used after a frontend (like clang or rustc) has parsed and made sense of the particular language you're using, and before the compiler is ready to output native code. All optimizations happen on IR.

So, this is a toolkit for letting you prove the correctness of LLVM optimizations (i.e., that some IR has the same behavior before and after the optimization pass), so that an optimization pass isn't accidentally introducing exploitable bugs into your software.

cgag · on Nov 21, 2015

Upenn seems like a really wonderful place for people who are into software that works. Lot of good stuff coming out of there.