
Which language, do you think, is most practical for writing formally verified programs?



Probably the four most well-known pieces of software with end-to-end verification of functional correctness are the CertiKOS microkernel from Yale (verified in Coq), the seL4 microkernel (verified in Isabelle), the CompCert C compiler (verified in Coq), and the CakeML ML compiler (verified in Isabelle).

Stopping short of full end-to-end functional correctness, there are a lot more notable examples, such as the Verve system at MSR (type/memory safety verified using Boogie), the Bedrock system at MIT (functional correctness down to a fairly low-level IR, done in Coq), the miTLS/Everest TLS stack (protocol correctness verified in F-star, most of the way down to a C-like implementation), and the Iris project (a toolkit for verifying higher-order concurrent programs, currently being used to formalize Rust including unsafe code, in Coq).

In machine-checked mathematics, the three biggest successes are the verification of the Odd Order Theorem (in Coq), the Four Colour Theorem (in Coq), and the Flyspeck project's verification of the Kepler conjecture (in HOL Light).

In hardware verification, there has been a huge amount of work using ACL2 and HOL, but I don't know this space as well.

Basically, Coq and Isabelle have the most actual successes, but other things like F-star, Boogie and HOL have all worked as well.


CakeML was verified in HOL (https://hol-theorem-prover.org/), not Isabelle.


Yes, you're right. Thanks for the correction!


Magnificent answer. Thank you.


In addition to the great sibling comment by neel_k: It depends. I don't think there is currently one system that is uniformly best.

For example, I wouldn't want to try to write the kind of fiddling with byte arrays involved in cryptography in Coq. F* is certainly better for that. I imagine Dafny is better for it, too. Possibly Why3 as well. Or, if that is really what you want, you might even go for "real" programming languages (ones that exist independently of verification frameworks) such as Ada with SPARK/Ada or C with Frama-C.
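To give a flavour of the Frama-C route, here is a minimal sketch of an ACSL-annotated byte-array routine; the xor_into function and its contract are made up for illustration, not taken from any real codebase:

    #include <stddef.h>
    #include <stdint.h>

    /*@ requires \valid(dst + (0 .. n-1));
        requires \valid_read(src + (0 .. n-1));
        requires \separated(dst + (0 .. n-1), src + (0 .. n-1));
        assigns dst[0 .. n-1];
        ensures \forall integer i; 0 <= i < n ==>
                  dst[i] == (\old(dst[i]) ^ src[i]);
    */
    void xor_into(uint8_t *dst, const uint8_t *src, size_t n)
    {
        size_t i;
        /*@ loop invariant 0 <= i <= n;
            loop invariant \forall integer j; 0 <= j < i ==>
                             dst[j] == (\at(dst[j], Pre) ^ src[j]);
            loop assigns i, dst[0 .. n-1];
            loop variant n - i;
        */
        for (i = 0; i < n; i++)
            dst[i] ^= src[i];
    }

Running the WP plugin (frama-c -wp) turns the contract and loop annotations into proof obligations that it tries to discharge with an automated prover.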

If you want to verify purely functional programs, Coq is pretty good, but proving is often boringly manual. Also, the language is more influenced by the OCaml branch of the ML family than by Haskell; for Haskellers there are Idris and Agda. Then there is Isabelle/HOL, which has a lot of proof automation but a bizarre logic.

Oh, I haven't even mentioned Lean. Well. There is really no easy answer: You have to try to get acquainted with several of the frameworks, ideally with an idea for what your application will be. And whichever you choose, it will be hard.


Having played with dependently typed languages (ATS, Idris) some, and with verification tools (SPARK/Ada, Frama-C/C), I'd like to go a little further and say that the verification tools feel much more like "normal programming" than like building a proof with the side effect of creating a program.

https://maniagnosis.crsr.net/tags/applied%20formal%20logic.h...


In what way is HOL bizarre?


First, just to clarify, I wrote "Isabelle/HOL", and I'm not claiming anything about any of the many many other systems that have "HOL" in their name. I don't know those.

That said, I think the thing that annoyed me most was the different language levels: "assume P show Q" and "P ==> Q" are both implications, but somehow different. I never quite understood why and how, but there seem to be cases where one can be proved and the other cannot. Similarly for the different kinds of universal quantification. Your interpretation of "bizarre" may very well vary, but I found these unintuitive and not well explained in the documentation I found.

(Don't ask for anything much more concrete than this. It's been a while, and I've forgotten the details.)


Fair enough :)

FWIW the differences are largely syntactic. Some of them are due to implication or quantification living at the meta-level (Pure: ⋀ and ⟹) vs the object level (HOL: ∀ and ⟶). Transporting between them (for HOL) is easy, and the automation tends to normalize object-level quantification and implication into the meta-level forms.

"assume P show Q" is just Isar-syntax for a structured proof of "P ==> Q" (which is a term).


For small modules, SPARK with this book, based on the limited feedback I have:

https://www.amazon.com/Building-High-Integrity-Applications-...

Verification of larger programs is hard for a lot of reasons. The easiest method is to do Design-by-Contract in a memory-safe language: anything that is easy to specify in contracts goes into the contracts, while anything that is easier to test for is covered by tests instead. If your language doesn't support contracts, you can emulate them with asserts, conditionals in functions, or constructors/destructors in OOP (see the sketch after the link below).

http://www.eiffel.com/developers/design_by_contract_in_detai...
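A minimal sketch in C of the assert-based version of this, with a made-up account type purely for illustration:

    #include <assert.h>

    /* Hypothetical account type, used only to illustrate the pattern. */
    typedef struct { long balance; } account;

    /* "Class invariant" expressed as an ordinary predicate. */
    static int account_ok(const account *a) {
        return a != 0 && a->balance >= 0;
    }

    void withdraw(account *a, long amount) {
        assert(account_ok(a));            /* invariant on entry */
        assert(amount > 0);               /* precondition */
        assert(amount <= a->balance);     /* precondition */
        long old_balance = a->balance;

        a->balance -= amount;

        assert(a->balance == old_balance - amount);  /* postcondition */
        assert(account_ok(a));            /* invariant on exit */
    }

Compiling with -DNDEBUG strips the checks for release builds, so the contracts cost nothing in production while staying active during testing.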

Then you hit the app with tests generated from the contracts and with fuzz testing, both run while the contract conditions are compiled in as runtime checks. That way a failure takes you right to the precondition, invariant, or postcondition that is broken.
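As one concrete way to wire up the fuzzing half, assuming a libFuzzer-style setup and a hypothetical parse_record function that asserts its own contract:

    /* Build with the contract checks left on (no -DNDEBUG), e.g.
       clang -g -fsanitize=fuzzer,address harness.c parser.c */
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical function under test; it asserts its own
       preconditions, invariants, and postconditions internally. */
    int parse_record(const uint8_t *buf, size_t len);

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        (void)parse_record(data, size);   /* a violated contract aborts here */
        return 0;
    }

libFuzzer saves any input that triggers an assertion failure, so you can replay it and land directly on the broken condition.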

For analyzing distributed stuff, TLA+ is your best bet; an accessible tutorial covering a subset of it is available here:

https://learntla.com/introduction/



