One can always look at the empty part of the glass, but the quote does continue with a big full part:
"The TLS 1.3 handshake verification is work in progress and still relies on the OCaml extraction mechanism of F* ; thus, the C library still encapsulates the OCaml runtime system.
We have completed verification of the TLS 1.3 record layer it currently extracts to C.
The AES and SHA2 cryptographic assembly routines are verified and extract to assembly via Vale.
HaCl* provides verified C code for multiple other primitives such as Curve25519, Chacha20, Poly1305 or HMAC.
Our test client implements TLS 1.2 + TLS 1.3 Draft 18 and successfully connects to TLS 1.3 test servers. We have a prototype integration of miTLS within libcurl that one can use to git clone a remote repository."
One cool thing about F* is that it is starting to make real contributions, such as the HACL* formally verified cryptography landing in Firefox. https://blog.mozilla.org/security/2017/09/13/verified-crypto...
Realistically, even if F* is not fully foundational, it seems to be a very good compromise for writing security-sensitive applications in terms of proof effort...
Is it easier to write a program that behaves the way you want and a verification to prove the program behaves the way you want, or to just write the program? What if you goofed writing the verification, and now you have a program that is guaranteed to misbehave? How do you verify the verification?
Validation vs verification. Validation: "is this the right spec", verification: "does this meet the spec".
Turns out once you have a formal specification, you can do all sorts of things with it that you couldn't do with just the code sitting alone. You can prove properties about your specification. For example, seL4. Even the abstract specification is quite big and can be hard to reason about in complex cases. How do you know that spec is the one you want? Well, they wrote down some security properties they'd like it to have, and then proved that the abstract spec has those properties (eg, you can configure it to act like a separation kernel; it enforces isolation given an isolated initial condition; it enforces integrity).
How do you know those are the right properties? Well, that's the challenge of software validation isn't it :) Ultimately formal methods doesn't make that problem much harder. You just need to figure out a way to formally phrase the properties you want. A lot of software has very fuzzy validation criteria, and isn't necessarily suited for formal verification for that reason.
In addition to other good answers, there is also the fact that verification doesn't necessarily have to be "deep": "shallow" verification of only some security properties can also be very useful. For example, it may be hard to verify the "correctness" of a web server since the specification of what a web server should do is quite complex.
But (for example, for a web server written in Ada or C) you can also do without a specification and just apply verification tools to only check things like "all memory accesses are valid", proving that you have no buffer overflows that would cause denial of service or security holes.
Of course someone still has to write the verification tool correctly to ensure that it actually gives the safety guarantees you want, but at least it's not you who has to worry about that ;-) And specifying the absence of buffer overflows is comparatively simple.
cmrx64 addressed the validation well. I'll add that there are formal specification tools for requirements (the What), such as Software Cost Reduction and Abstract State Machines, that have helped there. The trick with those is that they help you be precise about the environment, inputs, and outputs in a way human reviewers can understand. So the reviewers think about and discuss the material, sharing their thoughts with others in both English and the formalism, to spot problems or even improvement opportunities. The main gotcha is ensuring the English and the formal specs don't diverge.
Far as verified stacks, the throwaway linked the basic concept but here's two works (old and new) on verified provers so you can see it in action:
The former was early work by Appel et al on getting the TCB of provers down to a few thousand lines of code. The other uses composable logics and small implementation that itself uses the verified tooling of Myreen et al whose research covers a lot of angles:
Most of their work is done in Isabelle/HOL, which HOL Light is aiming to verify in a lightweight way. The idea is that you just have to trust the code of the kernel (hundreds to thousands of lines) that checks proof terms, plus the specs that are fed into it. They're doing everything they can in the proof assistants so the minimal checkers will be able to confirm the operations.
Why haven't refinement types found their way into mainstream languages yet? It seems like they would have a tremendous impact on how programs are written and tested.
I find it even more frustrating that some mainstream languages accept all the complexity of dependent typing while restricting the benefits by hiding it deep within an ad-hoc construction (C++ and Scala are both guilty of this).
C++ and Scala don't really have dependent typing, but rather a single feature that is typical of dependent typing. That is: type-level values of any type.
Both Scala and C++ have a set of type-system features that, when combined, allow you to create functions whose return type can depend on the value of the passed argument. For a good step-by-step of how to do this in Scala see here[1]; a (unfortunately less in-depth) C++ example is here[2].
Statically checking refinements still only works so-so, that is, what do you do when the SMT solver times out? Writing manual proofs (or proof fragments) is great for special-purpose languages used by highly trained people, but it's not quite ready for the mainstream yet.
Checking refinements at run time is considered "too expensive" by many people.
I use the Refined library for scala and it works incredibly well (a few bumps here or there, and some occasional boilerplate for library interop). On my app server, it basically becomes my validation layer for all request data, after which I only worry about object ownership/authorization.
The big win with refined types is from compiler level support though. With relatively simple checking occurring mostly at IO boundaries, you can eliminate or make obsolete massive portions of testing, type checking, manual optimization, etc., and enable massive amounts of automated optimization. In other words, the "too expensive" refinement checking at runtime becomes a net performance and engineering benefit over the whole product lifecycle.
If anything, I would say refinement types make developers more replaceable. If Developer-1 documents all assumptions in the code using refinements, then, when they are replaced with Developer-2, the new developer can be productive more quickly.
To an extent. My feeling is that developers themselves, in general, don't want to learn anything more complicated than they can pick up from a three-month bootcamp and certainly don't want to take on the responsibility for their errors.
Those "major organizations", with the exception of Google and Apple, aren't spearheading language design.
Their point of view has no impact on the evolution of language paradigms, and if you told them they could save money on QA and increase the reliability of their product for virtually no overhead with new paradigms, I don't see why many people would reject the idea.
"Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means that the language cannot be too radical."
This is a convenient explanation and you will find a lot of evidence for it, but I think it's superficial.
The real question is, why can they afford to take this approach without being outperformed by companies that use better languages and better developers?
For those of us who enjoy exploring more powerful/innovative languages, there is a convenient explanation and a not so convenient explanation.
The convenient one is that the success of these "Blub companies" is largely determined by factors unrelated to software development.
The inconvenient one is that all these interesting and powerful language features do not significantly improve software development outcomes.
I think it is remarkable that some of the companies that are on the Blub side when it comes to software development are among the most cutting edge, research oriented companies in other regards, such as AI.
Also, companies like Google can not only choose among the best candidates, they also have a lot of influence on what people learn before they even apply for a job there. So I simply don't buy the familiarity excuse, at least not for the entire workforce.
I also don't buy the large existing codebase excuse as that would apply equally to new Blub languages and new PL research based languages.
There is simply very little faith in any positive impact of PL research innovations on software development outcomes. And this lack of faith cannot be explained away by claiming that all those lacking faith are simply incompetent.
The likes of Google and Microsoft can and do hire competent PL researchers, just as they hire competent AI researchers. And they do have a massive incentive to improve software development quality and productivity.
I think the truth is that PL researchers simply haven't made the case for the effectiveness of their work.
In part because it's extremely difficult to make that case empirically. All the empirical studies I have seen suffer from huge and mostly unfixable confounders.
But I think another reason is that PL researchers seem to largely ignore cognitive science and sociology. I haven't seen much discussion about the impact of particular PL features on developers' state of mind in any realistic context.
Nor have I seen much debate about programming languages as group communication tools or about the different roles in which people interact with code as part of a software development process. All of that is apparently considered out of scope.
And that, I think, is the reason why it is so easy for many practitioners to dismiss PL research out of hand. Even where PL research is being picked up decades later it appears more fashion driven than evidence based.
If we look at blub engineering across mankind's history, impressive works were built by resorting to thousands of workers, each doing his or her little thing over several years, even generations.
Yet nowadays we would most likely use heavy construction machines instead.
Niklaus Wirth admits in one of his papers that he expected developers to pick up Oberon out of concern for quality tooling in software engineering, but he was wrong about it.
Hoare makes a similar remark in his Turing Award speech.
So from my point of view it is a mix of sociology, fashion driven development and office politics.
I should have expanded a bit more. They are spearheading the design of a language or a few languages, and thus have skin in the game. I didn't mean to imply that they are spearheading language design in general.
Pascal also has this, but I remember some Ada-related post from a few months ago here that had code along the lines of the following (imaginary syntax, I don't remember the exact code):
type Int is range 0 .. 100 check(v)
  if (v = 10) or (v = 50) then error;
end;
That would allow all values of v between 0 and 100, except 10 and 50. Note that everything after (and including) "check" is something I came up with (and is more Pascal-ish), not what I am talking about. I also remember that this could be used with objects, so that you could have something like the following (this one is Object Pascal-ish; I do not know Ada):
type Foo = object
  Enabled : Boolean;
  Magnitude : 0..1000;
  check
    if (Enabled and (Magnitude=0)) or
       (not Enabled and (Magnitude > 0)) then error;
  end;
  procedure Bar;
end;
The code in check would be called whenever the object changes externally. "Bar" can change Enabled and Magnitude with the check invoked only at the exit of Bar, to ensure object consistency, rather than on every assignment Bar makes (so it can do Enabled:=False; Magnitude:=0; and not fail at the first assignment). But assigning those values directly from outside the object would trigger it (so, having var f: Foo; and then writing f.Enabled:=True; would trigger the error).
Or at least that is what I remember understanding.
Maybe I am remembering some other Pascal-styled language, but the only other such language I know of is Oberon, and that goes against the "put everything imaginable, plus extra more" mentality Ada seems to have :-P.
I've always dreamed about ML language with verification support and dependent types. Does it have something like Haskell's typeclasses or OCaml's modular implicits? Does it support ad-hoc polymorphism in any way?
It's not intending to be a general purpose language, and as such doesn't have a lot of features that make software engineering easier. In fact, it intentionally has a phenomenally slow compiler, because the compiler is solving or attempting to solve various NP-Complete problems simultaneously.
Think of it as a language that will help build tools, analyzers, and possibly linkable libraries for small scale but highly security-critical components, but not necessarily the language you want to use to build an app server.
Although, if you don't use any refinement types in your program, I don't see why you can't write general purpose programs and extract code to OCaml and F#.
Compared with Coq, F* makes it easier to write programs with side effects, i.e., actually mutate variables and memory. There is a language fragment which amounts to programming in a subset of C (and this is used for generating C code for the verified crypto stuff in Project Everest mentioned in other comments).
Also compared with Coq, proofs rely heavily on SMT solving with Z3 instead of lots of manual tactics (yay!). On the other hand, where SMT solving fails, you cannot resort to manual tactics because there is no tactic language; you have to write proof terms without interactive support. (This may have changed in the last few months, but I'm almost sure it hasn't.)
Don't rely on any of that though, things will have changed in the meantime!
I'm optimistic about F*, but at the moment you have to be quite determined to try to use it for fully verified programming. The same is true for Coq, though.
Sadly not yet. You can't see each individual intermediate state; the thing runs fully. We certainly plan to add such a feature though!
We've been mostly studying hammer-like tactics to, say, apply some theorems at specific places in the goal (which the SMT might be bad at), and not for chaining small rewriting or logical steps (which the SMT is good at).
Probably the four most well-known pieces of software with end-to-end verification of functional correctness are the CertiKOS microkernel from Yale (verified in Coq), the seL4 microkernel (verified in Isabelle), the CompCert C compiler (verified in Coq), and the CakeML ML compiler (verified in Isabelle).
Stopping short of full end-to-end functional correctness, there are a lot more notable examples, such as the Verve system at MSR (verified type/memory safety using Boogie), the Bedrock system at MIT (functional correctness down to a fairly low-level IR, done in Coq), the miTLS/Everest TLS stack (protocol correctness verified in F-star, most of the way down to a C-like implementation), and the Iris project (a toolkit for verifying higher-order concurrent programs, currently being used to formalize Rust including unsafe, in Coq).
In machine-checked proof, the three biggest successes are the verification of the Odd Order Theorem (in Coq), the Four Colour Theorem (in Coq), and the Flyspeck project verifying the proof of the Kepler conjecture (in HOL).
In hardware verification, there has been a huge amount of work using ACL2 and HOL, but I don't know this space as well.
Basically, Coq and Isabelle have the most actual successes, but other things like F-star, Boogie and HOL have all worked as well.
In addition to the great sibling comment by neel_k: It depends. I don't think that currently there is the one system that is uniformly best.
For example, I wouldn't want to try to write the kind of fiddling with byte arrays involved in cryptography in Coq. F* is certainly better for that. I imagine Dafny is better for it, too. Possibly Why3 as well. Or, if that is really what you want, you might even go for "real" programming languages (ones that exist independently of verification frameworks) such as Ada with SPARK/Ada or C with Frama-C.
If you want to verify purely functional programs, Coq is pretty good, but proving is often boringly manual. Also, the language is more influenced by the OCaml branch of the ML family than by Haskell; for Haskellers there are Idris and Agda. Then there is Isabelle/HOL, which has a lot of proof automation but a bizarre logic.
Oh, I haven't even mentioned Lean. Well. There is really no easy answer: You have to try to get acquainted with several of the frameworks, ideally with an idea for what your application will be. And whichever you choose, it will be hard.
Having played with dependently typed languages (ATS, Idris) some, and with verification tools (SPARK/Ada, Frama-C/C), I'd like to go a little farther and say that the verification tools are much more like "normal programming" rather than building a proof with the side effect of creating a program.
First, just to clarify, I wrote "Isabelle/HOL", and I'm not claiming anything about any of the many many other systems that have "HOL" in their name. I don't know those.
That said, I think the thing that annoyed me most were the different language levels: "assume P show Q" and "P ==> Q" are both implications, but somehow different. I never quite understood why and how, but there seem to be cases where only one can be proved but not the other. Similarly for different kinds of universal quantification. Your interpretation of "bizarre" may very well vary, but I found these unintuitive and not explained well in the documentation that I found.
(Don't ask for anything much more concrete than this. It's been a while, and I've forgotten the details.)
FWIW the differences are largely syntactic. Some of them are due to implication or quantification being at the meta-level (Pure, ⋀/⟹) vs the object-level (HOL, ∀/⟶). Transporting between them (for HOL) is easy, and the automation tends to normalize object-level quantification and implication into the meta-level.
"assume P show Q" is just Isar-syntax for a structured proof of "P ==> Q" (which is a term).
Verification of larger programs is hard for a lot of reasons. The easiest method is to do Design-by-Contract in a memory-safe language. Anything easy to specify in contracts goes there, while stuff that is easier to test for is done that way. If your language doesn't support contracts, you can emulate them with asserts, conditionals in functions, or constructors/destructors in OOP.
Then, you hit the app with tests generated from the contracts and fuzz testing that both run while the contract conditions have runtime checks in the code. That means a failure will take you right to the precondition, invariant, or post condition that is broken.
For analyzing distributed stuff, TLA+ is your best bet with an accessible tutorial for a subset available here:
TLA+ is a specification language, not a verification language. There is currently no way of establishing correspondence between a TLA+ spec and the code you (presumably) write to implement that spec. There are benefits to this, like using TLA+ to spec the difficult parts of your system without having to buy the whole formal verification farm.
Not just that. F* is convenient for code-level specification, much like Ada SPARK, JML/OpenJML for Java, and ACSL/Frama-C for C, but it's quite hard to specify global correctness conditions in it, like "the database is always consistent", or "no data is ever lost", which is where TLA+ shines.
The gap between local, code-level correctness properties and global ones is a hard one to cross. I am not aware of any tool that can do both (i.e. end-to-end verification) in a way that isn't excruciatingly hard.
And not just _that_, but TLA+ is really good at expressing liveness conditions, like "the database is EVENTUALLY consistent" or "the algorithm will eventually terminate." Most other formal methods don't cover that.
Right; not out of the box, anyway (many proof assistants are powerful enough for deep embedding of arbitrary logics, but of course using them is beyond the affordable reach of nearly all software shops). Many can express termination out of the box, which in theory can suffice to express most (all?) liveness properties, certainly many useful ones. But again, this is expressed at the function level, and is hard to do for global liveness properties.
F* is more geared towards writing verified but otherwise general purpose code directly. Coq is first and foremost a proof assistant, though there are a number of tools that can help you wed it to other programming languages.
One difference with Coq is that in Coq everything is constructive: a proof of a precondition is an object and has to be constructed. In F* you can use classical logic, and things are proven by a solver.
(I might be wrong.)
In terms of buzzword bingo, F* is as close to my ideal programming language as I can think of, but I do wonder if it'll ever get the tooling support to make it more attractive as a general purpose language.
What is it missing that something like SML has? Why couldn't I just switch from another language to this today, assuming I don't care about IDE support for it?
This actually is a real problem. Searching for "C#", "F#" or ".NET" breaks a lot of search engines and you get nothing useful. If I had to name something now I would go for a relatively unique name like "Clojure", "TypeScript" or "Haskell".
I am not the other person, but when I try to find a JSON decoding module for F* with "f* json", I get almost completely unrelated results. So, either there is no JSON codec for F* (I guess they could use the F#'s one) or Google can't find it.
I believe that Google might have created a rule for that, because when you type this in Yahoo [1] (or I believe even Solr, ElasticSearch or any of the other full text based database search engines) you won't find this as a result.