
Last week I took part in a formal methods workshop where one of the topics discussed was "why we don't have formal specs for most things" (a more modest ask than verification). Many of the reasons we came up with were what you'd expect: additional cost (which also has to be paid again whenever the program changes), missing incentives, lack of awareness that FM even exists, lack of training, unpolished tooling.

Other, less obvious ones included the inherent difficulty of writing the specs. Namely, FM is about verifying that a system model conforms to a specification. For this to bring value, the specification has to be so much simpler than the model that we can deem it "obviously correct". But such specifications can be extremely difficult to find. On one hand, they are abstractions of the model, and finding good abstractions is one of the hardest parts of software engineering. Another way to think about it is Kolmogorov complexity: at some point you cannot "compress" the description of a problem any further, i.e., some problems are just difficult by nature.
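To make the "spec much simpler than the model" point concrete, here's a minimal sketch (my own illustration, not from the workshop): the model is a sorting implementation with loops and mutation, while the spec is a short, obviously-correct property, checked exhaustively over a small finite input space, which is the same basic idea a model checker scales up.

```python
from itertools import permutations

# "Model": the system under scrutiny -- a toy insertion sort with
# loops and in-place mutation, i.e., nontrivial to eyeball.
def insertion_sort(xs):
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

# "Spec": a much simpler, "obviously correct" property --
# the output is ordered and has the same elements as the input.
def satisfies_spec(inp, out):
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and sorted(inp) == sorted(out)

# Exhaustive check over a small, finite space of behaviors.
for n in range(6):
    for inp in permutations(range(n)):
        assert satisfies_spec(list(inp), insertion_sort(list(inp)))
```

The value comes precisely from the asymmetry: the spec is three lines you can trust by inspection, while the model is the part that needs checking. When no such compressed spec exists, FM has much less to offer.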

On the incentives front, one interesting viewpoint brought up was that having formal specs for your product can make you more liable for defects, and that some vendors (hardware vendors in particular) might prefer to keep wiggle room there; consider the Spectre/Meltdown fiasco and Intel's response. Another point was that software vendors often don't directly feel the cost of bugs in their software, at least not short-term. This is different in hardware; e.g., ARM uses FM extensively, though I don't know if they commit to specs publicly. The RISC-V architecture does come with a formal spec.

On the other hand, I think many current trends in computing are aligning FM with the incentives of a core part of the industry. The cloud brings centralization, with a few large companies (Amazon, Google, etc.) providing a lot of complicated software infrastructure that's critical to their success. An outage of an AWS service directly costs them significant money and can completely halt their customers' operations. Moreover, they operate in an open and potentially hostile environment: a corner-case bug in MS Office wasn't a big deal, and could maybe be "fixed" by giving the customer a workaround, but the same kind of bug in a cloud service can become an exploitable security vulnerability. And indeed, AWS has a dedicated formal methods team, and the TLS 1.3 development effort included formal verification. Then again, AFAIK Google doesn't employ FM that much, even though GCP puts them in the same game as Amazon, so there's definitely a cultural component as well. Still, I find it likely that we end up with a small core of companies providing a lot of critical infrastructure and using FM heavily, with little adoption outside that circle.

As for the article, I agree with most points made, though it unsurprisingly has a bias toward using TLA+ with TLC. I agree that TLA+/TLC can often be the best solution, but there are caveats. For example, some models are awkward to express (e.g., recursive data structures), the specification language is limited, and "just throwing more hardware at it" doesn't work that well for exponential-time problems.
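The "more hardware" caveat is easy to see from how an explicit-state checker like TLC works: it enumerates every reachable state, and the state count is typically exponential in the number of processes. A rough sketch of that enumeration (my own toy example, not TLC's actual algorithm or data structures), with n independent processes each having 3 program-counter values:

```python
from collections import deque

# Explicit-state reachability: enumerate every state reachable from
# the initial state via single-process steps (the core loop of an
# explicit-state model checker, heavily simplified).
def reachable_states(n_procs):
    init = (0,) * n_procs          # each process starts at pc = 0
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        for i in range(n_procs):   # any process may take the next step
            if state[i] < 2:       # each process has pc values 0, 1, 2
                nxt = state[:i] + (state[i] + 1,) + state[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return len(seen)

# 3^n states for n processes: each added process *triples* the state
# space, so doubling your hardware buys you well under one extra process.
for n in range(1, 8):
    print(n, reachable_states(n))
```

Parallel workers and bigger machines shift the feasible frontier by a constant factor, but the exponential growth per added process is what ultimately caps what TLC-style checking can explore.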



