jacques_chester: You're absolutely right that recreating things bit-for-bit identical can require some real elbow grease. I had to overcome bugs in the Tiny C compiler (tcc) and gcc to make them work, as described in the paper. Date/time stamps can create problems too.
But all these problems are quite doable. Nobody claims that gcc is small, yet I managed to get that working. Compiler makers can follow a few guidelines to make it much easier, see: http://www.dwheeler.com/trusting-trust/dissertation/html/whe...
Check out the graph at the Debian Reproducible Builds project at https://wiki.debian.org/ReproducibleBuilds - yes, some packages don't reproduce bit-for-bit, but they've made a tremendous amount of progress and managed it for most packages.
I think at least part of the solution is convincing upstreams that the world has changed. There are now many people and organizations who are actively working to subvert software - and some of them have a lot of resources and incentive. They're working to break into a variety of things (including repositories, build systems, and distribution processes) so that people run subverted software. Frankly, the world changed decades ago, but only recently have many developers started to realize it. I try to convince people by pointing out past attacks, for example. One piece that might help you is: https://blog.torproject.org/blog/deterministic-builds-part-o...
We can also make it easier. One great thing is that the Debian reproducible builds group has been modifying tools to make it easier to create reproducible builds. That doesn't mean there's nothing left to do, but making it easier makes it way more likely. The "containerization of everything" also has the potential to make life easier - it makes it easier to start from some fixed point, and repeat a sequence of instructions from there.
> Followup question: how do we find independent folk to help us check our work?
I don't have a very good answer for proprietary software. If a company is serious, I think they should pay people to independently review it. There's great evidence that software inspections detect a lot of defects, but in many circumstances detecting & fixing defects is simply not valued as much as the costs of reviewers. We need customers to demand independent analysis of important software.
For open source software, the situation is often better. I think you should work with the people who write/manage/run the relevant system or language package management tools so that the packages are reproducible.
As far as the broader question of "checking our work", there a lot of things you can do to make it easier for people to collaborate. I strongly encourage all OSS projects to try to get a CII best practices badge: https://bestpractices.coreinfrastructure.org/ That has a list of basic things you should do to encourage collaboration and be secure. (Full disclosure: I lead the CII best practices badge project. But you should do it anyway :-) ).
I'll send the link to my engineering director for a squiz. On a superficial scan I think we hit many of these criteria, but not all. Of interest, we're neighbours -- the Cloud Foundry Foundation is also managed by the Linux Foundation.
The one I'm happiest to exceed is the 60 day CVE-fix window. Our policy is to release updated versions of our buildpacks and rootfs within 48 hours of a high-severity CVE being patched upstream -- usually within the same day, actually. Only possible because we have very extensive testing and build automatic.
For internal reviews and teaching, one idea that one of my colleagues floated was having a red team with engineers rotated through that team. The idea being that it's easiest to think like an attacker if you have, for some time, been an attacker.
It would be difficult to find the right tempo, though. It'd take a few weeks to get a grip on common attack types and then start hunting for flaws, and we'd be struggling to find the balance between rotating as many engineers through as possible vs maintaining ongoing feature work and maintenance.
We already did the containerisation of everything (except we didn't lie to ourselves and just called them the processes they are) and discovered flaws along the way, fixing most of them.
Docker wants to find the flaws on its own, so is repeating the last ~30 years of software distribution development.
Every problem containerisation discovers has been discovered. Every flaw it fixes has been fixed. All that is achieved by the modern insistence that containerisation is in any way something new is a larger attack surface. The ability to return to "some fixed point" has always been present and simple. Judging by many years of building other people's software, for fun or profit, developers simply don't bother to do so. None of my work is ever release without returning the system to a known state in order to run the necessary barrage of tests (which, developers take note, includes TESTING THE GOD-DAMN INSTALLATION DOCUMENT) and I've achieved this without the need for chroot, whatever its name du juor.
The chance of an invisible magic snaphotting system not invisibly breaking the 'back-to-basics' assumptions is precisely zero, invalidating any tests performed within or by it.
Developing against a known state and always and only testing against that state is easy and has been since approximately always. Not doing it is lazy. Stop guessing, developers, programme!
The hard part -- sometimes really hard -- is the "bit-for-bit identical" requirement.
Lots of builds are recreatable but not reproducible (there is probably a better term of art here). You can go back to a point in time and build the version of the software as it was, but you are not guaranteed to get a bit-for-bit clone. (See https://reproducible-builds.org for a thorough discussion)
The problem is that there are lots of uncontrolled inputs to a build that are due to sourcecode or compiler changes. Most famously there are timestamps and random numbers, which mess up all sorts of hashing-based approaches.
These can even be non-obvious. Just the other day I and a colleague were investigating the (small but unsettling) possibility that an old buildpack had been replaced maliciously. We compared the historical hash to the file: different. We rebuilt the historical buildpack with trusted inputs: still different.
Then we unzipped both versions and diff'd the directories: identical.
What had thrown our hashes off was that zipfiles, by default, include timestamps. We have a build that is recreatable but not reproducible.
Speaking of builds, we are able to reproducibly build some binaries but not others. Off the top of my head our most high-profile non-reproducible build is NodeJS. Some other binaries (Ruby and Python, in my not-at-all-complete recollection) are fully reproducible.
This difficulty with fully reproducing makes it hard to provide a fully trustworthy chain of custody. A company which uses Cloud Foundry have in actual fact stood up an independent copy of our build pipelines inside their own secure network, so that they can be completely autarkic for the build steps leading to a complete buildpack. This doesn't defend against malicious source, but it defends against malicious builds.
Disclosure: I work for Pivotal, the majority donor of engineering to Cloud Foundry. As you've probably guessed, I'm currently a fulltime contributor on the buildpacks team.
Nixos goes to a great length to steer towards reproduceability. They run builds in chroots, they set all ctime to past 0, they make all directories except the output directory read only, etc. But even then compilers have all sorts of quirks, like running certain code paths multiple times and deciding which one runs faster on ~this~ CPU.
The biggest conceptual mistake we are making is that by default compilers always build for ~this~ machine, linking to this libraries. This makes it so the state of the machine inherently changes with every compilation (aka compiling is not a purely functional operation anymore). If I could go back time and change automake and glibc, cross compiling and explicit dependency handling should be the norm.
(As an aside, containers would greatly benefit too as you wouldn't need to package an entire linux distribution with every binary)
I am sometimes amazed, sometimes disappointed by this reproduceability problem. Computers supposed to be machines that can do the same thing again and again without a mistake, but this is not the case anymore. We have so many layers of complexity and everything is bolted together with duct tape. We focus on developer convenience in the short term but in the long term we completely loose determinism. Sure we can write more code faster than before, but building software is more problematic than ever.
Yet, somehow everything seems to be going to this direction, in fact some people celebrate it and compare it to biology or evolution. I just call is "accidentally stochastic computing".
This is a problem that the GNU Guix package manager[0] (and presumably its inspiration Nix[1]) are helping to solve. Any two git checkouts of Guix with the same git hash on the same architecture should produce bit-identical builds across time and space for many of the programs it packages. It's not true for everything they package yet, but they're making progress.
Interestingly, this problem appears in some pretty diverse circumstances.
The one that springs to mind is video game emulation, from more than a decade ago - the communities there needed a means to reproducibly compress their ROMs, both for verifying dumps and for sharing large sets of ROMs. The tools created have been through many iterations, but they're still in use today in the form of TorrentZip[0] and various relatives like torrent7z.
There are some efforts [1] to make reproducible builds really work, also nix guys have some experience with them, as others have noted. Isolated deterministic environments and stripping binaries/archives (strip-nondeterminism tool) [2] generally do the trick.
As well as supporting work to help independent verification of the "chain of custody". There's 25 of those under that label, if you use the search box.
Bit-for-bit reproducibility does require changing how you do things. Always use checksums, not dates. For zip we have a special version that sets all the timestamps in the archive to a fixed value. Anything writing records to a file based on a hash table needs to sort the entries. The team needs to be dedicated to making all its tools work this way.
Huge fan of your work; thanks so much for what you've done on this topic, and maintaining a clear website that I can refer back to easily :)
I've been wondering if perhaps there's a "trusting trust" problem where we have 3d-printers that can print circuit-boards (eg CPUs)... and also print 3d-printers. The printer is sort of like a compiler in this case. How do you know it won't produce printers that will produce malicious CPUs? It might not be easy to do "bit-for-bit" comparisons between CPUs to make sure one is safe.
Since trusted compilers on untrusted hardware aren't trustworthy, I had hoped that 3d-printing might eventually allow us to trust our own printed hardware... but it might be turtles all the way down!
> I've been wondering if perhaps there's a "trusting trust" problem where we have 3d-printers that can print circuit-boards (eg CPUs)... and also print 3d-printers.
I can answer that question easily: "Yes, there's a problem :-)". DDC can be used to help, but there are caveats.
This quote is probably especially apt: "Countering this attack may be especially relevant for 3-D printers that can reproduce many of their own parts. An example of such a 3-D printer is the Replicating Rapid-prototyper (RepRap), a machine that can “print” many hardware items including many of the parts required to build a copy of the RepRap [Gaudin2008]. The primary goal of the RepRap project, according to its project website, is to “create and to give away a makes-useful-stuff machine that, among other things, allows its owner [to] cheaply and easily… make another such machine for someone else” [RepRap2009]."
I also note that, "... DDC can be applied to ICs to detect a hardware-based trusting trust attack. However, note that there are some important challenges when applying DDC to ICs..." and there's the additional problem that when you're done, you'll only have verified that specific printer - not a different one. Determining if software is bit-for-bit identical is easy; determining if two pieces of hardware are logically identical is not (in the general case) easy. No two hardware items are perfectly identical, and it's tricky to determine if something is an irrelevant difference or not.
If someone wants to write a follow-on paper focusing on hardware, I'd be delighted to read it :-).
It's worse for trusted hardware than most people think. The framework I came up with predicted a number of attacks including analog and material swapping at fabs. So, that's an initial test. Here the basic risk analysis:
I predicted the A2 paper on analog compromise happening, at a high-level rather than specific attack, largely due to our hardware guru on Schneier's blog bragging about mixed-signal attacks years ago. He taught us they resisted attempts to counterfeit or patent sue them by disguising key functions in analog or RF components the digital tools couldn't even see. He said competitors did, too, with him regularly having to carefully inspect lowest-level representation of 3rd-party components. I have a feeling they were cloning them, too. ;) Anyway, the actual products were already subverted years ago just for competitive advantage, counterfeiting, etc. He said some counterfeiters were so good they cloned his company's products down to the transistors. I said, "Holy shit!"
One more thing for you while I'm still on this: cost reduction via merged designs. The mask and fab runs for prototyping cost tons of money. A well-known way to reduce that is many companies sharing one run (eg shuttle run or MPW) to get their test chips cheaper. A less-known trick, at least outside embedded, is them putting multiple products on one ASIC to do same thing for production runs with a factory-setting telling it what chip to look like. My hardware guy gave example of 3G or WiFi circuitry embedded in microcontroller used in your input devices... perfect for keylogging... that was only incidentally there since supplier offered both feature-phone SoC's and peripheral chips. And simply didn't want to manufacture two lines. Such extras might be re-enabled, even remotely, depending on how they control access to them. So, gotta watch out for them.
I try to be clear in there that I'm not rebutting your work, which I respect, so much as the extreme amount of focus by all the IT forums on just a few problems that cause almost no failures in real-world vs the ones that cause all sorts of compiler & app issues (including security). I also present an alternative that addresses root causes, leveraging two of your own articles, that knocks out the interception risk as a side effect of just doing the computation and distribution securely.
> I try to be clear in there that I'm not rebutting your work, which I respect, so much as the extreme amount of focus by all the IT forums on just a few problems that cause almost no failures in real-world vs the ones that cause all sorts of compiler & app issues (including security). I also present an alternative that addresses root causes, leveraging two of your own articles, that knocks out the interception risk as a side effect of just doing the computation and distribution securely.
You can see some of my comments below. This discussion is a little subtle, because I agree with you that many other activities need to be done, and you respect my work (thanks!). I don't think there's an "extreme" focus; to my knowledge nearly no money has been spent on DDC, and the amount of funding spent on reproducible builds is dwarfed by the funding spent on formal methods and related work (by orders of magnitude). I think the "focus" that you see is more the excitement about things that can be done now, without radical retooling.
"to my knowledge nearly no money has been spent on DDC, and the amount of funding spent on reproducible builds is dwarfed by the funding spent on formal methods and related work (by orders of magnitude)"
By focus, I mean the IT people out there using, building, discussing, or securing FOSS. They talk a ton about these things but not that other stuff. As you note, there is a lot of work and money going into it by CompSci, proprietary companies (surprise!), and militaries.
What are the implications if we don't start from a trusted compiler?
e.g. we have compilers A and B. Either of which may, or may not, have back doors. Is it possible to detect back doors via compilation?
A complex backdoor can detect compilation of A or B, and insert different backdoor code into each. But will those back doors be identical?
i.e. is there a a compilation path (A -> B' -> B'', etc.) such that we can either detect that the backdoors are either bit for bit identical, or that there are no back doors?
DDC works if the trusted compiler has back doors and other nastiness - as long as the back doors won't affect the DDC results. This is noted in section 4.3: "... something is “trusted” if we have justified confidence that it does not have triggers and payloads that would affect the results of DDC. A trusted program or process may have triggers and payloads, as long as they do not affect the result. A trusted program or process may have defects, though as shown later, any defects that affect its result in DDC are likely to be detected. Methods to increase the level of confidence are discussed in chapter 6." http://www.dwheeler.com/trusting-trust/dissertation/html/whe...
Chapter 6 discusses various ways to make this likely: http://www.dwheeler.com/trusting-trust/dissertation/html/whe... While you're doing tests, it's probably best to do them on an isolated system or network. One interesting appraoch is to use old computers to compile newer compilers (possibly through emulation). It's not likely that the older computers will have malicious attacks or backdoors that will work against newer compilers.
You can also apply it multiple times using multiple different trusted compilers. In that case, an attack would have had to subvert all of those compilers (and/or their environment). This quickly becomes vanishingly unlikely.
"One interesting appraoch is to use old computers to compile newer compilers (possibly through emulation"
I strongly second this. It's what I came up with to deal with hardware, subversion risk post-Patriot Act. The older it is, the better as the oldest stuff predated wireless hardware and had hardware so scarce they didn't waste transistors.
One can also use embedded boards made with MCU's from the old, process nodes. The 8- and 16-bitters will be slow, painful, and maybe not subverted. Alternative, Leon3 GPL CPU on diverse FPGA's. Or my brute-force approach: same compiler on one machine and ISA from each subversive, non-cooperative country comparing output. Lot cheaper to do that now with excellent FOSS support. :)
It strikes me that the use of diverse systems to reinforce assumptions of trust within a given subsystem is an architectural paradigm not limited to compilers. The key problems are implementation feature-set or edge-case differences and overhead (real time and maintenance/up-front development). In fact, it would be ideal with multiple client versions/implementations on any service (particularly distributed or financial) and indeed I have done this in the past. Not sure if this paradigm has a name... anyone? I suppose you could just say consensus-based hedging.
It's in fact a long-established technique in high-assurance systems going back to I think aerospace or security-critical where the triple-modular redundancy trick was reapplied with separate teams building each one. Security through diversity also re-emerged relatively recently as a very active sub-field of INFOSEC/IT. If you're interested in that stuff, use these terms in various combination when you're Googling: "artificial diversity," "automated obfuscation," "moving target software security," "security diversity software." Also, including "pdf" helps given most good ones are papers. The word "survey" will occasionally land you on a pile of gold with references all in one spot. :)
This again. A perfect example of solving the wrong problems in a clever way. To his credit, Wheeler at least gives credit to the brilliant engineer (Karger) who invented the attack, points out it took 10 years before that knowledge reached anyone via Thompson (recurring problem in high-security), and did the reference essays on the two solutions to the actual problem (high-assurance FLOSS & SCM's). That's what you're better off reading.
Here's a quick enumeration of the problems in case people wonder why I gripe about this and reproducible builds fad:
1. What the compiler does needs to be fully specified and correct to ensure security.
2. The implementation of it in the language should conform to that spec or simply be correct itself.
3. No backdoors are in the compiler, the compilation process, etc. This must be easy to show.
4. The optimizations used don't break security/correctness.
5. The compiler can parse malicious input without code injection resulting.
6. The compilation of the compiler itself follows all of the above.
7. The resulting binary that everyone has is the same one matching the source with same correct or malicious function but no malicious stuff added that's not in the source code already. This equivalence is what everyone in mainstream is focusing on. I already made an exception for Wheeler himself given he did this and root cause work.
8. The resulting binary will then be used on systems developed without mitigating problems above to compile other apps not mitigating problems above.
So, that's a big pile of problems. The Thompson attack, countering the Thompson attack, or reproducible builds collectively address the tiniest problem vs all the problems people actually encounter with compilers and compiler distribution. There's teams working on the latter that have produced nice solutions to a bunch of them. VLISP, FLINT, the assembly-to-LISP-to-HLL project & CakeML-to-ASM come to mind. There's commercial products, like CompCert, available as well. Very little by mainstream in FOSS or proprietary.
The "easy" approach to solve most of the real problem is a certifying compiler in a safe language bootstrapped on a simple, local one whose source is distributed via secure SCM. In this case, you do not have a reproducible build in vast majority of cases since you've verified source itself and have a verifying compiler to ASM. You'll even benefit from no binary where your compiler can optimize the source for your machine or even add extra security to it (a la Softbound+CETS). Alternatively, you can get the binary that everyone can check via signatures on the secure SCM. You can even do reproducible builds on top of my scheme for the added assurance you get in reproducing bugs or correctness of specific compilations. Core assurance... 80/20 rule... comes from doing a compiler that's correct-by-construction much as possible, easy for humans to review for backdoors, and on secure repo & distribution system.
Meanwhile, the big problems are ignored and these little, tactical solutions to smaller problems keep getting lots of attention. Same thing that happen between Karger and Thompson time frame for Karger et al's other recommendations for building secure systems. We saw where that went in terms of the baseline of INFOSEC we had for decades. ;)
Note: I can provide links on request to definitive works on subversion, SCM, compiler correctness, whatever. I think the summary in this comment should be clear. Hopefully.
Note 2: Anyone that doubts I'm right can try an empirical approach of looking at bugs, vulnerabilities and compromises published for both GCC and things compiled with it. Look for number of times they said, "We were owned by the damned Thompson attack. If only we countered it with diverse, double compilation or reproducible builds." Compare that to failures in other areas on my list. How unimportant this stuff is vs higher-priority criteria should be self-evident at that point. And empirically proven.
They're barely useful for low assurance. Just read the Csmith paper testing compilers to see the scope of the problem. They solution to what they're really worried about will require (a) a correct compiler, (b) it written in cleanly-separated passes that are human-inspectable (aka probably not C language), (c) implemented with correctness checks to catch logical errors, (d) implemented in safe language to stop or just catch language-level errors, (e) stored in build system hackers can't undetectably sabotaged, (f) trusted distribution to users, and (g) compiled initially with toolchain people trust with optional, second representation for that toolchain.
Following Wirth's Oberon and VLISP Scheme, the easiest route is to leverage one of those in a layered process. Scheme, esp PreScheme, is easiest but I know imperative programmers hate LISP's no matter how simple. So, I include a simple, imperative option.
So, here's the LISP example. You build initial interpreter or AOT compiler with basic elements, macro's, and assembly code. Easy to verify by eye or testing. You piece-by-piece build other features on top of it in isolated chunks using original representation until you get a real language. You rewrite each chunk in real-language and integrate them. That's first, real compiler that was compiled with the one you built piece by piece starting with a root of trust that was a tiny, static LISP with matching ASM. You can use first, real compiler for everything else.
Wirth did something similar out of necessity in P-code and Lilith. In P-code, people needed compilers and standard libraries but couldn't write them. The could write basic system code on their OS's. So, he devised idealized assembly that could be implemented by anyone in almost no code and just with some OS hooks for I/O etc. Then, he modified his Pascal compiler to turn everything into P-code. So, ports & bootstrapping just required implementing one thing. Got ported to 70+ architectures/platforms in 2 years as result.
The imperative strategy for anti-subversion is similar. Start with idealized, safe, abstract machine along lines of P-code with ASM implementations. Initial language might be Oberon subset with LISP or similar syntax just for effortless parsing. Initial compiler done in high-level language for human inspection with code side-by-side in subset language for that idealized ASM. It's designed to match high-level language, too. Create initial compiler that way then extend, check, compile, repeat just like Scheme version.
The simple, easy code of the initial compilers and high-level language for final compilers means anyone can knock them off in about any language. That will increase diversity across the board as many languages, runtimes, stdlibs, etc are implemented quite differently. Reproducible build techniques can be used on the source code and initial process of compilation if one likes. The real security, though, will be that many people reviewed the bootstrapping code, the ZIP file is hashed/signed, and users can check that source ZIP they acquired and what was reviewed match. Then they just compile and install it.
I'm not disagreeing, yes, reproducible builds don't make a lot of sense from a security point of view. But they do from an infrastructural/package management point of view and could make some things easier, more manageable, more reliable.
Excellent, it's easy! Why haven't you completed this yet? :-)
So, a few thoughts.
The CSmith paper "Finding and Understanding Bugs in C Compilers" is a fun paper: https://www.cs.utah.edu/~chenyang/papers/pldi11-preprint.pdf - however, let's delve further. They found defects in every compiler they tried, proprietary and OSS. They even found defects in CompCert - because they were defects in CompCert’s unverified front-end code. What's more, they focused on "atypical combinations of C language features" - which are important, but to far fewer users.
Yes, it'd be awesome to have compilers that are perfect have absolutely no defects. Let's work on that. But it will be many, many years before they are widespread.
Besides, while no-defects would be awesome, many people are more interested in a different and simpler requirement - they want to detect subversion of binaries (where the binary and source do not correspond). Yes, provably perfect compilers could do that, but you don't need to wait for them; reproducible builds and DDC can provide that now, and you don't have to wait for anything.
So let's talk about VLISP. VLISP spawned an amazing number of papers, and was interesting work. Where's the code? MITRE never released it to my knowledge. To me, programs I can't run are in the "who cares?" category, and stuff people can't reproduce & investigate isn't really science anyway. Besides, VLISP only generated code for computers people generally don't use anymore. (You'll notice that I posted the scripts demonstrating DDC so that others can reproduce their execution.)
Sure, p-code was awesome in its time, I used it. But when newer hardware (the IBM PC) came along, it got superceded. More importantly for our story, it got superseded before there was time to develop any complex proofs. This is a more general problem: As long as formal proofs demand a massive amount of time, their results will become useless due to obsolescence. We must improve the tooling and models so that the proofs and layers can be done in a much faster and cost-effective way. Proving obsolete stuff is not very helpful. I think the ProVal approach (using Why3) is especially promising, but fundamentally, we need to make it not a massive research effort to write a high-assurance compiler.
Oh, and a little out-of-scope: As someone who's written Scheme & Common Lisp for decades, the problem with Lisp isn't its imperative nature; lots of people like Elm and Haskell, which are also functional languages. The problem with Lisps is their hideous syntax; Lisps don't even support infix by default, something every grade schooler learns. One solution is here: http://readable.sourceforge.net/
Anyway, what you've outlined is basically a program to build up from small safe components into larger trustworthy components. It's a sound strategy, and one that has been repeatedly advocated for decades by many people. But we also need to admit that it's going to take a long time, because our tooling is only just becoming good enough, and even then only in certain cases. There are serious limitations you're glossing over.
I'm disagreeing with a person I respect on complicated topic rather than offering "disdainful sneers." I do respect how he replies to the disagreement. A scholar and a gentleman he is.
"Excellent, it's easy! Why haven't you completed this yet? :-)"
I did. Anyone that ever wrote their own Scheme, Oberon, or C compiler in primitive language did. Common assignment in classrooms. :P Anyway, I took a brain injury in an accident that cost me most of my memory back in 2011 or... idk. It's also hard to retain new things. People were amazed at what I've put together on hardware & some other stuff but I can't remember how to program hands-on, do set theory, a bunch of things. Only what I repeated a lot in year(s) leading up to injury. Unfortunately, I was in resaerch rather than build mode then. :(
Meanwhile, I still got enough fragments and learning ability to work on high-level passing stuff onto specialists. I've been mostly focused on solving secure hardware since most of rest is already done to point people can just build it from published techniques with some effort put in. Got digital, mask, and fab stuff pretty far but analog, RF, and EMSEC might be uncheatable. Might actually need years to decades of specialist knowledge for solution. (sighs) Also do lots of evangelism of high-assurance methods on various areas. However, the lack of effort on this very important problem in compiler assurance has led me to consider rolling up my sleeves and bumbling my brain-damaged ass through a compiler project with maybe this start:
That, similar in PreScheme, or redo Oberon or Component Pascal. Then use those to build the rest. Who knows. Just a lot of stuff to relearn. QBE is nice on C side as it's simple plus has small number of high-bang-for-buck optimizations. So, if I did it, I'd recode that in CakeML with me checking initial tool (eg simple scheme/ML) by hand in assembly. I also found a way to do C code (or C compatible code) in ML. So there's options if I decide to take up your challenge. :)
"So let's talk about VLISP. "
I lost too many bookmarks in recent crash. I had a link to the page which had what looked like a downloadable Scheme48. The code generation was done for x86, PPC, ARM, and C language. You bet you can buy those today. Anyway, I'd do a current project in Myreen's LISP or CakeML if I wanted verification as they're current, maintained, and have more assurance. Actually, my scheme is side-by-side a fast version for development pace and safe verion for reference/production if not too slow. So, MLton and CakeML or (fast LISP here) and VLISP/LISP1.5. I theorize I'd get best of both worlds that way.
"We must improve the tooling and models so that the proofs and layers can be done in a much faster and cost-effective way."
Totally agree. I can't say which will win. Why is good. Lean is interesting. Coq & Isabelle/HOL are getting good libraries. Dependent types are faster but controversial. C has LiquidTypes, Frama-C, and Microsoft's VCC plus lots of static analysis. Don't know where it will go but doing anything critical in a functional language will definitely help. Certify equivalence to imperative for distribution. Just look at what productivity vs expertise COGENT achieves in the filesystem paper vs seL4 verification in Haskell/HOL/C:
I agree and thanks for Readable. Looks nice! Prior work I saw on this was Dylan language from Apple. Unfortunately didn't get adoption. Maybe time for another shot at beautiful, system LISP given Clojure's success.
" But we also need to admit that it's going to take a long time, because our tooling is only just becoming good enough, and even then only in certain cases. There are serious limitations you're glossing over."
I'm actually ignoring them on purpose to get a short-term win as you put it. Specifically, I'm expecting incremental, well-tested, inspectable compiler without expecting formal verification. I list it as necessary for full-trust but not necessary to get baseline up. The Goulum paper I linked and Scheme examples shows how little work first steps are if one uses published techniques & tooling good for job. Case in point by one person:
The next component is keeping the implementation simple and putting in DbC-style interface checks in there. Initial quality comes from inspection, testing, modularity, and mostly pure interfacing. Overtime these two extra things allow high-assurance crowd to rewrite each pass using whatever tech they have available.
In case I'm not being clear, let me illustrate with example. The CompCert compiler verified compilation by using cleanly-defined passes into intermediate languages with specific steps on each. The medium-assurance approach just does that in Ocaml and/or CakeML with testing, QuickCheck, assertions, etc. Like regular development with slightly more effort and little expertise. The passes are also kept simple enough that their assurance can go up plug-and-play over time with others' help when they formally verify them. This can work with any software that doesn't channge a lot at once with usefulness over time. This includes compilers, protocols (esp SSL), key OS components, provers, standard libraries, maybe even parts of databases. I call it, similar to hardware field, Design for Verification for software. Like Orange Book B3, just spec, structure, and code it enough that it might get improved later.
Meanwhile, you get immediate benefits over a less-structured, monolithic compiler in C or C++ without things like QuickCheck or assertion-based, test generation. You also get more overtime without extra work if large uptake causes CompSci to focus on it. I cheat by recommending the ML stuff they already focus on. So, yeah, I'm all about interim stuff that doesn't need full, formal verification. If only because I both lack time and memory to use formal verification myself. Gotta help the new kids get finished faster and easier so I can trust/use the compiler, too! :)
> I did. Anyone that ever wrote their own Scheme, Oberon, or C compiler in primitive language did. Common assignment in classrooms. :P
Well... you implemented a compiler. And hey, that's a great thing. I've implemented several myself. But I doubt that you proved that it was correct, or that it was a large-enough language that many people would want to use it directly. :-).
> I'm actually ignoring them on purpose to get a short-term win as you put it. Specifically, I'm expecting incremental, well-tested, inspectable compiler without expecting formal verification. I list it as necessary for full-trust but not necessary to get baseline up.
Okay. But as you note, you're also focusing on shorter-term wins... they're just different shorter-term approaches for for different goals. And that's okay.
"Well... you implemented a compiler. And hey, that's a great thing. I've implemented several myself. But I doubt that you proved that it was correct, or that it was a large-enough language that many people would want to use it directly. :-)."
You got me there. Nobody wanted to pay me for it or anything haha.
"but as you note, you're also focusing on shorter-term wins... they're just different shorter-term approaches for for different goals. And that's okay."
Fair enough. I'm going to continue on my route for this topic, though, given there's already a lot of attention on the other ones. Parallel searches sometimes bear fruit. If it goes nowhere, you can bet I still have a copy of your papers and the reproducible build stuff so I can jump on that. :)
We all agree that reproducible builds and DDC do not solve all problems. No one said they did.
I encourage continued research work on solving steps 1-8 (and some related ones, too). As you noted, I've advocated for work in many of them, and have written about them. I'm no stranger to SCM security or formal methods :-). But many of these steps are notoriously difficult. CompCert is a great step forward in the research community. But CompCert only supports a subset of the C99 programming language, is proprietary, and produces executable code that performs significantly worse than Freely-available gcc and clang/llvm when used at their usual optimization levels (-O2). Neither gcc nor clang/llvm are ever going to be supplanted by a subset proprietary compiler that generates poorly-performing code. In addition, where are the rest of the compilers, for all the rest of the programming languages? I don't think the ECMAScript ES6 specification meets point #1, and you can't get the rest until you have point #1.
When you say, "The implementation of it in the language should conform to that spec or simply be correct itself." - sounds good. I guess you'll use formal proofs. How do you know that those formal methods tools aren't subverted?
> The "easy" approach to solve most of the real problem is a certifying compiler in a safe language bootstrapped on a simple, local one whose source is distributed via secure SCM.
Most people do not have access to a "certifying compiler", and in any case, that begs the question of how you certify that compiler. DDC lets you break the cycle.
Reproducible builds and DDC only solve specific problems, but they solve them in ways that can be implemented relatively quickly, at relatively low cost, and can use today's tools. I think that is what's exciting about them - they're much more obviously within reach. Yes, they do require some tweaks to build processes and build tools. But they don't require everyone to radically switch to totally unfamiliar approaches (e.g., to formal methods). And there are subverted binaries attacks - quite a number of them - which can be countered by reproducible builds. DDC can be viewed as the next step in reproducible builds - enabling detection of subverted binaries when a compiler/build tool is involved. I think there's no shame in solving focused security problems today, even if they don't solve absolutely everything; we should celebrate the successes we can get.
> Meanwhile, the big problems are ignored and these little, tactical solutions to smaller problems keep getting lots of attention.
I don't think they're being ignored, but you should expect that projects are less likely to get funding if the require $$BIGNUM, take a long time, and have a significant risk of not turning into something people widely use. We don't all have $$BIGNUM. I think humanity can both work to fix smaller problems (with shorter-term payback) and work to make progress on bigger & harder problems.
"We all agree that reproducible builds and DDC do not solve all problems. No one said they did."
This is the crux of my complaint. I know we plus some people here believe that. I don't think the majority do in practice, though. When it's compiler risk, the Thompson paper, yours, and reproducible builds are about all anyone talks about on the majority of forums. There's also warnings about optimizations reducing security or compilers just screwing up with solution being look at assembly. There's been virtually no interest outside high-assurance and functional programming fields in building robust compilers using whatever techniques can be done or addressing common security risks.
I think that's why you can make some of your later claims about what's out there for various languages. Virtually nobody cares enough to put time or money into it. They'll happily use crap compilers that are potentially subverted or full of 0-days so long as the binary was reproducible. That's root problem on demand side. Least we got the CompSci people staying on it FOSSing key results. Look up K Framework's KCC compiler & C semantics for what a great reference might look like. Designed to look like GCC for usability, passed 99.2% of torture suite by 2012, and didn't get locked up like CompCert. Yay!
" But CompCert only supports a subset of the C99 programming language, is proprietary, and produces executable code that performs significantly worse than Freely-available gcc and clang/llvm when used at their usual optimization levels (-O2)"
It's true. The proper response would've been to do a non-formal, open-source knockoff of CompCert or similar tool in safe, simple language. Then start adding optimizations. Most FOSS people in compilers didn't try but there's some more CompSci teams working on it. QBE backend was exciting, though, as a starting point on first, C compiler in chain if re-written with medium assurance techniques:
" I guess you'll use formal proofs. How do you know that those formal methods tools aren't subverted?"
Oh I saw that coming cuz it worried me, too. Let me present to you a page listing work to knock about everything related to formal methods outside the TCB outside the specs & logics themselves:
Just follow... all.. of he and his accociates' links. They're all great. Quick summary: verified LISP 1.5 to machine code in HOL, verified SML (CakeML) same way, translation validation sort of compiles specs to assembly in HOL, they did theorem proven composed of simpler provers whose root-of-trust starts with their verified LISP/ML/assembly tooling, extraction mechanism in Coq or HOL to ML that's untrusted, HOL in HOL, a HOL/light designed to be tiny + use verified, and now a HOL to hardware compiler in works. These people are on a win streak.
I included specifically the paper that addresses your concern in addition to Myreen's page. It does for provers what my scheme would do for compilers by working from simple to complex with each step proving the next + verified implementations. They then do extra step of converting each layer to preceding layers to verify with those. It's remarkably thorough with you basically trusting simple specifications and implementations at each step + a tiny, first step. Were you as impressed reading it as I was that they both tackled the prover problem and were that thorough?
Note: It's for a FOL. K framework, some certified compilers, hardware verification, an Alloy-to-FOL scheme, Praxis's method, and Design-by-Contract are among stuff I found done in FOL. Already useful. Plus, I found a HOL-to-FOL converter proven in HOL. Just do that or the HOL/Light stuff in FOL... if possible... then we're from specs down to machine code with verified extraction & tiny TCB in Jared's link.
"And there are subverted binaries attacks - quite a number of them - which can be countered by reproducible builds."
This one I'm more open to consideration. The question being, if you have to build it to check in the first place, why aren't we countering the binary risk by solving source distribution and authentication process? Users vetted it was the right source & makefile then produced their own binary from it using local tools that they hopefully checked. No longer a binary subversion risk. It was the requirement for A1-class systems. The source authentication aspect is very easy to implement: (a) have central location with source, (b) replicate it to others where someone at least looks at it, (c) matching hashes for the source files, and (d) signed archives of it available at multiple locations for comparison. I also like to point out that one needs something like this anyway for source-level security, signing/checking distributed items is already a thing, and people constantly develop more SCM bullshit for fun. Seemed low friction.
So, at that point, you are working only with stuff that wasn't MITM'd. Getting backdoored at that point means the problem was in the source supplied or an existing binary (eg compiler) on your system. The former requires verifiable compilers I was advocating that have source checking & distribution above. The second, since distribution was secure, usually means the user is already compromised: you find a subverted, subverting compiler on your machine you erase that system, maybe buy new hardware with firmware-level attacks these days, reinstall the shit, carefully get a new compiler (preferrably source), and reinstall it. They'll have to get source or binary of replacement compiler from trusted place, vet it came from there, build if source, apply an act of faith to believe it's not malicious, and then run it.
It's just not buying a lot vs secure distribution with a diverse checking of source and testing of binary approach. Diverse, source checking against hashes and public keys is also super easy with prerequisite tools on most Linux distros I've seen. From there, the source or local tool might also be malicious. From there, you can do a C interpreter with checks, compile it with Softbound+CETS followed by strong sandboxing, whatever you want to do if avoiding building a whole, certifying compiler. Thing is, all these still have a compiler in the root-of-trust whose code might be incorrect or malicious even if binary output matches. I'm wanting to see a Hacker News post with 200-500 votes on countering that with an open-source project. Other than QBE and TCC which were a nice start, got my props, and I found here. :)
Note: I have other middle ground ideas, too. Maybe also rewriting an optimizing compiler like GCC, Clang, or SGI's OSS one to just use a subset of C a simple interpreter or C compiler can take. One user can verify by eye or hand or code themselves. Bug-for-bug matching to get that initial bootstrap. Maybe even pay AbsInt to use CompCert one time on initial bootstrap with simple, OSS knockoff allowed just for verification of that. There's precedent with TrustInSoft and PolarSSL. Still gonna need to rewrite them overtime for easier verification of correctness, effect of optimizations on security features, and covert/side channel production. Inevitable. Might as well start while top compilers are only (absurdly large number) lines of code instead of (that number + similarly large number) later.
>> " I guess you'll use formal proofs. How do you know that those formal methods tools aren't subverted?"
> Oh I saw that coming cuz it worried me, too. Let me present to you a page listing work to knock about everything related to formal methods outside the TCB outside the specs & logics themselves:
> https://www.cl.cam.ac.uk/~mom22/
> https://www.cs.utexas.edu/users/jared/milawa/Web/
> Just follow... all.. of he and his accociates' links. They're all great. Quick summary: verified LISP 1.5 to machine code in HOL, verified SML (CakeML) same way, translation validation sort of compiles specs to assembly in HOL, they did theorem proven composed of simpler provers whose root-of-trust starts with their verified LISP/ML/assembly tooling, extraction mechanism in Coq or HOL to ML that's untrusted, HOL in HOL, a HOL/light designed to be tiny + use verified, and now a HOL to hardware compiler in works. These people are on a win streak.
WOW. Cool stuff! Hadn't seen that before, thanks for the links.
>> "And there are subverted binaries attacks - quite a number of them - which can be countered by reproducible builds."
> This one I'm more open to consideration. The question being, if you have to build it to check in the first place, why aren't we countering the binary risk by solving source distribution and authentication process?
The problem is that most people don't want to re-build all the binaries. If you can reproduce them, you can rebuild and check.
But all these problems are quite doable. Nobody claims that gcc is small, yet I managed to get that working. Compiler makers can follow a few guidelines to make it much easier, see: http://www.dwheeler.com/trusting-trust/dissertation/html/whe... Check out the graph at the Debian Reproducible Builds project at https://wiki.debian.org/ReproducibleBuilds - yes, some packages don't reproduce bit-for-bit, but they've made a tremendous amount of progress and managed it for most packages.
You can see some related information at this page: http://www.dwheeler.com/trusting-trust/ including a video of me discussing the paper.