reproducible builds never made sense to me. if you trust the person giving you the hash, just get the binary from them. you don't need to reproduce the build at all.
if you trust that they're giving you the correct hash, but not the correct binary, then you're not thinking clearly.
One thing a culture of reproducible builds (and thus stable hashes) however does provide is a lack of excuse as to why the build isn't reproducible. Almost nobody will build from source - but when there's bugs to be squashed and weird behavior, some people, sometimes, will. If hashes are the norm, then it's a little harder for attackers to pull of this kind of thing not because you trust their hash rather than their blob, but rather because they need to publish both at once - thus broadening the discovery window for shenanigans.
To put it another way: if you don't have reproducible builds and you're trying to replicate an upstream artifact then it's very hard to tell what the cause is. It might just be some ephemeral state, a race, or some machine aspect that caused a few fairly meaningless differences. But as soon as you have a reproducible build, then failure to reproduce instantly marks upstream as being suspect.
It's also useful when you're trying to do things like tweak a build - you can ensure you're importing upstream correctly by checking the hash with what you're making, and then even if your downstream version isn't a binary dependent on upstream (e.g. optimized differently, or with some extra plugins somewhere), you can be sure that changes between what you're building and upstream are intentional and not spurious.
It's clearly not a silver bullet, sure. But it's not entirely useless either, and it probably could help as part of a larger ecosystem shift; especially if conventional tooling created and published these by default such that bad actors trying to hide build processes actually stick out more glaringly.
Actually, this xz-utils was a sort of reproducible build issue. But the twist it is wasn't the binary not built reproducibly. It was the tar ball. The natural assumption is it just reflected the public git repository. It didn't.
Debian's response is looking to be mandating the source must come from the git repository, not a tar ball. And it will be done using a script. And the script must produce the same output every time, ie be reproducible. Currently Debian's idea of "reproducible" means it reflects the source Debian distributes. This change means it will reproducibly reflect the upstream sources. That doesn't mean it will be the same - just that it's derived in a reproducible way.
As for trusting the person who gave you the hash: that's not what it hinges on. It hinges on the majority installing a binary from a distro. That binary can be produced reproducibly, by an automated build system. Even now, Debian's binaries aren't created just once by Debian. There are Debian builders all around the planet creating the binary, and verifying it matches what's being distributed.
Thus the odds of the binary you are running not being derived from the source are vanishingly low.
So if something is hidden in the source, a reproducible build will give you the confidence to believe that the source is fully vetted and clean.
I see what you’re saying, but I don’t buy that reproducible builds actually solve anything, especially long term. As this whole xz thing has shown us, lots of things fly under the radar if the circumstances are right. This kind of thing will absolutely happen again. In the future it may not even require someone usurping an existing library, it could be a useful library created entirely by the hacker for the express purpose of infiltration a decade later.
Reproducible builds are a placebo. You must still assume there are no bad actors anywhere in the supply chain, or that none are capable of hiding anything that can be reproducibly built, and we can no longer afford to make that assumption.
A reproducible vulnerability is still a vulnerability.
You're focusing on the wrong issue. Just because reproducible builds don't solve all avenues of attack doesn't mean they're worthless. No, a reproducible build does not give you any confidence about the quality of the source. It gives you confidence that you're actually looking at the correct source when your build hash matches the published hash, nothing more.
A window that can be smashed in is still a vulnerability, so there is no value in people locking their front doors.
> I see what you’re saying, but I don’t buy that reproducible builds actually solve anything,
You're right. They don't solve anything, if you restrict "anything" to mean stops exploits entering the system. What reproducible builds do is increase visibility.
They do that by ensuring everybody gets the same build, and by ensuring everybody has the source used to do the build, and by providing a reliable link in the audit trail that leads to when the vulnerability was created and who did it. They make it possible to automatically verify all that happened without anybody having to lift a finger, and it's done in a way that can't be subverted as Jia Tan did to the xz utils source.
Ensuring everyone has the same build means you can't just target a few people, everyone is going to get your code. That in turn means there will be many eyes like Andres looking at your work. Once one of them notice something, one of them is likely to trace it back to the identity that added it. Thus all these things combine to have one overall effect - they ensure the problem and it's cause are visible to many eyes. Those eyes can cooperate and combine their efforts to track down the issue, because they are all are absolutely guaranteed they are looking at the same thing.
If you don't think increasing visibility is a powerful technique for addressing problems in software then you have a lot to learn about software engineering. There are software engineers out their who have devoted their entire professional lives to increasing visibility. Without them creating things that merely report and preserve information like telemetry and logging systems software, would be nowhere as reliable as it is now. Nonetheless, if the point you are making is when we shine a little sunlight into a dark corner, the sunlight itself does nothing to whatever we find there, you're right. Fixing whatever the sunlight / log / telemetry revealed is a separate problem.
You aren't correct in thinking visibility doesn't reduce these sorts of attacks. By making the attack visible to more people, it increases the chance it will be notice on any given day, thus reducing it's lifetime and making it less useful. Worse, if the vulnerability was maliciously added the best outcome for the identity is what happened here - the identity was burned, all its contributions where also burned and so a few years of work went down the drain. At worst someone is going to end up in jail, or dead if they live in authoritarian state. The effect is not too different from a adding surveillance cameras in self-checkouts. Their mere presence makes honesty look like a much better policy.
Reproducible builds let you combine trust from multiple verifyers. If verifyers A, B and C verify that the build produces the stated hash then you can trust the binary if any of A, B or C is trustworthy.
Or in other words, the point is not for everyone to verify the build produces the expected binary since that would indeed make the published binaries pointless. Instead, most people trust the published hash because it can be independently verified and anyone can call out the publisher of the binary if it doesn't match.
It's easier to hide nastiness in a binary than it is in source. And indeed this XZ hack too relied on multiple obfuscated binary blobs. The hash helps because it makes it a little harder to hide things like this. It's not a silver bullet, but it would have in this specific instance made it harder to hide that malicious m4/build-to-host.m4 in the tarball - after all, had the attacker done that despite publishing a hash they would have needed to use the hash including the modified build script, but then anybody building from the git repo would have had a different hash, and that's a risk for detection.
Reproducible builds and hashes thereof aid in transparency and detecting when that transparency breaks down. Of course, it doesn't mean hackers can't hide malicious code in plain sight, but at least it makes it slightly harder to hide between the cracks as happened here.
"slightly harder" isn't enough. That's what I'm saying that people are not accepting.
The days of 3rd party libraries simply being trusted because they're open source are slowly coming to an end. the problem is not unreproducible builds, the problem is that we implicitly trust source code found online.
Projects relying on 100s of 3rd party libraries are a major problem, and no one seems to care. They like it when their Rust build lists all the libraries that they didn't have to write as it compiles them. They like it when they can include an _extremely_ small Node library which itself relies on 4 other _extremely_ small Node libraries, which each rely on another 4 _extremely_ small Node libraries, until you have a node_modules directory with 30,000 packages in it.
We don't know how to write software any more. We know how to sew bits of things together into a loincloth, and we get mad when the loincloth makes us itch. Of course it does, you didn't LOOK at anything you are relying on. You didn't stop to think, even for a moment, that maybe writing the thing instead of trusting someone else with it was even an option.
As we continue doing this, relying on 3rd parties to write the heavy lifting code, we lose the skill to write such code ourselves. We are transferring the skills that we need into the hands of people that we can't trust, when viewed from a high level.
We need to get back to small applications, which we know because we wrote them in their entirety, and that someone else can trust because the code for the thing is 2500 lines long and has extremely few, if any, dependencies.
We need to get away from software which imports 30,000 (or even 100) third party libraries with implicit trust because it's open source.
"All bugs are shallow with enough eyes" requires that people use their eyes.
Sometimes the perfect is the enemy of the good. As the XZ saga shows, even clearly exceptionally well organized attackers don't have an easy time injecting hacks like this; and things that increase the attackers costs or risks, or reduce their reward can be useful even if they don't solve the problem entirely. Reproducible builds are useful; they don't need to be silver bullet.
compression. encryption. handshakes. datetime. rng or prng. i'm right there with ya but nontrivial tasks require nontrivial knowledge. i don't have an answer for obfuscated backdoors like here, or bad code that happens, but i do know if i tried to audit that shit, i'd walk away telling everyone i understood nothing
if you trust that they're giving you the correct hash, but not the correct binary, then you're not thinking clearly.