It's easier to hide nastiness in a binary than it is in source. And indeed this XZ hack too relied on multiple obfuscated binary blobs. The hash helps because it makes it a little harder to hide things like this. It's not a silver bullet, but it would have in this specific instance made it harder to hide that malicious m4/build-to-host.m4 in the tarball - after all, had the attacker done that despite publishing a hash they would have needed to use the hash including the modified build script, but then anybody building from the git repo would have had a different hash, and that's a risk for detection.
Reproducible builds and hashes thereof aid in transparency and detecting when that transparency breaks down. Of course, it doesn't mean hackers can't hide malicious code in plain sight, but at least it makes it slightly harder to hide between the cracks as happened here.
"slightly harder" isn't enough. That's what I'm saying that people are not accepting.
The days of 3rd party libraries simply being trusted because they're open source are slowly coming to an end. the problem is not unreproducible builds, the problem is that we implicitly trust source code found online.
Projects relying on 100s of 3rd party libraries are a major problem, and no one seems to care. They like it when their Rust build lists all the libraries that they didn't have to write as it compiles them. They like it when they can include an _extremely_ small Node library which itself relies on 4 other _extremely_ small Node libraries, which each rely on another 4 _extremely_ small Node libraries, until you have a node_modules directory with 30,000 packages in it.
We don't know how to write software any more. We know how to sew bits of things together into a loincloth, and we get mad when the loincloth makes us itch. Of course it does, you didn't LOOK at anything you are relying on. You didn't stop to think, even for a moment, that maybe writing the thing instead of trusting someone else with it was even an option.
As we continue doing this, relying on 3rd parties to write the heavy lifting code, we lose the skill to write such code ourselves. We are transferring the skills that we need into the hands of people that we can't trust, when viewed from a high level.
We need to get back to small applications, which we know because we wrote them in their entirety, and that someone else can trust because the code for the thing is 2500 lines long and has extremely few, if any, dependencies.
We need to get away from software which imports 30,000 (or even 100) third party libraries with implicit trust because it's open source.
"All bugs are shallow with enough eyes" requires that people use their eyes.
Sometimes the perfect is the enemy of the good. As the XZ saga shows, even clearly exceptionally well organized attackers don't have an easy time injecting hacks like this; and things that increase the attackers costs or risks, or reduce their reward can be useful even if they don't solve the problem entirely. Reproducible builds are useful; they don't need to be silver bullet.
compression. encryption. handshakes. datetime. rng or prng. i'm right there with ya but nontrivial tasks require nontrivial knowledge. i don't have an answer for obfuscated backdoors like here, or bad code that happens, but i do know if i tried to audit that shit, i'd walk away telling everyone i understood nothing
Reproducible builds and hashes thereof aid in transparency and detecting when that transparency breaks down. Of course, it doesn't mean hackers can't hide malicious code in plain sight, but at least it makes it slightly harder to hide between the cracks as happened here.