at my previous company, and on the current project, we have explicit audit and mirrors for every dependency. it's a ton of work, especially at first. the developers, especially the front end, sort of resent it. but for our particular niche, especially previously, we put a priority on knowing what we are running and on requiring importers to take an active, documented role in vetting what they pull in instead of just grabbing whatever.
the _entire_ ecosystem works against you if you want to do this. it's pretty incredible.
I did this same thing for the last large system I worked on. All third-party packages got audited and checked into our CM system before they could be used.
Also, I prohibited third-party HTTP/S requests, such as pulling a JS package from a CDN. I did this not only so that we could audit each asset and ensure its integrity and availability, but also so that we weren't leaking information to third parties under ordinary circumstances.
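To enforce a policy like that mechanically, a simple source scan in CI is enough to reject external asset references. Here is a minimal sketch, assuming front-end sources live under a `static/` directory; the path, file extensions, and regex are chosen for illustration, not a description of the actual setup:

```python
#!/usr/bin/env python3
# Hypothetical CI check in the spirit of the policy above: fail the build if any
# front-end source references an external http(s) resource instead of a vendored copy.
import pathlib
import re
import sys

EXTERNAL_REF = re.compile(r'(src|href)\s*=\s*["\']https?://', re.IGNORECASE)

violations = []
for path in pathlib.Path("static").rglob("*"):
    if path.is_file() and path.suffix in {".html", ".js"}:
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if EXTERNAL_REF.search(line):
                violations.append(f"{path}:{lineno}: external resource reference")

if violations:
    print("\n".join(violations), file=sys.stderr)
    sys.exit(1)  # block the build until the asset is vendored and audited
```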
(Little details: I had some naming conventions for versioning so that multiple versions of a package could be used simultaneously in a checkout. For the JS packages, it simply meant making a subdirectory named after the package name and version number; for the main application language, it was more complicated, to better support its package system. I made the main backend use thin wrapper modules for each of the third-party packages, which effectively specified which version to use (with zero runtime cost from the wrapper), since the language platform's package system didn't have any kind of manifest to do this.)
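A rough sketch of what one of those thin wrappers can look like, translated to Python for illustration; the `third_party/<name>-<version>/` layout, the package name, and the version below are invented, since the original language and mechanism weren't specified:

```python
# third_party/libfoo_wrapper.py -- hypothetical thin wrapper that pins one vendored
# version of a dependency, following the <name>-<version> directory convention above.
# Callers import this wrapper instead of the package itself, so bumping the version
# is a one-line change in a single place.
import os
import sys

_PINNED = "libfoo-1.4.2"  # the only version this wrapper exposes
_VENDOR_DIR = os.path.join(os.path.dirname(__file__), _PINNED)

if _VENDOR_DIR not in sys.path:
    sys.path.insert(0, _VENDOR_DIR)  # make the pinned copy win over any other install

from libfoo import *  # noqa: F401,F403 -- re-export the pinned package's public API
```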
our jenkins can't talk to anything except github. it's isolated.
we use git for everything; our "versions" are git hashes. we maintain a manifest for the state of the repo and dependencies. this allows us to upgrade versions of dependencies as part of the normal workflow with ease (just a pull & checkout).
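a rough sketch of what that manifest-driven "pull & checkout" can look like, assuming a JSON file mapping dependency checkout paths to commit hashes; the filename and layout here are illustrative, not the actual setup:

```python
#!/usr/bin/env python3
# sync_deps.py -- hypothetical sketch of a "manifest of git hashes" workflow.
# deps.json maps dependency checkout paths to the exact commit they must be at, e.g.
#   {"vendor/libfoo": "3f2a9c1...", "vendor/libbar": "a1b2c3d..."}
import json
import subprocess

with open("deps.json") as f:
    manifest = json.load(f)

for path, commit in manifest.items():
    # fetch the mirror and hard-pin the checkout to the hash recorded in the manifest
    subprocess.run(["git", "-C", path, "fetch", "--all"], check=True)
    subprocess.run(["git", "-C", path, "checkout", "--detach", commit], check=True)
```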
hehe, agreed. But I think the chickens will come home to roost in the next few years. All those no-longer-maintained libraries still being imported; such a huge opportunity for mayhem.
And still no way of avoiding it, apart from auditing every single import. Which hardly anyone does, or does properly.
I think the test cases included with packages might give an attacker the advantage of being able to hide URLs or other strings as benign-looking dummy test data.
This would be especially easy using the "string sampling" technique the author mentions. I could choose a "Lorem ipsum"-like text as dummy data, but ensure that the first letters of the words, when combined, form the domain name of a server that will be used to download a second malicious payload.
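As a toy demo of how innocuous that can look (the fixture text and domain below are made up):

```python
# Hypothetical illustration of the first-letter trick described above: filler-looking
# test data whose word initials decode to the hostname of a second-stage payload server.
FIXTURE = "Every xylophone feels impossibly loud"  # reads like throwaway dummy text

label = "".join(word[0] for word in FIXTURE.split()).lower()
host = f"{label}.example.net"  # -> "exfil.example.net"
print(host)
```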
This is why we designed TUF and in-toto: to detect MitM attacks anywhere in the software supply chain between developers and end users, and to provide end-to-end compromise resilience.
It's strange that the paper doesn't mention us considering that we have considerable expertise in this very area.
Whilst TUF absolutely does help with some of the cases in the paper, and with supply-chain security generally, it's important to note that at least one of the scenarios in the paper may not be covered by solutions like TUF.
I'm thinking of the scenario where a bad actor takes over an existing library with the original owner's blessing, either by contributing and then taking on maintainership, or via payment to the original owner.
In that case ownership of the signing keys may transition to the new owner voluntarily, so there would be no noticeable change in terms of package signing.
That's a bit like saying: well, encrypting the iPhone isn't all that useful, because all I have to do is hit the owner with a $5 wrench.
I mean, yes, but cryptography alone cannot solve that problem. TUF and in-toto provide cryptographic solutions to cryptographic problems, which is much more than anyone else is doing today.
This reads a bit like saying "it's strange that this academic paper doesn't advertise our private company." Especially since TUF and in-toto don't seem to handle the core issue of having to use open-source libraries written by untrusted developers.
How effective is obfuscation, though? My understanding is that existing dynamic analysis tools can usually defeat anything obfuscated within O(1 day).
And what kind of automated dynamic analysis of popular dependencies do you actually expect to encounter?
And who will analyze its reports and prevent a malicious package from being uploaded to the repo?