This was already proposed and got rejected because of cache poisoning issues.
IF you're able to create a hash collision, and IF you're able to deliver your malicious version of (for example) jQuery first, it would be cached and injected into every page that uses the targeted jQuery version and makes use of this feature.
This isn't simple, but it's still an attack vector with a huge impact if successful.
Also:
* If you keep the file forever, you've poisoned this hash forever. If you clear it occasionally, there's a short time window in which you can insert your malicious version.
* If you target an old version of jQuery, you increase the chance that the browser hasn't seen this file yet or has already evicted it, which mitigates the poison-forever issue.
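To make the "deliver it first" race concrete, here is a toy model of a cross-site cache keyed only by digest, where the first verified body wins. It assumes a deliberately broken 8-bit "hash" (the first byte of SHA-256, via the hypothetical `weak_hash`) so that a colliding body can be brute-forced in a few hundred tries. With real SHA-256, the `find_collision` step is exactly what is believed to be infeasible. `HashKeyedCache`, `find_collision` and the script bodies are illustrative only, not any browser's implementation.

```python
import hashlib


def weak_hash(data: bytes) -> bytes:
    """Deliberately broken 'hash' for illustration: only the first byte (8 bits) of SHA-256."""
    return hashlib.sha256(data).digest()[:1]


class HashKeyedCache:
    """Hypothetical cross-site cache keyed only by digest; the first verified body wins."""

    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.entries: dict[bytes, bytes] = {}

    def load(self, expected_digest: bytes, fetch) -> bytes:
        cached = self.entries.get(expected_digest)
        if cached is not None:
            # Cache hit: whatever body was stored first is served to every site.
            return cached
        body = fetch()
        if self.hash_fn(body) != expected_digest:
            raise ValueError("integrity check failed")
        self.entries[expected_digest] = body
        return body


def find_collision(target: bytes, hash_fn, prefix: bytes, limit: int = 100_000) -> bytes:
    """Brute-force a body starting with `prefix` whose digest equals `target`."""
    for i in range(limit):
        candidate = prefix + str(i).encode()
        if hash_fn(candidate) == target:
            return candidate
    raise RuntimeError("no collision found within limit")


legit = b"/* jquery-1.4.2 */ window.jQuery = function () {};"
digest = weak_hash(legit)
evil = find_collision(digest, weak_hash, b"/* evil */ steal(document.cookie); //")

cache = HashKeyedCache(weak_hash)
# The attacker's page references the same digest and gets its body into the cache first.
cache.load(digest, lambda: evil)
# A legitimate site later requests that digest and is served the attacker's body.
served = cache.load(digest, lambda: legit)
print(served == evil)  # True
```

Once the colliding body is stored, every later request for that digest receives it, which is the poison-forever case above. That is also why the proposal relies on browsers being able to disable a hash algorithm once it's found to be weak.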
If this were even a remotely feasible problem with modern cryptographic hashes, DNSSEC, TLS, SSH, package management systems, most authentication systems, etc. would all be dramatically broken.
If I had this capability I wouldn't waste it on injecting javascript into web pages. I'd create forged browser upgrades and go from there.
Every website using jQuery will have the real hash, so you can poison all the mirrors you like and it won't matter.
The only way to get the wrong hash onto sites is to actually publish it on the authoritative server. That's not cache poisoning, that's a malicious official version.
If people start making hash collisions with modern quality hashes, many programs are in serious trouble. Git and Mercurial for a start assume the non-existence of hash collisions.
Given that SPDY is pretty well supported by browsers, and that HTTP/2 will have similar features... it may be better just to serve all of your own scripts/CSS on the same domain via HTTPS.
As for your suggestion... the browser can easily cache the CDN resource as it does now... without a complicated new protocol. The main difference is the integrity check. All of that said, as others have suggested, another nicety would be a local-src="" attribute pointing to a URL that could be used as a fallback if the CDN resource fails the integrity check.
Indeed, I was imagining recently that an HTTP/2 server, properly configured, should actually rewrite pages that refer to third-party assets by first retrieving those assets and then sending its locally cached copy.
This would also work pretty well for a multi-site reverse-proxy like Cloudflare, where it could hit cache for commonly-used asset URLs even if your site in particular was evicted. Hey, wait a minute...
It doesn't have to be a simple hash either. You could have the author/release engineer sign it and then use the signature instead. This somewhat mitigates simple hash collisions.
> Isn't that what SRI does? (Plus, SRI is backwards-compatible.)
No it's not. SRI means you keep relying on multiple CDNs, so you keep having cache fragmentation, CDN unavailability continues to be a problem, and uncached items still require a second connection.
I respectfully disagree. The SRI spec doesn't tell you how to implement it [1]. In fact, it allows implementing everything you described. A browser could decide to completely disregard the src= attribute and load the resource based on the hash matching a file already present in its cache (no matter what CDN it came from). The end result is equivalent: the same content is interpreted. That means no more downtime when a CDN is down, no more cache fragmentation, and so on.
On a side note: there is no need to match the file name, only the hash matters.
[1] http://www.w3.org/TR/SRI/#conformance: "Conformance requirements [...] can be implemented in any manner, so long as the end result is equivalent"
Accidental collisions in a 256-bit space are not practically possible. The chance that your CPU or RAM accidentally corrupts some data and leads you to execute the wrong JavaScript anyway, or that a meteorite falls on your head at this instant, is 2^100+ times higher. It is utterly pointless to check the file name for this reason. If you worry about SHA-256 collisions, you're worrying about the wrong things. That's why all the modern crypto used everywhere around you already assumes SHA-256 collisions are currently impractical.
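To put a rough number on accidental collisions, the standard birthday (union) bound gives the probability that any two of n distinct files share a b-bit digest, assuming the hash behaves like a uniformly random function. The value n = 2^40 (about a trillion distinct files) is an arbitrary illustrative count, not a figure from this thread.

```latex
p \;\le\; \binom{n}{2}\, 2^{-b} \;=\; \frac{n(n-1)}{2}\cdot 2^{-b} \;<\; \frac{n^{2}}{2^{\,b+1}}
% with b = 256 and n = 2^{40}:
p \;<\; \frac{2^{80}}{2^{257}} \;=\; 2^{-177}
```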
All you need is an option to migrate to stronger hashes the day attacks start weakening SHA-256, which is why SRI uses a "sha256-" prefix: so we can migrate to stronger hashes when needed.
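A minimal sketch of how that prefix enables migration, assuming a simplified reading of SRI matching: an integrity value is `<algorithm>-<base64 digest>`, a page may list several values, and only those using the strongest listed algorithm are checked. The helper names `integrity_value`, `matches` and `ALGORITHM_STRENGTH` are hypothetical, and this is not a browser's actual implementation.

```python
import base64
import hashlib

# Ordered weakest -> strongest: the three hash algorithms the SRI spec defines.
ALGORITHM_STRENGTH = ["sha256", "sha384", "sha512"]


def integrity_value(body: bytes, algorithm: str = "sha384") -> str:
    """Build an SRI-style token: '<alg>-<base64 of the raw digest>'."""
    digest = hashlib.new(algorithm, body).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches(body: bytes, integrity: str) -> bool:
    """Check body against a whitespace-separated integrity list, using only its strongest algorithm."""
    parsed = []
    for token in integrity.split():
        alg, sep, value = token.partition("-")
        if sep and alg in ALGORITHM_STRENGTH:
            parsed.append((alg, value.split("?", 1)[0]))  # drop any '?option' suffix
    if not parsed:
        return True  # no recognised algorithm: SRI treats this as "no check"
    strongest = max(ALGORITHM_STRENGTH.index(alg) for alg, _ in parsed)
    alg_name = ALGORITHM_STRENGTH[strongest]
    expected = integrity_value(body, alg_name)
    return any(f"{alg}-{value}" == expected for alg, value in parsed if alg == alg_name)


script = b"console.log('hello');"
old = integrity_value(script, "sha256")
new = integrity_value(script, "sha512")
print(matches(script, f"{old} {new}"))        # True
print(matches(script, f"{old} sha512-AAAA"))  # False: sha256 is ignored once sha512 is listed
```

Listing both values lets older and newer browsers agree on the same resource, and dropping the weak value later is just an edit to the attribute.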
* Loading common resources is fast - already loaded by other sites
* If resource isn't cached, the browser doesn't need to open a second connection to a CDN, it can grab the file from the same site
* Proliferation of multiple CDNs ceases to be an issue, as each no longer has a separate browser cache
* Evil CDNs cannot provide bad JS as integrity is checked and the site isn't relying on a CDN anyway
* If a CDN goes down, your site isn't broken, because the fallback can be to a local file
Would work something like this:
Browser checks its cache for a file with a matching name and hash, ignoring the site (this looks in a special cache for files loaded with hash lookup, so you can't check for arbitrary web resources). If there is one, it's used. If not, it loads the file from the fallback URL, checks the hash, and caches it.
Now, this would create potential for hash collision attacks. However:
* If hash algorithms are found to be weak, browsers can disable them
* Unlikely to be effective given users will usually already have a file cached
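The lookup described above can be sketched as follows, assuming a browser-wide cache keyed on (file name, SHA-256 hex digest) that is separate from the ordinary HTTP cache. `HashLookupCache`, `fake_fetch` and the `example` URLs are hypothetical stand-ins; a real browser would do the fetch asynchronously over the network.

```python
import hashlib
from typing import Callable


class HashLookupCache:
    """Hypothetical shared cache for files loaded via hash lookup.

    Keyed on (file name, SHA-256 hex digest) and shared across sites. Ordinary
    HTTP cache entries are never consulted, so pages can't probe arbitrary resources.
    """

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], bytes] = {}

    def load(self, name: str, sha256_hex: str, fallback_url: str,
             fetch: Callable[[str], bytes]) -> bytes:
        key = (name, sha256_hex)
        cached = self._entries.get(key)
        if cached is not None:
            return cached  # already loaded by some other site: no request at all
        body = fetch(fallback_url)  # e.g. a copy hosted on the page's own origin
        if hashlib.sha256(body).hexdigest() != sha256_hex:
            raise ValueError(f"hash mismatch for {fallback_url}")
        self._entries[key] = body  # only verified bodies are ever cached
        return body


jquery = b"/* pretend this is jquery.min.js */"
digest = hashlib.sha256(jquery).hexdigest()
requests = []


def fake_fetch(url: str) -> bytes:
    requests.append(url)  # record which URLs actually hit the "network"
    return jquery


cache = HashLookupCache()
cache.load("jquery.min.js", digest, "https://site-a.example/js/jquery.min.js", fake_fetch)
cache.load("jquery.min.js", digest, "https://site-b.example/static/jquery.min.js", fake_fetch)
print(requests)  # only site-a's URL: site-b was served from the shared cache
```

Because the second site's copy is never requested, each site can point the fallback at its own origin without losing the cross-site cache hit, which is the "no second connection to a CDN" benefit listed above.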