Cases like this seem to confirm the approach Let's Encrypt took of only issuing certificates with a fairly short lifetime, which more or less forces users to fully automate the handling of certificates (monitoring expiration, requesting a new cert in time, deploying the new cert, ...).
The practice of issuing certificates with a (sometimes very) long lifetime, from one year up, means such automation is not strictly required, so complex bureaucratic processes can be put in place to replace certs instead, which becomes a major issue when 'emergency' revocations are necessary. I'd argue such bureaucratic processes don't even increase 'security': in the end they rely on people performing manual operations (often with more rights granted than strictly required), whereas an automated system can be more easily vetted, tested, and locked down.
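For what it's worth, the expiry-monitoring half of that automation is tiny. Here's a minimal sketch in standard-library Python; the host name and the 30-day threshold are made up, and in practice you'd hand off to an ACME client rather than just print:

    import socket
    import ssl
    from datetime import datetime, timezone

    RENEW_BEFORE_DAYS = 30  # illustrative threshold, pick what fits your rollout time

    def days_until_expiry(host: str, port: int = 443) -> float:
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
        # 'notAfter' looks like 'Jun  1 12:00:00 2026 GMT'
        not_after = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
        not_after = not_after.replace(tzinfo=timezone.utc)
        return (not_after - datetime.now(timezone.utc)).total_seconds() / 86400

    if days_until_expiry("example.com") < RENEW_BEFORE_DAYS:
        print("renew now")  # in real life: trigger your ACME client / deploy pipeline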
Aside from the necessity of enforcing good security policy here, it's brutal to observe, from the thread's ongoing comments, the situation Actalis was stuck in. They clearly got themselves into bad/unsustainable deals with big customers where they made promises that couldn't be fulfilled in these circumstances, so their choices were to (likely) lose those customers + harm their customers' users, or to risk getting kicked out of the root program. And if they don't play their cards right it's possible they BOTH lose their customers and get booted out of the root program eventually anyway. Not a fun situation to be in, especially because in this case it sounds like they got screwed by a bug in third-party software and not specifically by bad internal processes.
I think Actalis found itself between a very hard rock and an even harder place.
I am Italian and I have worked with some public entities similar to the ones Actalis provided certificates to.
There is a private network, "SPC", of Italian public organizations, with many machine-to-machine HTTPS web services that MUST, by law, provide updates to the central government under quite strict deadlines.
On such networks, certificate pinning is very common and possibly even recommended, contrary to the Baseline Requirements and the recommendations of CAs.
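For readers unfamiliar with pinning, here's a rough sketch of what such a machine-to-machine client typically does, and why an emergency reissue breaks it; the host and fingerprint are placeholders, not real SPC values:

    import hashlib
    import socket
    import ssl

    PINNED_SHA256 = "00" * 32  # placeholder fingerprint baked into the client

    def cert_fingerprint(host: str, port: int = 443) -> str:
        # Chain validation is replaced by the pin check, which is exactly
        # what makes a short-notice certificate replacement a hard outage.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
        return hashlib.sha256(der).hexdigest()

    if cert_fingerprint("example.com") != PINNED_SHA256:
        raise SystemExit("pin mismatch: refusing to talk")  # new cert => outage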
Failing to respect such deadlines causes penalties for the local governments, and in grave cases may even constitute a crime, "public service interruption", which would trigger a trial, with more fines and possibly jail time.
Thus Actalis had to choose between:
1. follow the Baseline Requirements, which force CAs to quickly revoke certificates when a problem is discovered. Most of the certificates would then have been revoked before the public customers managed to replace them, disrupting their operations and risking penalties for the missed deadlines, and possibly trial and jail time for "public service interruption". To avoid this, Actalis would need to demonstrate in a public trial that the public customers were well informed that certificates could be revoked and re-issued at any time on very short notice, and that it did everything it could to avoid the "public service interruption", both pre-emptively (when negotiating the sale of certificates and educating the customers) and reactively (when the serial number vulnerability was discovered). Quite a hard path.
2. contact the customers, push them to quickly replace the compromised certificates, and revoke them only afterwards, thus avoiding service disruptions.
They chose 2. Unluckily, Italian public organizations are very slow, which in the end caused Actalis to miss the BR deadlines by a long shot.
Thank you! Reading between the lines it seemed clear that just revoking the certs would have caused major infrastructure problems, but what you describe sounds severe. No wonder they chose to take the hit on the BR deadlines!
Oh, this is nothing. A while back the browser vendors decided that since underscores weren't technically allowed in subdomain names, every CA who'd issued such certificates needed to revoke them all. It turned out that some of those certificates weren't terribly easy to replace. In particular, a whole bunch were in use by a health insurance enrollment system that was right in the middle of the main enrollment period and because of that could only receive changes that were absolutely essential. So the CA ended up missing the revocation deadline by a few months in order to keep it all working. The anointed enforcers of the CA rules were, of course, utterly pissed that their underscore pedantry wasn't considered important enough to risk people losing access to healthcare for, pointing out that the operators could certainly deploy a fix if there were some critical security issue, so why couldn't they do it for this?
Not a bug, but let's say a "shortcoming". EJBCA felt they'd clearly documented what this did; their users, not so much.
And the defence against this stuff is curiosity - which is internal process. If you issue lots of certificates (say, more than a dozen) and you find that the "64-bit" integers in them actually only vary in 63 bits you ought to be suspicious. If Actalis (or other CAs) had declared "Hi, we found out about this after two weeks when we looked at our serial numbers more closely" instead of waiting for the problem with EJBCA to get called out explicitly I'd have _way_ more sympathy.
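That curiosity check is cheap to script, too. A sketch, assuming you can dump your issued serial numbers as hex (the three values below are invented; you'd run this over hundreds of real serials for a stuck bit to mean anything):

    # For each of the 64 supposedly-random bits, check whether it ever varies
    # across your issued serials. EJBCA's old default forced the sign bit of
    # its 64-bit serials to 0, so bit 63 would show up stuck here.
    serials = ["31c79f4a8b2d5e01", "5ae20b9cc41d7f38", "0f66d1a2e3b4c5d6"]

    values = [int(s, 16) for s in serials]
    for bit in range(64):
        seen = {(v >> bit) & 1 for v in values}
        if len(seen) == 1:
            print(f"bit {bit} never varies (always {seen.pop()}) - be suspicious")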
Likewise if you're sure you are implementing 3.2.2.4.6 Agreed‐Upon Change to Website, curiosity would suggest it's worth taking a look at some of those agreed upon changes and how they were verified, and how some failed. No failures at all? Well that's weird, let's look more closely - oh, we're counting 404 errors as success. Oops. (Yes a real public CA did this and in their case they did find it before someone else reported it).
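For comparison, here's a sketch of what the non-broken version of that 3.2.2.4.6 check looks like, with the 404-as-success failure mode made explicit. The file name and token are invented; the /.well-known/pki-validation/ path is the BR convention:

    from urllib.error import HTTPError, URLError
    from urllib.request import urlopen

    def validate_domain(domain: str, token: str) -> bool:
        url = f"http://{domain}/.well-known/pki-validation/challenge.txt"
        try:
            with urlopen(url, timeout=10) as resp:
                if resp.status != 200:   # anything but 200 is a FAILED validation,
                    return False         # not a "request completed" success
                body = resp.read().decode("utf-8", errors="replace")
        except (HTTPError, URLError):    # a 404 lands here - also a failure
            return False
        return token in body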
I kind of feel for Actalis. It seems like they were caught between a rock and a hard place, seeing as their customers were not/could not respond as quickly as hoped, and revoking the certs could negatively impact end-users by preventing them from, for example, obtaining prescriptions. The language is dense for me, but it also sounded like there was a reasonable explanation in the BR for the exception (paraphrasing: 'negatively impacting a large swath of internet users'), though it didn't seem to assuage Ryan's concern. I hope the Actalis guy didn't lose his job.
From the thread it becomes painfully clear how poorly Actalis is set up to act as a CA. Instead, it seems they chose to break the BRs by default. Almost 5 months to reissue a little over 250k certificates is not what you'd expect from a CA that a major browser should trust.
The argument that there might be some end-users unable to renew their prescription seems mostly used to gain sympathy. Also this will most probably not be “a large swath of internet users”.
I do hope Actalis steps up their game and regains some trust. Or they may become the next Symantec.
* The baseline requirement is 64 bits of entropy and Actalis were providing 63 bits, i.e. only short by a single bit. It would seem unusual if the baseline requirements were a mere one bit of entropy from insecurity.
* The requirement for 64 bits of entropy is to reduce the risk of hash collision attacks [1] - which have only ever been demonstrated for MD5 and SHA-1, neither of which are used to sign certificates any more.
If web security was a tightrope, this would be like hearing that the second safety net, underneath the first believed-to-be-robust safety net, was found to be strong enough to catch a 900 lbs person, when it was specified for 1000 lbs.
> Mozilla recognizes that in some exceptional circumstances, revoking misissued certificates within the prescribed deadline may cause significant harm, such as when the certificate is used in critical infrastructure and cannot be safely replaced prior to the revocation deadline, or when the volume of revocations in a short period of time would result in a large cumulative impact to the web. However, Mozilla does not grant exceptions to the BR revocation requirements. It is our position that your CA is ultimately responsible for deciding if the harm caused by following the requirements of BR section 4.9.1 outweighs the risks that are passed on to individuals who rely on the web PKI by choosing not to meet this requirement.
That statement "may cause significant harm" is what I expect weighed on the CA's mind. When revoking a certificate could kill someone, and there is still a high barrier to exploit (i.e. no "proven method that exposes the Subscriber's Private Key to compromise") it should be up to the CA to clearly explain the situation, and up to Ryan to accept the explanation given. ("It is our position that your CA is ultimately responsible for deciding if the harm [...] outweighs the risks")
Clearly Actalis was not in a position to articulate the harm, which is their fault.
That said, I'm fully aware of the compliance hoops that must be jumped through when providing updates to medical devices. If you have to distribute firmware to medical devices, 4 months can be a remarkably fast turnaround. But in that case, CA-issued certificates are probably inferior to self-signed certificates (on an organisational level) that are not subject to external revocation.
I agree it's on them to get it right. It just seems extenuating circumstances at least played a role. I think I quoted the wrong part of the BR - it was adjacent to the part about a large swath of users - but it was more along the lines of negatively impacting safety or security or some such.
From my limited POV, this seems like collateral damage from overblowing the trivial bug they used to beat the DarkMatter CA over the head.
I didn’t find the arguments of severity convincing then either. But the gist was that they need to be completely consistent and rigorous so it does make sense even if it is a massive inconvenience for people. Again.
> it also sounded like there was a reasonable explanation in the BR for the exception
I believe this is due to Actalis misunderstanding the exception. Mozilla provides an exception for exceptional circumstances which Actalis's obviously were not.
On one hand, this incident meant a massive amount of work by probably thousands of people to replace all the revoked certificates. Certificates which are perfectly good for communication and do not pose any significant security risk.
On the other hand, allowing a CA to violate the BR's without pain will just encourage others to do so.
> Certificates which are perfectly good for communication and do not pose any significant security risk.
Is it so? I remember that in 2008 someone was able to create a rogue CA certificate because of the predictability of serial numbers[1]. It was a different time - we still used MD5 - but are you sure the limited entropy used to generate serial numbers does not pose any security risk?
The difference here is one bit. The BRs say you must use at least 64 bits of entropy; EJBCA out of the box used 63 bits. A bad guy might need to spend, say, $40 trillion to make a bogus cert instead of $80 trillion. No bad guys have $40 trillion, so it's irrelevant. And that's only if we were still using SHA-1 (which is broken, so the entropy would be all that kept you safe against collision attacks); in fact Actalis and other CAs only issue with SHA-256, which isn't broken.
This is a Brown M&M ‡. It doesn't actually matter in terms of security: 63 bits, 65 bits, it's never going to make a real difference. But we wrote 64 bits in those rules, and if we can't trust you to obey that rule, who says you got the really important parts right?
It's not that Actalis has not tried to obey, or purposefully withheld information or tried to mislead the community. The disagreement is on how strict the interpretation of the BR should be.
Would Van Halen abort a concert over a single brown M&M in a bowl of 1000? Probably not: even though it's a violation of the contract, the point got across - it still means the organisers had read through the full contract and tried to comply.
Reading through the discussion, I wish I could be as strict as Ryan Sleevi is in demanding that browsers fix their incompatibilities with the web's BR (ehm.. standards). Chrome, there's this bug where this element is placed one pixel off from where it should be (it's by no means critical and doesn't impact users of any website in any meaningful way, but according to the CSS Box Model Module Level 3 spec, paragraph such-and-such, it's wrong). How about you fix it by next week, or I'll uninstall you from all systems in the world.
In the 2008 attack, the CA was using sequential serial numbers. They weren't randomized at all.
The attackers had to do a large amount of computation to produce colliding certificates even when they knew exactly what the rest of the content of the certificate would be.
In the aftermath of that, we got MD5 deprecation and also a requirement that certificates include randomness that wouldn't be predictable to the subscriber, so that the subscriber doesn't know what the collision target is.
It's a little complicated to foresee the exact size of the benefit from this in different threat models, but in the model where the attacker has the capability to produce two related texts with the same SHA-256 hash, the current precaution means that the attacker has only a 1/2⁶⁴ probability that using that capability in conjunction with a certificate issuance will yield a matching certificate.
In 2008, certificate issuance usually cost money for the subscriber, where now it needn't, but there are still issuance rate limits and there's now Certificate Transparency, so all of the attempts will become public.
A bigger risk is presumably an n-way collision capability where an attacker can produce, not just 2, but n related plaintexts that all have the same SHA-256. In that case the attacker has an (n-1)/2⁶⁴ probability per certificate issuance that the issued certificate has the desired hash, assuming nothing unexpected or uncontrollable happens during the certificate issuance. (Another tricky problem, for example, is the time of issuance, which can be specified accurate to the second by the CA and appears in the certificate.)
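Putting illustrative numbers on that: even granting the attacker an n-way collision capability and a huge number of issuance attempts, the serial entropy keeps the success probability microscopic. A quick sketch (parameters invented for illustration):

    # P(at least one of `attempts` issuances matches a precomputed hash),
    # given an n-way collision capability and `bits` of random serial entropy.
    def success_probability(n: int, attempts: int, bits: int = 64) -> float:
        per_try = (n - 1) / 2**bits
        return 1 - (1 - per_try) ** attempts

    # A 1000-way collision and a million certificate issuances:
    print(success_probability(n=1000, attempts=1_000_000))  # ~5.4e-11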
Especially when nobody has demonstrated a SHA-256 collision or research that's close to producing one, and all attempts would be public in Certificate Transparency, and all CA issuance is rate-limited in some way, it doesn't seem like even 1/2⁶³ is that bad. Just five or ten bits of entropy in the certificate would probably have been enough to stop the 2008 researchers' attack from succeeding at all.
The attack would also have to have been carried out while the existing certificates were being issued (if there was no successful attack during certificate issuance, there won't be an attack after-the-fact).
I like tialaramex's brown M&M analogy: browser vendors are concerned with ensuring that CAs take rules and policies very seriously, even if there's no conceivable way that a particular problem could be related to an attack or vulnerability.
Could somebody explain to me why Mozilla (or whatever organisation is using Bugzilla here) is in a position to dictate policy here?
If the majority of outstanding certificates were held by the Italian government, major banks, and hospitals, what is the CA supposed to do if they're simply told "No, you won't revoke the certificates until we're ready; we don't think the risk is worth it"? Further, reading a comment below on the usage of these certificates by the Italian state for mandatory reporting, it sounds like revoking could even be considered a criminal offense...
This very much reads like a private entity mandating that tens if not hundreds of thousands of Euros are spent by the Italian state over a very minor security risk.
Somebody has to decide who is trusted. Mozilla (a not-for-profit) thinks it suits their mission best if they're the ones deciding, at least when it comes to the defaults of their browser, Firefox.
If you think somebody else should decide - maybe the Government of Italy, or the Queen of England, or Donald Trump, or you personally, then here's a few questions for your new Root Trust Programme:
1. Why? At least Mozilla's rationale is related to a fact: they make Firefox, so it trusts whatever they decide. What would be the rationale for why the Pope gets to decide?
2. Are they actually doing it? This is largely a tedious responsibility. But, if you decide to slack off, every Firefox user gets screwed. So, you know, you're going to need to put those hours in. Forever. I've lost count of how many people or organisations decided they could do better and didn't last a year.
3. Where's the transparency? The main way Mozilla stands out from the other big trust store operators (Apple, Microsoft, Google, and arguably Oracle) is that they're a not-for-profit and so they operate transparently. Your contributions are welcome at m.d.s.policy https://groups.google.com/forum/#!forum/mozilla.dev.security... where we are currently discussing the minutiae of Certificate Policy documentation. If your alternative is less transparent, how is that not worse?
I'm not saying Mozilla isn't a good organisation to run this; I'm saying it seems insane to have policies that don't allow for any proportional response.
I don't know how involved you are, but to a lay observer this story makes Mozilla's policies look entirely black and white, to the benefit of nobody (except perhaps to reduce work, I suppose, which is reasonable but not really a valid justification in terms of security).
Is there no tiered approach to risks? Hell, in this situation it seems like more harm and risk will have been created by the rush to reissue certificates than by this theoretical security vulnerability.
Edit: Actually, on further reading, it seems like the issue is more that Actalis didn't correctly invoke their right to this discretionary power?
You may also enjoy https://wiki.mozilla.org/CA/Incident_Dashboard , which all the CAs responding to such incidents need to be aware of, and which shows that there is a rather large amount of proportionality, based on an appropriate degree of transparency and communication.
Well, the Italian government agreed to those terms when they bought the certificates. If they didn't like that, they could've gotten them from somewhere else, or maybe set up a CA of their own, governed according to their own policies. Sure, then their CA wouldn't be pre-loaded in browsers, but they also wouldn't have to bother with pesky details such as responding in a timely way to incidents.
CAs are held to a strict security standard. Nobody is forcing any entity to act as a CA - if you don’t want that kind of responsibility, you don’t have to be a CA. But if this stuff isn’t taken seriously, the padlock icon means absolutely nothing.
My question was why this "regulatory agency" (without statutory powers) believes it is completely acceptable to cause direct harm without exercising any discretion over the size of the risk.
What of the much greater potential harm of allowing non-compliant CAs? A CA's 'customers' are not just the people it sold certs to, but everybody on the internet who uses a browser. One way to read this incident is as a story of this inherent divided loyalty.
Browser vendors effectively are the ruling party of the Internet at the moment. And Mozilla can mostly only follow Google's lead, as Chrome is (basically) everyone's browser. It doesn't matter what you put on your server if Chrome, Firefox, and Safari refuse to accept it. Whether it's the trusted certificates list, or features of HTML and JavaScript, or decisions about what sort of web content will trigger the browser to block parts of your site, browser developers determine what the public will see.
The best part here too is that Mozilla's link on revocation basically says "we understand sometimes it's more risky to revoke according to our policy than to take a little longer, we just don't care and will utter our disappointment in you either way".
Sounds like a good insurance policy where your CA will risk their root program participation, and the managers their jobs, to keep your certificates unrevoked for months:
> we have met a lot of resistances and compliances from the enterprise customers, mainly public entities, to whom we have wrongly responded by giving more time. Four months to replace some certificates is more than enough.
> As managing director I have just decided some organization and role changes with immediate effect: I have removed the SSL infrastructure and Operation managers