On Safety Critical Software (alexgaynor.net)
71 points by di on Nov 8, 2019 | 24 comments



Bait and switch. The author starts with one sentence on safety-critical systems, then transitions entirely to discussing the challenges of insecure, non-safety-critical software such as mobile phone OSs, email services, and web browsers used by important people (e.g. politicians and policy makers). Yes, safety and security are intertwined; no, they're not the same, and it just confuses the matter to talk about functional safety-criticality in the context of secure communications. People who need secure communication channels often do their research or have teams to recommend or build systems for them. I don't think it's necessary for Google Chrome to tell the general public what tradeoffs it is making between usability and security (nice to have somewhere in the documentation? sure. but not necessary).


Your "bait and switch" is the author's whole point: that we intuitively understand the safety criticality of embedded infrastructure components, and have expectations about how they're designed, implemented, and maintained, but don't recognize when those expectations are implicitly imputed to commodity software that happens to serve those roles for some users.

If every once in a while a nuclear power plant wound up using an Internet-connected Mac Mini for its control systems, we'd immediately see the problem. But we don't see that problem when browsers and email systems get embedded deep into our political system, thus becoming (perhaps unintentionally) critical infrastructure.


> People who need secure communication channels often do their research

That only applies to the fraction of people who know they need secure communication channels, or who know that insecure communication channels are even a thing. The article references several cases where people didn't have adequate security and it caused them real harm.

To be fair though, I'm not sure Google or Mozilla are capable of communicating the security trade-offs they're making without putting a huge amount of marketing spin on it, but we should still hold them accountable.


I couldn't agree more. It's a rare sight to see a secure communication tool be completely open about the complex threat model of the modern world.

When I started working on TFC (a messaging system that uses a high-assurance architecture with hardware-enforced TCB splitting and isolation), I wanted to be completely transparent about the threat model:

https://github.com/maqp/tfc/wiki/Threat-model

I knew not everyone would read it but not including it could do more damage to the user. I could've just reasoned "well, it's the most secure alternative out there, so why bother", but I assumed the system would be surpassed at some point, so I wanted people to know when that would happen.

Perhaps, in a way, I realized that in order to distinguish the project from others, and to help people see the benefits, I would have to teach people about the threat model.

In a way, even completely unencrypted apps like Palringo could be safe if they communicated the threat model 100% transparently, i.e. if they taught users about the different attacks and, at every step, told you that the app does not protect against this or that attacker. One problem, of course, is that people don't really use an app for its security features but to ease their lives. It's not that they don't care about security; they just don't evaluate it proactively, but switch to something else reactively. So I no longer think a transparent threat model is enough: the application should do what it can to protect its users.

A reasonable limitation isn't "group chats can't have end-to-end encryption" or "a multi-device app can't have end-to-end encryption", as Telegram shills/fanboys claim; that's just a lack of good design.

A reasonable limitation is "a networked TCB can use E2EE for everything, but can't protect against remote key exfiltration with zero-days".

One responsible thing to do would be to include recommendations for alternative products when security-convenience trade-offs can't be solved with intelligent design. Something like: "If you're concerned about the repeated code delivery problem our Protonmail web client suffers from, please consider using our native mobile apps. If, OTOH, you're concerned about endpoint security, look into traditional PGP with an airgapped computer; see e.g. what Hak5 did with QR-code-based ciphertext transfer."

This alone would place Protonmail among the best applications, but instead they choose to lie by omission, implying that the JS web client has security equivalent to their native clients. The difference in threat model between the two is large enough to warrant some kind of notification, but unfortunately they not only haven't addressed the issue, they refused to address it when it was pointed out to them. IMO we should consider such companies selfish and greedy.


The point is, people's lives are often in the hands of fragile software that doesn't meet the standard of a safety-critical system.


The article mostly focuses on software security. "Safety critical" software standards generally focus more on stability than security. Security is part of it. However, "safety critical" standards are overkill if you just want to improve security.

Regardless, I doubt the most popular consumer-focused software will ever meet the same "safety critical" standards as medical or avionics software. I work on safety-critical software, and the bureaucracy that comes with it slows innovation to a snail's pace. As long as the average user is willing to give up a little stability for cutting-edge features, developers who prioritize "safety critical" standards simply won't be able to compete.


The article is about security, not safety-critical software. A couple of articles I wrote on safety-critical software:

Safe Systems from Unreliable Parts https://www.digitalmars.com/articles/b39.html

Designing Safe Software Systems Part 2 https://www.digitalmars.com/articles/b40.html


This is an article written by someone who’s never actually worked on a safety critical system. Such systems have an exhaustive threat model with likelihood estimates developed and verified well before a single line of code is written.


As someone who has (perhaps like the author) done security assessment work for safety critical systems, I sure would like to hear more about these exhaustive threat models with likelihood estimates, because they sure didn't seem to have much to do with preventing me from popping a shell.


Security is still dismal in typical industrial applications that require safety.

Typically, safety field buses handle the threat of sabotage with "the field bus is inside the factory, so nobody has access". Of course, people hook the production net up to some other network, so that managers can view process information from their office.

You can easily impersonate network participants, just by sniffing the unencrypted telegrams and generating a new telegram with the correct sender id, consecutive number and CRC. Suddenly your emergency stop doesn’t work anymore.
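
To make that concrete, here is a minimal sketch of what such a spoofed telegram could look like. The frame layout (sender id, consecutive number, payload, CRC-32) is hypothetical, not any specific safety field bus; the point is simply that with no encryption or MAC, nothing in the frame requires a secret.

    import struct
    import zlib

    def forge_telegram(sender_id: int, last_counter: int, payload: bytes) -> bytes:
        """Build a spoofed frame for a hypothetical unauthenticated field bus.

        Illustrative layout: 2-byte sender id, 2-byte consecutive number,
        payload, 4-byte CRC-32. Every field can be reproduced by anyone
        who can sniff the bus.
        """
        header = struct.pack(">HH", sender_id, (last_counter + 1) & 0xFFFF)
        body = header + payload
        crc = zlib.crc32(body) & 0xFFFFFFFF
        return body + struct.pack(">I", crc)

    # Example: impersonate node 0x0042 after observing its counter at 1234.
    # The payload bytes are made up for the example.
    frame = forge_telegram(0x0042, 1234, b"\x00\x01")
    print(frame.hex())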

Security gets more important, though. Customers demand certifications and pen tests. So manufacturers "harden" their systems. This often means not much more than updating the open source software they use to reasonably new versions and getting rid of MD5 or SHA1 password hashing.
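
To give a sense of how thin that "hardening" can be, here's a rough before/after sketch of the password-hashing part. It uses only the Python standard library; the scrypt parameters are illustrative, not a vetted recommendation.

    import hashlib, hmac, os

    def hash_password_legacy(password: str) -> str:
        # The kind of thing being ripped out: fast, unsalted MD5.
        return hashlib.md5(password.encode()).hexdigest()

    def hash_password(password: str) -> tuple[bytes, bytes]:
        # Salted, memory-hard scrypt from the standard library.
        salt = os.urandom(16)
        digest = hashlib.scrypt(password.encode(), salt=salt,
                                n=2**14, r=8, p=1)
        return salt, digest

    def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
        candidate = hashlib.scrypt(password.encode(), salt=salt,
                                   n=2**14, r=8, p=1)
        return hmac.compare_digest(candidate, digest)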

It's a total shitshow, and a very juicy target for pentesters and other security professionals. I've often thought about entering that field myself, but at least with my current employer, I have zero opportunity to do so.


I agree with you, but I think it's useful to walk through the traditional model to understand why so many people think this, and why it totally fails in software in practice. In traditional safety engineering, a wide-ranging fault tree analysis (FTA) or other analysis model details every potential fault in a system and provides estimated probabilities for the types of faults that occur over the total system lifespan. The system isn't considered safe until sufficient mitigations are put in place to drop the whole system below the required threshold for a given criticality of incident over its lifespan. This work is legitimately awesome and has enabled an amazing amount of reliability engineering that keeps us safe in so many weird ways each day, as we do simple things like walk through an automatic door, drive down a freeway, or enter any large structure.
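
For a toy illustration of that style of analysis, here is a sketch that evaluates a tiny, made-up fault tree against a target failure probability. The event names and numbers are invented for the example, not taken from any real standard.

    from math import prod

    def and_gate(*probs: float) -> float:
        # Output fails only if all inputs fail (independent events).
        return prod(probs)

    def or_gate(*probs: float) -> float:
        # Output fails if any input fails (independent events).
        return 1 - prod(1 - p for p in probs)

    # Hypothetical basic-event probabilities per system lifetime.
    sensor_fault = 1e-4
    actuator_fault = 5e-5
    power_fault = 1e-3
    backup_power_fault = 1e-2

    # Power loss only matters if the backup also fails (AND); the top
    # event occurs if any branch fails (OR).
    top_event = or_gate(sensor_fault,
                        actuator_fault,
                        and_gate(power_fault, backup_power_fault))

    TARGET = 1e-3  # acceptable probability for this criticality class (illustrative)
    print(f"top event: {top_event:.2e}, acceptable: {top_event <= TARGET}")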

Unfortunately, none of these models assumes the presence of an attacker or malicious entity. So they painstakingly provide a model for making sure your bridge doesn't fall down on its own, but are generally pretty useless for ensuring someone can't blow it up. This is considered fine in a lot of safety-critical engineering work, which is closer to reliability engineering than security engineering.

Malicious behavior in the "digital realm" (I hate this phrase but here we are) has such a low barrier to entry that every single one of these models falls apart there, and a threat model which doesn't assume potentially malicious actors and behavior is something we've come to realize is basically negligent. The sheer explosion in complexity and possible states in the FTA when analyzing software security with actual adversaries makes it... not particularly useful. In addition, the sharply binary outcomes in so much of software design (if you change the bit, you just outright win) make it really hard to apply anything resembling traditional engineering controls or mitigations. With a bridge you can beef up some concrete and the numbers that affect the outcome change; in software you can't really beef up security in any real sense by removing some, but not all, of the software weaknesses.

We could decide that none of these things work well together, but I actually think the futility of trying to construct an FTA model for software leads to some very supportable and radical conclusions about how to actually build secure systems. Many of them involve distrusting most software pieces entirely (or trusting them only carefully and selectively, for specific tasks) and using high-level FTAs to understand how to compose larger systems that maintain some key properties or state without ever trusting, say, the entire codebase of an operating system kernel.


What kind of safety critical system was it?


You popped a shell, therefore it wasn't a true safety critical system.


Although we cannot dismiss the GP's experience, as a former developer and verifier of safety-critical systems, my experience (from the 2000s) agrees with the parent post.

Safety-critical systems (at the time: DO-178B design assurance level A in airborne systems) were about custom processor boards based on mature technologies (PowerPC was popular; caches often disabled for maximally predictable performance), quadruplex redundancy, software with 100% MC/DC test coverage, verification of traceability between source code and object code, zero runtimes, no dynamic memory allocation, extensive time-budgeting analysis for control loops, worst-case stack utilization analyses, hardware watchdogs, extensive built-in test, and multiple independence objectives (the verification team had to be independent of the implementation team, etc.), as well as formal methods like data and information flow analysis.

These systems are virtually nothing like conventional "information technology" systems.


This is approximately what people used to say about EAL4+ certified products, until everyone realized you can just look at the list of EAL4+ products and the CVE database and compare.

Look, I don't doubt that there are avionics systems that are secure by design (largely because their inputs and their functionality are extremely constrained). But that's a question-begging definition of "safety critical", because --- my expertise isn't in avionics but I've done a fair bit of utilities and medical work, for example --- those "safety critical" systems tend to be embedded in larger distributed compute systems that are riddled with vulnerabilities, with an ultimate systemic security result of "some doofus on the Internet can blow up a transformer".

So if the argument is "you can in fact build safety-critical systems by massively expanding the cost in money and time to build drastically simplified and less useful components", sure, you win. But if your argument is "the safety-critical industries all know how to field software-based infrastructure that is safe from attack", then, based simply on personal experience, I'm going to call "shenanigans", because no, they cannot.


Safety-critical systems and secure systems are completely orthogonal. Although the article makes a good point, I don't think it was very sensible to use the expression "safety-critical software", for this reason.

In a safety-critical system (which might include software), you are trying to protect yourself against Mother Nature, which is tricky but doable, because we can predict it. Generally, the security assessment is minimal because the system is assumed to be "behind the firewall": in a secure cabinet in a plane, in an access-controlled building or factory, etc.

Of course, all hell breaks loose the day someone decides to connect that system to the Internet, where thousands of smart humans with bad intentions try to break it.

I think the author would have served his point better if he had simply talked about secure software, not safety-critical software.


> Safety-critical systems and secure systems are completely orthogonal

This strikes me as the most true statement in this conversation. I feel like I need this as a bumper sticker. I think a significant issue in this space is figuring out how to get the security people to put as much skin in the game as the safety-critical people have. The security people often get to continue paying their mortgage as long as they say "no". And as a physician leading a development team, I really struggle with this. If the budgets need to go up to solve the compliance problems, that's fine, but the compliance people seem to always have one more tissue-thin layer of requirements to add. It's like trying to drill your way out of a growing onion.


Compliance people and software security people aren't the same people. Software security people aren't generally in the business of saying "no"; they start with the requirements of the system, established elsewhere, and try to ensure that the implementation of those requirements doesn't cough up calc.exe.


Saying "no" because of compliance requirements is imho pretty similar to saying "no" because a software bug was revealed. Both don't help much in building secure systems.

At least with certain compliance requirements you can focus on avoiding specific bad practices or bug classes, whereas with bug hunting you can usually never be sure you have found all the relevant bugs. Ideally, bug hunting acts as validation of compliance requirements: it checks that they are fulfilling their overall goal, and improves them if not.


Poor control of nouns on my part, fair enough. I'm happy to hire SWEs with an interest in secure systems. But how do I move the compliance piece away from static pre-defined standards and toward something that supports a devops development paradigm? What's the mutually agreed trust-but-verify layer? Testing? And how do you get that "MVP" out the door in that case?


I absolutely concede that last point: we (the software engineering profession) do not know how to make software that is robust in extremis at prices considered reasonable, and we do not know how to make some types of software robust at all.


Anybody who said that about EAL4+ systems mischaracterized the standard. At that level, the standard only specifies resistance to casual and inadvertent attack, which is consistent with the observed outcomes. It is only at EAL5 that the standard specifies resistance against moderately skilled attackers.

Note that this is not disagreement with your later point. Most safety critical industries deploy software that is recklessly inadequate with respect to security and have no idea how to achieve the required level of security.


It’s not “safety critical” unless there’s a chance of someone dying as a direct result of the system’s failure. Stop trying to mix two totally different ideas together.


Injury or environmental damage also counts.

Mnemonic: Safety is concerned with whether the system can harm its environment. Security is concerned with whether the environment can harm the system.



