Memory Issues: Episode III - Revenge Of The Buffer Overflow (2016)
In all seriousness, repeated occurrences such as these should make people consider other languages for new projects. Despite being quite a C fanboy, I have to admit that even I think manual memory management should be a thing of the past now.
I'd almost go so far as to say you need to do a security review before you write one line of code. You need to evaluate if C is a requirement or a preference, and if the security of the project can withstand being written in C (e.g. edge code that may be attacked).
The argument that C (and C++) can be written well or flawlessly is tiresome. "Perfect code" is a fallacy that has been proven, over and over again, to be not just untrue but dangerously untrue. You should assume the code quality is less than optimal as your starting point.
I am not promoting any specific alternative (Rust, Java, Go, etc.); I am just saying that C/C++ are insecure by design and should be avoided unless they're a project requirement.
To be fair, most of the vulnerabilities found in C codebases are preventable with static analysis, bounds checking, etc. If you are ok with it getting as slow as Go, you can get similar safety guarantees as well.
They and the Cambridge CHERI team have hardware modifications, too, if you want to run C apps on an FPGA CPU with inherent safety. Software-only approaches, though, carry significant performance penalties.
Specifically, Rust does runtime bounds checking when you index into an array. LLVM is theoretically capable of optimizing these checks out if they're unnecessary, but it's far from guaranteed. It would take a dependent type system to perform compile-time bounds checking, and Rust isn't that sophisticated.
However, indexing is far, far less prevalent in Rust than it is in C, due to the existence of iterators. I've asked the Servo team on multiple occasions whether the cost of bounds checking shows up in performance profiles, and it never has, likely because they just aren't doing very much indexing to begin with.
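To make that concrete, here's a minimal sketch of my own (toy function names, not Servo code) contrasting the two styles:

    // Indexing: every v[i] is bounds-checked at runtime. LLVM can
    // sometimes hoist or eliminate the check, but that's not guaranteed.
    fn sum_indexed(v: &[u64]) -> u64 {
        let mut total = 0;
        for i in 0..v.len() {
            total += v[i];
        }
        total
    }

    // Iterators: the iterator walks the slice directly, so there is
    // no per-element bounds check to pay for or to optimize away.
    fn sum_iterated(v: &[u64]) -> u64 {
        v.iter().sum()
    }

    fn main() {
        let v = vec![1, 2, 3];
        assert_eq!(sum_indexed(&v), sum_iterated(&v));
        // And when an index is wrong, it panics deterministically
        // instead of silently corrupting memory:
        // let _ = v[10]; // would panic: index out of bounds
    }

Idiomatic Rust pushes you toward the second style, which is why the checks rarely show up in profiles.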
Since Rust does not provide security libraries ("sec-libs") in its standard distribution, nor are they available in the larger ecosystem, Rust's benefits seem theoretical to me. As with other things, in theory C/C++ code could also avoid memory/security issues.
I don't see any other way to read this other than "there's no point in rewriting security code in Rust because security code in Rust hasn't been written yet". That, obviously, doesn't make sense to me.
You do not see any other way to read it because you are looking to argue over an innocuous statement which is not even false. Security code in Rust could be great, and I will be happy if and when it is available for general use.
I agree with pcwalton, that's how I read it too. The context of this thread is security vulnerabilities in OpenSSL, so it's quite reasonable to read your parent comment exactly as pcwalton did.
They do exist in the larger ecosystem, they are just very young. Also, we devs have had "don't build your own crypto" beaten into our heads so often lately that I think people are generally wary of starting new, or using new, projects that haven't already been battle hardened.
I've looked at the OpenSSL code, and I have to say that it's not well documented in many cases, and it's easy to make mistakes in its usage. The rules on when locks are needed vs. not are not always clear, and the ownership of memory is as confusing as it's always been in C.
This is where something like Rust would shine, just by cleaning up these interfaces. The Rust OpenSSL bindings, http://sfackler.github.io/rust-openssl/doc/v0.7.10/openssl/ , for example, are very sane, and it's difficult to screw up the usage.
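For a taste, here's a rough sketch of a TLS client using rust-openssl. Caveat: I'm writing against the newer SslConnector API from memory, not the v0.7.10 docs linked above, so the names may not match that release:

    use std::io::{Read, Write};
    use std::net::TcpStream;
    use openssl::ssl::{SslConnector, SslMethod};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // The builder starts from sane defaults: certificate and
        // hostname verification are on unless you opt out explicitly.
        let connector = SslConnector::builder(SslMethod::tls())?.build();
        let tcp = TcpStream::connect("example.com:443")?;
        // The hostname is a required argument to connect(), so you
        // can't forget to verify the peer against it.
        let mut tls = connector.connect("example.com", tcp)?;
        tls.write_all(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")?;
        let mut response = String::new();
        tls.read_to_string(&mut response)?;
        println!("{}", response.lines().next().unwrap_or(""));
        Ok(())
    }

Compare that with the amount of OpenSSL C boilerplate (and the number of silently skippable verification steps) needed to do the same thing.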
Do you mean the rust-openssl bindings will not require patching OpenSSL for Rust apps in case of a vulnerability? In that case Rust surely helps applications written in it.
What are "sec-libs"? From the other reply it seems "sec-libs" are the hip-mounted pouches of enchanted crypto dust so you can sprinkle it around and get "security".
> To be fair, most of the vulnerabilities found in C codebases are preventable with static analysis, bounds checking, etc.
You can work in an ad-hoc dialect of C based on particular tools that will be a bit safer than standard C, sure (though probably still not as safe as an actually memory-safe language). But at that point most of the advantages of C no longer apply: you don't have a huge library ecosystem, you don't have a supply of experienced developers, you don't have a bunch of standard automated tools that work with your dialect.
> If you are ok with it getting as slow as Go, you can get similar safety guarantees as well.
There is no native/standard/supported-by-tools way to do tagged unions in C, so whatever you do, C will always be less safe than languages that have native tagged unions.
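For anyone who hasn't used them: a tagged union carries a discriminant saying which variant is currently live, and a safe language checks it for you. A minimal Rust sketch (my own toy example) of what native support buys:

    // The compiler stores and checks the tag; you can't read a
    // variant's fields without the tag having been checked first.
    enum Shape {
        Circle { radius: f64 },
        Rect { w: f64, h: f64 },
    }

    fn area(s: &Shape) -> f64 {
        // The match must also be exhaustive: add a variant later and
        // every forgotten match site is a compile error, not a bad read.
        match *s {
            Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
            Shape::Rect { w, h } => w * h,
        }
    }

    fn main() {
        println!("{}", area(&Shape::Circle { radius: 1.0 }));
    }

The usual C substitute (a struct holding an int tag next to a union) compiles just as happily when you read the wrong member.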
I don't think "most of the advantages of C no longer apply [if you use static analysis or bounds checking]" is fair.
Even if your claims about losing libraries, experienced developers, or automated tools were true (which I think is false), you still get portability, close-to-the-metal code, full control of your processes, direct access to the native ABI, full access to the platform facilities, fast compilation, small executables, standalone libraries, memory layout control, ...
C code is less portable than most of the alternatives. In the rare cases where the things you list are hard requirements there are still safer options e.g. Ada.
Plenty of alternatives are self-hosting, but that's neither here nor there. Yes, it's a major effort to port e.g. the Python interpreter to a new platform (particularly the first new platform - once you have a C codebase that enforces portability over many platforms it's much easier to add a new one). But that major effort has by and large been done for even relatively obscure languages and relatively obscure platforms. Whereas if you write in C, you get to do that major effort yourself for each program.
More portable means "can be ported to new platforms more easily", not "has been ported to new platforms by someone else".
A C program (in general) is more portable than anything written on top of C because to port, say, Python to a new platform, one has to have C there first.
It has nothing to do with whether the work has been done by you or someone else.
Of course, one can write code in either language which is not portable, but that is also beside the point.
> A C program (in general) is more portable than anything written on top of C because to port, say, Python to a new platform, one has to have C there first.
Port C compiler -> port your C program vs port C compiler -> port Python interpreter -> port your Python program. The latter can still end up a lot easier and cheaper, because the Python interpreter is already multi-platform and Python programs tend not to have much if any platform-specific code by the nature of the language.
> It has nothing to do with whether the work has been done by you or someone else.
It has everything to do with that. Ultimately via Turing equivalence it's possible to port anything to anywhere, so when we're talking about "portability" we must be talking about how much it costs to port a program to a new or existing architecture.
> Of course, one can write code in either language which is not portable, but that is also beside the point.
Again, it's very much the point, because it can't possibly be a yes-or-no thing. How costly is typical/idiomatic C code to port? How costly is typical/idiomatic Python code? Those are the questions that matter when we talk about language portability.
By definition, the C language is more portable than the Python language, because in order to run Python on a system, C must run there first.
Unless you are talking about a Python not implemented in C, or a C compiler that was created only to compile the Python interpreter (and not all of the C language), there is no way around this simple fact.
A standard-conforming program should be part of the topic here, not "typical/idiomatic" non-objective software. Just because most people may not write portable C code doesn't make the language itself more or less portable. It just makes those particular programs less portable.
Plenty of C code is highly portable. Plenty is not. Same can be said of Python. It is moot. The language is under discussion here, not any set of specific programs written in the language.
Also, beside that simple logical conclusion, it is 2-3 orders of magnitude easier for me to port a C program I wrote to a new platform than it is for me to port the Python interpreter just to support my Python program on a new platform. Nevermind all of the features I would have to port (or neuter) inside Python which my program may not even use.
> By definition, the C language is more portable than the Python language, because in order to run Python on a system, C must run there first.
What definition are you using, and what practical use is your definition? You seem to be defining "portable" in some absurd, irrelevant way so that your preferred language "wins", regardless of what that actually means.
> A standard-conforming program should be part of the topic here, not "typical/idiomatic" non-objective software. Just because most people may not write portable C code doesn't make the language itself more or less portable. It just makes those particular programs less portable.
> Plenty of C code is highly portable. Plenty is not. Same can be said of Python. It is moot. The language is under discussion here, not any set of specific programs written in the language.
The standard is just some words on a page. The programs and tools are what give it meaning. If 90%+ of the things we call "C programs" don't conform to the standard, it's not reasonable to treat the standard as the definition of a "C program". And for any practical, real-world decision like "should I write my program in C or Python", the typical/idiomatic is the question that matters.
> it is 2-3 orders of magnitude easier for me to port a C program I wrote to a new platform than it is for me to port the Python interpreter just to support my Python program on a new platform. Nevermind all of the features I would have to port (or neuter) inside Python which my program may not even use.
How often do you port things to a platform on which the Python interpreter doesn't already build? And how large are these programs for the 2-3 orders of magnitude? (I guess small in any case if you're talking about writing a program yourself rather than in a team). The Python interpreter is actually pretty small.
There are C compilers for N platforms. There are Python interpreters for M platforms. Since the Python interpreter is written in C, N >= M.
The programs I have ported are more than a million lines. Not the largest, sure, but not trivial. And they run on platforms where Python does not.
The size of the C code or the Python interpreter code does not matter. What does matter is what subsystems are required. Python is general purpose and has a wide set of requirements (networking, file systems, process manipulation, dynamic loading, the list is quite long). Oh, and the Python list includes compiling C programs (for extensions). A second reason backing my absurd definition.
The C programming language requirements placed on the hosting environment are much smaller. So the language is more portable for that reason as well. It is beneficial to only port what you use, instead of having to port a monolithic interpreter, including the parts you don't need (which may not even run on the target platform easily or at all).
> There are C compilers for N platforms. There are Python interpreters for M platforms. Since the Python interpreter is written in C, N >= M.
Sure, but existence of a compiler is only a small piece of portability. As a FreeBSD user I'm very used to downloading a random program and finding it won't run on my platform. Happens a lot more often when the program's written in C than when it's written in Python.
I think portability is best understood as a measure of how difficult/expensive it is to port a typical program in that language to a new platform, because that's the question which is likely to be relevant in practice.
> The portability of a language is best understood as how difficult/expensive it is to port a conforming program written in the language in question.
What proportion of the things that are referred to in ordinary, everyday language as "C programs" would you estimate are conforming? Maybe 0.01%? If I'd meant "conforming C code" I would've said "conforming C code".
Writing conforming C is not a realistic choice for most use cases. E.g. there are very few experienced conforming C developers available.
I don't think any of your assertions are true at all.
There are lots of conforming C programs (just look at the huge list of software which compiles on a huge list of platforms).
There are tons of experienced C developers. Who do you think writes all of that conforming software?
And none of this is related to the topic at all.
If you really can't get your head around the simple idea that the portability of the language is different than the portability of some arbitrary program written in the language, then just look at it this way: If you want to write a program in a language, and you pick C, your program will run on more platforms than if you pick Python.
This is true, regardless of any other argument you might raise, simply because Python won't run on a platform until C runs on that platform (since Python is written in C).
We've been over this. Ad nauseam. Please stop trolling.
> There are lots of conforming C programs (just look at the huge list of software which compiles on a huge list of platforms).
And look at how much of that list breaks when a new version of GCC introduces a minor improvement in optimization.
> just look at it this way: If you want to write a program in a language, and you pick C, your program will run on more platforms than if you pick Python.
That is the question to ask. And your answer simply isn't true. It will (in the overwhelmingly likely case) run on more platforms if you pick Python. Certainly if you hold costs constant, which is surely the only way to compare. If you're willing to spend unlimited time and effort on portability then your C program will run everywhere, but so will your Python program (since you can just port the Python interpreter).
> This is true, regardless of any other argument you might raise, simply because Python won't run on a platform until C runs on that platform (since Python is written in C).
Not actually true (Jython exists and could run on platforms that don't run C), but it doesn't matter. It is overwhelmingly likely that the Python interpreter will run on more platforms than your C program will.
> We've been over this. Ad nauseam. Please stop trolling.
Exploitation of some of those issues can be prevented using the new RAP GCC plugin by grsecurity. Unfortunately, it is only available to paying customers.
Does anyone know how much it costs to be a "paying customer" at the individual level? I'm not OVH with a sea of rack mounts; I am just interested in covering a handful of machines.
I'd only go so far as to agree that vulnerabilities which are bug-related may be _detected_ by static analysis.
Preventing vulnerabilities is an entirely larger problem not addressed by static analysis alone. Architectural security flaws are outside of the scope for static analysis. I'm not trying to nitpick semantics but in this case I think it's important to understand that, in this context, prevention and fixes as well as bugs and flaws need to be differentiated.
A lot of C++ projects aren't written in modern C++ and out of the few that are, even then, programmers will mix in older less secure C++ because they lack experience with modern C++. So habits can actually hurt security.
Ultimately, when you have a language which mixes secure and insecure practices and lets the programmer decide, as an outside observer you have to assume the worst unless shown otherwise. C++ can be written very well, but C++ can also be written no better than C; it is project by project as to which is which.
Other languages don't have this issue. If you see a Rust, Go, Java, C#, etc block of code you can make certain assumptions about what classes of security issues it won't have.
> Most of the problems with C you are implying here are non issues in modern C++.
No, they aren't. Use after free is just as exploitable and is a severe concern in C++. In fact, it's worse in C++ than in C, due to the ubiquity of vtables.
Everywhere. We have been using smart pointers for a long time.
Modern C++ doesn't add any memory safety protection beyond that which was already available with custom smart pointers in earlier versions of C++. In fact, I think modern C++ is less safe than earlier versions, due to new classes of potential bugs like use-after-move and the ease with which closed-over variable references in lambdas can become dangling.
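For contrast, the use-after-move half of that is a compile-time error in Rust rather than a latent bug. A contrived sketch:

    fn main() {
        let s = String::from("secret");
        let moved = s; // ownership of the allocation moves to `moved`
        // println!("{}", s); // uncommenting this fails to compile:
        //   error[E0382]: borrow of moved value: `s`
        println!("{}", moved);
    }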
Most of the horrible security bugs in Java show up in the sandbox, where attackers can supply arbitrary code for you to run.
In contrast, Java as a server language has an excellent security record IME. The last public patch panic I can remember was in 2011, with the denial-of-service bug in the parsing of floating-point numbers. There have been other security bugs, regarding cryptography etc., I'm sure, but in general you can feel very secure running Java on your servers.
It is a shame that security bugs for both are bundled together, making every sandbox compromise a "remotely exploitable" vulnerability. The "applet" use case should probably just die; there is no indication that Java sandboxing will ever be secure, as the design is unsound.
Java as a server language has a record of nasty serialization-related RCE vulnerabilities. Of course, they're in popular Java libraries used on the server rather than in the language itself, just like this bug was in a popular C library rather than the language itself - but Java makes it very easy to accidentally write that kind of vulnerability. In fact, just loading two unrelated libraries that are individually safe sometimes creates an exploitable RCE condition in Java. That's worse than even C.
No disputing that bugs can be written in any language. But by avoiding C/C++ you're excluding a specific class of bugs which have historically proved harmful.
You can write exploitable code in Java. But you'd actually have to try if you wanted Java to be able to write arbitrary memory or execute arbitrary code.
Essentially any bug that can be written in Java/Go/Rust/etc can be written in C/C++. But some C/C++ bugs are extremely uncommon in other languages, or you have to actually TRY to introduce them.
> But you'd actually have to try if you wanted Java to be able to write arbitrary memory or execute arbitrary code.
Depends on your definition of arbitrary. Higher-level languages have higher-level exploits. While injecting x86 shellcode into a Java process is probably hard, many Java applications have been vulnerable to serialization bugs which result in the execution of arbitrary bytecode.
It also needs to be said that this is generally not a reasonable reason to pick C over Rust. Memory-safe languages are effective defenses against these flaws.
> Bugs can be found in code written in all languages.
But not all languages frequently produce security vulnerabilities as a result of common types of bugs, bugs that exist because error-prone humans are made to do things that should be done for us automatically in the year of our Lord 2016.
Java applets have security issues today. That's a situation where you are allowing random websites to execute arbitrary code on your computer. Flash has the same issues. So don't do that.
Don't confuse Java applets (and the lack of security thereof) with the JVM as a development platform. I'd bet on the security of a Java application over that of a C/C++ application any day.
To be clear, are you referring to security bugs in the Java standard library (written almost completely in Java), or those in the JVM itself or the browser plugins (written almost entirely in C++), or in Java code bases?
The vast majority of the high profile Java security bugs have been in the second, which would be more of a ding against C++ than Java the language, wouldn't it?
I think it would be a ding against Java, in the sense that Java does not support writing high-performance code like the Java runtime / security code itself. Now, it may not have as many errors as OpenSSL, but that argument will be about implementation quality, not against C/C++.
To be clear, I am not a security researcher, and I haven't verified the severity of these issues. But in 2016 alone there are 16 CVEs, which is 4 per month.
I'd say C is more of a symptom and exacerbating factor of the cultural failing. C makes it easy to achieve performance and time-consuming to achieve correctness.
When the people funding the work (and this includes people donating their own time) can see performance issues but not correctness issues, guess which ones get fixed?
There is absolutely no doubt in that. But people also join the Marines, so....
I'll speak this heresy - I think "people donating their time" has a bunch of problems, not least of which is losing the data/information that would be gained from pricing that labor.
Yeah, it's possible to write flawless code. The problem is that it doesn't happen in practice. You know, where people are actually relying on this code that "could be flawless but isn't" so they can run businesses, maintain privacy, etc.
webpki is designed to validate TLS certificates. It does the ASN.1 parsing and signature verification in a zero-copy, memory-safe way, built on top of ring for the core crypto primitives.
Now, this isn't a full implementation of what you would need to replace OpenSSL for certificate handling (and it also isn't yet complete); in particular, it is extremely limited in scope, being only for client-side certificate verification. This particular OpenSSL issue was in parsing and re-encoding certificates, which is out of scope for webpki. But it is a good starting point for demonstrating memory-safe and efficient handling of complex tasks like the parsing and verification of TLS certificates.
The sad part is that already in the mid-'60s and '70s, on computers much less powerful than the PDP-11, bounds checking wasn't a problem as such.
In the few cases where it was a real problem for the application being written, it could be selectively disabled. Which, according to C. A. R. Hoare in his 1981 speech, most people didn't want to do anyway.
It only became a problem in the industry thanks to C.
No, it became a problem when 1) microprocessors evolved into being able to do real work, 2) vast hordes of the great unwashed (I'd say even including me) were vacuumed into the resulting void, and 3) tools vendors for languages with better safety furniture were found wanting. Delphi existed but was found wanting. Ada ads were in the back of oh so many magazines. But...
And I'd throw in 4) - the CS industrial complex failed to address this except to rail against it. The first CACM I ever got was the one with "Reflections on trusting trust." If a Haskell or Rust is the answer, it needs to be more interesting. And if you say "but Java", name your second; I will meet you on the field of honor at dawn :)
I personally found "nobody is going to save you; you are on your own" very liberating but perhaps that is unusual. "In order to live outside the law, one must be honest." - Bob Dylan.
Having written C for a living for about two years, I have to admit that a large part of me really likes C. But also, I would have committed unspeakable acts just to have a compiler switch to enable bounds checking, no matter how badly that would have affected performance.
Even when one knows what one is doing - or thinks so, anyway - these errors are so easy to sneak in and such a pain to track down. (Especially since we were using OpenWatcom whose debugger did not show useful data when the program crashed, which is usually the time one needs it the most...)
> I would have committed unspeakable acts just to have a compiler switch to enable bounds checking
-fsanitize=bounds in GCC6 (probably recent Clang also). Runtime overhead isn't even that high.
Lately I've been trying to use Ada for things I would use C for—it's quite a nice language that gives you all the power of C but safe by default (i.e. you generally have to try to shoot yourself in the foot). Also concurrency in Ada doesn't consist of “sacrifice kittens to Cthulhu and pray that it works”, which is a nice feature.
I had used valgrind before on a toy project to find memory leaks, but I had not used any of the other tools it offers. I remember that back then it did not work terribly well for bounds checking, but that was years ago. 2009-ish.
Electric Fence I have heard of before, but I never tried it. I will definitely come back to these links when I am working in C the next time.
Just curious - what do you do for a living that means you have to write C?
I'm interested in really learning C or C++ in more depth. I feel like I am very proficient in high-level programming (other than design patterns).
Basically, when or if the web dev industry falls out, I want to have a good back up of knowing low-level languages.
Anything in embedded systems. I write code for a high-performance satellite terminal (ground side) and also point-to-point radio systems (ultra-low latency, millimetre-wave stuff). 90% of the code I write is in C, about 60% of the time on microcontrollers with <256KB of RAM, and the rest in a cut-down Linux running on an ARM Cortex A8.
I'd actually prefer to use C++ for the Linux stuff (std::string, smart pointers and STL containers would save me so much time), but we don't have the C++ standard library in our custom distro.
I imagine that, as with the Modula languages, you could probably turn off safety features where you needed to for performance or real-time. I haven't confirmed it, though.
I used to work at a small company whose main business was building waterjet cutting machines. My job was maintaining the software used to mark bad spots on leather hides and placing cutting patterns on them. (Unfortunately, I spent most of my time trying to come up with a better way of placing cutting patterns on hides which involved a lot of computational geometry and basically went way over my head. Well either that, or it was just a really tough problem.)
I got the job kind of by chance, I put out an ad in a local newspaper that basically said "programmer / sysadmin looking for work".
If you want to learn C/C++, trying to look at and understand some of the open source code floating around might be a good starting point, since there is a lot of it. Pick up some project you find interesting and try to make a few modifications. Alternatively, try building a simple project of your own and take it from there.
C has its fair share of problems, and one needs to be aware of them, but it can also be a very fun language. I cannot say much about C++ either way, I find it a bit intimidating, but that's just me.
(Of course, if you are interested in lower-level languages, Rust is getting a lot of attention these days... it claims to offer much of the performance of C/C++ while avoiding most of their problems. I have not looked at it myself, though.)
Web dev will probably die down, and with robots and automation and IoT becoming more prevalent, embedded programming will probably become popular.
But hopefully by then embedded devices won't force us to use languages like C or C++. If the manufacture of embedded platforms scales up enough (more than it already has), then maybe we'll get devices with gigabytes of RAM and insanely fast processors, so we can just write code in whatever language we like and have it be performant.
A pipe dream maybe, but it wasn't that long ago that people were exclusively writing in assembler when performance was critical.
There are Basic, Pascal, Oberon and Java compilers for embedded systems, but it is a niche market.
As for C and C++ being fast: they are now, but once upon a time their compilers generated very bad code for home computers, the machines that are nowadays used as embedded ones.
Yet mainframes less powerful than the PDP-11 already had better languages available.
So attack that from the EE side of industry. As long as it lasts, embedded has been pretty good hunting for a long time, but the culture shift you might be exposed to looks to me to be profound.
Maybe the machine is at fault? Maybe the problem is not with C, but with a hardware architecture that does not offer finer granularity than page allocations? What if everything returned from malloc() were bounds-checked by hardware? What if, instead of overwriting a malloc_chunk or the saved value of the instruction pointer, a hardware interrupt were raised? Maybe the return stack and the parameter stack could be separate? Maybe someone solved these problems in hardware before the majority of us were born?
The Intel 432 could do that (1982). Unfortunately, the Intel 432 was also slow and very complex (it's the only architecture I'm aware of that was object oriented at the microcode level).
The Intel 80286 could also do that (some of the features of the Intel 432 found their way into the 80286) but no one who lived through those days wants to return to those days (small, compact, medium, large and huge memory models anyone?).
The Intel 80386 (and really, any Intel CPU that can still execute 16-bit code) can do that. But then you are programming a glorified 80286 and well ... see above.
But okay, you want more details? You have segment registers, CS, DS, ES, SS (and the 80386 gives you a few more) that define the base address and limit. To get byte granularity, you are limited to 65536 bytes, and you are dealing with a very odd-ball 32-bit address (16-bit segment, 16-bit offset) or a 48-bit address (16-bit segment, 32-bit offset limited to 65,535). If you want more memory per segment, you lose the byte granularity (promoting sloppy coding, etc.).
How the fuck would that work? Today, the page table is cached in a TLB and modern Intel processors only have 1024 entries!
If you have a new entry in the page table for every malloc, dereferencing virtual memory will require a page walk for pretty much every memory access. You can't cache millions of entries.
To those downvoting lmm, look up C11 Annex K. There are standards for adding bounds checking and the like to C, but compiler makers don't implement them because they're "too slow." We need to demand more of our tooling makers.
Those "features" would be built into the platform. Instead of having the return stack and the parameter stack share memory space, the stacks would be kept separate by design. Instead of having malloc_chunks, the MMU would keep allocation metadata separate from the data presented to the user space. Any process overwriting an allocation would result in a segmentation fault, not on a per-page basis but on a per-allocation basis. It would be enforced by the architecture. It would not be optional. It would be by design.
And I think the difficulty of manual memory management is vastly overrated. A bug is a bug is a bug. If OpenSSL is ... safety critical, then it should be treated as safety critical going forward. Tools don't make bugs, people make bugs.
Of course, someone's free to write a replacement for OpenSSL in Rust and see how far that goes.
FWIW, there used to be a rich suite of very respectable ASN.1 verification tools, at least the subset of ASN.1 used in SNMP.
> I think the difficulty of manual memory management is vastly overrated.
And your line of thinking is why we're going on 50 years of empirical evidence that people are terrible at manual memory management; as with politics, there's a pretty small intersection between those who believe themselves competent and those who should be trusted with it.
> A bug is a bug is a bug.
An error message is not a misprint is not a denial of service is not a DB penetration is not a crypto break is not a network ownage. "A bug is a bug is a bug" as long as your code is not used by anybody or trusted with anything.
Manual memory management can be done safely and economically. This really is as simple as "the thing in the loop which has agency is the human, so the buck stops there."
And a bug is still a bug is still a bug. This is not even a single point of failure; it's an ecosystem failure.
> Manual memory management can be done safely and economically.
There is very little evidence of that, and extensive evidence to the contrary.
> This really is as simple as "the thing in the loop which has agency is the human, so the buck stops there."
Not only is that exactly the opposite of the approach of one of the few groups which did manage to get somewhat good at this (the on-board shuttle group), it's also the incorrect and inane thinking which led e.g. surgeons to resist checklists. Again, your line of thinking has only led us to half a century of failure.
Agency is irrelevant: people are good at creative elements but terrible at systematic ones, yet you're pushing more systematic work onto the one piece of the chain least suited for it, then blaming it for its failure.
> This is not even a single point of failure; it's an ecosystem failure.
> Manual memory management can be done safely and economically.
Citation? You will need evidence to back this claim up, and the evidence shows that C and C++ apps are far more vulnerable to these bug classes than apps written in memory safe languages.
> This really is as simple as "the thing in the loop which has agency is the human, so the buck stops there."
This is like saying "we don't need instruments in our planes' cockpits because the thing in the loop that has agency is the pilot, so the buck stops there".
Memory safe languages are tools that help programmers not write these kinds of vulnerabilities. We use tools because we as humans are imperfect.
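To make that concrete with a toy example: the dangling-reference pattern that compiles silently in C is rejected outright by Rust's borrow checker. A minimal sketch (modern Rust; pre-NLL compilers were even stricter):

    fn main() {
        let r: &Vec<i32>;
        {
            let data = vec![1, 2, 3];
            r = &data;
            println!("{:?}", r); // fine: `data` is still alive here
        }
        // println!("{:?}", r); // uncommenting this fails to compile:
        //   error[E0597]: `data` does not live long enough
        // The C equivalent compiles and reads freed memory.
    }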
While C (and its love-child C++) bizarrely appears high in most synthetic programming popularity rankings, I personally doubt more than 5% of developers (if that) ply their days in it, or have more than a passing competency in it.
Everyone is programming in Java, C#, JavaScript, and so on. Aside from myself, I haven't a single professional peer who develops in C (anecdotal, of course, but this is a pretty big net crossing multiple cities and industries) in any real way.
It just happens to be that much of the most important software is written in it. Maybe there's something in that.
I wouldn't minimize the amount of C(++) out there, or the amount of mis-counting from jobs where C(++) is a desired skill but not primary in the role. There are a lot of jobs out there working on embedded systems where C/C++ are king. Though I do hope that Rust gains more traction in that space, to avoid certain classes of bugs.
I would guess that most development involves JS, as I would say that most development is directed at web applications of some kind. Though there are backend languages as well. For the types of development jobs I'm used to looking for, I see a lot more C# and Java, with some uptick in Node and Python. Excluding PHP (because shiver).
I have several friends who work in the embedded space, and that is not small by any means.
I developed most of my hobby projects in C until recently. It's not as rare as you think. C is a good language to think in.
Lately I've been switching to Ada, which I actually quite recommend if you like C. Shame it never really caught on, but using C libraries from Ada is trivial so there's not much of a library issue in spite of lack of popularity.
~5% of developers, while small compared to the whole, still encompasses about a million of the estimated 20 million developers worldwide.
The other poster rightly mentioned the embedded space, and that is absolutely true (and indeed it is where I gained my affinity for C and C++), however there are easily 20 middleware / web / mobile developers for every embedded developer.
I'm not sure whether you are sarcastic or not, but to me the seemingly endless amount of this kind of bug in high profile projects like OpenSSL is pretty good proof that nobody can write secure C.
Of course I am being sarcastic, you just need to check my other posts. :)
The point being that, in spite of what everyone says, not even with the help of the best tools for detecting memory corruption issues in C do developers write memory-corruption-free code in it.
Are the openssl developers using the "best tools to detect memory corruption issues in C"?
In my experience, most open source projects don't devote the money or time to use, understand, deploy, and ensure such tools are used and the issues they find are resolved.
Actually, I think they're using the best tools to create memory corruption issues in C. There could in fact be a slow-moving fuzzer in their build system that transforms safe C constructs into unsafe ones before release. It's random and hits one thing at a time. Probably borrowed from mutation component of a genetic algorithm toolkit. I won't speculate if lack of focus or quality could indicate rest of OpenSSL code was produced by genetic algorithms.
No, it just takes a good developer to write secure C.
See: qmail [1], in which only four bugs have been found in 20 years, and only one of those was a potential security bug. Or djbdns [2]: a similar lack of security holes, despite being faster and safer than BIND.
It's far easier to write code in C than it is to write good code in C. It's a downside for the security of open source projects that the contributors are whoever cares enough to volunteer code, rather than the team (or at least the set of people who can approve code going into the core) being restricted to vetted experts.
I'm a good developer, but far from the best, and glancing through the OpenSSL code that I've seen, I would never have approved most of it in code review. There needs to be someone at least as good as I am reviewing every last line of code submitted to OpenSSL. Even better if it were someone much better than I.
All the more reason not to rely on security properties of code written in C - I don't have time to read and understand the source of every piece of software I use, and popularity is completely useless as a proxy for code quality. Being written in a memory-safe language isn't a silver bullet for all problems, but it actually _is_ a silver bullet for memory corruption problems.
Memory corruption problems are a bit like infant mortality in the history of the medical profession - people that live past infancy still died, but infant deaths overwhelmingly impact the historical life expectancy statistics. Let's get our industry past the dark ages so we can start to live longer and begin to tackle more interesting causes of death.
EDIT: fixed a very silly typo (which I'm really surprised I wasn't instantly called on - it basically made me say I wasn't willing to understand my own code. thanks for the generous reading!)
There are plenty of codebases (including large ones) written in C that have a low number of vulnerabilities, even compared to projects written in higher level languages (setting aside Rust et al because they aren't that popular yet). It requires discipline and care, but it's not impossible or even that hard if you're a skilled C programmer. OpenSSL has louder vulnerabilities than most software because it's (1) very old, (2) very bad code, and (3) relied upon by heaps of software. There are lots of things wrong with OpenSSL, and those are the reasons that it's vulnerable. We should address the real issues instead of using C as a scapegoat.
> It requires discipline and care, but it's not impossible or even that hard if you're a skilled C programmer.
I have been reading about this mythical "skilled C programmer" since I got to learn C in 1993; I eventually joined the C++ ranks instead, leaving my favorite Turbo Pascal behind.
Never met one to this day, but I fixed lots of memory corruption issues left behind by not-so-skilled ones during the days when I worked daily with C and C++ codebases.
We know they exist. Bernstein and Cutler are two probables based on reviews of their code. Thing is, these people are mentally not even human: it's like Human++ in terms of technical proficiency. So, maybe skilled C programmers only exist among the super-humans. ;)
There are plenty of freeways with speeding/puddle-jumping drivers that have a low number of crashes, even compared to slower drivers (setting aside AI drivers, as they aren't that popular yet). It requires discipline and care, but it's not impossible or even that hard if you're a skilled driver at speed. Old cars have more accidents than most because they are (1) very old, (2) made of very poor parts, and (3) there are lots of them out there. There are lots of things wrong with old cars, and those are the reasons that they have accidents. We should address the real issues instead of using speeding drivers as a scapegoat.
I thought I was being clever. This is so well-written I can't tell if you're agreeing with C opponents, mocking them, or baiting someone like me into writing a comment like... Moving on.
I think the problem is partly with unsafe equipment, and partly with people that in seeking to go just a little faster, or get a little more of a thrill, make things quite a bit less safe for all those around them. To some degree, we all do this in different parts of our lives. People tend to vastly overestimate their ability to sustain high output, if not in one area (defensive coding) then in others (driving, procrastination, etc). Some of these affect the people around you more than others.
To clarify my original comment somewhat, I was talking less about general speeding of a few miles over the speed limit, and more about those people that are going significantly faster than surrounding traffic and weaving in and out of it to advance (puddle jumping). I do not enjoy having my chance of an accident increased by many orders of magnitude because of someone else's (impossible!) sense of competency, and I think that's very relevant in these discussions.
That's very interesting. I agree that effect is there. Passing down incorrect cultural knowledge about C is another one, which I counter with my Pastebin. There are others, like SPARK, Rust, and ATS, countering the concept that safety equals too slow or nothing low-level. Many things.
Back to your analogy: it seemed to start with both types of drivers, IBM vs Burroughs; C vs Wirth languages. The puddle-jumper style got pretty popular, with most roads and Interstates being built between their cities. The safer drivers have to be on those roads, too, but fewer in number, since "Move Fast and Break Things" wasn't popular with small-town folk. Got to the point that the majority of traffic, and of the causes of accidents, are puddle jumpers who mostly don't see that their cars and driving styles are the problem.
And now we gotta find a way to re-design roads and cars to make their style less damaging while encouraging others to drive more wisely. Wow, put it in your analogy and I suddenly have less hope. ;)
The analogy breaks down the deeper you go, but there are some interesting parallels when you consider driving habits in other cultures. In the US (and probably most western countries) we have highly standardized and policed roads to prevent injury and accidents. I would argue this provides for a more efficient system overall, where in the end more people are able to get to their destination not just safer, but also faster, because the lack of accidents and assurance about others' likely actions on the road allow for mostly smooth operation. With computers, how much less (less, because it would never be none) hardware, software, CPU time and memory would need to be devoted to security (firewalls, IDS, malware/virus detection and cleanup) if we had sacrificed a small amount of performance to ensure a more secure operating environment most of the time?
We've achieved this with some aspects of society by making rules that hamper some for the benefit of all (traffic laws, employment laws, etc), because we've recognized in some places the failure of individual people and groups to be able to correctly assess risk and danger at wider levels and at longer time frames. These laws and regulations can go overboard, but they're needed to some degree because people are very poor stand-ins for rational actors with good information, which in a pure market economy could make these decisions correctly. We've had little to nothing like this for software engineering, which has led to great advancements in short times, but I think we are nearing the point (if we haven't already passed it) when our past decisions to prioritize efficiency and performance over safety are resulting in a poorer relative outcome than if we had made different choices in the past.
"but also faster, because the lack of accidents and assurance about other's likely actions on the road allow for mostly smooth operation."
Yes, yes. It would seem so. There was a counter-point here that showed eliminating traffic controls reduced crashes and congestion because people paid more attention. It was done in quite a few cities. I'd agree some standardization of behavior and design definitely improves things, though, as you know what to expect while driving.
"how much less (less, because it would never be none) hardware, software, CPU time and memory would need to be devoted security (firewalls, IDS, malware/virus detection and cleanup) if we had sacrificed a small amount of performance to ensure a more secure operating environment most the time?"
BOOM! Same message I've been preaching. My counterpoint to Dan Geer gave a summary of examples with links to more specific details.
"We've had little to nothing like this for software engineering, which has led to great advancements in short times, but I think we are nearing the point (if we haven't already passed it) when our past decisions to prioritize efficiency and performance over safety are resulting in a poorer relative outcome than if we have made different choices in the past."
Total agreement again. It's time we change it. We did have results with Walker's Computer Security Initiative and are getting them through DO-178B. Clear standards with CompSci- and industry-proven practices, plus financial incentives, led to many products on the market. The Bell paper below describes the start, middle, and end of that process. Short version: the NSA killed the market by competing with it and forcing unnecessary re-certifications.
Anyway, I've worked on the problem a bit. One thing is a modern list of techniques that work and could be in a certification. Below, I have a list I produced during an argument about there being no real engineering or empirical methods in software. Most were in the Orange Book, but each was proven in real-world scenarios. Some combo of them would be a nice baseline. As far as evaluation itself, I wrote up an essay on that based on my private experience and problems with government certifications. I think I have solid proposals that industry would resist only because they work. :) The DO-178B/C success makes me think it could happen, though, because they already do it piece by piece, with a whole ecosystem formed around re-usable components.
Of course, I'd be interested in your thoughts on the empirical stuff and security evaluations for further improvement. For fun, you might try to apply my security framework or empirical methods to your own systems to see what you find. Only for the brave, though. ;)
> Of course, I'd be interested in your thoughts on the empirical stuff and security evaluations for further improvement. For fun, you might try to apply my security framework or empirical methods to your own systems to see what you find. Only for the brave, though. ;)
If only I worked in an environment where that was feasible. I write Perl for a very small off-market investment firm (event ticket brokerage) as the only software engineer. My work time is split between implementing UI changes to our internal webapp (which we are thankfully going to be subbing out), reporting and alerting tools for the data we collect, manipulating and maintaining the schema as data sources are added, removed, or changed, and maintaining the tools and system that streams the data into the model. While getting to a more secure state would be wonderful, I'm still working to reduce the number of outright blatant bugs I push into production every day due to the speed at which we need to iterate. :/
> Counter point to Dan Geer: Hardware architecture is the problem
This is interesting, and aligns quite well with the current discussion. We live with the trade-offs of the past, which while they may have made sense in the short term, are slowly strangling us now.
> Bell Looking Back
An interesting paper on the problems of security system development and how market changes have helped and hampered over time. I admit I skimmed portions of it, and much of the later appendix sections, as my time is limited. The interesting take-away I got from it is that our government security standards and certifications are woefully inadequately provided for, where they aren't outright ignored (both at a software level and at an organizational policy level). I now feel both more and less secure, since I wasn't aware of the specifics of government security certification, so seeing that there are many, and that they are somewhat well defined, encourages me to believe the problem at least received rigorous attention. Unfortunately it looks like it's a morass of substandard delivery and deployment, so there's that. :/ It is a decade old though, so perhaps there have been positive developments since?
> Essay on how to re-design security evaluations to work
and from that your "nick p on improving security evaluations" and
> List of empirically-proven methods for robust, software engineering
These all look quite well thought out, from my relative inexperience with formal computer security and exploitation research (I've followed it more closely at some times than others, and it's a path I almost went down after college, but did not). The only thing I would consider is that while these are practices for developing secure systems, and they could (and should in some cases) be adopted for regular systems, I think there is a place for levels of adherence to how strict you need to be, and how much effort needs to go into your design and development. Just as we require different levels of expertise and assurance for building a small shed, a house, an office building, and a skyscraper, it would be useful to have levels of certification for software that provided some assurance that specific levels of engineering were met.
"While getting to a more secure state would be wonderful, I'm still working to reduce the number of outright blatant bugs I push into production every day due to the speed at which we need to iterate. :/"
Interesting. What do you think about the tool below, designed to handle apps like yours with low defect rates?
It originally delivered server parts in Java, I think. Switched to Node due to low uptake, client/server consistency, & all the RAD stuff being built for Node. The main tool is written in an ML language by people that take correctness seriously. I doubt you can reboot your current codebase but it seems it should be applicable for a similar set of requirements or a spare-time app. Also, not a write-only language. ;)
Jokes on Perl aside, you might find it fascinating, and even ironic given current usage (eg UNIX hackery), to know that Perl only exists due to him working... on the first, high-assurance VPN for the Orange Book A1 class (the highest). Took me 10-20 minutes of Google-fu to dig it out for you:
I'm sure you'll find his approach to "secure," configuration management entertaining.
" We live with the trade-offs of the past, which while they may have made sense in the short term, are slowly strangling us now."
Yep. Pretty much. Outliers keep appearing that do it better, but they get rejected for cost or lack of feature X. Some make it, but most don't.
" I now feel both more and less secure..."
"I think there is a place for levels of adherence to how strict you need to be"
Very fair position. :) The early standard did that to a degree, by giving you levels with increasing features and assurance: C1, C2, B1, B2, B3, A1. Really easy to understand, but the security features often didn't match the product's use case. ITSEC let you describe the features, security features, and assurance rating separately to fit your product. It was less prescriptive on methods, too. Common Criteria did that but knew people would screw up security requirements. So, they added (and preferred) Protection Profiles for various types of product (eg OS, printer, VPN) with threats specified, a baseline of countermeasures, and a minimal level of assurance applicable. You could do "augmented" (EAL4 vs EAL4+) profiles that added features and/or assurance. The CIA's method was more like the Orange Book, with simple descriptions on the cover, but like this: Confidentiality 5 out of 5, Integrity 5 out of 5, Availability 1 out of 3. A specified level of risk or protection corresponded to each number, with the methods for achieving it up to the manufacturer & evaluators.
So, your expectation existed in the older schemes in various ways. It can definitely be done in next one.
"different levels of expertise and assurance for building a small shed, a house, an office building, and a skyscraper"
Nah, bro, I want my shed and house built with the assurance of an office building or skyscraper. I mean, I spend a lot of time there with some heavy shit above my head. I just need the acquisition cost to get to low six digits. If not, then sure... different levels of assurance... grudgingly accepted. :)
> I doubt you can reboot your current codebase but it seems it should be applicable for a similar set of requirements or a spare-time app.
No kidding. At 85k+ LOC (probably ~70k-75k after removing autogenerated ORM templates) in a language as expressive as Perl... well, I wouldn't look forward to that. And really, if it didn't have a DB abstraction layer at least approaching what I can do with DBIx::Class, I'm not going to contemplate it. Mojolicious takes care of most of my webapp needs quite well. As for type checking, what I have isn't perfect, but it's probably worlds better than what you are imagining. I'll cover it below.
> Also, not a write-only language. ;)
use Moops; # [1]. Uses Kavorka [2] by default for function/method signatures

role NamedThing {
    has name => (is => "ro", isa => Str);
}

class Person with NamedThing;

class Company with NamedThing;

class Employee extends Person {
    has job_title => (is => "rwp", isa => Str);
    has employer  => (is => "rwp", isa => InstanceOf["Company"]);

    method change_job ( Object $employer, Str $title ) {
        $self->_set_job_title($title);
        $self->_set_employer($employer);
    }

    method promote ( Str $title ) {
        $self->_set_job_title($title);
    }
}

# Now to show off Kavorka's more powerful features
use Types::Common::Numeric;
use Types::Common::String;

fun foo(
    Int $positional_arg1 where { $_ % 2 == 0 } where { $_ > 0 }, # composes a subset of Int on the fly
    Str $positional_arg2,
    ArrayRef[HashRef|MyObject] $positional_arg3, # complex type
    DateTime|NonEmptySimpleStr :$start = "NOW",  # named param with a default
    DateTime|NonEmptySimpleStr :stop($end)!,     # named param bound to a different variable ($end) in the function body; the trailing ! marks it required
) {
    ...
}
It's not compile-time checking, but man is it useful. If you've been following Perl 6 at all, it's mostly a backport of those features. Particularly useful is the ability to define your own subtypes and use those in the signatures to keep things sane, e.g. declare HTTPMethod, as Str, where { m{\A(GET|POST|PUT|PATCH|DELETE)\Z} }; Perl 6, where at least some of this is compile-time checked (obviously complex subtypes cannot be entirely), fills me with some hope. I'm not entirely sold though; that's one beast of a language. It really makes you put your money where your mouth is when it comes to espousing powerful, extensible, expressive languages. I guess time will tell whether all that rope can be effectively used to make a net instead of a noose more often than not. :)
> I'm sure you'll find his approach to "secure," configuration management entertaining.
Ha, yeah. I think secure is a bit of a misnomer here though, as it is a fairly novel way to do authorized configuration management, for the time.
> So, your expectation existed in the older schemes in various ways. It can definitely be done in next one.
And after you've filled me with such confidence that they are capable of both speccing a good standard and incentivizing its use at the same time! ;)
> Nah, bro, I want my shed and house built with the assurance of an office building or skyscraper. I mean, I spend a lot of time there with some heavy shit above my head.
I'm a little disappointed that I just image searched for "overengineered shed" and all the results ranged from a low of "hmm, that's what I would probably do with the knowledge and time" to "oh, that's a job well done". The internet has failed me...
" If you've been following Perl 6 at all, it's mostly a backport of those features. "
Definitely an improvement over my Perl days. Not sure whether they'll eventually get it under control. It is better, though.
" I think secure is a bit of a misnomer here though, as it is a fairly novel way to do authorized configuration management, for the time."
Definitely haha. Quite a few things were improvised back then, as no tooling existed. Secure SCM was eventually invented and implemented in varying degrees. Wheeler has a nice page on it below. Aegis (maintenance mode) and esp. OpenCM (dead link) implemented much of that.
" I just image searched for "overengineered shed""
Bro, I got you covered. I didn't subscribe to Popular Science for the layperson-oriented science news. It was for cool shit like the SmartGarage: a truly-engineered garage. They took it down, but I found an article and vid of its construction. Enjoy! :)
That is absolutely true and I don't disagree that it is possible to write C securely. But I would contend that the statistics you mention reflect differences in programmer attitudes and skill, and should be controlled for in any comparison. The same level of care and skill in another language will very likely lead to even fewer vulnerabilities, as long as that language doesn't have its own even worse foot-guns.
The priorities of the language are just different. When I use C, I get fairly decent performance basically for free, but I have to work and maintain discipline to obtain safety, high-level abstractions, etc. I'd much rather work in a language where I get memory safety and basic type-sanity checks for free and have to work for performance. This should really be the default, as it's much more in line with the end-user's needs. When I vet software, it's extremely easy to tell whether it is performant, but it takes serious auditing in C to tell whether it is even memory-safe, let alone actually correct and relatively free of side-channels, etc.
The thing about memory safety is that it can be compromised almost anywhere in your code. Your attack surface compared to what _should_ be exposed to attackers is incredibly large. If I can trust the memory-safety of a codebase I am auditing, I don't have to spend nearly as much time on it because I can focus on the parts that actually deal with the intended purpose of the code rather than having to go through every allocation and every memory access in the entire codebase with a fine-toothed comb.
Why do we continue to prefer a language where it's easy to achieve the only goal that is obvious when not achieved, at the expense of requiring great care and discipline 100% of the time in order to not instantly and fully compromise the harder-to-evaluate goals that actually matter (especially in security-oriented software such as openssl)? In this case, bad buffer handling in a data structure deserializer gave attackers a free ride past man-years (perhaps man-centuries) of work writing and auditing security-critical code.
Surely it'd be better if these kinds of failures were at least interesting rather than an endless parade of careless mistakes that could literally have been caught by trivial automation in the compiler. Especially when there are so many good compilers for so many good languages that already do it.
Oops, forgot to make the explicit connection to your comment - the TL;DR is really the last paragraph.
OpenSSL is a beautiful case study in this. If it were written in a memory-safe language, you're right that many things would still be wrong with it - the author(s) very well may have botched "heartbleed" anyway with their custom allocator, but the parade of other buffer handling errors would not have been exploitable except potentially as denials of service. The weaknesses would have at least been interesting (and less numerous, all other things being equal).
I think that we need an improved C compiler more than we need new languages. You put it quite eloquently:
>Surely it'd be better if these kinds of failures were at least interesting rather than an endless parade of careless mistakes that could literally have been caught by trivial automation in the compiler.
You're right - they CAN be caught by trivial automation in the compiler, so let's add that.
You can generally respond even if you don't have the link by clicking on the timestamp ("1 hour ago" or whatever) to go directly to my comment.
My comment was not meant to point out that libsodium doesn't make a general OpenSSL replacement. It was meant to point out that if you restrict the problem domain enough, of course it's possible for people to write secure C. I can write a secure "Hello, World!" program; even a secure networked "Hello, World!". But being able to write a secure "Hello, World" does not mean that I am capable of writing secure C in general.
Pointing out one small piece of code that is apparently secure isn't what a claim like "nobody can write secure C" is really about. "Nobody can write secure C" means that when programming in the large, implementing standardized network protocols with all of their warts, having codebases that evolve over time, that no one can consistently and reliably write secure C.
OpenBSD is likewise a minimalist system with a heavy emphasis on security, but even with that approach, they have had to change their slogan to "Only two remote holes in the default install, in a heck of a long time."
Now, of course, this does point out a few things. Minimalism is important for security; and more minimalist approaches and care in writing code can help reduce the frequency and severity of critical vulnerabilities. But even given some of the most careful, security conscious approaches, people still make mistakes.
That's why, when practicing responsible security, you should use defense in depth. In addition to all of the care, review, minimalism, principle of least authority, isolation, etc., you should also use tools that can prevent whole classes of bugs at compile time.
It also doesn't interoperate with any widely deployed crypto standards.
Yes, NaCl is a better design for cryptographic primitives, being much more limited in scope and reducing the number of primitives supported. It's great if you can control both sides of the protocol so you can use something like this, and if you can also write your own safe, secure protocol code on top of it that doesn't introduce any errors of its own.
The problem being solved by OpenSSL is much harder; for interoperability, you need to support a much wider range of standards, and they are always changing. It also supplies the full protocol stack to implement TLS, including multiple versions of TLS, rather than just the crypto primitives.
It's when you have code that is trying to do all of this, and evolve over time, with contributions from multiple people, that it's easy for such mistakes to creep in. Writing a single piece of code in C with no vulnerabilities is hard, but not impossible; maintaining such a piece of code, implementing complex and changing standards, over time, over a variety of platforms, and so on, without introducing vulnerabilities, does seem to be impossible in C.
Almost always by super-humans, though. These are people who can do in programming what most people can't do across the board, including writing C code with almost no defects.
Now, what about mere humans with average or above-average intelligence? What about us!? We can't base our expectations of what most can do with garbage like C on what the mental elite can pull off.
This is the thing - it's NOT super human. It's not even difficult. It's nothing more than learned behavior. I, for example, just had the advantage of having employers pay me to learn this stuff on the job.
Debugging a C++ template crash is difficult. Finding out why the Windows TCP stack does certain things is difficult. Writing clean and non-memory-over-writey 'C' isn't even close to those.
I have no idea what kinds of apps you're writing where it's that easy. Most apps with widespread use do complex things with code integrated in from many styles and skill levels of programmer. They have to get code cranked out quickly to meet deadlines. In the process, someone slips up with a pointer, array, whatever.
This has been true for many projects done by experienced C programmers. Hence, if not superhumans, then writing safe/secure C in such circumstances takes unusual talent. Whereas, it's effortless for people using something like Modula-3 to avoid most of those problems. By design. And checking for fools that turned it off is easy to automate: text search for "UNSAFE."
Hi nick! I don't know how to describe what it is I do. For lack of a better rubric - "embedded". Frequently, large things are involved, large things with diesel engines.
Shipping bugs is always an option. I'm pretty adamant about not doing that any more than I have to. And I work fast enough that this isn't a problem - most of that is that I know how to code & test very quickly. But there's furniture in the code to check for the usual 'C' bug suspects.
And yes - "hell is other people's code." This is why I try very very hard not to leave these sort of bugs around, build test fixtures and do other things to defend myself. I have used static analysis now and again. It's kind of nice.
Again, and again, and again, dysfunctional organizations abound, and it's possible to avoid them altogether. But first you must learn to identify them.
It would be absolutely fascinating to work with a security pro to see how I'm doing. Right now, that's not in the cards.
"For lack of a better rubric - "embedded". Frequently, large things are involved, large things with diesel engines."
That's actually pretty cool, as I'm gradually learning more about embedded systems. The problem with subversion and INFOSEC was the hardware on up. Got ASIC methodology done. Gotta learn embedded. You do control systems, CANs, and dashboards for construction hardware and 18-wheelers or something? Generators at datacenters? Ok, I'm running out of ideas as I only think about diesels so much.
Re: the rest
It looks like you're an outlier in my overall claim. The reason is that you've picked a field and specific companies that let you do pretty custom work that's mostly your code at the quality level you prefer. Good for you. It just doesn't negate anything I said about getting C right in general. Your circumstances and effort just make you an exception to the rule. :)
We'd chatted before - for some reason your handle is mnemonic :)
I'd just as soon not be specific if you don't mind. And in the past it's been wireless, data collecting, even point of sale. Longer past, databases and such.
Your last paragraph is spot on. I'm a tiny mote in the overall cost structure, there's a healthy risk aversion and lead times favor good practice. The downside is - I don't have much cover when things don't work.
I'm sympathetic to the plight of people dealing with this, and have the conceit ( probably misplaced ) that I can offer encouragement.
"We'd chatted before - for some reason your handle is mnemonic :)"
I figured, but memory loses stuff. I tried to make my email handle mnemonic for lay people and technical alike. Rare success. It was accidental if it happened for the nickpsecurity handle.
"I'd just as soon not be specific if you don't mind. And in the past it's been wireless, data collecting, even point of sale."
Wireless collection of emissions for Volkswagen [1]. Yes, I understand the need for confidentiality and keeping emails from the person giving the orders. We'll move along.
"there's a healthy risk aversion and lead times favor good practice."
Well, that's good. The lead times being an enabler for software QA is a good thing to remember. The "release early, release often" companies might be harder to get on board with QA. You don't need glacial cycles, but at least a few months to work out bugs on significant features.
Come on, if it were that easy we wouldn't have so many security bugs that can be directly traced to some memory corruption issue.
Do you think that, for example, the Firefox developers are too stupid to do it right, that they just don't care about security, or maybe that it's a harder problem than you claim it to be?
The Firefox developers are very good professionals that came to the conclusion that all the tools they devised to write memory safe C and C++ aren't enough for writing a safe browser and a new start in the form of Rust and Servo is needed.
The Firefox roadmap already has plans to incrementally add code written in Rust.
That's good news as I still use Firefox. Btw, are there any good docs or blog articles on how it integrates into C or C++ code so well? I don't often see that in new languages. Leads me to wonder if there are lessons to learn there for another project.
A lot of it is simply that Rust has an equivalent amount of runtime to C and C++, so you don't have two runtimes fighting with each other. For example, when Rust had green threads, interop was much worse, due to needing to switch to a C stack, as well as the actual initialization of the runtime itself. Without it, it's as straightforward as https://news.ycombinator.com/item?id=11622257
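To make that concrete, here's a minimal sketch of exposing a Rust function over the C ABI (the function and its name are just an illustration, nothing from Firefox or Servo):

    // Export a Rust function with C linkage; no runtime needs to be
    // initialized on either side, which is what makes the interop cheap.
    #[no_mangle]
    pub extern "C" fn checked_sum(a: u32, b: u32) -> u32 {
        // Handle overflow explicitly instead of silently wrapping.
        a.checked_add(b).unwrap_or(u32::MAX)
    }

A C caller just declares uint32_t checked_sum(uint32_t, uint32_t); and links against the compiled library.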
That's what I'm saying. I refuse to call people that build things like Firefox incompetent. Apparently, there's intrinsic difficulty to using the language both productively and carefully at the same time.
Note: Firefox is coded in C++, though. Rather than being contradictory, what you say is still true given that C++ is a safer, more organized C that they still can't use safely.
I am prepared to call anyone who ships bad code incompetent. We are ALL incompetent at some level, and perhaps on some days.
Narrow is the way. You have to look them in the eye and say "It's not ready. You can force me, but you will first provide documentary evidence of that force ( an email prints off just fine ) and by the way here's what you're risking and do try to keep up."
And I am dead serious, folks. This is what it will take. It will take each and every contributor making it his/her personal mission as if it was their Klingon honor. The culture right now doesn't even know how to ask for that.
Right now, it looks like we can barely even have a conversation about it.
I'd love to see Modula-<n>, or Rust, or anything else used. But the logistics of that are daunting.
> each and every contributor making it his/her personal mission as if it was their Klingon honor
> I'd love to see Modula-<n>, or Rust, or anything else used. But the logistics of that are daunting.
I'm curious to know why you apparently think the logistics of the first are feasible but the logistics of the latter daunting. To me there's a clear winner in logistical feasibility, and it's definitely not what you're suggesting.
I may be completely wrong, but the mental picture I have is all these Linux distros and Windows installs in the whole world, and switching them over to OpenRustSSL from OpenSSL.
I'll stick with the Klingon honor thing - which is fully intended to be utterly hyperbolic. That's just what you can do personally - really polish that thing before it escapes. I know it's painful. But you have to tell your ego to sit that one out.
Well, at least Microsoft is doing their little bit by making C++ and .NET Native the way to go forward in the UWP world, with driver verification tools derived from theorem provers (Z3).
Apple, by pushing Swift down the throats of developers that wish to target their platform (last year, only Objective-C specific talks had Objective-C code on their slides).
Google, by making the Android NDK so anemic that unless one really needs to use it for portable code or that extra performance step missing from ART, no one does it.
But I do concede that this will take a few decades to sort out, even if the IT world suddenly decided to go Ada/Spark/Rust today.
See my comment to the parent with the Astrobe link. That company put Oberon and an IDE into embedded systems. Aside from integration with C libraries, it seems that a combo of rapid compiles, better safety, and better interface checks would lead to lower development cost and easier maintenance. Wait, we already know that with the likes of Go: a modernized Oberon. :)
"We are ALL incompetent at some level, and perhaps on some days."
I lost my memory in an accident. I can relate to the statement. Last time I tried to code something I cheated by defaulting to a subset of FreeBASIC with Cleanroom-style development. It worked the first time it ran. I didn't feel very competent, though, as I saw plenty of room for improvement. :)
""It's not ready. You can force me, but you will first provide documentary evidence of that force ( an email prints off just fine ) and by the way here's what you're risking and do try to keep up.""
Hell yeah! I've got piles of THAT. You can believe it. I try not to even do it coercively. More like telling them it's going to be a problem we can avoid together. It will be a mess if it happens. If forced, I want it in writing so the source of the problem is clear. Otherwise, just let me do it the right way, as cost-effectively as I can. As usual.
Something like that... can't remember...
"Right now, it looks like we can barely even have a conversation about it."
We can. It's just that the conversation has shifted. It was originally about how likely it is that a person could build arbitrary C applications, probably sourcing 3rd-party components in haste, without severe bugs coming from language weaknesses. That's kind of the default outside some jobs like yours. I was arguing it wasn't going to happen outside super-humans or at least high talent.
This tangent is about what a C programmer with a reasonable scope, or a chance of high-quality code, can do in certain situations to make quality happen. We can have that conversation, esp. as each embedded person develops tricks and tooling to eliminate problems. I'd be interested in any resources you have in terms of books, guides, or tools on robust C or embedded systems. I keep lists to pass along to people here and elsewhere that need them, along with a small chance I might use them.
"Id love to see Modula-<n>, or Rust, or anything else used. But the logistics of that are daunting."
Ada and SPARK already have serious, long-term deployment. Link below that's good even if you don't use them, as it shows many problem areas in systems programming along with their solutions. Many can be emulated in C with effort. The other is a port of Oberon to embedded systems with ARM Cortex MCUs, and Oberon's benefits of course. I'd be interested in what you think of it at a glance, or with a hands-on trial, to assess whether more tools like that should be built. One person even put OCaml on PICs. Have fun with these. :)
I am very sorry to hear about your accident. That's terrible.
In your case, I'm sure it's in an advisory capacity/collegial and not an "or else." Just saying - don't ship until you're sure. The cost-gods favor it.
I should probably think about compiling a set of book-style resources to recommend. Seems a bit pretentious, though.
I do think the embedded community should embrace better tools. But I dunno. This will be difficult - sort of a "science only improves one funeral at a time" thing...
> I do think the embedded community should embrace better tools. But I dunno. This will be difficult - sort of a "science only improves one funeral at a time" thing...
To add to the list from nickpsecurity, there are other vendors that keep Basic and Pascal compilers alive all the way down to PICs.
"I am very sorry to hear about your accident. That's terrible."
Appreciate it. Totally sucks. Oh well. I'm at least helpful here and elsewhere if not fully operational in INFOSEC market. Working in that direction.
"I should probably think about compiling a set of book-style resources to recommend. Seems a bit pretientious, though."
Ganssle's Embedded Muse has been my main source. He's published a few of my recommendations like I/O coprocessors (or MCU cores) to aid real-time by soaking up interrupts. A few others suggested some books I got. One person made a nice list of blog posts covering various entries. It's really scattered.
So, I wouldn't say pretentious if you were merely sharing resources that helped you in case they help others. Along with what specifically was helpful about each one.
They're not incompetent, they're just developing a web browser. Which means using all kinds of gnarly bleeding-edge optimization techniques that introduce massive complexity just to get Twitter to display 140-byte messages with reasonable performance, not to mention dozens of nasty parsers for nasty data formats. Basically, what I'm saying is that the web sucks. I think web browsers are probably more complex than operating systems at this point.
I totally agree. That's the kind of complexity and BS that developers often have to deal with which I'm referring to. Hard to imagine an unsafe by default language not breaking eventually.
> is pretty good proof that nobody can write secure C.
I came to that same conclusion. I used to almost romanticize C programmers years ago, but years and years of reading things like this have changed my mind.
That doesn't look like a C specific error to me. In the release notes CVE-2016-2107 is listed as a padding oracle attack. Padding oracles are a logic/design error that is not easily caught by any language feature.
CVE-2016-2108 on the other hand looks like a typical C style memory corruption bug.
Maybe someone should forward that to the OpenSSL team and suggest they pay Galois to crank out a bunch of encoders/decoders for anything requiring that in OpenSSL. Or the LANGSEC people behind the Hammer toolkit. Or even the Cap'n Proto guy.
Seriously, the number of preventable failures in this software, esp memory, is a strong indicator of why you should never use it.
Lol. I thought they've had money all this time and not spent it on stuff that works. Meanwhile, LibreSSL did more in a few weeks for their code quality than its owners did in a year. Why would I give them money expecting something different?
The right course, if we're talking Galois, is to ask Galois to donate some time or tooling to improve critical aspects of a widely-deployed library. They already make money with grants and such for code they often open-source. They might even find the deliverables useful for themselves. So the OpenSSL team can feel free to give them a call.
ASN.1 parsing in implementations of several different standards has had this class of issue for well over a decade. IP phone and software stack vulnerabilities, for example in Asterisk, specifically in ASN.1 handling, have probably been used by our intelligence community for a long time. The safe implementations are probably run in virtual-machine languages.
I think we should all agree to move away from ASN.1 as quickly as possible. Many implementations don't include some of the wacky features (like recursive serialisation of structures), because nobody needs them and they cause bugs. But there is so much more weirdness in ASN.1 that it would honestly be much nicer if we used JSON (as an example of the other extreme). Maybe there's a nice binary, typed format that doesn't resemble the 80s that we should be using.
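To sketch the difference a bounds-checked language makes for this kind of parsing (illustrative Rust, not from any real ASN.1 or protobuf implementation): a declared length that overruns the buffer gets rejected up front instead of being read anyway.

    // Toy length-prefixed (TLV-style) record parser; all names invented.
    // The slice APIs make it impossible to read past the end of `input`,
    // so a lying length byte yields None rather than an out-of-bounds read.
    fn read_tlv(input: &[u8]) -> Option<(u8, &[u8], &[u8])> {
        let (&tag, rest) = input.split_first()?;
        let (&len, rest) = rest.split_first()?;
        if rest.len() < len as usize {
            return None; // declared length exceeds the buffer: reject it
        }
        let (value, remaining) = rest.split_at(len as usize);
        Some((tag, value, remaining))
    }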
This is why the SPKI guys, seventeen years ago, made one of their design goals not using ASN.1: https://tools.ietf.org/html/rfc2692 states 'No library code should be required for the packing or parsing of SPKI certificates. In particular, ASN.1 is not to be used.'
They came up with a simple, beautiful representation for certificates. They also came up with a simple, logical, understandable way to think about what certificates can and cannot do.
The world ignored them. RFCs 2692 & 2693 stand as a remarkable example of what could have been.
The cool thing about protobuf is that it allows automated parser generation (that isn't some horrific hack built on bison). The downside is that if there's a bug in someone else's code, it becomes a bug in your code. :D
OpenSSL announced several issues today that also affect LibreSSL.
- Memory corruption in the ASN.1 encoder (CVE-2016-2108)
- Padding oracle in AES-NI CBC MAC check (CVE-2016-2107)
- EVP_EncodeUpdate overflow (CVE-2016-2105)
- EVP_EncryptUpdate overflow (CVE-2016-2106)
- ASN.1 BIO excessive memory allocation (CVE-2016-2109)
Thanks to OpenSSL for providing information and patches.
Saw a tweet from one of the OpenBSD folks saying LibreSSL was also affected by these issues and fixes would be available today. Sorry, not on my laptop at the moment so I can't post a link.
ETA: Source code [0] and binary [1] patches (via M-Tier) are available for OpenBSD.
My understanding of the padding oracle vulnerability so far, mainly from the patch [0]: it's possible to make a "valid" message that has the pad length byte higher than maxpad (total_length - digest_length - 1) [1]. If you make it high enough, I think you'll push the offset used in the constant time comparison masks enough below 0 to entirely skip the MAC check [2]. But then your entire plaintext will have to match the pad byte to pass the check, and I think you can use that as a padding oracle.
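To sketch that arithmetic (variable names are mine, not OpenSSL's): once the claimed pad exceeds maxpad, the derived offset goes negative, and code that trusts it ends up comparing no MAC bytes at all.

    // Illustrative only; this mirrors the shape of the bug, not the real code.
    fn mac_compare_start(total_len: i32, digest_len: i32, claimed_pad: i32) -> Option<i32> {
        let maxpad = total_len - digest_len - 1;
        if claimed_pad > maxpad {
            return None; // the rejection the vulnerable path effectively skipped
        }
        Some(total_len - digest_len - claimed_pad - 1)
    }

    fn main() {
        // A 64-byte record with a 32-byte MAC and a claimed 60-byte pad:
        // without the maxpad check the start would be 64 - 32 - 60 - 1 = -29,
        // i.e. the masked comparison would begin "before" the buffer.
        assert_eq!(mac_compare_start(64, 32, 60), None);
    }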
In other words, constant time programming, which by definition means that you have to behave normally even if your offsets go negative, is HARD. (And when you don't put comments ANYWHERE, it's even harder.)
For the record, there is nothing Go or Rust could have done here. The bug is caused by having to write code that runs in constant time, by employing masks and behaving in exactly the same way irrespective of the pad value. If you want to blame the design of something, blame Mac-then-Encrypt in TLS 1.0.
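For readers unfamiliar with the style in question, here's a minimal sketch (mine, not OpenSSL's) of mask-based constant-time comparison; the point is that no branch depends on secret data, so timing reveals nothing about where a mismatch occurs:

    // Constant-time equality for equal-length byte slices. Every byte is
    // visited no matter where a difference appears; the bug under discussion
    // lives in code written in exactly this branch-free style.
    fn ct_eq(a: &[u8], b: &[u8]) -> bool {
        assert_eq!(a.len(), b.len()); // lengths are public, so branching here is fine
        let mut diff: u8 = 0;
        for (x, y) in a.iter().zip(b.iter()) {
            diff |= x ^ y; // accumulate differences without data-dependent branches
        }
        diff == 0
    }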
EDIT: Interestingly, this was found with TLS-attacker [4], a new framework to analyze, test and attack TLS libraries. Finding it was probably "just" a matter of sending a plaintext made entirely of the same right byte value, and noticing that the MAC check passes when it should not. However, so far we haven't had any tool or test suite (that I'm aware of) to perform this kind of check.
IMPACT [EDIT]: to sum up, if a client uses AES-CBC to connect to a server with AES-NI support, a MitM can recover at least 16 bytes of anything it can get the client to send repeatedly, together with attacker-controlled data (think Cookies or such, using Javascript cross-origin requests).
That's a second-order padding attack, in the same sense that an integer overflow most often creates a second-order buffer overflow: a first bug transitions your program to a state where a second bug --- very exploitable, and one the program tried to guard against --- is revealed.
What's interesting about crypto to me is the prospect that every common crypto flaw has second- and third- order variants that we will not find out about for many years, the same way we've barely now got a grip on the interaction between C integers and buffer counting.
Does anyone know if there is an SSL specification somewhere? One could then start to write tests at various levels to ensure compliance with the specification. This test suite could then be considered a compatibility test suite as well.
These are all errors that would be impossible in a language with, oh, bounds checking.
There may be ranges of security-critical code that make sense to write in C, for example ensuring that algorithms run in constant time, although with modern compilers I have difficulty trusting that the compiler won't accidentally optimize away the constant-time-ness.
Surely ASN.1 encoding isn't part of that range, however?
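As a minimal illustration of what "impossible" means here (hypothetical Rust, nothing OpenSSL-derived): the bad index is caught at the boundary instead of becoming a silent out-of-bounds read.

    fn main() {
        let buf = [0u8; 4];
        let i = 7; // stand-in for an attacker-influenced index
        // buf[i] would panic (abort the operation) rather than read past
        // the array the way the equivalent C indexing silently would.
        match buf.get(i) {
            Some(byte) => println!("in bounds: {}", byte),
            None => println!("index {} rejected at run time", i),
        }
    }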
"The first principle was security: The principle that every syntactically incorrect program should be
rejected by the compiler and that every syntactically correct program should give a result or an
error message that was predictable and comprehensible in terms of the source language program
itself. Thus no core dumps should ever be necessary. It was logically impossible for any source
language program to cause the computer to run wild, either at compile time or at run time. A
consequence of this principle is that every occurrence of every subscript of every subscripted
variable was on every occasion checked at run time against both the upper and the lower declared
bounds of the array. Many years later we asked our customers whether they wished us to provide
an option to switch off these checks in the interests of efficiency on production runs.
Unanimously, they urged us not to - they already knew how frequently subscript errors occur on
production runs where failure to detect them could be disastrous. I note with fear and horror that
even in 1980, language designers and users have not learned this lesson. In any respectable branch
of engineering, failure to observe such elementary precautions would have long been against the
law."
Were you alluding to how it allows worst-case behavior to throw your system way off in unpredictable ways to enable higher performance on average case? And thus contradicting his claim?
Well, he did design that before making that claim in the 80's. Apparently, he learned his lesson. :)
I was thinking of it more playfully – what would this legislation look like, how would it affect productivity of engineers, it's a curiosity, I don't actually have any stake in it happening.
> These are all errors that would be impossible in a language with, oh, bounds checking.
How would that prevent the timing leak that caused Lucky 13 or the padding oracle that resulted from not erroring when the message was too short?
The problem with CVE-2016-2107 (and the vulns that precede it) stems from TLS not using Encrypt-Then-MAC (checking the MAC before checking the padding is less error-prone than checking the padding before the MAC). You can design a cryptographically doomed protocol in any language, even memory-safe ones.
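To sketch the ordering difference (helper names invented; this is the idea, not a real TLS implementation): with Encrypt-then-MAC, authentication happens before any padding is inspected, so there is no padding state left to leak.

    // Encrypt-then-MAC receive path: verify first, touch plaintext second.
    fn receive_etm(
        ciphertext: &[u8],
        tag: &[u8],
        verify_mac: impl Fn(&[u8], &[u8]) -> bool,
        decrypt: impl Fn(&[u8]) -> Vec<u8>,
    ) -> Option<Vec<u8>> {
        if !verify_mac(ciphertext, tag) {
            return None; // single failure mode, before padding is examined
        }
        let padded = decrypt(ciphertext);
        // Any padding error past this point can't be an oracle: the record
        // was already authenticated, so an attacker can't submit variants.
        Some(padded)
    }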
That's nice, but irrelevant. The quote I was replying to said, "These are all errors that would be impossible in a language with, oh, bounds checking."
Them: All X is Y.
Me: But here's an X that isn't Y.
That's my point. Blanket statements about the superiority/inferiority of programming languages aren't useful here.
A compiler or interpreter bug is much different, in terms of attack surface, from a programming language where every single string and array access might cause memory corruption.
Bounds checking would only solve half of the vulnerabilities that break the security of every computer on the Internet, not all of them, so I guess it's not worth bothering with?
Off-topic but given how many programs link against OpenSSL and that in some scenarios statically linked binaries are preferable, I've been wondering if there's a Linux distro that works like OCaml's opam, namely rebuilding everything that depends on something that just got updated. Is there such a distro?
It's not a great bug, but I think it would require a very particular set of circumstances to arise:
> ... only certificates with valid signatures trigger ASN.1 re-encoding and hence the bug. Specifically, since OpenSSL's default TLS X509 chain verification code verifies the certificate chain from root to leaf, TLS handshakes could only be targeted with valid certificates issued by trusted Certification Authorities.
However it does seem like a great exploit method for malware, using legit signed certificates that allow overwriting memory just by parsing the cert.
Again? Am I the only one calling this the rule, not the exception at this point?
This is where I want the Rust guys (or D or whatever) to be. "Hey guys. A drop-in Libre/OpenSSL replacement, written in Rust. Guaranteed to be free of all those security advisories you keep getting on a monthly basis these days."
That's when people will start taking C-replacements like Rust seriously. Step up your game, guys :)
Sorry for the noob question, but how do you ensure a Debian installation is always updated with the latest security patches? Do you just run `apt-get upgrade` every day? Isn't there a risk of breaking things?
If you're running stable, you should be able to upgrade every day, and nothing will break. That's the purpose of stable; it really only adds security fixes.