No. Do not write your webapp in C. I'm serious here. Write it in Python, write it in TCL, write it in Lisp. But write it in an environment that is at least semi-managed.
FFIs are pretty good. You can call pledge from your app. But don't write your app in C. If you're doing anything serious, it's a bad idea.
If you're considering writing it in C, you probably have some reason why you don't want to use a scripting language. To that end, I'd suggest using something like Rust instead of C.
Maybe I'm writing it in C because I like the language, enjoy using it and am good with it? And I know it works everywhere I want it, even years later? I hate to be told what I should or shouldn't use.
Then you shouldn't like BCHS or pledge either, because the author likewise tells you that you should use them and not other things. Regardless of that, this page is all about making web apps secure. The BCHS page says something similar. Yet, within all this, they recommend a language that goes out of its way to make apps insecure, when alternatives exist that don't, some of which have big communities behind them these days.
Recommending against C for web apps is wise in this context.
> Then you shouldn't like BCHS or pledge because the author likewise tells you that you should use it and not other things.
What a non sequitur. That I do not like being told what to (not) use has no bearing whatsoever on whether I can like or dislike the thing that someone somewhere is advocating for or against.
That was partly my point. You bring up that you don't like being told what you should or shouldn't use. Yet you use a product doing exactly that, of your own free will. You probably started to after someone shared a link to affect your decision-making. So it's a pointless counter in reply to the up-thread recommendations.
Do we speak a different language? Because again I just do not understand how any of what you're saying follows.
Yes, I use C. Is the programming language telling me to use this or that? No, I am telling it to do this or that.
And no, I am not using it because somebody told me to. I am using it because (as I said above), I like it, I enjoy using it, I am good with it, and feel like it's a good fit for the values I hold. Opinions shared by other people may have affected my perception, but I am not using the language because somebody tells me to.
Ok. Let's assume all that in light of your previous reply. Do you like your web apps to be secure? Or do you value the personal fulfillment of using C over that? A choice between those two is likely at the bottom of this. "Avoid C if security is a priority" still stands, with you making a different choice for different priorities.
Yes, I like web apps to be secure, and I believe I can deliver it. Especially when I use C.
But if it really were the case that it is literally impossible to write secure code in C (you'd have to prove it), then my web application should be the least of anyone's worries. Because they are using millions of lines of kernel and library code written in C or C++ to access my web application, which again is hosted on millions of lines of kernel, daemon and library code (including all the crypto) written in C or C++.
And the hypothetical alternative language would likely be implemented in C or C++. And it would be translated to assembly or machine code, a horribly unsafe language. If bugs are inevitable, then no language is safe.
>Because they are using millions of lines of kernel and library code written in C or C++ to access my web application, which again is hosted on millions of lines of kernel, daemon and library code (including all the crypto) written in C or C++.
And which were proven to be insecure time and time again.
>And the hypothetical alternative language would likely be implemented in C or C++.
Isn't Rust self-hosted now?
>And it would be translated to assembly or machine code, a horribly unsafe language.
Not on a Lisp Machine! ;)
But you are right. Not using C only suppresses one class of vulnerabilities, not all of them (and not the most trivial ones to exploit). Unless you are prepared to go all the way and do formal verification, model checking, and NASA-style development, your application will be insecure, and it will be hacked. The question is: what is your plan to recover when that happens?
Rust has been self-hosted for a long time. But that doesn't really matter anyway, because the language used to write the compiler has no relevance[1] to vulnerabilities in the compiled code. What is more important is the language the runtime is written in, and Rust's runtime is written almost entirely in Rust (offhand the only parts I know of that aren't are compiler-rt and libbacktrace).
[1] Reflections on Trusting Trust isn't relevant here, we're not considering someone trying to attack the compiler.
>I believe I can deliver it. Especially when I use C.
No, you can't. It's been shown constantly. Buffer overflows. UB. Stack overflow. Dangling pointers. Memory leaks. It will happen.
>it is literally impossible to write secure code in C (you'd have to prove it)
Nobody can prove that. But I can show that it's significantly more error prone. Most other languages have some protections against stack overflow or invalid memory access. Not to mention UB.
>my web application should be the least of anyone's worries.
Not true. Of all the parts of the stack, your webapp is likely the least tested, least used, and thus the most likely to have the most serious bugs. Not to mention that, unlike most of the software you describe, your webapp is likely handling direct user input, the most dangerous location for a bug. And let's not even get into what happens if you're storing sensitive user data.
>And the hypothetical alternative language would likely be implemented in C or C++. And it would be translated to assembly or machine code, a horribly unsafe language. If bugs are inevitable, then no language is safe.
Any backend that generates C can generate near-perfect C code. As for runtimes and compilers written in C, they'll have bugs. All programs have bugs. That's why C sucks. But for the most part, you only have to worry about three issues in the compiler: 1) a built-in function has a bug in it, and doesn't work right. This can happen in any language, including C. 2) the runtime has an error in it that allows user input to cause some kind of memory corruption bug, and exploit it. This is bad, but those codebases are heavily scrutinized, and thus are less likely to contain bugs than the hand-rolled versions that you'd write. 3) there was some kind of GC error. This is probably what you're thinking of.
Two men are walking in the woods. They come across a bear, which starts moving towards them to attack. The first man starts tying his shoes, getting ready to run. The second man asks him, "What are you doing? You can't outrun a bear." And the first man says, "I don't need to outrun the bear. I just need to outrun you."
The same is true of the GC. The GC doesn't have to be perfect, it just has to be better than you. And statistically, you suck at what the GC does.
>If bugs are inevitable, then no language is safe.
Of course no language is safe. But some are safer than others.
"Two men are walking in the woods. They come across a bear, which starts moving towards them to attack. The first man starts tying his shoes, getting ready to run. The second man asks him, "What are you doing? You can't outrun a bear." And the first man says, "I don't need to outrun the bear. I just need to outrun you."
The same is true of the GC. The GC doesn't have to be perfect, it just has to be better than you. And statistically, you suck at what the GC does.
Aka how to sail safely Without a Paddle. ;) I like the analogy and the use of the statistical argument.
Someone with, say, 10-20 years of experience in a language will write a more secure program than someone who just started, in any language... And even if you program in a "safe" language, you still make bugs, and all those bugs passed the compiler/parser, so your language obviously didn't help you there. There is a lot of code written in C in the wild, which means there is also a lot of bad code.
There are business logic bugs, and there are memory corruption bugs. One of those leads to RCEs a lot more often than the other. One of those is a lot easier for the compiler to prevent than the other.
Yes, but there are ways to prevent most memory corruption bugs, even in C, if you're using an OS with a non-terrible malloc implementation. The GNU malloc that most people know is a terrible antiquity that should have been replaced decades ago -- mallocs like OpenBSD's make a lot of memory corruption bugs a lot harder to exploit.
Someone who capably sets out to write their web application back-end in C most likely is aware of any security issues. If they exist in the finished product, their presence is going to be due to laziness, not ignorance.
You don't get points for internet pedantry. Especially when it should be obvious that you're choosing to use a definition of "significant" that doesn't match what I was using in my comment.
virtually,
adverb
[ as submodifier ] nearly; almost: the disease destroyed virtually all the vineyards in Orange County | the college became virtually bankrupt.
Not the person you replied to, but I'm actually quite conflicted on this count. I enjoy C, both aesthetically and for its performance, yet I care a lot about security, so I feel obligated to avoid C for anything important. I'm a big fan of Rust, but it's nowhere near replicating what I like about C.
Then maybe you need to do a better job of shielding yourself from other people's opinions? It sounds like you might be a little too sensitive and temperamental to engage in mutually beneficial discourse. Please remove yourself.
Another choice is SaferCPlusPlus[1] (optionally with a framework like Wt[2]).
Using a higher level language (than C) arguably has the drawback of "hiding complexity" in exchange for better productivity and scalability. But languages like Rust and C++ (with SaferCPlusPlus) retain much of the intrinsic performance and memory efficiency of C, without introducing an additional runtime layer. That may allow you to employ extra (diverse) layers of sandboxing/jailing and other security/correctness solutions.
Well... there is at least some sense in writing a webapp in C. If you rely on operating system primitives for security (processes, Unix filesystem permissions, seccomp/pledge/capsicum) instead of language features (bounds checking, garbage collection, etc.) then just maybe you could get performance gains while still being secure.
But in reality writing applications in a hosted environment/virtual machine is more performant than writing it in C and locking it down with operating system features. And of course more secure.
The author's efforts would be better spent on improving the performance of OS-based security (processes, filesystems), than attempting to use that security as it stands. There are all kinds of tricks that could be implemented to improve things...
If you are using CGI, then you probably already don't care about performance.
On top of that, the CPU throughput overhead of using a compiled managed language is typically less than a factor of 2, which, combined with the fact that the majority of web applications are not compute bound, means that the number of webapps it would make any sense to write in C is nearly zero.
Pretty much all pithy advice (such as "Don't write your webapps in C") has an implicit exception "Unless you fully understand why I made this recommendation and have good reasons for why they don't apply."
That exception is not made explicit because 1) It would make the advice less pithy and 2) Then those who would most benefit from the advice convince themselves that it doesn't apply to them.
"If you are using CGI, then you probably already don't care about performance."
So...I think you're missing a few pieces of the puzzle with this statement. The statement is a truism that is right almost always, but it has exceptions, and this is one of those exceptions.
CGI is slow when startup time for your app is slow, because of a VM, because of the library loading overhead, because of environment setup, because of something that isn't really the fault of CGI per se. CGI can be fast, or at least fast enough, if startup time for your app is very fast. An app written in C, statically linked, could potentially start and respond faster than many dynamic language apps could even when running in an app server. C is fast. C is small. C is also particularly error-prone, would be a little verbose for web apps, and does not have a good ecosystem of support libraries for web apps, but even running without an app server, in plain CGI mode, it could be fast enough for most deployments.
The CGI interface is not why CGI apps are slow. CGI apps are slow because startup of many modern language environments is slow. C is a language that doesn't have to be slow in that context. (Though, it could be faster still by using an FCGI or other app server model and having your C web app running all the time, so it just has to answer requests over a socket or whatever...though async programming in C is also hard to get right for inexperienced programmers.)
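To make that concrete, here is a rough sketch (not from the thread) of a minimal CGI responder in C. The pledge(2) call is OpenBSD-specific and optional, and the handler is deliberately trivial:

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	/* OpenBSD only: nothing but basic stdio from here on. */
	if (pledge("stdio", NULL) == -1)
		err(1, "pledge");

	const char *qs = getenv("QUERY_STRING");

	/* CGI: emit the header block, a blank line, then the body. */
	printf("Content-Type: text/plain\r\n\r\n");
	printf("query: %s\n", qs != NULL ? qs : "(none)");
	return 0;
}

A statically linked binary like this has essentially no startup cost beyond exec itself, which is the point being made above.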
Huh. I wonder what the start time is on the language I'm using for CGI work. (I'm using CHICKEN, which is a compiled Scheme R5RS implementation with numerous extensions).
Probably fine for some things, since it is compiled and, as I understand it, pretty fast. I don't know a lot about the internals of CHICKEN, but since it's a Scheme, wouldn't it need some kind of VM for macros and memory management and such? That'd impose some kind of startup overhead that C wouldn't have. But, then again, Perl can start fast enough for some things (though most Perl web app developers use some sort of app server, because they rely on a bunch of libraries that would increase startup time remarkably if it had to happen for every request).
Do you write command line apps with CHICKEN? Are they super fast to start, like instant (comparable to something written in C or small Perl scripts)? If so, then CGI could probably be fast enough for some use cases. If you can perceive the startup time, then no, it'd be kinda sucky when used with CGI.
For what it's worth, SBCL (which is essentially unrelated to CHICKEN in every way other than both being natively compiled dynamic languages) can start up a minimal runtime and exit in 3.8ms on my system. It goes up to about 5ms for a moderately sized application (which includes some dynamically linked libraries that must be loaded).
on the same system:
/bin/true starts up in 0.5ms
/usr/bin/env perl starts up in 2.4ms
/usr/bin/perl starts up in 1.8ms
If you want to run your own benchmarks, just use time and something like this small program (I ran with 1000 iterations):
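The snippet itself isn't reproduced above; as a hypothetical stand-in, a tiny spawner along these lines, run under time(1), measures roughly the same thing (the program name and usage here are made up):

/* Run a command N times, waiting for each, so that
 * `time ./spawn 1000 /bin/true` shows per-process startup cost. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	if (argc < 3) {
		fprintf(stderr, "usage: %s count command [args...]\n", argv[0]);
		return 1;
	}
	long n = strtol(argv[1], NULL, 10);
	for (long i = 0; i < n; i++) {
		pid_t pid = fork();
		if (pid == 0) {
			execv(argv[2], &argv[2]);
			_exit(127);	/* exec failed */
		}
		waitpid(pid, NULL, 0);
	}
	return 0;
}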
All web apps are CPU bound. At a certain point, your backend is done and it's up to the web app to compute a return value (similarly for the part that calculates the backend request). That part of the request is CPU bound.
Not too sure about that -- Erlang/OTP-based web applications have very good performance (and transparent scaling) with a fraction of the code of a hand-rolled C-based web application. I honestly think web applications are a place where everything but assembly is better than C.
The author dismisses capsicum without much thought. Really, any application you've already restructured to be sandboxed under pledge(2) is trivial to capsicumize.
That is probably the reason for the quick uptake of pledge in OpenBSD's tree. A lot of their software had already been restructured or rewritten for privilege separation. But looking at a few capsicum diffs in the FreeBSD codebase, I'm not sure I'd call those trivial. You can do more with capsicum, so there is more code to deal with those abilities. But even simple cases hit 20+ lines of code with multiple failure states to deal with. Compared to the usual two-line pledge diffs, that's quite a bit more work to do.
> Compared to the usual two line pledge diffs that's quite a bit more work to do.
I don't think that's an apples-to-apples comparison. OpenBSD has just already done the restructuring as a separate commit; in FreeBSD you see it all at the same time, so it looks bigger. Capsicum for simple stuff is only a few lines, especially with the "helper" subroutines.
In some cases we can get more functionality with the same restriction, or more restriction than OpenBSD due to more specific constraints. So there's more code, but it also does more. It can be a trade-off.
That statement was comparing capsicum being applied to programs that were also pledged in around two lines of code. A few of the programs listed on the wiki as examples of capsicum applications originated from OpenBSD, so they were primed and ready to go. Even those best cases are much more involved than pledge.
To use the example of the unix tr utility, the change to use pledge required the standard two line diff.
if (pledge("stdio", NULL) == -1)
err(1, "pledge");
tr.c was one of the earliest programs pledged (back when pledge was called tame). The original diff was a one-liner, before they started doing the "pledge or error out" pattern.
+ cap_rights_init(&rights, CAP_FSTAT, CAP_IOCTL, CAP_READ);
+ if (cap_rights_limit(STDIN_FILENO, &rights) < 0 && errno != ENOSYS)
+ err(1, "unable to limit rights for stdin");
+ cap_rights_init(&rights, CAP_FSTAT, CAP_IOCTL, CAP_WRITE);
+ if (cap_rights_limit(STDOUT_FILENO, &rights) < 0 && errno != ENOSYS)
+ err(1, "unable to limit rights for stdout");
+ if (cap_rights_limit(STDERR_FILENO, &rights) < 0 && errno != ENOSYS)
+ err(1, "unable to limit rights for stderr");
+
+ /* Required for isatty(3). */
+ cmd = TIOCGETA;
+ if (cap_ioctls_limit(STDIN_FILENO, &cmd, 1) < 0 && errno != ENOSYS)
+ err(1, "unable to limit ioctls for stdin");
+ if (cap_ioctls_limit(STDOUT_FILENO, &cmd, 1) < 0 && errno != ENOSYS)
+ err(1, "unable to limit ioctls for stdout");
+ if (cap_ioctls_limit(STDERR_FILENO, &cmd, 1) < 0 && errno != ENOSYS)
+ err(1, "unable to limit ioctls for stderr");
+
+ if (cap_enter() < 0 && errno != ENOSYS)
+ err(1, "unable to enter capability mode");
No one would dispute that capsicum is more capable, but it is also significantly more complex. Pledge trades finer control over capabilities for the ability to have a "work or die" usage model. Capsicum requires that you be aware of all the potential failure cases and account for them.
I'm confused about what exactly 'sandboxing' entails here. Seems to be limiting syscalls and file system interactions. But what about process heap and cpu usage? Are these handled by either seccomp or capsicum?
No, capsicum is a security sandbox. Resource usage is not governed by capsicum. For that you can use traditional ulimits or RCTL for finer-grained control (similar to Linux cgroups).
>However, this is definitely something in the crosshairs of ksql(3): forking a process, like kcgi(3) does, that handles the database I/O and communicates with the master over pipes.
If they do this, then they can just use seccomp's SECCOMP_SET_MODE_STRICT, which is the first thing described in the seccomp man page and in the Wikipedia article, and which is also the original mode of seccomp to become available.
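For reference, strict mode needs no filter program at all. A minimal sketch, assuming it is called in the forked database-I/O child once its pipes are set up (Linux-specific):

#include <err.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>

/* After this call, only read(2), write(2), _exit(2), and sigreturn(2)
 * are permitted; any other syscall kills the process with SIGKILL. */
static void
lockdown_strict(void)
{
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) == -1)
		err(1, "prctl(PR_SET_SECCOMP)");
}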
I believe capsicum would likewise be trivial to use: You can just immediately lock things down completely. (Perhaps just cap_enter() is sufficient?)
If a program just needs to communicate over open sockets, cap_enter() will restrict the program from opening more sockets.
But the existing fds aren't restricted. You can use caph_limit_stream(fd, CAPH_READ/CAPH_WRITE) to restrict existing sockets down to only what is needed for stdio routines.
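A sketch of that pattern on FreeBSD, using the capsicum_helpers(3) wrappers; `sock` stands in for whatever descriptor the process opened before sandboxing:

#include <sys/capsicum.h>
#include <capsicum_helpers.h>
#include <err.h>

/* Limit the already-open socket to read/write, limit the standard fds
 * to plain stdio use, then enter capability mode so no new files or
 * sockets can be opened. */
static void
sandbox_fds(int sock)
{
	if (caph_limit_stream(sock, CAPH_READ | CAPH_WRITE) == -1)
		err(1, "caph_limit_stream");
	if (caph_limit_stdio() == -1)
		err(1, "caph_limit_stdio");
	if (caph_enter() == -1)
		err(1, "caph_enter");
}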
If you detect something that can not (well, should not) ever happen, it means that the state of the process is no longer within its designed bounds.
The process's output can no longer be trusted, so it should stop processing. Detecting something you never expected to happen is, almost tautologically, an admission that the software doesn't know how to recover its state.
This, of course, relies on an architecture where another process will take care of making sure the error is bubbled up to the place it needs to go, pulling the failsafe, etc. I will grant that some environments and/or designs make this easier than others (e.g. Erlang's error handling is all about letting processes die).
There's a school of thought in security that believes you should hard-fail at any attempt to violate an invariant. I believe the Linux grsec patches do this. They will just crash the kernel if they see something at runtime that they don't like.
Indeed; if you have the Active Kernel Exploit Response feature enabled in grsec, and a root process triggers a kernel OOPS, it will panic() the kernel.
You can combine this with the panic= kernel command-line argument to set a reboot delay, but for headed (rather than headless) boxes it may be more desirable to keep the panic message up to see what process triggered it.
The author writes from the perspective of writing a CGI-based web app. Here, every request becomes a new OS process, so killing the application is killing the request.
If an application is making a request it hasn't marked as one it is going to make, then chances are there is a big problem that needs to be dealt with. In such a situation, crashing as soon as possible means there's a better chance that all of the relevant data is there to pore through. If you just deny the call, the failure could surface many hundreds of function calls later, possibly with the relevant data already freed, and your job of picking through your program's entrails becomes all the more difficult.
Shameless plug, inspired by pledge(2), if you're a Linux rather than BSD person: how to use elementary virus writing techniques to sandbox programs under Linux using seccomp-bpf: https://tia.mat.br/posts/2016/11/08/infect_to_protect.html
The difference between pledge and seccomp is not just a matter of fine-grainedness. There's some of that, but I'd say there's also a difference in philosophy: sandboxing resources versus sandboxing kernel attack surfaces.
At one end of the spectrum is the macOS sandbox, which pretty much only restricts how the process is allowed to interact with the outside world. It's not actually simplistic like the article claims; that's just the public API. If you look at the raw sandbox profiles, which Apple writes for daemons and such, it's quite fine-grained: sandbox profiles can specify which paths can be read and which can be written (with an arbitrary number of allow/deny rules, each of which can match on an arbitrary regular expression), which network ports can be opened or listened to, which preferences can be read or written, which IPC services can be accessed, and so on... there are a ton of categories. If you're on macOS, look in /usr/share/sandbox and /System/Library/Sandbox/Profiles to see what they look like. However, sandboxing only occurs when a given syscall's kernel implementation explicitly calls for it, and even the most locked-down process can access hundreds of syscalls and Mach kernel IPC calls, which based on their intended purpose should be harmless. And most of them really are harmless, but all it takes is a small oversight in one of them for an attacker to potentially corrupt memory in the kernel and gain control over the system. *
At the other end is Linux seccomp, which is designed to allow minimizing kernel attack surface. Chrome's renderer processes only get access to a tiny set of syscalls, and even the arguments to the syscalls are locked down, corresponding as closely as possible to what the process actually uses. (This is possible in part because Chrome has an unsandboxed master process that handles things like reading files for it. The renderer sandbox doesn't even allow open(2).) Thus Chrome is protected from most vulnerabilities in the Linux kernel, because there's just no way to get to the buggy code in question from within the sandbox. This isn't perfect: I've found multiple Linux kernel bugs (only one exploitable so far) that are accessible from the sandbox. And of course it doesn't make Chrome immune to privilege escalation, since attacking the kernel directly isn't the only option; IPC communication with the master process is a much larger attack surface. But that's basically unavoidable, and the situation for Chrome is still a big improvement over other operating systems. Unfortunately, as the article says, it takes quite a lot of work to adapt to new applications.
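As a much smaller illustration of that style of lockdown, here is a sketch using libseccomp rather than hand-written BPF; the allowed set is illustrative, not Chrome's actual policy (build with -lseccomp):

#include <err.h>
#include <seccomp.h>
#include <unistd.h>

int
main(void)
{
	/* Default action: kill the process on any syscall not allowed below. */
	scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
	if (ctx == NULL)
		errx(1, "seccomp_init failed");

	/* Illustrative whitelist: read anywhere, exit, and write only to stdout.
	 * Note that syscall arguments can be constrained, not just the syscall. */
	seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
	seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
	seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
	    SCMP_A0(SCMP_CMP_EQ, STDOUT_FILENO));

	if (seccomp_load(ctx) < 0)
		errx(1, "seccomp_load failed");

	write(STDOUT_FILENO, "sandboxed\n", 10);
	return 0;	/* exits via exit_group(2), which is allowed */
}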
OpenBSD pledge seems somewhere in between. Unlike the other two, I don't have personal experience using it, but based on the manpage - an empty "promises" set restricts the process to _exit(2), which like seccomp should protect against kernel flaws. However, if you supply "stdio", suddenly the process gets access to 69 system calls, which is not terrible (many of the calls in the set are very basic and unlikely to contain bugs) but not as minimal as one might want. There are a bunch of random functionality lockdowns, which is good - for example, ioctl operations are locked down, "setsockopt(2) has been reduced in functionality substantially"... on the other hand, the promise categories generally seem more aligned to 'resources the program might want to access' than 'kernel API surface the program might need to use'.
Overall, definitely not bad, and I'd like to see something comparable on Linux, but I'd be more comfortable if OpenBSD also supported a more fine-grained sandbox.
* (Okay, that's a little unfair to macOS, since there are some sandbox filters that lock down kernel attack surface, such as the ioctl filter and the IOKit filters - the latter of which has been tightened over time. But that still leaves a ton of surface exposed.)
Yes; I'm not too familiar with Windows but I know there's no syscall-by-syscall lockdown. However, they were able to disable all Win32k syscalls starting in Windows 8, which is a significant attack surface reduction compared to the past:
Go (golang) is a great language for web applications and has comparatively small syscall coverage making it easy to sandbox with seccomp(2). It also has most of the protections of a fully managed language, makes making raw syscalls easy, and can be used without libc. If you don't use cgo or SWIG (pure Go and Go assembly are fine), building statically linked pure Go binaries without libc is just a couple of commandline flags. If you don't import the net package or a few other problematic packages, no commandline flags are required.
Apart from the "easy to sandbox" part (also debatable, as that depends far more on the functionality of the application than on what it is written in, but let's ignore that), does that matter much? pledge isn't only, or even primarily, a defense against you accidentally making syscalls you don't want to make, but a defense against malware that, having invaded your application's process, attempts to make such syscalls pretending to be you. That's code you didn't write, making the programming language choice moot.
Not really. An effective sandbox is restrictive. Managed languages tend to use a more limited set of system calls than non-managed languages because most libraries don't make any system calls of their own. The Go runtime and standard library use a more limited set of system calls than most alternatives, so a more limited set of sandbox filters are required.
Injected instructions are more likely to invoke their own system calls, so you are more likely to catch it if you have a more restrictive set of filters.
"Managed languages tend to use a more limited set of system calls than non-managed languages because most libraries don't make any system calls of their own"
I have a hard time understanding how that could be true. A library either needs to use OS functionality, or it doesn't. If it needs to, managed languages cannot avoid doing so, and if it doesn't, why would non-managed languages make them, more so since a major argument for the most popular of them, C, is performance?
The only argument I can see is that managed language libraries could use (fewer) other libraries or the runtime to make such calls. Neither decreases the number of system calls made by any program.
Too bad you can't do it from within the application itself. Some of the security-related system calls (e.g. unshare) expect the application to be single-threaded, or at least require acute awareness of threading. Go's automatic threading makes this difficult.
Does pledge(2) still enable its simple programming interface by allowing certain default actions based on a filesystem hierarchy layout, statically defined in the kernel/libc/somewhere? The first presented versions of tame(2) had this iirc.
If I followed the tech@ list properly, I believe OpenBSD has added a specialized socket type for the "dns" pledge. That allows removing the code that permits certain socket calls if /etc/resolv.conf has been opened.
There are other things in the works related to pledge. For example, they're going to ship a native local stub resolver that will be a functional extension of libc, permitting them to remove and simplify code from libc. That would allow sandboxing networked services without necessarily requiring any filesystem access after invoking pledge, while still permitting DNS to seamlessly work even after configuration changes. (The trick is that most client resolvers, third-party and in libc, default to 127.0.0.1:53 if /etc/resolv.conf is unavailable.)