An unoptimized research kernel using GC (instead of ref-counting or Rust-style lifetimes) manages to get within 5-15% of a traditional kernel for the things they measured. That is impressive.
Lots of asterisks and caveats apply I'm sure, but it bolsters the argument that writing kernels and device drivers in a high-level safe language wouldn't impose the 2x or worse perf penalty detractors like to claim. I know I'd trade 10% perf for the elimination of whole classes of security vulnerabilities.
(For the same reason, I want the standards committee to produce a bounds-checked dialect of C where I can choose to pay some perf cost to get real dynamic and bounds-checked arrays)
The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering.
> In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal
That's right, but "improvement" doesn't necessarily mean performance. Adding a GC would yield more than 12% reduction in vulnerabilities and bug count, which are also "improvements".
Is anyone actually using Redox OS? I don't see how you can compare an OS used on hundreds of millions of servers to one that is used almost nowhere when it comes to CVEs. Probably no one is searching for CVEs in Redox OS the way they are in the Linux kernel.
Sometimes it is hard to grasp what a relative 1% means in absolute terms. Imagine Amazon or Google running 10% slower software in their cloud stacks: at fixed demand, that's roughly 1/0.9 ≈ 11% more machines to serve the same load, which translates immediately into one or more additional data centers they would need to build, run, and pay for to compensate.
If you make this argument about any single component, you end up with low performance across the whole system because it leads to a culture where everybody can point out that their code uses only a few percent of the CPU.
A valid point, as we should be frustrated that (in desktop computers, say) real user-facing performance hasn't been keeping pace with improvements in hardware.
Every decade, hardware performance skyrockets upward, but developers adopt libraries/frameworks/languages/low standards/habits which give rise to woefully inefficient software. It's so atrocious that it seems to roughly counterbalance the hardware advances. Heavyweight single-page web applications seem to be the endgame of this trend, for now.
See Wirth's law: What Intel giveth, Microsoft taketh away.
That said, the 80-20 rule is a real thing. It's possible that you'll get better real performance by writing 80% of your code in Python and the performance-critical 20% in carefully optimised C, than by spending the same amount of time writing it all in Java.
The authors would probably not call it "unoptimized". From the last paragraph in Section 8.8:
"It took effort to make Biscuit achieve this level of performance. Most of the work was in understanding why Linux
was more efficient than Biscuit, and then implementing
similar optimizations in Biscuit. These optimizations had
little to do with the choice of language, but were for the
most part standard kernel optimizations (e.g., avoiding
lock contention, avoiding TLB flushes, using better data
structures, adding caches)."
I agree that it's impressive, and better than I would have guessed.
> I know I'd trade 10% perf for the elimination of whole classes of security vulnerabilities.
Firstly, I would like to know whether that 5-15% slowdown is an average, and whether it means some operations are many times slower while others are close to equivalent. An average means nothing if your workload triggers the pathological cases.
Secondly, you might be giving up classes of security vulnerabilities and gaining classes of kernel DoS bugs. I don't write kernels, but my understanding (mostly from comments of people here who do) is that kernels rarely allocate or de-allocate after boot, as they immediately reserve the memory needed for their limits and then work within those limits.
Believe me, I'm not a proponent of C. I'm critical whenever it comes up, as I think it's caused a huge amount of harm. That said, of the places where C has a well deserved reputation of being the right tool, I think kernel development is one of them. That isn't to say other languages don't have something to offer, but this is one area where you actually seem to need all the crazy unfettered access that C gives you.
People are also writing low-level, efficient software in Ada/SPARK, Rust, Oberon (Astrobe Oberon), Pascal/BASIC (MikroElektronika), Haskell-based DSLs (e.g. the Ivory Language), and ATS. They've written OSes in everything from C to Java/C# to Ada/Rust to LISP/Haskell. There's usually some point where they have to call unsafe or wrap assembly. Even then, they often do it behind a type-safe interface with some protections, which might also have input validation. One can also use other tools to verify the unsafe code.
So, I'm not sure C is the right tool today given anything it can do can be done in safer languages plus assembly or sometimes even invoking/extracting-to C. In the latter case, the C part becomes incidental to where history went with who wrote or optimized what in what language. The person developing uses as little of C as necessary in that case. Software can also be incrementally ported to something safer, more maintainable, and so on.
>> That isn't to say other languages don't have something to offer
> People are also writing low-level, efficient software in Ada/SPARK, Rust, Oberon (Astrobe Oberon), Pascal/BASIC (MikroElektronika), Haskell-based DSL's (eg Ivory Language), and ATS.
I fully think other languages can be used to quite good effect. I was specifically thinking of Rust and Ada/SPARK, and for the latter one coming to my mind so quickly you can only blame yourself. ;)
> So, I'm not sure C is the right tool today given anything it can do can be done in safer languages plus assembly or sometimes even invoking/extracting-to C.
Yeah, I'm not so much advocating for C as a good choice as excusing it as a somewhat adequate choice given the specific circumstances. A rigorous set of usage guidelines, which I think is easier to implement and stick to given the type of project, means that C is only slightly more dangerous, but also slightly easier to work in given the needs, than some other options. Even this small advantage is quickly being eroded by advances in other languages, though, so I doubt it will persist (even if only in my mind) much longer.
It's on a quad-core machine with a tiny heap to manage; even OSes like OpenBSD do all right with a giant kernel lock at that level of concurrency.
This would turn into a massive research project on GC and language runtimes just to reach where the commodity server market already is, let alone where it is going: a dual-socket AMD Rome will have 256 threads, and a single-socket POWER10 would have the same. To gauge that project's time scale and cost, one can look at the history of JVMs. Go's type system might make some things slightly easier, but there is no implicit scalability codex they've unlocked beyond state-of-the-art JVMs that have been evolving for 15 or more years.
This is why Rust brings something novel to the table for systems programming. With the borrow checker, memory safety can be added in many cases without runtime overhead and the follow-on contention.
An extended thought: we need to be moving _toward_ sympathy with the on-chip network and the implicit message-passing nature of cache coherency. This is mostly a library/ecosystem shift rather than a language one.
I don't think a lot of people will see this the same way you do. I think a lot of people will care about even a 1% degradation in performance. And people will prefer a tried and true operating system for security over something that hasn't been tested in the real world, even if it's designed to eliminate one class of security vulnerabilities.
Nobody suggested using untested code, parent said they’d trade a small amount of perf for more security. I do think a lot of people would make that trade, all else being equal, once a viable tested replacement actually exists. HLLs exist in the first place because a lot of people are willing to trade a little perf for more security, faster development, and lower costs.
That said, nobody’s disputing that a “lot” of people care about performance. It’s worth taking a look at the last slide of the presentation, which states clearly that if you care about performance, you should use C:
Performance is paramount ⇒ use C (up to 15%)
Minimize memory use ⇒ use C (↓ mem. budget, ↑ GC cost)
Safety is paramount ⇒ use HLL (40 CVEs stopped)
Performance merely important ⇒ use HLL (pay 15%, memory)
Companies managing fleets of many thousands of computers care about efficiency gains in the sub-percent range, so even a 5% performance penalty is entirely a no-go. I doubt the kernel is nearly as featureful as Linux, either.
Now, could one of these academic kernels have sheer engineering effort put into optimizing that 5-15% regression away? Probably.
Didn't the recent CPU bug mitigations add 10% perf penalties to a lot of big server farms?
We constantly pay this cost for other security-related issues, but when it comes to a systemic, one-time penalty that almost entirely eliminates a class of bug we freeze up.
Obviously it's a lot of engineering effort to transition but we don't think twice about other security issues
Sure, they added around that performance penalty. But you only need to compile your kernel with that flag if you run untrusted code on the machine, given the nature of the exploits... I'm sure many companies have weighed the pros and cons of that decision, and said "by the time we've got foreign code running on this machine, we're already hosed, so the performance cost isn't worth it."
In general, these types of decisions aren't uncommon.
Whether reference counting or GC is more efficient depends on your workload. There are common workloads where reference counting is better. And others where GC is better. In general neither can win.
So saying "just don't use GC" as a answer to improving throughput shows ignorance.
So what are the real tradeoffs?
GC is less work. Is easier to get right. And lets you handle complex self-referential data structures more easily.
Reference counting lets you handle real-time constraints more easily. Is simpler to understand. And results in faster freeing of potentially scarce external resources.
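To make the "complex self-referential data structures" point concrete, here is a minimal C sketch (hand-rolled refcounting, not any real library's API) of how a two-node cycle defeats plain reference counting:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hand-rolled refcounting, only to illustrate the cycle problem. */
    struct node {
        int refs;
        struct node *peer;          /* reference to another node */
    };

    static struct node *node_new(void) {
        struct node *n = calloc(1, sizeof *n);
        n->refs = 1;
        return n;
    }

    static void node_release(struct node *n) {
        if (n && --n->refs == 0) {
            node_release(n->peer);  /* drop what we point to, then free */
            free(n);
        }
    }

    int main(void) {
        struct node *a = node_new();
        struct node *b = node_new();
        a->peer = b; b->refs++;     /* a -> b */
        b->peer = a; a->refs++;     /* b -> a: now a cycle */

        /* Drop the only external references. Each count falls to 1, never
           to 0, so neither node is freed: a leak a tracing GC would
           collect but plain refcounting never will. */
        node_release(a);
        node_release(b);
        printf("refs after release: a=%d b=%d (both leaked)\n", a->refs, b->refs);
        return 0;
    }

Weak references or a cycle collector work around this, and that machinery is a big part of where the extra complexity in real RC systems comes from.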
Having built reference-counted systems, I'd say they're anything but easy to understand unless you're greatly restricting the graph of objects and the references they can have to each other. And once you've restricted who can point to what, you've lost a lot of the flexibility that an RC system affords.
"So saying "just don't use GC" as a answer to improving throughput shows ignorance."
It could also mean they use Rust, memory pools, some kind of checker that catches those errors, or separation logic. Probably that order due to popularity.
As a counterpoint to "don't use GC," there's also low-latency or real-time implementations of RC/GC to consider as well. A lot of developers using RC/GC languages don't know they exist. Maybe we need a killer app that uses it to spread awareness far and wide. More open implementations of the concept, too.
> there's also low-latency or real-time implementations of RC/GC to consider as well
These exist but tend to have poor throughput, frequently set a maximum amount of memory you're allowed to use, and typically don't perform well when compared to commonly used garbage collectors.
It would be more impressive if this hadn't been an uphill battle against entrenched beliefs since the late '70s.
We would be further along if bigger companies had really cared to push down this path no matter what, but politics and short-term profits always win in the end.
The higher-level the language, the more code the language itself generates to implement the services it provides, and the more the underlying system is abstracted away.
There is therefore a trade-off, and obviously the lower-level the code, the less suited high-level languages are to it.
While there is a trade off, Burroughs B5000, Xerox Star, IBM 801, Multics, Lilith, Solo are all good examples of what was possible to achieve with 1960 - 70's hardware.
Thanks to the Internet and IoT it seems we might finally be at the turning point where OS security starts to be taken seriously, so let's see how long that resistance holds up.
It was refreshing to see that a large set of Linux Security Summit 2018 talks were all about how to fix C related issues on the Linux kernel.
>(For the same reason, I want the standards committee to produce a bounds-checked dialect of C where I can choose to pay some perf cost to get real dynamic and bounds-checked arrays)
In fact, I'd like my C compiler to use bounds-checked arrays by default, with an option to disable it in performance-critical code. Safe by default, performant only when needed. That's the only way to build secure software.
Since nothing in the C language prohibits bounds-checking for arrays, we can retrofit this to existing insecure languages rather than forcing all programmers to rewrite their software in some other language, which realistically won't happen for a decade or longer.
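As a rough illustration of what such a retrofit could look like, here is a hypothetical sketch (nothing from any committee proposal; the names are made up): a fat-pointer array plus a checked accessor is enough to turn an overflow into a trap, and it's the kind of check a compiler could emit implicitly in a "safe by default" mode:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical fat-pointer array: carries its length with it. */
    struct checked_array {
        int    *data;
        size_t  len;
    };

    /* The check a compiler could emit for every a[i] in safe mode. */
    static int *ca_at(struct checked_array a, size_t i) {
        if (i >= a.len) {
            fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, a.len);
            abort();                /* trap instead of silently corrupting memory */
        }
        return &a.data[i];
    }

    int main(void) {
        struct checked_array a = { calloc(8, sizeof(int)), 8 };
        *ca_at(a, 3) = 42;          /* in bounds: fine */
        printf("%d\n", *ca_at(a, 3));
        /* *ca_at(a, 8) = 1;           would abort() instead of overflowing */
        free(a.data);
        return 0;
    }

In an opt-out dialect, hot loops could drop back to raw indexing via a pragma or compiler flag while everything else stays checked.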
IIRC https://cyclone.thelanguage.org/ did this (among other things -- I think it's also where Rust lifetimes began their, err, life) and ended up having rather poor performance for pointer-manipulation heavy code.
I don't know if things would be better now with more optimizing compilers, but I remember being somewhat surprised by this at the time, since you'd expect most of the overhead from passing fat pointers around to get optimized away.
AT&T made Cyclone as a C replacement, GCC has had an extension for decades, MSVC++ has had pointer-coherency debug libraries, and SPARC has hardware validation for pointers.
There are many other examples of people trying to band-aid and improve C's security story.
The biggest problem is that the developers that are supposed to actually adopt those practices don't really care, because performance trumps security.
Those that actually care end up moving to C++, or some other language, so we end up with this selection bias.
Thankfully they are being forced to change, as being 100% connected to the Internet does wonders to security.
Interesting. Regarding developers not caring about security, that's why I suggested it should be the default, with an option to turn it off for performance or other reasons, rather than the other way around.
Agreed, however that is against the culture of those that decide to stick with C no matter what.
Secure by default with option to turn those features off for performance, has been the way other systems programming languages have been designed since 1961.
That C culture is now being forced to change, because fixing memory-corruption CVEs in devices that are permanently under attack does not scale.
> I want the standards committee to produce a bounds-checked dialect of C where I can choose to pay some perf cost to get real dynamic and bounds-checked arrays
If you don't want to wait on the standards committee, the SaferCPlusPlus library provides memory-safe substitutes for commonly used unsafe C/C++ elements (including pointers). These memory-safe elements require a C++ compiler, but because they are largely compatible with their native counterparts, you can write programs that can switch between using memory-safe (C++) elements and (unsafe) straight C elements with a compile-time directive.
That is, you don't have to choose between safety and performance when authoring the code. You can make the choice at build-time. So you could, for example, provide both "memory-safe" and "non-memory-safe" builds of your software. There was even a (now long neglected) tool[1] to automatically replace arrays/buffers in C source code with (optionally) memory-safe substitutes. (Not just bounds-checked, but safe from use-after-free.)
> I know I'd trade 10% perf for the elimination of whole classes of security vulnerabilities.
Full memory safety may sometimes cost a bit more than that [2]. But a nice thing about this approach is that you can choose to use (faster) unsafe code in critical inner loops, and memory-safe code everywhere else.
Surely that's a practical position to take, but as an idealist, thinking over the long term... wouldn't you rather have that extra performance in addition to a lack of security vulnerabilities? There's no fundamental trade-off between this kind of performance and security.
History seems to disagree with your basic premise. There are entire classes of security vulnerabilities eliminated by high level language features. (A trivial example: buffer overflows are a non-issue with runtime bounds checking.)
In theory, it's possible to write bug-free code in C or assembly. In practice, it just doesn't happen. The rule of "worse is better" ensures that the rare projects that put in the necessary effort to manually eliminate the bugs that HLLs catch automatically will get steamrolled by projects that don't (witness the adoption rates of Linux versus OpenBSD).
That said: I think that tool developers have only scratched the surface of the possibilities for compile time checks. I'm always amazed how little the industry has learned from projects like Ada. There's an opportunity in there somewhere to have that cake and eat it too.
I doubt other mainframes were much cheaper; even the PDP-11 wasn't exactly a bargain.
Also, the Burroughs B5000, developed in 1961, already treated security as a major OS feature.
The problem is that UNIX's designers never took security as seriously as other OS designers did in their implementations.
Had UNIX been a commercial OS sold at the same price levels as VMS, OS/360 and others, instead of being given to universities at a symbolic price due to Bell Labs' commercial restrictions, history would have been quite different.
I would love to see a production ready kernel written in Rust adopted by the industry. I am exclusively using Go and Rust at home and at work, and I find myself equally productive in both. Go lacks a lot of 'high level' features for a GC language, so I don't find the GC to be that advantageous. Rust is safe, fast, and a joy to write. I'm not surprised by these numbers, I would be interested to see a breakdown of compile times for similarly sized kernels in Rust, Go, and C.
It would be really interesting to see a Linux ABI layer implemented on top of redox, a la IllumOS, to allow software designed to run on linux to run.
Given that Bryan Cantrill is gung-ho about rust, and he's got quite a lot of knowledge in this area, maybe someone can woo him to help (even if in an advisory role).
The paper contributes Biscuit, a kernel written in Go that implements enough of POSIX (virtual memory, mmap, TCP/IP sockets, a logging file system, poll, etc.) to execute significant applications. In experiments comparing nearly identical system call, page fault, and context switch code paths written in Go and C, the Go version was 5% to 15% slower.
I would expect that slowdown to be more a consequence of optimizer implementations than anything else, given the quality of the code C compilers produced about 30 years ago.
1. Your skepticism for using Rust cites [26], the PLOS paper from the same people who did [27]. They've essentially recanted, in that their PLOS talk was right after Niko's keynote, and he then talked them through how interior mutability works in Rust. Other than [26], are there other lingering concerns? I could imagine so, but it would be great to take stock.
2. You don't seem to page to disk, and do OOM shootdowns instead. If you did enable disk paging, how comically bad an idea does a tracing GC become? :D Naively, I could imagine RC-GC being less horrible, but perhaps that's my built-in (and absolutely for real) bias.
1. I wouldn't say we are skeptical of Rust, I'm sure it could be made to work well and we would love to see such a Rust kernel! I do wonder whether it would be harder to implement highly-concurrent data structures in Rust though, such as the directory cache.
2. GC performance would be basically unaffected by paging, since kernels don't page their own heaps out to disk and thus a kernel heap access would never have to wait for a disk IO. Or maybe you had something else in mind?
Re 2., I assumed (incorrectly, it seems) that virtual memory mappings could eventually spill to disk. If you grab 1/32 of RAM for the kernel, then you could only virtually map 16x RAM if the page tables have 512 entries. Admittedly, I took my OS class in the previous millennium, so I might be out of date here...
For the "C kernel vs Go kernel" benchmarks, the paper writes: "The benchmarks allocate no heap memory in steady-state, so Biscuit's garbage collector is not invoked."
Is that a fair way to make this comparison?
Also, "This results in heap bounds of less than 500kB for all system calls but rename and fork (1MB and 641kB, respectively)."
How does that compare to the C-based kernel?
Finally, how does this system hold up after days of uptime?
Concerning your first question, that experiment is not intended to measure the cost of GC, which can be made small by trading memory. The point of this experiment is to measure the unavoidable costs of the language.
Maybe sometimes the mental effort of writing a commit message just isn't worth it.
At work, I try to be a good boy and write proper commit messages, always.
On personal projects, I often just use "." or something equally meaningless. Sometimes I can't be bothered explaining myself, and on a personal project nobody is going to be burned by it but me.
"Production" - they won't commit to actually supporting it for production for the rest of us (currently got app engine flex burning a hole in our pocket at work because the old runtime was too problematic and sans it being out of beta we can't move to the new one).
Considering all of the stuff that Go does for you this is hugely impressive. It is especially impressive that a research kernel is within 10% of Linux's performance.
GC enabled system programming languages like D, Active Oberon or Modula-3, among others, allow for many ways to allocate memory, it is not GC all the way down.
That's the sad state of (I guess) automatic PDF text extraction, a consequence of research papers being exclusively in PDF, itself a consequence of (La)TeX coming from another age. I love and praise TeX for what it allows you to do, but my opinion is that now is the time to get past it: learn everything it has done right and apply the new knowledge we have in language design to get a better surface language (lower-friction syntax, higher-level semantics that separate structured content from typography and let you extract something other than a visual document from a source file). TeX being so good means it has such a monopoly that this kind of project has to be tremendously good to have a chance (which is probably a good thing).
POSIX itself is an outdated standard. Some things were designed without an understanding of modern security and portability problems. Either a new iteration of the standard is required, or a new standard from scratch.
Last time I looked (which was a while back) both Go and Rust binaries for "Hello world" were over a megabyte with out-of-the-box settings, compared to just a few kilobytes for C and slightly more for C++. I should qualify that they were the simplest of Hello world programs, I didn't add any fancy options to try and reduce the size at all.
I'd be keen to see what the smallest binaries the reference implementations of these languages can produce are.
Rust's standard library is statically linked, C's is not. You have to compare the total size. Additionally, Rust includes jemalloc by default on many platforms.
Thanks - yes, I meant to mention that statically linked (and stripped of symbols) I get 745k for the C hello world with gcc 7.2, so the gap isn't as big as all that. I understand that shared libraries/dynamic linking is still a WIP for Rust, so it's not exactly a fair comparison yet. With a shared standard library you also get the benefit of having the RAM shared across multiple processes.
The 145 byte example also isn't a fair comparison, since you've left out the standard library! Obviously C can do that too, but it's pretty neat to see the progress Rust is making.
(anyway, I'm going off topic since we're discussing kernel programming here).
This is covered in the paper. They use static analysis to know the memory cost of a kernel function, and rather than optimistically run the kernel function and then unwind it if they run out of memory, they delay it until memory is available.
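A toy C-flavoured sketch of that "reserve before run" idea (Biscuit itself does this in Go with bounds produced by its analysis tool; every name and number below is a made-up stand-in):

    #include <stdio.h>
    #include <stddef.h>

    /* In the real kernel these bounds come from static analysis of each
       syscall's worst-case heap use; here they are just illustrative. */
    enum { SYS_DUP = 0, SYS_FORK = 1 };
    static const size_t max_heap_use[] = { [SYS_DUP]  = 500 * 1024,
                                           [SYS_FORK] = 641 * 1024 };

    static size_t heap_headroom = 4 << 20;   /* pretend 4 MB of kernel heap headroom */

    static int try_reserve(size_t need) {
        if (heap_headroom < need)
            return 0;                /* caller must wait, or kill an abusive process */
        heap_headroom -= need;
        return 1;
    }

    static void release(size_t amount) { heap_headroom += amount; }

    static long run_syscall(int num, long (*handler)(void)) {
        size_t need = max_heap_use[num];
        while (!try_reserve(need)) {
            /* real kernel: sleep until exits/GC free memory, or invoke the OOM killer */
        }
        long ret = handler();        /* can no longer fail on a heap allocation */
        release(need);               /* the bound is worst-case; hand back the reservation */
        return ret;
    }

    static long fake_dup(void) { return 4; }   /* stand-in handler */

    int main(void) {
        printf("dup -> fd %ld\n", run_syscall(SYS_DUP, fake_dup));
        return 0;
    }

The point is only the shape: the check happens before the syscall starts, so there is never a half-finished syscall to unwind.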
Though implementing fork() in a kernel is entirely distinct from using fork() in user space. Implementing fork() is mostly "just" manipulating a bunch of data structures the right way. Copy the right bytes and set the right values.
It's like how you can implement Java, C++, Python, Haskell and any other (practical) language in C, despite C having no support for e.g. classes or lambdas. Or how humans can design and build combustion engines, despite us not having the capability to ingest gasoline in order to rotate an axle at 6000rpm ourselves.
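For the "just manipulating data structures" point, here is a toy C sketch of a kernel-side fork (every struct field and helper below is invented for illustration; a real kernel's version is hairier but has the same shape):

    #include <stddef.h>

    /* Hypothetical per-process bookkeeping. */
    struct regs { unsigned long rax, rip, rsp; };   /* (abridged) */
    struct addrspace;                               /* page tables / mappings */
    struct file;                                    /* open file description */

    struct proc {
        int               pid;
        struct addrspace *as;
        struct file      *fds[64];
        struct regs       regs;                     /* saved user register state */
    };

    /* Made-up helpers standing in for the kernel's own machinery. */
    struct proc      *proc_alloc(void);
    int               pid_alloc(void);
    struct addrspace *addrspace_copy(struct addrspace *);   /* copy or COW-share */
    struct file      *file_dup(struct file *);              /* bump a refcount */
    void              make_runnable(struct proc *);

    /* fork, kernel-side: copy the parent's bookkeeping, nothing more. */
    int kfork(struct proc *parent) {
        struct proc *child = proc_alloc();
        child->pid  = pid_alloc();
        child->as   = addrspace_copy(parent->as);
        child->regs = parent->regs;             /* child resumes where the parent was */
        for (int i = 0; i < 64; i++)
            child->fds[i] = parent->fds[i] ? file_dup(parent->fds[i]) : NULL;
        child->regs.rax = 0;                    /* fork() returns 0 in the child */
        make_runnable(child);
        return child->pid;                      /* the parent gets the child's pid */
    }

Written in C or in Go, it's the same copying of kernel data structures; nothing about it needs the host (or the implementation language) to have a fork() of its own.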
Yeah, I get the difference in levels of abstraction. Still interesting, at least to me. Among other things, it limits how "thin" the POSIX emulation can be. So, for example, if I had a POSIXy super-CPU, an implementation of fork() in C, running on that CPU, could be rather thin. An implementation of fork() in Go seems like it'd necessarily have to be considerably thicker, as fork() is more foreign to Go.
But yes, I understand that this noodling has little to do with what the authors were going for, nor really with current reality.
I fail to see how the implementation of fork in C would be any thinner than an implementation of fork in Go. I expect the implementations would look very similar.
That sounds more like some kind of unusual “emulation layer” in user space than any “kernel”, and given your decision to use the host environment’s fork(), apparently one that just uses the host operating system’s (that’s what that “Super CPU” is, not a CPU at all but a full operating system) process model and everything associated with it, so you wouldn’t be able to emulate much.
5-15% seems like a bargain compared to the massive slowdown everyone noticed after the first meltdown patches (not that this would have fixed that problem, of course).
If this approach took off, maybe it would light a fire under the chip industry to put a little more thought into hardware-assisted garbage collection.