The monolithic kernel vs microkernel debate has always come down to performance vs simplicity & reliability.
Simplicity I'll grant, but the reliability argument doesn't really strike me as relevant. Having overseen many thousands of server years worth of uptime, it's almost never the kernel's fault when something breaks. Linux is pretty solid. Most of us are far more limited by the reliability of our applications.
There are niches where higher-assurance kernels are worth it, and maybe that's where microkernels can shine.
The Linux kernel has a ton of attention dedicated to it. In particular, enterprises that prefer reliability to newness are often a few versions behind, which makes Linux, for their purposes, the most heavily acceptance-tested software there is.
This doesn't mean its design is inherently more reliable. Anything can be made reliable with enough eyeballs. I think a design goal of Minix is to increase the reliability per eyeball ratio, particularly when it comes to extending the kernel. Reliability, modularity, performance, and testing are all trade-offs. It's also pretty easy to find a configuration that one would think "should work", but actually causes Linux to suffer, complain, and crash.
Sure, but we already have Linux (and FreeBSD and NetBSD and...). So if your argument for something new is reliability, you're arguing inherently-potentially-more-reliable vs in-practise-already-quite-reliable and haven't shown us what we gain by going with you vs them.
The usual arguments in a language or OS flame war are relevant here. Do you choose the allegedly superior design, or the more popular and practiced one? The answer depends on your use case, love of tinkering, and tolerance for productivity risk. But were it not for people trying new designs, we'd all be writing code in assembly language on single-user systems.
As of Ubuntu 11.10 my netbook randomly kernel panics with the default Wi-Fi drivers. It sure would be great if it didn't take out everything I was looking at because of one bad driver. Just saying.
On a site called 'Hacker News', it's not entirely unreasonable to expect that a user may have the skills and inclination to contribute back to a project like the Linux kernel. Especially when it's a problem that affects them directly.
Ironically, as I was reading this, my Chrome crashed and shortly after Windows blue-screened and I had to reboot... granted this happens very rarely, but it was kinda funny it should happen exactly when I was reading about reliability.
Yeah, that happens to me all the time. I'm in the habit of running hardware with a faulty PCI bus for very long periods of time so I don't really need anything more complex than Windows 95 since the kernel doesn't matter when you have a faulty PCI bus.
The scary thing is that this was precisely the state of affairs with cheap desktops in the heyday of Win95. My understanding is that the hardware got cheaper to match the software, and then there was no motivation to improve the software, because the hardware would've crashed the machine anyhow.
Many software engineers strive for modularity and component decoupling, sacrificing some things (e.g. performance, to a varying degree) and gaining others. I agree with this line of thinking, and I see no reason not to apply it to OS kernels as well. The MMU is a really good friend to have, and I think most systems in this world should appreciate reliability more than performance, especially when it comes to the kernel. I always get a bit sad as I think of the general state of things, remembering that a decade ago I thought today everything would be properly divided and sandboxed with a minimum of necessary privileges. I guess user experience trumps it in many cases.
It's interesting (to me at least) that we have largely dispensed with the Unix privilege model in production and replaced it with running an entire unix system for each application, virtually hosted on the real one. I wonder if, had there been more emphasis historically on reliability and decoupling, we would nowadays be running more than one service on a host instead of running them in individual VMs hypervised by that host.
I suspect the answer is "no, not entirely" due to other limitations of the model: ports under 1024 are root-only, regular users can't call chroot(), etc. But there have been solutions proposed/designed/implemented for most of this stuff; they just haven't had much uptake.
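For anyone who hasn't bumped into the first limitation: an unprivileged process that tries to bind below port 1024 just gets EACCES. A minimal sketch of what that looks like (plain POSIX sockets; nothing here is specific to any one kernel beyond the privileged-port rule):

    /* Demonstrates the <1024 privileged-port restriction: run as a regular
     * user and bind() to port 80 fails with EACCES; as root it succeeds. */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(80);              /* privileged port */

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
            printf("bind to port 80 failed: %s\n", strerror(errno));
        else
            printf("bind to port 80 succeeded (running as root?)\n");

        close(fd);
        return 0;
    }

Linux's CAP_NET_BIND_SERVICE capability is exactly the kind of implemented-but-rarely-used workaround I mean.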
I think it's entirely possible to admin a multi service box, but it requires more skill and effort. Putting everything in a distinct VM makes all your problems look more like nails. Also, who wants to say they admin the server when they can say they admin the server cluster?
Good point. I heartily agree. The hardware is the resource the software is utilizing; the better thought through and the more uniform (and thus more heavily scrutinized and worked on) the tools for delegating and regulating access to that resource, the more reliable and efficient its utilization, I should think.
One server one app - we do this in production and dev, for a number of reasons.
The biggest reason is that it's just easier. Easier to build a new host, install services. If you need to bring the vm down it only affects one application. And so on.
I don't understand what Git has to do with it. Simply because Linux and Minix differ in their views on kernel design, it doesn't mean Minix shouldn't use a better SCM :)
I thought it was interesting that Tanenbaum's operating system is now using something that Linus made. I think it is a neat detail to an old story. Circle of life, small world, whatever cliché you want to attach to it.
It was not meant as a jab at Minix; I thought others would appreciate the continued interconnection between Linus and AST. I apologize that I did not explain my comment in greater detail.
I have heard this over and over, "woohoo, Linus won because Tanenbaum is using Git", and frankly this is just ridiculous. Their disagreement was over how the kernel should be architected, not over personal issues. Not using a certain VCS because you disagree with its author on a completely unrelated subject would really be childish.
Moreover, the VCS used to manage Minix's source has zero impact on the merit of using one kernel architecture over the other.
[EDIT]: I understand that the "kernel war" argument is not what you meant, nevertheless Minix using Git is hardly a twist in Linux vs Minix history.
Hardly a twist? So do you think that twenty years ago most people would have found it likely that Linus's operating system would lead to the creation of the SCM that Minix uses?
Twenty years ago, I doubt many people gave much thought to anyone creating a new SCM. Is your premise that twenty years ago people would have accurately predicted other software trends? Was Ruby on Rails an inevitable development?
No. My premise is that I do not think many people would have thought it likely that the hobby project by that "kid on c.o.minix" would bring about a software development tool that Minix would one day use...
Actually, when you put it like that, it does sound likely. I am speaking from the point of view of a person who knows very little about these "Kernel wars".
Your premise is bogus and entirely based on hindsight. The initial spat between Linus and Andrew was based on their approaches to system design, not on the merits of their programming or design abilities.
What is bogus about being interested in such a confluence of unlikely events? I can not imagine a human interest piece in the NYT not including this in the narrative.
What does hindsight have to do with anything? It seems that any discussion about the relationship of two individuals is always retrospective...
It's mildly interesting. I guess it shows that PERSON_A is not an angry wingnut intent on ignoring everything from PERSON_B, but can make rational decisions about tools and software.
With some of the personalities involved in various high profile OS projects, and the numerous flame-wars, that's a useful detail.
As a student I did an OS design module in 1987-88. The main set text was Tanenbaum's book Operating Systems Design and Implementation. I remember it had the full Minix source as an appendix, and I don't remember it being a particularly thick book.
Would I be right in assuming that Minix has put on a bit of weight since then?
Yes, it has changed a lot since then, and it has put on some weight in both kernel and userland. The goal of Minix 3 is to become a highly reliable, flexible, and secure OS that can be used as a product in a serious way. Visit http://en.wikipedia.org/wiki/MINIX_3 for more comprehensive information.
You are right to some extent. Yet there have been many improvements in the kernel since 1988, especially after Minix 3 was announced. And yes, the code base has grown significantly across drivers, user-space servers, and userland.
I want to like Minix. I had a copy of Design and Implementation and was going to download Minix when I heard of this new Linux thing, filled a bunch of floppies with an early Linux distro, and never looked back. One of AST's arguments against monolithic kernels was that they would be too hard to port to other systems. I just checked the Minix 3 requirements: x86 only, not even x86-64. No ARM. Hard to see how it has progressed from being an academic thing.
I don't know it at all. Minix to me has a reputation as the punch line of a joke about how to design kernels for the real world. (I don't claim that it's true, just that that's what I associate it with.)
That was 20 years ago, things change in that amount of time. It was also a lot of propaganda and not a lot of evidence.
But hey, it's not like programmers need to update their Internal Information Hashmap. Just put something in there once and leave it alone since that thing is delicate and updating it can sometimes crash your mind.
Most youngsters (20's and under) will read this as a random snide comment. Unfortunately it (still) is a key observation about how our knowledge update heuristics lag behind progress in tech.
"Ten years ago, most computer users were young people or professionals with lots of technical expertise. When things went wrong – which they often did – they knew how to fix things. Nowadays, the average user is far less sophisticated, perhaps a 12-year-old girl or a grandfather. Most of them know about as much about fixing computer problems as the average computer nerd knows about repairing his
car. What they want more than anything else is a computer that works all the time, with no glitches and no failures. Many users automatically compare their computer to their television set. Both are full of magical electronics and have big screens. Most users have an implicit model of a television set: (1) you buy the set; (2) you plug it in; (3) it works perfectly without any failures of any kind for the next 10 years. They expect that from the computer, and when they do not get it, they get frustrated. When computer experts tell them: "If God had wanted computers to work all the time, He wouldn't have invented ‘Reset’ buttons" they are not impressed.
For lack of a better definition of dependability, let us adopt this one: A device is said to be dependable if 99% of the users never experience any failures during the entire period they own the device. By this definition, virtually no computers are dependable, whereas most TVs, iPods, digital cameras, camcorders, etc. are. Techies are willing to forgive a computer that crashes once or twice a year; ordinary users are not. Home users aren't the only ones annoyed by the poor dependability of computers. Even in highly technical settings, the low dependability of computers is a problem. Companies like Google and Amazon, with hundreds of thousands of servers, experience many failures every day. They have learned to live with this, but they would really prefer systems that just worked all the time. Unfortunately, current software fails them.
The basic problem is that software contains bugs, and the more software there is, the more bugs there are. Various studies have shown that the number of bugs per thousand lines of code (KLoC) varies from 1 to 10 in large production systems. A really well-written piece of software might have 2 bugs per KLoC over time, but not fewer. An operating system with, say, 4 million lines of code is thus likely to have at least 8000 bugs. Not all are fatal, but some will be. A study at Stanford University showed that device drivers – which make up 70% of the code base of a typical operating system – have bug rates 3x to 7x higher than the rest of the system. Device drivers have higher bug rates because (1) they are more complicated and (2) they are inspected less. While many people study the scheduler, few look at printer drivers.
The Solution: Smaller Kernels
The solution to this problem is to move code out of the kernel, where it can do maximal damage, and put it into user-space processes, where bugs cannot cause system crashes. This is how Minix 3 is designed."
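To make the quoted design a bit more concrete: in a microkernel, a driver is just an ordinary process sitting in a receive/reply loop, talking to the kernel and to the other servers over IPC. Here's a rough sketch of that shape; the message layout and the ipc_receive/ipc_reply helpers are invented for illustration (and faked in-process so the sketch compiles and runs), not the real Minix 3 driver API:

    /* Sketch of the shape of a user-space driver in a message-passing design.
     * The IPC layer here is faked with a tiny in-process queue so the sketch
     * runs; in a real microkernel ipc_receive/ipc_reply would be kernel calls,
     * and do_read/do_write would touch actual hardware. */
    #include <stdio.h>
    #include <stddef.h>

    typedef struct {
        int    source;   /* who asked (e.g. the filesystem server) */
        int    type;     /* DEV_READ or DEV_WRITE */
        size_t size;     /* bytes requested */
    } message_t;

    enum { DEV_READ, DEV_WRITE };

    /* --- fake IPC: stands in for real kernel IPC calls --- */
    static message_t pending[] = { { 1, DEV_READ, 512 }, { 1, DEV_WRITE, 4096 } };
    static size_t next_msg = 0;

    static int ipc_receive(message_t *m)
    {
        if (next_msg >= sizeof(pending) / sizeof(pending[0]))
            return -1;                 /* no more requests: end the demo */
        *m = pending[next_msg++];
        return 0;
    }

    static void ipc_reply(int to, int status)
    {
        printf("reply to endpoint %d: status %d\n", to, status);
    }
    /* ------------------------------------------------------ */

    static int do_read(const message_t *m)  { printf("read %zu bytes\n", m->size);  return 0; }
    static int do_write(const message_t *m) { printf("write %zu bytes\n", m->size); return 0; }

    int main(void)
    {
        message_t m;

        /* The whole driver is a request/reply loop in its own process. */
        while (ipc_receive(&m) == 0) {
            int status = (m.type == DEV_READ) ? do_read(&m) : do_write(&m);
            ipc_reply(m.source, status);
        }
        return 0;
    }

If a driver written like this dereferences a bad pointer, only that one process dies; the kernel and the rest of the system keep running, and a watchdog can respawn it.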
I really do hope that users demand more reliability from their computers (and various computing devices). However, I believe that since the birth of the PC we've been training users to tolerate a much higher rate of failure, so a massive backlash is unlikely.
People have varying tolerance levels depending on what they're using. We have an insanely low tolerance for jet failure (the safety checks and expense that go into air travel are extremely high) due to the public nature of the failures. We have a higher tolerance for car failures, even though they claim the lives of far more people every year. And we have an extremely high tolerance for personal computer failure.
I'd like to be wrong. Contrary to your statement, I find myself, as a techy, to be far more critical of computer failure than the average user. I will discontinue use of poorly written software much quicker than my non-techy family or friends.
The high tolerance for PC failure is practical and logical. Failure doesn't generally cost a whole lot compared to cars and jet planes, and the upside to being tolerant of failure is a greatly accelerated pace of development.
It's just another classic risk/reward tradeoff. End users tolerate more risk from computers in exchange for the benefits.
> the average user is far less sophisticated, perhaps a 12-year-old girl
Argh. The author just had to specify that the unsophisticated 12-year-old is a girl. Because, hey, a 12-year-old boy might be a larval hacker, right?
> or a grandfather
Old people are another category of people hopelessly "unlike" the presumed Linux Magazine reader. They certainly aren't interested in microkernels, but let's make sure they feel suitably old and marginalized if they ever try to change that.
A microkernel doesn't do much to solve this problem. "A device is said to be dependable if 99% of the users never experience any failures". Users don't care whether the kernel stayed up: if the driver crashes, the user still experiences a device failure, since a device without a functional driver is not functional.
As mentioned by DennisP (though I can't reply to his post for some reason), one of the design goals of Minix is to have drivers seamlessly restarted so the user can continue uninterrupted.
The notion that drivers can just seamlessly restart is as much a fairy tale as the bug free monolithic kernel. What does your filesystem do when the disk driver crashes? What does your app do? You're fucked all the way up the stack. Complex operations are going to smear their state across a variety of modules. Net result: you only have one big module.
I guess that magic pixie dust must be a secret ingredient in HP's NonStop* architecture (runs air traffic control, stock exchanges, etc.)? I suggest actually taking a look at Minix 3, and other fault tolerant operating systems. Disk drivers infecting filesystems is a disease of the monolithic PC world.
* I have a friend who was an engineer for Tandem (now HP) in the 90's. They tested their servers in a demonstration for the government/defense department by taking them to a shooting range and running a benchmark test while firing indiscriminately with automatic weaponry. The story goes that the transaction processing declined precipitously as chips, blades, and motherboards were shattered. It went from millions, to thousands, to just a few dozen transactions per second with no data loss when a bullet clipped the serial jack they were using to log the benchmark. They got a very large order afterwards from the government/military.
I don't know if it actually happened (a Google search doesn't show anything), but having been shown by him the redundancy built into all levels of their architecture, and heard the stories about real failures in exchanges, air traffic control, and other critical never-turn-off deployments they do, I believe it could have. Reliable computing is possible.
Whatever magic pixie dust is in minix, I'm pretty sure it's not going to suddenly make redundant CPUs sprout up in my laptop. You're talking about something else entirely. I could just as easily say that if half of Google's data centers were nuked, they could still serve searches, just slower, and therefore prove linux is utterly reliable.
Anyway, if you like anecdotes, I saw with my very own eyes the network cable between two OpenBSD firewalls chopped with an axe to no detrimental effect. So there. Monolithic kernels are superior to motherfucking axes.
The less-destructive version of this demonstration when I first encountered one in the early 80s was for someone to walk up to the machine, open a cabinet, and randomly pull out a (coffee table book sized) card. No magic smoke, no screams of anguish, no sudden chatter from the console printing messages of lament from the operating system.
I managed Tandem Nonstops and also Stratus FX machines.
Multiple redundant hardware paths, mirrored ram etc.
God they were awful. The conservatism of the design meant that although the hardware was fine and redundant and reliable, the software was crap: user-hostile and buggy.
They would have been far better off making reliable clusters rather than make a machine internally redundant.
And expensive. Something around a million dollars for a 75 MHz machine (Stratus) in 1997.
I agree with tedunangst; it's really a game of all or nothing. I cannot think of any apps which achieve high stability by systematic fault recovery. Fault recovery is nice in itself, but it is never a good strategy for stability. Good code quality is.
If your goal is whole-system reliability, there's way more low-hanging fruit than the kernel. In the last five years I've had two kernel panics. It's so rare that I remember both times it happened. But hardware failures (at least one of some kind a year) and application crashes (once a week or more) happen all the time. Hardware is a complex beast, but many application crashes could be significantly improved by other low-hanging fruit (say, better crash reporting for developers).
But if your goal is absolute, near-100% reliability, you will eventually have to do something about the kernel.
Also, if you look at the design of Minix 3, they do address many of the concerns you mention. There's an infrastructure for checkpointing applications, and a “resurrection” server that acts as a configurable watchdog service for the entire software stack from device drivers to web servers.
The real goal of the microkernel architecture is to make these watchdog services as reliable as possible (there's only a few thousand lines of heavily audited code running beneath them). That, combined with user-space device drivers (so faulty hardware or driver code doesn't bring down the whole system) would address most of your concerns.
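For a feel of what the resurrection server does, here's a bare-bones supervisor sketched in ordinary POSIX terms (an illustration of the idea, not the actual Minix 3 code): fork a worker, wait for it to die, and start a fresh one.

    /* Minimal supervisor sketch: restart a worker process whenever it exits.
     * Plain fork/waitpid, meant only to illustrate the idea behind Minix 3's
     * resurrection server, not its implementation. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    static void worker(void)
    {
        /* Stand-in for a driver or server process; pretend it hits a bug. */
        printf("worker %d: running\n", (int)getpid());
        sleep(1);
        exit(1);
    }

    int main(void)
    {
        for (int restarts = 0; restarts < 3; restarts++) {
            pid_t pid = fork();
            if (pid < 0) { perror("fork"); return 1; }
            if (pid == 0)
                worker();               /* child: never returns */

            int status;
            waitpid(pid, &status, 0);   /* parent: wait for the worker to die */
            if (WIFSIGNALED(status))
                printf("worker killed by signal %d; restarting\n", WTERMSIG(status));
            else
                printf("worker exited with status %d; restarting\n", WEXITSTATUS(status));
        }
        return 0;
    }

The real resurrection server obviously has to do more (re-wire IPC endpoints, replay or fail in-flight requests), but the basic loop is this simple because the thing being restarted is just a process.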
No surprise, that's the path they are headed down. I even see that this release includes a "block device fault injection driver" for simulating hardware failures.
Their roadmap shows someone working on a Firefox port.
If they could get full-screen webkit/chrome with auto-updating working, there's a lot of potential use-cases that open up. A truly reliable, always-on web device with low power requirements? Sign me up.
When collecting repositories for searchco.de, I have literally spent hours hunting around project pages looking for the page that tells you how to get the source. It's usually buried in a wiki page on the developer subdomain. A simple "Get Source" link from the homepage would save me, and I'm certain a lot of others, a great deal of time.
It's just a semi-replacement for Google Code Search. To be honest, it's just something I play with. If something comes of it, great; if not, I'm not terribly worried about it.
True. I have been using it a lot recently to learn how the Guava and Guice APIs are used in the real world. So perhaps that's its target market. I also tend to use it when trying to remember how some feature of a language works, e.g. lambdas in Python: http://searchco.de/?q=lambda+ext%3Apy&cs=on
If something is very successful, it doesn't mean it is very good and nothing can replace it. The world is currently divided into many schools of thought, and only time will tell which kernel design is going to win the war. Also, there has been growing hatred toward Linux over its bloated kernel eating all the RAM!
If anything is bloated, it's the typical Linux userland. The Linux kernel, with uclibc, busybox, dropbear, and other userland tweaks, can fit onto an embedded router with 4MB flash and 16MB RAM.
Linux caches in RAM pages of files it reads. It will evict them at the drop of a hat if it needs to.
Sadly, no matter how many blog entries appear on the subject, people still expect to see free RAM, and think Linux is bloated when they don't see any.
If you don't like that behavior, I don't know of any OSes that leave "free" RAM untouched. They might call it free, though, even when it's used by the OS for caching.
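If anyone wants to see the numbers for themselves, /proc/meminfo spells it out on Linux: MemFree is what's genuinely idle, while Buffers and Cached are page cache the kernel will hand back the moment something else needs the memory. A quick sketch that reads both (Linux-specific, obviously):

    /* Read /proc/meminfo (Linux-specific) and show that "free" RAM excludes
     * the page cache, which the kernel evicts whenever memory is needed. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) { perror("/proc/meminfo"); return 1; }

        char line[256];
        long mem_free = 0, cached = 0, buffers = 0;

        while (fgets(line, sizeof(line), f)) {
            sscanf(line, "MemFree: %ld kB", &mem_free);
            sscanf(line, "Cached: %ld kB", &cached);
            sscanf(line, "Buffers: %ld kB", &buffers);
        }
        fclose(f);

        printf("truly idle:             %ld kB\n", mem_free);
        printf("reclaimable page cache: %ld kB\n", cached + buffers);
        return 0;
    }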
On the issue of microkernels vs monolithic kernels, I prefer microkernels for their elegance in theory. If one appears with performance in the ballpark of Linux, that glibc supports, and that will run Linux userland with no problems, I'll switch. I will not, however, give up functionality in the name of ideology.
I'm given to understand that Debian and Arch Linux have GNU Hurd live CDs out there for people who want to run it. I don't really know much about the OS, but does it meet your requirements?
If you're thinking of this as a Linux vs Minix 'war', then you are looking at it the wrong way. Minix is intended for research/education. It's not going to replace Linux because they do different things.
"It was only with the third version, MINIX 3, and the third version of the book, published in 2006, that the emphasis changed from teaching to a serious research and production system, especially for embedded systems. A few of the many differences between MINIX 2 and MINIX 3 are given here.
Going forward, we are making a serious effort to turn MINIX 3 into an industrial-grade system with a focus on the embedded market, especially for those applications that need high reliability and availability."
Okay, I wasn't aware of the Minix 3 redesign, so I stand corrected on that. Thanks for updating me!
But I'm fairly skeptical about how much use (if any) Minix is getting outside research and education. Aren't there already widely deployed HR/HA OSs that target the embedded market? What real advantages does Minix have over something like VxWorks?
I didn't intend to sound the way it seems I sounded. Sorry for that! By the way, the focus of Minix 3 has changed, and it is no longer intended just for research/education. Minix 3 was publicly announced on 24 October 2005 by Andrew Tanenbaum, and it has been comprehensively redesigned to be usable as a serious system on resource-limited and embedded computers and for applications requiring high reliability.