Minix 3.2.0: One step closer to the promise of highly reliable OS (groups.google.com)
112 points by vivekprakash on March 14, 2012 | 87 comments



The monolithic kernel vs microkernel debate has always come down to performance vs simplicity & reliability.

Simplicity I'll grant, but the reliability argument doesn't really strike me as relevant. Having overseen many thousands of server-years' worth of uptime, it's almost never the kernel's fault when something breaks. Linux is pretty solid. Most of us are far more limited by the reliability of our applications.

There are niches where higher-assurance kernels are worth it, and maybe that's where microkernels can shine.


The Linux kernel has a ton of attention dedicated to it. In particular, enterprises that prefer reliability to newness often are a few versions behind, making Linux for their purposes the most heavily acceptance-tested software there is.

This doesn't mean its design is inherently more reliable. Anything can be made reliable with enough eyeballs. I think a design goal of Minix is to increase the reliability per eyeball ratio, particularly when it comes to extending the kernel. Reliability, modularity, performance, and testing are all trade-offs. It's also pretty easy to find a configuration that one would think "should work", but actually causes Linux to suffer, complain, and crash.


Sure, but we already have Linux (and FreeBSD and NetBSD and...). So if your argument for something new is reliability, you're arguing inherently-potentially-more-reliable vs in-practise-already-quite-reliable and haven't shown us what we gain by going with you vs them.


The usual arguments in a language or OS flame war are relevant here. Do you choose the allegedly superior design, or the more popular and practiced one? The answer depends on your use case, love of tinkering, and tolerance for productivity risk. But were it not for people trying new designs, we'd all be writing code in assembly language on single-user systems.


> Anything can be made reliable with enough eyeballs.

The relevant measure isn't the number of eyeballs, but whether they're the correct eyeballs. The wrong eyeballs can decrease reliability.


As of Ubuntu 11.10 my netbook randomly kernel panics with the default Wi-Fi drivers. It sure would be great if it didn't take out everything I was looking at because of one bad driver. Just saying.


Bug reported to LKML? Did you bisect? Did someone else? Did you at least test with latest upstream kernel to see if it was fixed already?


You're asking this of a casual user.


On a site called 'Hacker News', it's not entirely unreasonable to expect that a user may have the skills and inclination to contribute back to a project like the Linux kernel. Especially when it's a problem that affects them directly.


Are you implying any of these activities would get back the stuff I was working on at the time? Because if not I think you sort of missed the point.


Ironically, as I was reading this, my Chrome crashed and shortly after Windows blue-screened and I had to reboot... granted this happens very rarely, but it was kinda funny it should happen exactly when I was reading about reliability.


QNX and VxWorks are two commercial operating systems that are microkernel based.


Don't forget reliability (or otherwise) of PC hardware. No kernel is going to save you if your PCI bus locks up.


Yeah, that happens to me all the time. I'm in the habit of running hardware with a faulty PCI bus for very long periods of time so I don't really need anything more complex than Windows 95 since the kernel doesn't matter when you have a faulty PCI bus.


The scary thing is that this was precisely the state of affairs with cheap desktops in the heyday of Win95. My understanding is that the hardware got cheaper to match the software, and then there was no motivation to improve the software, because the hardware would've crashed the machine anyhow.



Many software engineers strive for modularity and component decoupling, sacrificing some things (e.g. performance, to a varying degree) and gaining others. I agree with this line of thinking, and I see no reason not to apply it to OS kernels as well. The MMU is a really good friend to have, and I think most systems in this world should appreciate reliability more than performance, especially when it comes to the kernel. I always get a bit sad as I think of the general state of things, remembering that a decade ago I thought today everything would be properly divided and sandboxed with a minimum of necessary privileges. I guess user experience trumps it in many cases.


It's interesting (to me at least) that we have largely dispensed with the Unix privilege model in production and replaced it with running an entire unix system for each application, virtually hosted on the real one. I wonder if, had there been more emphasis historically on reliability and decoupling, we would nowadays be running more than one service on a host instead of running them in individual VMs hypervised by that host.

I suspect the answer is "no, not entirely" due to other limitations of the model: ports under 1024 are root-only, regular users can't call chroot(), etc etc - but there have been solutions proposed/designed/implemented for most of this stuff, they just haven't had much uptake.
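
To make the low-port restriction concrete, here's a tiny sketch (plain POSIX C, nothing Minix-specific, purely illustrative) of an unprivileged process failing to bind port 80 - the kind of limitation that later mechanisms like Linux's CAP_NET_BIND_SERVICE capability were added to work around:

    /* Illustrative only: an unprivileged bind to a port below 1024 fails
     * with EACCES; granting CAP_NET_BIND_SERVICE is one of the later
     * workarounds alluded to above. */
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { 0 };
        addr.sin_family      = AF_INET;
        addr.sin_port        = htons(80);             /* privileged port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);

        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
            printf("bind to port 80: %s\n", strerror(errno));  /* EACCES as non-root */
        else
            printf("bind to port 80 succeeded\n");

        close(fd);
        return 0;
    }

Run it as a regular user and you get EACCES; run it as root (or with the capability) and the bind succeeds.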


I think it's entirely possible to admin a multi service box, but it requires more skill and effort. Putting everything in a distinct VM makes all your problems look more like nails. Also, who wants to say they admin the server when they can say they admin the server cluster?


Good point. I heartily agree. The hardware is the resource the software is utilizing, and the better thought through (and perhaps the more uniform, and thus the more heavily scrutinized) the tools for delegating and regulating access to that resource while preserving reliability, the better and more efficient the utilization of said resource, I should think.


One server one app - we do this in production and dev, for a number of reasons.

The biggest reason is that it's just easier. Easier to build a new host, install services. If you need to bring the vm down it only affects one application. And so on.


I thought this was an interesting twist in the history of Linux and Minix:

  Other important differences:
    * The MINIX project now uses `git` as its version-control system


I don't understand if Git has anything to do with it. Simply because Linux and Minix differ in their views on kernel design, it doesn't mean Minix shouldn't use a better scm :)


I thought it was interesting that Tanenbaum's operating system is now using something that Linus made. I think it is a neat detail to an old story. Circle of life, small world, whatever cliche you want to attach to it.

It was not meant as a jab at Minix; I thought others would appreciate the continued interconnection between Linus and AST. I apologize that I did not explain my comment in greater detail.


I have heard this over and over, "wohoo Linus won because Tanenbaum is using git", and frankly this is just ridiculous. Their disagreement was over how the kernel should be architected, not over personal issues. Not using a certain vcs because you disagree with its author on a completely unrelated subject would really be childish.

Moreover, the VCS used to manage Minix's source has zero impact on the merit of using one kernel architecture over the other.

[EDIT]: I understand that the "kernel war" argument is not what you meant, nevertheless Minix using Git is hardly a twist in Linux vs Minix history.


Hardly a twist? So twenty years ago, do you think most people would have found it likely that Linus's operating system would lead to the creation of the SCM that Minix uses?


Twenty years ago, I doubt many people gave much thought to anyone creating a new SCM. Is your premise that twenty years ago people would have accurately predicted other software trends? Was ruby on rails an inevitable development?


No. My premise is that I do not think many people would have thought it likely that the hobby project by that "kid on c.o.minix" would bring about a software development tool that Minix would one day use...


Actually, when you put it like that, it does sound likely. I am speaking from the point of view of a person who knows very little about these "Kernel wars".


Your premise is bogus and entirely based on hindsight. The initial spat between Linus and Andrew was based on their approaches to system design not on the merits of their programming or design abilities.


What is bogus about being interested in such a confluence of unlikely events? I can not imagine a human interest piece in the NYT not including this in the narrative.

What does hindsight have to do with anything? It seems that any discussion about the relationship of two individuals is always retrospective...


It's mildly interesting. I guess it shows that PERSON_A is not an angry wingnut intent on ignoring everything from PERSON_B, but can make rational decisions about tools and software.

With some of the personalities involved in various high profile OS projects, and the numerous flame-wars, that's a useful detail.


Maybe, but it would have been funnier if they used monotone ;)


As a student I did an OS design module in 1987-88. The main set text was Tanenbaum's book Operating Systems Design and Implementation. I remember it had the full Minix source as an appendix, and I don't remember it being a particularly thick book.

Would I be right in assuming that Minix has put on a bit of weight since then?


Yes, it has changed a lot since then and has also put on considerable weight in both kernel and userland. The goal of Minix 3 is to become a highly reliable, flexible, and secure OS which can be used as a product in a serious way. Visit http://en.wikipedia.org/wiki/MINIX_3 for comprehensive information.


Although, vivek, it's worth pointing out that the size of the kernel itself hasn't grown much.


You are right to some extent. Yet, there have been many improvements in the kernel since 1988, especially after Minix 3 was announced. And yes, the code base has grown significantly across drivers, user-space servers, and userland.


I want to like Minix. I had a copy of Design and Implementation and was going to download Minix when I heard of this new Linux thing, filled a bunch of floppies with an early Linux distro and never looked back. One of AST's arguments against monolithic was that it would be too hard to port to other systems. Just checked Minix 3 requirements - x86 only, not even x86-64. No ARM. Hard to see how it has progressed from being an academic thing.


Honest question: is the title of this article sarcastic?


No, it's not. Maybe you don't know MINIX 3 very well ;)


I don't know it at all. Minix to me has a reputation as the punch line of a joke about how to design kernels for the real world. (I don't claim that it's true, just that that's what I associate it with.)

Can you sell me on why I should be interested?


That was 20 years ago, things change in that amount of time. It was also a lot of propaganda and not a lot of evidence.

But hey, it's not like programmers need to update their Internal Information Hashmap. Just put something in there once and leave it alone since that thing is delicate and updating it can sometimes crash your mind.


Most youngsters (20's and under) will read this as a random snide comment. Unfortunately it (still) is a key observation about how our knowledge update heuristics lag behind progress in tech.


Zed, you realize that I asked because I wanted to change my IIH right?


"Ten years ago, most computer users were young people or professionals with lots of technical expertise. When things went wrong – which they often did – they knew how to fix things. Nowadays, the average user is far less sophisticated, perhaps a 12-year-old girl or a grandfather. Most of them know about as much about fixing computer problems as the average computer nerd knows about repairing his car. What they want more than anything else is a computer that works all the time, with no glitches and no failures. Many users automatically compare their computer to their television set. Both are full of magical electronics and have big screens. Most users have an implicit model of a television set: (1) you buy the set; (2) you plug it in; (3) it works perfectly without any failures of any kind for the next 10 years. They expect that from the computer, and when they do not get it, they get frustrated. When computer experts tell them: "If God had wanted computers to work all the time, He wouldn't have invented ‘Reset’ buttons" they are not impressed.

For lack of a better definition of dependability, let us adopt this one: A device is said to be dependable if 99% of the users never experience any failures during the entire period they own the device. By this definition, virtually no computers are dependable, whereas most TVs, iPods, digital cameras, camcorders, etc. are. Techies are willing to forgive a computer that crashes once or twice a year; ordinary users are not. Home users aren't the only ones annoyed by the poor dependability of computers. Even in highly technical settings, the low dependability of computers is a problem. Companies like Google and Amazon, with hundreds of thousands of servers, experience many failures every day. They have learned to live with this, but they would really prefer systems that just worked all the time. Unfortunately, current software fails them.

The basic problem is that software contains bugs, and the more software there is, the more bugs there are. Various studies have shown that the number of bugs per thousand lines of code (KLoC) varies from 1 to 10 in large production systems. A really well-written piece of software might have 2 bugs per KLoC over time, but not fewer. An operating system with, say, 4 million lines of code is thus likely to have at least 8000 bugs. Not all are fatal, but some will be. A study at Stanford University showed that device drivers – which make up 70% of the code base of a typical operating system – have bug rates 3x to 7x higher than the rest of the system. Device drivers have higher bug rates because (1) they are more complicated and (2) they are inspected less. While many people study the scheduler, few look at printer drivers.

The Solution: Smaller Kernels

The solution to this problem is to move code out of the kernel, where it can do maximal damage, and put it into user-space processes, where bugs cannot cause system crashes. This is how Minix 3 is designed."

From http://www.linux-magazine.com/Issues/2009/99/Minix-3
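
To make the "drivers as user-space processes" idea concrete, here's a rough sketch (plain POSIX C; the request/reply structs are made up, this is not Minix's real IPC interface) of the isolation it buys you: the "driver" is just another process behind a socketpair, so if it crashes, the client sees EOF instead of the machine going down, and a supervisor can restart it.

    /* Sketch of the isolation idea only. The "driver" runs as a separate
     * process and talks to its client over a socketpair; a crash in the
     * driver kills only that process. Message formats are hypothetical. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>

    struct request { int op;     int arg;   };   /* made-up message format */
    struct reply   { int status; int value; };

    int main(void) {
        int sv[2];
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

        if (fork() == 0) {                        /* the "driver" process */
            close(sv[0]);
            struct request rq;
            while (read(sv[1], &rq, sizeof rq) == (ssize_t)sizeof rq) {
                struct reply rp = { 0, rq.arg * 2 };   /* pretend to do I/O */
                write(sv[1], &rp, sizeof rp);
            }
            _exit(0);
        }

        close(sv[1]);                             /* the client side */
        struct request rq = { 1, 21 };
        write(sv[0], &rq, sizeof rq);

        struct reply rp;
        if (read(sv[0], &rp, sizeof rp) == (ssize_t)sizeof rp)
            printf("driver replied: %d\n", rp.value);
        else
            printf("driver gone; a supervisor could restart it here\n");
        return 0;
    }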


I really do hope that users demand more reliability from their computers (and various computing devices). However, I believe that since the birth of the PC we've been training users to tolerate a much higher rate of failure, and a massive backlash is unlikely.

People have varying tolerance levels depending on what they're using. We have an insanely low tolerance level for jet failure (the safety checks and expense that go into air travel are extremely high) due to the public nature of the failures. We have a higher tolerance level for car failures even though they claim the lives of far more people every year. We have an extremely high tolerance level for personal computer failure.

I'd like to be wrong. Contrary to your statement, I find myself, as a techy, to be far more critical of computer failure than the average user. I will discontinue use of poorly written software much quicker than my non-techy family or friends.


The high tolerance for PC failure is practical and logical. Failure doesn't generally cost a whole lot compared to cars and jet planes, and the upside to being tolerant of failure is a greatly accelerated pace of development.

It's just another classic risk/reward tradeoff. End users tolerate more risk from computers in exchange for the benefits.


> the average user is far less sophisticated, perhaps a 12-year-old girl

Argh. The author just had to specify that the unsophisticated 12-year-old is a girl. Because, hey, a 12-year-old boy might be a larval hacker, right?

> or a grandfather

Old people are another category of people who are hopelessly "unlike" the presumed Linux Magazine reader. They certainly aren't interested in microkernels, but let's make sure they feel suitably old and marginalized if they ever try to change that.


A microkernel doesn't do much to solve this problem. "A device is said to be dependable if 99% of the users never experience any failures". Users don't care if the kernel doesn't crash. If the driver crashes, then the user still experiences a device failure, since a device without a functional driver is not functional.


As mentioned by DennisP (though I can't reply to his post for some reason), one of the design goals of Minix is to have drivers seamlessly restarted so the user can continue uninterrupted.


The notion that drivers can just seamlessly restart is as much a fairy tale as the bug free monolithic kernel. What does your filesystem do when the disk driver crashes? What does your app do? You're fucked all the way up the stack. Complex operations are going to smear their state across a variety of modules. Net result: you only have one big module.


I guess that magic pixie dust must be a secret ingredient in HP's NonStop* architecture (runs air traffic control, stock exchanges, etc.)? I suggest actually taking a look at Minix 3, and other fault tolerant operating systems. Disk drivers infecting filesystems is a disease of the monolithic PC world.

* I have a friend who was an engineer for Tandem (now HP) in the 90's. They tested their servers in a demonstration for the government/defense department by taking them to a shooting range and running a benchmark test while firing indiscriminately with automatic weaponry. The story goes that the transaction processing declined precipitously as chips, blades, and motherboards were shattered. It went from millions, to thousands, to just a few dozen transactions per second with no data loss when a bullet clipped the serial jack they were using to log the benchmark. They got a very large order afterwards from the government/military.

I don't know if it actually happened (a Google search doesn't show anything), but having been shown by him the redundancy built into all levels of their architecture, and heard the stories about real failures in exchanges, air traffic control, and other critical never-turn-off deployments they do, I believe it could have. Reliable computing is possible.


Whatever magic pixie dust is in minix, I'm pretty sure it's not going to suddenly make redundant CPUs sprout up in my laptop. You're talking about something else entirely. I could just as easily say that if half of Google's data centers were nuked, they could still serve searches, just slower, and therefore prove linux is utterly reliable.

Anyway, if you like anecdotes, I saw with my very own eyes the network cable between two OpenBSD firewalls chopped with an axe to no detrimental effect. So there. Monolithic kernels are superior to motherfucking axes.


The less-destructive version of this demonstration when I first encountered one in the early 80s was for someone to walk up to the machine, open a cabinet, and randomly pull out a (coffee table book sized) card. No magic smoke, no screams of anguish, no sudden chatter from the console printing messages of lament from the operating system.


I managed Tandem Nonstops and also Stratus FX machines. Multiple redundant hardware paths, mirrored ram etc.

God they were awful. The conservatism of the design meant that although the hardware was fine and redundant and reliable, the software was crap: user-hostile and buggy.

They would have been far better off making reliable clusters rather than making a machine internally redundant.

And expensive. Something around a million dollars for a 75 MHz machine (Stratus) in 1997.


I agree with tedunangst, it's really a game of all or nothing. I cannot think of any apps which achieve high stability by systematic fault recovery. Fault recovery is nice in itself, but it is never a good strategy for stability. Good code quality is.


Minix3 monitors the drivers and restarts them if they crash.

http://www.minix3.org/other/reliability.html
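
The idea is essentially a supervision loop. A rough user-space analogy (not the actual reincarnation server; "./driver" is a hypothetical stand-in for a driver binary) looks like this:

    /* Sketch: supervise a driver process and restart it whenever it
     * dies abnormally. Purely illustrative, not Minix code. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void) {
        for (;;) {
            pid_t pid = fork();
            if (pid == 0) {                       /* child: run the driver */
                execl("./driver", "driver", (char *)NULL);
                _exit(127);                       /* exec failed */
            }
            int status;
            waitpid(pid, &status, 0);             /* block until it exits */
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                break;                            /* clean shutdown: stop supervising */
            fprintf(stderr, "driver died, restarting\n");
            sleep(1);                             /* back off before restarting */
        }
        return 0;
    }

The real thing also has to reopen the driver's IPC endpoints and deal with requests that were in flight when it died, which is where the hard part lives.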


If your goal is whole-system reliability, there's way more low-hanging fruit than the kernel. In the last five years I've had two kernel panics. It's so rare that I remember both times it's happened. But hardware failures (at least one hardware failure of some kind a year) and application crashes (once a week or more) happen all of the time. Hardware is a complex beast, but many application crashes are significantly improvable by some other low-hanging fruit (say, better crash reporting for developers).


But if your goal is absolute, near-100% reliability, you will eventually have to do something about the kernel.

Also, if you look at the design of Minix 3, they do address many of the concerns you mention. There's an infrastructure for checkpointing applications, and a “resurrection” server that acts as a configurable watchdog service for the entire software stack from device drivers to web servers.

The real goal of the microkernel architecture is to make these watchdog services as reliable as possible (there's only a few thousand lines of heavily audited code running beneath them). That, combined with user-space device drivers (so faulty hardware or driver code doesn't bring down the whole system) would address most of your concerns.

No surprise, that's the path they are headed down. I even see that this release includes a "block device fault injection driver" for simulating hardware failures.


Is minix competing with things like VxWorks and QNX? I did a cursory google/wikipedia search for minix uses and the results were all for education.


Disclaimer: I am not an official member of the Minix team. However, given my understanding, it certainly looks like Minix is going to compete with QNX, VxWorks, and similar OSes used in embedded systems. Visit http://wiki.minix3.org/en/MinixGoals and http://wiki.minix3.org/en/MinixRoadmap for details.


I see that nedit (a text editor that usually runs on X11) and Gnu Go have been ported to Minix 3.

Do any non-toy web browsers run on Minix 3?


Their roadmap shows someone working on a Firefox port.

If they could get full-screen webkit/chrome with auto-updating working, there's a lot of potential use-cases that open up. A truly reliable, always-on web device with low power requirements? Sign me up.


It's irritating when these so-called "open source" projects don't have a link to the source from the home page or anywhere else obvious.


Why the downvote? This is an issue.

When collecting repositories for searchco.de I have literally spent hours hunting around on many project pages looking for the page which points you at how to get the source for a project. It's usually buried in a wiki page on the developer subdomain. A simple link from the homepage with "Get Source" would save me, and I'm certain a lot of others, a great deal of time.


Can I ask what your imagined use cases for searchco.de are? It looks interesting, but I can't think of what I would use it for.


It's just a semi-replacement for Google Code Search. To be honest, it's just something I play with. If something comes from it, great; if not, I'm not terribly worried about it.


It's neat, but it's hard to find something unless you can guess what people name their functions and variables for programming domain X.


True. I have been using it a lot recently to learn how the Guava and Guice APIs work in the real world. So perhaps that's its target market. I also tend to use it when trying to remember how some feature of a language works, e.g. lambdas in Python: http://searchco.de/?q=lambda+ext%3Apy&cs=on


"Your job is being a professor and researcher: That's one hell of a good excuse for some of the brain-damages of Minix."

Linus Torvalds' post to the comp.os.minix newsgroup.


We're all well aware of the Usenet flamewar 20 years ago.

What has that to do with Minix now?



If something is very successful, that doesn't mean it is very good and nothing can replace it. The world is currently divided into many schools of thought, and only time will tell which kernel design is going to win the war. Also, there has been growing hatred towards Linux, with its bloated kernel eating all the RAM!


If anything is bloated, it's typical Linux userland. The Linux kernel, with uclibc, busybox, dropbear, and other userland tweaks, can fit onto an embedded router with 4MB flash and 16MB RAM.

Linux caches in RAM pages of files it reads. It will evict them at the drop of a hat if it needs to.

Sadly, no matter how many blog entries appear on the subject, people still expect to see free ram, and think linux is bloated when they don't see any.

If you don't like that behavior, I don't know if any OSes leave "free" ram untouched. They might call it free though, even when it's used by the OS for caching.

On the issue of microkernels vs monolithic, I prefer microkernels for their elegance in theory. If one appears with performance in the ballpark of linux, that glibc supports and that will run linux userland with no problems, I'll switch. I will not, however, give up functionality in the name of ideology.


That's because people and tools look at the wrong values.

    $ free -m
                 total       used       free     shared      buffers     cached
    Mem:          2009       1491        518          0          466        363
    -/+ buffers/cache:        661       1348
    Swap:         1992          0       1992
Here, you want the number 1348, which is the memory actually available to applications, not the 518 which at first glance looks like the "free" memory.
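
If you'd rather get that number programmatically than eyeball `free`, something like this (Linux-specific, reading /proc/meminfo; purely illustrative) does the same arithmetic:

    /* Illustrative: MemFree + Buffers + Cached from /proc/meminfo is the
     * same sum `free` reports on the "-/+ buffers/cache" line (in kB). */
    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) { perror("/proc/meminfo"); return 1; }

        char line[128];
        long kb, memfree = 0, buffers = 0, cached = 0;
        while (fgets(line, sizeof line, f)) {
            if (sscanf(line, "MemFree: %ld kB", &kb) == 1)      memfree = kb;
            else if (sscanf(line, "Buffers: %ld kB", &kb) == 1) buffers = kb;
            else if (sscanf(line, "Cached: %ld kB", &kb) == 1)  cached  = kb;
        }
        fclose(f);

        printf("effectively free: %ld MB\n", (memfree + buffers + cached) / 1024);
        return 0;
    }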


I'm given to understand that Debian and Arch Linux have GNU Hurd live CDs out there for people who want to run it. I don't really know much about the OS, but does it meet your requirements?


Hurd makes Minix look like a rock-solid production OS by comparison.


That's because many people have no idea how to even read the memory usage stats in Linux.



No "kernel design is going to win the war." Just like you will never find a "one size fits all" shoe. Different problems require different solutions.


If you're thinking of this as a Linux vs Minix 'war' then you are looking at it the wrong way. Minix is intended for research/education. It's not going to replace Linux because they do different things.


Not anymore:

"It was only with the third version, MINIX 3, and the third version of the book, published in 2006, that the emphasis changed from teaching to a serious research and production system, especially for embedded systems. A few of the many differences between MINIX 2 and MINIX 3 are given here.

Going forward, we are making a serious effort to turn MINIX 3 in an industrial-grade system with a focus on the embedded market, especially for those applications that need high reliability and availability."

http://www.minix3.org/other/read-more.html


Okay, I wasn't aware of the Minix 3 redesign, so I stand corrected on that. Thanks for updating me!

But I'm fairly skeptical about how much use (if any) Minix is getting outside research and education. Aren't there already widely deployed HR/HA OSs that target the embedded market? What real advantages does Minix have over something like VxWorks?


Maybe by being open source.

Does VxWorks come with source code? I guess this might be critical to some type of applications.

For me, the positive is seeing microkernel architectures spread more widely.


I didn't intend to sound the way it seems I sounded. Sorry for that! Btw, the focus of Minix 3 has changed and it is no longer intended just for research/education. Minix 3 was publicly announced on 24 October 2005 by Andrew Tanenbaum, and it has been comprehensively redesigned to be usable as a serious system on resource-limited and embedded computers and for applications requiring high reliability.


What's the point of having memory if the OS isn't going to use it?


So that applications can use it?



