Wednesday, April 20, 2011

Microkernel approach in MINIX 3

In computer science, a microkernel is the near-minimum amount of software that can provide the mechanisms needed to implement an operating system. These mechanisms include low-level address space management, thread management, and inter-process communication (IPC). As an operating system design approach, microkernels permit typical operating system services, such as device drivers, protocol stacks, and file systems, to run in user space. The MINIX 3 kernel has around 4,000 lines of code. Kernels larger than 20,000 lines are generally not considered microkernels.

In systems built on a monolithic kernel, a driver (which has approximately 3 to 7 times as many bugs as a typical program) can bring down the whole system. In response, MINIX 3 aims to be an operating system that is a "reliable, self-healing, multiserver UNIX clone". To achieve that, the code running in the kernel must be minimal, with the file server, process server, and each device driver running as separate user-mode processes. Each driver is carefully monitored by a part of the system known as the reincarnation server. If a driver fails to respond to pings from the reincarnation server, it is shut down and replaced by a fresh copy of the driver. In a monolithic system, a bug in a driver can easily crash the whole kernel, something that is much less likely to occur in MINIX 3.
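To make the ping-and-replace idea concrete, here is a minimal sketch in Python. It is only a toy simulation, not MINIX 3 code: the `Driver` and `ReincarnationServer` classes, the `healthy` flag, and the `generation` counter are all names made up for illustration, and a real system would ping separate processes over IPC rather than call a method.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    """Toy stand-in for a user-mode driver process (hypothetical)."""
    name: str
    healthy: bool = True
    generation: int = 1  # incremented each time a fresh copy is started

    def ping(self) -> bool:
        # A hung or crashed driver does not answer the ping.
        return self.healthy


class ReincarnationServer:
    """Toy monitor: pings every driver and replaces those that stay silent."""

    def __init__(self, drivers):
        self.drivers = {d.name: d for d in drivers}
        self.restart_log = []

    def check_all(self):
        for name, driver in list(self.drivers.items()):
            if not driver.ping():
                # Shut the driver down and start a fresh copy in its place.
                fresh = Driver(name=name, generation=driver.generation + 1)
                self.drivers[name] = fresh
                self.restart_log.append(name)


disk = Driver("disk")
net = Driver("net")
rs = ReincarnationServer([disk, net])

net.healthy = False  # simulate a driver that stops responding
rs.check_all()

print(rs.restart_log)                # names of the drivers that were replaced
print(rs.drivers["net"].generation)  # generation of the replacement copy
```

The point of the sketch is that the rest of the system keeps running: only the silent driver is swapped out, and the healthy one is left untouched.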

Reliability factors in MINIX 3:

Reduce kernel size
Cage the bugs
Limit drivers' memory access
Survive bad pointers
Tame infinite loops
Limit damage from buffer overruns
Restrict access to kernel functions
Restrict access to I/O ports
Reincarnate dead or sick drivers
Integrate interrupts and messages
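Two of the items above, restricting access to kernel functions and to I/O ports, can be pictured as a per-driver permission record that is checked before each request is honoured. The Python sketch below is an illustration under that assumption only: the `Privileges` class, the call names (`port_io`, `safe_copy`, `kill_process`), and the port range are invented for the example and are not taken from the MINIX 3 source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privileges:
    """Hypothetical per-driver permission record."""
    kernel_calls: frozenset  # names of the kernel calls this driver may make
    io_ports: range          # I/O ports this driver may touch


def may_call(priv, call):
    # A request for a kernel call outside the allowed set is refused.
    return call in priv.kernel_calls


def may_use_port(priv, port):
    # A request for an I/O port outside the allowed range is refused.
    return port in priv.io_ports


# Made-up example: a printer driver limited to two calls and three ports.
printer = Privileges(
    kernel_calls=frozenset({"port_io", "safe_copy"}),
    io_ports=range(0x378, 0x37B),
)

print(may_call(printer, "port_io"))       # True
print(may_call(printer, "kill_process"))  # False
print(may_use_port(printer, 0x378))       # True
print(may_use_port(printer, 0x1F0))       # False
```

With a check like this, a buggy printer driver cannot write to the disk controller's ports, so the damage it can do stays limited to its own device.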

Architecture:

The approach that MINIX 3 uses to achieve high reliability is fault isolation. Unlike traditional operating systems, where all the code is linked into a single huge binary running in kernel mode, in MINIX 3 only a tiny bit of code runs in kernel mode, about 4,000 lines in all. This code handles interrupts, process scheduling, and interprocess communication. The rest of the operating system runs as a collection of user-mode processes, each one encapsulated by the MMU hardware and none of them running as superuser. One of these processes, dubbed the reincarnation server, keeps tabs on all the others, and when one of them begins acting sick or crashes, it automatically replaces it with a fresh version. Since many bugs are transient, triggered by unusual timing, in most cases restarting the faulty component solves the problem and allows the system to repair itself without a reboot and without the user even noticing. This property is called self-healing, and traditional systems do not have it.
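Since the kernel handles both interrupts and interprocess communication while drivers live in user mode, one way to picture the link between the two is that the kernel turns a hardware interrupt into a message for the driver that owns the device. The Python sketch below is a rough illustration under that assumption: the `Kernel` class, its method names, and the `"INTERRUPT"` message tag are hypothetical and do not reproduce the real MINIX 3 interfaces.

```python
from collections import deque


class Kernel:
    """Toy kernel that delivers interrupts to drivers as messages."""

    def __init__(self):
        self.irq_owner = {}  # IRQ number -> name of the owning driver
        self.mailboxes = {}  # driver name -> queue of pending messages

    def register(self, driver, irq):
        # A driver announces which interrupt line it is responsible for.
        self.irq_owner[irq] = driver
        self.mailboxes.setdefault(driver, deque())

    def hardware_interrupt(self, irq):
        # The kernel does no device work itself; it only queues a message.
        driver = self.irq_owner.get(irq)
        if driver is not None:
            self.mailboxes[driver].append(("INTERRUPT", irq))

    def receive(self, driver):
        # The driver picks up its next message, or None if there is none.
        box = self.mailboxes[driver]
        return box.popleft() if box else None


kernel = Kernel()
kernel.register("disk", 14)

kernel.hardware_interrupt(14)  # the device raises its interrupt line
kernel.hardware_interrupt(99)  # nobody owns this line, so it is ignored

first = kernel.receive("disk")
second = kernel.receive("disk")
print(first)   # ('INTERRUPT', 14)
print(second)  # None
```

In this picture the driver's main loop only ever waits for messages, so an interrupt looks the same to it as a request from any other process.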

1 comment:

  1. Hi,

    Very interesting post! I liked it, and learned some interesting things from it.

    I didn't know about the MMU encapsulation. That's very interesting. I wonder if that limits MINIX to running only on processors that have MMUs? I'm also not sure what it means to "survive bad pointers" and "integrate interrupts and messages."

    I also wonder about the servers. What would happen if the reincarnation, memory, or process scheduling servers had a fault?

    In my opinion, it would be really cool if MINIX were designed for real-time guarantees or had scheduling options that could be chosen. I guess that might be a wasteful use of code with little benefit though.

    Thanks for writing this! It's very interesting.