To most, the Linux kernel is rife with mystery, a thing of unfathomable complexity. It is with fear and great caution that most Linux users approach the kernel, daring not to disturb the beasts within. Many users are content to simply say a prayer and proceed with the occasional upgrade or superstitious rebuild. To be sure, the internals of the Linux kernel are not for the faint-hearted, but deciphering the myriad secrets which the kernel contains is not as daunting a task as one might imagine. The basic ideas are easy to grasp, and you are helped greatly by the fact that all of the source code is at your disposal. Exploring the kernel's internals, no matter how superficially, presents the opportunity to prod within the viscera of a modern operating system, thereby obtaining what we can only call "kernel knowledge." (I can hear the groans already!)
In this article, I'd like to uncover some of the mysteries of the Linux kernel and explain its innards from a very high-level perspective, and in the most intuitive manner possible. I will point out the portions of the kernel source code which correspond to the ideas described herein. Before we dive in, however, I must begin with an important disclaimer: this article greatly oversimplifies many important details! If you are a rugged and seasoned kernel hacker, I beg that you excuse this travesty (in fact, you should probably be reading something else). In order for things to make sense at this level, sometimes I must resort to a few creative lies.
What is the kernel anyway?
Put simply, the kernel is the Linux operating system itself. The term "kernel" originates from the early days of operating systems design, when the operative metaphor was one of a system which consisted of many layers. User applications were on the outside and the operating system core (hence, "kernel") was in the center. This is still a fairly accurate representation, so the term persists.
The kernel is responsible for controlling access to all of the machine's resources: CPU, memory, disk, network interfaces, graphics boards, keyboards, mice, and so forth. It's the kernel's job to make these resources available to applications (such as Emacs, the GIMP, etc.) that wish to use them (this is referred to as "multiplexing" system resources), and to prevent individual applications from interfering with one another. Many of the hardware resources controlled by the kernel are thought of as peripheral devices -- such as disks, network interfaces, and so forth; we use the term "device drivers" to describe those parts of the kernel which interface such devices to the system.
So, the kernel has two primary jobs: to multiplex system resources and to prevent applications from interfering with one another's use of those resources. The most straightforward example of multiplexing is what most people call multitasking -- allowing multiple applications (or "tasks") to share the same physical CPU, but giving each application the illusion that it has the entire CPU to itself. Most modern operating systems (including Linux, all variants of UNIX, and even Windows 98) provide some form of multitasking. An example of resource protection is the way in which the kernel prevents two applications from reading or writing each other's memory; for example, it shouldn't be possible for Emacs to corrupt the memory being used by the GIMP running on the same system. It's the kernel's job to ensure that this is the case.
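To see the protection half of this in action, here is a small user-space illustration of my own (the variable name and messages are invented for the example, not taken from any kernel source). After fork(), the parent and child each see the same variable at the same virtual address, yet a write in one process never shows up in the other, because the kernel keeps their memories strictly separate:

    /* Toy illustration of per-process memory: after fork(), each process
     * has its own copy of "counter", even though both copies live at the
     * same virtual address. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int counter = 0;                    /* each process gets its own copy */

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {                 /* child: scribble on its copy */
            counter = 42;
            printf("child:  counter = %d at %p\n", counter, (void *)&counter);
            exit(0);
        }
        wait(NULL);                     /* parent: its copy is untouched */
        printf("parent: counter = %d at %p\n", counter, (void *)&counter);
        return 0;
    }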
As it turns out, the kernel gets a lot of help from the system hardware when it comes to multiplexing and memory protection. For example, the Intel x86 CPU architecture (everything from the 80386 on up) includes support for memory protection and CPU multitasking; in fact, without some hardware support, it is very difficult to do these things. The Linux kernel relies on the fact that the CPU will tell it when an application has made a bad memory reference (one which might be a protection violation). Without this signal (called a "page fault"), the kernel would have no way to enforce memory protection between processes.
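To see what that signal looks like from the receiving end, the toy program below (again my own illustration; the handler name is invented) deliberately writes through a NULL pointer. The CPU raises a page fault, the kernel decides the reference is invalid, and the offending process is handed a SIGSEGV rather than being allowed to wander into memory it doesn't own:

    /* Provoke a page fault on purpose and watch the kernel turn it into
     * a SIGSEGV signal delivered to this process. */
    #include <signal.h>
    #include <unistd.h>

    static void on_segv(int sig)
    {
        const char msg[] = "caught SIGSEGV: the kernel flagged a bad memory reference\n";
        (void)sig;                      /* unused */
        write(STDOUT_FILENO, msg, sizeof msg - 1);
        _exit(1);
    }

    int main(void)
    {
        signal(SIGSEGV, on_segv);
        volatile int *bad = NULL;
        *bad = 1;                       /* page fault -> kernel -> SIGSEGV */
        return 0;
    }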
What all of this technical mumbo-jumbo boils down to is that a Linux system can have many separate applications running simultaneously, sharing the system's CPU, memory, and other resources, and it's impossible for these applications to interfere with each other or otherwise cause damage to the system. All of this is very good for application programmers, who can write programs with the knowledge that their code is "fenced in" by the kernel, making it unlikely that a program which goes haywire could cause any real harm.
The kernel structure
The Linux kernel consists of a number of components working together: there's the virtual memory subsystem (which implements both memory protection and "paging", which allows disk space to be used in place of physical memory); the scheduler (which multiplexes the CPU across multiple applications); the file systems (including the Linux ext2fs, NFS, ISO-9660, MS-DOS FAT, and other filesystem types); the networking code (including the TCP/IP protocol stack, as well as code for PPP, SLIP, AppleTalk, and other protocols); as well as a host of device drivers for everything ranging from serial ports to disk controllers to network cards.
Structurally, everything in the kernel is compiled together as one big program which is started when the system boots. Because of this, Linux (like other UNIX systems) is sometimes referred to as a "monolithic" kernel design, as opposed to a "microkernel"-based system. In a microkernel-based system, the OS is composed of a number of separate programs, each of which is structurally independent of the others. Linux does have a mechanism by which new pieces of kernel code can be added to the system dynamically -- using so-called loadable kernel modules. However, once a kernel module is loaded, it really becomes part of the "one big program", no different from any other kernel code.
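For the curious, here is a sketch of what a minimal module looks like. Note that I'm showing the interface of more recent kernels, which differs in detail from the 2.0/2.2 kernels of this article's vintage, and the names hello_init and hello_exit are simply my own:

    /* A minimal loadable kernel module: once insmod'ed, this code runs
     * in kernel space like any other kernel code. */
    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    static int __init hello_init(void)
    {
        printk(KERN_INFO "hello: now part of the running kernel\n");
        return 0;                       /* zero means the module loaded cleanly */
    }

    static void __exit hello_exit(void)
    {
        printk(KERN_INFO "hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");

Build it against your kernel's build system, load it with insmod and its printk() messages land in the kernel log alongside everything else; unload it with rmmod and it is gone again, as if it had never been part of the "one big program".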
Figure 1 shows the overall structure of the kernel, which sits between the user applications and the system hardware. User applications and the kernel share the CPU and system memory; we say that the applications live in "user space" while the kernel resides in "kernel space". These "spaces" imply more than just physical separation; they also refer to the privileges each has. In short, user applications are only able to access their limited memory space, a certain percentage of overall CPU time, and so forth, while the kernel has the ultimate privilege to access any hardware device, read or write any memory address, consume as much CPU time as it requires, and so forth. This privilege distinction is important because it is this power which gives the kernel the ability to protect and multiplex system resources among user processes. User-space code, on the other hand, is subject to the limits placed upon it by the kernel.
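One easy way to feel this boundary is from an ordinary user program. In the sketch below (assuming an unprivileged process; the behavior of /dev/mem depends on your system's permissions), the write() system call traps into kernel space, where the kernel does the privileged I/O on the program's behalf; a direct attempt to open /dev/mem, a window onto physical memory, is normally refused outright:

    /* Illustrate the privilege boundary: the kernel will do work for us
     * through system calls, but refuses direct access to privileged
     * resources. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char msg[] = "hello from user space, via the write() system call\n";
        write(STDOUT_FILENO, msg, sizeof msg - 1);      /* kernel does the work */

        if (open("/dev/mem", O_RDONLY) < 0)             /* kernel enforces the boundary */
            perror("open /dev/mem");
        return 0;
    }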
In Figure 1, blue lines connecting the various components within the kernel (and to hardware devices) indicate that those components directly interact in some way. For example, the TCP/IP stack sends network packets through either the TCP or UDP code path, but both types of packets are eventually handled by the IP layer. In this figure, "VFS" stands for the Virtual Filesystem layer, which abstracts away the details of the particular filesystem types (such as ext2fs and ISO-9660, as shown) from user applications. This means that applications need not know what type of filesystem is being accessed when a file is opened, read, written, and so forth. "IPC" stands for Interprocess Communication and includes various mechanisms user processes employ to "talk" to each other and coordinate their activity. The component labeled "SMP" is the symmetric multiprocessing support in the Linux kernel, which enables the use of systems with multiple CPUs.
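Here is what the VFS abstraction buys a programmer in practice. The sketch below (the default path is just a placeholder of mine; hand it any file you like) is written once and works unchanged whether the file lives on an ext2 partition, an ISO-9660 CD-ROM or an NFS mount:

    /* The same open()/read()/close() calls work on any mounted filesystem
     * type; the VFS layer routes each call to the right filesystem code. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        const char *path = (argc > 1) ? argv[1] : "/etc/hostname";
        char buf[64];
        ssize_t n;
        int fd = open(path, O_RDONLY);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        n = read(fd, buf, sizeof buf);  /* identical whether ext2, ISO-9660 or NFS */
        if (n >= 0)
            printf("read %ld bytes from %s\n", (long)n, path);
        close(fd);
        return 0;
    }

Run it against a file on a CD-ROM or an NFS mount and the program behaves the same; only the filesystem code behind the VFS changes.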