Brief Analysis of Linux Kernel Architecture
At the top is the user (or application) space, where user applications run. The GNU C Library (glibc) also lives in user space: it provides the system call interface that connects to the kernel, along with the mechanism for transitioning between a user space application and the kernel. Below the user space is the kernel space, where the Linux kernel resides. The transition mechanism is important because the kernel and user space applications run in different protected address spaces. Each user space process has its own virtual address space, while the kernel occupies a single address space of its own.
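The separation of per-process address spaces can be observed from user space. The following is a minimal sketch, assuming a Linux system with Python 3; the function name fork_and_modify is made up for this illustration. A child process changes a value in its own copy of memory, and the parent's copy stays untouched.

```python
import os


def fork_and_modify():
    """Return (value seen by the child, value still seen by the parent)."""
    data = [1]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of the parent's address space.
        try:
            os.close(read_fd)
            data[0] = 99
            os.write(write_fd, str(data[0]).encode())
        finally:
            os._exit(0)  # leave immediately, never return into the caller
    # Parent: the child's write to data[0] is not visible here.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_value, data[0]
```

The pipe is needed precisely because the two processes no longer share memory: the child has to hand its value to the parent through the kernel.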
The Linux kernel can be further divided into 3 layers. At the top is the system call interface, which implements basic functions such as read and write. Below the system call interface is the kernel code proper, more accurately described as architecture-independent kernel code; it is common to all processor architectures supported by Linux. Below that is the architecture-dependent code, which forms what is usually called a BSP (Board Support Package). This code serves as the processor- and platform-specific code for a given architecture.
The Linux kernel implements many important architectural attributes. At both high and low levels, the kernel is layered into a number of distinct subsystems. Linux can also be considered monolithic, because it integrates all of the basic services into the kernel. This differs from a microkernel architecture, in which the kernel provides only basic services such as communication, I/O, and memory and process management, and more specific services are plugged in to the microkernel layer. Each approach has its own advantages, but they are not discussed here.
Over time, the Linux kernel has become efficient in both memory and CPU usage, as well as very stable. But the most interesting aspect of Linux, given its size and complexity, is its portability. Linux can be compiled to run on a large number of processors and platforms with different architectural constraints and requirements. One example is that Linux can run on processors with a memory management unit (MMU) as well as on processors that provide no MMU.
The uClinux port of the Linux kernel provides support for processors without an MMU.
The main components of the Linux kernel are: the system call interface, process management, memory management, the virtual file system, the network stack, device drivers, and architecture-dependent code.
(1) System call interface
The SCI layer provides the means to perform function calls from user space into the kernel. As discussed earlier, this interface can depend on the architecture, even within the same processor family. The SCI is essentially a function call multiplexing and demultiplexing service. You can find the implementation of the SCI in ./linux/kernel, and the architecture-dependent parts in ./linux/arch.
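To make the user-to-kernel call path concrete, here is a small sketch in Python, assuming a Linux or other POSIX system. Python's os.pipe, os.write, and os.read are thin wrappers over the pipe, write, and read system calls, so each call below crosses the SCI once. The function name roundtrip_through_kernel is hypothetical.

```python
import os


def roundtrip_through_kernel(payload: bytes) -> bytes:
    """Push bytes into a kernel pipe buffer and read them back out."""
    read_fd, write_fd = os.pipe()  # pipe(2): the kernel allocates the buffer
    try:
        written = os.write(write_fd, payload)  # write(2): user -> kernel
        return os.read(read_fd, written)       # read(2): kernel -> user
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The sketch is meant for small payloads that fit in the pipe buffer; a larger write would block until a reader drains the pipe.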
(2) Process management
Process management focuses on the execution of processes. In the kernel, these are called threads, and each represents an individual virtualization of the processor (thread code, data, stack, and CPU registers). In user space, the term process is usually used, but the Linux implementation does not distinguish between the two concepts (process and thread). Through the SCI, the kernel provides an application programming interface (API) to create a new process (fork, exec, or Portable Operating System Interface [POSIX] functions), stop a process (kill, exit), and communicate and synchronize between processes (signals or POSIX mechanisms).
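The process calls named above can be tried from user space. This is a minimal sketch, assuming Linux and Python 3; the function names spawn_and_exit and spawn_and_kill are made up for this illustration. One child ends itself with exit, the other is stopped from outside with kill.

```python
import os
import signal
import time


def spawn_and_exit(exit_code):
    """Fork a child that exits on its own; return the exit code the parent sees."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate immediately
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


def spawn_and_kill():
    """Fork a child that idles, kill it, and return the terminating signal number."""
    pid = os.fork()
    if pid == 0:
        try:
            while True:
                time.sleep(1)  # child: wait until the signal arrives
        finally:
            os._exit(0)
    os.kill(pid, signal.SIGKILL)  # parent: ask the kernel to deliver SIGKILL
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

SIGKILL is used here because it cannot be caught or ignored, which keeps the sketch independent of any signal handlers the calling program may have installed.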
Process management also covers sharing the CPU among active threads. The 2.6 kernel introduced a scheduling algorithm that runs in constant time no matter how many threads are competing for the CPU. It is called the O(1) scheduler, and the name means that scheduling many threads takes the same amount of time as scheduling one. The O(1) scheduler also supports multiple processors (symmetric multiprocessing, or SMP). Kernels from 2.6.23 onward replaced it with the Completely Fair Scheduler (CFS). You can find the source code for process management in ./linux/kernel and the architecture-dependent source code in ./linux/arch.
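The constant-time idea can be sketched as one queue per priority level plus a bitmap that marks which queues are non-empty. The following toy model is an illustration under simplifying assumptions, not kernel code: the class name ToyRunQueue is made up, and the real O(1) scheduler additionally kept separate active and expired arrays per CPU. The 140 priority levels match that scheduler, with 0 as the highest priority.

```python
from collections import deque

NUM_PRIORITIES = 140  # 0 is the highest priority


class ToyRunQueue:
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set while queues[p] is non-empty

    def enqueue(self, task, priority):
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        """Return the next task to run, or None if nothing is runnable."""
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit: the highest-priority non-empty queue.
        # The work is bounded by the fixed bitmap width, not by task count.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[priority]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << priority)
        return task
```

Picking the next task never walks the list of runnable tasks, which is what keeps the cost independent of how many threads are waiting.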
(3) Memory management
Another important resource managed by the kernel is memory. For efficiency, given the way the hardware manages virtual memory, memory is managed in units called pages (4KB on most architectures). Linux includes the means to manage the available memory, as well as the hardware mechanisms for mapping between physical and virtual addresses. But memory management covers much more than 4KB buffers. Linux provides abstractions on top of 4KB buffers, such as the slab allocator. This scheme uses 4KB buffers as its base, allocates structures from within them, and keeps track of which pages are full, which are partially used, and which are empty. This lets the scheme grow and shrink dynamically based on the needs of the system. With multiple users of memory, the available memory can sometimes be exhausted. For this reason, pages can be moved out of memory and onto disk. This process is called swapping, because pages are swapped from memory to the hard disk. The source code for memory management can be found in ./linux/mm.
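The bookkeeping described for the slab allocator (full, partially used, and empty pages) can be modeled in a few lines. This is a toy sketch under simplifying assumptions, not the kernel's implementation: the names Slab and ToySlabCache are hypothetical, each slab is assumed to be exactly one 4KB page, and objects are represented only by slot numbers.

```python
PAGE_SIZE = 4096  # one slab is assumed to be a single 4KB page


class Slab:
    def __init__(self, object_size):
        self.capacity = PAGE_SIZE // object_size
        self.free_slots = list(range(self.capacity))

    def state(self):
        if not self.free_slots:
            return "full"
        if len(self.free_slots) == self.capacity:
            return "empty"
        return "partial"


class ToySlabCache:
    """Hands out fixed-size objects carved from page-sized slabs."""

    def __init__(self, object_size):
        self.object_size = object_size
        self.slabs = []

    def alloc(self):
        # Prefer partially used slabs, then empty ones, then a new page.
        for wanted in ("partial", "empty"):
            for index, slab in enumerate(self.slabs):
                if slab.state() == wanted:
                    return index, slab.free_slots.pop()
        self.slabs.append(Slab(self.object_size))
        return len(self.slabs) - 1, self.slabs[-1].free_slots.pop()

    def free(self, handle):
        index, slot = handle
        self.slabs[index].free_slots.append(slot)

    def counts(self):
        result = {"full": 0, "partial": 0, "empty": 0}
        for slab in self.slabs:
            result[slab.state()] += 1
        return result
```

Filling partial slabs first tends to leave other slabs completely empty, and empty slabs are the ones a real allocator can hand back to the page allocator when memory runs short.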
(4) Virtual file system
The virtual file system (VFS) is a very useful aspect of the Linux kernel because it provides a common interface abstraction for file systems. The VFS provides a switching layer between the SCI and the file systems supported by the kernel.
At the top of the VFS is a common API abstraction of functions such as open, close, read, and write. At the bottom of the VFS are the file system abstractions, which define how the upper-level functions are implemented. They are plug-ins for the given file systems (of which there are more than 50). The source code of the file systems can be found in ./linux/fs. Below the file system layer is the buffer cache, which provides a common set of functions to the file system layer (independent of any particular file system). This caching layer optimizes access to physical devices by keeping data around for a short time (or by reading ahead so that the data is available when needed). Below the buffer cache are the device drivers, which implement the interface for a particular physical device.
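The effect of the common API is that the same open, read, and close calls work regardless of what sits underneath. Here is a minimal sketch, assuming a Linux system with Python 3; the function names read_head and roundtrip_on_disk are made up for this illustration.

```python
import os
import tempfile


def read_head(path, max_bytes=64):
    """Read up to max_bytes from any path using the generic open/read/close calls."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, max_bytes)
    finally:
        os.close(fd)


def roundtrip_on_disk(payload: bytes) -> bytes:
    """Write payload to a temporary regular file, then read it back via read_head."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.close(fd)
        return read_head(path, len(payload))
    finally:
        os.unlink(path)
```

The same read_head function can be pointed at a regular file on disk, at a device node such as /dev/null (which reports end of file immediately), or, on a typical Linux system, at a procfs entry such as /proc/version. The caller does not change; only the code that the VFS dispatches to does.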
(5) Network stack
The network stack, by design, follows a layered architecture modeled after the protocols themselves. Recall that the Internet Protocol (IP) is the core network layer protocol, and it sits below the transport protocol (most commonly the Transmission Control Protocol, or TCP). Above TCP is the socket layer, which is invoked through the SCI. The socket layer is the standard API to the network subsystem and provides a user interface to a variety of network protocols. From raw frame access to IP protocol data units (PDUs) and up to TCP and the User Datagram Protocol (UDP), the socket layer provides a standardized way to manage connections and move data between endpoints. The network source code in the kernel can be found in ./linux/net.
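From user space, the socket layer looks like this. The sketch below assumes Python 3 and a working loopback interface; the function name udp_send_and_receive is hypothetical. It sends one UDP datagram from one socket to another over 127.0.0.1, so the data travels down through the socket, UDP, and IP layers and back up again.

```python
import socket


def udp_send_and_receive(message: bytes) -> bytes:
    """Send one UDP datagram over loopback and return what the receiver got."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        receiver.settimeout(2.0)         # do not hang forever if delivery fails
        sender.sendto(message, receiver.getsockname())
        data, _address = receiver.recvfrom(4096)
        return data
```

Switching SOCK_DGRAM to SOCK_STREAM (plus listen, accept, and connect calls) would select TCP instead, while the socket API itself stays the same.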
(6) Device driver
A large share of the source code in the Linux kernel lives in device drivers, which make particular hardware devices usable. The Linux source tree provides a drivers subdirectory, which is further divided by the kinds of devices supported, such as Bluetooth, I2C, and serial. The code for the device drivers can be found in ./linux/drivers.
(7) Code that depends on the architecture
Although Linux is largely independent of the architecture it runs on, some elements must take the architecture into account in order to operate correctly and efficiently. The ./linux/arch subdirectory defines the architecture-dependent part of the kernel source code and contains a number of architecture-specific subdirectories (which together form the BSP). For a typical desktop system, the x86 directory is used. Each architecture subdirectory contains many other subdirectories, each focusing on a particular aspect of the kernel, such as booting, kernel, and memory management. The architecture-dependent code can be found in ./linux/arch.
As if the portability and efficiency of the Linux kernel weren't enough, Linux also provides other features that do not fit into the categories above. As a production operating system and open source software, Linux is a good test bed for new protocols and enhancements to them. Linux supports a large number of network protocols, including the typical TCP/IP as well as extensions for high-speed networking (faster than 1 Gigabit Ethernet [GbE], and 10 GbE). Linux also supports protocols such as the Stream Control Transmission Protocol (SCTP), which provides many advanced features beyond TCP (as an alternative transport layer protocol).
Linux is also a dynamic kernel that supports adding and removing software components on the fly. These components are known as dynamically loadable kernel modules. They can be inserted at boot time when they are needed (when a particular device that requires the module is found) or at any time by the user.
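On a running Linux system, the modules that are currently loaded are listed in /proc/modules, one per line, with the name, size, reference count, and the modules that use it as the first four fields. The following is a small parsing sketch; the function names parse_modules and loaded_modules are made up for this illustration, and the sample lines in the usage below are invented values in that format.

```python
def parse_modules(text):
    """Parse text in /proc/modules format into a list of dictionaries."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, size, refcount, users = fields[:4]
        # The fourth field is "-" or a comma-terminated list such as "ext4,".
        used_by = [] if users == "-" else [u for u in users.split(",") if u]
        modules.append({
            "name": name,
            "size": int(size),
            "refcount": int(refcount),
            "used_by": used_by,
        })
    return modules


def loaded_modules(path="/proc/modules"):
    """Return the parsed module list, or [] if the file is not available."""
    try:
        with open(path) as handle:
            return parse_modules(handle.read())
    except OSError:
        return []
```

For example, parse_modules("jbd2 131072 1 ext4, Live 0x0000000000000000") reports a module named jbd2 that is in use by ext4. Loading and unloading modules themselves is done with tools such as modprobe and rmmod and requires root privileges.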
A more recent enhancement to Linux is its ability to act as an operating system for other operating systems (called a hypervisor). A modification to the kernel called the Kernel-based Virtual Machine (KVM) enables a new interface for user space that allows other operating systems to run on top of a KVM-enabled kernel. In addition to running other instances of Linux, Microsoft Windows can also be virtualized. The only restriction is that the underlying processor must support hardware virtualization instructions.
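Whether a machine meets that processor requirement can be checked from user space. The sketch below assumes an x86 Linux system, where support for the virtualization instructions appears as the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo; other architectures report their features differently. The function names are hypothetical.

```python
import os


def has_hw_virtualization(cpuinfo_text):
    """Return True if x86 /proc/cpuinfo text lists the vmx or svm CPU flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def kvm_device_present(path="/dev/kvm"):
    """Return True if the KVM device node that user space talks to exists."""
    return os.path.exists(path)
```

A typical use is to read /proc/cpuinfo into a string and pass it to has_hw_virtualization. The /dev/kvm node only appears when the KVM modules are loaded, so it can be missing even on capable hardware.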
The difference between Linux architecture and kernel structure
1. When asked about the Linux architecture (that is, how the Linux system is structured), we can answer this way: broadly, the Linux architecture can be divided into two parts:
(1) User space: User space includes user applications and the C library
(2) Kernel space: Kernel space includes the system call interface, the kernel, and architecture-dependent code
2. The reason why the Linux architecture is divided into user space and kernel space:
1) Modern CPUs usually implement several operating modes.
Take ARM as an example: the classic 32-bit ARM architecture implements 7 processor modes, and the instructions the CPU can execute and the registers it can access differ from mode to mode:
(1) User mode (usr)
(2) System mode (sys)
(3) Supervisor mode (svc)
(4) Fast interrupt mode (fiq)
(5) Interrupt mode (irq)
(6) Abort mode (abt)
(7) Undefined mode (und)
Take x86 as an example: x86 implements 4 privilege levels, Ring 0 through Ring 3. Privileged instructions can be executed and I/O devices can be accessed in Ring 0, while Ring 3 is subject to many restrictions.
2) Building on these CPU modes, Linux divides the system into 2 parts in order to protect the kernel.
3. User space and kernel space are two different states of program execution. Execution moves from user space to kernel space through a system call or a hardware interrupt.
4. Linux kernel structure (note the distinction between Linux architecture and Linux kernel structure)