Understanding the Conceptual Architecture of the Linux Kernel (Repost)

Last Update:2016-01-08 Source: Internet

Author: User

English Original: Conceptual Architecture of the Linux Kernel

Summary

Two reasons explain the success of the Linux kernel: (1) its architecture allows a large number of volunteer developers to take part in development; (2) each subsystem, especially those most in need of improvement, is highly extensible. These two properties are what allow the Linux kernel to keep evolving.

I. The position of the Linux kernel in the computer system

Fig 1-Hierarchical structure of computer systems

Hierarchy principle: dependencies between subsystems run only from the top down. Layers pictured near the top of the diagram depend on lower layers, but subsystems nearer the bottom do not depend on higher layers.

II. The role of the kernel

Virtualization (abstraction): the kernel abstracts the computer hardware into a virtual machine for user processes. A process does not need to know how the hardware works; it only has to call the virtual machine interface provided by the Linux kernel.
Multitasking: multiple tasks in fact use the hardware resources concurrently. The kernel arbitrates access to those resources and gives each process the illusion that it has the system to itself.

PS: A process context switch replaces the program status word, the contents of the page table base register, the task_struct instance that current points to, and the program counter. As a consequence it also changes the set of open files (reachable through task_struct) and the memory the process executes in (reachable through the mm member of task_struct).

III. The overall architecture of the Linux kernel

The overall architecture of the Linux kernel

The central subsystem is the Process Scheduler (SCHED). All other subsystems depend on it because they all need to block and resume processes. When a process has to wait for a hardware operation to complete, the subsystem involved blocks the process; when the operation completes, the subsystem resumes it. Both blocking and resuming are carried out by the Process Scheduler.

Each dependency arrow in the diagram has a reason:

Process Scheduler relies on Memory Manager: when a process resumes execution, the scheduler relies on the Memory Manager to provide the memory the process runs in.
The IPC subsystem relies on Memory Manager: shared memory is one mechanism of interprocess communication, in which two running processes pass information through the same block of shared memory.
VFS relies on Network Interface: to support the NFS file system.
VFS relies on Memory Manager: to support RAM disk devices.
Memory Manager relies on VFS: to support swapping, a process that is temporarily not running can be swapped out to a swap partition on disk and enter a suspended state.
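
The dependency arrows listed above can be written down as a small graph. The following is only an illustrative sketch: the short names (sched, mm, vfs, net, ipc) and the helper functions are assumptions made for this example and are not kernel identifiers.

```python
# Edges follow the dependency list above. The short subsystem names are
# hypothetical labels chosen for this sketch.
DEPENDS_ON = {
    "sched": {"mm"},
    "mm": {"sched", "vfs"},
    "vfs": {"sched", "mm", "net"},
    "net": {"sched"},
    "ipc": {"sched", "mm"},
}


def reachable(start):
    """Return every subsystem that `start` depends on, directly or transitively."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for dep in DEPENDS_ON[node]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def mutually_dependent(a, b):
    """True when a depends on b and b depends on a, possibly through others."""
    return b in reachable(a) and a in reachable(b)
```

With these edges, mutually_dependent("mm", "vfs") is true, which matches the two arrows between the Memory Manager and the VFS, while nothing depends on the IPC subsystem.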

IV. A highly modular design that facilitates division of labor

Only a handful of programmers need to work across multiple modules, and this happens only when one subsystem has to rely on another.
The hardware device drivers, the logical filesystem modules, the network device drivers, and the network protocol modules are the four kinds of module with the highest extensibility.

V. Data structures in the system

Task List
Process Scheduler maintains a task_struct data structure for each process. All processes are linked into a list, the task list. Process Scheduler also maintains a current pointer that points to the process currently using the CPU.
Memory Map
Memory Manager stores the mapping from virtual addresses to physical addresses for each process, and also records how to swap out specific pages and how to handle page faults. This information is stored in the data structure mm_struct. Each process has one mm_struct, and the task_struct of the process holds a pointer to the mm_struct of that process.
In mm_struct there is a pointer pgd, which points to the page directory of the process (that is, the start address of the page directory). When the process is scheduled, this pointer is converted to a physical address and written to the control register CR3 (the page directory base register on the x86 architecture).
I-nodes
The VFS represents a file on disk through an inode, which records the physical properties of the file. Each process has a files_struct structure that represents the files the process has opened, and task_struct holds a files pointer to it. Inodes make file sharing possible. There are two ways to share a file: (1) file descriptors point to the same inode through the same system open-file table entry, which happens between parent and child processes; (2) different system open-file table entries point to the same inode, for example with hard links, or when two unrelated processes open the same file.
Data Connection
The root of all kernel data structures is the task list maintained by Process Scheduler. The task_struct of each process holds a pointer mm to its memory mapping information, a pointer files to its open files (the per-process open file table), and a pointer to the network sockets the process has opened.
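
The two ways of sharing a file described under I-nodes can be observed from user space. The sketch below is an assumption-laden illustration, not the author's example: it uses os.dup as a stand-in for the parent/child case (a duplicated descriptor refers to the same open-file entry, just as an inherited one does), and a second os.open for the case of two independent open-file entries that point to the same inode. The function name sharing_demo is made up for this example, and POSIX behaviour is assumed.

```python
import os
import tempfile


def sharing_demo():
    """Return (same_inode, offset_via_dup, offset_via_second_open)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.lseek(fd, 0, os.SEEK_SET)

        dup_fd = os.dup(fd)                    # same open-file entry as fd
        other_fd = os.open(path, os.O_RDONLY)  # new open-file entry, same inode

        os.read(fd, 5)                         # advance the offset through fd only

        same_inode = os.fstat(fd).st_ino == os.fstat(other_fd).st_ino
        dup_offset = os.lseek(dup_fd, 0, os.SEEK_CUR)
        other_offset = os.lseek(other_fd, 0, os.SEEK_CUR)
        for descriptor in (fd, dup_fd, other_fd):
            os.close(descriptor)
        return same_inode, dup_offset, other_offset
    finally:
        os.unlink(path)
```

Reading through fd moves the offset seen through dup_fd because both share one open-file entry, whereas other_fd keeps its own offset even though it refers to the same inode.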

VI. Subsystem architecture

1. Process Scheduler architecture

(1) Goals

Process Scheduler is the most important subsystem in the Linux kernel. It controls access to the CPU: not only CPU access by user processes, but also CPU access by the other kernel subsystems.

(2) module

Process Scheduler
Scheduling policy module: decides which process gets access to the CPU. The policy should let all processes share the CPU as fairly as possible.
The architecture-specific module provides a uniform abstract interface that hides the hardware details of a particular chip. This module interacts with the CPU to block and resume processes. Its operations include obtaining the registers and state information that must be saved for each process and executing the assembly code that performs the block or resume.
The architecture-independent module interacts with the scheduling policy module to determine the next process to run, and then calls the architecture-specific code to resume that process. In addition, this module calls the memory manager interface to make sure that the memory mapping information of the blocked process is saved correctly.
The system call interface module lets user processes access only those resources that the Linux kernel explicitly exposes to them. A well-defined interface that rarely changes (the POSIX standard) decouples user applications from the Linux kernel, so that user processes are not affected by kernel changes.

(3) Data representation

The scheduler maintains one data structure, the task list, whose elements are the task_struct instances of all active processes. This structure contains not only the information needed to block and resume processes, but also additional accounting and state information. It is publicly accessible throughout the kernel layer.
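
As a rough illustration of the task list and the current pointer, here is a minimal sketch. It is a user-space model only: the field names mm, files, and pgd echo the names mentioned in this article, while the state strings, the run_count field, and the round-robin policy are assumptions made for this example and do not describe the real scheduler.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MmStruct:
    pgd: int  # stand-in for the page directory base address


@dataclass
class TaskStruct:
    pid: int
    state: str = "RUNNABLE"  # "RUNNABLE" or "BLOCKED" in this sketch
    mm: Optional[MmStruct] = None
    files: list = field(default_factory=list)  # open-file entries
    run_count: int = 0  # example of per-task accounting information


class Scheduler:
    def __init__(self):
        self.task_list = []
        self.current = None  # task that currently owns the CPU

    def add(self, task):
        self.task_list.append(task)

    def pick_next(self):
        """Assumed policy: first runnable task after `current`, round-robin."""
        count = len(self.task_list)
        start = 0
        if self.current is not None:
            start = self.task_list.index(self.current) + 1
        for step in range(count):
            candidate = self.task_list[(start + step) % count]
            if candidate.state == "RUNNABLE":
                return candidate
        return None

    def schedule(self):
        """Pick the next task, account for the run, and make it current."""
        chosen = self.pick_next()
        if chosen is not None:
            chosen.run_count += 1
            self.current = chosen
        return chosen
```

Keeping pick_next separate from schedule mirrors the split between the scheduling policy module and the module that actually switches processes: a different policy could replace pick_next without touching the rest.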

(4) Dependencies, data flow, control flow

As mentioned earlier, the scheduler calls on the Memory Manager to select the appropriate physical addresses for a process that is about to resume, so the Process Scheduler subsystem depends on the memory management subsystem. When other kernel subsystems have to wait for a hardware request to complete, they depend on the process scheduling subsystem to block and resume processes. This dependency takes the form of function calls and access to the shared task list. All kernel subsystems read or write the data structure that represents the currently running process, which creates a bidirectional data flow throughout the system.

Besides the data flow and control flow inside the kernel layer, the OS services layer gives user processes an interface for registering timers. This creates a control flow from the scheduler to user processes. Waking a sleeping process is usually not counted as part of the normal control flow, because the user process cannot predict when it will be woken. Finally, the scheduler interacts with the CPU to block and resume processes, which creates data flow and control flow between them: the CPU is responsible for interrupting the currently running process so that the kernel can schedule another process to run.
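
The blocking and waking that other subsystems request from the scheduler can be pictured with a wait queue. This is a simplified sketch under stated assumptions: the class names, the method names sleep_on and wake_up_all, and the state strings are chosen for illustration and are not a description of the kernel's actual interfaces.

```python
from collections import deque


class Task:
    def __init__(self, pid):
        self.pid = pid
        self.state = "RUNNABLE"


class WaitQueue:
    """Tasks waiting for one event, for example completion of a disk request."""

    def __init__(self):
        self._waiters = deque()

    def sleep_on(self, task):
        # Called by a subsystem that must wait for hardware: block the task.
        task.state = "BLOCKED"
        self._waiters.append(task)

    def wake_up_all(self):
        # Called when the hardware operation completes: resume every waiter.
        woken = []
        while self._waiters:
            task = self._waiters.popleft()
            task.state = "RUNNABLE"
            woken.append(task.pid)
        return woken
```

A driver in this model would call sleep_on when it issues a request and wake_up_all from its completion handler; the scheduler then sees the tasks as runnable again.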

2. Memory Manager architecture

(1) Goals

The memory management module controls how processes access physical memory. The mapping between a process's virtual memory and the machine's physical memory is managed with the help of the hardware memory management unit (MMU). Each process has its own independent virtual address space, so two processes may use the same virtual address while actually running in different areas of physical memory. The MMU provides memory protection so that the physical memory of two processes does not interfere. The memory management module also supports swapping: memory pages that are temporarily unused are written to a swap partition on disk, which lets a process's virtual address space be larger than physical memory. The size of the virtual address space is determined by the machine word length.
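
A toy model may help to see how two processes can use the same virtual address without touching the same physical memory. The sketch assumes a single-level page table and a 4096-byte page size; the class name AddressSpace, the frame numbers, and the helper address_space_bytes are all made up for this example.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class AddressSpace:
    """Per-process mapping from virtual page number to physical frame number."""

    def __init__(self, page_table):
        self.page_table = dict(page_table)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            # No mapping: the hardware would raise a page fault here.
            raise LookupError(f"page fault at {vaddr:#x}")
        return self.page_table[vpn] * PAGE_SIZE + offset


def address_space_bytes(word_bits):
    """Size of the virtual address space for a given machine word length."""
    return 2 ** word_bits


# Two processes map the same virtual page to different physical frames.
proc_a = AddressSpace({0x10: 0x2A})
proc_b = AddressSpace({0x10: 0x7F})
vaddr = 0x10 * PAGE_SIZE + 0x123
```

Here proc_a.translate(vaddr) and proc_b.translate(vaddr) return different physical addresses for the same virtual address, and address_space_bytes(32) gives 4 GiB, the address space of a machine with a 32-bit word.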

(2) module

Memory Management Subsystem

The architecture-specific module provides a virtual interface for accessing physical memory.
The architecture-independent module is responsible for the address mapping of each process and for virtual memory swapping. When a page fault occurs, this module decides which memory pages should be swapped out. Because the algorithm that selects pages to swap out rarely needs to change, there is no separate policy module.
The system call interface gives user processes strictly controlled access (malloc and free, which the C library implements on top of the underlying system calls, and mmap and munmap). Through it, processes can allocate and free memory and perform memory-mapped file operations.
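
Memory-mapped file access can be tried from user space with Python's standard mmap module, which wraps the mmap system call. The sketch below is only an illustration: the function name mmap_roundtrip and the temporary file are assumptions for this example, and it relies on the default shared, read-write mapping that the module creates on Unix.

```python
import mmap
import os
import tempfile


def mmap_roundtrip():
    """Write to a file through a memory mapping and read the result back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"linux kernel")
        with mmap.mmap(fd, 0) as mapping:  # length 0 maps the whole file
            mapping[0:5] = b"LINUX"        # a store to memory changes the file
            mapping.flush()
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 12)
    finally:
        os.close(fd)
        os.unlink(path)
```

The write happens through an ordinary slice assignment on the mapping rather than through a write call, which is the point of memory-mapped file operations.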

(3) Data representation

The memory manager stores the mapping from each process's virtual memory to physical memory. This mapping information is kept in an mm_struct instance, and a pointer to that instance is stored in the task_struct of each process. Besides the mapping itself, the structure should also contain information about how the memory manager fetches and stores pages. For example, executable code can use the executable image on disk as its backing store, but dynamically allocated data must be backed by the system paging file (swap space).
Finally, the memory management module should also store permission and accounting information to ensure the security of the system.

(4) Dependencies, data flow, and control flow

The memory manager controls physical memory and receives notifications from the hardware when a page fault occurs, which means there is a bidirectional flow of data and control between the memory management module and the memory management hardware. Memory management also relies on the file system to support swapping and memory-mapped I/O. This means the memory manager has to call procedures provided by the file system to store memory pages on disk and to fetch them back. Because file system requests are slow, the memory manager puts the process to sleep while it waits for a page to be swapped in, which requires the memory manager to call the process scheduler interface. Because the memory map of each process is stored in the data structures of the process scheduler, there are bidirectional data flows and control flows between the memory manager and the process scheduler. A user process can set up a new process address space and can be notified of page faults, which requires a control flow from the memory manager to user processes. In general there is no data flow from user processes to the memory manager, but a user process can obtain some information from the memory manager through a few select system calls.
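
The interplay of page faults and swapping described above can be sketched as follows. This is a deliberately small model: the class PagedMemory, the dictionary standing in for the swap partition, and the least-recently-used victim selection are all assumptions made for the example; the article does not say which replacement algorithm the kernel uses.

```python
from collections import OrderedDict


class PagedMemory:
    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page -> contents, least recently used first
        self.swap = {}                 # stand-in for the swap partition on disk
        self.faults = 0

    def access(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return self.resident[page]
        self.faults += 1                     # page fault: page is not resident
        if len(self.resident) >= self.num_frames:
            victim, data = self.resident.popitem(last=False)
            self.swap[victim] = data         # swap the victim page out
        # Swap the page in, or create it on first touch.
        data = self.swap.pop(page, f"page-{page}")
        self.resident[page] = data
        return data
```

In the real system the swap-out and swap-in steps are file system requests, and the faulting process sleeps until they finish; the sketch leaves both out to keep the bookkeeping visible.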

3. Virtual File System architecture

(1) Goals

The virtual file system provides a uniform interface for accessing data stored on hardware devices and is compatible with different file systems (ext2, ext4, NTFS, and so on). Almost all hardware devices in a computer are presented through a common device driver interface. Logical file systems make it easier to be compatible with the standards of other operating systems and let developers implement file systems with different policies. The virtual file system also lets the system administrator mount any logical file system on any device. It hides the details of both the physical device and the logical file system, so that user processes access files through one uniform interface.

In addition to the traditional file system goals, the VFS is also responsible for loading new executables. This task is carried out by the logical file system module, which allows Linux to support several executable formats.

(2) module

Virtual File System module

Device driver modules
Device independent interface: provides the same view of all devices.
Logical file system modules: one for each supported file system.
System independent interface: provides an interface that is independent of both the hardware resources and the logical file system. It presents every resource as either a block device node or a character device node.
System call interface: gives user processes uniform, controlled access to the file system. The virtual file system hides all file-system-specific features from user processes.

(3) Data representation

All files are represented by inodes. Each inode records where a file is located on the hardware device. In addition, the inode holds pointers to the functions in the logical file system module and the device driver that perform the actual read and write operations. Storing function pointers in this way (the same idea as virtual functions in object-oriented programming) lets specific logical file systems and device drivers register themselves with the kernel, without the kernel depending on the features of any particular module.
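
The function-pointer idea can be sketched with a table of callables attached to each inode. Everything named here is hypothetical: register_fs, vfs_read, and the in-memory "ramfs" exist only in this example and merely imitate the pattern described above.

```python
class Inode:
    def __init__(self, number, ops):
        self.number = number
        self.ops = ops  # table of callables supplied by the file system


REGISTERED = {}


def register_fs(name, ops):
    """A file system registers its operations; the core never imports it."""
    REGISTERED[name] = ops


def vfs_read(inode, offset, size):
    """Generic read: dispatch through the inode's operation table."""
    return inode.ops["read"](inode, offset, size)


# A hypothetical in-memory file system that keeps file contents in a dict.
_ram_data = {1: b"hello from ramfs"}


def ramfs_read(inode, offset, size):
    return _ram_data[inode.number][offset:offset + size]


register_fs("ramfs", {"read": ramfs_read})
inode = Inode(1, REGISTERED["ramfs"])
```

vfs_read knows nothing about ramfs; adding another file system only means registering another table of operations.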

(4) Dependencies, data flow, and control flow

One special device driver is the RAM disk, a device that sets aside an area of main memory and uses it as if it were a persistent storage device. This device driver uses the memory management module to do its work, so there is a dependency of the VFS on the memory management module, together with data flow and control flow between them (the arrow in the diagram is drawn in the opposite direction; it should show the VFS depending on the memory management module).

One of the logical file systems is the network file system, which accesses files on another machine as if they were local files. To do this, the logical file system works through the network subsystem, which introduces a dependency of the VFS on the network subsystem and a flow of control and data between them.

As mentioned earlier, the memory manager uses the VFS for swapping and for memory-mapped I/O. Also, when the VFS waits for a hardware request to complete, it uses the process scheduler to block the process, and when the request completes, it wakes the process through the process scheduler. Finally, the system call interface lets user processes call in to access data. Unlike the previous subsystems, the VFS offers no mechanism for users to register for implicit invocation, so there is no control flow from the VFS to user processes.

4. Network Interface architecture

(1) Goals

The network subsystem lets a Linux system connect to other systems over a network. It supports many hardware devices and many network protocols. The network subsystem hides the implementation details of both the hardware and the protocols and offers an easy-to-use interface to user processes and to the other subsystems, which therefore do not need to know the details of the hardware devices or protocols.

(2) module

Network protocol layer Module diagram

Network device drivers: communicate with the hardware devices.
The device independent interface module provides a consistent access interface to all hardware devices, so that higher-level subsystems do not need to know the details of the hardware.
The network protocol modules implement the individual network transport protocols, for example TCP, UDP, IP, HTTP, ARP, and so on.
The protocol independent interface module provides a consistent interface that is independent of specific protocols and specific hardware devices. This lets the other kernel subsystems access the network without depending on particular protocols or devices.
The system call interface module defines the network programming API that user processes can access.

(3) Data representation

Each network object is represented as a socket. Sockets are associated with processes in the same way as inodes. When two task_struct instances point to the same socket, the socket is shared by multiple processes.
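
Socket sharing between processes can be observed with a fork, since the child inherits the parent's descriptors and they refer to the same sockets. This is a minimal sketch assuming a Unix-like system where os.fork is available; the function name shared_socket_demo and the message are made up for this example.

```python
import os
import socket


def shared_socket_demo():
    """Send through a socket from a child process, receive in the parent."""
    parent_end, peer = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: the inherited descriptor refers to the same socket object
        # in the kernel as the parent's descriptor.
        try:
            parent_end.sendall(b"from child")
        finally:
            os._exit(0)
    os.waitpid(pid, 0)       # wait until the child has sent and exited
    data = peer.recv(64)     # the data arrived on the shared connection
    parent_end.close()
    peer.close()
    return data
```

The parent never writes to parent_end itself, yet the peer receives data on that connection, because both processes hold references to one and the same socket.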

(4) Data flow, control flow and dependency relationships

When the network subsystem waits for a hardware request to complete, it has to block and wake the process through the process scheduling subsystem, which creates control flow and data flow between the network subsystem and the process scheduling subsystem. In addition, the virtual file system implements the Network File System (NFS) through the network subsystem, which creates data flow and control flow between the VFS and the network subsystem.

VII. Conclusion

1. The Linux kernel is one layer of the whole Linux system. Conceptually, the kernel consists of five main subsystems: the process scheduler, the memory manager, the virtual file system, the network interface, and the interprocess communication module. These subsystems exchange data through function calls and shared data structures.

2. The architecture of the Linux kernel has contributed to its success: it lets a large number of volunteer developers divide the work sensibly, and it makes each specific module easy to extend.

Extensibility one: the Linux architecture makes these subsystems extensible through data abstraction. Each hardware device driver is implemented as a separate module that supports the uniform interface provided by the kernel. In this way, an individual developer needs only minimal interaction with other kernel developers to add a new device driver to the Linux kernel.
Extensibility two: the Linux kernel supports many different architectures. In each subsystem, the architecture-specific code is split out into a separate module. When a manufacturer releases a new chip, its kernel team only needs to re-implement the machine-specific code in order to port the kernel to that chip.

Reference articles:

http://oss.org.cn/ossdocs/linux/kernel/a1/index.html
http://www.cs.cmu.edu/afs/cs/project/able/www/paper_abstracts/intro_softarch.html
http://www.fceia.unr.edu.ar/ingsoft/monroe00.pdf
Kernel source: http://lxr.oss.org.cn/

http://kb.cnblogs.com/page/534420/
