Linux 0.11 Kernel Analysis: Kernel Architecture and Linux 0.11 Architecture

A complete, usable operating system consists of four parts: hardware, the operating system kernel, operating system services, and user applications.

User applications are the word processors, web browsers, and other programs that users run or write themselves.

Operating system service programs are programs that provide services to users and are regarded as part of the operating system's functionality.

In Linux, these include the X Window System, the shell command interpreter, the kernel programming interfaces, and other system programs. The operating system kernel is the part we are interested in here; it mainly abstracts hardware resources and schedules access to them.

The Linux kernel interacts with the computer hardware: it programs and controls hardware components, provides interfaces to them, and schedules access to hardware resources. It also gives user programs a high-level execution environment and a virtual interface to the hardware. In this article, we first describe the basic architecture and main components of the Linux kernel, based on the Linux 0.11 source code. We then describe several important data structures in the source code. Finally, we describe how to set up an environment for compiling the Linux 0.11 kernel.

URL: http://www.cnblogs.com/archimedes/p/linux011-ubunture.html.

1. Linux Kernel Mode

Operating system kernel structures currently fall into two main types: the monolithic (single) kernel and the layered microkernel. The Linux 0.11 kernel is a monolithic kernel.

The main advantages of a monolithic kernel are that the kernel code is compact and executes quickly. Its main disadvantage is that the code is not strongly layered.
In a monolithic kernel system, the operating system provides a service as follows. The application's main program executes the system call instruction (int 0x80) with the specified parameter values, which switches the CPU from user mode to kernel mode. The operating system then calls the specific system call service routine selected by those parameter values, and the service routine in turn calls underlying support functions as needed to do its work. After completing the service the application requested, the operating system switches from kernel mode back to user mode and returns to the application, which continues with its next instructions.

Therefore, a monolithic kernel can be roughly divided into three layers: the main program layer that invokes services, the service layer that executes system calls, and the underlying functions that support system calls.
Figure: Simple structure model of a monolithic kernel
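
The three layers can be sketched in ordinary user-space C. This is only an illustration under stated assumptions: the call numbers, the table service_table, and every function name below are hypothetical, and a plain function call stands in for the int 0x80 trap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call numbers; not the real Linux 0.11 numbering. */
enum { CALL_ADD = 0, CALL_NEGATE = 1, CALL_COUNT = 2 };

/* Bottom layer: a support function shared by the services. */
static long clamp_to_int16(long v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return v;
}

/* Middle layer: service routines that carry out each call. */
static long service_add(long a, long b)    { return clamp_to_int16(a + b); }
static long service_negate(long a, long b) { (void)b; return clamp_to_int16(-a); }

typedef long (*service_fn)(long, long);
static const service_fn service_table[CALL_COUNT] = {
    service_add, service_negate
};

/* Stand-in for the trap: pick a service by number and run it.
 * Returns 0 on success and -1 for an unknown call number. */
int fake_system_call(int nr, long a, long b, long *result)
{
    assert(result != NULL);
    if (nr < 0 || nr >= CALL_COUNT)
        return -1;
    *result = service_table[nr](a, b);
    return 0;
}
```

The top layer is whatever code calls fake_system_call(CALL_ADD, 2, 3, &r). Selecting the service through a table indexed by number keeps the entry point unchanged when services are added.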

2. Linux Kernel System Architecture

The Linux kernel consists of five modules: process scheduling module, memory management module, file system module, inter-process communication module, and network interface module.

The process scheduling module controls how processes use CPU resources. Its scheduling policy gives every process fair and reasonable access to the CPU while ensuring that the kernel can perform hardware operations in a timely manner.

The memory management module ensures that all processes can safely share the machine's main memory area. It also supports virtual memory management, which lets Linux run processes that use more memory than is physically present. Memory blocks that are temporarily unused can be swapped out to external storage through the file system and swapped back in when needed.

The file system module supports driving external devices and storing data on them. The virtual file system module provides a common file interface to all external storage devices and hides the differing details of the hardware. This allows the kernel to support multiple file system formats compatible with other operating systems.

The inter-process communication module supports the exchange of information among processes.

The network interface module provides access to a variety of network communication standards and supports many kinds of network hardware.
The dependencies between these modules are shown in a figure in which solid lines represent dependencies, while dotted lines and dotted boxes mark the parts not implemented in Linux 0.11. (The virtual file system was implemented gradually starting with Linux 0.95, and network interfaces only became available in version 0.96.)
Figure: Linux kernel module structure and dependencies

Starting from the monolithic kernel structure model, the main kernel modules can also be arranged in a diagram that follows the structure of the Linux 0.11 kernel source code.

3. Linux Kernel Process Control

In the Linux 0.11 kernel, the system can hold at most 64 processes at the same time. Except for the first process, which is set up by hand, every process is created with the fork system call. The kernel identifies each process by its process ID (pid). A process consists of executable instruction code, data, and a stack. The code and data parts of a process correspond to the code segment and data segment of an executable file. Each process can execute only its own code and access only its own data and stack area. Processes communicate with one another through system calls. On a system with a single CPU, only one process runs at any moment. The kernel uses the scheduler to run the processes in turn on a time-sharing basis.

In Linux, a process can execute in kernel mode or in user mode, and it uses a separate stack in each: the kernel stack and the user stack. The user stack temporarily holds data such as the parameters and local variables of functions the process calls in user mode. The kernel stack holds the information for function calls made by kernel code. The kernel manages processes through the process table, in which each process occupies one entry. In Linux, a process table entry is a task structure (task_struct).
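
A fixed-size process table of this kind can be sketched as follows. The structure is deliberately tiny and is not the kernel's real task structure; the function names find_free_slot, add_task, and remove_task are hypothetical, and only the limit of 64 entries comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TASKS 64   /* the limit quoted in the text */

/* Greatly simplified stand-in for the kernel's task structure. */
struct task {
    int pid;
    int state;
};

/* One slot per possible process; NULL marks a free slot. */
static struct task *task_table[NR_TASKS];

/* Return the index of the first free slot, or -1 if the table is full. */
int find_free_slot(void)
{
    for (int i = 0; i < NR_TASKS; i++)
        if (task_table[i] == NULL)
            return i;
    return -1;
}

/* Place a task in the first free slot; returns the slot or -1. */
int add_task(struct task *t)
{
    assert(t != NULL);
    int slot = find_free_slot();
    if (slot >= 0)
        task_table[slot] = t;
    return slot;
}

/* Free a slot again so that a later process can reuse it. */
void remove_task(int slot)
{
    assert(slot >= 0 && slot < NR_TASKS);
    task_table[slot] = NULL;
}
```

Because the table has a fixed size, creating a process fails cleanly once all 64 slots are taken, which is the behavior the 64-process limit implies.
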
While a process is executing, the values in all CPU registers, the state of the process, and the contents of its stack are together called the context of the process. When the kernel needs to switch to another process, it saves the entire state of the current process, that is, its context, so that the process can later resume from the state it was in when it was switched out. When an interrupt occurs, the kernel runs the interrupt service routine in kernel mode, in the context of the interrupted process. It preserves all the resources needed so that the interrupted process can resume when the service routine finishes.
During its lifetime, a process passes through a set of different states, known as process states.

When a process is being executed by the CPU, it is in the running state. When a process is waiting for a system resource, it is in a sleeping (waiting) state. Linux has two kinds of sleeping state: interruptible and uninterruptible. When the resource becomes available, the process is woken up and becomes ready to run; this is called the ready state. When a process has stopped running but its parent has not yet asked for its status, the process is in the zombie state. When a process is terminated, it is in the stopped state. The kernel switches processes only when a process moves from the "kernel running" state to the "sleeping" state. A process running in kernel mode cannot be preempted by other processes, and one process cannot change the state of another. To avoid corrupting kernel data during a process switch, the kernel disables all interrupts while it executes critical code.
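
The states described above can be written down as a small enumeration. The names are modeled on the TASK_* constants of include/linux/sched.h; the numeric values and the three helper functions are assumptions made for this sketch and should be checked against the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Process states as described in the text; values are assumed. */
enum task_state {
    TASK_RUNNING         = 0,  /* running, or ready to run */
    TASK_INTERRUPTIBLE   = 1,  /* sleeping, can be woken by a signal */
    TASK_UNINTERRUPTIBLE = 2,  /* sleeping, woken only explicitly */
    TASK_ZOMBIE          = 3,  /* finished, parent has not collected it */
    TASK_STOPPED         = 4   /* stopped */
};

/* Human-readable name for a state (hypothetical helper). */
const char *task_state_name(enum task_state s)
{
    switch (s) {
    case TASK_RUNNING:         return "running";
    case TASK_INTERRUPTIBLE:   return "interruptible sleep";
    case TASK_UNINTERRUPTIBLE: return "uninterruptible sleep";
    case TASK_ZOMBIE:          return "zombie";
    case TASK_STOPPED:         return "stopped";
    }
    return "unknown";
}

/* Only a runnable task may be picked by the scheduler. */
bool can_be_scheduled(enum task_state s)
{
    return s == TASK_RUNNING;
}

/* Only an interruptible sleeper is woken by a signal. */
bool wakes_on_signal(enum task_state s)
{
    return s == TASK_INTERRUPTIBLE;
}
```

In this sketch a single value covers both "running" and "ready", since which runnable task actually holds the CPU is decided by the scheduler rather than stored in the state.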

4. Memory Usage in the Linux Kernel

To use the system's physical memory effectively, the Linux 0.11 kernel divides memory into several functional areas.

The kernel program occupies the start of physical memory, followed by the buffer cache (the "high-speed buffer") used for hard disks, floppy disks, and other block devices. When a process needs to read data from a block device, the system first reads the data into the buffer cache. When data needs to be written to a block device, the system first places it in the buffer cache, and the block device driver then writes it to the device. The last part is the main memory area, which all programs can request at any time. Kernel code must also request main memory from the kernel's memory management module before using it. On a system that includes a RAM disk (virtual disk), a portion at the start of the main memory area is set aside to hold the virtual disk's data.
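
The read and write paths through the buffer can be illustrated with a toy write-back cache. Everything here is hypothetical: the "device" is an in-memory array, the sizes are arbitrary, and buffers are replaced round-robin, which is a simplification chosen for brevity and not the kernel's own buffer management.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLK_SIZE      1024   /* bytes per block (assumed) */
#define DEVICE_BLOCKS 8      /* size of the fake device */
#define NR_BUFFERS    4      /* buffers in the cache */

static unsigned char device[DEVICE_BLOCKS][BLK_SIZE];  /* fake block device */

struct buffer {
    int  block;                    /* device block held here, -1 if unused */
    bool dirty;                    /* modified since it was loaded? */
    unsigned char data[BLK_SIZE];
};

static struct buffer cache[NR_BUFFERS];
static int  next_victim;
static bool cache_ready;

/* Copy a modified buffer back to the device. */
static void write_back(struct buffer *b)
{
    if (b->block >= 0 && b->dirty) {
        memcpy(device[b->block], b->data, BLK_SIZE);
        b->dirty = false;
    }
}

static void cache_init(void)
{
    for (int i = 0; i < NR_BUFFERS; i++) {
        cache[i].block = -1;
        cache[i].dirty = false;
    }
    next_victim = 0;
    cache_ready = true;
}

/* Find the buffer for a block, loading it from the device on a miss. */
static struct buffer *get_buffer(int block)
{
    assert(block >= 0 && block < DEVICE_BLOCKS);
    if (!cache_ready)
        cache_init();
    for (int i = 0; i < NR_BUFFERS; i++)
        if (cache[i].block == block)
            return &cache[i];
    struct buffer *b = &cache[next_victim];
    next_victim = (next_victim + 1) % NR_BUFFERS;
    write_back(b);                      /* save the old contents first */
    b->block = block;
    memcpy(b->data, device[block], BLK_SIZE);
    return b;
}

unsigned char cached_read(int block, int offset)
{
    assert(offset >= 0 && offset < BLK_SIZE);
    return get_buffer(block)->data[offset];
}

void cached_write(int block, int offset, unsigned char value)
{
    assert(offset >= 0 && offset < BLK_SIZE);
    struct buffer *b = get_buffer(block);
    b->data[offset] = value;
    b->dirty = true;                    /* the device is updated later */
}

/* Flush every modified buffer to the device. */
void cache_sync(void)
{
    for (int i = 0; i < NR_BUFFERS; i++)
        write_back(&cache[i]);
}

/* Look at the device directly, bypassing the cache. */
unsigned char device_peek(int block, int offset)
{
    return device[block][offset];
}
```

A write changes only the buffer; the device sees the new data when cache_sync() runs or when the buffer is reused for another block.
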
The physical memory in a computer system is limited. To use it effectively, Linux uses the paging mechanism of the Intel CPU: by mapping virtual linear addresses to actual physical addresses, all concurrently running programs can share the limited memory.

The basic principle of paged memory management is to divide the main memory area into pages of 4096 bytes each. When a program requests memory, it is allocated in units of pages. With paging, each running process (task) can use a linear address space much larger than the actual memory. On an Intel 80386 system, the CPU provides a linear address space of up to 4 GB. The Linux 0.11 kernel sets the maximum number of segment descriptors in the Global Descriptor Table (GDT) to 256. Two entries are unused, two are used by the system, and each process uses two. The system could therefore hold at most (256 - 4)/2 + 1 = 127 tasks, and the virtual address range would be ((256 - 4)/2) * 64 MB, or approximately 8 GB. In the 0.11 kernel, however, the maximum number of tasks is fixed by hand at NR_TASKS = 64. Each process has a virtual (or linear) address range of 64 MB, and the virtual address space of each process starts at (task number) * 64 MB, with task numbers counted from 0. The total virtual address range is therefore 64 MB * 64 = 4 GB. Because 4 GB is also the size of the CPU's linear address space and of its physical address space, the three address concepts are easy to confuse in the 0.11 kernel.
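
The arithmetic in this paragraph can be checked with a few lines of C. The constant and function names are made up for this sketch; the numbers (256 GDT entries, 4 reserved, 2 per task, 64 MB per task, 64 tasks) are the ones given in the text.

```c
#include <assert.h>

#define GDT_ENTRIES    256   /* descriptors in the GDT */
#define GDT_RESERVED   4     /* two idle plus two used by the system */
#define GDT_PER_TASK   2     /* descriptors used by each task */
#define TASK_SPACE_MB  64    /* virtual address range per task, in MB */
#define NR_TASKS       64    /* task limit configured in the 0.11 kernel */

/* Upper bound on tasks imposed by the GDT, using the text's formula. */
int max_tasks_by_gdt(void)
{
    return (GDT_ENTRIES - GDT_RESERVED) / GDT_PER_TASK + 1;
}

/* Virtual address range, in MB, that the GDT alone would allow. */
unsigned long gdt_limited_space_mb(void)
{
    int tasks = (GDT_ENTRIES - GDT_RESERVED) / GDT_PER_TASK;
    assert(tasks > 0);
    return (unsigned long)tasks * TASK_SPACE_MB;
}

/* Virtual address range, in MB, with the configured task limit. */
unsigned long configured_space_mb(void)
{
    return (unsigned long)NR_TASKS * TASK_SPACE_MB;
}
```

The GDT bound works out to 127 tasks and 8064 MB (just under 8 GB), while the configured limit of 64 tasks gives exactly 4096 MB, the 4 GB the paragraph arrives at.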

In Linux 0.11, address mapping involves three kinds of addresses that must be kept apart: (a) the virtual address of a process, which counts up from 0 to a maximum of 64 MB; (b) the CPU's linear address space (0 to 4 GB); and (c) the actual physical memory address. A process virtual address is first translated, through the process's local segment descriptor, into an address in the CPU's linear address space. It is then mapped to a physical page through the page directory table (PDT, the first-level page table) and a page table (PT, the second-level page table). These two translations must not be confused. To use physical memory, the linear addresses of each process are mapped dynamically onto different pages of the main memory area through the two-level page tables. The maximum virtual memory available to each process is therefore 64 MB. The logical address of a process is converted to a linear address by adding (task number) * 64 MB. In the code comments, however, addresses within a process are generally referred to simply as linear addresses.
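
The two translation steps can be sketched as follows. The first function applies the rule stated above (add task number * 64 MB); the second splits a linear address into the 10-bit directory index, 10-bit table index, and 12-bit offset used by 80386 paging with 4096-byte pages. The type and function names are hypothetical, and the sketch stops short of the actual page table lookup.

```c
#include <assert.h>
#include <stdint.h>

#define TASK_SPACE UINT32_C(0x4000000)   /* 64 MB per task */
#define NR_TASKS   64

/* The three fields of a linear address under two-level paging. */
struct linear_parts {
    uint32_t dir_index;    /* index into the page directory table */
    uint32_t table_index;  /* index into the selected page table */
    uint32_t offset;       /* byte offset within the 4096-byte page */
};

/* Step 1: process (logical) address to CPU linear address. */
uint32_t logical_to_linear(unsigned nr, uint32_t logical)
{
    assert(nr < NR_TASKS);
    assert(logical < TASK_SPACE);
    return (uint32_t)nr * TASK_SPACE + logical;
}

/* Step 2 (first half): split the linear address for the table walk. */
struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FF;
    p.offset      = linear & 0xFFF;
    return p;
}
```

For example, logical address 0x1ABC in task 2 becomes linear address 0x08001ABC, which selects directory entry 32, page table entry 1, and offset 0xABC.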

5. Directory Structure of the Linux Kernel Source Code

Because the Linux kernel is monolithic, almost all of its programs are closely interrelated, with tight dependencies and calling relationships. When reading one source file, you will often need to consult other related files. It is therefore worth becoming familiar with the directory structure and layout of the source files before you start reading the kernel source code.

We first consider the complete source directory of the Linux kernel, including its subdirectories, and then describe the main function of the programs in each directory in turn. This gives an overall picture of how the kernel source is arranged and makes later reading easier. When linux-0.11.tar.gz is unpacked with the tar command, the kernel source files are placed in the linux directory.

The source tree of this kernel version contains 14 subdirectories and a total of 102 code files. The following sections describe the contents of these subdirectories one by one.
1. Kernel main directory linux
The linux directory is the root of the source tree. In addition to all 14 subdirectories, it contains a single Makefile. This file is the configuration file read by the make tool. The main purpose of make is to work out which files have been modified and thus which files need recompiling in a program built from many source files. make is therefore a project management tool. The Makefile in the linux directory also invokes the Makefiles in the subdirectories. As a result, when any file under the linux directory (including subdirectories) is modified, make recompiles it. To compile the whole kernel, you only need to run make once in the linux directory.

2. Boot program directory boot
The boot directory contains three assembly language files, which are the first programs compiled among the kernel source files. Their main job is to boot the kernel when the computer is powered on, load the kernel code into memory, and perform some system initialization before entering 32-bit protected mode. The bootsect.s and setup.s programs must be assembled with as86 and use the as86 assembly syntax (similar to Microsoft's), while head.s must be assembled with GNU as and uses AT&T syntax. Both syntaxes are introduced briefly in the code comments in the next chapter and in the descriptions that follow the code listings. bootsect.s is the disk boot block program. After assembly it resides in the first sector of the disk (the boot sector: track (cylinder) 0, head 0, sector 1). After the PC's ROM BIOS finishes its power-on self-test, the BIOS loads this sector into memory at 0x7C00 and executes it. setup.s mainly reads the machine's hardware configuration parameters and moves the kernel module system to the appropriate place in memory. head.s is compiled and linked at the very front of the system module. It mainly probes and sets up hardware devices and performs the initial setup of memory-management paging.

3. File system directory fs
This directory holds the file system implementation and contains 17 C programs. The main reference relationships among these programs are shown in a figure in which each box represents a file, the boxes are arranged from top to bottom according to their reference relationships, and the .c suffix of each file name is omitted. Files in dotted boxes do not belong to the file system. Lines with arrows indicate reference relationships, and thick lines indicate mutual reference relationships.

The programs in this directory fall into four parts: buffer cache management, low-level file operations, file data access, and high-level file functions. The file system can be regarded as an extension of the in-memory buffer cache: every access to file system data must first read the data into the buffer cache. The programs in this directory mainly manage the allocation of buffer blocks in the buffer cache and the file system on block devices.
4. Header file main directory include
The header file directory contains 32 .h header files in total: 13 in the main directory, 4 in the asm subdirectory, 10 in the linux subdirectory, and 5 in the sys subdirectory:

<a.out.h>     // a.out header file. Defines the a.out executable file format and some macros.
<const.h>     // Constant symbol header file. Currently defines only the flag bits of the i_mode field in the i-node.
<ctype.h>     // Character type header file. Defines macros for character type tests and conversions.
<errno.h>     // Error number header file. Contains the error numbers used in the system (Linus took these from Minix).
<fcntl.h>     // File control header file. Defines the operation control constants used for files and their descriptors.
<signal.h>    // Signal header file. Defines signal symbol constants, signal structures, and signal function prototypes.
<stdarg.h>    // Standard argument header file. Defines variable argument lists in the form of macros: one type (va_list) and three macros (va_start, va_arg, and va_end), used by the vsprintf, vprintf, and vfprintf functions.
<stddef.h>    // Standard definition header file. Defines NULL and offsetof(TYPE, MEMBER).
<string.h>    // String header file. Mainly defines inline functions for string operations.
<termios.h>   // Terminal input/output header file. Mainly defines the terminal interface for controlling asynchronous communication ports.
<time.h>      // Time header file. Mainly defines the tm structure and the prototypes of time-related functions.
<unistd.h>    // Linux standard header file. Defines various symbol constants and types and declares various functions. If __LIBRARY__ is defined, it also includes the system call numbers and inline assembly macros such as _syscall0().
<utime.h>     // User time header file. Defines the access and modification time structure and the utime() prototype.
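
The va_list type and the va_start, va_arg, and va_end macros mentioned under <stdarg.h> are used in the same way in any standard C program. The function sum_ints below is a hypothetical example written for this purpose, not kernel code.

```c
#include <assert.h>
#include <stdarg.h>

/* Add up `count` int arguments passed after the count itself. */
long sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list args;
    long total = 0;

    va_start(args, count);             /* start after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);    /* fetch the next argument as an int */
    va_end(args);                      /* finish before returning */

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) walks the three trailing arguments in order. The callee cannot discover how many arguments were passed, which is why the count (or a format string, in the printf family) has to be supplied explicitly.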

Architecture-specific header file subdirectory include/asm
These header files mainly define data structures, macro functions, and variables closely tied to the CPU architecture. There are 4 files in total.

<asm/io.h>       // I/O header file. Defines functions for I/O port operations as inline assembly macros.
<asm/memory.h>   // Memory copy header file. Contains the memcpy() inline assembly macro function.
<asm/segment.h>  // Segment operation header file. Defines inline assembly functions for segment register operations.
<asm/system.h>   // System header file. Defines inline assembly macros for setting or modifying descriptors and interrupt gates.

Linux kernel-specific header file subdirectory include/linux

<linux/config.h>  // Kernel configuration header file. Defines the keyboard language and hard disk type (HD_TYPE) options.
<linux/fdreg.h>   // Floppy drive header file. Contains definitions of floppy disk controller parameters.
<linux/fs.h>      // File system header file. Defines the file table structures (file, buffer_head, m_inode, etc.).
<linux/hdreg.h>   // Hard disk parameter header file. Defines hard disk register ports, status codes, the partition table, and related information.
<linux/head.h>    // head header file. Defines a simple structure for segment descriptors and several selector constants.
<linux/kernel.h>  // Kernel header file. Contains prototype definitions of some commonly used kernel functions.
<linux/mm.h>      // Memory management header file. Contains the page size definition and some page release function prototypes.
<linux/sched.h>   // Scheduler header file. Defines the task structure task_struct, the data of the initial task 0, and some inline assembly macros for setting and reading descriptor parameters.
<linux/sys.h>     // System call header file. Contains the 72 system call C function handlers, whose names start with 'sys_'.
<linux/tty.h>     // tty header file. Defines parameters and constants related to tty_io and serial communication.

System-specific data structure subdirectory include/sys

<sys/stat.h>     // File status header file. Contains the file or file system status structure stat{} and constants.
<sys/times.h>    // Defines the process running time structure tms and the prototype of the times() function.
<sys/types.h>    // Type header file. Defines the basic system data types.
<sys/utsname.h>  // System name structure header file.
<sys/wait.h>     // Wait call header file. Defines the system calls wait() and waitpid() and related constant symbols.

5. Kernel initialization program directory init
This directory contains only one file, main.c. It performs all kernel initialization, then moves to user mode, creates a new process, and runs the shell on the console device. The program first sizes the buffer memory according to the amount of memory in the machine. If a virtual disk is configured, it also reserves space for it after the buffer memory. It then initializes all the hardware, sets up the first task (task 0) by hand, and sets the interrupt enable flag. After execution moves from kernel mode to user mode, the program calls fork() for the first time to create a process that runs init(). In this child process, the system sets up the console environment and spawns another child process to run the shell.
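
How the buffer memory might be sized from the machine's memory can be sketched as below. The thresholds (16 MB cap; 4 MB, 2 MB, or 1 MB of buffer memory depending on total memory) are modeled on init/main.c and should be treated as assumptions to verify against the source. The structure mem_layout and the function plan_memory are hypothetical names.

```c
#include <assert.h>

#define KB 1024UL
#define MB (1024UL * KB)

struct mem_layout {
    unsigned long memory_end;   /* end of usable physical memory */
    unsigned long buffer_end;   /* end of the buffer memory */
    unsigned long main_start;   /* start of the main memory area */
};

/* ext_mem_kb: extended memory above 1 MB, in KB.
 * ramdisk_kb: size of the virtual disk in KB, 0 if none is configured. */
struct mem_layout plan_memory(unsigned long ext_mem_kb, unsigned long ramdisk_kb)
{
    struct mem_layout m;

    m.memory_end = 1 * MB + ext_mem_kb * KB;  /* 1 MB base plus extended memory */
    m.memory_end &= ~0xfffUL;                 /* round down to a 4096-byte page */
    if (m.memory_end > 16 * MB)
        m.memory_end = 16 * MB;               /* assumed upper limit */

    if (m.memory_end > 12 * MB)
        m.buffer_end = 4 * MB;
    else if (m.memory_end > 6 * MB)
        m.buffer_end = 2 * MB;
    else
        m.buffer_end = 1 * MB;

    /* The virtual disk, if any, sits right after the buffer memory. */
    m.main_start = m.buffer_end + ramdisk_kb * KB;
    assert(m.main_start <= m.memory_end);
    return m;
}
```

With 8 MB of memory and a 512 KB virtual disk, for instance, the sketch gives 2 MB of buffer memory and a main memory area starting at 2.5 MB.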

6. Kernel program main directory kernel
The linux/kernel directory contains 12 code files and a Makefile, plus 3 subdirectories. The calling relationships among the code in these files are complex, so the reference relationships between files are not listed in detail here, but the files can still be roughly classified:

asm.s          // Handles the interrupts caused by hardware exceptions. The actual handling of each exception is done in traps.c: each interrupt handler calls the corresponding C function in traps.c.
exit.c         // Mainly contains the system calls that handle process termination, including functions for releasing a process, terminating a session (process group), program exit handling, and the system calls for killing, terminating, and suspending processes.
fork.c         // Provides the two C functions used by the sys_fork() system call: find_empty_process() and copy_process().
mktime.c       // Contains the kernel time function mktime(), which computes the number of seconds from 00:00 on January 1, 1970 to the boot date. It is called only once, in init/main.c.
panic.c        // Contains the function panic(), which displays a kernel error message and halts.
printk.c       // Contains the kernel-specific message display function printk().
sched.c        // Contains the basic scheduling functions (sleep_on, wake_up, schedule, etc.) and some simple system call functions, as well as several timing-related floppy disk functions.
signal.c       // Contains four system calls for signal handling and the function do_signal(), which handles signals in the corresponding interrupt handler.
sys.c          // Contains many system call functions, some of which are not yet implemented.
system_call.s  // Implements the Linux system call (int 0x80) interface. The actual work is done in the C handler function of each system call; these handlers are spread throughout the Linux kernel code.
vsprintf.c     // Implements the string formatting function that is now part of the standard library.
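
The calculation attributed to mktime.c, counting seconds since 00:00 on January 1, 1970, can be illustrated with a straightforward version. This is not the kernel's code: the function name seconds_since_1970 and its approach (summing whole years, then months, then days) are assumptions chosen for clarity, and the time is taken to be UTC.

```c
#include <assert.h>

static int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Seconds from 1970-01-01 00:00:00 to the given date and time.
 * mon is 1..12 and day is 1..31; no time zone handling. */
long seconds_since_1970(int year, int mon, int day, int hour, int min, int sec)
{
    /* Days before the first of each month in a non-leap year. */
    static const int days_before_month[12] = {
        0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
    };

    assert(year >= 1970 && mon >= 1 && mon <= 12 && day >= 1 && day <= 31);

    long days = 0;
    for (int y = 1970; y < year; y++)
        days += is_leap_year(y) ? 366 : 365;

    days += days_before_month[mon - 1];
    if (mon > 2 && is_leap_year(year))
        days += 1;                       /* February 29 has already passed */
    days += day - 1;

    return ((days * 24 + hour) * 60 + min) * 60 + sec;
}
```

Midnight on January 1, 2000, for example, is 10957 days after the epoch, or 946684800 seconds. With a 32-bit long the result overflows for dates in 2038 and later.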

Block device driver subdirectory kernel/blk_dev
Users normally access devices through the file system, so a device driver provides a call interface for the file system. Because block devices have high data throughput, a buffer cache sits between user processes and block devices so that the data on them can be used efficiently. When data on a block device is accessed, the system first reads it from the device into the buffer cache in units of data blocks and then hands it to the user. The blk_dev subdirectory contains four C files and one header file. The header file blk.h is used only by the block device programs, so it is kept together with the C files. The files are roughly related as follows:

blk.h        // Defines the block device structure and data block request structure shared by three C programs.
hd.c         // Mainly implements the low-level driver functions for reading and writing hard disk data blocks, chiefly do_hd_request().
floppy.c     // Mainly implements the driver functions for reading and writing floppy disk data blocks, chiefly do_fd_request().
ll_rw_blk.c  // Implements the low-level block device read/write function ll_rw_block(). All other kernel programs use this function to read and write data on block devices.

You will see this function called in many places to access block device data, especially in the buffer cache code in fs/buffer.c.

Character device driver subdirectory kernel/chr_dev
The character device subdirectory contains four C programs and two assembly files. These files implement the drivers for the RS-232 serial ports, serial terminals, the keyboard, and the console terminal. Figure 2.12 shows the approximate call hierarchy among these files:

tty_io.c     // Contains the tty character device read function tty_read() and write function tty_write(), which provide the upper-level access interface for the file system. It also contains the C function do_tty_interrupt(), which is called from the serial interrupt handler when the interrupt type is "character read".
console.c    // Contains the console initialization code and the console write function con_write(), which is called by the tty device. It also contains con_init(), the initialization routine for the display and the keyboard interrupt.
rs_io.s      // Assembly program implementing the interrupt handler for the two serial interfaces. The handler deals separately with each of the four interrupt types read from the interrupt identification register (port 0x3fa or 0x2fa) and calls do_tty_interrupt() in the code that handles the "character read" type.
serial.c     // Initializes the UART asynchronous serial communication chip and sets the interrupt vectors of the two communication ports. It also contains rs_write(), which tty uses to output to a serial port.
tty_ioctl.c  // Implements the tty I/O control interface function tty_ioctl() and the functions that read and write the termio(s) terminal I/O structures. It is called from fs/ioctl.c, which implements the sys_ioctl() system call.
keyboard.S   // Mainly implements the keyboard interrupt handler keyboard_interrupt.

Coprocessor emulation and operation subdirectory kernel/math
This subdirectory currently contains a single C program, math_emulate.c. Its math_emulate() function is the C function called by the int 7 interrupt handler. This interrupt occurs when the machine has no math coprocessor but the CPU executes a coprocessor instruction, so that software can emulate the coprocessor. The kernel version discussed in this book contains no coprocessor emulation code: the program only prints an error message and sends the coprocessor error signal SIGFPE to the user program.
7. Kernel library function directory lib
The kernel library functions are mainly intended to be called from user programs and form part of the interface used when building the system's standard library. There are 12 C files in total. Apart from malloc.c, a relatively long program written by tytso, the programs are very short, and some contain only one or two lines of code.
8. Memory management program directory mm
This directory contains two code files. They manage the use of the main memory area by programs and implement the mapping from a process's logical addresses to linear addresses and from linear addresses to physical addresses in the main memory area. Through the paging mechanism, a correspondence is established between the virtual memory pages of a process and the physical pages of the main memory area.
The page.s file contains the page exception interrupt (int 14) handler, which handles the page exceptions caused by missing pages (page faults) and by page protection violations on illegal accesses.
The memory.c program contains mem_init(), which initializes memory, and the functions do_no_page() and do_wp_page(), which are called from the interrupt handler in page.s. When a new process is created and the process is copied, the memory handling functions in this file allocate and manage the memory space.
9. Kernel build tool directory tools
The build.c program in this directory combines the object code produced by compiling the directories of the Linux source tree into a runnable kernel image file, Image. Its exact function is described in detail later.

6. Relationship Between the Kernel and User Programs

In Linux, the kernel offers applications two kinds of interface. One is the system call interface, that is, the interrupt call int 0x80. The other is the kernel library functions, through which programs communicate with the kernel. The kernel library functions are part of the basic C library, libc, and many system calls are implemented as part of that library. System calls are mainly meant for system software to use directly or for implementing library functions. Programs written by users generally access kernel resources by calling functions in libraries such as libc. By calling these library functions, application code can carry out common tasks such as opening and closing files or devices, performing scientific calculations, handling errors, and looking up group and user IDs. The system call is the highest-level interface between the kernel and the outside world. Within the kernel, each system call has a number (defined in the include/unistd.h header file) and is often implemented in the form of a macro.
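
The idea of numbered calls wrapped by macros can be sketched in portable C. The macro DEFINE_CALL0, the numbers NR_get_answer and NR_get_zero, and the function trap are all hypothetical; trap is an ordinary function standing in for the int 0x80 instruction, so this shows only the shape of the mechanism, not the kernel's inline assembly.

```c
#include <assert.h>

/* Hypothetical call numbers, one per call. */
#define NR_get_answer 0
#define NR_get_zero   1

/* "Kernel side": one handler per call number. */
static int handle_get_answer(void) { return 42; }
static int handle_get_zero(void)   { return 0; }

static int (*const handlers[])(void) = {
    handle_get_answer,   /* NR_get_answer */
    handle_get_zero      /* NR_get_zero */
};

/* Stand-in for the int 0x80 trap: dispatch on the call number. */
static int trap(int nr)
{
    assert(nr >= 0 && nr < (int)(sizeof handlers / sizeof handlers[0]));
    return handlers[nr]();
}

/* Generate a library-style wrapper for a call with no arguments. */
#define DEFINE_CALL0(type, name) \
    type name(void) { return (type)trap(NR_##name); }

DEFINE_CALL0(int, get_answer)
DEFINE_CALL0(int, get_zero)
```

Application code then calls get_answer() like any other library function, without knowing the call number. Generating the wrappers from one macro keeps each of them to a single line and ties the function name to its number through the NR_ prefix.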

References

Linux Kernel Complete Annotation (Zhao Jiong)

