Understanding the Linux Kernel Architecture

Process management and debugging
In addition to executing code on behalf of a user program, the kernel can also be activated by an asynchronous hardware interrupt, in which case it runs in interrupt context. The main difference from running in process context is that code running in interrupt context cannot access the user-space part of the virtual address space. Because interrupts occur at random times, any user process may happen to be active when one arrives; since that process is essentially unrelated to the cause of the interrupt, the kernel has no business accessing the contents of the current user space. When running in interrupt context, the kernel must also be more cautious in other respects: for example, it must not go to sleep. These restrictions require special attention when writing interrupt handlers.
Kernel threads can be used for many purposes: from synchronizing data between memory and block devices to helping the scheduler distribute processes among the CPUs.
The CPU spends most of its time executing code in user space. When an application issues a system call, the CPU switches to kernel mode, and the kernel completes the request on the application's behalf. During this period, the kernel may access the user part of the virtual address space. After the system call completes, the CPU switches back to user mode. A hardware interrupt also causes the CPU to switch to kernel mode, but in that case the kernel cannot access user space.

In most cases, a single virtual address space is already larger than the physical memory available in the system, and the situation does not improve when each process has its own virtual address space. The kernel and CPU must therefore consider how to map the actually available physical memory into the virtual address space. The preferred method is to use page tables, which assign virtual addresses to physical addresses. A virtual address is relative to the user space and kernel space of a process, while a physical address addresses the RAM actually available.

The kernel divides the virtual address space into portions of equal size, called pages. Physical memory is likewise divided; its pages are usually called page frames, and the term page then refers specifically to pages of the virtual address space. The data structure that maps virtual pages to page frames is called a page table. Because most areas of the virtual address space are unused and not associated with page frames, a functionally equivalent model with far smaller memory requirements can be used: multilevel paging.

The first part of a virtual address is used to index an array in the process (each process has exactly one), the so-called page global directory, or PGD. Its entries point to the starting addresses of further arrays, called page middle directories (PMD). The second part of the virtual address is used as an index into the PMD selected via the PGD. The PMD entries are again pointers, this time to the next level of array, known as the page table or page directory. The third part of the virtual address, the PTE (page table entry) part, is used as an index into the page table; since the page-table entries point to page frames, this step completes the mapping between virtual memory pages and page frames. The last part of the virtual address is called the offset. It specifies a byte position within the page; after all, each address uniquely refers to one byte in the address space.

Because a multilevel table walk is slow, the CPU speeds up address translation in two ways:
(1) A dedicated part of the CPU called the MMU (Memory Management Unit) is optimized for memory access operations.
(2) The addresses that occur most frequently in address translation are held in a CPU cache called the Translation Lookaside Buffer (TLB). Translations can then be obtained directly from the cache without accessing the page tables in memory, which greatly speeds up address translation.

Memory mapping
Memory mapping is an important abstraction that is used extensively in the kernel and is also available to user applications. Mapping transfers data from an arbitrary source into the virtual address space of a process. The address-space region that is the target of the mapping can then be accessed in the usual way, like ordinary memory, but any modifications are automatically propagated to the original data source. The same functions can thus be used to operate on completely different target objects. For example, the contents of a file can be mapped into memory: a process then only needs to read the corresponding memory to access the file's contents, or write to that memory to modify them, and the kernel ensures that any changes are automatically synchronized to the file. The kernel also uses memory mapping directly when implementing device drivers: the input/output areas of a peripheral can be mapped into the virtual address space, and reads and writes to the relevant memory region are redirected to the device by the system, which greatly simplifies driver implementation.

In short, memory mapping establishes a correspondence between a region of user-space memory and data managed by the kernel. Once the mapping is set up, the user's modifications to this region are directly visible on the kernel side, and modifications made in kernel space are likewise directly visible to user space. Efficiency is therefore very high when large amounts of data must be moved between kernel space and user space.
When the kernel allocates memory, it must record the allocated or free state of each page frame to prevent two processes from using the same memory area. Because memory is allocated and released very frequently, the kernel must also ensure that these operations complete as quickly as possible, and it is often required to allocate contiguous pages. To detect contiguous free areas in memory quickly, the kernel uses an old and proven technique: the buddy system. Free memory blocks in the system are always grouped in pairs, and the two blocks of a pair are called buddies. A buddy can be allocated independently of its partner, but if both buddies are free, the kernel merges them into a larger memory block that in turn serves as a buddy on the next level. The kernel keeps all free blocks of the same size (1, 2, 4, 8, 16, or some other number of pages) on the same list for management. When an application releases memory, the kernel can determine directly from the address whether a pair of buddies has become free, merge them into a larger block, and put the result back on the appropriate buddy list; this is simply the reverse of splitting a block. Merging increases the likelihood that larger memory blocks remain available and counteracts the memory-management problem known as fragmentation.
Slab cache: to speed up memory allocation, the kernel implements a general cache for frequently used small objects, the slab cache. (1) For frequently used object types, the kernel defines a cache that holds only instances of the required type. Whenever such an object is needed, it can be allocated quickly from the corresponding cache (and returned to the cache after use). The slab cache automatically interacts with the buddy system and requests new page frames when the cache is exhausted. (2) For ordinary allocations of small memory blocks, the kernel defines a set of slab caches for objects of graded sizes, which can be accessed with functions analogous to those familiar from user-space programming. The difference is that these functions carry the prefix k to indicate their association with the kernel: kmalloc and kfree.
The page-fault exception mechanism makes swapping transparent to applications. A page that has been swapped out is identified by a special entry in the page table. When a process attempts to access such a page, the CPU raises a page-fault exception, which the kernel intercepts; the kernel then brings the data back from the hard disk into memory, after which the user process can resume. Since the process cannot perceive the page fault, swapping pages in and out is completely invisible to it.
Page reclaim is used to synchronize the modified contents of a memory mapping with the underlying block device, which is sometimes referred to simply as writing back data. Once the data has been flushed, the kernel can use the page frame for other purposes (similar to page swapping). The kernel's data structures contain all the information needed so that, when the data is required again, it can be located on the hard disk and loaded.
jiffies records how many ticks have passed since the system was started. How long a tick lasts is defined by the kernel's CONFIG_HZ setting: with CONFIG_HZ=200, for example, one jiffy corresponds to 5 ms, so the accuracy of kernel timers based on jiffies is also 5 ms. The tick period can be changed dynamically: when no frequent periodic operations are pending, periodically generating a timer interrupt is pointless and merely prevents the processor from reducing power consumption and entering sleep states. Dynamically changing the tick period is therefore useful for systems with limited power supplies, such as laptops and embedded systems. Linux also supports many different file systems; the VFS is a layer above the individual file systems that hides their differences.
The kernel uses typedef to define various data types so as to avoid depending on architecture-specific features; for example, the bit widths of the standard data types can differ from processor to processor. Type names such as sector_t (the number of a sector on a block device) and pid_t (a process ID) are defined by the kernel in architecture-specific code, which ensures that values of these types fall within the appropriate range. In some cases the kernel must instead use variables with an exactly defined number of bits, for example when it needs to store data structures on a hard disk. For data to be exchanged between different systems (via USB sticks, for example), the same external format must always be used regardless of how the data is represented inside the computer. For this purpose the kernel defines several integer data types that specify both whether the type is signed or unsigned and its exact number of bits, for example __s8 and __u8.
A special item that ordinary user-space programming does not involve is the so-called per-CPU variable. These are declared with DEFINE_PER_CPU(name, type), where name is the variable name and type its data type (for example, int[3] or struct hash). On a single-processor system this is no different from a regular variable declaration; on SMP systems with several CPUs, a separate instance of the variable is created for each CPU. The instance for a particular CPU is obtained with per_cpu(name, cpu), where smp_processor_id() returns the ID of the currently active processor and can be used as the cpu argument. Per-CPU variables have two advantages: the required data is likely to be in the processor's cache, so it can be accessed quickly; and because each CPU touches only its own instance, the communication problems that arise in a multiprocessor system when a variable may be accessed by all CPUs simultaneously are simply bypassed.
Many pointers in the source code are marked with __user, a token unknown in user-space programming. The kernel uses it to identify pointers into the user-space portion of the virtual address space, and the areas these pointers refer to must not be accessed without further precautions. This is because the memory is mapped into the user-space part of the virtual address space via page tables rather than being directly accessible, so the kernel must first ensure that the page frame a pointer refers to is actually present in physical memory. The explicit marking also allows automatic checking tools (sparse) to verify that this requirement is actually observed.

