Source: "In-depth Analysis of Linux Kernel Source Code"
http://oss.org.cn/kernel-book/ch06/6.6.1.htm
6.6.1 Basic principles of swapping
As mentioned above, each process has a large address space available to it (3 GB), but the space it actually uses is small: generally no more than a few MB, and in most cases only tens or hundreds of KB. However, when the number of processes in the system reaches hundreds or even thousands, the total demand for memory becomes very large, and ordinary physical memory sizes can hardly satisfy it. For this reason, a technique emerged in the history of computing that exchanges the contents of memory with a dedicated area of disk space. In Linux, the disk space used for swapping is called a swap file or swap area.
Swapping has been in use for many years. The earliest UNIX kernels monitored the amount of free memory; when it fell below a fixed threshold, a swap-out was performed, which copied the entire address space of a process to disk. Conversely, when the scheduler selected a swapped-out process to run, the whole process was swapped back in from disk.
Modern Unix kernels (including Linux) have abandoned this method, mainly because context switches become very expensive when whole processes are swapped in and out. In Linux, the unit of swapping is the page rather than the process. Even so, swapping still has a cost, above all a cost in time. In an operating system, time and space are in constant tension and have to be balanced against each other: sometimes space is spent to save time, and sometimes time is spent to save space. Page swapping is a typical case of spending time to save space. It should be stressed that page swapping is a last resort. In a real-time system with strict timing requirements, for example, page swapping is inappropriate because it makes program execution time highly unpredictable. Linux therefore lets users enable or disable the swapping mechanism through commands or system calls.
In page swapping, the page replacement algorithm is a key factor in swap performance, and most of its complexity lies in swapping out. Specifically, four main issues must be considered:
· Which kinds of pages can be swapped out
· How to store pages in the swap area
· How to select the page to be swapped out
· When to perform the page swap-out operation
Note that "page" here refers to the data stored in a page. So-called page swapping therefore really means swapping the data held in the page.
1. Which pages are swapped out?
The ultimate goal of swapping is to reclaim pages, but not every page in memory can be swapped out. Only physical pages that are mapped into user space are swapped out, while pages occupied by the kernel in kernel space stay resident in memory. Below we classify the pages in user space and in kernel space in more detail.
Pages in user space can be divided into the following types according to their content and nature:
(1) Pages occupied by the process image, including the code segment, data segment, stack segment, and the dynamically allocated "heap" (see Figure 6.13).
(2) Pages holding file contents that have been mapped into user space by calling mmap().
(3) Pages in inter-process shared memory regions.
In the first case, the memory pages occupied by the process's code segment and data segment can be swapped in and out, but the pages occupied by the stack are generally not swapped out, because this simplifies the kernel design.
In the second case, the swap area used by these pages is the mapped file itself.
In the third case, page swapping is more complicated.
In contrast, pages mapped into kernel space are not swapped out. Specifically, the memory pages occupied by kernel code and kernel global variables are neither allocated (they are loaded at startup) nor released; this space is static. (The code segment and global variables of a process, by comparison, live in user space, and the memory pages they occupy are dynamic: they must be allocated before use and are eventually released, and in between they may be swapped out and reclaimed for reallocation.)
In addition, the pages the kernel uses during execution must be allocated dynamically, but they always stay in memory. Such pages can be divided into two types according to their content and nature:
(1) Pages that the kernel allocates by calling kmalloc() or vmalloc() for temporary kernel data structures. These are released as soon as they are no longer needed. However, because one page may hold several data structures of the same type, the page itself is released only when the whole page is idle.
(2) Pages that the kernel allocates by calling alloc_pages() for temporary use and for management purposes, for example, the two pages occupied by the kernel stack of each process, the pages used to copy parameters from the kernel space, and so on. These pages are also of no further value once used, so they are released immediately.
There is also a class of kernel pages whose content is still valuable after use, so they are not released immediately. A page of this kind is "released" into an LRU queue, where it is kept as a cache for a while and allowed to "age". If its content is needed again during this period, the page is put back into use; otherwise it continues to age until conditions no longer allow it to be kept. Kernel pages used this way include the following:
· dentry storage, used in the file system to cache directory entry structures
· inode storage, used in the file system to cache index nodes
· Buffers used for file system read/write operations
2. How to store pages in the swap area
We know that physical memory is divided into pages, each 4 KB in size. The swap area is likewise divided into blocks, and each block is exactly the size of one page. Such a block is called a page slot, meaning that one physical page fits into one slot. When swapping out, the kernel tries to place the swapped-out pages in adjacent slots to reduce disk seek time when the swap area is accessed. This is the physical basis of an efficient page replacement algorithm.
If the system uses multiple swap areas, the process becomes more complex. A fast swap area (that is, a swap area stored on a fast disk) is given a higher priority. The search for a free slot starts from the swap area with the highest priority. If several swap areas share the highest priority, they should be selected in rotation so that no single area is overloaded. If no free slot is found in the swap areas with the highest priority, the search continues in the swap areas with the next-highest priority, and so on.
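The search order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not kernel code: the names SwapArea, SwapAllocator, take_free_slot and alloc_slot are hypothetical, and taking the lowest-numbered free slot is a simplification standing in for the kernel's effort to keep swapped-out pages in adjacent slots.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class SwapArea:
    """Hypothetical model of one swap area: a name, a priority, and page slots."""
    name: str
    priority: int
    nslots: int
    slots: list = field(default_factory=list)  # None means the slot is free

    def __post_init__(self):
        self.slots = [None] * self.nslots

    def take_free_slot(self, page):
        """Store `page` in the lowest-numbered free slot; return its index or None."""
        for i, owner in enumerate(self.slots):
            if owner is None:
                self.slots[i] = page
                return i
        return None


class SwapAllocator:
    """Search areas from highest to lowest priority, rotating among equals."""

    def __init__(self, areas):
        ordered = sorted(areas, key=lambda a: -a.priority)
        # One list of areas per priority level, highest priority first.
        self.levels = [list(g) for _, g in groupby(ordered, key=lambda a: a.priority)]
        self.next_start = [0] * len(self.levels)  # round-robin cursor per level

    def alloc_slot(self, page):
        for li, level in enumerate(self.levels):
            start = self.next_start[li]
            for k in range(len(level)):
                area = level[(start + k) % len(level)]
                slot = area.take_free_slot(page)
                if slot is not None:
                    # The next request at this level starts with the following area.
                    self.next_start[li] = (start + k + 1) % len(level)
                    return area.name, slot
        return None  # every swap area is full
```

With two areas at priority 10 and one at priority 1, successive calls to alloc_slot() alternate between the two fast areas and fall back to the slow area only when both fast areas are full.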
3. How to select the page to be swapped out
Page swapping is very complicated, and one of its main problems is how to select the pages to swap out. We discuss the choice of swapping policy step by step.
Policy 1: swap only when necessary. When a page fault occurs, a physical page must be allocated. If no free page is available, the kernel tries to swap one or more memory pages out to disk to free some up. This policy is certainly simple, but it has an obvious drawback: it is passive. Because swapping happens only at the moment it is needed, the system is bound to spend a considerable amount of time swapping in and out while handling faults.
Policy 2: swap when the system is idle. Compared with policy 1, this is an active policy: when the system is idle, some pages are swapped out in advance so that a certain number of free pages is maintained in memory and a free page is always available when a page fault occurs. The pages to swap out are generally chosen with the LRU (least recently used) algorithm. However, such a policy is hard to get right, because page accesses cannot be predicted accurately. A page that has not been accessed for a long time may be swapped out and then accessed again right away, so that it must be swapped back in. In the worst case, nearly all of the system's processing capacity is consumed by swapping in and out, and the system can hardly do any useful computation. This phenomenon is called page "thrashing".
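To make LRU selection and its worst case concrete, here is a minimal Python sketch that replays a sequence of page references against a fixed number of page frames and counts page faults. The function name count_faults_lru and its parameters are hypothetical, chosen for illustration; a real kernel only approximates LRU and does not keep an exact ordering like this.

```python
from collections import OrderedDict


def count_faults_lru(references, nframes):
    """Replay a page reference sequence with LRU replacement; return the fault count."""
    frames = OrderedDict()  # keys are resident pages, least recently used first
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)  # page was just used: now the most recent
            continue
        faults += 1
        if len(frames) >= nframes:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = True
    return faults
```

For example, cycling through four pages with only three frames, count_faults_lru([0, 1, 2, 3] * 5, 3), faults on every one of the 20 references, because each page is evicted just before it is needed again. With four frames the same sequence faults only four times. The first situation is the degenerate swap-in/swap-out behavior described above.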
Policy 3: swap out but do not release immediately. When the system selects pages for swap-out, their content is written to the swap area on disk and the corresponding page table entries are modified (the present bit is set to 0), but the pages are not released at once. Instead, their page structures are kept in a cache queue and changed from the active state to the inactive state. The final release of these pages is postponed until it becomes necessary. In this way, if a page is accessed again shortly after being swapped out, the corresponding page can be found in the cache queue of physical pages and the mapping can be re-established. Because the page has not been released and still holds its original content, there is no need to read it back from disk. Only when an inactive memory page has gone unaccessed for some time does it need to be truly released.
Policy 4: postpone swapping out until it can be postponed no longer. Policy 3 can still be improved. First, when a page is swapped out, its content does not always have to be written to disk. If a page has not been written to since it was last swapped in (a code page, for example), it is "clean" and there is no need to write it out at all. Second, even a "dirty" page does not have to be written out immediately; it can be handled as in policy 3. As for "clean" pages, they can be kept in the cache until their frames are really needed, because reclaiming a "clean" page costs very little.
Next we briefly describe how physical pages move through the swapping process, which involves the page structure and the free_area structure described earlier:
· Releasing a page. When a page becomes idle and available, its page structure is linked into the free queue free_area of a page management zone, and its usage count is reduced by 1.
· Allocating a page. Calling __alloc_pages() or __get_free_page() allocates a memory page from a free queue and sets the page's usage count to 1.
· Active state. An allocated page is active. Its page structure is linked into the active page queue active_list through its list head field lru, and at least one page in some process's address space is mapped to it.
· Inactive "dirty" state. The page structure of a page in this state is linked into the inactive "dirty" page queue inactive_dirty_list through its list head field lru. The precondition is that no page table entry of any process still points to the page; that is, the page mapping is broken and the page's usage count is reduced by 1.
· Write the content of an inactive "dirty" page to the swap area and move the page from the inactive_dirty_list queue to an inactive "clean" page queue, ready to be reclaimed.
· Inactive "clean" state. The page structure is linked into an inactive "clean" page queue through its list head field lru. Each page management zone has its own inactive_clean_list queue.
· If a page is accessed again within some period after it has become inactive, it is moved back to the active state and its mapping is restored.
· When necessary, a page is reclaimed from the "clean" page queue, that is, either linked into the free queue or allocated directly.
The above is the basic idea behind page swap-in, swap-out and reclamation; the actual implementation code is more complicated.
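The life cycle just described can be modeled as a small state machine. The Python sketch below is a simplified illustration, not the kernel's implementation: the classes State, Page and PageLists and their method names are hypothetical, the count field here tracks only process mappings, and a counter stands in for the actual write to the swap area.

```python
from enum import Enum


class State(Enum):
    FREE = "free"
    ACTIVE = "active"
    INACTIVE_DIRTY = "inactive_dirty"
    INACTIVE_CLEAN = "inactive_clean"


class Page:
    """Hypothetical stand-in for the kernel's page structure."""

    def __init__(self, pfn):
        self.pfn = pfn
        self.count = 0       # simplified usage count (process mappings only)
        self.dirty = False   # True if modified since it was last written out
        self.state = State.FREE


class PageLists:
    """One list per state; pages move between lists as their state changes."""

    def __init__(self, npages):
        self.lists = {s: [] for s in State}
        self.disk_writes = 0  # stands in for writes to the swap area
        for pfn in range(npages):
            self.lists[State.FREE].append(Page(pfn))

    def _move(self, page, new_state):
        self.lists[page.state].remove(page)
        self.lists[new_state].append(page)
        page.state = new_state

    def allocate(self):
        """Take a page from the free list and make it active with count 1."""
        if not self.lists[State.FREE]:
            return None
        page = self.lists[State.FREE][0]
        page.count = 1
        self._move(page, State.ACTIVE)
        return page

    def deactivate(self, page):
        """Break the mapping: drop the count and park the page as inactive dirty."""
        page.count -= 1
        self._move(page, State.INACTIVE_DIRTY)

    def launder(self, page):
        """Write back only if really dirty, then move to the inactive clean list."""
        if page.dirty:
            self.disk_writes += 1
            page.dirty = False
        self._move(page, State.INACTIVE_CLEAN)

    def reactivate(self, page):
        """Page touched again while inactive: restore the mapping, no disk read."""
        page.count += 1
        self._move(page, State.ACTIVE)

    def reclaim(self, page):
        """Finally release an inactive clean page to the free list."""
        self._move(page, State.FREE)
```

The point of the intermediate inactive states is visible in reactivate(): a page that is touched again before reclaim() comes back into use without any disk I/O, and launder() costs a write only for pages that were actually modified.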
4. When to perform the page swap operation
To avoid having to search for pages to swap out at the last moment, when the CPU is busy handling a page fault, the Linux kernel periodically checks whether the number of free pages in the system has fallen below a predefined limit. If there are too few free pages, several pages are swapped out in advance to reduce the load on the system when page faults occur. Of course, because page usage cannot be predicted accurately, a page fault may still occur at a moment when memory has no free page available. Even so, swapping out in advance reduces the probability of running short of free pages. Moreover, with well-chosen parameters (for example, how often pages are swapped out and how many pages are swapped out each time), last-minute searches can be made rare. For this purpose, the Linux kernel provides a dedicated daemon, kswapd, that periodically swaps pages out.
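The effect of the periodic check can be illustrated with a toy simulation in Python. This is a sketch under simplifying assumptions and is not how kswapd is implemented: the function names and the parameters low_mark (the free-page limit), batch (pages reclaimed per check) and demand_per_tick (allocations between checks) are all hypothetical, and pages that are allocated never return to the pool.

```python
def background_reclaim(free_pages, inactive_pages, low_mark, batch):
    """One periodic check: below low_mark, reclaim up to `batch` inactive pages.

    Returns the new (free_pages, inactive_pages) pair.
    """
    if free_pages >= low_mark:
        return free_pages, inactive_pages    # enough free memory: do nothing
    reclaimed = min(batch, inactive_pages)   # cannot reclaim more than exist
    return free_pages + reclaimed, inactive_pages - reclaimed


def simulate(ticks, free_pages, inactive_pages, demand_per_tick, low_mark, batch):
    """Count allocations that find no free page (the last-minute search case)."""
    stalls = 0
    for _ in range(ticks):
        free_pages, inactive_pages = background_reclaim(
            free_pages, inactive_pages, low_mark, batch)
        for _ in range(demand_per_tick):
            if free_pages == 0:
                stalls += 1      # would have to find a victim inside the fault
            else:
                free_pages -= 1
    return stalls
```

Running simulate() with batch set to 0 (no background reclaim) and then with a batch larger than demand_per_tick shows how the choice of parameters determines how often an allocation finds memory empty.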