Introduction to the Linux kernel file Cache management mechanism (IBM developerWorks)


Source: https://www.ibm.com/developerworks/cn/linux/l-cache/

1 Preface

Since its inception, Linux has been continuously improved and has spread widely. It is now one of the mainstream general-purpose operating systems and, together with Windows and UNIX, accounts for almost the entire operating-system market. Linux is especially dominant in high-performance computing: 301 of the global TOP500 computers ranked in June 2005 ran Linux. Research into and use of Linux has therefore become unavoidable for developers.

This article looks at the file Cache management mechanism in the Linux kernel. It takes the 2.6 series kernels as its baseline and mainly describes the working principles, data structures, and algorithms; it does not discuss specific code.


2 Operating system and file Cache management

An operating system is the most important piece of system software on a computer. It manages the machine's physical resources and provides applications with abstract interfaces to those resources. From an application's perspective, the operating system presents a unified virtual machine in which the specific details of the hardware disappear, leaving only the logical concepts of processes, files, address spaces, and interprocess communication. This abstraction makes application development relatively easy: developers interact only with the logical objects in the virtual machine and need not know the details of the underlying hardware. These abstract objects also make it easy for the operating system to isolate and protect individual applications.

For data on a storage device, the logical concept the operating system provides to applications is the "file". When an application stores or accesses data, it simply reads or writes a one-dimensional address space called a file; the correspondence between this address space and the blocks on the storage device is maintained by the operating system.

In Linux, when an application needs to read data from a file, the operating system allocates some memory, reads the data from the storage device into that memory, and then passes the data on to the application. When data needs to be written to a file, the operating system allocates memory to receive the user data and then writes the data from memory to disk. File Cache management refers to the management of this memory, which the operating system allocates to hold file data.

Cache management is measured by two metrics. The first is the cache hit ratio: on a cache hit the data can be obtained directly from memory, without accessing the slow peripheral device, which significantly improves performance. The second is the ratio of effective cache, where an effective cache entry is one that is actually accessed. If the ratio of effective cache entries is low, a considerable portion of the disk bandwidth is wasted reading useless data into the cache, and the useless entries in turn tighten memory pressure, which can ultimately hurt performance severely.

The following sections describe the status and role of file Cache management in the Linux operating system, the data structures related to the file Cache in Linux, the read-ahead and replacement of the file Cache, and the implementation of the file Cache-related APIs.


3 The status and role of the file Cache

The file Cache holds copies of file data in memory, so file Cache management touches both the memory management system and the file system. On the one hand, the file Cache is part of physical memory and must take part in the allocation and reclamation of physical memory; on the other hand, the data in the file Cache comes from files on storage devices and must be read from and written to those devices through the file system. From the operating system's perspective, the file Cache can therefore be seen as the link between the memory management system and the file system. File Cache management is thus an important part of the operating system, and its performance directly affects both the file system and the memory management system.

Figure 1 depicts the relationship between file Cache management, memory management, and the file system in the Linux operating system. As the figure shows, the concrete file systems, such as ext2/ext3, JFS, and NTFS, are responsible for exchanging data between the file Cache and the storage device. The virtual file system (VFS) layer above them lets applications exchange data with the file Cache through the read/write interfaces. The memory management system is responsible for allocating and reclaiming file Cache pages, while the virtual memory management (VMM) subsystem lets applications exchange data with the file Cache through memory mapping. In a Linux system, then, the file Cache is the nexus between the memory management system, the file system, and applications.


4 File Cache-related data structures

In the Linux implementation, the file Cache has two levels: the page Cache and the buffer Cache; each page Cache entry contains several buffer Cache entries. The memory management system and the VFS interact only with the page Cache: the memory management system maintains the allocation and reclamation of each page, establishes the mappings for memory-mapped access, and handles data exchange between the page Cache and user space through the VFS. The file system normally interacts only with the buffer Cache, which handles data exchange between the peripheral storage device and the buffer Cache. The relationships among the page Cache, the buffer Cache, files, and the disk are shown in Figure 2, and the relationship between the page structure and the buffer_head data structure is shown in Figure 3. Both figures assume a page size of 4 KB and a disk block size of 1 KB. The rest of this article mainly concerns the management of the page Cache.

In the Linux kernel, each block of a file corresponds to at most one page Cache entry, and the kernel manages these entries through two data structures: a radix tree and a doubly linked list. The radix tree is a search tree that lets the kernel quickly locate a Cache entry from a file offset. Figure 4 shows a radix tree with a fanout of 4 (2^2) and a height of 4, used to quickly locate an 8-bit offset within a file. In the Linux 2.6.7 kernel, the fanout is 64 (2^6) and the tree height is 6 (for 32-bit indices) or 11 (for 64-bit indices); each leaf node of the radix tree points to the Cache entry for the corresponding offset within the file.
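To make the lookup concrete, here is a minimal user-space sketch of a fanout-4 radix tree like the one in Figure 4, where the index is consumed 2 bits at a time. All names are illustrative; this is not the kernel's actual radix tree API.

```c
#include <stdlib.h>

#define RADIX_SHIFT 2
#define RADIX_FANOUT (1 << RADIX_SHIFT)      /* 4 children per node */
#define RADIX_MASK (RADIX_FANOUT - 1)

struct radix_node {
    void *slots[RADIX_FANOUT];               /* child nodes, or leaves at the bottom */
};

struct radix_node *radix_node_alloc(void)
{
    return calloc(1, sizeof(struct radix_node));
}

/* Walk 'height' levels down from the root, consuming the most
 * significant chunk of the index first; returns the leaf or NULL. */
void *radix_lookup(struct radix_node *root, unsigned long index, int height)
{
    struct radix_node *node = root;
    while (node && height-- > 0) {
        unsigned int slot = (index >> (height * RADIX_SHIFT)) & RADIX_MASK;
        node = node->slots[slot];
    }
    return node;
}

/* Insert a leaf for 'index', allocating intermediate nodes as needed. */
void radix_insert(struct radix_node *root, unsigned long index, int height,
                  void *leaf)
{
    struct radix_node *node = root;
    while (height-- > 1) {
        unsigned int slot = (index >> (height * RADIX_SHIFT)) & RADIX_MASK;
        if (!node->slots[slot])
            node->slots[slot] = radix_node_alloc();
        node = node->slots[slot];
    }
    node->slots[index & RADIX_MASK] = leaf;
}
```

With fanout 64 instead of 4, the same walk resolves a 32-bit page index in 6 levels, which is why the kernel's tree stays shallow even for large files.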

The other data structure is the doubly linked list. For each physical memory zone, the Linux kernel maintains two such lists, active_list and inactive_list, which are used mainly for reclaiming physical memory. Besides file Cache pages, these lists also hold anonymous memory, such as process stacks and heaps.


5 Read-ahead and replacement of the file Cache

The file read-ahead algorithm in the Linux kernel works as follows. For the first read request on a file, the system reads the requested page along with a few pages immediately following it (no fewer than one, usually three); this is called synchronous read-ahead. For the second read request, if the requested page is not in the Cache, i.e. not in the previously read-ahead group, the access pattern is evidently not sequential, and the system continues with synchronous read-ahead. If the requested page is in the Cache, the previous read-ahead was a hit: the operating system doubles the read-ahead group and asks the underlying file system to read in the blocks of the group that are not yet in the Cache; this is called asynchronous read-ahead. Whether or not the second request hits, the system updates the size of the current read-ahead group. In addition, the system defines a window comprising the previous read-ahead group and the current one. Any subsequent read request falls into one of two cases: if the requested page lies inside the read-ahead window, the system continues asynchronous read-ahead and updates the corresponding window and group; if the requested page lies outside the window, the system falls back to synchronous read-ahead and resets the window and group. Figure 5 illustrates the Linux kernel read-ahead mechanism: (a) shows the state before a read, (b) the case where the requested page is outside the window, and (c) the case where it is inside the window.
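The window-growing policy above can be modeled as a small state machine. The following C sketch is a simplification under stated assumptions: it treats a "sequential hit" as a read landing exactly at the end of the previous group, and the constants are illustrative defaults, not the kernel's.

```c
/* Illustrative model (not kernel code) of sequential read-ahead:
 * restart small on a seek, double the group on a sequential hit. */
struct ra_state {
    long start;          /* first page of the current read-ahead group */
    long size;           /* pages in the current group */
    long next_index;     /* page a sequential read is expected to hit next */
};

#define RA_INITIAL 4     /* requested page + 3 pages, a common default */
#define RA_MAX     32    /* illustrative cap on the group size */

/* Returns the number of pages to read ahead for a request at 'index'. */
long ra_update(struct ra_state *ra, long index)
{
    if (ra->size == 0 || index != ra->next_index) {
        /* First read, or a seek outside the expected position:
         * restart with a small synchronous read-ahead. */
        ra->start = index;
        ra->size = RA_INITIAL;
    } else {
        /* Sequential hit: double the group (asynchronous read-ahead),
         * up to the maximum. */
        ra->start = index;
        ra->size = ra->size * 2 > RA_MAX ? RA_MAX : ra->size * 2;
    }
    ra->next_index = ra->start + ra->size;
    return ra->size;
}
```

The doubling rewards sustained sequential access with ever-larger disk requests, while any seek immediately shrinks the group so that random workloads do not pollute the Cache.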

File Cache replacement in the Linux kernel works as follows. A newly allocated Cache page is placed at the head of inactive_list with its state initialized. When Cache memory runs short, the system first scans active_list backwards from its tail and moves unreferenced entries to the head of inactive_list; it then scans inactive_list backwards from its tail, reclaiming each scanned entry that is in a suitable state, until enough Cache entries have been reclaimed. Figure 6 describes the replacement algorithm in pseudo code.

Figure 6 Linux Cache replacement algorithm description
Mark_accessed(b) {
    if (b.state == (INACTIVE && UNREFERENCED))
        b.state = (INACTIVE && REFERENCED)
    else if (b.state == (INACTIVE && REFERENCED)) {
        b.state = (ACTIVE && UNREFERENCED)
        add b to tail of active_list
    } else if (b.state == (ACTIVE && UNREFERENCED))
        b.state = (ACTIVE && REFERENCED)
}

Reclaim() {
    while (active_list not empty and scan_num < max_scan1) {
        x = head of active_list
        if ((x.state & REFERENCED) == 0)
            add x to tail of inactive_list
        else {
            x.state &= ~REFERENCED
            move x to tail of active_list
        }
        scan_num++
    }
    scan_num = 0
    while (inactive_list not empty and scan_num < max_scan2) {
        x = head of inactive_list
        if ((x.state & REFERENCED) == 0)
            return x
        else {
            x.state = (ACTIVE | UNREFERENCED)
            move x to tail of active_list
        }
        scan_num++
    }
    return NULL
}

Access(b) {
    if (b is not in cache) {
        if (a free slot x exists)
            put b into x
        else {
            x = Reclaim()
            put b into x
        }
        add x to tail of inactive_list
    }
    Mark_accessed(x)
}
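For readers who prefer compilable code, the per-entry state transitions of Mark_accessed in Figure 6 can be modeled as follows. This is a user-space simplification, not kernel code; the list manipulation is omitted and only the state machine is shown.

```c
#include <stdbool.h>

/* Each Cache entry is ACTIVE or INACTIVE, and referenced or not.
 * Two accesses in the inactive state promote an entry to active. */
enum { INACTIVE = 0, ACTIVE = 1 };

struct cache_entry {
    int level;              /* ACTIVE or INACTIVE */
    bool referenced;
};

/* State transition on access, mirroring Mark_accessed() in Figure 6. */
void mark_accessed(struct cache_entry *e)
{
    if (e->level == INACTIVE && !e->referenced) {
        e->referenced = true;           /* first touch: mark referenced */
    } else if (e->level == INACTIVE && e->referenced) {
        e->level = ACTIVE;              /* second touch: promote */
        e->referenced = false;
    } else if (e->level == ACTIVE && !e->referenced) {
        e->referenced = true;
    }
}
```

Requiring two touches before promotion keeps one-shot sequential scans from flooding active_list, which is the main point of the two-list design.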


6 File Cache-related APIs and their implementation

There are many APIs in the Linux kernel for operating on the file Cache. By usage they fall into two categories: interfaces that operate by copying, such as read/write/sendfile (of which sendfile is no longer supported in the 2.6 series kernels), and interfaces that operate by address mapping, such as mmap.

The first category of APIs copies data between the Caches of different files, or between the Cache and a user-space buffer provided by the application; the principle is shown in Figure 7.
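As an illustration of the copy-mode interfaces, the following sketch copies a file with plain read/write calls: each read copies data from the page Cache into the user buffer, and each write copies it back into the destination file's Cache. Error handling is abbreviated for brevity.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Copy src to dst through a user-space buffer; returns 0 on success. */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n;
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (in < 0 || out < 0)
        return -1;
    while ((n = read(in, buf, sizeof(buf))) > 0)   /* cache -> user buffer */
        if (write(out, buf, n) != n)               /* user buffer -> cache */
            return -1;
    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}
```

The extra user-space copy in this path is exactly what the mapping-mode and sendfile-style interfaces are designed to avoid.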

The second category of APIs maps Cache entries into user space, letting the application access a file as if through an ordinary memory pointer. It is implemented in the kernel using the demand-paging mechanism, whose flow is shown in Figure 8.

First, the application calls mmap (step 1 in Figure 8) and traps into the kernel, which calls do_mmap_pgoff (step 2). That function allocates a region of the application's address space to serve as the mapped area, represents the region with a VMA (vm_area_struct) structure, and returns to the application (step 3). When the application accesses the address returned by mmap (step 4), a page fault is triggered because the virtual-to-physical mapping has not yet been established (step 5). The system then invokes the page fault handler (step 6); within it, the kernel determines from the region's VMA structure that the region is a file mapping, calls the concrete file system's interface to read in the corresponding page Cache entries (steps 7, 8, 9), and fills in the corresponding virtual-to-physical mapping table. After these steps, the application can access the mapped memory region normally.
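A minimal example of the mapping-mode interface: the function below maps a file read-only with mmap and hands back a pointer; the first access through that pointer triggers the page fault path described above. This is an ordinary user-space sketch, not kernel code.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map 'path' read-only; on success sets *len to the file size and
 * returns the mapped address, otherwise returns NULL. */
char *map_readonly(const char *path, size_t *len)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0 || fstat(fd, &st) < 0)
        return NULL;
    *len = (size_t)st.st_size;
    char *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping keeps the file alive */
    return p == MAP_FAILED ? NULL : p;
}
```

Because the mapping goes straight to the page Cache entries, reading through the pointer avoids the extra user-space copy that read incurs.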


7 Summary

File Cache management is an important part of the Linux operating system, and it remains an active research area. Current work in the Linux kernel focuses on developing more efficient Cache replacement algorithms, such as LIRS (and its variant CLOCK-Pro) and ARC. Related material can be found at http://linux-mm.org/AdvancedPageReplacement.

Resources
    • Understanding the Linux Kernel, 2nd Edition
    • Linux Kernel Source Code Scenario Analysis
    • http://www.top500.org
    • R. W. Carr and J. L. Hennessy. WSCLOCK — a simple and effective algorithm for virtual memory management. In Proc. 8th ACM SOSP, Dec. 1981.
    • S. Jiang, F. Chen, and X. Zhang. CLOCK-Pro: an effective improvement of the CLOCK replacement. In Proc. USENIX ATC, Apr. 2005.
    • S. Jiang and X. Zhang. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. In Proc. ACM SIGMETRICS, June 2002.
    • N. Megiddo and D. S. Modha. ARC: a self-tuning, low overhead replacement cache. In Proc. 2nd USENIX FAST, Mar. 2003.