This article covers data synchronization and page reclaim (page recycling) for the Linux page cache. They are two independent concepts. Data synchronization addresses the consistency between data in memory or the page cache and the data on the backing device. Page reclaim frees physical pages that are already allocated when memory runs short, so that enough clean pages are available for new allocations and for higher-priority work. Data synchronization can be triggered at any time, whereas page reclaim is triggered when physical memory usage reaches a certain threshold.
Data synchronization means writing dirty pages in physical memory and the page cache back to their files on the backing device. It can be invoked in two ways:
1. Periodic invocation, mainly through the pdflush mechanism.
2. Forced invocation, for example through the sync and fsync system calls. When the number of dirty pages grows too large, the kernel also forces data synchronization to keep the dirty-page count under control, so that the IO caused by synchronization stays as smooth as possible.
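The amount of dirty data that both kinds of invocation work on can be observed from user space. The following is a minimal sketch, assuming a Linux system whose /proc/meminfo exposes the `Dirty` and `Writeback` fields; these field names do not come from the text above, and the helper name `parse_dirty_kb` is hypothetical.

```python
import os


def parse_dirty_kb(meminfo_text):
    """Return the Dirty and Writeback sizes (in kB) from /proc/meminfo-style text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        # Lines look like "Dirty:               124 kB"
        name, _, rest = line.partition(":")
        if name in wanted:
            result[name] = int(rest.split()[0])
    return result


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            print(parse_dirty_kb(f.read()))
    else:
        print("no /proc/meminfo on this system")
```

Sampling these two values before and after a large write gives a rough picture of how quickly dirty pages are being written back.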
pdflush is a set of kernel threads: in effect the kernel maintains a pdflush thread pool and allocates pdflush threads according to the data synchronization load. One pdflush thread can serve one block device, so multiple pdflush threads serve multiple block devices, which prevents a heavy IO load on one block device from holding up data synchronization for the others.
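Running cat /proc/sys/vm/nr_pdflush_threads shows the number of pdflush threads the system has currently started. Note that pdflush was replaced by per-device flusher threads in Linux 2.6.32, so this file is meaningful only on older kernels.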
Comparing the kernel call paths of data-integrity synchronization, which is triggered by system calls such as sync and writes back all dirty pages, with those of the periodic flush synchronization triggered by pdflush shows two things:
1. The targets of data synchronization are mainly file system objects, such as the file system super block, file inode metadata, and file inode data blocks.
2. Whether it is data-integrity synchronization or flush synchronization, the call paths eventually converge on the sync_sb_inodes function, which synchronizes all dirty inodes of a given super block.
Traversing the complete inode list to filter out the dirty inodes on every synchronization of a super block would be quite inefficient. Instead, the kernel maintains a list of dirty inodes, which the super block points to through super_block->s_dirty, so synchronization simply walks that list and writes back its inodes in turn.
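The benefit of a separate dirty list can be illustrated with a toy user-space model. This is only a sketch of the idea, not kernel code: the classes `Inode` and `SuperBlock` and their methods are hypothetical, and only the attribute name `s_dirty` is borrowed from the text.

```python
from collections import deque


class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.dirty = False


class SuperBlock:
    """Toy stand-in for a super block that tracks its dirty inodes."""

    def __init__(self, inode_count):
        # Every inode of the file system, dirty or clean.
        self.inodes = {i: Inode(i) for i in range(inode_count)}
        # Only the dirty inodes, in the order they became dirty.
        self.s_dirty = deque()

    def mark_dirty(self, ino):
        inode = self.inodes[ino]
        if not inode.dirty:  # an inode is queued at most once
            inode.dirty = True
            self.s_dirty.append(inode)

    def sync_inodes(self):
        """Write back every dirty inode without scanning the clean ones."""
        written = []
        while self.s_dirty:
            inode = self.s_dirty.popleft()
            inode.dirty = False  # a real kernel would issue the IO here
            written.append(inode.ino)
        return written
```

With 1000 inodes of which two are dirty, `sync_inodes` visits two entries rather than 1000, which is the saving the dirty list is meant to provide.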
Synchronizing an inode consists of two parts, metadata synchronization and data block synchronization, and the kernel provides a number of flags to refine the details of the operation.
The system calls that force synchronization compare as follows:
sync: Synchronizes all dirty pages, that is, data-integrity synchronization. It returns once the IO requests have been sent to the request queue, without waiting for the disk operations to complete, so data can still be lost if the disk fails.
fsync: Synchronizes the metadata and data blocks of a single file, and returns only after the disk operation completes, which ensures the reliability of the data.
fdatasync: Synchronizes the data blocks of a single file (along with only the metadata needed to read that data back), and returns only after the disk operation completes, which ensures the reliability of the data.
msync: Synchronizes dirty pages produced through mmap.
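From user space, these calls might be used as in the following minimal sketch. It relies on Python's standard wrappers `os.fsync`, `os.fdatasync` and `mmap.flush` (which calls msync on Unix); the helper names `write_durably` and `patch_via_mmap` are hypothetical, and `os.fdatasync` is assumed to be missing on some platforms, so the code falls back to `os.fsync` there.

```python
import mmap
import os
import tempfile


def write_durably(path, payload, data_only=False):
    """Write payload and block until the kernel reports it has reached the device."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()  # user-space buffer -> page cache (pages are now dirty)
        if data_only and hasattr(os, "fdatasync"):
            os.fdatasync(f.fileno())  # data blocks only (fdatasync)
        else:
            os.fsync(f.fileno())  # data blocks plus inode metadata (fsync)


def patch_via_mmap(path, offset, patch):
    """Modify a non-empty file through a shared mapping, then flush the dirty pages."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(patch)] = patch  # dirties the mapped page
            m.flush()  # msync on Unix


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "demo.bin")
    write_durably(demo, b"hello world")
    patch_via_mmap(demo, 0, b"HELLO")
    with open(demo, "rb") as f:
        print(f.read())
```

The system-wide counterpart, `os.sync()`, is left out of the sketch because it flushes every file system and can take a long time on a busy machine.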
The page reclaim (recycling) mechanism consists of three parts: flushing data out (flush), swapping (swap), and releasing (release).
Flush is similar to data synchronization: page cache pages that have backing files are synchronized to disk so that the pages can be reclaimed.
Swap is mainly for pages that have no backing file, such as anonymous mappings, private mappings, and memory dynamically allocated with malloc. They are written to a swap area on disk so that the pages can be reclaimed.
Release is mainly for certain read-only pages on the LRU lists, which are freed directly under high memory pressure so that the pages can be reclaimed.
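The choice among the three parts can be summarized as a small decision function. This is a deliberately simplified sketch under the assumption that a page is described by just two properties; the `Page` class and the `reclaim_action` function are hypothetical, and the real kernel considers many more states.

```python
from dataclasses import dataclass


@dataclass
class Page:
    file_backed: bool  # True if the page caches part of a file
    dirty: bool        # True if its contents differ from the copy on disk


def reclaim_action(page):
    """Return which of the three parts must run before the page can be reused."""
    if not page.file_backed:
        return "swap"     # no backing file, so the contents go to the swap area
    if page.dirty:
        return "flush"    # modified file data must be written back first
    return "release"      # a clean file page can be dropped directly


if __name__ == "__main__":
    for p in (Page(False, True), Page(True, True), Page(True, False)):
        print(p, "->", reclaim_action(p))
```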
The kernel's page reclaim mechanism mainly has to solve several problems:
1. Which reclaim algorithm to use to get the maximum benefit
2. Which pages to reclaim
3. How to organize the swap area, and how to access pages in the swap area
4. How to avoid page thrashing under high reclaim pressure