Linux memory mechanism: cache and buffer
On Linux you will often find that free memory is very small. It looks as though the system has taken all of the memory and there is not enough left to use, but that is not the case. This is an excellent feature of Linux memory management, and one in which it differs from Windows memory management. The point is that no matter how large physical memory is, Linux puts it to use: it reads data that programs fetch from the hard disk into memory and relies on the speed of memory reads and writes to improve the data access performance of the system. Windows, by contrast, allocates memory to an application only when the application needs it, and does not take full advantage of a large memory space. In other words, every time physical memory is added, Linux can turn that hardware investment into a benefit, whereas on Windows the extra memory sits largely idle, even if 8 GB or more is added.
Linux achieves this mainly by setting aside part of the otherwise idle physical memory as cache and buffers in order to improve data access performance.
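To make the point concrete, here is a minimal sketch in Python that reads text in the format of /proc/meminfo and compares free memory alone with free memory plus buffers and cache. The sample numbers are made up for illustration, and the helper names parse_meminfo and free_including_cache are hypothetical, not part of any tool.

```python
# Sample text in the format of /proc/meminfo; the numbers are invented.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:          200000 kB
Buffers:          300000 kB
Cached:          5500000 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its numeric value (kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def free_including_cache(fields):
    """Rough estimate: memory that is free or held only as buffers/cache."""
    return fields["MemFree"] + fields["Buffers"] + fields["Cached"]


info = parse_meminfo(SAMPLE_MEMINFO)
print("MemFree alone:       ", info["MemFree"], "kB")
print("Free + buffers/cache:", free_including_cache(info), "kB")
```

On a real Linux machine the same functions can be fed the contents of /proc/meminfo, for example parse_meminfo(open("/proc/meminfo").read()). The sum is only a rough estimate, since not every cached page can be reclaimed immediately.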
1. What is the cache?
The page cache (cache) is the main disk cache implemented by the Linux kernel. It is used primarily to reduce disk I/O operations. Specifically, by caching data from the disk in physical memory, accesses to the disk become accesses to physical memory.
The disk cache is valuable for two reasons. First, accessing the disk is much slower than accessing memory, so reading data from memory is faster than reading it from disk. Second, once data has been accessed, it is likely to be accessed again in the near future.
The page cache is made up of physical pages in memory, and each page in the cache corresponds to multiple blocks on the disk. Whenever the kernel starts a page I/O operation (typically a disk operation in page-sized chunks on a regular file), it first checks whether the required data is in the cache. If it is, the kernel uses the cached data directly and avoids accessing the disk.
For example, when you open a source file in a text editor, the file's data is read into memory. As you edit the file, more and more of its data is brought into memory pages. Finally, when you compile it, the kernel can use the pages in the page cache directly without reading the file from disk again. Because users tend to read or manipulate the same files over and over, the page cache reduces the number of disk operations.
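The lookup described above can be sketched as a toy model. This is not how the kernel implements it; the class ToyPageCache, the 4 KiB page size, and the bytes object standing in for a disk are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # bytes per page; 4 KiB is assumed here for illustration


class ToyPageCache:
    """Toy model: check the cache first, go to 'disk' only on a miss."""

    def __init__(self, disk):
        self.disk = disk        # bytes object standing in for a file on disk
        self.pages = {}         # page index -> cached page contents
        self.disk_reads = 0     # number of times the "disk" was accessed

    def read_page(self, index):
        if index in self.pages:             # cache hit: no disk access
            return self.pages[index]
        start = index * PAGE_SIZE           # cache miss: read from "disk"
        data = self.disk[start:start + PAGE_SIZE]
        self.disk_reads += 1
        self.pages[index] = data            # keep the page for later reads
        return data


disk = bytes(3 * PAGE_SIZE)                 # a three-page "file" of zeros
cache = ToyPageCache(disk)
for index in (0, 1, 0, 0, 2, 1):            # repeated access to the same pages
    cache.read_page(index)
print("page reads:", 6, "disk reads:", cache.disk_reads)
```

Six page reads cost only three disk reads here, because pages 0 and 1 are served from memory after their first access.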
2. How is the cache updated?
Because of the page cache, write operations are in effect deferred. When data in the page cache is newer than the data on the backing store, that data is called dirty. Dirty pages that accumulate in memory must eventually be written back to disk. They are written back when either of the following two situations occurs:
When free memory falls below a specific threshold, the kernel must write dirty pages back to disk in order to free up memory.
When a dirty page has resided in memory longer than a certain time threshold, the kernel must write the expired dirty page back to disk, so that dirty pages do not remain in memory indefinitely.
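The two triggers can be expressed as a small decision function. This is a simplified sketch, not kernel code: the function name pages_to_write_back, its parameters, and the choice to treat every dirty page as a candidate under memory pressure are assumptions for illustration.

```python
def pages_to_write_back(dirty_since, now, free_ratio, min_free_ratio, max_age):
    """Return the dirty pages to write back, oldest first.

    dirty_since maps a page id to the time (seconds) at which it became dirty.
    """
    if free_ratio < min_free_ratio:
        # Trigger 1: free memory is below the threshold, so every dirty
        # page is a candidate for writeback.
        return sorted(dirty_since, key=dirty_since.get)
    # Trigger 2: only pages that have been dirty for longer than max_age.
    expired = [p for p, t in dirty_since.items() if now - t > max_age]
    return sorted(expired, key=dirty_since.get)


dirty_since = {"p1": 0.0, "p2": 25.0, "p3": 29.0}

# Plenty of free memory: only the page dirty for more than 30 s is chosen.
print(pages_to_write_back(dirty_since, now=31.0, free_ratio=0.50,
                          min_free_ratio=0.10, max_age=30.0))

# Free memory below the threshold: all dirty pages are chosen.
print(pages_to_write_back(dirty_since, now=31.0, free_ratio=0.05,
                          min_free_ratio=0.10, max_age=30.0))
```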
In the 2.6 kernel, a group of kernel threads, the pdflush background writeback threads, performs both of these tasks.
First, the pdflush threads flush dirty pages back to disk when free memory in the system falls below a specific threshold. The purpose of this background writeback is to release dirty pages and reclaim memory when available physical memory is too low. The threshold can be set through the dirty_background_ratio sysctl. When free memory falls below the dirty_background_ratio threshold, the kernel calls the function wakeup_bdflush() to wake up a pdflush thread, and the pdflush thread then calls the function background_writeout() to start writing dirty pages back to disk. The function background_writeout() takes a long integer parameter that specifies the number of pages to try to write back. It keeps writing data back until the following two conditions are both met:
The specified minimum number of pages has been written out to disk.
The amount of free memory has recovered and exceeds the dirty_background_ratio threshold.
These conditions ensure that the pdflush operation relieves the low-memory pressure in the system. Writeback does not stop until both conditions are met, unless pdflush has written back all the dirty pages and none remain.
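The stopping rule can be sketched as a loop over page counts. This is a toy model, not the kernel's background_writeout(): the function name, its parameters, and the simplification that each written page immediately becomes a free page are assumptions for illustration.

```python
def background_writeout_sketch(dirty, free, total, min_pages, background_percent):
    """Return how many pages are written before the loop stops.

    All arguments are page counts, except background_percent, which is the
    free-memory threshold expressed as a percentage of total pages.
    """
    written = 0
    while dirty > 0:                      # stop early if nothing is left to write
        enough_written = written >= min_pages
        enough_free = free * 100 > background_percent * total
        if enough_written and enough_free:
            break                         # both conditions met
        dirty -= 1                        # write one dirty page back...
        free += 1                         # ...and count it as free (simplification)
        written += 1
    return written


# Both conditions must hold: 4 pages is not enough, free must exceed 10%.
print(background_writeout_sketch(dirty=40, free=5, total=100,
                                 min_pages=4, background_percent=10))

# Only 3 dirty pages exist, so writeback stops when none remain.
print(background_writeout_sketch(dirty=3, free=5, total=100,
                                 min_pages=4, background_percent=10))
```

In the first call six pages are written: after four pages the minimum count is satisfied, but free memory only exceeds the 10% threshold once it reaches 11 pages.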
To meet the second goal, the pdflush daemon is woken up periodically (regardless of whether free memory is low) and writes out dirty pages that have been in memory for too long, so that no dirty page stays in memory permanently. If the system crashes, the contents of memory are lost, and with them any dirty pages that have not yet been written back to disk, so it is important to synchronize the page cache with the disk periodically. At system startup, the kernel initializes a timer that periodically wakes up a pdflush thread and has it run the function wb_kupdate().
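The effect of the periodic timer can be illustrated with a small simulation over logical time. This is a sketch, not the behavior of wb_kupdate() itself: the function name simulate_periodic_writeback and the 5-second interval and 30-second expiry used in the example are illustrative assumptions.

```python
def simulate_periodic_writeback(dirtied_at, interval, expire, end_time):
    """Return {page: time written back} for a timer firing every `interval`.

    dirtied_at maps a page id to the time (seconds) it became dirty.
    A page is written back at the first timer tick at which it has been
    dirty for at least `expire` seconds.
    """
    pending = dict(dirtied_at)
    flushed = {}
    t = interval                          # first timer tick
    while t <= end_time and pending:
        for page, since in list(pending.items()):
            if t - since >= expire:       # page has been dirty too long
                flushed[page] = t
                del pending[page]
        t += interval                     # next timer tick
    return flushed


result = simulate_periodic_writeback({"a": 0, "b": 12},
                                     interval=5, expire=30, end_time=60)
print(result)
```

Page "b" becomes dirty at 12 s and expires at 42 s, but it is only written at the next tick, 45 s. In this model a dirty page can therefore stay in memory for up to the expiry time plus one timer interval.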
The difference between cache and buffer:
Cache: a small but high-speed storage area between the CPU and main memory. Because the CPU is much faster than main memory, the CPU has to wait for some time whenever it fetches data directly from memory. The cache holds data that the CPU has just used or uses repeatedly, so when the CPU needs that data again it can be fetched directly from the cache. This reduces the time the CPU spends waiting and improves the efficiency of the system.
Buffer: an area used to transfer data between devices that are not synchronized or that have different priorities. Buffers reduce the time processes spend waiting on each other, so that while data is being read from a slow device, the fast device can keep working without interruption.
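The buffering idea can be sketched as follows: a fast producer hands over small writes, and the buffer passes them to the slow device in batches. The class WriteBuffer and its capacity parameter are hypothetical names used only for this illustration.

```python
class WriteBuffer:
    """Collect small writes and hand them to a slow device in batches."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []          # items accepted but not yet on the device
        self.device_writes = []    # each entry is one batch sent to the device

    def write(self, item):
        self.pending.append(item)  # fast path: just remember the item
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        if self.pending:
            self.device_writes.append(list(self.pending))
            self.pending.clear()


buf = WriteBuffer(capacity=4)
for i in range(10):                # ten small writes from the fast side
    buf.write(i)
buf.flush()                        # push out whatever is still pending
print("device operations:", len(buf.device_writes))
```

Ten writes from the fast side result in only three operations on the slow side, which is the kind of decoupling the definition above describes.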
Buffer and cache in the output of the free command (both occupy memory):
Buffer: memory used as the buffer cache, that is, the read and write buffer for block devices.
Cache: memory used as the page cache, that is, the file system cache.
If the cache value is large, many files are held in the cache. If the frequently accessed files can all be cached, the disk read I/O (the bi value) will be very small.
A buffer holds something that has yet to be "written" to disk. A cache holds something that has been "read" from disk and stored for later use.
The buffer temporarily stores data that needs to be written to the hard disk.
The cache temporarily stores data that has been read from the hard disk, typically data that was used recently.
In short, the cache mostly serves reads, and the buffer mostly serves writes.