Linux file read-write mechanism and optimization method


Guide: Linux is a highly controllable, secure, and efficient operating system. This article discusses only the Linux file read/write mechanism; it does not compare different reading methods such as read, fread, and cin, since those are all essentially wrappers around the read system call. All of the tests below use open, read, and write, which are system APIs.

Cache

A cache is a component used to reduce the average time a fast device needs to access a slow one. File reads and writes involve memory and disk, and memory operations are much faster than disk operations. If every read or write call went straight to disk, speed would be limited and disk life would be shortened, so the operating system caches data for both read and write operations on the disk.

Page Cache

The page cache is a buffer between memory and files; it is actually an area of memory. All file IO (including network file IO) interacts with the page cache. Through a series of data structures such as inode, address_space, and struct page, the operating system maps a file to pages. We can set the specific data structures and their relationships aside for now; it is enough to know that the page cache exists and that it plays an important role in file IO. To a large extent, optimizing file reads and writes means making good use of the page cache.

Dirty Page

A page in the page cache corresponds to a region of a file. If the contents of a cached page and the corresponding file region are inconsistent, the page is called a dirty page. Modifying a cached page or creating a new one produces dirty pages for as long as they have not been flushed to disk.

View the page cache size

There are two ways to view the page cache size on Linux. One is the free command:

$ free
             total       used       free     shared    buffers     cached
Mem:      20470840    1973416   18497424        164     270208    1202864
-/+ buffers/cache:     500344   19970496
Swap:            0          0          0

The cached column is the page cache size, in kilobytes.

The other is to look directly at /proc/meminfo. Here we focus on just two fields:

Cached:          1202872 kB
Dirty:                   kB

Cached is the page cache size and Dirty is the dirty page size.

Dirty page write-back parameters

Linux has some parameters that can change the operating system's write-back behavior for dirty pages:

$ sysctl -a 2>/dev/null | grep dirty
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000

vm.dirty_background_ratio is the percentage of memory that dirty pages may fill; when the total size of dirty pages reaches this ratio, the system's background daemon starts flushing dirty pages to disk (vm.dirty_background_bytes is similar, but sets the limit as a number of bytes). vm.dirty_ratio is an absolute limit on dirty data: the percentage of memory occupied by dirty data cannot exceed this value, and if it does, new IO requests are blocked until dirty data has been written to disk. vm.dirty_writeback_centisecs specifies how often a dirty-page write-back pass runs, in hundredths of a second. vm.dirty_expire_centisecs specifies how long dirty data may live, also in hundredths of a second; for example, with a value of 3000 (30 seconds), when the write-back pass runs, any dirty data that has been in memory for more than 30 seconds is written back to disk.

These parameters can be modified with a command such as sudo sysctl -w vm.dirty_background_ratio=5, which requires root privileges. Alternatively, as the root user, you can run echo 5 > /proc/sys/vm/dirty_background_ratio to achieve the same effect.

File read/write process

With the concepts of page cache and dirty pages in hand, let's look at the file read and write process.

Read a file
1. The user initiates the read operation
2. The operating system looks up the page cache
  A. If it misses, a page fault occurs, the page cache is created, and the corresponding page is read from disk to populate it
  B. If it hits, the content to be read is returned directly from the page cache
3. The user's read call completes
Write a file
1. The user initiates the write operation
2. The operating system looks up the page cache
  A. If it misses, a page fault occurs, the page cache is created, and the user's content is written into it
  B. If it hits, the user's content is written directly into the page cache
3. The user's write call completes
4. A modified page becomes a dirty page; the operating system has two mechanisms for writing dirty pages back to disk:
  A. The user manually calls fsync()
  B. The pdflush process periodically writes dirty pages back to disk
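To make the write path concrete, here is a minimal sketch of the first mechanism; the file name example.dat is a hypothetical placeholder, not from the original. It writes through the page cache and then forces the dirty pages to disk with fsync:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* "example.dat" is a hypothetical file name for illustration. */
    int fd = open("example.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    const char* msg = "hello page cache\n";
    /* write() copies the data into the page cache and returns;
       the affected pages are now dirty but not yet on disk. */
    write(fd, msg, strlen(msg));

    /* fsync() blocks until this file's dirty pages have been written
       back to disk (mechanism A above); without it, the kernel would
       write them back later on its own (mechanism B). */
    fsync(fd);
    close(fd);
    return 0;
}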

Page cache pages have a corresponding relationship with disk files, and that relationship is maintained by the operating system. Page cache reads and writes are completed in kernel mode and are transparent to the user.

Ideas for optimizing file reads and writes

Different optimization schemes suit different usage scenarios, which vary in file size, read/write frequency, and so on. Here we do not consider schemes that modify system parameters: tuning those always trades one thing for another, and choosing the balance point depends heavily on the business, e.g. whether strong data consistency is required and whether data loss can be tolerated. The optimization ideas come down to two points:

1. Maximize the use of page caching

2. Reduce the number of system API calls

The first point is easy to understand: try to make every IO operation hit the page cache, which is much faster than operating on the disk. The system APIs mentioned in the second point are mainly read and write. Every system call crosses from user mode into kernel mode, and some are accompanied by a copy of memory data, so reducing the number of system calls can also improve performance in some scenarios, as the sketch below illustrates.
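As a rough illustration of the second point (the file name data.bin and the 4 MB size are assumptions for this sketch, not from the original), the code below reads the same data once as many 4 KB read calls and once as a single large read; the second way crosses the user/kernel boundary far fewer times:

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    char small[4096];
    char* big = malloc(4 * 1024 * 1024);

    /* Many small reads: one system call (and one user/kernel
       mode switch) per 4 KB chunk. */
    int fd = open("data.bin", O_RDONLY);   /* hypothetical file */
    while (read(fd, small, sizeof(small)) > 0) {
        /* process 4 KB at a time */
    }
    close(fd);

    /* One large read: a single system call covers the same data. */
    fd = open("data.bin", O_RDONLY);
    read(fd, big, 4 * 1024 * 1024);
    close(fd);

    free(big);
    return 0;
}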

readahead

readahead is a non-blocking system call that triggers the operating system to pre-read a file's contents into the page cache and returns immediately. Its function prototype is as follows:

ssize_t readahead(int fd, off64_t offset, size_t count);

In general, calling readahead and then immediately calling read does not increase read speed; we usually call readahead in bulk, or some time before the reads. Suppose we need to read 1000 1 MB files in succession. There are two approaches, sketched in pseudo-code below.

Call the read function directly:
char* buf = (char*)malloc(10 * 1024 * 1024);
for (int i = 0; i < 1000; ++i) {
    int fd = open_file();           /* pseudo: open the i-th file */
    int size = stat_file_size();    /* pseudo: get its size */
    read(fd, buf, size);
    /* do something with buf */
    close(fd);
}
Call readahead in bulk before calling read:
int* fds = (int*)malloc(sizeof(int) * 1000);
int* fd_size = (int*)malloc(sizeof(int) * 1000);
for (int i = 0; i < 1000; ++i) {
    int fd = open_file();           /* pseudo: open the i-th file */
    int size = stat_file_size();    /* pseudo: get its size */
    readahead(fd, 0, size);
    fds[i] = fd;
    fd_size[i] = size;
}
char* buf = (char*)malloc(10 * 1024 * 1024);
for (int i = 0; i < 1000; ++i) {
    read(fds[i], buf, fd_size[i]);
    /* do something with buf */
    close(fds[i]);
}

If you are interested, you can write code to test this yourself. Note that before each test you must write back the dirty pages and drop the page cache by executing the following command:

sync && sudo sysctl -w vm.drop_caches=3

You can check Cached and Dirty in /proc/meminfo to confirm that it took effect.

Testing shows the second method reads about 10%-20% faster than the first. In this scenario the reads were executed immediately after the batch of readahead calls, so the room for optimization was limited; in a scenario where readahead can be called some time before read, the speed of the read itself will improve greatly.

This approach takes advantage of the operating system's page cache: we trigger the operating system to read files into the page cache ahead of time, and the OS handles page faults, cache hits, and cache eviction with a mature, well-tuned mechanism. Users can manage their own caches for their own data, but doing so is not much better than using the page cache directly, and it adds maintenance cost.

mmap

mmap is a method of memory-mapping files. It maps a file (or another object) into a process's address space, establishing a one-to-one mapping between the file's disk address and a virtual address in the process's virtual address space. Its function prototype is as follows:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

Once such a mapping is established, the process can read and write that memory through a pointer, and the system automatically writes the dirty pages back to the corresponding file on disk. File operations are thus completed without calling system calls such as read and write.

Besides reducing system calls such as read and write, mmap also reduces the number of memory copies. With a read call, the complete flow is: the operating system reads the disk file into the page cache, then copies the data from the page cache into the buffer passed to read. With mmap, the operating system only needs to read the disk into the page cache, and the user can then manipulate the mmap-mapped memory directly via a pointer, eliminating the data copy from kernel space to user space.

mmap is suitable for frequent reads and writes to the same region. For example, suppose a 64 MB file stores index information that we need to modify frequently and persist to disk. We can map the file into the user's virtual memory with mmap, then modify the memory region through a pointer; the modified parts are flushed back to disk automatically by the operating system, or can be flushed manually by calling msync.
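A minimal sketch of that pattern follows; the file name index.dat is hypothetical, the 64 MB size comes from the example above, and error handling is reduced to early returns:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* "index.dat" is a hypothetical file name; the file must
       already exist at this size for the mapping to be valid. */
    const size_t len = 64 * 1024 * 1024;
    int fd = open("index.dat", O_RDWR);
    if (fd < 0)
        return 1;

    char* p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    /* Modify the mapped region directly through the pointer;
       no read()/write() system calls are involved. */
    memcpy(p, "new index header", 16);

    /* The OS will flush the dirty pages on its own, or we can
       flush a range explicitly and synchronously with msync. */
    msync(p, 4096, MS_SYNC);

    munmap(p, len);
    close(fd);
    return 0;
}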

