Linux File Read/Write Mechanism and Optimization Methods

Linux is a highly controllable, secure, and efficient operating system. This article discusses only the file read/write mechanism in Linux; it does not compare different reading methods such as read, fread, and cin, which all ultimately call the read system API with different layers of wrapping. All of the tests below use the open, read, and write system APIs directly.

Cache

A cache is a component that reduces the average time a fast device needs to access a slow device. File read/write involves both memory and disk, and memory operations are far faster than disk operations. If every write went directly to disk, speed would suffer and the disk's service life would be shortened. The operating system therefore caches data for read and write operations on the disk.

Page Cache

The page cache is a buffer between memory and files. It is actually a region of memory, and all file IO (including network files) interacts with it. The operating system maps a file to pages through a series of data structures such as inode, address_space, and struct page. The specific data structures and their relationships are not covered here; for now it is enough to know that the page cache exists and plays a central role in file IO. To a large extent, optimizing file read/write means optimizing the use of the page cache.

Dirty Page

Each page in the page cache corresponds to a region of a file. If the content of a page is inconsistent with the corresponding file region, that page is called a dirty page. Modifying a page in the cache, or creating a new one, produces dirty pages until they are flushed to disk.

View page cache size

There are two ways to view the page cache size in Linux. One is the free command:

$ free
             total       used       free     shared    buffers     cached
Mem:      20470840    1973416   18497424        164     270208    1202864
-/+ buffers/cache:     500344   19970496
Swap:            0          0          0

The cached column is the page cache size; free reports values in kilobytes by default.

The other is to view /proc/meminfo directly. Here we focus on only two fields:

Cached:        1202872 kB
Dirty:              52 kB

Cached indicates the page cache size and Dirty indicates the Dirty page size.
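For programmatic monitoring, the same two fields can be read straight from /proc/meminfo. Below is a minimal C sketch; it assumes the standard "Name:   value kB" line format of that file.

#include <stdio.h>
#include <string.h>

/* Print the Cached and Dirty fields of /proc/meminfo. */
int main(void)
{
    FILE* fp = fopen("/proc/meminfo", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    char line[128];
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Match the two fields of interest by their prefixes. */
        if (strncmp(line, "Cached:", 7) == 0 || strncmp(line, "Dirty:", 6) == 0)
            fputs(line, stdout);
    }
    fclose(fp);
    return 0;
}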

Dirty Page Write-back Parameters

Linux provides several parameters that control how the operating system writes dirty pages back to disk.

$ sysctl -a 2>/dev/null | grep dirty
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000

vm.dirty_background_ratio is the percentage of memory that dirty pages may occupy before the system's background process starts flushing them to disk (vm.dirty_background_bytes is the same limit expressed in bytes). vm.dirty_ratio is the hard limit: dirty data may not exceed this percentage of memory, and once it does, new IO requests block until dirty data has been written to disk (vm.dirty_bytes is the byte-based equivalent). vm.dirty_writeback_centisecs specifies the interval between writeback runs, in hundredths of a second. vm.dirty_expire_centisecs specifies how long dirty data may live in memory, also in hundredths of a second; if set to the equivalent of 30 seconds, for example, any dirty data that has been in memory for more than 30 seconds is written back to disk during the next writeback run.

These parameters can be changed with a command such as sudo sysctl -w vm.dirty_background_ratio=5, which requires root permission. Alternatively, as root you can run echo 5 > /proc/sys/vm/dirty_background_ratio to modify the value.

File read/write process

With the concept of page caching and dirty pages, let's look at the file read/write process.

Read files

1. The user initiates a read operation.
2. The operating system searches the page cache.
   a. On a miss, a page fault occurs; the kernel creates a page cache entry and reads the corresponding page from disk to fill it.
   b. On a hit, the requested content is returned directly from the page cache.
3. The user's read call completes.

Write files

1. The user initiates a write operation.
2. The operating system searches the page cache.
   a. On a miss, a page fault occurs; the kernel creates a page cache entry and writes the user's data into it.
   b. On a hit, the user's data is written directly into the page cache.
3. The user's write call completes.
4. Once a page is modified, it becomes a dirty page. The operating system has two mechanisms for writing dirty pages back to disk:
   a. The user manually calls fsync().
   b. The pdflush process periodically writes dirty pages back to disk.

The relationship between the page cache and disk files is maintained by the operating system. Reads and writes to the page cache are completed in kernel mode and are transparent to users.
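To make the write path concrete, here is a minimal C sketch that writes through the page cache and then forces the resulting dirty page to disk with fsync(); the file name is a hypothetical placeholder.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical file name, for illustration only. */
    int fd = open("test.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    const char* msg = "hello page cache\n";
    /* write() returns once the data is in the page cache;
     * the page is now dirty but not necessarily on disk. */
    if (write(fd, msg, strlen(msg)) < 0)
        perror("write");

    /* fsync() blocks until this file's dirty pages
     * have been written back to disk. */
    if (fsync(fd) < 0)
        perror("fsync");

    close(fd);
    return 0;
}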

Optimization of file read/write

Different optimization schemes suit different application scenarios, depending on factors such as file size and read/write frequency. Here we do not consider modifying system parameters: changing them always involves a trade-off, and the right balance point is highly business-specific, for example whether strong data consistency is required and whether data loss can be tolerated. The optimization methods include the following:

1. maximize the use of page Cache

2. Reduce the number of system api calls

The first point is easy to understand: an IO operation that hits the page cache is much faster than one that touches the disk. For the second point, the system APIs in question are mainly read and write. Each system call crosses from user mode into kernel mode, and some calls also copy memory data, so reducing the number of system calls also improves performance in certain scenarios.
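As a hedged illustration of the second point, the sketch below contrasts many one-byte read() calls with a single large read; the file name and size are assumptions for the example, and a full implementation would loop to handle short reads.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define FILE_SIZE (1024 * 1024)  /* assumed 1 MB file, for illustration */

int main(void)
{
    char* buf = malloc(FILE_SIZE);

    /* Slow: one system call per byte, FILE_SIZE kernel entries,
     * even though every byte after the first hits the page cache. */
    int fd = open("data.bin", O_RDONLY);
    for (int i = 0; i < FILE_SIZE; ++i)
        read(fd, buf + i, 1);
    close(fd);

    /* Fast: a single system call copies the whole file
     * from the page cache into the user buffer at once. */
    fd = open("data.bin", O_RDONLY);
    read(fd, buf, FILE_SIZE);
    close(fd);

    free(buf);
    return 0;
}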

Readahead

Readahead is a non-blocking system call that triggers the operating system to pre-read file content into the page cache and returns immediately. The function prototype is as follows:

ssize_t readahead(int fd, off64_t offset, size_t count);

In general, calling readahead and then reading immediately afterward does not increase the read speed; readahead is usually called in batches, or some time before the actual read. Suppose we need to read 1,000 files of 1 MB each, one after another. There are two approaches, shown in pseudocode below.

Directly call the read function
char* buf = (char*)malloc(10*1024*1024);
for (int i = 0; i < 1000; ++i)
{
    int fd = open_file();
    int size = stat_file_size();
    read(fd, buf, size);
    // do something with buf
    close(fd);
}
Call readahead in batches before calling read
int* fds = (int*)malloc(sizeof(int)*1000);
int* fd_size = (int*)malloc(sizeof(int)*1000);
for (int i = 0; i < 1000; ++i)
{
    int fd = open_file();
    int size = stat_file_size();
    readahead(fd, 0, size);
    fds[i] = fd;
    fd_size[i] = size;
}
char* buf = (char*)malloc(10*1024*1024);
for (int i = 0; i < 1000; ++i)
{
    read(fds[i], buf, fd_size[i]);
    // do something with buf
    close(fds[i]);
}

If you are interested, you can write the code and test it yourself. Note that before each test you must flush dirty pages and drop the page cache by executing the following command:

sync && sudo sysctl -w vm.drop_caches=3

Check the Cached and Dirty entries in /proc/meminfo to confirm that the caches have actually been dropped.

Testing shows that the second method is roughly 10% to 20% faster than the first. In this scenario, the reads are issued immediately after the batch of readahead calls, so the room for optimization is limited; if readahead can be called some time before the actual reads, the read speed improves much more.

This approach exploits the operating system's page cache: it triggers the OS to read files into the page cache ahead of time, and the OS already provides a complete mechanism for page fault handling, cache hits, and cache eviction. Users can of course manage their own cache for their data, but that is essentially no different from using the page cache directly, and it adds maintenance cost.

Mmap

Mmap is a method of memory-mapping files: it maps a file (or other object) into the address space of a process, establishing a one-to-one mapping between the file's disk address and a segment of virtual addresses in the process's virtual address space. The function prototype is as follows:

void* mmap(void* addr, size_t length, int prot, int flags, int fd, off_t offset);

Once such a mapping is established, the process can read and write the memory segment through a pointer, and the system automatically writes the resulting dirty pages back to the corresponding file on disk. In other words, the file can be operated on without calling read, write, or other system call functions.

Besides avoiding system calls such as read and write, mmap also reduces the number of memory copies. With read, the complete path is: the operating system reads the disk file into the page cache, then copies the data from the page cache into the buffer passed to read. With mmap, the operating system only needs to read the disk into the page cache; the user then operates directly on the mapped memory through a pointer, eliminating the copy from kernel space to user space.

Mmap is suitable for frequent reads and writes to the same region. For example, suppose a 64 MB file stores index information that we need to modify frequently and persist to disk. We can map the file into the user's virtual memory with mmap and modify the region through a pointer; the operating system automatically flushes the modified parts back to disk, and msync can be called to flush them manually.
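A minimal sketch of this pattern follows, assuming a hypothetical index file that already exists and is at least a few bytes long.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical file name; the file must already exist. */
    int fd = open("index.dat", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    fstat(fd, &st);

    /* Map the whole file into the process address space. */
    char* p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* Modify the file through the pointer; no write() call needed.
     * The OS will write the dirty pages back on its own schedule. */
    memcpy(p, "updated", 7);

    /* Optionally force the dirty pages back to disk right now. */
    msync(p, st.st_size, MS_SYNC);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}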

From: http://os.51cto.com/art/201609/517642.htm

URL: www.linuxprobe.com/linux-read-write-tuning.html

