The Linux kernel implements I/O primarily in three kernel subsystems: the virtual file system (VFS), the page cache, and page writeback.
The virtual file system (sometimes called the virtual filesystem switch) is an abstraction layer over the file operations of the Linux kernel. It lets the kernel call file system functions and manipulate file system data without knowing what type of file system is in use. The VFS achieves this through a common file model, which is the basis for all Linux file systems. Built on function pointers and other object-oriented techniques, the common file model provides a framework that every Linux file system must follow, allowing the VFS to issue requests to the file system. The framework provides hooks for reading, creating links, synchronizing, and other operations; each file system then supplies the appropriate function to handle each action.
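The dispatch idea behind the common file model can be illustrated in user space with a small table of function pointers. This is only a hypothetical sketch of the design pattern, not the kernel's real interface; the names file_ops, ext4_read, and nfs_read below are invented for illustration.

#include <stdio.h>
#include <sys/types.h>

/* Hypothetical "common file model": a set of hooks that every
   file system implementation fills in with its own functions. */
struct file_ops {
    ssize_t (*read)(const char *path, char *buf, size_t len);
    int (*sync)(const char *path);
};

/* Two toy "file systems" supplying their own implementations. */
static ssize_t ext4_read(const char *path, char *buf, size_t len)
{
    (void)buf;
    printf("ext4: reading %zu bytes from %s\n", len, path);
    return 0;
}

static int ext4_sync(const char *path)
{
    printf("ext4: syncing %s\n", path);
    return 0;
}

static ssize_t nfs_read(const char *path, char *buf, size_t len)
{
    (void)buf;
    printf("nfs: reading %zu bytes from %s over the network\n", len, path);
    return 0;
}

static int nfs_sync(const char *path)
{
    printf("nfs: syncing %s on the server\n", path);
    return 0;
}

static const struct file_ops ext4_ops = { ext4_read, ext4_sync };
static const struct file_ops nfs_ops = { nfs_read, nfs_sync };

/* The "VFS layer": it issues requests through the hooks without
   knowing which file system sits behind them. */
static void vfs_read_and_sync(const struct file_ops *ops, const char *path)
{
    char buf[64];
    ops->read(path, buf, sizeof(buf));
    ops->sync(path);
}

int main(void)
{
    vfs_read_and_sync(&ext4_ops, "/mnt/disk/file");
    vfs_read_and_sync(&nfs_ops, "/mnt/share/file");
    return 0;
}

The real kernel applies the same idea at a much larger scale: each file system registers tables of operation hooks, and the VFS calls through them.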
The page cache stores recently accessed data from the on-disk file system in memory, because disk access is far too slow relative to current processor speeds. Once the requested data is held in memory, subsequent requests for the same data can be served by the kernel directly from memory, avoiding repeated disk access as much as possible. The page cache exploits a form of locality of reference known as temporal locality: a resource that was just accessed is likely to be accessed again shortly. Because time-consuming disk accesses are avoided, the memory cost of caching the data on first access is repaid. The page cache is the kernel's first destination for file system data; the kernel invokes the storage subsystem to read data from disk only when it is not found in the cache.
For writes, the kernel uses buffers to defer the actual write operations. When a process issues a write request, the data is copied into a buffer and the buffer is marked "dirty", meaning the copy in memory is newer than the one on disk. At that point the write request can return. If a new write request arrives for the same chunk, the buffer is updated with the new data; write requests to other parts of the file open new buffers.
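A simple way to observe the read-side caching described above from user space is to read the same file twice and time both passes; on most systems the second pass is served largely from memory and is noticeably faster. This is a rough, hedged illustration rather than a benchmark, and /etc/services is just an arbitrary example file.

#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Read the whole file once and return the elapsed wall-clock seconds. */
static double read_once(const char *path)
{
    char buf[4096];
    struct timespec t0, t1;
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (read(fd, buf, sizeof(buf)) > 0)
        ;   /* discard the data; only the timing matters here */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(int argc, char *argv[])
{
    const char *path = argc > 1 ? argv[1] : "/etc/services";

    /* The first pass may hit the disk; the second is usually
       satisfied from the page cache. */
    printf("first read:  %f s\n", read_once(path));
    printf("second read: %f s\n", read_once(path));
    return 0;
}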
Page writeback writes these "dirty" buffers back to disk, synchronizing the on-disk files with the data in memory. Two conditions trigger a writeback:
1. When free memory drops below a configured threshold, dirty buffers are written back to disk; once cleaned, the buffers can be removed to free up memory.
2. When a dirty buffer has stayed dirty longer than a configured age threshold, it is written back to disk, which avoids leaving the data in an indeterminate state indefinitely. User space can also force writeback explicitly, as sketched below.
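As a rough user-space illustration of deferred writes and explicit writeback, the sketch below writes data, which normally just lands in the page cache and is marked dirty, and then calls fsync() to force it out to the disk. The file name demo.txt is arbitrary.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "hello, page cache\n";
    /* An ordinary buffered (non-O_DIRECT) write. */
    int fd = open("demo.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* write() returns as soon as the data has been copied into the
       page cache and the pages are marked dirty; the disk has not
       necessarily been touched yet. */
    if (write(fd, msg, strlen(msg)) != (ssize_t)strlen(msg)) {
        perror("write");
        close(fd);
        return EXIT_FAILURE;
    }

    /* fsync() blocks until the dirty data (and metadata) of this
       file has been written back to the underlying disk. */
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        return EXIT_FAILURE;
    }

    close(fd);
    puts("data flushed to disk");
    return 0;
}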
I/O Scheduler and I/O performance
Disk addressing. To understand how the I/O scheduler works, some background is needed first. Hard drives locate data using a geometric addressing scheme based on cylinders, heads, and sectors, known as CHS addressing. Each drive consists of multiple platters; each platter consists of a disk, a spindle, and a read/write head.
The I/O Scheduler implements two basic operations:
1. Merging is the process of combining two or more adjacent I/O requests into one. Consider two requests: one reads block 5, the other reads blocks 6 and 7. These requests can be merged into a single operation on blocks 5 through 7. The total amount of I/O may be the same, but the number of I/O operations is cut in half.
2. Sorting, the relatively more important of the two operations, rearranges the pending I/O requests in ascending block order. For example, if I/O operations need to access blocks 52, 109, and 7, the I/O scheduler sorts these three requests into the order 7, 52, 109. If a request for block 81 then arrives, it is inserted between the requests for blocks 52 and 109. The I/O scheduler then dispatches them one at a time from the queue: 7, then 52, then 81, and finally 109. Both operations are illustrated in the toy sketch below.
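The following toy user-space simulation is purely illustrative and has nothing to do with the kernel's actual data structures: it keeps pending block requests sorted in ascending block order and merges a new request into an adjacent one when possible. (The merge example here uses blocks 52 through 54 so that it does not collide with block 7 from the sorting example.)

#include <stdio.h>

/* A pending request for a contiguous range of blocks [start, end]. */
struct request {
    long start;
    long end;
};

static struct request queue[64];
static int nreq;

/* Insert a request for blocks [start, end], keeping the queue sorted
   by block number and merging with an adjacent request if possible. */
static void submit(long start, long end)
{
    /* Try to merge with an existing adjacent or overlapping request. */
    for (int i = 0; i < nreq; i++) {
        if (start <= queue[i].end + 1 && end >= queue[i].start - 1) {
            if (start < queue[i].start) queue[i].start = start;
            if (end > queue[i].end)     queue[i].end = end;
            return;
        }
    }
    if (nreq >= 64)
        return;   /* toy queue is full */
    /* No merge possible: insert in ascending block order (sorting). */
    int pos = nreq;
    while (pos > 0 && queue[pos - 1].start > start) {
        queue[pos] = queue[pos - 1];
        pos--;
    }
    queue[pos].start = start;
    queue[pos].end = end;
    nreq++;
}

int main(void)
{
    submit(52, 52);
    submit(109, 109);
    submit(7, 7);
    submit(81, 81);    /* sorted in between 52 and 109 */
    submit(53, 54);    /* merged with the request for block 52 */

    for (int i = 0; i < nreq; i++)
        printf("dispatch blocks %ld-%ld\n", queue[i].start, queue[i].end);
    return 0;
}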
Every read request must return the most recent data. Therefore, when the requested data is not in the page cache, the read request blocks until the data is read from disk, which can be a fairly lengthy operation; we call this performance loss read latency. A typical program issues several I/O requests within a short period, and because each request is performed synchronously, later requests depend on the completion of earlier ones. If write operations insert many blocks into the queue ahead of a read, that read request at the back of the queue can be delayed severely. This is the well-known writes-starving-reads problem.
The I/O scheduler uses mechanisms to prevent this kind of "starvation". The simplest approach is the Linus elevator used in the 2.4 kernel: if there is a sufficiently old request in the queue, new requests stop being inserted ahead of it. On the whole, each request is treated fairly, but read latency increases. The 2.6 kernel discarded the Linus elevator algorithm and introduced several new scheduler algorithms in its place.
1. The Deadline I/O scheduler was introduced to solve the problems of the 2.4 scheduler and of traditional elevator algorithms in general. The Linus elevator maintains a sorted list of pending I/O requests, and the request at the head of the queue is the next to be serviced. The Deadline I/O scheduler keeps this sorted queue and, to improve on the original scheduler, adds two new queues: a read FIFO queue and a write FIFO queue. Each request in a FIFO queue is assigned an expiration time: 500 milliseconds for the read FIFO queue and 5 seconds for the write FIFO queue. When the request at the head of a FIFO queue exceeds its expiration time, the scheduler stops dispatching from the sorted queue and services requests from that FIFO queue, so no request is starved indefinitely.
2. The Anticipatory I/O scheduler. The Deadline I/O scheduler behaves well but is not perfect. When an application issues a stream of read requests, each submitted only after the previous one has returned, the problem persists: by the time the application receives the data, gets scheduled to run, and submits its next read, the I/O scheduler has already moved on to other requests. The result is an unnecessary pair of seeks for every read: seek to the data, read it, then seek back to the other work.
The Anticipatory I/O scheduler adds an anticipation mechanism on top of the Deadline I/O scheduler. When a read is submitted, it is dispatched before its expiration time as usual, but unlike the Deadline I/O scheduler, the Anticipatory I/O scheduler then waits for up to 6 milliseconds. If the application issues another read request to the same part of the disk within those 6 milliseconds, that read is serviced immediately, and the Anticipatory I/O scheduler continues to wait.
3. The CFQ I/O scheduler. Although its methods differ, the Complete Fair Queuing (CFQ) I/O scheduler has the same goals as the schedulers above. With CFQ, each process has its own queue, and each queue is assigned a time slice. The I/O scheduler visits the queues in round-robin fashion, servicing requests from a queue until its time slice is exhausted or no requests remain. In the latter case, the CFQ I/O scheduler idles for a period of time (10 milliseconds by default), waiting for a new request on the current queue. If the anticipation pays off, the I/O scheduler avoids a seek; if it does not, the scheduler moves on to the next process's queue.
4. The Noop I/O scheduler. The Noop I/O scheduler is the simplest of the current schedulers. It never sorts requests, regardless of the situation; it only performs basic merging. It is typically used on special devices that do not benefit from having their requests sorted.
Selecting an I/O scheduler. The default I/O scheduler can be specified at boot time via the iosched kernel parameter; valid options are as, cfq, deadline, and noop. The I/O scheduler can also be selected per block device at runtime by modifying /sys/block/device/queue/scheduler, where device is the block device name. Reading this file shows the current I/O scheduler, and writing one of the valid options to it changes the scheduler. For example, to set the I/O scheduler of the device hda to cfq:
# echo cfq > /sys/block/hda/queue/scheduler
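A minimal sketch of reading the same sysfs file from a program follows; the device name sda is an assumption, and the file conventionally lists all available schedulers on one line with the active one shown in square brackets.

#include <stdio.h>

int main(void)
{
    /* Assumed device name; adjust to a block device present on the system. */
    const char *path = "/sys/block/sda/queue/scheduler";
    char line[256];

    FILE *f = fopen(path, "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    /* The available schedulers appear on one line, with the currently
       active one enclosed in square brackets. */
    if (fgets(line, sizeof(line), f))
        printf("available schedulers: %s", line);
    fclose(f);
    return 0;
}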
Because disk I/O is slow relative to the rest of the system, yet the I/O system is a critical part of modern computers, getting the best I/O performance matters. Reducing the number of I/O operations (by coalescing many small operations into fewer large ones), performing block-aligned I/O, using user-space buffering, and taking advantage of advanced I/O such as vectored I/O, positioned I/O, and asynchronous I/O are important techniques to consider during system programming. User-space programs can also submit their I/O requests in an order that helps the addressing operations, by sorting the requests in one of the following ways:
1. By full path: with the layout algorithms used by most file systems, the files within a given directory tend to be placed adjacently on disk, so sorting by path approximates sorting by physical location.
2. By inode number: sorting by inode is generally more effective than sorting by path because, as a rule, inode order approximates the order of the physical blocks.
3. By the file's physical blocks: obtain the physical block for each logical block of the file and sort by that. The first step is to determine the number of blocks in the file, which can be done with the stat() call. Then, for each logical block, the ioctl() call is used to find the physical block associated with it, as sketched below.
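A hedged sketch of that two-step procedure follows. It uses stat() to estimate the number of logical blocks and the FIBMAP ioctl to map each logical block to a physical block. FIBMAP normally requires the CAP_SYS_RAWIO capability (in practice, root), and deriving the block count from st_size and st_blksize is an approximation.

#include <fcntl.h>
#include <linux/fs.h>     /* FIBMAP */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Step 1: determine the number of logical blocks via stat(). */
    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        return 1;
    }
    long nr_blocks = (st.st_size + st.st_blksize - 1) / st.st_blksize;

    /* Step 2: for each logical block, ask the file system (via the
       FIBMAP ioctl) which physical block it lives on. */
    for (long i = 0; i < nr_blocks; i++) {
        int block = (int)i;                    /* in: logical block number */
        if (ioctl(fd, FIBMAP, &block) < 0) {   /* out: physical block */
            perror("ioctl FIBMAP");            /* usually needs CAP_SYS_RAWIO */
            break;
        }
        printf("logical block %ld -> physical block %d\n", i, block);
    }

    close(fd);
    return 0;
}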