1. Overview
Devices that support random access to fixed-size chunks of data are called block devices, and those chunks are called blocks. The other basic device type is the character device. Character devices, such as serial ports and keyboards, are accessed sequentially as a stream of bytes. The fundamental difference between the two types is whether the device can be accessed randomly, that is, whether an access can jump from one location on the device to another.
A character device only has to track one position, the current one, whereas a block device must be able to move back and forth between any locations on the media. Block devices are also far more performance-sensitive. The kernel subsystem that manages block devices and the requests made to them is called the block I/O layer.
2. Anatomy of a block device
The smallest addressable unit of a block device is the sector. The most common sector size is 512 bytes. Software uses its own smallest logically addressable unit, the block. The block is a file system abstraction: a file system can only be accessed in units of blocks. Although the physical disk is addressed at the sector level, the kernel performs all disk operations in terms of blocks. The block size must therefore be a multiple of the sector size (in practice a power-of-two multiple), and it cannot exceed the size of one page.
Sector: the minimum addressable unit of the device, also known as a "hard sector" or "device block".
Block: the minimum addressable unit of the file system, also known as a "file block" or "I/O block".
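The size rule above can be sketched as a small user-space check. This is only an illustration, not kernel code: the constants SECTOR_SIZE and PAGE_SIZE_BYTES and the helper names are assumptions chosen for the example, and real values depend on the device and the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for illustration only. */
#define SECTOR_SIZE     512u
#define PAGE_SIZE_BYTES 4096u

/* True if n is a nonzero power of two. */
static bool is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * A block size is accepted when it is a power-of-two multiple of the
 * sector size and no larger than one page.
 */
static bool block_size_is_valid(size_t block_size)
{
    if (block_size < SECTOR_SIZE || block_size > PAGE_SIZE_BYTES)
        return false;
    if (block_size % SECTOR_SIZE != 0)
        return false;
    return is_power_of_two(block_size / SECTOR_SIZE);
}

/* First sector covered by a given block number. */
static unsigned long long block_to_first_sector(unsigned long long block,
                                                size_t block_size)
{
    assert(block_size_is_valid(block_size));
    return block * (block_size / SECTOR_SIZE);
}
```

With these assumed sizes, a 4096-byte block spans 8 sectors, so block 3 starts at sector 24.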
3. Buffers and buffer heads
When a block is read into memory, it is stored in a buffer. Each buffer corresponds to exactly one block and serves as the in-memory representation of that disk block.
Blocks are closely tied to the file system's control information, which is stored in the superblock, a data structure that describes the file system. Because the kernel needs associated control information to process the data, each buffer has a descriptor. The descriptor is a buffer_head struct, called the buffer head, and is defined in <linux/buffer_head.h>.
The b_count field of the struct is the buffer's usage count. Before operating on a buffer head, increment its reference count to ensure that the buffer head is not freed while it is in use; when the operation is complete, decrement the count again.
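The usage-count discipline described above can be modelled in user space as follows. This is a minimal sketch: the struct toy_buffer_head and the toy_* helpers are hypothetical stand-ins, only the usage count is modelled, and a plain int replaces the atomic counter that kernel code would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a buffer head; not the kernel's struct buffer_head. */
struct toy_buffer_head {
    int  b_count;  /* usage count (a plain int here, for illustration) */
    bool freed;    /* set by the toy reclaimer once the buffer is released */
};

/* Take a reference before working with the buffer. */
static void toy_get_bh(struct toy_buffer_head *bh)
{
    bh->b_count++;
}

/* Drop the reference when the work is done. */
static void toy_put_bh(struct toy_buffer_head *bh)
{
    assert(bh->b_count > 0);
    bh->b_count--;
}

/* The reclaimer may free a buffer only while nobody holds a reference. */
static bool toy_try_free(struct toy_buffer_head *bh)
{
    if (bh->b_count > 0)
        return false;
    bh->freed = true;
    return true;
}
```

While a caller holds a reference, toy_try_free() refuses to release the buffer; once the reference is dropped, it succeeds.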
The buffer head describes the mapping between a disk block and a physical memory buffer.
However, using the buffer head as the unit of I/O has two drawbacks. First, the buffer head is a large and unwieldy data structure. The kernel prefers to work in terms of pages, because page operations are simpler and more efficient. Second, a buffer head can describe only a single buffer. When it is used as the container for all I/O, the kernel is forced to break an I/O operation on a large chunk of data into operations on many buffer_head structs.
4. The bio struct
The basic container for block I/O in the kernel is now the bio struct, which is defined in <linux/bio.h>. This struct represents an in-flight block I/O operation as a list of segments. A segment is a chunk of a buffer that is contiguous in memory. Because buffers are described in segments, the bio struct lets the kernel perform I/O on a buffer even when that buffer is scattered across multiple locations in memory. Vector I/O of this kind is called scatter-gather I/O.
The most important fields in the bio struct are bi_io_vec (the array of bio_vec structs), bi_vcnt (the number of entries in that array), and bi_idx (the index of the current entry).
In short, each block I/O request is represented by a bio struct. Each request consists of one or more blocks, which are described by an array of bio_vec structs.
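How the three fields cooperate can be illustrated with a simplified user-space model. The toy_bio and toy_bio_vec types below are assumptions made for this sketch: the field names follow the ones mentioned above, but a plain char pointer stands in for the page that a real bio_vec refers to, and the gather helper is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy segment: one contiguous piece of a buffer. */
struct toy_bio_vec {
    const char  *bv_page;    /* stand-in for the page holding the data */
    unsigned int bv_len;     /* length of the segment in bytes */
    unsigned int bv_offset;  /* offset of the data within the page */
};

/* Toy bio: a vector of segments plus the current position. */
struct toy_bio {
    struct toy_bio_vec *bi_io_vec;  /* array of segments */
    unsigned short      bi_vcnt;    /* number of segments in the array */
    unsigned short      bi_idx;     /* index of the current segment */
};

/* Bytes still to be processed, counting from the current segment. */
static size_t toy_bio_remaining(const struct toy_bio *bio)
{
    size_t total = 0;
    for (unsigned short i = bio->bi_idx; i < bio->bi_vcnt; i++)
        total += bio->bi_io_vec[i].bv_len;
    return total;
}

/*
 * Gather the scattered segments into one flat buffer, advancing bi_idx
 * past each segment that fits. Returns the number of bytes copied.
 */
static size_t toy_bio_gather(struct toy_bio *bio, char *dst, size_t dst_len)
{
    size_t copied = 0;
    while (bio->bi_idx < bio->bi_vcnt) {
        const struct toy_bio_vec *v = &bio->bi_io_vec[bio->bi_idx];
        if (copied + v->bv_len > dst_len)
            break;
        memcpy(dst + copied, v->bv_page + v->bv_offset, v->bv_len);
        copied += v->bv_len;
        bio->bi_idx++;
    }
    return copied;
}
```

Two segments that live in unrelated memory locations are processed as one operation, which is the point of describing a buffer by segments.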
4.1. Comparison between the buffer head and the bio struct
The bio struct represents an I/O operation, which can span one or more pages in memory. The buffer_head struct, by contrast, represents a single buffer, which describes just one block on disk. The bio struct is lightweight; the blocks it describes need not be contiguous in memory, and the I/O operation does not have to be split up.
Advantages of using the bio struct instead of the buffer_head struct:
1) The bio struct handles high memory easily, because it works with physical pages rather than direct pointers.
2) The bio struct can represent both ordinary page I/O and direct I/O.
3) The bio struct makes scatter-gather (vectored) block I/O easy to perform.
4) The bio struct is more lightweight than the buffer head, because it contains only the information needed for a block I/O operation and none of the unnecessary information about the buffer itself.
However, the buffer head is still needed. After all, it is what describes the mapping between disk blocks and pages.
5. Request queue
Block devices keep their pending block I/O requests in a request queue. The queue is represented by the request_queue struct, defined in <linux/blkdev.h>, which contains a doubly linked list of requests and the associated control information. Higher-level kernel code, such as a file system, adds requests to the queue. As long as the queue is not empty, the queue's block device driver takes requests from the head of the queue and submits them to the corresponding block device.
6. I/O scheduler
Disk seeks are among the slowest operations in a computer. To minimize seek time, the kernel neither issues requests in the order it receives them nor submits them to the disk immediately. Instead, it first merges and sorts them. The kernel subsystem responsible for submitting I/O requests is called the I/O scheduler.
The I/O scheduler divides disk I/O resources among all pending block I/O requests in the system. It does so by merging and sorting the pending requests in the request queue. It should not be confused with the process scheduler, which divides processor resources among the running processes in the system. The two are similar in that each virtualizes a resource among multiple objects: the process scheduler virtualizes the processor and shares it among the running processes, while the I/O scheduler virtualizes the block device among multiple outstanding disk requests, in order to reduce seek time and keep disk performance as high as possible.
6.1. I/O scheduler work
The I/O scheduler manages the request queue of a block device. It determines the order of the requests in the queue and the time at which each request is dispatched to the block device. This reduces seek time and increases global throughput, possibly at the cost of being unfair to some requests. The I/O scheduler uses two techniques to reduce seek time: merging and sorting.
Merging means combining two or more requests into one new request. Suppose the file system submits a request to read a chunk of data from a file. If the queue already holds a request whose disk sectors are adjacent to the sectors of the new request, the two can be merged into a single request that operates on one or more adjacent disk sectors. Merging therefore reduces both system overhead and the number of seeks.
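The adjacency test behind merging can be sketched in a few lines. The toy_request type and the helper names are hypothetical and exist only for this illustration; a request is reduced to a starting sector and a sector count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy request: a contiguous run of sectors. */
struct toy_request {
    unsigned long long start;  /* first sector */
    unsigned int       nr;     /* number of sectors */
};

enum merge_kind { NO_MERGE, FRONT_MERGE, BACK_MERGE };

/* Classify how an incoming request relates to one already queued. */
static enum merge_kind toy_merge_kind(const struct toy_request *queued,
                                      const struct toy_request *incoming)
{
    if (incoming->start + incoming->nr == queued->start)
        return FRONT_MERGE;  /* incoming ends where queued begins */
    if (queued->start + queued->nr == incoming->start)
        return BACK_MERGE;   /* incoming begins where queued ends */
    return NO_MERGE;
}

/* Grow the queued request to cover the incoming one when they are adjacent. */
static bool toy_try_merge(struct toy_request *queued,
                          const struct toy_request *incoming)
{
    switch (toy_merge_kind(queued, incoming)) {
    case FRONT_MERGE:
        queued->start = incoming->start;
        queued->nr += incoming->nr;
        return true;
    case BACK_MERGE:
        queued->nr += incoming->nr;
        return true;
    default:
        return false;
    }
}
```

A queued request for sectors 100 to 107 and a new request for sectors 108 to 115 become one request for sectors 100 to 115, which costs a single seek instead of two.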
Suppose instead that a read request is submitted and no queued request operates on adjacent sectors. The new request then cannot be merged with any other, and sorting comes into play. If the queue holds a request whose disk sectors are close to those of the new request, the new request is placed next to it, so that the whole queue stays ordered by increasing sector number. Keeping the disk head moving in one direction in this way shortens the seek time for all requests.
6.2. Linus elevator
In the 2.4 kernel, the Linus elevator was the default I/O scheduler. It performs both merging and sorting.
Merge: when a new request is added to the queue, the scheduler first checks every pending request to see whether it can be merged with the new one. If the new request immediately precedes an existing request, it is front merged. If the new request immediately follows an existing request, it is back merged.
Sort: if the merge fails, the scheduler looks for a suitable insertion point (the position of the new request in the queue must keep the queue ordered by sector). If such a position is found, the new request is inserted there. If no suitable position exists, the new request is inserted at the tail of the queue. In addition, if the queue contains a request that has been waiting for a long time, the new request is added to the tail of the queue even if it could have been insertion sorted.
The defect of this algorithm is that it does little for requests that have already been waiting for some time, which can eventually lead to request starvation. For example, a stream of requests to the same area of the disk can keep requests to a distant area from ever being serviced.
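The merge and sort rules of this subsection can be put together in a small array-based model. This is a sketch under stated assumptions, not the 2.4 implementation: the elv_* names, the fixed capacity, the tick-based clock, and the ELV_MAX_AGE threshold are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ELV_CAPACITY 16
#define ELV_MAX_AGE  100  /* hypothetical age threshold, in ticks */

struct elv_request {
    unsigned long long start;      /* first sector */
    unsigned int       nr;         /* number of sectors */
    unsigned long      submitted;  /* tick at which the request was queued */
};

struct elv_queue {
    struct elv_request reqs[ELV_CAPACITY];
    size_t             count;
};

/* Insert r at index pos, shifting later entries toward the tail. */
static void elv_insert_at(struct elv_queue *q, size_t pos, struct elv_request r)
{
    assert(q->count < ELV_CAPACITY && pos <= q->count);
    for (size_t i = q->count; i > pos; i--)
        q->reqs[i] = q->reqs[i - 1];
    q->reqs[pos] = r;
    q->count++;
}

/* Add a request following the merge, age, sort, tail order described above. */
static void elv_add(struct elv_queue *q, struct elv_request r, unsigned long now)
{
    int has_old = 0;

    r.submitted = now;

    /* 1. Try to merge with an adjacent queued request. */
    for (size_t i = 0; i < q->count; i++) {
        struct elv_request *cur = &q->reqs[i];
        if (r.start + r.nr == cur->start) {   /* front merge */
            cur->start = r.start;
            cur->nr += r.nr;
            return;
        }
        if (cur->start + cur->nr == r.start) { /* back merge */
            cur->nr += r.nr;
            return;
        }
        if (now - cur->submitted > ELV_MAX_AGE)
            has_old = 1;
    }

    /* 2. An old request is waiting: append instead of sorting. */
    if (has_old) {
        elv_insert_at(q, q->count, r);
        return;
    }

    /* 3. Sector-ordered position; 4. falls through to the tail if none. */
    size_t pos = 0;
    while (pos < q->count && q->reqs[pos].start < r.start)
        pos++;
    elv_insert_at(q, pos, r);
}
```

Note that step 2 only stops new requests from being sorted ahead of old ones; nothing in the model promises that an old request is serviced within a bounded time, which mirrors the starvation defect described above.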
6.3. Deadline I/O scheduler
The deadline I/O scheduler was introduced to solve the starvation problem of the Linus elevator.
Write operations are usually submitted to disk whenever the kernel gets around to it, and they run asynchronously with respect to the application that issued them. Read operations are synchronous and often depend on one another, so read latency directly affects system performance. For this reason the 2.6 kernel introduced the new deadline I/O scheduler. Note that reducing starvation comes at the cost of some global throughput.
In the deadline I/O scheduler, each request has an expiration time. By default, the expiration time is 500 milliseconds for read requests and 5 seconds for write requests. The scheduler maintains three queues. The first is the sorted queue, which keeps requests ordered by their physical location on disk. When a new request is submitted, the deadline I/O scheduler merges and inserts it into the sorted queue much as the Linus elevator does, and it also inserts the request into either the read FIFO queue or the write FIFO queue, depending on the request type. If the request at the head of either FIFO queue expires, the deadline I/O scheduler takes requests from that FIFO queue and services them.
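The dispatch decision described above can be sketched as follows. For brevity the model keeps one array in submission order and derives the three queue heads from it, rather than maintaining three separate lists; the dl_* names, the capacity, and the millisecond clock are assumptions made for this example, while the two expiration values are the defaults quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define DL_CAPACITY        16
#define DL_READ_EXPIRE_MS  500   /* default read expiration from the text */
#define DL_WRITE_EXPIRE_MS 5000  /* default write expiration from the text */

struct dl_request {
    unsigned long long sector;
    int                is_write;
    unsigned long      expires_ms;  /* absolute expiration time */
    int                done;        /* already dispatched */
};

/* Requests are stored in submission order. */
struct dl_sched {
    struct dl_request reqs[DL_CAPACITY];
    size_t            count;
};

static void dl_add(struct dl_sched *s, unsigned long long sector,
                   int is_write, unsigned long now_ms)
{
    assert(s->count < DL_CAPACITY);
    s->reqs[s->count++] = (struct dl_request){
        sector, is_write,
        now_ms + (is_write ? DL_WRITE_EXPIRE_MS : DL_READ_EXPIRE_MS), 0
    };
}

/* Head of the read or write FIFO: the oldest pending request of that type. */
static struct dl_request *dl_fifo_head(struct dl_sched *s, int is_write)
{
    for (size_t i = 0; i < s->count; i++)
        if (!s->reqs[i].done && s->reqs[i].is_write == is_write)
            return &s->reqs[i];
    return NULL;
}

/* Head of the sorted queue: the pending request with the lowest sector. */
static struct dl_request *dl_sorted_head(struct dl_sched *s)
{
    struct dl_request *best = NULL;
    for (size_t i = 0; i < s->count; i++) {
        struct dl_request *r = &s->reqs[i];
        if (!r->done && (best == NULL || r->sector < best->sector))
            best = r;
    }
    return best;
}

/* Pick the next request: an expired FIFO head wins over sector order. */
static struct dl_request *dl_dispatch(struct dl_sched *s, unsigned long now_ms)
{
    struct dl_request *r = dl_fifo_head(s, 0);
    if (r == NULL || now_ms < r->expires_ms) {
        r = dl_fifo_head(s, 1);
        if (r == NULL || now_ms < r->expires_ms)
            r = dl_sorted_head(s);
    }
    if (r != NULL)
        r->done = 1;
    return r;
}
```

As long as nothing has expired, requests leave in sector order; once a read has waited longer than its expiration time it is serviced first, even if that costs an extra seek.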
The implementation of the deadline I/O scheduler is in drivers/block/deadline-iosched.c.
6.4. Anticipatory I/O scheduler
The deadline I/O scheduler does a great deal to reduce read response time, but it does so at the expense of system throughput. For example, when a read request is submitted while the system is performing heavy write activity, the I/O scheduler quickly services the read request and then seeks back to continue the writes, and it repeats this for every read request. This pair of seeks hurts the global throughput of the system. The goal of the anticipatory I/O scheduler is to provide good global throughput while keeping read response time low.
The anticipatory I/O scheduler is built on the deadline I/O scheduler. It implements the same three queues (plus one dispatch queue) and sets an expiration time for each request. Its main improvement is an added anticipation heuristic, which tries to reduce the number of seeks caused by servicing new read requests in the middle of other I/O activity. The biggest difference from the deadline I/O scheduler is that, after a request has been submitted, the scheduler does not immediately seek back to service other requests but deliberately idles for a few milliseconds (6 ms by default). Those few milliseconds give the application a chance to submit another read request. Any request that operates on an adjacent disk location is serviced immediately.
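The anticipation step can be modelled as a small decision function. Everything here except the 6 ms default is an assumption made for the sketch: the as_* names, the millisecond clock, and in particular the AS_NEAR_SECTORS threshold that stands in for "adjacent disk location".

```c
#include <assert.h>
#include <stddef.h>

#define AS_ANTIC_EXPIRE_MS 6   /* default idle window from the text */
#define AS_NEAR_SECTORS    64  /* hypothetical definition of "adjacent" */

enum as_action { AS_KEEP_WAITING, AS_SERVICE_NEW_READ, AS_RESUME_QUEUE };

struct as_state {
    unsigned long long last_end_sector;  /* where the last read finished */
    unsigned long      idle_since_ms;    /* when the idle window started */
};

/* Absolute distance between two sector numbers. */
static unsigned long long as_distance(unsigned long long a, unsigned long long b)
{
    return a > b ? a - b : b - a;
}

/*
 * Decide what to do at time now_ms. new_read_sector is NULL when no new
 * read has arrived since the idle window started.
 */
static enum as_action as_decide(const struct as_state *st,
                                const unsigned long long *new_read_sector,
                                unsigned long now_ms)
{
    /* Window over: stop idling and go back to the pending requests. */
    if (now_ms - st->idle_since_ms >= AS_ANTIC_EXPIRE_MS)
        return AS_RESUME_QUEUE;

    /* A nearby read arrived in time: service it without seeking away. */
    if (new_read_sector != NULL &&
        as_distance(*new_read_sector, st->last_end_sector) <= AS_NEAR_SECTORS)
        return AS_SERVICE_NEW_READ;

    return AS_KEEP_WAITING;
}
```

The gamble is that a short idle period is cheaper than the two seeks needed to leave the read location and come back to it.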
6.5. Complete Fair Queuing I/O scheduler
The Complete Fair Queuing (CFQ) I/O scheduler is designed for specialized workloads. It places incoming I/O requests into specific queues, organized by the process that originated the I/O request. Within each queue, an incoming request is merged with adjacent requests and insertion sorted, so that each queue stays ordered by sector. What sets the CFQ I/O scheduler apart is that every process that submits I/O requests has its own queue.
The CFQ I/O scheduler services the queues round robin, selecting a number of requests (4 by default) from each queue before moving on to the next.
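One round of this round-robin servicing can be sketched as follows. The model is deliberately minimal and hypothetical: each per-process queue is reduced to a count of pending requests, the cfq_* names and CFQ_MAX_PROCS are invented for the example, and only the quantum of 4 comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define CFQ_MAX_PROCS 8
#define CFQ_QUANTUM   4  /* default number of requests taken per queue */

struct cfq_sched {
    unsigned int pending[CFQ_MAX_PROCS];  /* pending requests per process */
    size_t       nprocs;                  /* number of per-process queues */
};

/*
 * Run one round: visit every queue once and take up to CFQ_QUANTUM
 * requests from it. The index of the owning process is written to out
 * for each dispatched request. Returns the number dispatched.
 */
static size_t cfq_dispatch_round(struct cfq_sched *s, size_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t p = 0; p < s->nprocs; p++) {
        for (unsigned int k = 0;
             k < CFQ_QUANTUM && s->pending[p] > 0 && n < out_cap;
             k++) {
            s->pending[p]--;
            out[n++] = p;
        }
    }
    return n;
}
```

A process with 10 pending requests gets only 4 of them serviced in a round, so a process with 2 pending requests is not stuck behind it.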
The CFQ I/O scheduler is implemented in drivers/block/cfq-iosched.c.
6.6. Noop I/O scheduler
The Noop (no-operation) I/O scheduler does almost nothing: it only merges requests and performs no sorting. The algorithm is not pointless, however, because it is intended for block devices that are truly random access. If a block device has little or no seek overhead, there is no need to sort incoming requests.
The Noop I/O scheduler is implemented in drivers/block/noop-iosched.c and is designed for random-access devices.