Linux: block device read/write and IO scheduler

Source: Internet
Author: User

 

The read and write requests mentioned in the previous article are not sent directly to the disk driver. They first pass through an important intermediate layer: the I/O scheduler.

Disk seeking (moving the head and waiting for the platter to rotate) is the most expensive part of block device access and takes a lot of time. The I/O scheduler exists mainly to reduce the number of seeks. It does this in two ways: 1. Merge 2. Sort

Each device has its own request queue, and every request waits in that queue until it is processed. When a new request arrives and is adjacent on disk to a request already in the queue, the two are merged into one request. If no request can be merged with it, the new request is inserted into the queue in sorted order, following the direction of disk movement. The I/O scheduler merges and sorts requests to improve overall throughput; it does not shorten the processing time of any single request.
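The merge-then-sort behavior described above can be sketched in a few lines of Python. This is only an illustration, not kernel code: the `Request` class, the `add_request` function, and the sector numbers are all made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    start: int   # first sector of the request
    count: int   # number of sectors

    @property
    def end(self) -> int:
        # first sector after this request
        return self.start + self.count


def add_request(queue: List[Request], new: Request) -> None:
    """Merge `new` into an adjacent request if possible, else insert it sorted."""
    for req in queue:
        if req.end == new.start:      # back merge: new directly follows req
            req.count += new.count
            return
        if new.end == req.start:      # front merge: new directly precedes req
            req.start = new.start
            req.count += new.count
            return
    # No adjacent request: keep the queue ordered by starting sector.
    pos = 0
    while pos < len(queue) and queue[pos].start < new.start:
        pos += 1
    queue.insert(pos, new)


queue: List[Request] = []
for start, count in [(100, 8), (300, 8), (108, 8), (200, 8)]:
    add_request(queue, Request(start, count))
result = [(r.start, r.count) for r in queue]
print(result)
```

The request starting at sector 108 is absorbed into the one starting at 100, and the request at 200 is slotted between the two remaining entries.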

Linux has had five I/O schedulers.

1 Linus Elevator
It was the default I/O scheduler in the 2.4 kernel and was replaced in 2.6.
It maintains one request queue for each device.
When a new request arrives:
1. If it can be merged with an existing request, it is merged.
2. If it cannot be merged, the scheduler tries to insert it in sorted order. If the queue already contains a sufficiently old request, the new request may not jump ahead and is placed at the end. Otherwise, it is inserted at the appropriate position.
3. If it cannot be merged and there is no suitable position for insertion, it is placed at the end of the request queue.
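The three rules above might look like this as a minimal Python sketch. It assumes the age check fires as soon as any queued request exceeds a limit. `MAX_AGE`, the tick-based clock, and all names are hypothetical and not taken from the kernel source.

```python
from dataclasses import dataclass
from typing import List

MAX_AGE = 100  # hypothetical age limit, in arbitrary ticks


@dataclass
class Request:
    start: int      # first sector
    count: int      # number of sectors
    queued_at: int  # tick at which the request entered the queue


def elevator_add(queue: List[Request], new: Request, now: int) -> str:
    """Place `new` in `queue` and report which rule was applied."""
    # Rule 1: merge with a request that ends exactly where the new one starts.
    for req in queue:
        if req.start + req.count == new.start:
            req.count += new.count
            return "merged"
    # Rule 2: an old request is waiting, so nothing may be inserted ahead of it.
    if any(now - req.queued_at > MAX_AGE for req in queue):
        queue.append(new)
        return "tail (old request present)"
    # Rule 2, otherwise: insert between two requests that bracket the sector.
    for i in range(len(queue) - 1):
        if queue[i].start <= new.start <= queue[i + 1].start:
            queue.insert(i + 1, new)
            return "sorted insert"
    # Rule 3: no suitable position, so append.
    queue.append(new)
    return "tail (no position)"


queue = [Request(100, 8, 0), Request(300, 8, 1)]
r1 = elevator_add(queue, Request(108, 8, 5), now=5)
r2 = elevator_add(queue, Request(200, 8, 6), now=6)
r3 = elevator_add(queue, Request(50, 8, 200), now=200)
print(r1, r2, r3, sep="\n")
```

The age check in rule 2 is what prevents a steady stream of nearby requests from starving a request that sits far away on the disk.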

2 Deadline IO
The Deadline I/O scheduler is an improvement on the Elevator. 1. It prevents requests from going unprocessed for too long. 2. It treats read and write operations differently.
Deadline IO maintains three queues. The first queue, like the Elevator's, is sorted by physical location as far as possible. The second and third queues are sorted by arrival time. The difference between them is that one holds reads and the other holds writes.
Deadline IO distinguishes reads from writes because its designers reasoned that an application that issues a read request usually blocks until the result is returned. Write requests are different: the application normally writes only to memory, and background processes write the data back to disk later, so the application generally moves on without waiting for the write to reach the disk. Read requests therefore have a higher priority than write requests.

In this design, each new request is first placed in the first queue, using the same algorithm as the Elevator, and is also appended to the end of the read or write queue. The scheduler normally processes requests from the first queue, and checks whether the requests at the head of the second and third queues have been waiting too long. If one of them has exceeded its threshold, it is processed next. The threshold is 500 ms for read requests and 5 s for write requests.
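The three-queue design can be illustrated with a small Python model. It is a sketch under simplifying assumptions (no merging, one request dispatched at a time), and the class and method names are invented for the example. The expiry constants follow the defaults of the kernel's deadline scheduler, 500 ms for reads and 5 s for writes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional

READ_EXPIRE_MS = 500    # read deadline
WRITE_EXPIRE_MS = 5000  # write deadline


@dataclass(eq=False)  # compare by identity so remove() targets the exact object
class Request:
    sector: int
    is_read: bool
    deadline_ms: int = 0


class DeadlineQueue:
    def __init__(self) -> None:
        self.sorted: List[Request] = []        # first queue: ordered by sector
        self.reads: Deque[Request] = deque()   # second queue: reads by arrival
        self.writes: Deque[Request] = deque()  # third queue: writes by arrival

    def add(self, req: Request, now_ms: int) -> None:
        expire = READ_EXPIRE_MS if req.is_read else WRITE_EXPIRE_MS
        req.deadline_ms = now_ms + expire
        self.sorted.append(req)
        self.sorted.sort(key=lambda r: r.sector)
        (self.reads if req.is_read else self.writes).append(req)

    def dispatch(self, now_ms: int) -> Optional[Request]:
        """Return an expired request if there is one, else the lowest sector."""
        if not self.sorted:
            return None
        chosen = self.sorted[0]
        for fifo in (self.reads, self.writes):  # reads are checked first
            if fifo and fifo[0].deadline_ms <= now_ms:
                chosen = fifo[0]
                break
        self.sorted.remove(chosen)
        (self.reads if chosen.is_read else self.writes).remove(chosen)
        return chosen


q = DeadlineQueue()
q.add(Request(900, is_read=True), now_ms=0)
q.add(Request(100, is_read=False), now_ms=10)
q.add(Request(200, is_read=False), now_ms=20)
first = q.dispatch(now_ms=30)    # nothing has expired yet
second = q.dispatch(now_ms=600)  # the read queued at t=0 is past its deadline
print(first.sector, second.sector)
```

In the second dispatch the read at sector 900 wins over the write at sector 200 even though it sits further away, because its deadline has passed.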

I personally think it is best not to use this scheduler for partitions that record database change logs, such as the Oracle online log and the MySQL binlog. Write requests of this kind usually call fsync, so if a write cannot complete promptly, application performance suffers badly.

3 Anticipatory IO
This scheduler is based on Deadline IO. One disadvantage of Deadline IO is that, while it handles the requests in the first queue well, serving expired requests from the second and third queues may force the disk to seek far away, which hurts performance. The improvement in Anticipatory IO is that after a read request has been processed (and the disk head has moved), the scheduler does not rush to process other requests. Instead it waits for a short period (6 ms by default, configurable) to see whether any nearby requests arrive. If one does, it is processed immediately. If none arrives, the waiting time is wasted.
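The waiting decision can be sketched as a single Python function. This is a toy model only: the 6 ms window comes from the text, while `NEAR_SECTORS` and the function name and parameters are hypothetical.

```python
from typing import List, Tuple

ANTIC_EXPIRE_MS = 6  # default wait mentioned in the text
NEAR_SECTORS = 64    # hypothetical definition of "nearby"


def next_after_read(last_sector: int, finished_ms: int,
                    arrivals: List[Tuple[int, int]], fallback: int) -> int:
    """Pick the sector to serve after a read that finished at `finished_ms`.

    `arrivals` holds (arrival_ms, sector) pairs for requests that show up
    after the read. `fallback` is the sector the scheduler would serve if
    it did not wait at all.
    """
    for arrival_ms, sector in sorted(arrivals):
        if arrival_ms - finished_ms > ANTIC_EXPIRE_MS:
            break  # the anticipation window has closed
        if abs(sector - last_sector) <= NEAR_SECTORS:
            return sector  # a nearby request arrived in time
    return fallback


in_time = next_after_read(1000, 0, [(2, 1010)], fallback=50000)
too_late = next_after_read(1000, 0, [(8, 1010)], fallback=50000)
too_far = next_after_read(1000, 0, [(2, 90000)], fallback=50000)
print(in_time, too_late, too_far)
```

Only the first call benefits from waiting. In the other two the scheduler ends up serving the distant fallback request anyway, after idling for the whole window.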

4 Completely Fair Queuing (CFQ) IO

As its name suggests, this I/O scheduler aims at fairness, meaning fairness among all processes with the same priority. The three algorithms above do not distinguish between processes, so CFQ is quite different from them. CFQ maintains one queue per process. Within a single queue, requests are positioned much as in the Elevator. The scheduler processes four requests (the default) from one queue and then moves on to the requests of other queues. CFQ became the default I/O scheduler during the 2.6 series.
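A rough Python sketch of the per-process round-robin follows. It ignores priorities and time slices, and the function name and the process IDs are invented for the example. `QUANTUM` is the default of four requests per turn mentioned above.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

QUANTUM = 4  # requests served from one queue per turn (default from the text)


def cfq_dispatch(per_process: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Return (pid, sector) pairs in the order a round-robin pass serves them."""
    # One queue per process, each sorted by sector as in the Elevator.
    queues: Dict[int, Deque[int]] = {
        pid: deque(sorted(sectors)) for pid, sectors in per_process.items()
    }
    order: List[Tuple[int, int]] = []
    while any(queues.values()):  # an empty deque is falsy
        for pid, q in queues.items():
            for _ in range(min(QUANTUM, len(q))):
                order.append((pid, q.popleft()))
    return order


order = cfq_dispatch({1: [50, 10, 30, 20, 40, 60], 2: [500, 400]})
print(order)
```

Process 1 has six requests queued but is served only four of them before process 2 gets its turn, which is the fairness property the scheduler is named after.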

Each process has its own I/O priority, which can be viewed and modified with ionice. The higher the priority, the earlier the process is served, the longer its time slice, and the more requests are processed for it in one turn.
Time slices work as follows. If the time slice runs out before the allotted number of requests has been processed, processing of that queue stops. If the allotted number of requests has been processed before the time slice runs out, processing of that queue also stops.



5 noop
This scheduler does essentially nothing: requests are processed one by one in the order they arrive. The approach is simple and has very little overhead. The problem is that it causes too many disk seeks, which is unacceptable for traditional disks. An SSD, however, has no platter to rotate and no head to move, so it does not pay this cost.

The Linus Elevator is no longer present in newer kernels, so only four schedulers remain.
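To see which scheduler a device is actually using, the kernel exposes the file /sys/block/<device>/queue/scheduler, which lists the available schedulers and marks the active one with square brackets. The Python sketch below parses that format. The helper names are made up, and the device name passed to `current_scheduler` (for example "sda") is an assumption about the local system.

```python
from pathlib import Path
from typing import List, Tuple


def parse_scheduler_line(line: str) -> Tuple[str, List[str]]:
    """Split a line such as 'noop deadline [cfq]' into (active, available)."""
    available: List[str] = []
    active = ""
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # strip the brackets around the active scheduler
            active = name
        available.append(name)
    return active, available


def current_scheduler(device: str) -> Tuple[str, List[str]]:
    """Read the scheduler file for `device`; requires a Linux system with sysfs."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())


parsed = parse_scheduler_line("noop deadline [cfq]")
print(parsed)
```

Writing a scheduler name into the same file as root switches the scheduler for that device. The set of names offered depends on the kernel version and configuration.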
