Linux I/O Schedulers


Each block device (or partition of a block device) has its own request queue (request_queue), and each request queue can select an I/O scheduler to coordinate the requests submitted to it. The basic purpose of an I/O scheduler is to arrange requests according to the sectors they address on the block device, so as to reduce head movement and improve efficiency. Requests in a device's request queue are serviced in order. In addition to this queue, each scheduler maintains its own internal queues, which hold requests as they are submitted; requests at the front of these internal queues are moved to the request queue in time to await service.

The I/O scheduler sits in the kernel's storage stack between the generic block layer and the block device driver.

The kernel implements four main I/O schedulers: noop, deadline, anticipatory, and cfq.

1. The noop algorithm

noop is the simplest IO scheduling algorithm in the kernel. It puts IO requests into a FIFO queue and executes them in order, although it will merge requests that fall on contiguous disk sectors where appropriate. This scheduler is especially suitable for applications that do not want the scheduler to reorder their IO requests.
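The FIFO-plus-merge behavior described above can be sketched in a few lines (a toy model, not kernel code; representing a request as a (start_sector, sector_count) tuple is an assumption made for illustration):

```python
from collections import deque

def noop_dispatch(requests):
    """Toy model of noop: serve in FIFO order, merging a request into
    the queue tail when it starts exactly where the tail ends."""
    queue = deque()
    for sector, nsectors in requests:
        if queue and queue[-1][0] + queue[-1][1] == sector:
            # Contiguous with the tail request: merge instead of queueing.
            last_sector, last_n = queue.pop()
            queue.append((last_sector, last_n + nsectors))
        else:
            queue.append((sector, nsectors))
    return list(queue)

# Requests at sectors 100 and 108 (8 sectors each) are contiguous and merge;
# the request at sector 8 arrives last and stays last: noop never sorts.
print(noop_dispatch([(100, 8), (108, 8), (8, 8)]))  # [(100, 16), (8, 8)]
```

Note that the out-of-order request is not moved forward: merging happens, sorting does not.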

This scheduling algorithm has clear advantages in the following scenarios:

1) There is a smarter IO scheduling device below the IO scheduler. If the block device is a RAID controller, or a storage device such as a SAN or NAS, the device itself will organize IO requests better, and the kernel's IO scheduler need not perform additional scheduling work.

2) The upper-layer application understands the underlying device better than the IO scheduler does, or the IO requests it submits have already been carefully optimized. In that case the IO scheduler need not do anything extra; it only has to execute the requests in the order they arrive.

3) For storage devices without a rotating head, noop is more effective. The request reordering done by an IO scheduler costs CPU time that pays off for a rotating disk, but on an SSD that CPU time can simply be saved. Articles in the references below report that noop works well on SSDs.

2. The deadline algorithm

The core of the deadline algorithm is to guarantee that every IO request is serviced within a bounded time, so that no request starves.

The deadline algorithm introduces four queues, in two groups, each group containing a read queue and a write queue. One group orders requests by starting sector and is organized as red-black trees, called sort_list; the other orders requests by creation time and is organized as linked lists, called fifo_list. Whenever a transfer direction is chosen (read or write), a batch of consecutive requests is dispatched from the corresponding sort_list to the request_queue; the batch size is determined by fifo_batch. Only the following three conditions end a batch:

1) There are no more requests in the corresponding sort_list.

2) The next request's sector does not satisfy the sequential (incrementing) requirement.

3) The previous request was the last one allowed in the batch.

Every request is assigned a deadline (in jiffies) when it is created, and fifo_list is ordered by these deadlines. The deadline for a read request defaults to 500 ms and for a write request to 5 s, so the kernel clearly favors reads. Beyond that, the deadline scheduler also defines starved and writes_starved, with writes_starved defaulting to 2; writes_starved can be understood as the write-starvation threshold. The kernel always services reads first, and starved counts the read batches processed so far; only when starved exceeds writes_starved are writes considered. Consequently, even if a write request has passed its deadline, it is not necessarily serviced immediately: the current read batch must finish, and even then the write must wait until starved exceeds writes_starved. Why does the kernel favor reads? This is a matter of overall performance. Reads are synchronous with respect to the application, which must wait for the data before it can proceed, so a read request blocks the process. A write is different: once the data is handed to the kernel for writing to the block device, the application is no longer affected. The scheduler therefore prioritizes reads.

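The read-preference and write-starvation logic described above can be modeled roughly as follows (a simplified sketch; the function name and queue representation are invented for illustration, and only the writes_starved default of 2 comes from the text):

```python
WRITES_STARVED = 2  # default: read batches allowed before a pending write runs

def pick_direction(read_fifo, write_fifo, starved):
    """Decide the direction of the next dispatch batch.
    read_fifo / write_fifo are lists of pending requests; `starved`
    counts read batches dispatched while writes were waiting.
    Returns 'read', 'write', or None when both queues are empty."""
    if not read_fifo and not write_fifo:
        return None
    if not write_fifo:
        return 'read'
    if not read_fifo:
        return 'write'
    # Both directions have work: reads win unless writes have already
    # been starved for WRITES_STARVED read batches.
    return 'write' if starved >= WRITES_STARVED else 'read'

# Reads are preferred even though a write is pending...
assert pick_direction(['r1'], ['w1'], starved=0) == 'read'
# ...until the write-starvation threshold is reached.
assert pick_direction(['r1'], ['w1'], starved=2) == 'write'
```

This is exactly why an expired write can still wait: the direction check, not the write's own deadline, decides when writes get a turn.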

Benchmarks in the references below report that the deadline algorithm outperforms CFQ for some multithreaded workloads, and likewise for some database workloads.

3. The anticipatory algorithm

The core of the anticipatory algorithm is the principle of locality: after a process issues an IO request, it is expected to issue further requests in the same area. IO exhibits a phenomenon called deceptive idleness: a process that has just finished a burst of reads appears idle, but it is actually processing the data it read and will continue reading once it is done. If the IO scheduler turns to another process's requests during this gap, then when the next request from the seemingly idle process arrives, the head must seek back to its previous position, greatly increasing seek time and rotational delay. The anticipatory algorithm therefore waits a short time T (typically 6 ms) after a read request completes; if another read from the same process arrives within those 6 ms, it is serviced immediately, otherwise the scheduler moves on to other processes' requests.
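The anticipation decision can be sketched as a toy model (the function and its request representation are invented for illustration; only the 6 ms window comes from the text):

```python
ANTIC_WAIT_MS = 6  # typical anticipation window

def next_to_serve(last_pid, pending, elapsed_ms):
    """Toy anticipation decision after a read from `last_pid` completes.
    `pending` is a list of (pid, arrival_ms) requests; `elapsed_ms` is
    the time since the last read finished. Returns the pid to serve
    next, or None meaning 'keep idling, waiting for last_pid'."""
    if any(pid == last_pid for pid, _ in pending):
        return last_pid              # the bet paid off: stay put, no seek
    if elapsed_ms < ANTIC_WAIT_MS:
        return None                  # idle briefly instead of seeking away
    # Window expired: fall back to another process's request, if any.
    return pending[0][0] if pending else None

assert next_to_serve(1, [(1, 3)], elapsed_ms=3) == 1   # same process returns
assert next_to_serve(1, [(2, 1)], elapsed_ms=2) is None  # still waiting
assert next_to_serve(1, [(2, 1)], elapsed_ms=6) == 2   # give up, serve others
```

The deliberate idling is the whole trick: a few milliseconds of doing nothing can be cheaper than two long seeks.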

In some scenarios the anticipatory algorithm delivers a substantial performance boost; the references below include benchmarks and reviews.

It is worth mentioning that the anticipatory algorithm was removed in Linux 2.6.33, because CFQ can be configured to achieve the same effect.

4. The CFQ algorithm

CFQ (Completely Fair Queuing) is, as the name implies, a fairness-oriented algorithm. It tries to give every process competing for a block device its own request queue and its own time slice; a process sends read and write requests to the underlying device during the time slice the scheduler allocates to it, and its request queue is suspended from scheduling once the slice is exhausted. Each process's time slice and queue length depend on its IO priority, which the CFQ scheduler takes into account when deciding when that process's request queue may use the block device. IO priorities fall into three classes: RT (real time), BE (best effort), and IDLE, with RT and BE further divided into 8 sub-priorities each, giving 8 (RT) + 8 (BE) + 1 (IDLE) = 17 priority levels. Note that CFQ's fairness applies to processes, and only to synchronous requests (reads and synchronous writes).
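The per-process queues and time slices described above can be illustrated with a toy round-robin model (real CFQ allocates actual time slices weighted by IO priority; this sketch substitutes a fixed per-turn request budget, which is an assumption for illustration):

```python
from collections import deque

def cfq_round(queues, slice_len):
    """Toy CFQ: each process queue gets up to `slice_len` requests per
    turn, round-robin, so no single process monopolizes the device.
    `queues` maps pid -> deque of requests. Returns the dispatch order."""
    order = []
    while any(queues.values()):
        for pid, q in queues.items():
            # This process's "time slice": a bounded burst of requests.
            for _ in range(min(slice_len, len(q))):
                order.append((pid, q.popleft()))
    return order

qs = {1: deque('abcd'), 2: deque('xy')}
print(cfq_round(qs, slice_len=2))
# [(1, 'a'), (1, 'b'), (2, 'x'), (2, 'y'), (1, 'c'), (1, 'd')]
```

Process 1 has twice the work, but process 2 still gets served after every slice rather than waiting for process 1 to drain.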

Since Linux 2.6.18, CFQ has been the default IO scheduling algorithm.

For a general-purpose server, CFQ is a good choice.

As for which scheduling algorithm to use, run benchmarks against your specific workload and decide from the results; do not rely on other people's write-ups alone.

5. Changing the IO scheduling algorithm

In RHEL5/OEL5 and later versions (such as RHEL6 and RHEL7), the I/O scheduler can be set per disk, and the change takes effect immediately. For example:

# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
# echo 'noop' > /sys/block/sda/queue/scheduler
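The scheduler file lists all available schedulers with the active one in brackets. A small helper to parse that line might look like this (a convenience sketch, not part of any standard tool):

```python
def parse_scheduler(line):
    """Parse a /sys/block/<dev>/queue/scheduler line such as
    'noop anticipatory deadline [cfq]' into (available, current)."""
    available, current = [], None
    for name in line.split():
        if name.startswith('[') and name.endswith(']'):
            # Brackets mark the scheduler currently in effect.
            current = name[1:-1]
            available.append(current)
        else:
            available.append(name)
    return available, current

avail, cur = parse_scheduler("noop anticipatory deadline [cfq]")
print(cur)    # cfq
print(avail)  # ['noop', 'anticipatory', 'deadline', 'cfq']
```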

References

1. https://en.wikipedia.org/wiki/noop_scheduler
2. https://en.wikipedia.org/wiki/deadline_scheduler
3. https://en.wikipedia.org/wiki/anticipatory_scheduling
4. https://en.wikipedia.org/wiki/cfq
5. http://www.redhat.com/magazine/008jun05/features/schedulers/ (introduces the four IO scheduling algorithms and reviews their application scenarios)
6. http://www.dbform.com/html/2011/1510.html
7. https://support.rackspace.com/how-to/configure-flash-drives-in-high-io-instances-as-data-drives/ (describes IO scheduler configuration for SSDs)
8. https://www.percona.com/blog/2009/01/30/linux-schedulers-in-tpcc-like-benchmark/
9. http://www.ibm.com/support/knowledgecenter/api/content/linuxonibm/liaat/liaatbestpractices_pdf.pdf
10. http://dl.acm.org/citation.cfm?id=502046&dl=guide&coll=guide
11. http://www.nuodb.com/techblog/tuning-linux-io-scheduler-ssds

