I/O Scheduling in Linux


Reprinted: http://www.php-oa.com/2010/01/03/linux-io-elevator.html

Processes compete with one another for disk I/O, and the I/O scheduler must order and time their requests to achieve the best possible overall I/O performance. In fact there are only two I/O optimizations: merging and sorting. Linux offers four scheduling algorithms, listed below.

CFQ (Completely Fair Queuing) (elevator=cfq):

This is the default algorithm and is usually the best choice for general-purpose servers. It tries to distribute access to the I/O bandwidth evenly. In multimedia applications, audio and video can always read data from the disk in a timely manner, but it also performs well for other applications. Each process gets its own queue, and each queue merges and sorts requests according to the rules above; the scheduler then round-robins between processes, dispatching four of a process's requests each time. The queued and quantum parameters can be tuned.
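A sketch of inspecting and adjusting those tunables at runtime, assuming the device is sda and an older 2.6-era kernel whose cfq build exposes quantum and queued (the available names vary by kernel version):

$ cat /sys/block/sda/queue/scheduler        # confirm cfq is the active scheduler
noop anticipatory deadline [cfq]
$ ls /sys/block/sda/queue/iosched/          # list the tunables this cfq build exposes
$ echo 8 > /sys/block/sda/queue/iosched/quantum   # requests dispatched per round (default 4)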

Deadline (elevator=deadline):

This algorithm tries to guarantee a maximum service latency for each request, reordering requests to improve performance. The queue's expiry parameters, read_expire and write_expire, control how long data may wait before it must be read or written; once a request times out, sorting is abandoned and the request is serviced immediately. This scheduler suits small files. The front_merges parameter additionally controls merging of adjacent requests.
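A minimal sketch of tuning those expiry values through sysfs; sda and the specific numbers are assumptions for illustration (values are in milliseconds):

$ cat /sys/block/sda/queue/iosched/read_expire          # how long a read may wait, in ms
$ echo 250 > /sys/block/sda/queue/iosched/read_expire   # tighten the read deadline
$ echo 1 > /sys/block/sda/queue/iosched/front_merges    # enable merging with adjacent requests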

Noop (elevator=noop):

I/O requests are placed into a simple queue and scheduling is left to the hardware; this is appropriate when CPU cycles are relatively scarce.
Noop does very little with I/O: it processes all I/O requests through a single FIFO queue, assuming by default that I/O itself is not the performance bottleneck, which also spares the CPU unnecessary work. For complex application workloads, of course, this scheduler is a worrying choice.
The Noop algorithm only merges requests as they enter the queue and leaves further optimization to the I/O subsystem, so it mainly suits specific hardware such as RAM disks and TCQ-capable disks. All modern disk controllers support tagged command queuing (TCQ), which lets the controller itself reorder I/O requests to reduce head movement; requests that may be reordered carry a tag, and the controller processes them according to its own rules.
Some applications need to limit the queue depth. Modern device drivers expose the TCQ queue depth, and it can be set as a kernel parameter at system boot. For example, to limit the queue depth of SCSI drive LUN 2 to 64 requests, modify /etc/grub.conf and add the kernel parameter aic7xxx=tag_info:{{0,0,64,0,0,0}}, as shown below.
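As a sketch, the resulting kernel line in /etc/grub.conf would look something like this; the kernel image name and root device are placeholders, and only the aic7xxx option comes from the example above:

kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/sda1 aic7xxx=tag_info:{{0,0,64,0,0,0}}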

Anticipatory (elevator=as):

This scheduler optimizes service time for read operations: after completing one I/O it waits a short time before moving on, so that the same process can submit another nearby I/O. The anticipatory scheduler (AS) was once the default I/O scheduler of the Linux 2.6 kernel. The word "anticipatory" means "expected", and it does reveal the character of this algorithm. Simply put, when an I/O completes, the scheduler pauses for a default 6-millisecond anticipation window, guessing what the next I/O request from that process will be. This causes relatively large latency for random reads, which is very bad for database applications, whereas a Web server does well. The algorithm can also be understood as being designed for slow disks, because the "guess" exists to reduce head movement time; it therefore suits sequential read/write applications. The adjustable kernel parameters include antic_expire, read_expire, and write_expire (see the Wait time section below).

How to view and set the I/O scheduler in Linux
$ cat /sys/block/{DEVICE-NAME}/queue/scheduler
$ cat /sys/block/sd*/queue/scheduler
For example, the output looks like this:

noop anticipatory deadline [cfq]
Set the current I/O scheduler:
$ echo {SCHEDULER-NAME} > /sys/block/{DEVICE-NAME}/queue/scheduler
$ echo noop > /sys/block/hda/queue/scheduler
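The sysfs change above lasts only until reboot. To make a scheduler the system-wide default at boot, the elevator= parameter can be added to the kernel line in /etc/grub.conf; a sketch, with the kernel image and root device as placeholders:

kernel /vmlinuz-2.6.18-8.el5 ro root=LABEL=/ elevator=deadline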
It is recommended to use the deadline I/O scheduler for databases. Deadline trades a little throughput for shorter wait times: its polling design is concise and compact, it provides minimal read latency and excellent throughput, and it is especially suitable for read-heavy environments such as databases (for example, Oracle 10g).

The anticipatory I/O scheduler improves throughput by increasing wait time. Assuming a block device with only one physical seek head (such as a single SATA disk), it merges multiple small random write streams into one large write stream (in effect reordering the random reads and writes), using read/write latency in exchange for maximum read/write throughput. It is applicable to most environments, especially mixed read/write environments such as file servers and web applications, where AS scheduling can be adopted; how to adjust its merge wait time is shown later.

The cfq I/O scheduler is a compromise among all of these factors. It tries to be fair, using a QoS-style policy to allocate the same amount of bandwidth to every task; it can be seen as a middle ground between the other two schedulers that avoids process starvation while achieving fairly low latency. It suits multi-user systems with large numbers of processes.

Anticipatory adjustment

As the above suggests, the algorithm most worth tuning here is anticipatory, which batches up writes over time. So beyond switching a device to this algorithm, let's look at its adjustable parameters.

Disk Queue Length
/sys/block/sda/queue/nr_requests holds only 128 requests by default; it can be raised to 512. Memory usage increases, but more read/write operations can be merged, so although each individual request may wait a little longer, more read/write work gets done overall.
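A minimal sketch of raising the queue depth (sda is an assumed device name):

$ cat /sys/block/sda/queue/nr_requests    # default is 128
128
$ echo 512 > /sys/block/sda/queue/nr_requests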

Wait time
/sys/block/sda/queue/iosched/antic_expire
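This file exists only while the device is using the anticipatory scheduler. A sketch of reading and widening the anticipation window; sda and the value 12 are assumptions, and on typical 2.6 kernels the unit is milliseconds (the default matches the 6 ms mentioned earlier):

$ cat /sys/block/sda/queue/iosched/antic_expire        # current anticipation window
6
$ echo 12 > /sys/block/sda/queue/iosched/antic_expire  # wait longer before seeking away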

Read optimization parameters

The /sys/block/sda/queue/read_ahead_kb parameter is very useful for sequential reads. It sets how much content is read ahead in one go, regardless of how much is actually needed. The default of 128 KB is much smaller than what a large sequential read wants, so setting a bigger value is very useful when reading large files and can effectively reduce the number of seeks. This parameter can also be set with blockdev --setra; setra counts 512-byte sectors, so the actual size in KB is the number divided by 2. For example, setting 512 actually means 256 KB of readahead.
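A sketch of checking and enlarging readahead both ways; sda and the 512-sector value are illustrative assumptions:

$ cat /sys/block/sda/queue/read_ahead_kb    # readahead in KB (default 128)
128
$ blockdev --getra /dev/sda                 # the same setting, in 512-byte sectors
256
$ blockdev --setra 512 /dev/sda             # 512 sectors = 256 KB of readahead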


Several very effective kernel parameters for I/O scheduling and tuning follow.

/proc/sys/vm/dirty_ratio

This parameter controls the size of the file-system write buffer as a percentage of system memory: when the write buffer reaches this share of memory, data begins to be written out to disk. Increasing it lets more system memory be used for disk write buffering, which can greatly improve write performance; however, when you need sustained, constant writes, you should lower the value instead. The default is generally 10. To increase it:
echo '40' > /proc/sys/vm/dirty_ratio
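A value set with echo does not survive a reboot. To make it persistent, the same setting can go into /etc/sysctl.conf (the value 40 is simply the example above):

# in /etc/sysctl.conf
vm.dirty_ratio = 40

$ sysctl -p    # reload /etc/sysctl.conf so the setting takes effect immediately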

/proc/sys/vm/dirty_background_ratio

This parameter controls when pdflush, the file system's background flush process, starts writing to disk. The unit is a percentage of system memory: when the write buffer reaches this share of memory, pdflush begins writing data out to disk. Increasing it lets more system memory be used for disk write buffering and can greatly improve write performance; however, when you need sustained, constant writes, you should lower the value. The default is generally 5. To increase it:
echo '20' > /proc/sys/vm/dirty_background_ratio
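Since this is the early, background threshold, it should stay below dirty_ratio so that flushing starts well before the hard limit. A quick way to check both values at once (the numbers shown assume the example values set above):

$ grep . /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio
/proc/sys/vm/dirty_ratio:40
/proc/sys/vm/dirty_background_ratio:20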


/proc/sys/vm/dirty_writeback_centisecs

This parameter controls the run interval of pdflush, the kernel's dirty-data flush process. The unit is 1/100 second; the default is 500, that is, 5 seconds. If your system writes data continuously, it is better to lower the value, so that peak write activity is split into several smaller write operations. Conversely, if your system only writes in short bursts, the data written each time is small (a few dozen MB), and memory is plentiful, the value should be raised.

/proc/sys/vm/dirty_expire_centisecs

This parameter declares how old data in the kernel's write buffer may become before pdflush starts considering writing it to disk. The unit is 1/100 second; the default is 3000, meaning data is flushed once it is 30 seconds old. For especially heavy write workloads it is also good to shrink this value, but not by too much, because reducing it too far makes I/O spike too quickly. We recommend setting it to 1500, i.e. data counts as old after 15 seconds:
echo '1500' > /proc/sys/vm/dirty_expire_centisecs
Of course, if your system memory is large and the write pattern is intermittent, with not much data written each time (say, a few dozen MB), then raising this value works better.
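Pulling the four /proc/sys/vm parameters together, here is a minimal tuning sketch for a machine with ample memory and bursty, modest writes; all four values are assumptions chosen to match the guidance above, not settings from the original article:

#!/bin/sh
# Hypothetical tuning: bursty writes, plenty of RAM (assumed values).
echo '40'   > /proc/sys/vm/dirty_ratio                # allow a larger write buffer
echo '20'   > /proc/sys/vm/dirty_background_ratio     # start background flushing at 20% of RAM
echo '1000' > /proc/sys/vm/dirty_writeback_centisecs  # run pdflush every 10 seconds
echo '3000' > /proc/sys/vm/dirty_expire_centisecs     # treat data as old after 30 seconds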
