Linux IO scheduling

Source: Internet
Author: User
Tags: dmesg

The I/O scheduler acts as a referee when processes compete for disk I/O. It decides the order and timing in which requests are handled, with the goal of achieving the best possible overall I/O performance.
Linux provides four I/O schedulers:
CFQ (Completely Fair Queuing) (elevator=cfq):


This is the default scheduler and is usually the best choice for a typical server. It tries to distribute I/O bandwidth evenly among processes. For multimedia applications it helps ensure that audio and video are read from disk in time, but it also works well for other kinds of applications. CFQ keeps one queue per process, and the requests in each queue are merged and sorted. The scheduler serves the process queues round-robin, dispatching 4 requests from each process per turn. It can be tuned through the queued and quantum parameters.
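The tunables a scheduler exposes live under /sys/block/{device-name}/queue/iosched/, and the exact set (quantum, for example) depends on the active scheduler and the kernel version. Rather than assume particular names, the sketch below simply lists whatever files are there as name=value pairs. SYSFS_ROOT is a hypothetical override (default /sys) so the function can be tried against a copy of the tree.

```shell
#!/bin/sh
# Print every tunable of the scheduler attached to a device, as name=value.
# SYSFS_ROOT is a hypothetical override (default /sys) for testing.
list_iosched_tunables() {
    dir="${SYSFS_ROOT:-/sys}/block/$1/queue/iosched"
    [ -d "$dir" ] || { echo "no iosched directory for $1" >&2; return 1; }
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip anything that is not a plain file
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

On a disk running CFQ, `list_iosched_tunables sda` should show quantum among the entries; switching the scheduler changes the list.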


Deadline (elevator=deadline):
This scheduler tries to minimize the latency of each request. It reorders requests to improve performance, and you can tune how long reads and writes may stay queued with the read_expire and write_expire parameters: once a request has waited that long, it times out and is served regardless of the sort order. It is relatively well suited to small files. You can also enable front_merges to merge a new request with an adjacent request already in the queue.
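A minimal sketch of applying the three deadline parameters named above to one device. It assumes read_expire and write_expire are given in milliseconds and front_merges as 0 or 1, and it refuses to write unless deadline is the active scheduler, since the iosched/ files belong to whichever scheduler is active. The values are the caller's choice, not recommendations; SYSFS_ROOT is a hypothetical override (default /sys).

```shell
#!/bin/sh
# Apply deadline tunables: read_expire/write_expire in ms, front_merges 0 or 1.
# SYSFS_ROOT is a hypothetical override (default /sys) for testing.
tune_deadline() {
    dev=$1 read_ms=$2 write_ms=$3 front=$4
    q="${SYSFS_ROOT:-/sys}/block/$dev/queue"
    # The active scheduler is the one shown in brackets.
    case "$(cat "$q/scheduler")" in
        *"[deadline]"*) ;;
        *) echo "$dev is not using deadline" >&2; return 1 ;;
    esac
    echo "$read_ms"  > "$q/iosched/read_expire"  || return 1
    echo "$write_ms" > "$q/iosched/write_expire" || return 1
    echo "$front"    > "$q/iosched/front_merges"
}
```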




NOOP (elevator=noop):
I/O requests are placed in a queue and dispatch ordering is left to the hardware; this is worthwhile only when CPU clock speed is relatively limited.
NOOP pays little attention to I/O: all requests are processed in a single FIFO queue, on the assumption that I/O is not a performance bottleneck. This also spares the CPU extra work. Of course, users who run more complex applications with this scheduler will have to worry about I/O ordering themselves.
With NOOP, requests are stored in the queue and the disk hardware does the optimizing as the I/O subsystem processes them. This scheduler is generally suitable only for specific hardware, such as RAM disks and TCQ disks. Modern disk controllers can optimize with tagged command queuing (TCQ): the controller reorders I/O requests to reduce head movement. Requests that need reordering carry a tag, so the controller can process them according to its own rules when it receives them.
Some applications need to limit queue length. Modern device drivers with TCQ support can control queue length, and this can be set as a kernel parameter at boot. For example, to limit SCSI target 2 on the first aic7xxx controller to a queue length of 64 requests, edit /etc/grub.conf and add the following kernel parameter: aic7xxx=tag_info:{{0,0,64,0,0,0,0}}
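The tag_info value is positional: each number inside the inner braces is the queue depth for one target on the first controller, counting from target 0. The hypothetical helper below builds that string for a chosen target and depth, writing 0 for the other targets and defaulting to 7 entries purely to mirror the article's example.

```shell
#!/bin/sh
# Hypothetical helper: build an aic7xxx tag_info option for controller 0.
# Usage: aic7xxx_tag_info TARGET DEPTH [NUMBER_OF_ENTRIES]
aic7xxx_tag_info() {
    target=$1 depth=$2 ntargets=${3:-7}
    list=""
    i=0
    while [ "$i" -lt "$ntargets" ]; do
        if [ "$i" -eq "$target" ]; then v=$depth; else v=0; fi
        list="${list:+$list,}$v"     # comma-separate after the first entry
        i=$((i + 1))
    done
    printf 'aic7xxx=tag_info:{{%s}}\n' "$list"
}
```

`aic7xxx_tag_info 2 64` reproduces the parameter from the paragraph above.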


Anticipatory (elevator=as):
This scheduler optimizes service time for reads: after servicing an I/O, it waits briefly so that the process has a chance to submit another nearby I/O. The anticipatory scheduler (AS) was once the default I/O scheduler of the Linux 2.6 kernel. "Anticipatory" means expecting or predicting, and the word captures how the algorithm works: when a process has just issued an I/O request, the scheduler waits a default of 6 milliseconds, guessing what that process will request next. This can cause long delays for random reads, which is bad for database applications, but gives good performance for web servers. The algorithm is also easy to understand as targeting slow disks, because the "guess" is really meant to reduce head movement time. It is therefore better suited to applications that read and write sequentially. The kernel parameters available for tuning are antic_expire, read_expire, and write_expire.


How to view and set the I/O scheduler in Linux
View the current scheduler
cat /sys/block/{device-name}/queue/scheduler
cat /sys/block/sd*/queue/scheduler
Example output (the active scheduler is shown in brackets):
noop anticipatory deadline [cfq]
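Because the kernel marks the active scheduler with brackets, a script can pull it out of that line. A small sketch, assuming sd* disk names as in the command above; SYSFS_ROOT is a hypothetical override (default /sys) for testing.

```shell
#!/bin/sh
# Keep only the name between "[" and "]" in a scheduler line.
active_scheduler_from_line() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report the active scheduler of every sd* disk as "device: scheduler".
# SYSFS_ROOT is a hypothetical override (default /sys).
show_active_schedulers() {
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/block/sd*/queue/scheduler; do
        [ -r "$f" ] || continue          # no match leaves the glob unexpanded
        dev=${f#"$root"/block/}
        dev=${dev%%/*}
        printf '%s: %s\n' "$dev" "$(active_scheduler_from_line "$(cat "$f")")"
    done
}
```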

Set the current scheduler
echo {scheduler-name} > /sys/block/{device-name}/queue/scheduler
echo noop > /sys/block/hda/queue/scheduler
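Scheduler names are lowercase, and the kernel only accepts names listed in the scheduler file. A sketch that checks the requested name against that list before writing it; SYSFS_ROOT is a hypothetical override (default /sys).

```shell
#!/bin/sh
# Switch a device to another scheduler, refusing names the kernel does not list.
# SYSFS_ROOT is a hypothetical override (default /sys) for testing.
set_scheduler() {
    dev=$1 name=$2
    file="${SYSFS_ROOT:-/sys}/block/$dev/queue/scheduler"
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    available=$(sed 's/[][]//g' "$file")   # drop the brackets around the active name
    case " $available " in
        *" $name "*) ;;
        *) echo "scheduler '$name' not available for $dev: $available" >&2; return 1 ;;
    esac
    echo "$name" > "$file"
}
```

On a real device, reading the file back after `set_scheduler hda noop` shows noop in brackets.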

Recommendations for using the I/O schedulers
Deadline I/O Scheduler
The deadline scheduler achieves shorter wait times at some cost in overall performance. It is simple and compact, provides minimal read latency with good throughput, and is especially suitable for read-heavy environments such as databases (for example, Oracle 10g).

Anticipatory I/O Scheduler
The anticipatory scheduler raises performance by adding wait time. It assumes that a block device has only one physical seek head (for example, a single SATA hard drive) and merges multiple small random write streams into a single large write stream, effectively turning random reads and writes into sequential ones. On this principle it trades read/write latency for maximum read/write throughput. It suits most environments, especially those with heavy reads and writes, such as file servers and web applications, where we can adopt the anticipatory scheduler. I'll show how to tune its wait time later.

CFQ I/O Scheduler
This is a compromise among all factors. It aims for as much fairness as possible, using a QoS policy to allocate an equal share of bandwidth to all tasks, which avoids process starvation and achieves low latency. It can be seen as a compromise between the two schedulers above, and it suits multi-user systems with a large number of processes.

Anticipatory tuning

Based on the above, the anticipatory scheduler is the one we are most likely to use, so the following covers the parameters that can be tuned for it.
Besides switching to this scheduler, the following settings also affect its behavior:


Disk Queue Length
/sys/block/sda/queue/nr_requests defaults to 128 requests and can be raised to 512. This uses more memory but allows more read and write requests to be queued at once; individual requests may take longer, but total read/write throughput increases.
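A sketch that raises nr_requests on every sd* disk and reads each value back, since the kernel may adjust what it accepts. The default of 512 follows the suggestion above; SYSFS_ROOT is a hypothetical override (default /sys).

```shell
#!/bin/sh
# Set queue/nr_requests on all sd* disks and report the value read back.
# SYSFS_ROOT is a hypothetical override (default /sys) for testing.
set_nr_requests_all() {
    want=${1:-512}
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/block/sd*/queue/nr_requests; do
        [ -w "$f" ] || continue
        echo "$want" > "$f" && printf '%s -> %s\n' "$f" "$(cat "$f")"
    done
}
```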


Wait time
/sys/block/sda/queue/iosched/antic_expire: how long (in milliseconds) the scheduler waits for a new request near the previous one
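A sketch for reading and optionally changing antic_expire. It assumes the value is in milliseconds and only touches the file when the anticipatory scheduler is active, because that is when iosched/ holds its tunables. SYSFS_ROOT is a hypothetical override (default /sys).

```shell
#!/bin/sh
# Usage: antic_expire_ms DEVICE [NEW_VALUE_MS]  -> prints the current value
# SYSFS_ROOT is a hypothetical override (default /sys) for testing.
antic_expire_ms() {
    q="${SYSFS_ROOT:-/sys}/block/$1/queue"
    case "$(cat "$q/scheduler")" in
        *"[anticipatory]"*) ;;
        *) echo "$1 is not using the anticipatory scheduler" >&2; return 1 ;;
    esac
    if [ -n "${2:-}" ]; then
        echo "$2" > "$q/iosched/antic_expire" || return 1
    fi
    cat "$q/iosched/antic_expire"
}
```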





Read optimization parameters
/sys/block/sda/queue/read_ahead_kb
This parameter helps sequential reads: it sets how much data is read ahead, regardless of whether it is actually needed yet. The default is 128 KB, which is small; setting it larger is very useful for reading large files and can effectively reduce the number of seeks. The value can also be set with blockdev --setra, but --setra counts 512-byte sectors, so the amount in KB is half the number given: for example, --setra 512 actually reads ahead 256 KB.
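The unit mismatch is easy to get wrong, so here it is as two tiny conversions (1 KB = 2 sectors of 512 bytes). The commented usage lines need root and a real device.

```shell
#!/bin/sh
# blockdev --setra/--getra count 512-byte sectors;
# /sys/block/<dev>/queue/read_ahead_kb counts kilobytes.
sectors_to_kb() { echo $(( $1 / 2 )); }
kb_to_sectors() { echo $(( $1 * 2 )); }

# Example:
#   blockdev --setra "$(kb_to_sectors 256)" /dev/sda   # passes 512 sectors
#   cat /sys/block/sda/queue/read_ahead_kb             # reads back 256
```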
Several very effective kernel parameters for I/O tuning
/proc/sys/vm/dirty_ratio
This parameter limits the file system write buffer, expressed as a percentage of system memory: it sets how much memory may hold data waiting to be written to disk before processes that write are forced to flush it themselves. Letting more system memory be used for write buffering can greatly improve write performance. However, for continuous, steady writes you should lower the value. The default is 10 (it varies by kernel version). To increase it:
echo 40 > /proc/sys/vm/dirty_ratio


/proc/sys/vm/dirty_background_ratio
This parameter controls when the file system's pdflush process starts writing to disk. The unit is a percentage of system memory: once the write buffer uses that share of memory, pdflush begins writing data to disk in the background. Letting more memory be used for write buffering can greatly improve write performance, but for continuous, steady writes you should lower the value. The default is 5. To increase it:
echo 20 > /proc/sys/vm/dirty_background_ratio
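To get a feel for what the two ratios mean in megabytes, the sketch below applies them to MemTotal. This is an approximation: the kernel applies the ratios to "dirtyable" memory (roughly free memory plus reclaimable page cache), which is smaller, so treat the results as upper bounds. PROC_ROOT is a hypothetical override (default /proc).

```shell
#!/bin/sh
# Rough dirty-data thresholds computed against MemTotal (upper bounds).
# PROC_ROOT is a hypothetical override (default /proc) for testing.
dirty_limits_mb() {
    proc="${PROC_ROOT:-/proc}"
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$proc/meminfo")
    bg=$(cat "$proc/sys/vm/dirty_background_ratio")
    fg=$(cat "$proc/sys/vm/dirty_ratio")
    echo "background writeback starts near $(( total_kb * bg / 100 / 1024 )) MB"
    echo "writers must flush data themselves near $(( total_kb * fg / 100 / 1024 )) MB"
}
```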


/proc/sys/vm/dirty_writeback_centisecs
This parameter controls how often pdflush, the kernel's dirty-data flushing process, runs. The unit is 1/100 second, and the default of 500 means 5 seconds. If your system writes continuously, it is better to lower this value so that spikes of write activity are spread over more, smaller writes:
echo 200 > /proc/sys/vm/dirty_writeback_centisecs
If your system has short spikes of writes, each writing a small amount of data (tens of MB at a time), and memory is plentiful, you should increase this value:
echo 1000 > /proc/sys/vm/dirty_writeback_centisecs


/proc/sys/vm/dirty_expire_centisecs
This parameter defines when data in the kernel write buffer counts as "old", at which point pdflush starts considering it for writing to disk. The unit is 1/100 second. The default is 3000, meaning data older than 30 seconds is considered old and will be flushed to disk. For particularly heavy write loads it is good to reduce this value somewhat, but not too much, because reducing it too far makes I/O rise too quickly. The recommended setting is 1500, i.e. data older than 15 seconds counts as old:
echo 1500 > /proc/sys/vm/dirty_expire_centisecs
Of course, if your system has plenty of memory, writes intermittently, and writes only a small amount of data each time (say tens of MB), a larger value is better.
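The two centisecond settings combine: a dirty page must first become older than dirty_expire_centisecs, and the periodic flush then runs at most dirty_writeback_centisecs later. Their sum is therefore a rough upper bound on how long periodic writeback leaves data in memory. This sketch ignores the ratio-based triggers, which can flush earlier, and assumes periodic writeback is enabled (a non-zero interval).

```shell
#!/bin/sh
# Rough worst-case age (seconds, one decimal) of dirty data under periodic writeback.
# Arguments: dirty_expire_centisecs dirty_writeback_centisecs (both in 1/100 s).
max_dirty_age_secs() {
    expire_cs=$1 writeback_cs=$2
    total_cs=$(( expire_cs + writeback_cs ))
    echo "$(( total_cs / 100 )).$(( total_cs % 100 / 10 ))"
}

# Example with the live values:
#   max_dirty_age_secs "$(cat /proc/sys/vm/dirty_expire_centisecs)" \
#                      "$(cat /proc/sys/vm/dirty_writeback_centisecs)"
```

With the defaults (3000 and 500) this gives about 35 seconds; with the recommended 1500 and 200 it drops to about 17.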


View the I/O schedulers supported by the current system:
# dmesg | grep -i scheduler
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
View the current system's I/O scheduler:
cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
Change the I/O scheduler temporarily, for example to the noop elevator:
echo noop > /sys/block/sda/queue/scheduler
To change the I/O scheduler permanently, add elevator=<scheduler name> to the kernel boot parameters:
vi /boot/grub/menu.lst
Change the kernel line to the following:
kernel /boot/vmlinuz-2.6.18-8.el5 ro root=LABEL=/ elevator=deadline rhgb quiet
After rebooting, check the scheduler again:
cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq
The scheduler is now deadline.
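After the reboot it can also help to confirm that the kernel actually received the elevator= parameter, since /proc/cmdline shows the command line the running kernel was booted with. A small sketch; PROC_ROOT is a hypothetical override (default /proc).

```shell
#!/bin/sh
# Print the value of elevator= from the running kernel's command line, if any.
# PROC_ROOT is a hypothetical override (default /proc) for testing.
boot_elevator() {
    tr ' ' '\n' < "${PROC_ROOT:-/proc}/cmdline" | sed -n 's/^elevator=//p'
}
```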
