Linux I/O scheduling

Source: Internet
Author: User

I) I/O scheduler overview

1) When a data block is written to or read from a device, the request is placed in a queue to wait for completion.
2) Each block device has its own queue.
3) The I/O scheduler maintains the order of these queues so that the media is used more efficiently. It turns unordered I/O operations into ordered ones.
4) Before it starts scheduling, the kernel must first determine how many requests are in the queue.

II) Four I/O scheduling algorithms

1) CFQ (Completely Fair Queuing)

Characteristics:
In the 2.6 kernels and distribution releases covered here, CFQ is the default I/O scheduler, and it is also the best choice for a general-purpose server.
CFQ tries to distribute I/O bandwidth evenly, avoiding process starvation and achieving lower latency. It is a compromise between the deadline and AS (anticipatory) schedulers.
CFQ is the best choice for multimedia applications (video, audio) and desktop systems.
CFQ assigns each I/O request a priority. I/O priority is independent of process priority, so the reads and writes of a high-priority process do not automatically inherit a high I/O priority.

Working principle:
CFQ creates a separate queue for each process/thread to manage the requests that process generates, i.e. one queue per process. It schedules between the queues using time slices, so that each process gets a fair share of the I/O bandwidth. The I/O scheduler dispatches 4 requests per process at a time.
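
The round-robin idea can be illustrated with a toy shell model. This is only a sketch: cfq_round_robin is a hypothetical helper, not part of any tool, and it counts requests per turn (using the figure of 4 given above), whereas the real CFQ scheduler works inside the kernel with time slices and priorities.

```shell
# Toy model: each process has its own queue, and the queues are
# served in turn, at most QUANTUM requests per visit.
# usage: cfq_round_robin QUANTUM NAME:COUNT [NAME:COUNT ...]
cfq_round_robin() {
    quantum=$1; shift
    pending="$*"
    order=""
    while [ -n "$pending" ]; do
        next=""
        for entry in $pending; do
            name=${entry%%:*}      # process name
            left=${entry##*:}      # requests still queued
            n=$quantum
            [ "$left" -lt "$n" ] && n=$left
            i=0
            while [ "$i" -lt "$n" ]; do
                order="$order $name"
                i=$((i + 1))
            done
            left=$((left - n))
            # Processes with requests left get another turn later.
            [ "$left" -gt 0 ] && next="$next $name:$left"
        done
        pending=$next
    done
    echo $order
}

# Process A has 6 queued requests, process B has 3.
cfq_round_robin 4 A:6 B:3
# prints: A A A A B B B A A
```

Process B is served after A's first four requests instead of waiting for all six, which is the fairness property the scheduler is after.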


2) NOOP (elevator scheduler)

Characteristics:
Linux 2.4 and earlier kernels had only this one I/O scheduling algorithm.
NOOP implements a FIFO queue and organizes I/O requests the way an elevator works: when a new request arrives, it is merged with the most recent adjacent request, so that requests to the same area of the media are served together.
NOOP tends to starve reads in favor of writes.
NOOP is the best choice for flash memory devices, RAM disks, and embedded systems.

Why the elevator algorithm starves read requests:
Write requests are easier to issue than read requests.
Writes go through the file system cache, so an application does not have to wait for one write to complete before starting the next. Write requests are merged and pile up in the I/O queue.
A read request, by contrast, must wait until the reads ahead of it have completed before the next read can be issued. There is a gap of a few milliseconds between reads, write requests arrive in that gap, and the subsequent reads are starved.

3) Deadline (deadline scheduler)

Characteristics:
Its sorting and merging of requests resemble the NOOP scheduler's; requests are classified by time and by disk area.
Deadline guarantees that a request is serviced within a cutoff time. The cutoff is adjustable, and the default read deadline is shorter than the write deadline, which prevents reads from being starved by writes.
Deadline is the best choice for database environments (Oracle RAC, MySQL, etc.).
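
The adjustable cutoff times are exposed under sysfs while deadline is the active scheduler for a device. A minimal sketch, assuming the device is sda; show_deadline_tunables is a hypothetical helper name. The values read_expire and write_expire are in milliseconds and commonly default to 500 and 5000.

```shell
# Print the deadline scheduler's read/write cutoff times for a device.
# Returns non-zero if the tunables are absent (device missing, or
# deadline is not the active scheduler for it).
show_deadline_tunables() {
    dir="/sys/block/$1/queue/iosched"
    if [ ! -r "$dir/read_expire" ]; then
        echo "no deadline tunables found for $1" >&2
        return 1
    fi
    echo "read_expire=$(cat "$dir/read_expire") ms"
    echo "write_expire=$(cat "$dir/write_expire") ms"
}

# sda is an assumption; substitute your own block device.
show_deadline_tunables sda || true
```

Writing a new number to one of these files as root (for example with echo) changes the cutoff until the next reboot.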


4) AS (Anticipatory I/O scheduler)

Characteristics:
Essentially the same as deadline, but after the last read operation it waits 6 ms before scheduling other I/O requests.
During that wait it anticipates a new read request from the application, which improves read performance at the expense of some write operations.
It inserts new I/O operations in each 6 ms window and merges small write streams into one large write stream, trading write latency for maximum write throughput.
AS suits write-heavy environments, such as file servers.
AS performs poorly in database environments.

III) Viewing and setting the I/O scheduler

1) View the I/O scheduler currently in use

# cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]
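
The active scheduler is the name shown in square brackets. A small sketch for extracting it in a script; active_scheduler is a hypothetical helper name, and the sample line is the output shown above.

```shell
# Print the scheduler name that appears inside [brackets].
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# On a real system, pass "$(cat /sys/block/sda/queue/scheduler)" instead.
active_scheduler 'noop anticipatory deadline [cfq]'
# prints: cfq
```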

2) Temporarily change the I/O scheduler
For example, to switch to the noop elevator scheduling algorithm:
# echo noop > /sys/block/sda/queue/scheduler
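
Before writing a name into the scheduler file, a script can check that the kernel actually offers it. A sketch under the same assumptions as before (device sda); has_scheduler is a hypothetical helper name.

```shell
# Succeed if NAME is one of the schedulers listed in LIST, where LIST
# is the content of /sys/block/<dev>/queue/scheduler.
# usage: has_scheduler LIST NAME
has_scheduler() {
    # Strip the brackets that mark the active scheduler, then compare.
    for s in $(printf '%s\n' "$1" | sed 's/[][]//g'); do
        [ "$s" = "$2" ] && return 0
    done
    return 1
}

list='noop anticipatory deadline [cfq]'   # sample line from above
if has_scheduler "$list" noop; then
    echo "noop is available"
    # As root on a real system:
    # echo noop > /sys/block/sda/queue/scheduler
fi
```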

3) Permanently change the I/O scheduler
Modify the kernel boot parameters, adding elevator=<scheduler name>:
# vi /boot/grub/menu.lst
Change the kernel line to the following:
kernel /boot/vmlinuz-2.6.18-8.el5 ro root=LABEL=/ elevator=deadline rhgb quiet

After rebooting, check the scheduler again:
# cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq
It is now deadline.


IV) Testing the I/O schedulers

The tests cover three cases: read only, write only, and simultaneous read and write. Each one transfers about 600 MB in total, as 300 blocks of 2 MB.

1) Disk read test
# echo deadline > /sys/block/sda/queue/scheduler
# time dd if=/dev/sda1 of=/dev/null bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 6.81189 seconds, 92.4 MB/s

real 0m6.833s
user 0m0.001s
sys 0m4.556s


# echo noop > /sys/block/sda/queue/scheduler
# time dd if=/dev/sda1 of=/dev/null bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 6.61902 seconds, 95.1 MB/s

real 0m6.645s
user 0m0.002s
sys 0m4.540s


# echo anticipatory > /sys/block/sda/queue/scheduler
# time dd if=/dev/sda1 of=/dev/null bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 8.00389 seconds, 78.6 MB/s

real 0m8.021s
user 0m0.002s
sys 0m4.586s


# echo cfq > /sys/block/sda/queue/scheduler
# time dd if=/dev/sda1 of=/dev/null bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 29.8 seconds, 21.1 MB/s

real 0m29.826s
user 0m0.002s
sys 0m28.606s

Test results:
First, noop: 6.61902 seconds, 95.1 MB/s
Second, deadline: 6.81189 seconds, 92.4 MB/s
Third, anticipatory: 8.00389 seconds, 78.6 MB/s
Fourth, cfq: 29.8 seconds, 21.1 MB/s
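
The MB/s figures that dd prints follow directly from bytes divided by seconds, with 1 MB taken as 10^6 bytes. A short sketch that reproduces them; throughput is a hypothetical helper name.

```shell
# usage: throughput BYTES SECONDS
# Prints MB/s with one decimal place (1 MB = 1000000 bytes).
throughput() {
    LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# 300 blocks of 2 MiB each = 629145600 bytes, read in 6.81189 seconds.
throughput $((300 * 2 * 1024 * 1024)) 6.81189
# prints: 92.4
```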


2) Disk write test
# echo cfq > /sys/block/sda/queue/scheduler
# time dd if=/dev/zero of=/tmp/test bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 6.93058 seconds, 90.8 MB/s

real 0m7.002s
user 0m0.001s
sys 0m3.525s

# echo anticipatory > /sys/block/sda/queue/scheduler
# time dd if=/dev/zero of=/tmp/test bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 6.79441 seconds, 92.6 MB/s

real 0m6.964s
user 0m0.003s
sys 0m3.489s

# echo noop > /sys/block/sda/queue/scheduler
# time dd if=/dev/zero of=/tmp/test bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 9.49418 seconds, 66.3 MB/s

real 0m9.855s
user 0m0.002s
sys 0m4.075s

# echo deadline > /sys/block/sda/queue/scheduler
# time dd if=/dev/zero of=/tmp/test bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 6.84128 seconds, 92.0 MB/s

real 0m6.937s
user 0m0.002s
sys 0m3.447s

Test results:
First, anticipatory: 6.79441 seconds, 92.6 MB/s
Second, deadline: 6.84128 seconds, 92.0 MB/s
Third, cfq: 6.93058 seconds, 90.8 MB/s
Fourth, noop: 9.49418 seconds, 66.3 MB/s


3) Simultaneous read/write test

# echo deadline > /sys/block/sda/queue/scheduler
# dd if=/dev/sda1 of=/tmp/test bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 15.1331 seconds, 41.6 MB/s

# echo cfq > /sys/block/sda/queue/scheduler
# dd if=/dev/sda1 of=/tmp/test bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 36.9544 seconds, 17.0 MB/s

# echo anticipatory > /sys/block/sda/queue/scheduler
# dd if=/dev/sda1 of=/tmp/test bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 23.3617 seconds, 26.9 MB/s

# echo noop > /sys/block/sda/queue/scheduler
# dd if=/dev/sda1 of=/tmp/test bs=2M count=300
300+0 records in
300+0 records out
629145600 bytes (629 MB) copied, 17.508 seconds, 35.9 MB/s

Test results:
First, deadline: 15.1331 seconds, 41.6 MB/s
Second, noop: 17.508 seconds, 35.9 MB/s
Third, anticipatory: 23.3617 seconds, 26.9 MB/s
Fourth, cfq: 36.9544 seconds, 17.0 MB/s

V) ionice

ionice can change the I/O scheduling class and priority of a task, but only the CFQ scheduler honors ionice settings.
Three examples illustrate what ionice does:
Real-time scheduling under CFQ, with a priority of 7
# time ionice -c1 -n7 dd if=/dev/sda1 of=/tmp/test bs=2M count=300 &
Default (best-effort) disk I/O scheduling, with a priority of 3
# time ionice -c2 -n3 dd if=/dev/sda1 of=/tmp/test bs=2M count=300 &
Idle disk scheduling (the idle class ignores the priority level, so -n0 has no effect here)
# time ionice -c3 -n0 dd if=/dev/sda1 of=/tmp/test bs=2M count=300 &

ionice offers three scheduling classes: real-time has the highest priority, followed by the default (best-effort) class, and finally idle.
The real-time and best-effort classes each have 8 priority levels, from 0 (highest) to 7 (lowest).
Note that the disk scheduling priority is unrelated to the nice priority of the process.
One is the priority of the process's I/O; the other is the priority of the process's CPU time.
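
The class and priority rules above can be captured in a small validation helper. This is a sketch only; ionice_args is a hypothetical name, and it merely builds the option string that would be passed to the real ionice command.

```shell
# Build ionice options from a class (1 = real-time, 2 = best-effort,
# 3 = idle) and, for classes 1 and 2, a priority from 0 to 7.
# usage: ionice_args CLASS [PRIORITY]
ionice_args() {
    case "$1" in
        1|2)
            case "$2" in
                [0-7]) printf '%s\n' "-c$1 -n$2" ;;
                *) echo "priority must be 0..7" >&2; return 1 ;;
            esac ;;
        3)  printf '%s\n' "-c3" ;;   # idle takes no priority level
        *)  echo "class must be 1, 2 or 3" >&2; return 1 ;;
    esac
}

ionice_args 2 3
# prints: -c2 -n3
```

A script could then run, for example, ionice $(ionice_args 2 3) followed by the command to be scheduled.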
