Using the Linux cgroup blkio subsystem
Two I/O isolation strategies supported by the blkio subsystem
1. The CFQ (Completely Fair Queuing) I/O scheduling policy allocates I/O time slices by weight, so that I/O can be scheduled and limited proportionally across resource groups. Weights range from 100 to 1000.
It is configured through the following two files:
blkio.weight: the default weight.
blkio.weight_device: the weight for a specific block device, written as "major:minor weight". It takes priority over blkio.weight.
Example
echo 500 > blkio.weight
echo 8:0 500 > blkio.weight_device
The CFQ scheduler and the other I/O schedulers are described in the kernel documentation, under Documentation/block/.
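The weight range above is easy to get wrong in scripts, so here is a minimal sketch of a wrapper that validates the value before writing it. The function name set_blkio_weight and the BLKIO_ROOT variable are hypothetical, and the default of /cgroup/blkio is an assumption about where the subsystem is mounted.

```shell
# Assumed mount point of the blkio subsystem; override to match your system.
BLKIO_ROOT="${BLKIO_ROOT:-/cgroup/blkio}"

# Hypothetical helper: set_blkio_weight <group> <weight>
# Writes <weight> to the group's blkio.weight after checking the 100-1000 range.
set_blkio_weight() {
    group="$1"
    weight="$2"
    # Reject anything that is not a plain decimal integer.
    case "$weight" in
        ''|*[!0-9]*)
            echo "weight must be an integer" >&2
            return 1
            ;;
    esac
    # The valid range stated in the text is 100 to 1000.
    if [ "$weight" -lt 100 ] || [ "$weight" -gt 1000 ]; then
        echo "weight must be between 100 and 1000" >&2
        return 1
    fi
    echo "$weight" > "$BLKIO_ROOT/$group/blkio.weight"
}
```

For example, `set_blkio_weight test1 500` would write 500 to the test1 group, while `set_blkio_weight test1 50` is rejected before touching the file.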
2. Capping I/O usage (upper limits on IOPS and bandwidth)
Examples are as follows (each rule is written as "major:minor limit"):
Read bandwidth limit (bytes/s):
echo "8:0 10485760" > /cgroup/blkio/test/blkio.throttle.read_bps_device
Read IOPS limit (io/s):
echo "8:0" > /cgroup/blkio/test/blkio.throttle.read_iops_device
Write bandwidth limit (bytes/s):
echo "8:0 10485760" > /cgroup/blkio/test/blkio.throttle.write_bps_device
Write IOPS limit (io/s):
echo "8:0" > /cgroup/blkio/test/blkio.throttle.write_iops_device
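The value 10485760 in the bandwidth examples is 10 MiB expressed in bytes. As a rough sketch, the conversion and the "major:minor limit" rule format can be wrapped as below. The helper names and the BLKIO_ROOT variable are hypothetical, and the default mount point is an assumption.

```shell
# Assumed mount point of the blkio subsystem; override to match your system.
BLKIO_ROOT="${BLKIO_ROOT:-/cgroup/blkio}"

# Convert a whole number of MiB/s into bytes/s.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical helper: set_read_bps_limit <group> <major:minor> <MiB per second>
# Writes a "major:minor bytes_per_second" rule to blkio.throttle.read_bps_device.
set_read_bps_limit() {
    group="$1"
    device="$2"
    mib="$3"
    printf '%s %s\n' "$device" "$(mib_to_bytes "$mib")" \
        > "$BLKIO_ROOT/$group/blkio.throttle.read_bps_device"
}
```

`set_read_bps_limit test 8:0 10` writes the same rule as the first example above.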
Each of these two mechanisms has its own characteristics. The CFQ weight method is fair, and groups do not interfere with each other: every group is guaranteed its minimum share of I/O, and when the device is idle a group may use more than its share. (In one sentence: it guarantees a minimum share of IOPS.)
With the cap method, the limits of all groups added together may exceed the maximum IOPS of the block device. The downside is that if the total exceeds the device maximum by too much, the SLA cannot be guaranteed. (In one sentence: it limits the maximum IOPS.)
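The two trade-offs can be made concrete with a little arithmetic. The sketch below assumes that, under contention, a group's share is its weight divided by the sum of the weights of all active groups, and that caps are "oversubscribed" when they add up to more than the device can deliver. Both function names are hypothetical, and the device maximum is a number you would have to measure yourself.

```shell
# weight_share_percent <own weight> [other weights...]
# Prints the integer percentage of I/O time the first group would get,
# assuming every listed group is actively issuing I/O.
weight_share_percent() {
    mine="$1"
    total=0
    for w in "$@"; do
        total=$(( total + w ))
    done
    echo $(( mine * 100 / total ))
}

# caps_oversubscribed <device max IOPS> <cap> [cap...]
# Succeeds (exit status 0) when the caps add up to more than the device maximum.
caps_oversubscribed() {
    device_max="$1"
    shift
    sum=0
    for cap in "$@"; do
        sum=$(( sum + cap ))
    done
    [ "$sum" -gt "$device_max" ]
}
```

For instance, `weight_share_percent 1000 500` prints 66, and `caps_oversubscribed 1000 600 600` succeeds because the two caps total 1200.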
Note that the blkio subsystem currently does not account for buffered writes: for writes, only direct I/O is counted. Buffered reads, however, are counted.
Statistics reported by the blkio subsystem
blkio.throttle.io_serviced: reports IOPS, including the current queue. Fields: major, minor, operation (read, write, sync, or async), and number.
blkio.throttle.io_service_bytes: reports bytes per second, including the current queue. Fields: major, minor, operation (read, write, sync, or async), and bytes.
blkio.time: reports device I/O usage time. Fields: major, minor, and time (ms).
blkio.sectors: reports the number of sectors read or written. Fields: major, minor, and sectors.
blkio.avg_queue_size: reports the average queue size of the block device (the kernel must be built with CONFIG_DEBUG_BLK_CGROUP=y).
blkio.group_wait_time: reports the total time (ns) the queue spent waiting for a time slice (the kernel must be built with CONFIG_DEBUG_BLK_CGROUP=y).
blkio.empty_time: reports block device idle time with no pending requests (ns) (the kernel must be built with CONFIG_DEBUG_BLK_CGROUP=y).
blkio.idle_time: reports block device idle time (waiting for a request to merge?) (ns) (the kernel must be built with CONFIG_DEBUG_BLK_CGROUP=y).
blkio.dequeue: reports the number of times requests were dequeued from the block device. Fields: major, minor, and number (the kernel must be built with CONFIG_DEBUG_BLK_CGROUP=y).
blkio.io_serviced: reports IOPS, not including the current queue. Fields: major, minor, operation (read, write, sync, or async), and number.
blkio.io_service_bytes: reports bytes per second, not including the current queue. Fields: major, minor, operation (read, write, sync, or async), and bytes.
blkio.io_service_time: reports the time from sending an I/O request to its completion. Fields: major, minor, operation (read, write, sync, or async), and time (ns).
blkio.io_wait_time: reports I/O wait time. The total may exceed the elapsed time, because several I/O requests can be waiting at the same moment. Fields: major, minor, operation (read, write, sync, or async), and time (ns).
blkio.io_merged: reports the number of merged I/O requests. Fields: operation (read, write, sync, or async) and number.
blkio.io_queued: reports the number of I/O requests queued. Fields: operation (read, write, sync, or async) and number.
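Since most of these files share the same "major, minor, operation, value" layout, a single field can be pulled out with awk. The sketch below assumes each line looks like `8:0 Read 4096` (device, operation, value, separated by spaces). The function name blkio_stat_value is hypothetical.

```shell
# blkio_stat_value <stat file> <major:minor> <operation>
# Prints the value recorded for one device and one operation
# (for example Read, Write, Sync, Async).
blkio_stat_value() {
    awk -v dev="$2" -v op="$3" '$1 == dev && $2 == op { print $3 }' "$1"
}
```

A call might look like `blkio_stat_value /cgroup/blkio/test1/blkio.throttle.io_service_bytes 8:0 Read`, with the path adjusted to wherever the group lives on your system.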
Examples of using the blkio subsystem
1. Limit I/O usage with the weight method.
Mount the blkio subsystem:
~]# mount -t cgroup -o blkio blkio /cgroup/blkio/
Create two cgroups (resource groups) for the blkio subsystem:
~]# mkdir /cgroup/blkio/test1/
~]# mkdir /cgroup/blkio/test2/
Set the blkio weight of each of the cgroups just created:
~]# echo > /cgroup/blkio/test1/blkio.weight
~]# echo > /cgroup/blkio/test2/blkio.weight
Create two files of the same size:
~]# dd if=/dev/zero of=file_1 bs=1M count=4000
~]# dd if=/dev/zero of=file_2 bs=1M count=4000
Flush the files out of the page cache:
~]# sync
~]# echo 3 > /proc/sys/vm/drop_caches
Start two processes that read the files, one in each resource group:
~]# cgexec -g blkio:test1 time dd if=file_1 of=/dev/null
~]# cgexec -g blkio:test2 time dd if=file_2 of=/dev/null
Check iotop: the read throughput is split according to the weights, with the dd in test1 reading at roughly twice the rate of the dd in test2.
# iotop
Total DISK READ: 83.16 M/s | Total DISK WRITE: 0.00 B/s
    TIME   TID  PRIO  USER  DISK READ  DISK WRITE  SWAPIN      IO  COMMAND
15:18:04 15071  be/4  root  27.64 M/s    0.00 B/s  0.00 % 92.30 %  dd if=file_2 of=/dev/null
15:18:04 15069  be/4  root  55.52 M/s    0.00 B/s  0.00 % 88.48 %  dd if=file_1 of=/dev/null
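To compare the iotop figures with the weights you configured, it helps to reduce the two read rates to a single ratio. This is only a small illustrative sketch, and the function name throughput_ratio is hypothetical.

```shell
# throughput_ratio <rate a> <rate b>
# Prints a/b rounded to two decimals; both rates must be in the same unit.
throughput_ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
```

With the read rates from the iotop output, `throughput_ratio 55.52 27.64` prints 2.01.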
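The major and minor numbers of each block device can be found by listing the block devices under /dev:
# ll /dev/ | grep "^b"
brw-rw---- 1 root cdrom  11, 0 Jun 10 15:50 sr0
brw-rw---- 1 root disk  253, 0 Jun 10 15:50 vda
brw-rw---- 1 root disk  253, 1 Jun 10 15:50 vda1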
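The "major:minor" pair needed by blkio.weight_device and the throttle files can be extracted from a listing like this one. The sketch below assumes the usual `ls -l` layout, where block devices show "major, minor" in the position of the file size. The function name dev_numbers is hypothetical.

```shell
# dev_numbers <device name>
# Reads `ls -l /dev` output on stdin and prints "major:minor" for the
# block device with the given name.
dev_numbers() {
    awk -v name="$1" '/^b/ && $NF == name { sub(/,/, "", $5); print $5 ":" $6 }'
}
```

For example, `ls -l /dev/ | dev_numbers vda` prints 253:0 for the listing shown above.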
Fairness of CFQ scheduling
With the weight method, fair scheduling of random (non-sequential) I/O requires the block device's group_isolation setting to be enabled.
When group isolation is disabled, fairness can be expected only for a sequential workload.
By default, group isolation is enabled and fairness can be expected for random I/O workloads as well.
To enable group isolation, use the following command:
echo 1 > /sys/block/<disk_device>/queue/iosched/group_isolation
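Before relying on weighted fairness for random I/O, it can be worth checking the current setting. The following is a minimal sketch: the function name group_isolation_status and the SYS_BLOCK variable are hypothetical, and it treats a missing or unreadable file as meaning the running kernel or scheduler does not expose this tunable.

```shell
# Assumed location of the block device sysfs tree; override for testing.
SYS_BLOCK="${SYS_BLOCK:-/sys/block}"

# group_isolation_status <disk device>
# Prints "enabled", "disabled", or "unavailable" for the given disk.
group_isolation_status() {
    f="$SYS_BLOCK/$1/queue/iosched/group_isolation"
    if [ ! -r "$f" ]; then
        # The tunable is not exposed for this device.
        echo "unavailable"
        return 1
    fi
    if [ "$(cat "$f")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

Usage would be along the lines of `group_isolation_status sda`, with the disk name replaced by your own device.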
(Reprinted) Linux cgroup resource isolation, one subsystem at a time: I/O isolation