iostat metrics that are easily misread


iostat(1) is the most basic tool for viewing I/O performance on Linux, but it is easy to misread for people who are familiar with other Unix systems. For example, avserv on HP-UX (the counterpart of svctm on Linux) is the most important I/O metric there: it reflects the performance of the disk device itself, measuring the time from when an I/O request is sent out by the SCSI layer until the SCSI layer is notified that the I/O has completed, excluding any time spent waiting in the SCSI queue. avserv therefore reflects how fast the disk device processes I/O, also known as disk service time; if avserv is large, there must be a hardware problem. However, svctm on Linux means something very different. In fact, the man pages of iostat(1) and sar(1) tell you not to trust svctm and warn that the field will be dropped:
"Warning! Don't trust this field any more. This field is removed in a future Sysstat version. "

On Linux, the average time spent per I/O is reported as await, but it does not reflect the performance of the disk device itself, because await includes not only the time the disk device spends handling the I/O but also the time the request spends waiting in the queue. While a request is waiting in the queue it has not yet been sent to the disk device, so that waiting time is not consumed by the disk. await therefore does not reflect the speed of the disk device, and kernel issues such as I/O scheduler problems can also inflate await. Is there any metric, then, that measures the performance of the disk device itself? Unfortunately, iostat(1) and sar(1) have none, because /proc/diskstats, which they rely on, does not provide that data. To really understand the output of iostat, you should start by understanding /proc/diskstats.

# cat /proc/diskstats
   8       0 sda 239219 1806 37281259 2513275 904326 88832 50268824 26816609 0 4753060 29329105
   8       1 sda1 338 0 53241 6959 154 0 5496 3724 0 6337 10683
   8       2 sda2 238695 1797 37226458 2504489 620322 88832 50263328 25266599 0 3297988 27770221
   8      16 sdb 1009117 481 1011773 127319 0 0 0 0 0 126604 126604
   8      17 sdb1 1008792 480 1010929 127078 0 0 0 0 0 126363 126363
 253       0 dm-0 1005 0 8040 15137 30146 0 241168 2490230 0 30911 2505369
 253       1 dm-1 192791 0 35500457 2376087 359162 0 44095600 22949466 0 2312433 25325563
 253       2 dm-2 47132 0 1717329 183565 496207 0 5926560 7348763 0 2517753 7532688

/proc/diskstats has 11 fields per device; the kernel document https://www.kernel.org/doc/Documentation/iostats.txt explains what they mean. They are restated here (a small parsing sketch follows the list). Note that all fields except field #9 are cumulative values, accumulated since the system started:

  1. (rd_ios) The number of read operations completed.
  2. (rd_merges) The number of read operations that were merged. If two read operations read adjacent blocks of data, they can be merged into one to improve efficiency. Merging is usually the responsibility of the I/O scheduler (also called the elevator).
  3. (rd_sectors) The number of sectors read.
  4. (rd_ticks) The time, in milliseconds, spent on read operations. Each read is timed from __make_request() to end_that_request_last(), which includes the time spent waiting in the queue.
  5. (wr_ios) The number of write operations completed.
  6. (wr_merges) The number of write operations that were merged.
  7. (wr_sectors) The number of sectors written.
  8. (wr_ticks) The time, in milliseconds, spent on write operations.
  9. (in_flight) The number of I/Os currently outstanding. This value is incremented by 1 when an I/O request enters the queue and decremented by 1 when the I/O completes.
    Note: it is incremented when the request enters the queue, not when it is submitted to the disk device.
  10. (io_ticks) The natural (wall-clock) time during which this device was processing I/O.
    Note the difference between io_ticks and rd_ticks (field #4) / wr_ticks (field #8): rd_ticks and wr_ticks add up the time spent on each individual I/O, and because the disk device can usually handle multiple I/Os in parallel, rd_ticks and wr_ticks tend to be larger than wall-clock time. io_ticks measures the time during which the device had I/O in progress (i.e. was not idle), regardless of how many I/Os there were; it only considers whether there were any. In the actual calculation, io_ticks keeps counting while field #9 (in_flight) is non-zero and stops counting when field #9 (in_flight) is zero.
  11. (time_in_queue) A weighted version of field #10 (io_ticks). Field #10 is plain wall-clock time regardless of how many I/Os are in flight, whereas time_in_queue accumulates wall-clock time multiplied by the current number of in-flight I/Os (i.e. field #9, in_flight). Although the field is named time_in_queue, it is not only time spent in the queue; it also includes the time the disk spends handling I/O. iostat uses this field when calculating avgqu-sz.
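To make the field layout concrete, here is a minimal parsing sketch (my own illustrative helper, not part of iostat or the kernel) that maps each /proc/diskstats line to the 11 field names above; newer kernels append extra fields, which the sketch simply ignores:

    # Minimal sketch: map each /proc/diskstats line to the 11 fields described above.
    FIELDS = ["rd_ios", "rd_merges", "rd_sectors", "rd_ticks",
              "wr_ios", "wr_merges", "wr_sectors", "wr_ticks",
              "in_flight", "io_ticks", "time_in_queue"]

    def read_diskstats(path="/proc/diskstats"):
        stats = {}
        with open(path) as f:
            for line in f:
                parts = line.split()
                # parts[0:3] are major, minor and device name; the next 11 are the fields above.
                stats[parts[2]] = dict(zip(FIELDS, map(int, parts[3:14])))
        return stats

    if __name__ == "__main__":
        for dev, s in read_diskstats().items():
            print(dev, s["rd_ios"], s["wr_ios"], s["in_flight"], s["io_ticks"])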

iostat(1) computes its metrics from /proc/diskstats. Because /proc/diskstats does not separate queue waiting time from disk service time, no tool based on it can report disk service time and queue waiting time separately.
Note: in the formulas below, "Δ" denotes the difference between two samples and "Δt" denotes the sampling interval. A small sketch that applies these formulas to two /proc/diskstats samples follows the list.

    • tps: I/O operations per second = [(Δrd_ios + Δwr_ios) / Δt]
      • r/s: read operations per second = [Δrd_ios / Δt]
      • w/s: write operations per second = [Δwr_ios / Δt]
    • rkB/s: kilobytes read per second = [Δrd_sectors / Δt] * [512 / 1024]
    • wkB/s: kilobytes written per second = [Δwr_sectors / Δt] * [512 / 1024]
    • rrqm/s: merged read requests per second = [Δrd_merges / Δt]
    • wrqm/s: merged write requests per second = [Δwr_merges / Δt]
    • avgrq-sz: average number of sectors per I/O = [Δrd_sectors + Δwr_sectors] / [Δrd_ios + Δwr_ios]
    • avgqu-sz: average number of outstanding I/O requests = [Δtime_in_queue / Δt]
      (The manual describes this as the average number of I/O requests in the queue; a more accurate reading is the average number of I/O requests that have not yet completed.)
    • await: average time per I/O = [Δrd_ticks + Δwr_ticks] / [Δrd_ios + Δwr_ios]
      (Includes not only the time the disk device spends handling the I/O, but also the time spent waiting in the kernel queue.)
      • r_await: average time per read operation = [Δrd_ticks / Δrd_ios]
        (Includes not only the disk device's read time, but also the time spent waiting in the kernel queue.)
      • w_await: average time per write operation = [Δwr_ticks / Δwr_ios]
        (Includes not only the disk device's write time, but also the time spent waiting in the kernel queue.)
    • %util: busy ratio of the disk device = [Δio_ticks / Δt]
      (The fraction of time during which the device had I/O in progress, i.e. was not idle, regardless of how many I/Os were in flight.)
    • svctm: a deprecated metric with no real meaning; svctm = [%util / tput]
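As a rough illustration of how these formulas are applied (this is not the actual iostat source code), the sketch below takes two /proc/diskstats samples an interval apart and computes the deltas; the field names follow the parsing sketch shown earlier:

    import time

    FIELDS = ["rd_ios", "rd_merges", "rd_sectors", "rd_ticks",
              "wr_ios", "wr_merges", "wr_sectors", "wr_ticks",
              "in_flight", "io_ticks", "time_in_queue"]

    def read_diskstats(path="/proc/diskstats"):
        # Same parsing helper as in the earlier sketch.
        with open(path) as f:
            return {p[2]: dict(zip(FIELDS, map(int, p[3:14])))
                    for p in (line.split() for line in f)}

    def iostat_once(dev, interval=1.0):
        # Sample twice, `interval` seconds apart, and apply the formulas above to the deltas.
        s1 = read_diskstats()[dev]
        time.sleep(interval)
        s2 = read_diskstats()[dev]
        d = {k: s2[k] - s1[k] for k in FIELDS}   # Δ of each cumulative field
        dt_ms = interval * 1000.0                # Δt in milliseconds (the tick fields are in ms)
        ios = (d["rd_ios"] + d["wr_ios"]) or 1   # avoid division by zero on an idle disk
        return {
            "tps":      (d["rd_ios"] + d["wr_ios"]) / interval,
            "rkB/s":    d["rd_sectors"] / interval * 512 / 1024,
            "wkB/s":    d["wr_sectors"] / interval * 512 / 1024,
            "rrqm/s":   d["rd_merges"] / interval,
            "wrqm/s":   d["wr_merges"] / interval,
            "avgrq-sz": (d["rd_sectors"] + d["wr_sectors"]) / ios,
            "avgqu-sz": d["time_in_queue"] / dt_ms,
            "await":    (d["rd_ticks"] + d["wr_ticks"]) / ios,
            "%util":    d["io_ticks"] / dt_ms * 100.0,
        }

    print(iostat_once("sda"))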

Interpreting iostat(1) correctly helps you analyze problems correctly. Let us discuss it through some real cases.

About rrqm/s and wrqm/s

As mentioned earlier, if two I/O requests operate on adjacent blocks of data, they can be merged into one to improve efficiency. Merging is usually the responsibility of the I/O scheduler (also called the elevator).

In the following case, the same stress test was run on many disk devices, and the result was that sdb was faster than the other disks, even though all the disks were the same model. Why does sdb behave differently?

You can see that rrqm/s is 0 on the other disks but not on sdb; that is, I/O merging happens on sdb, which makes it more efficient, so its r/s and rMB/s are higher. We know that I/O merging is the responsibility of the kernel's I/O scheduler (elevator), so check sdb's /sys/block/sdb/queue/scheduler: it turns out that sdb uses a different I/O scheduler from the other disks, which is why its performance differs.
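One quick way to compare schedulers across disks is to read each device's scheduler file from sysfs; a minimal sketch assuming the standard /sys/block/<dev>/queue/scheduler layout (the active scheduler is shown in brackets):

    import glob

    # Print the I/O scheduler of every block device; the active one appears in brackets,
    # e.g. "noop deadline [cfq]".
    for path in sorted(glob.glob("/sys/block/*/queue/scheduler")):
        dev = path.split("/")[3]
        with open(path) as f:
            print(dev, f.read().strip())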

%util and hard disk device saturation

%util indicates the fraction of time during which the device had I/O in progress (i.e. was not idle), regardless of how many I/Os were in flight; it only considers whether there were any. Since modern disk devices can process multiple I/O requests in parallel, %util does not mean the device is saturated even when it reaches 100%. A simplified example: suppose a disk takes 0.1 seconds to process a single I/O and can handle 10 I/O requests at the same time. If 10 I/O requests are submitted one after another, it takes 1 second to complete them all, so %util is 100% over a 1-second sampling period; if the same 10 requests are submitted all at once, they complete in 0.1 seconds, and %util over a 1-second sampling period is only 10%. As you can see, even when %util reaches 100%, the disk may still have capacity to handle more I/O requests and is not necessarily saturated. Does iostat(1) have any metric that measures the saturation of a disk device? Sorry, no.
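The arithmetic behind that simplified example, as a tiny sketch (the numbers are purely illustrative):

    import math

    # One I/O takes 0.1 s; the device can handle 10 I/Os in parallel; the sampling period is 1 s.
    service_time, parallelism, n_ios, period = 0.1, 10, 10, 1.0

    # Submitted one after another: the device is busy for the whole second.
    busy_serial = n_ios * service_time                        # 1.0 s busy
    print("serial   %util =", busy_serial / period * 100)     # 100.0

    # Submitted all at once: the 10 I/Os run in parallel and finish together.
    busy_parallel = math.ceil(n_ios / parallelism) * service_time   # 0.1 s busy
    print("parallel %util =", busy_parallel / period * 100)         # 10.0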

How big should await be?

await is the time consumed per I/O, including both the time the disk device spends handling the I/O and the time the request spends waiting in the kernel queue. Normally the queue waiting time is negligible, so await can loosely be treated as an indicator of disk speed. So how big is normal?
For SSDs, it ranges from 0.0x milliseconds to 1.x milliseconds; see the product manual.
For mechanical hard drives, you can refer to the calculation method in the following document:
http://cseweb.ucsd.edu/classes/wi01/cse102/sol2.pdf
Roughly speaking, for a 10,000 rpm mechanical hard drive it is about 8.38 milliseconds, including seek time, rotational delay, and transfer time.
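As a rough sanity check of that figure, here is the usual seek + rotation + transfer breakdown with assumed typical numbers (these are my own illustrative values, not taken from the linked document):

    # Rough per-I/O service time for a 10,000 rpm mechanical drive (illustrative, assumed numbers).
    rpm = 10000
    avg_seek_ms = 5.0                                 # assumed average seek time
    rotational_delay_ms = 0.5 * 60.0 / rpm * 1000.0   # half a rotation on average = 3.0 ms
    transfer_ms = 4.0 / 1024 / 100 * 1000             # 4 KB at an assumed 100 MB/s ≈ 0.04 ms

    total = avg_seek_ms + rotational_delay_ms + transfer_ms
    print(round(total, 2))   # ≈ 8.04 ms, the same ballpark as the 8.38 ms quoted above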

In practice, whether await is normal depends on the application scenario. If the I/O pattern is random and the I/O load is high, the heads will be kept busy seeking and seek times will be long, so a correspondingly larger await is to be expected. If the I/O pattern is sequential and only a single process generates the I/O load, then seek time and rotational delay are negligible and only the transfer time matters, so await should be very small, even below 1 millisecond. In the example below, await is 7.50 milliseconds, which does not look large; but given that this is a dd test, i.e. a sequential read, and that only this single task is using the disk, await here should be below 1 millisecond to be considered normal:

Device:  rrqm/s  wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sdg        0.00    0.00  133.00   0.00  2128.00     0.00    16.00     1.00   7.50   7.49  99.60

For a disk array, the hardware cache means a write is acknowledged before it actually reaches the disks, so write service times are greatly accelerated: if writes to a disk array do not complete within one or two milliseconds, it is on the slow side. Reads are not necessarily as fast, because data that is not in the cache still has to be read from the physical disks, and reading a single small chunk of data is about as fast as reading from an ordinary single disk.
