Iostat and iowait detailed explanations

Source: Internet
Author: User

Simply put: the %iowait value shown by sar -u is not a practical indicator of disk load; look instead at the svctm and %util columns of iostat -x.

Command form: iostat -x 1

This prints extended statistics once per second.

The svctm column gives the average service time per device I/O operation (in milliseconds) and reflects the load on the disk. If svctm is greater than 15 ms while %util is close to 100%, the disk has become a bottleneck for overall system performance.

The await column gives the average total time (in milliseconds) each device I/O operation takes, and should be compared with svctm. If the difference between the two is large, there is an I/O problem: if svctm is close to await, I/O has almost no queueing delay; if await is much larger than svctm, the I/O queue is too long and applications see slower response times.


The value of await generally depends on svctm, the I/O queue length, and the pattern in which I/O requests are issued. If svctm is close to await, there is little I/O waiting and disk performance is good; if await is much higher than svctm, the I/O queue wait is too long and applications running on the system will slow down. In that case the problem can be addressed by moving to a faster disk.
%util is also an important metric for disk I/O. If %util is close to 100%, the disk is receiving too many I/O requests and the I/O system is saturated; the disk may be a bottleneck. Over the long run this is bound to hurt system performance. The problem can be addressed by optimizing the application or by moving to a faster disk.


svctm is normally around 20 ms; a fast CPU helps keep it low, since CPU and memory load also affect svctm.


%iowait does not reflect disk bottlenecks.

iowait is actually measured as CPU time:
%iowait = (CPU time spent idle while I/O is outstanding) / (total CPU time)
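As a rough illustration (a sketch, not how sar itself is implemented), the cumulative counters in /proc/stat can be used to compute %iowait. The sample line below is hypothetical, and a real monitor would take the difference of two snapshots rather than the since-boot totals:

```python
def iowait_percent(stat_cpu_line):
    """Compute %iowait from a /proc/stat 'cpu' line.

    Per proc(5), the fields after 'cpu' are, in jiffies:
    user nice system idle iowait irq softirq steal [guest guest_nice]
    %iowait = iowait time / total CPU time.
    """
    fields = [int(v) for v in stat_cpu_line.split()[1:]]
    return 100.0 * fields[4] / sum(fields)

# Hypothetical snapshot; real code would diff two reads of /proc/stat.
sample = "cpu  1000 50 300 5000 2400 10 20 0 0 0"
pct = iowait_percent(sample)
```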


Using iostat to understand Linux disk I/O performance

I had never used this tool carefully before, but an important server came under heavy load, so I sat down to study iostat and used it for the analysis. The following is from a server under I/O pressure.

$ iostat -x 1
Linux 2.6.33-fukai (fukai-laptop)           _i686_     (2 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.47    0.50    8.96   48.26    0.00   36.82

rrqm/s: number of read requests merged per second. delta(rmerge)/s
wrqm/s: number of write requests merged per second. delta(wmerge)/s
r/s: number of read I/O requests completed per second. delta(rio)/s
w/s: number of write I/O requests completed per second. delta(wio)/s
rsec/s: number of sectors read per second. delta(rsect)/s
wsec/s: number of sectors written per second. delta(wsect)/s
rkB/s: kilobytes read per second; half of rsec/s, because each sector is 512 bytes. (Needs calculation)
wkB/s: kilobytes written per second; half of wsec/s. (Needs calculation)
avgrq-sz: average size (in sectors) of each device I/O operation. delta(rsect+wsect)/delta(rio+wio)
avgqu-sz: average I/O queue length, i.e. delta(aveq)/s/1000 (because aveq is in milliseconds).
await: average total time (in milliseconds) of each device I/O operation. delta(ruse+wuse)/delta(rio+wio)
svctm: average service time (in milliseconds) per device I/O operation. delta(use)/delta(rio+wio)
%util: the fraction of each second spent on I/O, i.e. the fraction of each second in which the I/O queue is non-empty: delta(use)/s/1000 (because use is in milliseconds)
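To make the formulas above concrete, here is a sketch that derives the main columns from the listed delta quantities. The one-second counter deltas below are made up for illustration, not real kernel data:

```python
def derive(rio, wio, rsect, wsect, ruse, wuse, use, aveq, interval_ms=1000):
    """Derive iostat -x columns from per-interval deltas of the kernel's
    per-disk counters, using the formulas listed above.

    rsect/wsect are 512-byte sectors; ruse/wuse/use/aveq are milliseconds.
    """
    nio = (rio + wio) or 1        # avoid division by zero on an idle disk
    sec = interval_ms / 1000.0
    return {
        "r/s":      rio / sec,
        "w/s":      wio / sec,
        "rkB/s":    rsect / sec / 2,          # sectors -> kB (512 B each)
        "wkB/s":    wsect / sec / 2,
        "avgrq-sz": (rsect + wsect) / nio,
        "avgqu-sz": aveq / interval_ms,       # queued-ms per elapsed-ms
        "await":    (ruse + wuse) / nio,
        "svctm":    use / nio,
        "%util":    100.0 * use / interval_ms,
    }

# Hypothetical deltas over one second:
m = derive(rio=10, wio=30, rsect=80, wsect=640, ruse=100, wuse=700, use=200, aveq=800)
```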

If %util is close to 100%, there are too many I/O requests, the I/O system is saturated, and the disk may be a bottleneck.

If %idle is below 70%, I/O pressure is relatively high, and reads generally spend more time waiting.

You can also cross-check with vmstat: the b column (number of processes waiting for a resource) and the wa column (percentage of CPU time spent waiting for I/O; above 30% indicates high I/O pressure).

In addition, await should be compared with svctm; if the difference between the two is large, there is an I/O problem.

avgrq-sz also deserves attention during I/O tuning, since it is the average amount of data moved per operation: many operations each carrying little data means low effective I/O throughput, while large operations mean high throughput. The relationship avgrq-sz × (r/s + w/s) = rsec/s + wsec/s holds; in other words, the transfer rate is determined by these values together.

In addition, note the following:

svctm is generally smaller than await (because the wait time of simultaneously queued requests is counted repeatedly). The size of svctm is generally related to disk performance; CPU and memory load also affect it, and too many requests indirectly increase svctm. The size of await typically depends on the service time (svctm), the I/O queue length, and the pattern in which I/O requests are issued. If svctm is close to await, I/O has almost no wait time; if await is much larger than svctm, the I/O queue is too long and application response times get slower. If response times exceed what users can tolerate, consider replacing the disk with a faster one, tuning the kernel's elevator (I/O scheduler) algorithm, optimizing the application, or upgrading the CPU.

The queue length (avgqu-sz) can also serve as a metric of system I/O load, but because avgqu-sz is averaged over the sampling interval, it does not reflect instantaneous I/O bursts.

Someone else gave a good analogy (the I/O system as a supermarket checkout queue):

How do we decide which checkout lane to join when queuing at the supermarket? First we look at the number of people in line: a lane of 5 is surely faster than one of 20. Besides the head count, we also look at what the people in front are buying; if there is an aunt ahead stocking up food for a whole week, you may want to switch lanes. Then there is the cashier's speed: if you get a novice who can't even count the money clearly, you will be in for a wait. Timing matters too: a register that was mobbed five minutes ago may be empty now, and paying then is a breeze, provided, of course, that what you did in those five minutes was more worthwhile than standing in line (though I have yet to find anything more boring than queuing).

I/O systems also have many similarities with supermarket queues:

r/s + w/s is like the total number of customers who have arrived

The average queue length (avgqu-sz) is like the average number of people in line per unit time

The average service time (svctm) is like the cashier's checkout speed

The average wait time (await) is like each customer's average wait

The average I/O size (avgrq-sz) is like the average number of items each customer buys

The I/O utilization (%util) is like the fraction of time someone is standing at the register

Based on these numbers we can analyze the I/O request pattern, and the speed and response time of the I/O system.

Below is someone else's analysis of this command's output:

# iostat -x 1
avg-cpu:  %user   %nice    %sys   %idle
          16.24    0.00    4.31   79.44
Device:         rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
/dev/cciss/c0d0
                  0.00   44.90  1.02  27.55    8.16  579.59   4.08  289.80    20.57    22.35  78.21   5.00  14.29

The iostat output above shows 28.57 device I/O operations per second: total I/O per second = r/s + w/s = 1.02 + 27.55 = 28.57 (operations/second), with writes dominating (w:r = 27:1).

Each device I/O operation takes on average only 5 ms to complete, yet each I/O request waits 78 ms. Why? Because there are too many I/O requests (about 29 per second); if we assume they are all issued at the same moment, the average wait time can be computed as:

average wait time = single I/O service time × (1 + 2 + ... + (total requests − 1)) / total requests

Applied to the example above: average wait time = 5 ms × (1 + 2 + ... + 28) / 29 = 70 ms, very close to the average wait of 78 ms reported by iostat. This in turn suggests that the I/O requests were indeed issued concurrently.
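That arithmetic can be checked mechanically. The sketch below assumes n simultaneous requests served first-come-first-served, so the i-th request waits i × service time:

```python
def avg_wait_ms(service_ms, n):
    """Average wait when n requests arrive at once and are served FIFO:
    waits are 0, s, 2s, ..., (n-1)s, so the mean is s*(0+1+...+(n-1))/n."""
    return service_ms * sum(range(n)) / n

print(avg_wait_ms(5, 29))   # 70.0, close to the 78 ms iostat reports
```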

There are about 29 I/O requests per second, yet the average queue length is only around 2, which shows that these 29 requests arrive unevenly; most of the time the I/O system is idle.

The I/O queue holds requests during only 14.29% of each second; in other words, the I/O system has nothing to do 85.71% of the time, and all 29 I/O requests are processed within about 143 milliseconds.
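A cross-check on those numbers: 29 requests × 5 ms of service each is about 143 ms of device-busy time per second, which matches the reported %util:

```python
iops = 28.57        # r/s + w/s from the example
svctm_ms = 5.00     # average service time per operation
util_pct = 14.29    # reported %util

busy_ms = iops * svctm_ms          # ~142.9 ms of busy time per second
util_ms = util_pct / 100 * 1000    # %util expressed as ms per second
```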

delta(ruse+wuse)/delta(io) = await = 78.21 ms, so delta(ruse+wuse)/s = 78.21 × delta(io)/s = 78.21 × 28.57 ≈ 2234.5, meaning the I/O requests accumulate a total of about 2234.5 ms of wait per second. The average queue length should therefore be 2234.5 ms / 1000 ms ≈ 2.23, yet the average queue length (avgqu-sz) reported by iostat is 22.35. Why?! Because of a bug in iostat: the avgqu-sz value should be 2.23, not 22.35.
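This is just Little's law (average queue length = arrival rate × average wait time); redoing the arithmetic:

```python
await_ms = 78.21    # average wait per request
iops = 28.57        # requests per second

wait_ms_per_sec = await_ms * iops       # total wait accumulated per second
avgqu_sz = wait_ms_per_sec / 1000.0     # queued-ms per elapsed-ms = avg queue length
print(round(avgqu_sz, 2))               # 2.23, not the 22.35 iostat printed
```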

