Using iostat to check Linux hard disk I/O performance
Recently, the company installed several Dell PE2650 and PE2850 servers, all running 32-bit RHEL 5.1, with their SCSI hard disks configured as RAID 1. The boss asked for a unified report on hard disk I/O. Among the many tools available on Linux, iostat turned out to be the most practical one. It requires installing sysstat first, i.e. yum -y install sysstat (setting up a yum server inside the company is not the focus of this article).
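If a yum repository is reachable, a minimal install-and-verify sequence might look like the following (only a sketch; iostat ships in the sysstat package, adjust to your own environment):
# yum -y install sysstat       # installs iostat, sar, mpstat and friends
# rpm -q sysstat               # confirm the package is installed
# iostat -V                    # confirm iostat runs and print its version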
# iostat -x 1 10
Linux 2.6.18-92.el5xen 03/01/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.10    0.00    4.82   39.54    0.07   54.46

Device:    rrqm/s  wrqm/s     r/s    w/s    rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00    3.50    0.40   2.50      5.60    48.00    18.48     0.00    0.97   0.97   0.28
sdb          0.00    0.00    0.00   0.00      0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc          0.00    0.00    0.00   0.00      0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd          0.00    0.00    0.00   0.00      0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde          0.00    0.10    0.30   0.20      2.40     2.40     9.60     0.00    1.60   1.60   0.08
sdf         17.40    0.50  102.00   0.20  12095.20     5.60   118.40     0.70    6.81   2.09  21.36
sdg        232.40    1.90  379.70   0.50  76451.20    19.20   201.13     4.94   13.78   2.45  93.16
rrqm/s: number of read requests merged per second, i.e. delta(rmerge)/s
wrqm/s: number of write requests merged per second, i.e. delta(wmerge)/s
r/s: number of read I/O requests completed per second, i.e. delta(rio)/s
w/s: number of write I/O requests completed per second, i.e. delta(wio)/s
rsec/s: number of sectors read per second, i.e. delta(rsect)/s
wsec/s: number of sectors written per second, i.e. delta(wsect)/s
rkB/s: kilobytes read per second; half of rsec/s, because each sector is 512 bytes (this column has to be derived; see the sketch after these definitions)
wkB/s: kilobytes written per second; half of wsec/s (derived the same way)
avgrq-sz: average size (in sectors) of each device I/O request, i.e. delta(rsect + wsect)/delta(rio + wio)
avgqu-sz: average I/O queue length, i.e. delta(aveq)/s/1000 (because aveq is measured in milliseconds)
await: average wait time (in milliseconds) of each device I/O request, i.e. delta(ruse + wuse)/delta(rio + wio)
svctm: average service time (in milliseconds) of each device I/O request, i.e. delta(use)/delta(rio + wio)
%util: percentage of the interval during which I/O requests were being issued to the device (i.e. the I/O queue was non-empty), i.e. delta(use)/s/1000 (because use is measured in milliseconds)
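As a quick illustration of the two derived columns, the throwaway awk sketch below (using the rsec/s and wsec/s values of sdg from the sample output above) converts sectors per second to kB per second by halving them, since one sector is 512 bytes:
# awk 'BEGIN { rsec = 76451.20; wsec = 19.20; printf "rkB/s = %.1f, wkB/s = %.1f\n", rsec/2, wsec/2 }'
rkB/s = 38225.6, wkB/s = 9.6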
If %util is close to 100%, the device is receiving I/O requests as fast as it can service them and the I/O system is running at full load; the disk may well be a bottleneck.
If idle drops below 70%, the I/O pressure is already heavy and reads will generally spend a lot of time waiting.
You can also cross-check with vmstat: look at the b column (number of processes blocked waiting for resources) and the wa column (percentage of CPU time spent waiting for I/O; anything above 30% indicates heavy I/O pressure).
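For that cross-check, a throwaway invocation such as the one below is enough; note that the first sample reports averages since boot, so look at the later ones:
# vmstat 1 5
Here b is the number of processes blocked waiting for resources and wa is the percentage of CPU time spent waiting for I/O.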
Some additional points of reference; in general:
svctm is normally smaller than await (because the wait time of requests that are queued simultaneously is counted repeatedly).
svctm mainly reflects disk performance, but CPU/memory load also affects it, and an excessive number of requests will indirectly drive svctm up.
await generally depends on the service time (svctm), the length of the I/O queue, and the pattern in which I/O requests are issued.
If svctm is close to await, the I/O spends almost no time waiting;
if await is much larger than svctm, the I/O queue is too long and the application will see slow response times.
If response time exceeds what users can tolerate, consider replacing the disk with a faster one, tuning the kernel's elevator (I/O scheduler) algorithm, optimizing the application, or upgrading the CPU.
The queue length (avgqu-sz) can also be used as an indicator of system I/O load, but since it is an average per unit of time, it cannot capture instantaneous bursts of I/O.
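To make the await-versus-svctm guideline concrete, the rough sketch below takes the figures from the sdg line of the sample output above, estimates how long a request spends queued (await - svctm), and cross-checks the queue length against (r/s + w/s) * await / 1000:
# awk 'BEGIN { await = 13.78; svctm = 2.45; rps = 379.70; wps = 0.50;
               printf "queue wait ~= %.2f ms, estimated avgqu-sz ~= %.2f\n", await - svctm, (rps + wps) * await / 1000 }'
queue wait ~= 11.33 ms, estimated avgqu-sz ~= 5.24
The estimate of about 5.2 is in the same ballpark as the reported avgqu-sz of 4.94, and a queue delay of about 11 ms out of a 13.78 ms await says most of sdg's response time is spent waiting in the queue rather than being serviced.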
A nice analogy someone else came up with: the I/O system as a supermarket checkout queue.
For example, when queuing to check out at a supermarket, how do we decide which register to go to? First we look at how many people are in line: a queue of 5 is obviously going to move faster than one of 20. Besides the headcount, we also look at how much the people ahead of us are buying: if there is a lady in front who has bought a whole week's worth of groceries, it may be worth switching lines. Then there is the speed of the cashier: if you run into a trainee who cannot even count the change, you are in for a long wait. Timing matters too: a register that was packed five minutes ago may be empty now, and paying then is a breeze; of course, that only works if what you did during those five minutes was more meaningful than standing in line (though I have yet to find anything more boring than queuing).
I/O systems have many similarities with supermarket queues:
r/s + w/s is like the total number of people paying;
the average queue length (avgqu-sz) is like the average number of people in line at any given moment;
the average service time (svctm) is like the cashier's checkout speed;
the average wait time (await) is like the average time each person spends at the checkout;
the average I/O request size (avgrq-sz) is like the average number of items each person buys;
the I/O utilization (%util) is like the proportion of time someone is standing at the register.
Based on these figures, we can work out the I/O request pattern as well as the I/O speed and response time.
Below is someone else's analysis of a sample of this output.
# iostat -x 1
avg-cpu:  %user   %nice    %sys   %idle
          16.24    0.00    4.31   79.44

Device:            rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
/dev/cciss/c0d0      0.00   44.90  1.02  27.55    8.16  579.59    4.08  289.80    20.57    22.35  78.21   5.00  14.29
/dev/cciss/c0d0p1    0.00   44.90  1.02  27.55    8.16  579.59    4.08  289.80    20.57    22.35  78.21   5.00  14.29
/dev/cciss/c0d0p2    0.00    0.00  0.00   0.00    0.00    0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
The iostat output above shows that 28.57 device I/O operations are completed per second: total I/O per second = r/s (reads) + w/s (writes) = 1.02 + 27.55 = 28.57 operations per second, with writes dominating (w:r ≈ 27:1).
Each device I/O operation takes only 5 ms to service on average, yet each I/O request waits 78 ms. Why? Because so many I/O requests are issued per second (about 29); if we assume they all arrive at the same moment, the average wait time can be estimated as:
average wait time = single-request service time * (1 + 2 + ... + (total requests - 1)) / total requests
Applying this to the example above: average wait time = 5 ms * (1 + 2 + ... + 28)/29 = 70 ms, which is very close to the 78 ms average wait reported by iostat. This in turn suggests that the I/O requests are indeed issued almost simultaneously.
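The arithmetic is easy to reproduce with a quick awk one-liner (purely a sanity check of the formula above, assuming 29 simultaneous requests and a 5 ms service time):
# awk 'BEGIN { s = 0; for (i = 1; i <= 28; i++) s += i; printf "%.1f ms\n", 5 * s / 29 }'
70.0 ms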
There are many I/O requests per second (about 29), yet the average queue length is short (only around 2), which indicates that these 29 requests do not arrive evenly; the I/O system is idle most of the time.
During one second, the I/O queue contains requests only 14.29% of the time; in other words, the I/O system has nothing to do for 85.71% of the time, and all 29 I/O requests are handled within roughly 143 milliseconds.
delta(ruse + wuse)/delta(io) = await = 78.21 ms, so delta(ruse + wuse) per second = 78.21 * delta(io) per second = 78.21 * 28.57 ≈ 2232.8 ms. That is, the I/O requests issued in one second wait a combined total of about 2232.8 ms, so the average queue length should be 2232.8 ms / 1000 ms = 2.23. Yet the average queue length (avgqu-sz) reported by iostat is 22.35. Why? Because of a bug in that version of iostat: the avgqu-sz value should be 2.23, not 22.35.
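The same correction also follows from Little's law (queue length = arrival rate * wait time); the sketch below simply reuses the await and total-I/O figures quoted above:
# awk 'BEGIN { await = 78.21; iops = 28.57; printf "avgqu-sz ~= %.2f\n", await * iops / 1000 }'
avgqu-sz ~= 2.23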
※ Note: when I check servers with iostat, I usually just run iostat -d; the output mainly gives tps, Blk_read/s and Blk_wrtn/s. These three figures are generally used for comparative tests of servers in the same environment, where performance differences stand out immediately.
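For that kind of comparison, an invocation along the following lines is usually enough (the interval and count here are arbitrary; pick whatever fits your test window):
# iostat -d 2 10
It prints, per device, tps together with Blk_read/s and Blk_wrtn/s (plus the cumulative Blk_read and Blk_wrtn totals), which is exactly what you want for side-by-side runs on identically configured servers.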