Linux I/O performance monitoring tool iostat explained in detail

Source: Internet
Author: User
Tags: disk usage

Original address: http://www.ha97.com/4546.html

  

When a Linux system has a performance problem, we can generally use top, iostat, free, vmstat, and similar commands to take a first look and narrow the problem down. Of these, iostat provides the richest I/O performance data.

1. Basic use
$ iostat -d -k 1 10
The -d option displays device (disk) usage; -k forces the columns that would otherwise be reported in blocks to use kilobytes; 1 10 means the data is refreshed every 1 second and displayed 10 times in total.

$ iostat -d -k 1 10
Device:            tps    kB_read/s    kB_wrtn/s      kB_read      kB_wrtn
sda              39.29        21.14         1.44    441339807     29990031
sda1              0.00         0.00         0.00         1623          523
sda2              1.32         1.43         4.54     29834273     94827104
sda3              6.30         0.85        24.95     17816289    520725244
sda5              0.85         0.46         3.40      9543503     70970116
sda6              0.00         0.00         0.00          550          236
sda7              0.00         0.00         0.00          406            0
sda8              0.00         0.00         0.00          406            0
sda9              0.00         0.00         0.00          406            0
sda10            60.68        18.35        71.43    383002263   1490928140

Device:            tps    kB_read/s    kB_wrtn/s      kB_read      kB_wrtn
sda             327.55      5159.18       102.04         5056          100
sda1              0.00         0.00         0.00            0            0

tps: the number of transfers per second that were issued to the device. A "transfer" is a single I/O request to the device. Multiple logical requests may be merged into one I/O request, and the size of a transfer is indeterminate.

kB_read/s: the amount of data read from the device per second; kB_wrtn/s: the amount of data written to the device per second; kB_read: the total amount of data read; kB_wrtn: the total amount of data written. All of these are expressed in kilobytes.

The example above shows statistics for the disk sda and its partitions. In the first report the total tps for the disk is 39.29, and the lines below it give the tps of each partition. (Because these are instantaneous values, the total tps is not strictly equal to the sum of the per-partition tps values.)
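
Reports in this format are easy to post-process. The following is a minimal Python sketch, not part of iostat itself, that turns the device lines of `iostat -d -k` text into dictionaries. The function name `parse_basic_report` and the embedded sample are illustrative assumptions, and the parser assumes the five-column layout shown above (other sysstat versions may print different columns).

```python
def parse_basic_report(text):
    """Parse device rows of `iostat -d -k` output into a list of dicts.

    Assumes the five-column layout shown in this article:
    tps, kB_read/s, kB_wrtn/s, kB_read, kB_wrtn.
    """
    fields = ["tps", "kB_read/s", "kB_wrtn/s", "kB_read", "kB_wrtn"]
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, header lines and anything with an unexpected shape.
        if len(parts) != len(fields) + 1 or parts[0].startswith("Device"):
            continue
        try:
            values = [float(p) for p in parts[1:]]
        except ValueError:
            continue
        row = {"device": parts[0]}
        row.update(zip(fields, values))
        rows.append(row)
    return rows


# Sample text copied from the second report above.
sample = """\
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             327.55      5159.18       102.04       5056        100
sda1              0.00         0.00         0.00          0          0
"""

for row in parse_basic_report(sample):
    print(row["device"], row["tps"], row["kB_read/s"], row["kB_wrtn/s"])
```

In practice the text would come from the command's standard output rather than a string literal.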

2. The -x parameter

With the -x parameter, we can get more statistics.

$ iostat -d -x -k 1 10
Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        1.56   28.31    7.80   31.49     42.51     2.92    21.26     1.46     1.16     0.03    0.79   2.62  10.28
Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        2.00   20.00  381.00    7.00  12320.00   216.00  6160.00   108.00    32.31     1.75    4.50   2.17  84.20

rrqm/s: the number of read requests for this device that are merged per second (when a system call needs to read data, the VFS passes the request to the file system, and if the file system finds that different read requests target the same block, it merges them); wrqm/s: the number of write requests for this device that are merged per second.

rsec/s: the number of sectors read per second; wsec/s: the number of sectors written per second. r/s: the number of read requests that were issued to the device per second; w/s: the number of write requests that were issued to the device per second.

await: the average time (in milliseconds) taken to process each I/O request. This can be understood as the I/O response time. As a rule of thumb, the I/O response time should be below 5 ms; anything above 10 ms is relatively high.

%util: the share of the sampling interval that was spent processing I/O. For example, if the interval is 1 second and the device spends 0.8 seconds processing I/O and 0.2 seconds idle, then %util = 0.8/1 = 80%, so this value indicates how busy the device is. In general, a value of 100% means the device is running close to full load (although with multiple disks, %util can reach 100% without the disks being the bottleneck, because the disks can serve requests concurrently).
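
The arithmetic in the %util example can be written out as a tiny Python sketch. The function name `util_percent` and its millisecond parameters are illustrative assumptions, not anything iostat exposes.

```python
def util_percent(busy_ms, interval_ms):
    """Return %util: the share of the interval during which the device was busy."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return 100.0 * busy_ms / interval_ms


# The example from the text: 0.8 s busy within a 1 s interval.
print(util_percent(800, 1000))
```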

3. The -c parameter

iostat can also be used to get some CPU state values:

$ iostat -c 1 10
avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.98    0.00    0.35   11.45   86.22
avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.62    0.00    0.25   34.46   63.67

4. Common usage

$ iostat -d -k 1 10      # view TPS and throughput information
$ iostat -d -x -k 1 10   # view device utilization (%util) and response time (await)
$ iostat -c 1 10         # view CPU status

5. Example Analysis

$ iostat -d -k 1 | grep sda10
Device:            tps    kB_read/s    kB_wrtn/s      kB_read      kB_wrtn
sda10            60.72        18.95        71.53    395637647   1493241908
sda10           299.02      4266.67       129.41         4352          132
sda10           483.84      4589.90      4117.17         4544         4076
sda10           218.00      3360.00       100.00         3360          100
sda10           546.00      8784.00       124.00         8784          124
sda10           827.00     13232.00       136.00        13232          136

As seen above, the disk averages about 400 transfers per second, reading about 5 MB per second and writing about 1 MB per second.
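
These averages can be reproduced from the sample. The sketch below is an illustration in Python with the values copied from the sda10 output above; the helper name `averages` is hypothetical. It also shows the average with the first line left out, since the first report iostat prints covers the time since boot rather than the last interval.

```python
from statistics import mean

# (tps, kB_read/s, kB_wrtn/s) for sda10, copied from the output above.
samples = [
    (60.72, 18.95, 71.53),        # first report: average since boot
    (299.02, 4266.67, 129.41),
    (483.84, 4589.90, 4117.17),
    (218.00, 3360.00, 100.00),
    (546.00, 8784.00, 124.00),
    (827.00, 13232.00, 136.00),
]


def averages(rows):
    """Return mean tps, read MB/s and write MB/s for the given rows."""
    tps = mean(r[0] for r in rows)
    read_mb = mean(r[1] for r in rows) / 1024
    write_mb = mean(r[2] for r in rows) / 1024
    return tps, read_mb, write_mb


all_rows = averages(samples)
interval_rows = averages(samples[1:])  # skip the since-boot line
print("all rows:      %.0f tps, %.1f MB/s read, %.1f MB/s written" % all_rows)
print("interval rows: %.0f tps, %.1f MB/s read, %.1f MB/s written" % interval_rows)
```

Averaged over all six lines the figures come out near the ones quoted in the text; leaving out the since-boot line gives somewhat higher values.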

$ iostat -d -x -k 1
Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        1.56   28.31    7.84   31.50     43.65     3.16    21.82     1.58     1.19     0.03    0.80   2.61  10.29
sda        1.98   24.75  419.80    6.93  13465.35   253.47  6732.67   126.73    32.15     2.00    4.70   2.00  85.25
sda        3.06   41.84  444.90   54.08  14204.08  2048.98  7102.04  1024.49    32.57     2.10    4.21   1.85  92.24

The average response time of the disk is below 5 ms, while disk utilization is above 80%. The disk is responding normally, but it is already quite busy.
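
The same reading can be applied mechanically. The following Python sketch checks the await and %util values from the three reports above against the rules of thumb used in this article (response time below 5 ms, utilization above 80%); the names `assess`, `AWAIT_OK_MS`, and `UTIL_BUSY` are illustrative, and the thresholds are this article's guidelines rather than fixed limits.

```python
# (await in ms, %util) from the three sda reports above.
reports = [(0.80, 10.29), (4.70, 85.25), (4.21, 92.24)]

AWAIT_OK_MS = 5.0   # rule of thumb from this article
UTIL_BUSY = 80.0    # rule of thumb from this article


def assess(await_ms, util):
    """Classify one report as (response, load)."""
    response = "ok" if await_ms < AWAIT_OK_MS else "slow"
    load = "busy" if util > UTIL_BUSY else "not busy"
    return response, load


for await_ms, util in reports:
    print(await_ms, util, assess(await_ms, util))
```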

Extended:

rrqm/s: number of read requests merged per second, i.e. delta(rmerge)/s
wrqm/s: number of write requests merged per second, i.e. delta(wmerge)/s
r/s: number of read I/Os completed by the device per second, i.e. delta(rio)/s
w/s: number of write I/Os completed by the device per second, i.e. delta(wio)/s
rsec/s: number of sectors read per second, i.e. delta(rsect)/s
wsec/s: number of sectors written per second, i.e. delta(wsect)/s
rkB/s: kilobytes read per second. This is half of rsec/s, because each sector is 512 bytes. (Calculated value)
wkB/s: kilobytes written per second. This is half of wsec/s. (Calculated value)
avgrq-sz: average size (in sectors) of each device I/O operation, i.e. delta(rsect+wsect)/delta(rio+wio)
avgqu-sz: average I/O queue length, i.e. delta(aveq)/s/1000 (because aveq is in milliseconds)
await: average wait time (in milliseconds) of each device I/O operation, i.e. delta(ruse+wuse)/delta(rio+wio)
svctm: average service time (in milliseconds) of each device I/O operation, i.e. delta(use)/delta(rio+wio)
%util: the fraction of each second spent on I/O operations, or equivalently the fraction of each second during which the I/O queue is non-empty, i.e. delta(use)/s/1000 (because use is in milliseconds)
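
The formulas in this list can be collected into one short Python sketch. It assumes two snapshots of cumulative per-device counters (such as those the kernel exposes in /proc/diskstats) taken `interval_s` seconds apart. The field names follow the abbreviations used in the list, while the `Counters` type, the `extended_stats` function, and the snapshot values are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Field names follow the abbreviations used in the list above.
Counters = namedtuple(
    "Counters",
    "rio rmerge rsect ruse wio wmerge wsect wuse use aveq",
)


def extended_stats(prev, curr, interval_s):
    """Derive iostat -x style metrics from two counter snapshots."""
    d = Counters(*(c - p for p, c in zip(prev, curr)))
    n_io = d.rio + d.wio
    return {
        "rrqm/s": d.rmerge / interval_s,
        "wrqm/s": d.wmerge / interval_s,
        "r/s": d.rio / interval_s,
        "w/s": d.wio / interval_s,
        "rsec/s": d.rsect / interval_s,
        "wsec/s": d.wsect / interval_s,
        "rkB/s": d.rsect / interval_s / 2,   # 512-byte sectors
        "wkB/s": d.wsect / interval_s / 2,
        "avgrq-sz": (d.rsect + d.wsect) / n_io if n_io else 0.0,
        "avgqu-sz": d.aveq / interval_s / 1000,
        "await": (d.ruse + d.wuse) / n_io if n_io else 0.0,
        "svctm": d.use / n_io if n_io else 0.0,
        # delta(use)/s/1000 is a fraction; multiply by 100 for a percentage.
        "%util": 100 * d.use / interval_s / 1000,
    }


# Hypothetical snapshots taken one second apart.
prev = Counters(1000, 50, 80000, 4000, 500, 200, 40000, 3000, 6000, 7000)
curr = Counters(1400, 52, 92800, 5800, 510, 220, 40200, 3045, 6840, 8845)
stats = extended_stats(prev, curr, 1.0)
for name, value in stats.items():
    print(f"{name:9s} {value:10.2f}")
```

The guard on `n_io` avoids a division by zero for an idle device, for which the per-request averages are undefined.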

If %util is close to 100%, too many I/O requests are being issued, the I/O system is fully loaded, and the disk may be a bottleneck.
If %idle is below 70%, I/O pressure is fairly high, and reads generally involve a lot of waiting.

You can also use vmstat and look at the b column (the number of processes waiting for a resource) and the wa column (the percentage of CPU time spent waiting for I/O; a value above 30% indicates high I/O pressure).
In addition, await should be read together with svctm. A large difference between the two indicates an I/O problem.
avgrq-sz also deserves attention when tuning I/O. It is the amount of data handled per operation: if there are many operations but each one is small, the I/O throughput will be low; if each operation is large, the throughput will be high. It can also be checked with avgrq-sz × (r/s + w/s) = rsec/s + wsec/s. In other words, the read and write speed is determined by this value.
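
The relationship between request size and throughput can be checked against the sample data in this article. In the Python sketch below, the average request size avgrq-sz (in sectors) multiplied by the request rate (r/s + w/s) is compared with the reported rsec/s + wsec/s. The function name `sectors_per_second` is illustrative, and small differences are expected because iostat rounds each column.

```python
# (r/s, w/s, rsec/s, wsec/s, avgrq-sz) from the iostat -x samples in this article.
rows = [
    (381.00, 7.00, 12320.00, 216.00, 32.31),
    (419.80, 6.93, 13465.35, 253.47, 32.15),
    (444.90, 54.08, 14204.08, 2048.98, 32.57),
]


def sectors_per_second(r_s, w_s, avgrq_sz):
    """Estimate rsec/s + wsec/s from the request rate and average request size."""
    return avgrq_sz * (r_s + w_s)


for r_s, w_s, rsec_s, wsec_s, avgrq_sz in rows:
    estimated = sectors_per_second(r_s, w_s, avgrq_sz)
    reported = rsec_s + wsec_s
    print(f"estimated {estimated:9.2f}  reported {reported:9.2f}")
```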

You can also refer to the following:
svctm is generally smaller than await (because the waiting time of queued requests is counted repeatedly). The value of svctm mostly depends on disk performance, although CPU and memory load also affect it, and too many requests will indirectly increase svctm. The value of await generally depends on the service time (svctm), the length of the I/O queue, and the pattern in which I/O requests are issued. If svctm is close to await, I/O requests spend almost no time waiting; if await is much larger than svctm, the I/O queue is too long and application response times get slower. If the response time exceeds what users can tolerate, consider replacing the disk with a faster one, adjusting the kernel elevator algorithm, optimizing the application, or upgrading the CPU.
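
As a rough illustration of comparing the two values, the Python sketch below treats the difference between await and svctm as the time a request spends queued, and their ratio as a quick indicator of queueing. The function names are hypothetical, and the sample pairs are the interval reports quoted in this article.

```python
def queue_wait_ms(await_ms, svctm_ms):
    """Approximate time (ms) a request waits in the queue before being serviced."""
    return max(await_ms - svctm_ms, 0.0)


def await_to_svctm_ratio(await_ms, svctm_ms):
    """Ratio of await to svctm; values near 1 mean almost no queueing."""
    if svctm_ms <= 0:
        raise ValueError("svctm_ms must be positive")
    return await_ms / svctm_ms


# (await, svctm) pairs in ms, taken from the interval reports in this article.
pairs = [(4.50, 2.17), (4.70, 2.00), (4.21, 1.85)]
for await_ms, svctm_ms in pairs:
    print(
        f"await={await_ms:.2f} svctm={svctm_ms:.2f} "
        f"queue wait={queue_wait_ms(await_ms, svctm_ms):.2f} "
        f"ratio={await_to_svctm_ratio(await_ms, svctm_ms):.2f}"
    )
```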
The queue length (avgqu-sz) can also be used as an indicator of system I/O load, but because avgqu-sz is an average over the sampling interval, it does not reflect instantaneous bursts of I/O.


Someone else came up with a good analogy (I/O system vs. supermarket queuing):

For example, how do we decide which checkout to go to when queuing in a supermarket? First we look at the number of people in each line: 5 people should be faster than 20, right? Besides counting heads, we also look at how much the people in front are buying; if there is an auntie ahead of us who has bought a week's worth of food, we might consider switching lines. Then there is the speed of the cashier: with a novice who cannot even count the money properly, we are in for a wait. Timing matters too: a checkout that was crowded 5 minutes ago may be empty now, and paying at that moment is a pleasure. Of course, that assumes whatever we did in those 5 minutes was more worthwhile than queuing (though I have yet to find anything more boring than queuing).

I/O systems have many similarities with supermarket queues:

r/s + w/s is similar to the total number of people checking out
Average queue length (avgqu-sz) is similar to the average number of people queuing per unit of time
Average service time (svctm) is similar to the cashier's checkout speed
Average wait time (await) is similar to the average time each person waits
Average I/O size (avgrq-sz) is similar to the average number of items each person buys
I/O utilization (%util) is similar to the proportion of time during which someone is queued at the checkout

Based on these data we can analyze the pattern of I/O requests, as well as I/O speed and response time.
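
The queue analogy also suggests two approximate cross-checks between the columns, which follow from the delta formulas listed earlier: the average queue length is roughly the request rate times the average wait, and utilization is roughly the request rate times the average service time. The Python sketch below tests both against the interval reports quoted in this article. The function names are illustrative, and the first relation assumes that the weighted queue time (aveq) grows at about the same rate as ruse + wuse.

```python
def expected_queue_length(r_s, w_s, await_ms):
    """avgqu-sz estimate: requests per second times average wait in seconds."""
    return (r_s + w_s) * await_ms / 1000.0


def expected_util_percent(r_s, w_s, svctm_ms):
    """%util estimate: requests per second times service time, as a percentage."""
    return (r_s + w_s) * svctm_ms / 10.0


# (r/s, w/s, await, svctm, avgqu-sz, %util) from the interval reports in this article.
rows = [
    (381.00, 7.00, 4.50, 2.17, 1.75, 84.20),
    (419.80, 6.93, 4.70, 2.00, 2.00, 85.25),
    (444.90, 54.08, 4.21, 1.85, 2.10, 92.24),
]

for r_s, w_s, await_ms, svctm_ms, avgqu_sz, util in rows:
    print(
        f"queue: estimated {expected_queue_length(r_s, w_s, await_ms):.2f} "
        f"reported {avgqu_sz:.2f} | "
        f"util: estimated {expected_util_percent(r_s, w_s, svctm_ms):.2f} "
        f"reported {util:.2f}"
    )
```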


Source:

http://www.orczhou.com/index.php/2010/03/iostat-detail/

http://www.php-oa.com/2009/02/03/iostat.html
