When a Linux system has a performance problem, we can usually do the initial troubleshooting with commands such as top, iostat, free, and vmstat. Of these, iostat provides the richest I/O performance data.
1. Basic Use
$ iostat -d -k 1 10
The -d option displays device (disk) utilization; -k forces the columns that would otherwise be reported in blocks to be reported in kilobytes; 1 10 means the display is refreshed every 1 second, for a total of 10 reports.
$ iostat -d -k 1 10
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              39.29        21.14         1.44  441339807   29990031
sda1              0.00         0.00         0.00       1623        523
sda2              1.32         1.43         4.54   29834273   94827104
sda3              6.30         0.85        24.95   17816289  520725244
sda5              0.85         0.46         3.40    9543503   70970116
sda6              0.00         0.00         0.00        550        236
sda7              0.00         0.00         0.00        406          0
sda8              0.00         0.00         0.00        406          0
sda9              0.00         0.00         0.00        406          0
sda10            60.68        18.35        71.43  383002263 1490928140

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             327.55      5159.18       102.04       5056        100
sda1              0.00         0.00         0.00          0          0
tps: the number of transfers per second issued to the device. A "transfer" is one I/O request to the device. Multiple logical requests may be merged into a single I/O request, so the size of a transfer is indeterminate.
kB_read/s: the amount of data read from the device per second; kB_wrtn/s: the amount of data written to the device per second; kB_read: the total amount of data read; kB_wrtn: the total amount of data written. All of these are in kilobytes.
In the example above we can see the statistics for the disk sda and for each of its partitions. The total tps for the disk is 39.29, and the lines below it give the tps for each partition. Note that the first report shows averages since the system was booted, while each later report covers only the preceding interval, and that the total tps is not strictly equal to the sum of the per-partition tps values.
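To work with these numbers in a script, one option is to parse a report into a dictionary. The following is a minimal Python sketch, not part of iostat itself: the function name parse_device_report is hypothetical, and it assumes the five-column layout of the -d -k report shown above. It also adds up the per-partition tps to show that the sum differs from the disk total.

```python
def parse_device_report(text):
    """Parse one `iostat -d -k` report into {device: {column: value}}.

    Assumes the five-column layout shown in this article.
    """
    columns = ["tps", "kB_read/s", "kB_wrtn/s", "kB_read", "kB_wrtn"]
    stats = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip blank lines, malformed lines, and the "Device:" header line.
        if len(fields) != len(columns) + 1 or fields[0].startswith("Device"):
            continue
        stats[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return stats


# First report from the example above.
SAMPLE = """\
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              39.29        21.14         1.44  441339807   29990031
sda1              0.00         0.00         0.00       1623        523
sda2              1.32         1.43         4.54   29834273   94827104
sda3              6.30         0.85        24.95   17816289  520725244
sda5              0.85         0.46         3.40    9543503   70970116
sda6              0.00         0.00         0.00        550        236
sda7              0.00         0.00         0.00        406          0
sda8              0.00         0.00         0.00        406          0
sda9              0.00         0.00         0.00        406          0
sda10            60.68        18.35        71.43  383002263 1490928140
"""

report = parse_device_report(SAMPLE)
disk_tps = report["sda"]["tps"]
partition_tps = sum(row["tps"] for name, row in report.items() if name != "sda")
print(f"sda tps: {disk_tps:.2f}, sum of partition tps: {partition_tps:.2f}")
```

Newer versions of iostat may print different columns, so the column list would need adjusting for them.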
2. The -x parameter
We can get more statistics by using the -x parameter.
$ iostat -d -x -k 1 10
Device:    rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          1.56   28.31    7.80   31.49     42.51     2.92    21.26     1.46     1.16     0.03    0.79   2.62  10.28

Device:    rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          2.00   20.00  381.00    7.00  12320.00   216.00  6160.00   108.00    32.31     1.75    4.50   2.17  84.20
rrqm/s: the number of read requests for this device that are merged per second (requests for adjacent blocks can be combined into a single request before being sent to the device); wrqm/s: the number of write requests for this device that are merged per second.
rsec/s: the number of sectors read per second; wsec/s: the number of sectors written per second. r/s: the number of read requests issued to the device per second; w/s: the number of write requests issued to the device per second.
await: the average time (in milliseconds) taken to process each I/O request, including the time the request spends in the queue and the time spent servicing it. This can be understood as the I/O response time. In general the I/O response time should be below 5 ms; anything above 10 ms is relatively high.
%util: the time spent processing I/O during the sampling interval, divided by the length of the interval. For example, if the interval is 1 second and the device spends 0.8 seconds processing I/O and 0.2 seconds idle, then %util = 0.8/1 = 80%, so this value indicates how busy the device is. In general, a value of 100% means the device is running at full load (although if the device is backed by multiple disks, the disks may not be the bottleneck even at 100% %util, because they can serve requests concurrently).
3. The -c parameter
iostat can also be used to get some CPU statistics:
$ iostat -c 1 10
avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.98    0.00    0.35   11.45   86.22

avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.62    0.00    0.25   34.46   63.67
4. Common usage
$ iostat -d -k 1       # view tps and throughput
$ iostat -d -x -k 1    # view device utilization (%util) and response time (await)
$ iostat -c 1          # view CPU status
5. Example Analysis
$ iostat -d -k 1 | grep sda10
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda10            60.72        18.95        71.53  395637647 1493241908
sda10           299.02      4266.67       129.41       4352        132
sda10           483.84      4589.90      4117.17       4544       4076
sda10           218.00      3360.00       100.00       3360        100
sda10           546.00      8784.00       124.00       8784        124
sda10           827.00     13232.00       136.00      13232        136
As seen above, the disk averages about 400 transfers per second, reading about 5 MB per second and writing about 1 MB per second.
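The rough averages quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name column_means is hypothetical, and it averages all six sda10 samples exactly as printed, including the first one (which, being the since-boot average, pulls the figures down).

```python
from statistics import mean

# (tps, kB_read/s, kB_wrtn/s) for sda10, copied from the output above.
samples = [
    (60.72, 18.95, 71.53),
    (299.02, 4266.67, 129.41),
    (483.84, 4589.90, 4117.17),
    (218.00, 3360.00, 100.00),
    (546.00, 8784.00, 124.00),
    (827.00, 13232.00, 136.00),
]


def column_means(rows):
    """Return the mean of each column across all rows."""
    return tuple(mean(col) for col in zip(*rows))


avg_tps, avg_read_kb, avg_write_kb = column_means(samples)
print(f"tps:   {avg_tps:.0f}")
print(f"read:  {avg_read_kb / 1024:.1f} MB/s")
print(f"write: {avg_write_kb / 1024:.1f} MB/s")
```

Dropping the first sample would give a better picture of the current load, since only the later lines describe one-second intervals.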
$ iostat -d -x -k 1
Device:    rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          1.56   28.31    7.84   31.50     43.65     3.16    21.82     1.58     1.19     0.03    0.80   2.61  10.29
sda          1.98   24.75  419.80    6.93  13465.35   253.47  6732.67   126.73    32.15     2.00    4.70   2.00  85.25
sda          3.06   41.84  444.90   54.08  14204.08  2048.98  7102.04  1024.49    32.57     2.10    4.21   1.85  92.24
You can see that the disk's average response time is under 5 ms and its utilization is above 80%. The disk is responding normally, but it is already busy.
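The same reading can be expressed as a small check. The sketch below is hypothetical (assess_disk is not an iostat feature); it uses the rule-of-thumb thresholds from this article (response time under 5 ms is normal, over 10 ms is high, utilization over 80% is busy), and the "elevated" label for the 5 to 10 ms range is an assumption added for illustration.

```python
def assess_disk(await_ms, util_pct):
    """Classify one sample using the rule-of-thumb thresholds in this article."""
    if await_ms < 5:
        latency = "normal"
    elif await_ms <= 10:
        latency = "elevated"  # assumed label for the range between the two thresholds
    else:
        latency = "high"
    load = "busy" if util_pct > 80 else "ok"
    return latency, load


# (await, %util) from the last two sda rows above.
results = [assess_disk(a, u) for a, u in [(4.70, 85.25), (4.21, 92.24)]]
for latency, load in results:
    print(f"latency: {latency}, load: {load}")
```

The thresholds are guidelines only; what counts as acceptable depends on the storage hardware and the workload.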
Extended:
rrqm/s: number of read requests merged per second. That is, delta(rmerge)/s
wrqm/s: number of write requests merged per second. That is, delta(wmerge)/s
r/s: number of read I/O operations completed per second. That is, delta(rio)/s
w/s: number of write I/O operations completed per second. That is, delta(wio)/s
rsec/s: number of sectors read per second. That is, delta(rsect)/s
wsec/s: number of sectors written per second. That is, delta(wsect)/s
rkB/s: kilobytes read per second. This is half of rsec/s, because each sector is 512 bytes. (Calculated value)
wkB/s: kilobytes written per second. This is half of wsec/s. (Calculated value)
avgrq-sz: average size (in sectors) of each device I/O operation. That is, delta(rsect+wsect)/delta(rio+wio)
avgqu-sz: average I/O queue length. That is, delta(aveq)/s/1000 (because aveq is in milliseconds)
await: average wait time (in milliseconds) of each device I/O operation. That is, delta(ruse+wuse)/delta(rio+wio)
svctm: average service time (in milliseconds) of each device I/O operation. That is, delta(use)/delta(rio+wio)
%util: the fraction of each second spent on I/O operations, in other words the fraction of each second during which the I/O queue is non-empty. That is, delta(use)/s/1000 (because use is in milliseconds)
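The formulas above can be put together in a short Python sketch that derives the extended values from two snapshots of the cumulative counters. The function name extended_stats and the dictionary layout are hypothetical; the counter names (rmerge, wmerge, rio, wio, rsect, wsect, ruse, wuse, use, aveq) follow the list above. The snapshot values are invented so that their differences reproduce the second sda report in section 2; in particular, the split between ruse and wuse is an assumption, since the report only shows their combined effect.

```python
def extended_stats(prev, curr, interval_s):
    """Derive iostat -x style values from two cumulative counter snapshots."""
    d = {key: curr[key] - prev[key] for key in curr}
    ios = d["rio"] + d["wio"]  # I/O operations completed in the interval
    return {
        "rrqm/s": d["rmerge"] / interval_s,
        "wrqm/s": d["wmerge"] / interval_s,
        "r/s": d["rio"] / interval_s,
        "w/s": d["wio"] / interval_s,
        "rsec/s": d["rsect"] / interval_s,
        "wsec/s": d["wsect"] / interval_s,
        "rkB/s": d["rsect"] / interval_s / 2,  # 512-byte sectors
        "wkB/s": d["wsect"] / interval_s / 2,
        "avgrq-sz": (d["rsect"] + d["wsect"]) / ios if ios else 0.0,
        "avgqu-sz": d["aveq"] / interval_s / 1000,  # aveq is in ms
        "await": (d["ruse"] + d["wuse"]) / ios if ios else 0.0,
        "svctm": d["use"] / ios if ios else 0.0,
        # delta(use)/s/1000 is a fraction; multiply by 100 for a percentage.
        "%util": d["use"] / interval_s / 1000 * 100,
    }


# Invented snapshots taken 1 second apart (illustrative values only).
prev = dict(rmerge=1000, wmerge=5000, rio=20000, wio=9000, rsect=800000,
            wsect=120000, ruse=50000, wuse=30000, use=40000, aveq=90000)
curr = dict(rmerge=1002, wmerge=5020, rio=20381, wio=9007, rsect=812320,
            wsect=120216, ruse=51700, wuse=30046, use=40842, aveq=91750)

stats = extended_stats(prev, curr, interval_s=1.0)
for name, value in stats.items():
    print(f"{name:>9}: {value:.2f}")
```

The guards on ios avoid a division by zero when the device was completely idle during the interval.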
If %util is close to 100%, too many I/O requests are being generated, the I/O system is fully loaded, and the disk may be a bottleneck.
If %idle is below 70%, I/O pressure is relatively high, and reads generally spend more time waiting.
You can also use vmstat and look at the b column (the number of processes waiting for resources) and the wa column (the percentage of CPU time spent waiting for I/O; above 30% indicates high I/O pressure).
In addition, await should be compared with svctm. If the gap between them is too large, there is almost certainly an I/O problem.
avgrq-sz is also worth watching when tuning I/O: it is the amount of data moved by each operation. If there are many operations but each one is small, the total I/O throughput will still be small; throughput is high only when each operation moves a lot of data. This follows from avgrq-sz x (r/s + w/s) = rsec/s + wsec/s. In other words, the read and write speed is determined by the request size together with the request rate.
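The relationship between average request size (the avgrq-sz column, in sectors), request rate, and sector throughput can be checked against the sample output in section 5. The sketch below is illustrative; the helper name implied_sectors_per_second is hypothetical, and small differences are expected because iostat rounds each column to two decimals.

```python
def implied_sectors_per_second(avgrq_sz, r_per_s, w_per_s):
    """Sectors/s implied by the average request size and the request rate."""
    return avgrq_sz * (r_per_s + w_per_s)


# (r/s, w/s, rsec/s, wsec/s, avgrq-sz) from the two busy sda rows in section 5.
rows = [
    (419.80, 6.93, 13465.35, 253.47, 32.15),
    (444.90, 54.08, 14204.08, 2048.98, 32.57),
]

relative_errors = []
for r, w, rsec, wsec, avgrq in rows:
    implied = implied_sectors_per_second(avgrq, r, w)
    reported = rsec + wsec
    relative_errors.append(abs(implied - reported) / reported)
    print(f"implied {implied:.0f} sectors/s, reported {reported:.0f} sectors/s")
```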
The following points are also worth keeping in mind.
svctm is generally less than await (because the waiting time of requests that wait at the same time is counted repeatedly). svctm generally depends on disk performance, although CPU and memory load also affect it, and too many requests indirectly increase it. await generally depends on the service time (svctm), the length of the I/O queue, and the pattern in which I/O requests are issued. If svctm is close to await, I/O requests spend almost no time waiting; if await is much larger than svctm, the I/O queue is too long and response time is slow. If the response time exceeds what users can tolerate, consider switching to a faster disk, tuning the kernel elevator algorithm, optimizing the application, or upgrading the CPU.
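One way to make the await versus svctm comparison concrete is to treat the difference between them as an estimate of the time a request spends queued. This is a rough sketch under that assumption, with a hypothetical helper name; the sample values are the await and svctm columns of the two busy sda rows in section 5.

```python
def queue_wait_ms(await_ms, svctm_ms):
    """Estimate queueing time as response time minus service time (never negative)."""
    return max(await_ms - svctm_ms, 0.0)


# (await, svctm) in milliseconds, from the sample output in section 5.
samples = [(4.70, 2.00), (4.21, 1.85)]

waits = [queue_wait_ms(a, s) for a, s in samples]
for (await_ms, svctm_ms), wait in zip(samples, waits):
    share = wait / await_ms
    print(f"await {await_ms} ms, svctm {svctm_ms} ms: "
          f"about {wait:.2f} ms queued ({share:.0%} of response time)")
```

Here more than half of each request's response time is spent waiting in the queue, which is consistent with a disk that is busy but still responding normally.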
The queue length (avgqu-sz) can also be used as an indicator of system I/O load, but because avgqu-sz is an average over the sampling interval, it does not reflect momentary I/O bursts.
Here is a good example from someone else (the I/O system vs. supermarket queues).
For example, when we queue at a supermarket checkout, how do we decide which register to go to? The first thing we look at is the length of each line: surely a line of 5 people is faster than a line of 20? Besides counting heads, we often look at what the people ahead of us are buying; if there is an old lady ahead who is buying a week's worth of food, we might consider switching lines. Then there is the speed of the cashier: if we get a novice who cannot even count the money properly, we will be waiting a while. Timing also matters: a register that was overcrowded 5 minutes ago may be empty now, and paying at that moment is quick. Of course, this assumes that whatever we did during those 5 minutes was more worthwhile than queuing (though I have yet to find anything more boring than queuing).
The I/O system also has many similarities with the supermarket queues:
r/s + w/s is similar to the total number of people who paid
Average queue length (avgqu-sz) is similar to the average number of people queuing per unit of time
Average service time (svctm) is similar to the cashier's checkout speed
Average wait time (await) is similar to the average waiting time per person
Average I/O size (avgrq-sz) is similar to the average number of items each person buys
I/O utilization (%util) is similar to the proportion of time during which someone is queuing at the register
Based on this data, we can analyze the pattern of I/O requests, as well as the speed and response time of I/O.