I. iostat and iowait in detail: viewing disk bottlenecks
One, iostat basics
%iowait does not necessarily reflect a disk bottleneck
1. Installing iostat
iostat is provided by the sysstat package.
yum install sysstat -y
2. What iowait actually measures is CPU time:
%iowait = (time the CPU sits idle waiting for I/O to complete) / (total CPU time)
Note: a fast CPU can produce a high iowait value, but that does not mean the disk is the system bottleneck. The only thing that really indicates a disk bottleneck is a high per-request read/write time, generally above 20 ms, which points to abnormal disk performance. Why 20 ms? In general, a single read or write consists of seek time + rotational latency + data transfer time. Modern disks transfer the data itself in a few to a few dozen microseconds, far less than the 2-20 ms seek time and the 4-8 ms rotational latency, so adding just those two already gives roughly 15-20 ms. Once the figure exceeds 20 ms, you must consider whether the disk is being asked to read and write too often, degrading its performance.
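As a side note on where the 4-8 ms rotational-latency figure comes from: the average rotational delay is half a revolution, so it depends only on the spindle speed. A quick awk calculation (the RPM values below are just common examples):

awk 'BEGIN {
  # average rotational latency = time for half a revolution = 60000 ms / RPM / 2
  split("5400 7200 10000 15000", rpm, " ")
  for (i = 1; i <= 4; i++)
    printf "%5d RPM: average rotational latency = %.1f ms\n", rpm[i], 60000 / rpm[i] / 2
}'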
Two, using iostat to understand Linux disk I/O performance
1. iostat analysis
On AIX, the filemon tool can be used to measure the average time taken by disk reads and writes. Under Linux, disk performance can be viewed with the iostat command. One of its fields, svctm, reflects the load on the disk: if svctm is greater than 15 ms and %util is close to 100%, the disk is currently a bottleneck for overall system performance.
# iostat -x 1
Linux 3.10.0-514.26.2.el7.x86_64 (V01-OPS-ES03)   June 27, 2018   _x86_64_   (2 CPU)

avg-cpu:  %user  %nice %system %iowait %steal  %idle
          15.10   0.00    5.72   18.54   0.00  60.64

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.24    0.40  0.15  0.15   2.64   2.65    35.08     0.00   1.55    2.94    0.20   0.69   0.02
sdb        0.00    0.10  0.06  0.05   0.54   0.69    22.27     0.00   1.68    3.08    0.07   0.42   0.00
- rrqm/s: the number of merged read requests per second. delta(rmerge)/s
- wrqm/s: the number of merged write requests per second. delta(wmerge)/s
- r/s: the number of read I/O requests completed per second. delta(rio)/s
- w/s: the number of write I/O requests completed per second. delta(wio)/s
- rsec/s: the number of sectors read per second. delta(rsect)/s
- wsec/s: the number of sectors written per second. delta(wsect)/s
- rkB/s: kilobytes read per second; equal to half of rsec/s because each sector is 512 bytes. (needs calculation)
- wkB/s: kilobytes written per second; half of wsec/s. (needs calculation)
- avgrq-sz: the average size (in sectors) of each device I/O operation. delta(rsect+wsect)/delta(rio+wio)
- avgqu-sz: the average I/O queue length. That is delta(aveq)/s/1000 (because aveq is in milliseconds).
- await: the average wait time (in milliseconds) of each device I/O operation. delta(ruse+wuse)/delta(rio+wio)
- svctm: the average service time (in milliseconds) of each device I/O operation. delta(use)/delta(rio+wio)
- %util: what proportion of each second is spent on I/O, i.e. how much of each second the I/O queue is non-empty. That is delta(use)/s/1000 (because use is in milliseconds). A sketch that derives several of these values directly from /proc/diskstats follows this list.
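All the delta(...) expressions above are taken from the per-device counters the kernel exposes in /proc/diskstats. The following is only a rough bash sketch, not how sysstat itself is implemented: it samples those counters twice, one second apart, and derives a few of the iostat-style figures for a single device. The field positions follow the standard /proc/diskstats layout, and the default device name sda is just an example.

#!/bin/bash
# Sketch only: two /proc/diskstats samples one second apart, deltas turned into
# a few iostat-style numbers for one device.
dev=${1:-sda}

sample() { awk -v d="$dev" '$3 == d { print $4, $6, $7, $8, $10, $11, $13 }' /proc/diskstats; }
# picked fields: reads completed, sectors read, ms reading,
#                writes completed, sectors written, ms writing, ms spent doing I/O

read -r r1 rs1 ru1 w1 ws1 wu1 use1 <<< "$(sample)"
sleep 1
read -r r2 rs2 ru2 w2 ws2 wu2 use2 <<< "$(sample)"

awk -v r=$((r2 - r1))    -v w=$((w2 - w1)) \
    -v rs=$((rs2 - rs1)) -v ws=$((ws2 - ws1)) \
    -v ru=$((ru2 - ru1)) -v wu=$((wu2 - wu1)) -v use=$((use2 - use1)) 'BEGIN {
  printf "r/s=%d  w/s=%d  rkB/s=%.1f  wkB/s=%.1f  ", r, w, rs / 2, ws / 2   # 512-byte sectors
  if (r + w > 0) printf "await=%.2f ms  ", (ru + wu) / (r + w)              # delta(ruse+wuse)/delta(rio+wio)
  printf "%%util=%.1f%%\n", use / 10                                        # ms of I/O per 1000 ms interval
}'

Comparing its output with iostat -x 1 for the same device is a simple way to convince yourself of what each column means.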
If %util is close to 100%, too many I/O requests have been issued, the I/O system is saturated, and the disk is probably a bottleneck.
If %idle is below 70%, I/O pressure is relatively high; in general, reads are spending a lot of time waiting.
You can also combine this with vmstat and look at the b column (the number of processes waiting for a resource) and the wa column (the percentage of CPU time spent waiting for I/O; above 30% indicates high I/O pressure).
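For example, the following takes five one-second vmstat samples; b and wa are the columns referred to above:

vmstat 1 5      # watch the "b" column (processes blocked on a resource) and "wa" (CPU % waiting for I/O)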
await should also be compared against svctm; if the gap between them is too large, there is definitely an I/O problem.
avgrq-sz is also worth watching when tuning I/O: it is the average amount of data per operation. If requests are frequent but each one carries little data, the I/O throughput will be low; if each request carries a lot of data, the throughput will be high. It also satisfies avgrq-sz x (r/s + w/s) = rsec/s + wsec/s, which is to say the transfer rate is determined by these terms.
In addition, the following can be used for reference:
svctm is generally smaller than await (because the wait time of queued requests is counted repeatedly). The size of svctm is mainly related to disk performance, though CPU/memory load also affects it, and too many requests will indirectly increase svctm. await depends on the service time (svctm), the length of the I/O queue, and the pattern in which I/O requests are issued. If svctm is close to await, the I/O has almost no wait time; if await is much larger than svctm, the I/O queue is too long and the application's response time gets slower. If the response time exceeds what users can tolerate, consider replacing the disk with a faster one, adjusting the kernel elevator (I/O scheduler) algorithm, optimizing the application, or upgrading the CPU.
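One of the knobs mentioned above, the kernel elevator (I/O scheduler), can be inspected and switched through sysfs. This is only an illustration: sda is an example device, and the set of schedulers offered (noop/deadline/cfq on CentOS 7) varies between kernel versions.

cat /sys/block/sda/queue/scheduler                 # the scheduler currently in use is shown in brackets
echo deadline > /sys/block/sda/queue/scheduler     # switch at runtime (as root); lasts until reboot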
The queue length (avgqu-sz) can also serve as an indicator of system I/O load, but because avgqu-sz is an average over the sampling interval, it does not reflect instantaneous I/O bursts.
Someone has given a good analogy (the I/O system vs. queuing at a supermarket):
For example, how do we decide which checkout lane to join when queuing at a supermarket? The first thing is to look at how many people are in line: five people is surely faster than twenty. Besides the head count, we often look at what the people in front are buying; if there is an aunt in front stocking up on a week's worth of groceries, you might consider switching lanes. Then there is the cashier's speed: if you run into a novice who cannot even count money properly, there will be more waiting. Timing matters too: a checkout that was packed five minutes ago may be empty now, and paying at that moment is a breeze, provided, of course, that what you did in those five minutes was more meaningful than queuing (though I have yet to find anything more boring than queuing).
2. The I/O system has many similarities with a supermarket queue:
- r/s + w/s is like the total number of people who have arrived
- the average queue length (avgqu-sz) is like the average number of people queuing per unit time
- the average service time (svctm) is like the cashier's checkout speed
- the average wait time (await) is like each person's average wait
- the average I/O request size (avgrq-sz) is like the average number of items each person buys
- the I/O utilization (%util) is like the proportion of time someone is standing at the checkout
Based on these figures we can analyze the pattern of the I/O requests, as well as the speed and response time of the I/O.
Below is someone else's analysis of an iostat output of these parameters:
# iostat -x 1
Linux 3.10.0-514.26.2.el7.x86_64 (V01-OPS-ES03)   June 27, 2018   _x86_64_   (2 CPU)

avg-cpu:  %user  %nice %system %iowait %steal  %idle
          15.10   0.00    5.72   18.54   0.00  60.64

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.24    0.40  0.15  0.15   2.64   2.65    35.08     0.00   1.55    2.94    0.20   0.69   0.02
sdb        0.00    0.10  0.06  0.05   0.54   0.69    22.27     0.00   1.68    3.08    0.07   0.42   0.00
The iostat output above shows 28.57 device I/O operations per second: total IO/s = r/s (reads) + w/s (writes) = 1.02 + 27.55 = 28.57 (operations per second), with writes dominating (w:r = 27:1).
- The average device I/O operation takes only 5 ms to complete, yet each I/O request waits 78 ms. Why? Because too many I/O requests are issued (about 29 per second); assuming they are all issued at the same moment, the average wait time can be computed as:
- average wait time = single I/O service time x (1 + 2 + ... + (total requests - 1)) / total requests
- Applied to this example: average wait time = 5 ms x (1 + 2 + ... + 28) / 29 = 70 ms, which is very close to the 78 ms average wait reported by iostat. This in turn suggests the I/O requests are issued essentially concurrently (a quick numeric check follows below).
- With about 29 I/O requests per second but an average queue of only around 2, these 29 requests must arrive unevenly; most of the time the I/O is idle.
- In 14.29% of each second the I/O queue has requests in it; in other words, the I/O system has nothing to do for 85.71% of the time, and all 29 I/O requests are handled within about 142 milliseconds.
delta(ruse+wuse)/delta(io) = await = 78.21, so delta(ruse+wuse)/s = 78.21 * delta(io)/s = 78.21 * 28.57 = 2232.8, meaning the I/O requests issued each second need a total of 2232.8 ms of waiting. The average queue length should therefore be 2232.8 ms / 1000 ms = 2.23, yet the average queue length (avgqu-sz) reported by iostat is 22.35. Why?! Because of a bug in iostat: the avgqu-sz value should be 2.23, not 22.35.
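The arithmetic above is easy to reproduce; here is a small awk check using the numbers quoted in this analysis (5 ms service time, 29 requests, await = 78.21, 28.57 IO/s):

awk 'BEGIN {
  svctm = 5; n = 29; await = 78.21; iops = 28.57
  printf "estimated average wait = %.1f ms\n", svctm * (n - 1) * n / 2 / n   # 5 * (1+2+...+28) / 29
  printf "average queue length   = %.2f\n", await * iops / 1000              # ~2232.8 ms of waiting per second
}'

It prints roughly 70 ms and 2.23, matching the reasoning above.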
II. Finding and solving high iowait problems on Linux
Linux has a number of simple tools for finding problems, and many more advanced ones as well.
I/O wait is a problem that needs the more advanced tools to debug, along with the advanced usage of the basic tools. The reason I/O wait is hard to pin down is that we have plenty of tools that tell you your I/O is limited, but none of them directly tell you which processes are causing it.
First, confirm whether an I/O problem is causing the system to slow down
To verify whether the system is slow because of I/O we can use several commands, but the simplest is the UNIX command top.
[root@localhost ~]# top
top - 15:19:26 up 6:10,  4 users,  load average: 0.00, 0.01, 0.05
Tasks: 147 total,   1 running, 146 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.3 sy,  0.0 ni, 99.7 id, 96.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   999936 total,   121588 free,   328672 used,   549676 buff/cache
KiB Swap:  2097148 total,  2095792 free,     1356 used.
From the CPU line we can see the percentage of CPU time wasted waiting on I/O; the higher this number, the more CPU resources are waiting on I/O.
wa -- iowait: amount of time the CPU has been waiting for I/O to complete.
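If you want to capture the same information non-interactively (for a script or a log), top's batch mode can be used; the grep pattern below simply pulls out the CPU line and may need adjusting for other top versions:

top -b -n 1 | grep '%Cpu'     # one batch-mode snapshot; "wa" is the iowait percentage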
Second, find out which disk is being written to
The top command above explains I/O wait at the system level, but it does not tell you which disk is affected. To find out which disk is causing the problem, we use another command, iostat.
[root@localhost ~]# iostat -x 2 5
Linux 3.10.0-514.el7.x86_64 (localhost.localdomain)   March 03, 2017   _x86_64_   (1 CPU)

avg-cpu:  %user  %nice %system %iowait %steal  %idle
           0.34   0.00    0.31    0.01   0.00  99.33

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm   %util
sda        0.00    0.05  1.16  0.17  39.00  17.38    84.60     0.00   2.17    0.87   11.14   0.65  111.41
scd0       0.00    0.00  0.00  0.00   0.00   0.00     8.00     0.00   0.64    0.64    0.00   0.64    0.00
dm-0       0.00    0.00  1.10  0.20  37.85  17.21    84.71     0.00   2.43    0.90   10.88   0.66    0.09
dm-1       0.00    0.00  0.01  0.02   0.07   0.08     9.70     0.00   1.42    0.27    2.05   0.09    0.00
In the example above, iostat updates every 2 seconds and prints 5 reports; the -x option prints extended information.
The first iostat report shows statistics accumulated since the system was last booted, which means that in most cases the first report should be ignored; every remaining report covers the interval since the previous one. In this example the command prints 5 reports: the second report contains statistics gathered since the first, the third since the second, and so on.
In the example above, %util for sda is 111.41%, which is a good indication that some process is writing heavily to the sda disk.
Besides %util, iostat gives us richer information, such as the number of merged read/write requests per second (rrqm/s & wrqm/s) and the reads and writes completed per second (r/s & w/s), and more. In the example above the device seems to be doing a great deal of reading and writing, which is very useful when looking for the corresponding process.
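Once a suspect disk has been identified, iostat can also be pointed at just that device (sda here is taken from the example above):

iostat -x -d sda 2            # extended statistics for sda only, refreshed every 2 seconds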
Third, find the process causing the high I/O wait
[root@localhost ~]# iotop
Total DISK READ :   0.00 B/s | Total DISK WRITE :   0.00 B/s
Actual DISK READ:   0.00 B/s | Actual DISK WRITE:   0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 1028  be/4  root      0.00 B/s    0.00 B/s  0.00 %  0.00 %   sshd
The simplest way to find the culprit is to use the iotop command; looking at the iotop statistics, we can easily identify sshd as the culprit.
Although iotop is a very powerful tool and is easy to use, it is not installed by default on every Linux distribution, and I personally prefer not to rely too much on commands that are not installed by default. A system administrator may find that they cannot immediately install extra software beyond the default packages and have to wait for the next maintenance window.
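If iotop is not available and cannot be installed right away, two stand-ins that are usually already present are pidstat from the sysstat package installed earlier, and the raw per-process counters under /proc; the PID 1028 below simply reuses the example above:

pidstat -d 2 5                # per-process kB read/written per second, five 2-second samples
cat /proc/1028/io             # cumulative read/write byte counters for a single process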
Fourth, find which file is causing the I/O wait
The lsof command can show all the files opened by a process, or all the processes that have a given file open. From this list we can work out which files are actually being written, based on the size of the files and the per-file I/O figures under /proc.
We can use -p <pid> to narrow the output, where pid is the ID of the specific process.
[root@localhost ~]# lsof -p 1028
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
sshd    1028 root  cwd    DIR  253,0      233      64 /
sshd    1028 root  rtd    DIR  253,0      233      64 /
sshd    1028 root  txt    REG  253,0   819640 2393730 /usr/sbin/sshd
sshd    1028 root  mem    REG  253,0    61752  180464 /usr/lib64/libnss_files-2.17.so
sshd    1028 root  mem    REG  253,0    43928  180476 /usr/lib64/librt-2.17.so
sshd    1028 root  mem    REG  253,0    15688  269136 /usr/lib64/libkeyutils.so.1.5
sshd    1028 root  mem    REG  253,0    62744  482870 /usr/lib64/libkrb5support.so.0.1
sshd    1028 root  mem    REG  253,0    11384  180425 /usr/lib64/libfreebl3.so
sshd    1028 root  mem    REG  253,0   143352  180472 /usr/lib64/libpthread-2.17.so
sshd    1028 root  mem    REG  253,0   251784  202440 /usr/lib64/libnspr4.so
sshd    1028 root  mem    REG  253,0    20016  202441 /usr/lib64/libplc4.so
sshd    1028 root  mem    REG  253,0    15768  202442 /usr/lib64/libplds4.so
sshd    1028 root  mem    REG  253,0   182056  202443 /usr/lib64/libnssutil3.so
sshd    1028 root  mem    REG  253,0  1220240  650074 /usr/lib64/libnss3.so
sshd    1028 root  mem    REG  253,0   164048  650076 /usr/lib64/libsmime3.so
sshd    1028 root  mem    REG  253,0   276752  650077 /usr/lib64/libssl3.so
sshd    1028 root  mem    REG  253,0   121296  269112 /usr/lib64/libsasl2.so.3.0.0
sshd    1028 root  mem    REG  253,0   398264  202404 /usr/lib64/libpcre.so.1.2.0
sshd    1028 root  mem    REG  253,0  2116736  180446 /usr/lib64/libc-2.17.so
sshd    1028 root  mem    REG  253,0    15848  202439 /usr/lib64/libcom_err.so.2.1
sshd    1028 root  mem    REG  253,0   202568  482862 /usr/lib64/libk5crypto.so.3.1
sshd    1028 root  mem    REG  253,0   959008  482868 /usr/lib64/libkrb5.so.3.3
sshd    1028 root  mem    REG  253,0   324888  482858 /usr/lib64/libgssapi_krb5.so.2.2
sshd    1028 root  mem    REG  253,0   110632  180474 /usr/lib64/libresolv-2.17.so
sshd    1028 root  mem    REG  253,0    40640  180450 /usr/lib64/libcrypt-2.17.so
sshd    1028 root  mem    REG  253,0   113152  180456 /usr/lib64/libnsl-2.17.so
sshd    1028 root  mem    REG  253,0    90664  202424 /usr/lib64/libz.so.1.2.7
sshd    1028 root  mem    REG  253,0    14432  186432 /usr/lib64/libutil-2.17.so
sshd    1028 root  mem    REG  253,0    61872  766946 /usr/lib64/liblber-2.4.so.2.10.3
sshd    1028 root  mem    REG  253,0   344280  766948 /usr/lib64/libldap-2.4.so.2.10.3
sshd    1028 root  mem    REG  253,0    19344  180452 /usr/lib64/libdl-2.17.so
sshd    1028 root  mem    REG  253,0  2025472  482880 /usr/lib64/libcrypto.so.1.0.1e
sshd    1028 root  mem    REG  253,0    23968  202508 /usr/lib64/libcap-ng.so.0.0.0
sshd    1028 root  mem    REG  253,0     1557  202421 /usr/lib64/libselinux.so.1
sshd    1028 root  mem    REG  253,0    61672  539049 /usr/lib64/libpam.so.0.83.1
sshd    1028 root  mem    REG  253,0   122936  202512 /usr/lib64/libaudit.so.1.0.0
sshd    1028 root  mem    REG  253,0    42520  298848 /usr/lib64/libwrap.so.0.7.6
sshd    1028 root  mem    REG  253,0    11328  568388 /usr/lib64/libfipscheck.so.1.2.1
sshd    1028 root  mem    REG  253,0   155064  180439 /usr/lib64/ld-2.17.so
sshd    1028 root    0u   CHR    1,3      0t0    5930 /dev/null
sshd    1028 root    1u   CHR    1,3      0t0    5930 /dev/null
sshd    1028 root    2u   CHR    1,3      0t0    5930 /dev/null
sshd    1028 root    3u  IPv4  21185      0t0     TCP *:ssh (LISTEN)
sshd    1028 root    4u  IPv6  21194      0t0     TCP *:ssh (LISTEN)
To further confirm that these files are being read and written frequently, we can check with the following command:
[root@localhost ~]# df /tmp
Filesystem           1K-blocks    Used Available Use% Mounted on
/dev/mapper/cl-root   17811456 3981928  13829528  23% /
From the output of the command above, we can determine that /tmp sits on the root filesystem of our environment's logical volume.
[root@localhost ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               cl
  PV Size               19.00 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              4863
  Free PE               0
  Allocated PE          4863
  PV UUID               4qfaoy-dnso-nik1-ayn2-k6ay-wzmy-9nd2it
From pvdisplay we can see that /dev/sda2 is the physical disk on which the logical volume was created. With all the information above, we can safely say that the files listed by lsof are the ones we should be looking at.
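The same walk from a mount point down to the physical disk can be done quickly with the standard LVM2 reporting commands (the volume group name cl is taken from the pvdisplay output above):

df /tmp                       # which filesystem/logical volume holds /tmp
lvs -o +devices cl            # which physical volume(s) each logical volume in VG "cl" sits on
pvs                           # physical-volume-to-VG mapping, e.g. /dev/sda2 -> cl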