Linux System Monitoring and Diagnostic Tools: I/O Wait

Source: Internet
Author: User

1. The Problem:

Recently, while rolling out real-time log synchronization, we stress-tested part of the pipeline before going live; the message queue and the client machines showed no problems. Unexpectedly, once the second log stream went live, trouble appeared:

top showed an abnormally high load on one machine in the cluster. Every machine in the cluster has identical hardware and runs the same software, yet only this one had the load problem, so our initial suspicion was a hardware fault.

At the same time, we needed to pin down the culprit behind the abnormal load, and then look for a solution at both the hardware and software levels.

2. Troubleshooting:

From top we can see that the load average is high, %wa (I/O wait) is high, and %us (user CPU) is very low:

This pattern points squarely at I/O wait, which we can then verify and narrow down with the relevant I/O diagnostic tools.
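Before reaching for heavier tools, the same figure top reports as %wa can be read directly from /proc/stat (a Linux-only interface). A minimal sketch; note this is the average since boot, not the instantaneous value top shows:

```shell
# Field 6 of the "cpu" line in /proc/stat is cumulative iowait jiffies;
# dividing by the sum of all fields gives the iowait share since boot.
awk '/^cpu /{t=0; for(i=2;i<=NF;i++) t+=$i; printf "%.1f\n", ($6/t)*100}' /proc/stat
```

On the problem machine described here, a persistently large value from repeated samples would corroborate what top shows.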

PS: If you are not familiar with how to use top, please refer to a blog post I wrote last year:

Linux System Monitoring and Diagnostic Tools: top

Several tool combinations are commonly used:
o detecting CPU bottlenecks with vmstat, sar, and iostat
o detecting memory bottlenecks with free and vmstat
o detecting disk I/O bottlenecks with iostat and dmesg
o detecting network bandwidth bottlenecks with netstat

2.1 vmstat

The vmstat command reports virtual memory statistics ("Virtual Memory Statistics"), but it also reports on the overall running state of the system: processes, memory, I/O, and so on.

Its related fields are described below:

Procs (processes)
o r: the number of processes in the run queue. If this value stays above the number of CPUs for a long time, there is a CPU bottleneck and you may need to add CPU.
o b: the number of processes waiting for I/O, that is, processes in uninterruptible sleep.

Memory
o swpd: the amount of virtual memory (swap) in use. A non-zero swpd does not hurt performance as long as si and so stay at 0.
o free: the amount of idle physical memory.
o buff: the amount of memory used as buffers.
o cache: the amount of memory used as page cache. A large cache means many files are cached; if frequently accessed files stay in the cache, disk read I/O (bi) will be very small.

Swap
o si: the amount swapped in from disk to memory per second.
o so: the amount swapped out from memory to disk per second.
Note: when memory is sufficient, both values are 0. If they stay above 0 for long, system performance suffers, because swapping consumes both disk I/O and CPU. Some people see free memory close to 0 and conclude memory is short; instead, also check si and so. If those are almost always 0, performance is not affected.

IO (on current Linux versions the block size is 1 KB)
o bi: blocks read per second.
o bo: blocks written per second.
Note: under random disk reads and writes these two values become large (for example, above 1024 KB/s), and you will also see the CPU spending more time waiting for I/O.

System
o in: interrupts per second, including clock interrupts.
o cs: context switches per second.
Note: the larger these two values, the more CPU time the kernel consumes.

CPU (as percentages)
o us: user time. If us stays above 50% for a long period, consider optimizing the program's algorithms or otherwise speeding it up.
o sy: system (kernel) time. A high sy means the kernel is consuming a lot of CPU; that is not healthy, and the cause should be investigated.
o wa: I/O wait time. A high wa means serious I/O wait, possibly caused by heavy random disk access or a disk bottleneck (block operations).
o id: idle time.
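For reference, the r and b columns come from instantaneous counters in /proc/stat, and si/so are derived from cumulative swap-page counters in /proc/vmstat (Linux-only interfaces); you can inspect the raw values directly:

```shell
# vmstat's r and b columns correspond to these /proc/stat counters:
grep -E '^procs_(running|blocked)' /proc/stat
# si and so are derived from these cumulative page counters (the second
# grep may find nothing on kernels that do not expose swap counters):
grep -E '^pswp(in|out)' /proc/vmstat || echo "no swap counters exposed"
```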
As vmstat shows, the CPU spends most of its time waiting for I/O, likely caused by heavy random disk access or a saturated disk; bi and bo are also above 1024 KB/s, so we have evidently hit an I/O bottleneck.

2.2 iostat

Next we use a more specialized disk I/O diagnostic tool, iostat, to look at the detailed statistics.

Its related fields are described below:

o rrqm/s: merged read requests per second, i.e. delta(rmerge)/s
o wrqm/s: merged write requests per second, i.e. delta(wmerge)/s
o r/s: read I/O requests completed per second, i.e. delta(rio)/s
o w/s: write I/O requests completed per second, i.e. delta(wio)/s
o rsec/s: sectors read per second, i.e. delta(rsect)/s
o wsec/s: sectors written per second, i.e. delta(wsect)/s
o rkB/s: kilobytes read per second; half of rsec/s, since each sector is 512 bytes.
o wkB/s: kilobytes written per second; half of wsec/s.
o avgrq-sz: average size (in sectors) of each I/O request, i.e. delta(rsect+wsect)/delta(rio+wio)
o avgqu-sz: average I/O queue length, i.e. delta(aveq)/s/1000 (aveq is in milliseconds)
o await: average wait time (in milliseconds) of each I/O request, i.e. delta(ruse+wuse)/delta(rio+wio)
o svctm: average service time (in milliseconds) of each I/O request, i.e. delta(use)/delta(rio+wio)
o %util: the fraction of each second spent on I/O, i.e. how much of the time the I/O queue was non-empty: delta(use)/s/1000 (use is in milliseconds)
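A worked example of the derived fields, using hypothetical interval deltas (the numbers are made up for illustration; the sector size is 512 bytes, hence kB/s is sectors per second divided by 2):

```shell
awk 'BEGIN {
  rsec = 2048; wsec = 4096   # sectors read / written in the interval (hypothetical)
  rio  = 64;   wio  = 128    # read / write requests completed (hypothetical)
  printf "rkB/s    = %d\n",   rsec / 2                    # 1024
  printf "wkB/s    = %d\n",   wsec / 2                    # 2048
  printf "avgrq-sz = %.1f\n", (rsec + wsec) / (rio + wio) # 32.0 sectors
}'
```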
We can see that of the two drives, sdb is already at 100% utilization: a serious I/O bottleneck. The next step is to find out which process is reading and writing this drive.

2.3 iotop

based on the results of iotop, we quickly locate the problem of the flume process, resulting in a lot of IO wait.
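If iotop is not installed, the same per-process accounting it reads is exposed in /proc/<pid>/io. A minimal sketch that ranks the processes you can read by cumulative bytes actually written to disk (run as root to see every process; entries you cannot read are skipped):

```shell
# Rank readable processes by the write_bytes counter in /proc/<pid>/io.
for p in /proc/[0-9]*; do
  b=$(awk '/^write_bytes/ {print $2}' "$p/io" 2>/dev/null)
  [ -n "$b" ] && echo "$b ${p#/proc/}"
done | sort -rn | head -5
```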

But as I said at the beginning, the machines in the cluster have identical configurations and the programs are deployed identically via rsync. So is the hard drive broken?

We had to ask the operations team to verify this; their final conclusion was:

sdb is a two-disk RAID1 whose RAID card, an "LSI Logic / Symbios Logic SAS1068E", has no cache. The nearly 400 IOPS of pressure has reached the hardware limit. The other machines use an "LSI Logic / Symbios Logic MegaRAID SAS 1078" card with 256 MB of cache, which has not hit its hardware bottleneck. One solution is to replace these machines with ones that provide higher IOPS.
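A back-of-the-envelope sanity check of that conclusion (the single-disk figure below is an assumption, not from the article): a two-disk RAID1 with no write cache can sustain roughly one spinning disk's worth of write IOPS, so ~400 IOPS of load would indeed be past the ceiling.

```shell
awk 'BEGIN {
  disk_iops = 180   # rough figure for one 7200 rpm SATA disk (assumption)
  observed  = 400   # the pressure reported above
  printf "sustainable share of the load = %.0f%%\n", (disk_iops / observed) * 100
}'
```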
However, as mentioned before, we first wanted to find the least expensive solution across both hardware and software:

Knowing the hardware cause, we can try moving the read and write load to another disk and observe the effect:
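One way to do that is to relocate the hot directory to a quieter disk and leave a symlink behind so the writer needs no reconfiguration. A sketch using temporary directories as stand-ins (the paths and the flume-logs name are hypothetical; in practice you would stop the writer first):

```shell
set -e
src=$(mktemp -d)   # stands in for the hot directory on the saturated disk
dst=$(mktemp -d)   # stands in for a directory on the quieter disk
echo "event" > "$src/app.log"   # pre-existing data
cp -a "$src/." "$dst/"          # copy data, preserving attributes
rm -rf "$src"                   # remove the original directory
ln -s "$dst" "$src"             # old path now points at the new disk
cat "$src/app.log"              # reads transparently through the symlink
```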

3. Final Words: Another Approach:

In fact, besides the specialized tools above, we can locate the offending processes directly from their process state.

We know a process can be in the following states:

PROCESS STATE CODES
D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent
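These letters can be read directly from /proc on Linux; the state is field 3 of /proc/<pid>/stat:

```shell
# A process reading its own stat entry is, of course, running:
awk '{print $3}' /proc/self/stat   # prints R
```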

A state of D usually means the process is waiting for I/O, the so-called "uninterruptible sleep". Starting from this, we can locate the problem step by step:

for x in `seq 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D 248 [jbd2/dm-0-8]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D [kdmflush]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----

# or:
while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done
Tue 20:03:54 CLT 2011
root       302  0.0  0.0      0     0 ?  D  May22  2:58  \_ [kdmflush]
root       321  0.0  0.0      0     0 ?  D  May22  4:11  \_ [jbd2/dm-0-8]
Tue 20:03:55 CLT 2011
Tue 20:03:56 CLT 2011

cat /proc/16528/io
rchar: 48752567
wchar: 549961789
syscr: 5967
syscw: 67138
read_bytes: 49020928
write_bytes: 549961728
cancelled_write_bytes: 0

lsof -p 16528
COMMAND    PID USER   FD   TYPE DEVICE  SIZE/OFF   NODE NAME
bonnie++ 16528 root  cwd    DIR  252,0      4096 130597 /tmp
<truncated>
bonnie++ 16528 root    8u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root    9u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   10u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   11u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   12u   REG  252,0 501219328 131869 /tmp/Bonnie.16528

df /tmp
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/workstation-root   7667140 2628608   4653920  37% /

fuser -vm /tmp
                USER     PID   ACCESS COMMAND
/tmp:           db2fenc1 1067  ....m  db2fmp
                db2fenc1 1071  ....m  db2fmp
                db2fenc1 2560  ....m  db2fmp
                db2fenc1 5221  ....m  db2fmp

