[Repost] Linux system monitoring and diagnostic tools: IO wait

Source: Internet
Author: User

1. The problem:

Recently I have been working on real-time log synchronization. Before going live we had stress-tested a single copy of the production logs, and the message queue, the client, and the machine itself were all fine. But when the second log stream was added, the problem appeared:

top showed a high load on one machine in the cluster. The machines in the cluster have identical hardware and run the same software, yet only this one had the load problem, so my initial guess was a hardware fault.

At the same time, we still needed to dig out the culprit behind the abnormal load and then look for solutions at both the hardware and the software level.

2. Troubleshooting:

From top we can see that the load average is high, %wa is high, and %us is low:

(screenshot of the top output)

From this we can roughly infer that IO has hit a bottleneck, and we can then use the relevant IO diagnostic tools to verify and troubleshoot further.

PS: If you are not familiar with how to use top, please refer to a blog post I wrote about it last year.

There are several common tool combinations (example invocations are sketched below):

vmstat, sar, and iostat to detect CPU bottlenecks

free and vmstat to detect memory bottlenecks

iostat and dmesg to detect disk I/O bottlenecks

netstat to detect network bandwidth bottlenecks
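A minimal sketch of how these tools are typically invoked; the intervals and counts are my own arbitrary choices, not from the original post:

vmstat 2 5             # overall process, memory, swap and IO activity, 5 samples 2s apart
sar -u 2 5             # CPU utilization over time (sysstat package)
iostat -x -k 2 5       # extended per-device IO statistics, in kB
free -m                # memory and swap usage, in MB
dmesg | grep -i error  # kernel messages that may point at failing disks
netstat -i             # per-interface packet and error counters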

2.1 vmstat

vmstat is short for "Virtual Memory Statistics", but beyond virtual memory it reports on the overall state of the system: processes, memory, I/O, and so on.
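A typical invocation samples every 2 seconds for 5 rounds (my own example, not necessarily the exact command behind the screenshot below):

vmstat 2 5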

(screenshot of the vmstat output)
Its related fields are described below:

Procs (processes)
- r: The number of processes in the run queue, i.e. tasks that are running or waiting for CPU. This value can be used to decide whether more CPU is needed; when it stays above the number of CPUs for a long time, there is a CPU bottleneck.
- b: The number of processes waiting for IO, i.e. processes in uninterruptible sleep.

Memory
- swpd: The amount of virtual memory (swap) in use. If swpd is non-zero but si and so stay at 0 for a long time, system performance is not affected.
- free: The amount of idle physical memory.
- buff: The amount of memory used as buffers.
- cache: The amount of memory used as page cache. A large cache means many files are being cached; if frequently accessed files can be served from the cache, the disk read IO (bi) will be very small.

Swap
- si: The amount of memory swapped in from disk per second (swap area to memory).
- so: The amount of memory swapped out to disk per second (memory to swap area).
Note: When memory is sufficient, these two values are 0. If they stay above 0 for a long time, system performance is affected, because both disk IO and CPU are consumed. Some people see that free memory is very small, or close to 0, and conclude that memory is short; do not judge by free alone, also look at si and so. If free is small but si and so are also very small (mostly 0), there is nothing to worry about and performance is not affected.

IO (in current Linux versions the block size is 1 KB)
- bi: Blocks read per second.
- bo: Blocks written per second.
Note: Under random disk reads and writes, the larger these two values are (e.g. above 1024k), the higher the CPU's IO wait will be.

System
- in: Interrupts per second, including clock interrupts.
- cs: Context switches per second.
Note: The larger these two values are, the more CPU time is consumed by the kernel.

CPU (as percentages)
- us: Percentage of CPU time spent in user processes.
When us is high, user processes are consuming a lot of CPU time; if it stays above 50% for long periods, we should consider optimizing the program's algorithm or otherwise speeding it up.
- sy: Percentage of CPU time spent in the kernel (system time).
When sy is high, the kernel is consuming a lot of CPU; this is not healthy, and we should find out why.
- wa: Percentage of CPU time spent waiting for IO.
When wa is high, IO waits are severe, which may be caused by a lot of random disk access or by a disk bottleneck (blocked operations).
- id: Percentage of idle CPU time.

From the vmstat output we can see that the CPU spends most of its time waiting for IO. This may be caused by heavy random disk access or by limited disk bandwidth; bi and bo also exceed 1024k, so we have most likely hit an IO bottleneck (a small sketch for watching wa over time follows this list).
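As an aside, here is a small sketch for watching only the wa column over time. It locates the column by name from vmstat's header row, since the column position differs between vmstat versions; this is my own illustration, not part of the original troubleshooting:

vmstat 1 10 | awk 'NR==2 {for (i=1; i<=NF; i++) if ($i=="wa") c=i} NR>2 {print "wa=" $c}'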

2.2 iostat

Next we use a more specialized disk IO diagnostic tool to look at the relevant statistics in detail.
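A typical invocation for this kind of check (my own example, not from the original post) would be something like:

iostat -x -k 2 5    # extended device statistics in kB, 5 samples 2 seconds apart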

(screenshot of the iostat -x output)
Its related fields are described below:

rrqm/s: Number of merged read requests per second, i.e. delta(rmerge)/s
wrqm/s: Number of merged write requests per second, i.e. delta(wmerge)/s
r/s: Number of read I/O requests completed per second, i.e. delta(rio)/s
w/s: Number of write I/O requests completed per second, i.e. delta(wio)/s
rsec/s: Number of sectors read per second, i.e. delta(rsect)/s
wsec/s: Number of sectors written per second, i.e. delta(wsect)/s
rkB/s: Kilobytes read per second; half of rsec/s, because each sector is 512 bytes (needs to be computed; see the worked example after this list)
wkB/s: Kilobytes written per second; half of wsec/s (needs to be computed)
avgrq-sz: Average size (in sectors) of each device I/O request, i.e. delta(rsect+wsect)/delta(rio+wio)
avgqu-sz: Average I/O queue length, i.e. delta(aveq)/s/1000 (because aveq is in milliseconds)
await: Average wait time (in milliseconds) of each device I/O request, i.e. delta(ruse+wuse)/delta(rio+wio)
svctm: Average service time (in milliseconds) of each device I/O request, i.e. delta(use)/delta(rio+wio)
%util: Percentage of the interval during which the device was handling I/O, i.e. how much of the time the I/O queue was non-empty: delta(use)/s/1000 (because use is in milliseconds)
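A quick worked example of the sector-to-kB conversion (the numbers are illustrative, not taken from the screenshot):

rsec/s = 2048 sectors/s
rkB/s  = 2048 x 512 bytes / 1024 = 1024 kB/s, i.e. rsec/s divided by 2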
We can see that of the two drives, sdb is already at 100% utilization, a serious IO bottleneck. The next step is to find out which process is reading from and writing to this disk.

2.3 iotop

(screenshot of the iotop output)

From the iotop output we quickly pin the problem on the flume process, which is generating heavy IO and causing the IO wait.
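For reference, a convenient way to run iotop for this kind of hunt (my own invocation, not from the original post):

iotop -oP -d 2    # -o: only show processes actually doing IO, -P: show processes rather than threads, -d 2: refresh every 2 seconds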

But as I said at the beginning, the machines in the cluster have the same configuration and the programs are deployed identically (rsync'ed over), so could this machine's disk be broken?

We had to ask the ops colleagues to verify this, and the final conclusion was:

sdb is a two-disk RAID1 behind an "LSI Logic / Symbios Logic SAS1068E" RAID card that has no cache. The pressure of nearly 400 IOPS had already reached the hardware limit. The other machines use an "LSI Logic / Symbios Logic MegaRAID SAS 1078" card with 256MB of cache, which had not hit its hardware bottleneck. The solution was to switch to a machine that can deliver more IOPS; in the end we moved to one with an integrated PERC 6/i RAID controller. Note that RAID metadata is stored both on the RAID card and in the disk firmware; the RAID information on the disks must match the format used by the card, otherwise the card will not recognize the disks and they will have to be reformatted.
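As an aside, a quick way to check which RAID controller a machine has (my own addition, not part of the original diagnosis):

lspci | grep -i -E 'raid|sas'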
IOPS ultimately depends on the disks themselves, but there are many ways to raise it; adding a hardware cache and using RAID arrays are the common ones. For high-IOPS scenarios such as databases, it is now popular to replace traditional mechanical disks with SSDs.
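To see roughly where a figure like "nearly 400 IOPS" sits for mechanical disks, here is a back-of-the-envelope estimate; the disk parameters are typical values I am assuming, not the actual specs of this hardware:

average rotational latency of a 15,000 RPM disk ≈ 0.5 x (60 / 15000) s = 2 ms
average seek time ≈ 3.5 ms (typical vendor figure)
single-disk random IOPS ≈ 1 / (0.002 s + 0.0035 s) ≈ 180

A two-disk RAID1 can roughly double random read IOPS, but every write goes to both disks, so about 400 random IOPS is already close to the limit of such an array without a write cache.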
However, as mentioned earlier, we also wanted to find the least expensive solution, looking at both the hardware and the software side:

Knowing the hardware cause, we can try moving the read and write load onto another disk and then look at the effect:
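One low-cost way to do this, sketched here with hypothetical paths (the real directories depend on the flume configuration, which is not shown in this post), is to relocate the data directory to a disk with spare IOPS and leave a symlink behind:

# stop the writer first, then move the data and point the old path at the new disk
mv /data/flume /data1/flume
ln -s /data1/flume /data/flume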

(screenshot of the IO statistics after moving the load)

3. Final words: an alternative approach

In fact, besides using the specialized tools above, we can also locate the offending processes directly from their process state.

We know that a process can be in the following states:

PROCESS STATE CODES
D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent
Processes in state D are usually in the so-called "uninterruptible sleep" while waiting for IO, so we can start from this and track the problem down step by step:

for x in `seq 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D 248 [jbd2/dm-0-8]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D [kdmflush]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
Or
while true; do date; ps auxf | awk '{if ($8=="D") print $0;}'; sleep 1; done
Tue 20:03:54 CLT 2011
root 302 0.0 0.0 0 0 ? D May22 2:58 \_ [kdmflush]
root 321 0.0 0.0 0 0 ? D May22 4:11 \_ [jbd2/dm-0-8]
Tue 20:03:55 CLT 2011
Tue 20:03:56 CLT 2011

cat /proc/16528/io
rchar: 48752567
wchar: 549961789
syscr: 5967
syscw: 67138
read_bytes: 49020928
write_bytes: 549961728
cancelled_write_bytes: 0
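For scale, write_bytes = 549961728 bytes is roughly 524 MB that this process has already written to disk.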

lsof -p 16528
COMMAND  PID   USER FD  TYPE DEVICE SIZE/OFF  NODE   NAME
bonnie++ 16528 root cwd DIR  252,0  4096      130597 /tmp
<truncated>
bonnie++ 16528 root 8u  REG  252,0  501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root 9u  REG  252,0  501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root 10u REG  252,0  501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root 11u REG  252,0  501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root 12u REG  252,0  501219328 131869 /tmp/Bonnie.16528
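Note that the file size 501219328 bytes is exactly 478 MiB, which matches the -s 478 argument that bonnie++ was started with in the ps output above.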

df /tmp
Filesystem                   1K-blocks Used    Available Use% Mounted on
/dev/mapper/workstation-root 7667140   2628608 4653920   37%  /

fuser -vm /tmp
        USER     PID  ACCESS COMMAND
/tmp:   db2fenc1 1067 ....m  db2fmp
        db2fenc1 1071 ....m  db2fmp
        db2fenc1 2560 ....m  db2fmp
        db2fenc1 5221 ....m  db2fmp

