Linux system monitoring and diagnostic tools for I/O wait
1. Problem:
Recently I was working on real-time log synchronization. Before the release, we ran a stress test against the online logs: the message queue, the client, and the local machine all looked normal, but unexpectedly, once the second batch of logs was uploaded, a problem appeared:
top on one machine in the cluster showed an abnormally high load. Every machine in the cluster has the same hardware configuration and runs the same software, yet this one server alone had a load problem, so our preliminary guess was a hardware fault.
At the same time, we also needed to identify the culprit behind the abnormal load and look for solutions at both the software and the hardware level.
2. Troubleshooting:
From top we can see that the load average is very high, %wa is very high, and %us is very low:
We can roughly infer that I/O has hit a bottleneck. Next we can use the relevant I/O diagnostic tools to verify this and narrow the problem down.
PS: If you are not familiar with top, please refer to a blog post I wrote last year:
Linux system monitoring and diagnostic tools
There are several common combinations:
• Use vmstat, sar, and iostat to detect CPU bottlenecks
• Use free and vmstat to detect memory bottlenecks
• Use iostat and dmesg to detect disk I/O bottlenecks
• Use netstat to detect network bandwidth bottlenecks
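Before reaching for those tools, a quick first pass is to compare the load average with the CPU count. A minimal sketch (assuming a Linux /proc filesystem and getconf; the "load above CPU count" threshold is a rough rule of thumb, not from this article):

```shell
# Rough first-pass check (assumes Linux /proc and getconf):
# compare the 1-minute load average against the online CPU count.
load=$(cut -d' ' -f1 /proc/loadavg)   # 1-minute load average
cpus=$(getconf _NPROCESSORS_ONLN)     # number of online CPUs
verdict=$(awk -v l="$load" -v c="$cpus" 'BEGIN {
  if (l > c) print "load " l " exceeds " c " CPUs: dig in with vmstat/iostat"
  else       print "load " l " is within " c " CPUs"
}')
echo "$verdict"
```

A load persistently above the CPU count only says the run queue is long; the tools listed above are what distinguish a CPU, memory, disk, or network cause.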
2.1 vmstat
The vmstat command displays virtual memory statistics ("Virtual Memory Statistics"), but it can also report the overall state of processes, memory, I/O, and other parts of the system.
Its related fields are described as follows:
- Procs (processes)
- • r: the number of processes in the run queue. This value can also be used to judge whether more CPUs are needed (if it stays above 1 for a long time).
- • b: the number of processes waiting for I/O, i.e., processes in uninterruptible sleep. It shows tasks that are executing or waiting for resources; when this value exceeds the number of CPUs, a bottleneck appears.
- Memory
- • swpd: the amount of virtual memory (swap) used. If swpd is non-zero but si and so stay at 0 for a long time, system performance is not affected.
- • free: the amount of idle physical memory.
- • buff: the amount of memory used as buffers.
- • cache: the amount of memory used as page cache. A large cache value means many files are cached; if frequently accessed files are cached, the disk read I/O (bi) will be very small.
- Swap
- • si: the amount of memory swapped in from disk per second.
- • so: the amount of memory swapped out to disk per second.
- Note: when memory is sufficient, both values are 0. If they stay above 0 for a long time, system performance suffers, since swapping consumes both disk I/O and CPU. Some people see very little free memory (close to 0) and conclude that memory is running out; but as long as si and so are also small (0 most of the time), there is no need to worry, and system performance is not affected.
- IO (the block size in current Linux versions is 1 KB)
- • bi: the number of blocks read per second.
- • bo: the number of blocks written per second.
- Note: with random disk reads and writes, the larger these two values (e.g., above 1024 KB), the more CPU time is spent waiting for I/O.
- System
- • in: the number of interrupts per second, including the clock interrupt.
- • cs: the number of context switches per second.
- Note: the larger these two values, the more CPU time the kernel consumes.
- CPU (in percentages)
- • us: percentage of CPU time spent on user processes (user time).
- When us is high, user processes are consuming a lot of CPU time; if it stays above 50% for a long time, we should consider optimizing the program's algorithms or otherwise speeding it up.
- • sy: percentage of CPU time spent in the kernel (system time).
- When sy is high, the kernel is consuming a lot of CPU, which is not a healthy sign; we should investigate the cause.
- • wa: percentage of CPU time spent waiting for I/O.
- When wa is high, I/O wait is severe, which may be caused by heavy random disk access or by a disk bottleneck (block operations).
- • id: percentage of idle CPU time.
From vmstat we can see that the CPU spends most of its time waiting for I/O, which may be caused by heavy random disk access or by disk bandwidth; bi and bo both exceed 1024 KB, confirming the I/O bottleneck.
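That reading can be scripted. A small sketch (the embedded vmstat sample and the 30%/1024 thresholds are illustrative assumptions; in real use you would pipe `vmstat 1 10` in instead):

```shell
# Sketch: flag vmstat intervals with high I/O wait. The sample output
# below is made up for illustration; in real use, pipe in `vmstat 1 10`.
# Column positions assume the usual procps vmstat layout (wa is $16).
vmstat_sample='procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3      0  12345  67890 123456    0    0  2048  4096  500  600  2  3 10 85  0'

flag=$(echo "$vmstat_sample" | awk '
  NR > 2 {                                  # skip the two header lines
    bi = $9; bo = $10; wa = $16
    if (wa > 30 && bi + bo > 1024)          # illustrative thresholds
      print "I/O bottleneck suspected: wa=" wa "% bi=" bi " bo=" bo
  }')
echo "$flag"
```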
2.2 iostat
Next, we use a more specialized disk I/O diagnostic tool, iostat, to look at the detailed statistics.
Its related fields are described as follows:
- rrqm/s: the number of read requests merged per second, i.e., delta(rmerge)/s
- wrqm/s: the number of write requests merged per second, i.e., delta(wmerge)/s
- r/s: the number of read I/O operations completed per second, i.e., delta(rio)/s
- w/s: the number of write I/O operations completed per second, i.e., delta(wio)/s
- rsec/s: the number of sectors read per second, i.e., delta(rsect)/s
- wsec/s: the number of sectors written per second, i.e., delta(wsect)/s
- rkB/s: the number of kilobytes read per second; half of rsec/s, because each sector is 512 bytes (computed)
- wkB/s: the number of kilobytes written per second; half of wsec/s (computed)
- avgrq-sz: the average size (in sectors) of each device I/O operation, i.e., delta(rsect + wsect)/delta(rio + wio)
- avgqu-sz: the average I/O queue length, i.e., delta(aveq)/s/1000 (because aveq is measured in milliseconds)
- await: the average wait time (in milliseconds) of each device I/O operation, i.e., delta(ruse + wuse)/delta(rio + wio)
- svctm: the average service time (in milliseconds) of each device I/O operation, i.e., delta(use)/delta(rio + wio)
- %util: the percentage of each second spent on I/O operations, i.e., the fraction of time the I/O queue is non-empty: delta(use)/s/1000 (because use is measured in milliseconds)
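The delta formulas above can be checked with a small calculation. A sketch using made-up sample deltas (all the numbers below are hypothetical, chosen only to exercise the formulas):

```shell
# Sketch of the iostat delta formulas, using made-up sample deltas
# over a 1-second interval (all numbers hypothetical).
out=$(awk 'BEGIN {
  rio = 40; wio = 60          # read/write requests completed
  rsect = 800; wsect = 1600   # sectors transferred (512 bytes each)
  ruse_wuse = 900             # total ms requests spent queued + serviced
  use = 850                   # ms the device was busy
  interval = 1                # seconds between the two samples

  printf "avgrq-sz = %.1f sectors\n", (rsect + wsect) / (rio + wio)
  printf "await    = %.1f ms\n",      ruse_wuse / (rio + wio)
  printf "svctm    = %.1f ms\n",      use / (rio + wio)
  printf "%%util    = %.1f%%\n",      use / (interval * 1000) * 100
}')
echo "$out"
```

With these sample deltas, the device would be 85% busy with an average request of 24 sectors (12 KB), which is consistent with a small-random-I/O workload.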
We can see that of the two hard disks, sdb is already at 100% utilization: a severe I/O bottleneck. The next step is to find out which process is reading from and writing to this disk.
2.3 iotop
Based on the iotop results, we quickly determined that the flume process was the problem, generating a large amount of IO wait.
But as I said at the beginning, the machines in the cluster have the same configuration, and the deployed programs were all rsync'ed over and are identical. Could the hard disk be broken?
I had to investigate with the operations (O&M) staff. The final conclusion:
sdb is a two-disk RAID 1 on an "LSI Logic / Symbios Logic SAS1068E" RAID card with no cache. Its nearly 400 IOPS had reached the hardware limit. The other machines use "LSI Logic / Symbios Logic MegaRAID SAS 1078" RAID cards with a 256 MB cache, which have not hit the hardware bottleneck. The solution is to replace the machine with one that provides higher IOPS.
However, as mentioned above, we want to start from both the software and the hardware side, to see whether we can find the lowest-cost solution:
Now that we know the hardware cause, we can try moving the read/write operations to another disk and then observe the effect:
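One low-cost, software-side way to do that is to move the hot directory onto the other, less-loaded disk and leave a symlink at the old path. A hedged sketch (the flume data path is hypothetical, and temp directories stand in for the two disks so the steps can be tried safely):

```shell
# Sketch: relocate a write-heavy directory to another disk and leave a
# symlink at the old path. The paths are hypothetical; temp dirs stand
# in for the busy disk (sdb) and the idle disk so this is safe to run.
src=$(mktemp -d)/flume_data       # stand-in for the hot dir on sdb
dst=$(mktemp -d)                  # stand-in for a mount on the idle disk
mkdir -p "$src"
echo "log line" > "$src/app.log"

mv "$src" "$dst/flume_data"       # 1. (after stopping the writer) move data
ln -s "$dst/flume_data" "$src"    # 2. symlink the old path to the new home

content=$(cat "$src/app.log")     # reads transparently through the symlink
echo "$content"
```

The process keeps using its configured path, but the physical I/O now lands on the other spindle; remember to stop the writer before the move so no data is lost mid-copy.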
3. Final words: another approach
In fact, besides using the professional tools above to locate the problem, we can also start directly from the process state to find the relevant processes.
We know that the process has the following statuses:
- PROCESS STATE CODES
- D uninterruptible sleep (usually IO)
- R running or runnable (on run queue)
- S interruptible sleep (waiting for an event to complete)
- T stopped, either by a job control signal or because it is being traced.
- W paging (not valid since the 2.6.xx kernel)
- X dead (should never be seen)
- Z defunct ("zombie") process, terminated but not reaped by its parent.
Among these, state D is the so-called "uninterruptible sleep", usually caused by waiting for IO. We can start from this point and then narrow down the problem step by step:
for x in `seq 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D 248 [jbd2/dm-0-8]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D 22 [kdmflush]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
# Or:
while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done
Tue Aug 23 20:03:54 CLT 2011
root       302  0.0  0.0      0     0 ?        D    May22   \_ [kdmflush]
root       321  0.0  0.0      0     0 ?        D    May22   \_ [jbd2/dm-0-8]
Tue Aug 23 20:03:55 CLT 2011
Tue Aug 23 20:03:56 CLT 2011
cat /proc/16528/io
rchar: 48752567
wchar: 549961789
syscr: 5967
syscw: 67138
read_bytes: 49020928
write_bytes: 549961728
cancelled_write_bytes: 0
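These counters can also be read programmatically. A minimal sketch (assuming Linux; it uses the current shell's own PID so /proc/&lt;pid&gt;/io is guaranteed to exist and be readable):

```shell
# Sketch: pull write_bytes out of /proc/<pid>/io. We use the current
# shell's PID ($$) so the file exists and is readable by us; in real
# troubleshooting you would substitute the suspect PID found above.
pid=$$
wb=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io")
echo "PID $pid write_bytes: $wb"
```

Sampling write_bytes twice and subtracting gives a per-process write rate, which is essentially what iotop does for you.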
lsof -p 16528
COMMAND    PID USER   FD   TYPE DEVICE  SIZE/OFF   NODE NAME
bonnie++ 16528 root  cwd    DIR  252,0      4096 130597 /tmp
<truncated>
bonnie++ 16528 root    8u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root    9u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   10u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   11u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   12u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
df /tmp
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/workstation-root   7667140 2628608   4653920  37% /
fuser -vm /tmp
        USER       PID ACCESS COMMAND
/tmp:   db2fenc1  1067 ....m  db2fmp
        db2fenc1  1071 ....m  db2fmp
        db2fenc1  2560 ....m  db2fmp
        db2fenc1  5221 ....m  db2fmp
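The steps above can be tied together in one loop: find every process currently in state D and dump its /proc/&lt;pid&gt;/io counters. A sketch (assumes Linux procps; on a healthy machine it often prints nothing, since D states are usually short-lived):

```shell
# Sketch: list processes currently in state D and dump their per-process
# I/O counters. Often prints no PIDs on a healthy machine, because
# uninterruptible sleeps are normally very brief.
dpids=$(ps -e -o state= -o pid= | awk '$1=="D" {print $2}')
dcount=$(printf '%s\n' "$dpids" | awk 'NF {n++} END {print n+0}')
echo "processes in state D: $dcount"
for pid in $dpids; do
  echo "== PID $pid =="
  cat "/proc/$pid/io" 2>/dev/null   # may vanish if the process exits
done
```

Run it a few times in a row: a PID that appears repeatedly with fast-growing read_bytes/write_bytes is the same culprit iotop would point at.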
4. References:
[1] Troubleshooting High I/O Wait in Linux -- A walkthrough on how to find processes that are causing high I/O Wait on Linux Systems
http://bencane.com/2012/08/06/troubleshooting-high-io-wait-in-linux/
[2] Understanding Linux system load
http://www.ruanyifeng.com/blog/2011/07/linux_load_average_explained.html
[3] 24 iostat, vmstat and mpstat Examples for Linux Performance Monitoring
http://www.thegeekstuff.com/2011/07/iostat-vmstat-mpstat-examples/
[4] vmstat command
http://man.linuxde.net/vmstat
[5] Linux vmstat command
http://www.cnblogs.com/ggjucheng/archive/2012/01/05/2312625.html
[6] Factors affecting Linux server performance
http://www.rocklv.net/2004/news/article_284.html
[7] Viewing Linux disk I/O with iostat and vmstat
http://blog.csdn.net/qiudakun/article/details/4699587
[8] What Process is using all of my disk IO
http://stackoverflow.com/questions/488826/what-process-is-using-all-of-my-disk-io
[9] Linux Wait IO Problem
http://www.chileoffshore.com/en/interesting-articles/126-linux-wait-io-problem
[10] Tracking Down High IO Wait in Linux
http://ostatic.com/blog/tracking-down-high-io-wait-in-linux
From: http://my.oschina.net/leejun2005/blog/355915