Linux system monitoring and diagnostic tools for I/O wait
1. Problem:
Recently I was working on real-time log synchronization. Before the release, we ran a stress test against the online logs: the message queue, the client, and the local machine all looked normal, but unexpectedly, once the second batch of logs was uploaded, a problem appeared:
top on one machine in the cluster showed an abnormally high load. Every machine in the cluster has the same hardware configuration and runs the same software, yet this one server alone had a load problem, so our preliminary guess was a hardware fault.
At the same time, we also needed to identify the culprit behind the abnormal load and look for solutions at both the software and the hardware level.
2. Troubleshooting:
From top we can see that the load average is very high, %wa is very high, and %us is very low:
We can roughly infer that I/O has hit a bottleneck. Next we can use the relevant I/O diagnostic tools to verify this and narrow the problem down.
PS: If you are not familiar with top, please refer to a blog post I wrote last year:
Linux system monitoring and diagnostic tools
There are several common combinations:
• Use vmstat, sar, and iostat to detect CPU bottlenecks
• Use free and vmstat to detect memory bottlenecks
• Use iostat and dmesg to detect disk I/O bottlenecks
• Use netstat to detect network bandwidth bottlenecks
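Before reaching for those tools, a quick first pass is to compare the load average with the CPU count. A minimal sketch (assuming a Linux /proc filesystem and getconf; the "load above CPU count" threshold is a rough rule of thumb, not from this article):

```shell
# Rough first-pass check (assumes Linux /proc and getconf):
# compare the 1-minute load average against the online CPU count.
load=$(cut -d' ' -f1 /proc/loadavg)   # 1-minute load average
cpus=$(getconf _NPROCESSORS_ONLN)     # number of online CPUs
verdict=$(awk -v l="$load" -v c="$cpus" 'BEGIN {
  if (l > c) print "load " l " exceeds " c " CPUs: dig in with vmstat/iostat"
  else       print "load " l " is within " c " CPUs"
}')
echo "$verdict"
```

A load persistently above the CPU count only says the run queue is long; the tools listed above are what distinguish a CPU, memory, disk, or network cause.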
2.1 vmstat
The vmstat command displays virtual memory statistics ("Virtual Memory Statistics"), but it can also report the overall state of processes, memory, I/O, and other parts of the system.
Its related fields are described as follows:
- Procs (processes)
- • r: the number of processes in the run queue. This value can also be used to judge whether more CPUs are needed (if it stays above 1 for a long time).
- • b: the number of processes waiting for I/O, i.e., processes in uninterruptible sleep. It shows tasks that are executing or waiting for resources; when this value exceeds the number of CPUs, a bottleneck appears.
- Memory
- • swpd: the amount of virtual memory (swap) used. If swpd is non-zero but si and so stay at 0 for a long time, system performance is not affected.
- • free: the amount of idle physical memory.
- • buff: the amount of memory used as buffers.
- • cache: the amount of memory used as page cache. A large cache value means many files are cached; if frequently accessed files are cached, the disk read I/O (bi) will be very small.
- Swap
- • si: the amount of memory swapped in from disk per second.
- • so: the amount of memory swapped out to disk per second.
- Note: when memory is sufficient, both values are 0. If they stay above 0 for a long time, system performance suffers, since swapping consumes both disk I/O and CPU. Some people see very little free memory (close to 0) and conclude that memory is running out; but as long as si and so are also small (0 most of the time), there is no need to worry, and system performance is not affected.
- IO (the block size in current Linux versions is 1 KB)
- • bi: the number of blocks read per second.
- • bo: the number of blocks written per second.
- Note: with random disk reads and writes, the larger these two values (e.g., above 1024 KB), the more CPU time is spent waiting for I/O.
- System
- • in: the number of interrupts per second, including the clock interrupt.
- • cs: the number of context switches per second.
- Note: the larger these two values, the more CPU time the kernel consumes.
- CPU (in percentages)
- • us: percentage of CPU time spent on user processes (user time).
- When us is high, user processes are consuming a lot of CPU time; if it stays above 50% for a long time, we should consider optimizing the program's algorithms or otherwise speeding it up.
- • sy: percentage of CPU time spent in the kernel (system time).
- When sy is high, the kernel is consuming a lot of CPU, which is not a healthy sign; we should investigate the cause.
- • wa: percentage of CPU time spent waiting for I/O.
- When wa is high, I/O wait is severe, which may be caused by heavy random disk access or by a disk bottleneck (block operations).
- • id: percentage of idle CPU time.
From vmstat we can see that the CPU spends most of its time waiting for I/O, which may be caused by heavy random disk access or by disk bandwidth; bi and bo both exceed 1024 KB, confirming the I/O bottleneck.
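That reading can be scripted. A small sketch (the embedded vmstat sample and the 30%/1024 thresholds are illustrative assumptions; in real use you would pipe `vmstat 1 10` in instead):

```shell
# Sketch: flag vmstat intervals with high I/O wait. The sample output
# below is made up for illustration; in real use, pipe in `vmstat 1 10`.
# Column positions assume the usual procps vmstat layout (wa is $16).
vmstat_sample='procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3      0  12345  67890 123456    0    0  2048  4096  500  600  2  3 10 85  0'

flag=$(echo "$vmstat_sample" | awk '
  NR > 2 {                                  # skip the two header lines
    bi = $9; bo = $10; wa = $16
    if (wa > 30 && bi + bo > 1024)          # illustrative thresholds
      print "I/O bottleneck suspected: wa=" wa "% bi=" bi " bo=" bo
  }')
echo "$flag"
```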
2.2 iostat
Next, we use a more specialized disk I/O diagnostic tool, iostat, to look at the detailed statistics.
Its related fields are described as follows:
- rrqm/s: the number of read requests merged per second, i.e., delta(rmerge)/s
- wrqm/s: the number of write requests merged per second, i.e., delta(wmerge)/s
- r/s: the number of read I/O operations completed per second, i.e., delta(rio)/s
- w/s: the number of write I/O operations completed per second, i.e., delta(wio)/s
- rsec/s: the number of sectors read per second, i.e., delta(rsect)/s
- wsec/s: the number of sectors written per second, i.e., delta(wsect)/s
- rkB/s: the number of kilobytes read per second; half of rsec/s, because each sector is 512 bytes (computed)
- wkB/s: the number of kilobytes written per second; half of wsec/s (computed)
- avgrq-sz: the average size (in sectors) of each device I/O operation, i.e., delta(rsect + wsect)/delta(rio + wio)
- avgqu-sz: the average I/O queue length, i.e., delta(aveq)/s/1000 (because aveq is measured in milliseconds)
- await: the average wait time (in milliseconds) of each device I/O operation, i.e., delta(ruse + wuse)/delta(rio + wio)
- svctm: the average service time (in milliseconds) of each device I/O operation, i.e., delta(use)/delta(rio + wio)
- %util: the percentage of each second spent on I/O operations, i.e., the fraction of time the I/O queue is non-empty: delta(use)/s/1000 (because use is measured in milliseconds)
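The delta formulas above can be checked with a small calculation. A sketch using made-up sample deltas (all the numbers below are hypothetical, chosen only to exercise the formulas):

```shell
# Sketch of the iostat delta formulas, using made-up sample deltas
# over a 1-second interval (all numbers hypothetical).
out=$(awk 'BEGIN {
  rio = 40; wio = 60          # read/write requests completed
  rsect = 800; wsect = 1600   # sectors transferred (512 bytes each)
  ruse_wuse = 900             # total ms requests spent queued + serviced
  use = 850                   # ms the device was busy
  interval = 1                # seconds between the two samples

  printf "avgrq-sz = %.1f sectors\n", (rsect + wsect) / (rio + wio)
  printf "await    = %.1f ms\n",      ruse_wuse / (rio + wio)
  printf "svctm    = %.1f ms\n",      use / (rio + wio)
  printf "%%util    = %.1f%%\n",      use / (interval * 1000) * 100
}')
echo "$out"
```

With these sample deltas, the device would be 85% busy with an average request of 24 sectors (12 KB), which is consistent with a small-random-I/O workload.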
We can see that of the two hard disks, sdb is already at 100% utilization: a severe I/O bottleneck. The next step is to find out which process is reading from and writing to this disk.
2.3 iotop
Based on the iotop results, we quickly determined that the flume process was the problem, generating a large amount of IO wait.
But as I said at the beginning, the machines in the cluster have the same configuration, and the deployed programs were all rsync'ed over and are identical. Could the hard disk be broken?
I had to investigate with the operations (O&M) staff. The final conclusion:
sdb is a two-disk RAID 1 on an "LSI Logic / Symbios Logic SAS1068E" RAID card with no cache. Its nearly 400 IOPS had reached the hardware limit. The other machines use "LSI Logic / Symbios Logic MegaRAID SAS 1078" RAID cards with a 256 MB cache, which have not hit the hardware bottleneck. The solution is to replace the machine with one that provides higher IOPS.
However, as mentioned above, we want to start from both the software and the hardware side, to see whether we can find the lowest-cost solution:
Now that we know the hardware cause, we can try moving the read/write operations to another disk and then observe the effect:
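One low-cost, software-side way to do that is to move the hot directory onto the other, less-loaded disk and leave a symlink at the old path. A hedged sketch (the flume data path is hypothetical, and temp directories stand in for the two disks so the steps can be tried safely):

```shell
# Sketch: relocate a write-heavy directory to another disk and leave a
# symlink at the old path. The paths are hypothetical; temp dirs stand
# in for the busy disk (sdb) and the idle disk so this is safe to run.
src=$(mktemp -d)/flume_data       # stand-in for the hot dir on sdb
dst=$(mktemp -d)                  # stand-in for a mount on the idle disk
mkdir -p "$src"
echo "log line" > "$src/app.log"

mv "$src" "$dst/flume_data"       # 1. (after stopping the writer) move data
ln -s "$dst/flume_data" "$src"    # 2. symlink the old path to the new home

content=$(cat "$src/app.log")     # reads transparently through the symlink
echo "$content"
```

The process keeps using its configured path, but the physical I/O now lands on the other spindle; remember to stop the writer before the move so no data is lost mid-copy.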
3. Final words: another approach
In fact, besides using the professional tools above to locate the problem, we can also start directly from the process state to find the relevant processes.
We know that the process has the following statuses:
- PROCESS STATE CODES
- D uninterruptible sleep (usually IO)
- R running or runnable (on run queue)
- S interruptible sleep (waiting for an event to complete)
- T stopped, either by a job control signal or because it is being traced.
- W paging (not valid since the 2.6.xx kernel)
- X dead (should never be seen)
- Z defunct ("zombie") process, terminated but not reaped by its parent.
Among these, state D is the so-called "uninterruptible sleep", usually caused by waiting for IO. We can start from this point and then narrow down the problem step by step:
for x in `seq 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D 248 [jbd2/dm-0-8]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D 22 [kdmflush]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
# Or:
while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done
Tue Aug 23 20:03:54 CLT 2011
root       302  0.0  0.0      0     0 ?        D    May22   \_ [kdmflush]
root       321  0.0  0.0      0     0 ?        D    May22   \_ [jbd2/dm-0-8]
Tue Aug 23 20:03:55 CLT 2011
Tue Aug 23 20:03:56 CLT 2011
cat /proc/16528/io
rchar: 48752567
wchar: 549961789
syscr: 5967
syscw: 67138
read_bytes: 49020928
write_bytes: 549961728
cancelled_write_bytes: 0
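These counters can also be read programmatically. A minimal sketch (assuming Linux; it uses the current shell's own PID so /proc/&lt;pid&gt;/io is guaranteed to exist and be readable):

```shell
# Sketch: pull write_bytes out of /proc/<pid>/io. We use the current
# shell's PID ($$) so the file exists and is readable by us; in real
# troubleshooting you would substitute the suspect PID found above.
pid=$$
wb=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io")
echo "PID $pid write_bytes: $wb"
```

Sampling write_bytes twice and subtracting gives a per-process write rate, which is essentially what iotop does for you.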
lsof -p 16528
COMMAND    PID USER   FD   TYPE DEVICE  SIZE/OFF   NODE NAME
bonnie++ 16528 root  cwd    DIR  252,0      4096 130597 /tmp
<truncated>
bonnie++ 16528 root    8u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root    9u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   10u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   11u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
bonnie++ 16528 root   12u   REG  252,0 501219328 131869 /tmp/Bonnie.16528
df /tmp
Filesystem                   1K-blocks    Used Available Use% Mounted on
/dev/mapper/workstation-root   7667140 2628608   4653920  37% /
fuser -vm /tmp
        USER       PID ACCESS COMMAND
/tmp:   db2fenc1  1067 ....m  db2fmp
        db2fenc1  1071 ....m  db2fmp
        db2fenc1  2560 ....m  db2fmp
        db2fenc1  5221 ....m  db2fmp
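The steps above can be tied together in one loop: find every process currently in state D and dump its /proc/&lt;pid&gt;/io counters. A sketch (assumes Linux procps; on a healthy machine it often prints nothing, since D states are usually short-lived):

```shell
# Sketch: list processes currently in state D and dump their per-process
# I/O counters. Often prints no PIDs on a healthy machine, because
# uninterruptible sleeps are normally very brief.
dpids=$(ps -e -o state= -o pid= | awk '$1=="D" {print $2}')
dcount=$(printf '%s\n' "$dpids" | awk 'NF {n++} END {print n+0}')
echo "processes in state D: $dcount"
for pid in $dpids; do
  echo "== PID $pid =="
  cat "/proc/$pid/io" 2>/dev/null   # may vanish if the process exits
done
```

Run it a few times in a row: a PID that appears repeatedly with fast-growing read_bytes/write_bytes is the same culprit iotop would point at.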
4. References:
[1] Troubleshooting High I/O Wait in Linux -- A walkthrough on how to find processes that are causing high I/O Wait on Linux Systems
http://bencane.com/2012/08/06/troubleshooting-high-io-wait-in-linux/
[2] Understanding Linux system load
http://www.ruanyifeng.com/blog/2011/07/linux_load_average_explained.html
[3] 24 iostat, vmstat and mpstat Examples for Linux Performance Monitoring
http://www.thegeekstuff.com/2011/07/iostat-vmstat-mpstat-examples/
[4] vmstat command
http://man.linuxde.net/vmstat
[5] Linux vmstat command
http://www.cnblogs.com/ggjucheng/archive/2012/01/05/2312625.html
[6] Factors affecting Linux server performance
http://www.rocklv.net/2004/news/article_284.html
[7] Viewing Linux disk I/O with iostat and vmstat
http://blog.csdn.net/qiudakun/article/details/4699587
[8] What Process is using all of my disk IO
http://stackoverflow.com/questions/488826/what-process-is-using-all-of-my-disk-io
[9] Linux Wait IO Problem
http://www.chileoffshore.com/en/interesting-articles/126-linux-wait-io-problem
[10] Tracking Down High IO Wait in Linux
http://ostatic.com/blog/tracking-down-high-io-wait-in-linux
From: http://my.oschina.net/leejun2005/blog/355915