Solving the problem of high iowait in Linux


I/O problems are always hard to pin down. Today our production environment ran into high CPU load caused by I/O, and I came across this rather good article. After dinner, still on my tomato-and-cucumber weight-loss regimen, I translated it into Chinese to share with everyone.


Linux has many tools available for troubleshooting; some are easy to use, while others are more advanced.

I/O wait is a problem that requires use of some of the more advanced tools, as well as advanced usage of some of the basic tools. The reason I/O wait is difficult to troubleshoot is that by default there are plenty of tools to tell you that your system is I/O bound, but not as many that can narrow the problem down to a specific process or processes.

Answering whether or not I/O is causing system slowness

To identify whether I/O is causing system slowness, we can use several commands, but the easiest is the UNIX command top.


# top
top - 14:31:20 up min,  4 users,  load average: 2.25, 1.74, 1.68
Tasks:  71 total,   1 running,  sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3%us,  1.7%sy,  0.0%ni,  0.0%id, 96.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    245440k total,   241004k used,     4436k free,      496k buffers
Swap:   409596k total,     5436k used,   404160k free,   182812k cached

From the Cpu(s) line we can see the current percentage of CPU time being spent in I/O wait; the higher the number, the more CPU resources are waiting for I/O access.

wa -- iowait
Amount of time the CPU has been waiting for I/O to complete.
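If you would rather take a quick reading without keeping top open, the same counters can be read straight from /proc/stat. The snippet below is a minimal sketch of that idea (my addition, not from the original article; it assumes bash, since it uses arrays): it samples the cpu line twice and prints the share of time spent in iowait between the two samples.


# fields on the "cpu" line of /proc/stat: user nice system idle iowait irq softirq steal ...
start=($(grep '^cpu ' /proc/stat))
sleep 5
end=($(grep '^cpu ' /proc/stat))
total=0
for i in 1 2 3 4 5 6 7 8; do total=$((total + end[i] - start[i])); done
echo "iowait: $((100 * (end[5] - start[5]) / total))%"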

Finding which disk is being written to

The above top command shows I/O wait for the system as a whole, but it does not tell you which disk is being affected; for this we will use the iostat command.


$ iostat -x 2 5
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.66    0.00   47.64   48.69    0.00    0.00

Device:  rrqm/s  wrqm/s     r/s    w/s     rkB/s     wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda       44.50   39.27  117.28  29.32  11220.94  13126.70   332.17    65.77 462.79    9.80 2274.71   7.60 111.41
dm-0       0.00    0.00   83.25   9.95  10515.18   4295.29   317.84    57.01 648.54   16.73 5935.79  11.48 107.02
dm-1       0.00    0.00   57.07  40.84    228.27    163.35     8.00    93.84 979.61   13.94 2329.08  10.93 107.02

The iostat command in the example will print a report every 2 seconds for 5 intervals; the -x option tells iostat to print the extended statistics.

The first report from iostat prints statistics based on the time since the system was last booted; for this reason, in most circumstances the first report should be ignored. Every subsequent report is based on the time since the previous interval. For example, our command prints a report 5 times: the second report contains disk statistics gathered since the first report, the third is based on the second, and so on.

In the above example the %util for sda is 111.41%, which is a good indicator that our problem lies with processes writing to sda. While the test system in my example only has one disk, this type of information is extremely helpful when the server has multiple disks, as it can narrow down the search for which process is utilizing I/O.

Aside from %util there is a wealth of information in the output of iostat, such as read and write requests merged per second (rrqm/s & wrqm/s), reads and writes per second (r/s & w/s), and plenty more. In our example the workload is both read and write heavy; this information will be helpful when trying to identify the offending process.
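When a server has many disks, it can also help to sort them by %util rather than eyeballing the table. The one-liner below is just a convenience sketch (my addition, not from the original article) and assumes %util is the last column of your sysstat version's extended output; it skips the first, since-boot report and lists the three busiest devices from the second report.


$ iostat -dx 2 2 | awk '/^Device/ {n++; next} n==2 && NF > 1 {print $NF, $1}' | sort -rn | head -3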

Finding the processes that are causing high I/O

iotop

# iotop
Total DISK READ: 8.00 M/s | Total DISK WRITE: 20.36 M/s
  TID  PRIO  USER   DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
15758  be/4  root    7.99 M/s    8.01 M/s  0.00 %  61.97 %   bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp

The simplest method of finding which process is utilizing storage the most is to use the command iotop. Looking at the statistics above, it is easy to identify bonnie++ as the process causing the most I/O utilization on this machine.

While iotop is a great command and easy to use, it is not installed on all (or the main) Linux distributions by default, and I personally prefer not to rely on commands that are not installed by default. A systems administrator may find themselves on a system where they simply cannot install non-default packages until a scheduled maintenance window, which may be far too late depending on the issue.
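For reference, when iotop is available, a handy non-interactive invocation (using flags documented in the iotop man page) is:


# iotop -obPa -n 3


Here -o shows only processes actually doing I/O, -b runs in batch mode so the output can be redirected to a file, -P lists processes rather than individual threads, -a accumulates I/O since iotop started, and -n 3 exits after three iterations.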

If iotop is not available, the steps below will also allow you to narrow down the offending process or processes.

Process list "state"

The ps command has statistics for memory and CPU, but it does not have a statistic for disk I/O. While it does not show how much I/O a process is doing, it does show the process's state, which can be used to indicate whether or not a process is waiting for I/O.

The ps "state" field provides the process's current state; below is the list of states from the man page.


PROCESS STATE CODES
D    uninterruptible sleep (usually IO)
R    running or runnable (on run queue)
S    interruptible sleep (waiting for an event to complete)
T    stopped, either by a job control signal or because it is being traced.
W    paging (not valid since the 2.6.xx kernel)
X    dead (should never be seen)
Z    defunct ("zombie") process, terminated but not reaped by its parent.

Processes that are waiting for I/O are commonly in the 'uninterruptible sleep' or 'D' state; given this information, we can simply find the processes that are constantly in this state.

Example:

# for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D   248 [jbd2/dm-0-8]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D   [kswapd0]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D   [kswapd0]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D   [kswapd0]
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----
D 16528 bonnie++ -n 0 -u 0 -r 239 -s 478 -f -b -d /tmp
----

The above for loop will print the processes in a "D" state every 5 seconds for 10 intervals.
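A small variation on the loop (my addition, not part of the original article) tallies how often each PID showed up in the D state across the whole sampling window, which makes the repeat offender stand out at a glance:


# for x in `seq 1 1 10`; do ps -eo state,pid,cmd | awk '$1=="D" {print $2, $3}'; sleep 5; done | sort | uniq -c | sort -rn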

From the output above, the bonnie++ process with a PID of 16528 is waiting for I/O more often than any other process. At this point bonnie++ seems likely to be causing the I/O wait, but just because the process is in an uninterruptible sleep state does not necessarily prove that it is the cause of the I/O wait.

To help confirm our suspicions, we can use the /proc file system. Within each process's directory there is a file called "io" which holds the same I/O statistics that iotop uses.


# cat /proc/16528/io
rchar: 48752567
wchar: 549961789
syscr: 5967
syscw: 67138
read_bytes: 49020928
write_bytes: 549961728
cancelled_write_bytes: 0


read_bytes and write_bytes are the number of bytes that this specific process has read from and written to the storage layer. In this case the bonnie++ process has read 46 MB and written 524 MB to disk. While this may not be a lot for some processes, in our example it is enough reading and writing to cause the high I/O wait that this system is seeing.
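If several processes look suspicious, the per-process "io" files can be walked in a loop instead of checking them one at a time. The sketch below is my addition rather than part of the original article; it assumes root privileges, since /proc/<pid>/io is only readable by the process owner or root, and ranks the five processes that have written the most bytes to the storage layer.


# rank processes by write_bytes from /proc/<pid>/io (run as root)
for p in /proc/[0-9]*; do
    wb=$(awk '/^write_bytes/ {print $2}' "$p/io" 2>/dev/null)    # empty if unreadable or gone
    [ -n "$wb" ] && echo "$wb ${p#/proc/} $(tr '\0' ' ' < "$p/cmdline")"
done | sort -rn | head -5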

Finding what files are being written to heavily

The lsof command will show you all of the files held open by a specific process, or by all processes, depending on the options provided. From this list one can make an educated guess as to which files are likely being written to often, based on the size of the file and the amounts shown in the "io" file within /proc.

(This translation is a bit rough in places; if you have a better rendering, feel free to message me at http://www.503error.com)

To narrow down the output, we will use the -p <pid> option to print only the files opened by the specific process ID.


# lsof -p 16528
COMMAND    PID  USER  FD   TYPE  DEVICE   SIZE/OFF    NODE  NAME
bonnie++ 16528  root  cwd   DIR   252,0       4096  130597  /tmp
<truncated>
bonnie++ 16528  root   8u   REG   252,0  501219328  131869  /tmp/Bonnie.16528
bonnie++ 16528  root   9u   REG   252,0  501219328  131869  /tmp/Bonnie.16528
bonnie++ 16528  root  10u   REG   252,0  501219328  131869  /tmp/Bonnie.16528
bonnie++ 16528  root  11u   REG   252,0  501219328  131869  /tmp/Bonnie.16528
bonnie++ 16528  root  12u   REG   252,0  501219328  131869  /tmp/Bonnie.16528

To further confirm that these files are being written to heavily, we can check whether the /tmp filesystem is part of sda.


# df /tmp
Filesystem                    1K-blocks     Used  Available  Use%  Mounted on
/dev/mapper/workstation-root    7667140  2628608    4653920   37%  /

From the output of df we can determine that /tmp is on the root logical volume of the workstation volume group.


# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               workstation
  PV Size               7.76 GiB / not usable 2.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              1986
  Free PE               8
  Allocated PE          1978
  PV UUID               clbabb-gclb-l5z3-tcj3-iok3-sq2p-rdpw5s

Using pvdisplay we can see that the /dev/sda5 partition is part of the sda disk, that it is the partition the workstation volume group is using, and that this in turn is where /tmp lives. Given this information, it is safe to say that the large files listed in the lsof output above are likely the files being read from and written to frequently.
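On systems with a reasonably recent util-linux, the same chain from filesystem to physical disk can be cross-checked in one step with findmnt and lsblk. This is an optional shortcut I am adding here, not part of the original walkthrough; the device name is taken from the df output above.


# findmnt -T /tmp
# lsblk -s /dev/mapper/workstation-root


findmnt shows which device backs the /tmp path, and lsblk -s walks upward from the logical volume to its parent partition and disk.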
