Tracking high IO waits in CentOS

The first sign of a high I/O wait problem is usually the system load average. The load average is derived from CPU demand, that is, from the number of processes that are running on or waiting for the CPU; on Linux it also includes processes in uninterruptible sleep, which are typically blocked on I/O. As a baseline, a load average of 1.0 means that one CPU core is fully utilized. So for a 4-core machine, a load average of 4 means the machine has just enough resources to handle the work it needs to do, though only barely. On the same 4-core system, a load average of 8 means the server would need 8 cores to process the pending work but has only 4, so it is overloaded.

If the system shows a high load average while both system and user CPU usage are low, look at I/O wait (iowait). On Linux, I/O wait has a large impact on the load average, because one or more cores may be blocked by disk I/O or network I/O, and the tasks (processes) on those cores cannot proceed until that I/O completes. You can check with ps aux whether these processes are in the "D" (uninterruptible sleep) state.

Finding processes that are waiting for I/O to complete is one thing; verifying the high I/O wait is another. Use iostat -x 1 to display the I/O statistics of the physical storage devices in use:

    [username@server ~]$ iostat -x 1
    Device:      rrqm/s wrqm/s  r/s  w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    cciss/c0d0     0.08   5.94 1.28 2.75  17.34  69.52    21.60     0.11  4.12  1.66
    cciss/c0d0p1   0.00   0.00 0.00 0.00   0.00   0.00     5.30     0.00  8.76  5.98  0.00
    cciss/c0d0p2   0.00   0.00 0.00 0.00   0.00   0.00    58.45     0.00  7.79  3.21
    cciss/c0d0p3   0.00   0.08 5.94 1.28   2.75  17.34    69.52    21.60  0.11 26.82  4.12  1.66

From the output it is obvious that the device /dev/cciss/c0d0p3 has by far the worst await time. However, no file system is mounted from a device with that name.
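The two checks above (load average versus core count, and hunting for "D"-state processes) can be sketched in a short shell snippet. This is a minimal sketch assuming a standard Linux box; the threshold logic and messages are illustrative, and only stock tools (`nproc`, `/proc/loadavg`, `ps`, `awk`) are used:

```shell
#!/bin/sh
# Compare the 1-minute load average against the number of CPU cores,
# then list any processes stuck in uninterruptible sleep ("D" state).
cores=$(nproc)
read -r load1 _ < /proc/loadavg
echo "1-minute load average: $load1 on $cores core(s)"

# awk exits 0 (true) when the load exceeds the core count
if awk -v l="$load1" -v c="$cores" 'BEGIN { exit !(l > c) }'; then
    echo "Load is above the core count -- check user/system CPU and iowait"
fi

# Processes in "D" state are usually blocked on disk or network I/O
ps -eo state,pid,comm | awk '$1 ~ /^D/ { print "D-state:", $2, $3 }'
```

On a healthy, mostly idle system this prints the load line and nothing else; a flood of D-state lines is the cue to move on to iostat.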
It is in fact an LVM device. If you are using LVM for storage, you will find that iostat output looks a little confusing at first. LVM uses the device-mapper subsystem to map file systems to physical devices, so iostat may display several devices such as /dev/dm-0 and /dev/dm-1, while the output of "df -h" prints LVM paths rather than device-mapper paths. The simplest fix is to add the "-N" option to iostat:

    [username@server ~]$ iostat -xN 1
    Device:     rrqm/s wrqm/s  r/s  w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
    vg1-root      0.00   0.00 0.09 3.01   0.85  24.08     8.05     0.08 24.69  1.79  0.55
    vg1-home      0.00   0.00 0.05 1.46   0.97  11.69     8.36     0.03 19.89  3.76
    vg1-opt       0.57   0.00 0.00 0.03   1.56   0.46    12.48     8.12  0.05 29.89  3.53
    vg1-tmp       0.56   0.00 0.00 0.00   0.06   0.00     0.45     8.00
    vg1-usr       0.00  24.85 4.90 0.03   0.00   1.41     5.85    11.28  8.38  0.07 32.48
    vg1-var       0.00   0.00 0.55 1.19   9.21   9.54    10.74     0.04 24.10  4.24  0.74
    vg1-swaplv    0.00   0.00 0.00 0.00   0.00   0.00     8.00     0.00  3.98  1.88

The output of the iostat command above has been cropped for simplicity. The I/O waits shown for the listed file systems are unacceptably high; observe the values in the "await" column (column 10). By comparison, the await time of the /opt file system stands out. Let's analyze that file system: run "fuser -vm /opt" to check which processes are accessing it. The process list is as follows:

    root@server:/root> fuser -vm /opt
          USER     PID  ACCESS COMMAND
    /opt: db2fenc1 1067 ....m  db2fmp
          db2fenc1 1071 ....m  db2fmp
          db2fenc1 2560 ....m  db2fmp
          db2fenc1 5221 ....m  db2fmp

The server currently has 112 DB2 processes accessing the /opt file system; only four are listed here for simplicity. It seems the cause of the problem has been found: on this server the database is supposed to be configured to use the faster SAN storage, leaving the local disks to the operating system. You can call the DBA (database administrator) and ask how the database ended up configured this way.
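The await comparison above can also be scripted. The awk one-liner below picks the device with the highest value in the await column; the two-row here-document is a made-up, abbreviated stand-in for real iostat -xN output, which in practice you would pipe in instead:

```shell
#!/bin/sh
# Find the device with the highest await (column 10 of `iostat -x` output).
# The here-document is hypothetical sample data; for live numbers, feed the
# script something like `iostat -xN 1 1` with its header lines stripped.
awk 'NF >= 10 && $1 != "Device:" && $10 + 0 > max { max = $10; dev = $1 }
     END { print "highest await:", dev, max }' <<'EOF'
Device:  rrqm/s wrqm/s  r/s  w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
vg1-root   0.00   0.00 0.09 3.01   0.85  24.08     8.05     0.08 24.69  1.79  0.55
vg1-opt    0.00   0.00 0.03 1.56   0.46  12.48     8.12     0.05 29.89  3.53  0.46
EOF
```

With the sample data this prints "highest await: vg1-opt 29.89", mirroring the manual read of the table.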
One last note on LVM and the device mapper: the output of "iostat -xN" shows logical volume names, but you can use "ls -lrt /dev/mapper" to find the mapping table; the minor device number in the sixth column of that listing corresponds to the dm- device name shown by iostat (minor number 0 maps to dm-0, and so on). Sometimes there is nothing that can be done at the operating-system or application level, and apart from moving to faster disks there is no other choice. Fortunately, the price of fast disk storage, such as SAN or SSD, is steadily decreasing.
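The /dev/mapper lookup can be scripted as well. The listing in the here-document below is a fabricated example of what "ls -l /dev/mapper" prints on systems where the entries are symlinks to dm-N nodes (on older systems they are device nodes, and the minor number identifies the dm device instead):

```shell
#!/bin/sh
# Translate LVM names to dm-N names by reading the /dev/mapper symlinks.
# The sample listing is hypothetical; run `ls -l /dev/mapper` for real data.
awk '$(NF-1) == "->" {
         n = split($NF, parts, "/")        # "../dm-0" -> "dm-0"
         printf "%s is %s\n", $(NF-2), parts[n]
     }' <<'EOF'
lrwxrwxrwx 1 root root 7 Jan  1 12:00 vg1-root -> ../dm-0
lrwxrwxrwx 1 root root 7 Jan  1 12:00 vg1-opt -> ../dm-2
EOF
```

For the sample input this prints "vg1-root is dm-0" and "vg1-opt is dm-2", letting you match the dm-N names iostat reports against the logical volumes df shows.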
