Monitoring and Analysis of disk I/O performance in Linux

Last Update: 2018-12-04 Source: Internet

Author: User

Tags: ftp transfer, high cpu usage


Copyright statement: this is an original work and may not be reprinted; violations will incur legal liability.

Over the past two days, I noticed that a server used for testing often had a high load even though its CPU and memory usage were very low, which was strange. After diagnosis, I found that disk I/O consumption was high because of the large volume of test data: the cache consists of a large number of small files, so I/O consumption becomes very high under high concurrency.
So how can we quickly confirm that a high load is caused by heavy disk I/O?
1. Use the top command to observe

The parameters marked in red in the top output are described as follows:
Tasks:
437 total: total number of processes
4 running: number of running processes
430 sleeping: number of sleeping processes
3 stopped: number of stopped processes
0 zombie: number of zombie processes
Cpu(s):
7.1% us: percentage of CPU time spent in user space
4.2% sy: percentage of CPU time spent in kernel space
0.0% ni: percentage of CPU time used by user-space processes whose priority has been changed (niced)
76.8% id: percentage of idle CPU time
12% wa: percentage of CPU time spent waiting for input/output
A wa value of 12% is a rough sign that processes are frequently waiting for disk input/output requests to complete.
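The percentages that top shows under Cpu(s) are shares of elapsed CPU time. The Python sketch below reproduces that calculation from two samples of the aggregate cpu line in /proc/stat. It assumes the standard Linux column order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are illustrative and not part of any tool.

```python
import time

# Assumption: top's Cpu(s) labels, in the column order of the aggregate "cpu" line
# of /proc/stat: user, nice, system, idle, iowait, irq, softirq, steal.
# The guest columns that follow are already included in user/nice, so they are skipped.
LABELS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Return the first eight jiffy counters from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in parts[1:9]]


def cpu_percentages(before, after):
    """Share of elapsed jiffies spent in each state between two samples, in percent."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * d / total for label, d in zip(LABELS, deltas)}


def sample_cpu(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return the percentages."""
    def read():
        with open(path) as f:
            return parse_cpu_line(f.readline())

    first = read()
    time.sleep(interval)
    return cpu_percentages(first, read())
```

On a Linux host, `sample_cpu()["wa"]` returns the wa percentage measured over one second.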

For further analysis, trace the key process (the one with high CPU usage) to locate the program:
# strace -p 28644

The output shows that, with multiple threads, when concurrent operations are too frequent, calls to semtimedop fail and input/output operations fail.
Find the program behind that PID: # ps -ef | grep 28644
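When the raw strace output is long, counting the failing calls makes the pattern easier to see. The following Python sketch assumes strace's default one-call-per-line format (optionally prefixed with `[pid N]`) and tallies calls that returned -1, grouped by system call and errno name. strace's own `-c` option prints a similar per-call summary, including an error count, when tracing stops.

```python
import re
from collections import Counter

# Assumption: strace's default line format, e.g.
#   semtimedop(98304, [{2, -1, 0}], 1, {0, 10000000}) = -1 EAGAIN (Resource temporarily unavailable)
# optionally prefixed with "[pid N] ". Split lines (<unfinished ...>) are not matched.
FAILED_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*\)\s+=\s+-1\s+([A-Z0-9]+)")


def count_failures(lines):
    """Return a Counter keyed by (syscall, errno name) for calls that returned -1."""
    failures = Counter()
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            failures[(match.group(1), match.group(2))] += 1
    return failures
```

For example, capture the trace with `strace -p 28644 -o trace.txt`, then pass the lines of trace.txt to `count_failures` and inspect `.most_common(5)`.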

This shows that the ora_lgwr_nms program is responsible for the high read/write overhead.
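ps only names the process. To see how much disk I/O it actually generates, Linux exposes per-process counters in /proc/<pid>/io. This Python sketch is an alternative check, not the method used above: it samples `read_bytes` and `write_bytes` twice and turns the difference into a rate. It assumes a kernel with per-task I/O accounting and enough privileges to read another user's process (usually root).

```python
import time


def parse_proc_io(text):
    """Turn the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)
    return stats


def io_rates(before, after, seconds):
    """Bytes per second the process read from / wrote to storage between two samples."""
    return {
        "read_Bps": (after["read_bytes"] - before["read_bytes"]) / seconds,
        "write_Bps": (after["write_bytes"] - before["write_bytes"]) / seconds,
    }


def sample_pid_io(pid, interval=1.0):
    """Sample /proc/<pid>/io twice (assumes Linux and read permission) and return rates."""
    def read():
        with open(f"/proc/{pid}/io") as f:
            return parse_proc_io(f.read())

    start = time.monotonic()
    first = read()
    time.sleep(interval)
    second = read()
    return io_rates(first, second, time.monotonic() - start)
```

`sample_pid_io(28644)` would return the read and write rates of the process traced above.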

2. Use the iostat command to observe
Disk I/O performance is an important indicator of a computer's overall performance. Linux provides the iostat command to report disk input/output (I/O) statistics.
# iostat -x 1: extended statistics, refreshed once per second.

The %iowait value is relatively large, indicating frequent reads and writes.
# iostat -p 1: read/write statistics for each partition, refreshed once per second.
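iostat derives its figures from the kernel counters in /proc/diskstats, which list whole disks and partitions alike, so the same data underlies both the -x and -p views. The Python sketch below recomputes a few of the extended columns for one device between two samples. It assumes the classic layout of at least 14 fields per line and 512-byte sectors; the metric names mirror iostat's column headings, but the values only approximate what a given sysstat version prints.

```python
import time


def parse_diskstats(text):
    """Map each device or partition name to its first 11 counters (classic layout)."""
    devices = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14:  # major, minor, name, then at least 11 counters
            devices[parts[2]] = [int(x) for x in parts[3:14]]
    return devices


def iostat_metrics(before, after, seconds):
    """iostat -x style metrics for one device between two counter samples."""
    (reads, _r_merged, sectors_read, ms_reading,
     writes, _w_merged, sectors_written, ms_writing,
     _in_flight, io_ticks_ms, _weighted_ms) = [a - b for a, b in zip(after, before)]
    ios = reads + writes
    return {
        "r/s": reads / seconds,
        "w/s": writes / seconds,
        "rkB/s": sectors_read * 512 / 1024 / seconds,  # assumption: 512-byte sectors
        "wkB/s": sectors_written * 512 / 1024 / seconds,
        "await_ms": (ms_reading + ms_writing) / ios if ios else 0.0,
        "%util": 100.0 * io_ticks_ms / (seconds * 1000),  # share of time the device was busy
    }


def sample_device(name, interval=1.0, path="/proc/diskstats"):
    """Sample /proc/diskstats twice and return metrics for `name` (e.g. "sda" or "sda5")."""
    def read():
        with open(path) as f:
            return parse_diskstats(f.read())

    start = time.monotonic()
    first = read()
    time.sleep(interval)
    second = read()
    return iostat_metrics(first[name], second[name], time.monotonic() - start)
```

`sample_device("sdb8")` would report on the partition that holds the database archive.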

Run the # mount command to map the devices to their partitions: sda5 is the /opt partition and sdb8 is the /data partition.
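mount reports the same information the kernel publishes in /proc/mounts. As a small illustration, this Python sketch builds a device to mount point map from that file's text, so a busy device name from iostat (such as sda5) can be matched to its directory. It assumes the usual "device mountpoint fstype options dump pass" line format; the helper names are hypothetical.

```python
import re


def _decode(field):
    """Undo the three-digit octal escapes /proc/mounts uses for spaces and tabs."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def device_mounts(text):
    """Return {device: [mount points]} for entries whose source is a /dev/ node."""
    mounts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("/dev/"):
            mounts.setdefault(parts[0], []).append(_decode(parts[1]))
    return mounts


def read_device_mounts(path="/proc/mounts"):
    """Assumption: Linux procfs is mounted at /proc."""
    with open(path) as f:
        return device_mounts(f.read())
```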

Checking the database archive on the /data partition shows that four files are archived within one minute, each as large as 48 MB. Writes are therefore very frequent, which explains the high disk I/O overhead.
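The figures above translate into a sustained write rate: 4 files × 48 MB in 60 seconds is 192 MB per minute, or about 3.2 MB/s. A tiny Python helper makes the estimate reusable; spreading the writes evenly over the minute is an assumption, since the real writes may arrive in bursts.

```python
def sustained_write_rate_mb_s(files, file_size_mb, window_s):
    """Average MB/s needed to write `files` files of `file_size_mb` MB in `window_s` seconds.

    Assumption: the writes are spread evenly across the window.
    """
    return files * file_size_mb / window_s


# Figures from the text: four 48 MB archive files within one minute.
rate = sustained_write_rate_mb_s(4, 48, 60)
print(f"about {rate:.1f} MB/s of archive writes")
```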

Meanwhile, the high disk overhead on the /opt partition is caused by FTP transfers.
With the analysis and diagnosis complete, adjust the FTP transfers and the database archiving accordingly, then check again.

Conclusion: top and iostat are commonly used commands. Applying these basic commands flexibly makes it easy to analyze and locate problems; in particular, choosing and using their parameters well is worth studying.

Supplement: disk IOPS basics

IOPS (input/output operations per second) is the number of reads and writes performed per second and is one of the main indicators of disk performance. It is the number of I/O requests the system can process per second, where an I/O request is generally a request to read or write data. For applications with frequent random reads and writes, such as OLTP (online transaction processing), IOPS is the key indicator. Another important indicator is data throughput, the amount of data that can be successfully transferred per unit of time. For applications with many sequential reads and writes, such as VOD (video on demand), throughput matters more.
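IOPS and throughput are linked by the average request size: throughput ≈ IOPS × request size. The Python sketch below makes that link concrete; the two workloads (150 requests/s of 4 KB and 100 requests/s of 1 MB) are hypothetical numbers chosen only to contrast the OLTP and VOD cases.

```python
def throughput_mb_s(iops, request_kb):
    """Throughput implied by an IOPS figure and an average request size (1 MB = 1024 KB)."""
    return iops * request_kb / 1024


# Hypothetical workloads: small random requests vs. large sequential ones.
print(throughput_mb_s(150, 4))     # OLTP-like: 150 x 4 KB requests per second
print(throughput_mb_s(100, 1024))  # VOD-like: 100 x 1 MB requests per second
```

The first case has more operations per second yet moves far less data, which is why the two workload types are judged by different metrics.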

Traditional disks such as FC, SAS, and SATA drives are essentially mechanical devices, spinning at 5400, 7200, 10K, or 15K rpm. The key factor in disk performance is the service time, that is, the time the disk needs to complete one I/O request. It consists of three parts: seek time, rotational delay, and data transfer time.
Seek time (Tseek) is the time required to move the read/write head to the correct track. The shorter the seek time, the faster I/O operations complete. The average seek time of current disks is generally 3 to 15 ms.
Rotational delay (Trotation) is the time required for the platter to rotate the sector holding the requested data under the read/write head. It depends on the disk speed and is usually taken as half the time of one full revolution. For example, the average rotational delay of a 7200 rpm disk is about 60*1000/7200/2 = 4.17 ms, while that of a 15000 rpm disk is about 2 ms.
Data transfer time (Ttransfer) is the time required to transfer the requested data. It depends on the data transfer rate and equals the data size divided by that rate. Currently, IDE/ATA reaches 133 MB/s and SATA II reaches an interface transfer rate of 300 MB/s. The data transfer time is usually far smaller than the other two components.
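The three components add up to the service time of one request. This Python sketch models that sum with the formulas given above; treating the interface rate as the transfer rate and 1 MB as 10^6 bytes are simplifying assumptions. For a 4 KB request at 300 MB/s the transfer term is about 0.014 ms, next to roughly 4.17 ms of rotational delay at 7200 rpm.

```python
def rotational_latency_ms(rpm):
    """Average rotational delay: half of one revolution, in milliseconds."""
    return 60_000 / rpm / 2


def transfer_time_ms(request_bytes, rate_mb_s):
    """Time to move the data at `rate_mb_s` (assumption: 1 MB = 10**6 bytes)."""
    return request_bytes / (rate_mb_s * 1_000_000) * 1000


def service_time_ms(seek_ms, rpm, request_bytes, rate_mb_s):
    """Service time = seek time + rotational delay + data transfer time."""
    return seek_ms + rotational_latency_ms(rpm) + transfer_time_ms(request_bytes, rate_mb_s)


# A 4 KB request on a 7200 rpm disk with a 3 ms seek over a 300 MB/s interface.
print(f"{service_time_ms(3, 7200, 4096, 300):.3f} ms")
```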

Therefore, the theoretical maximum IOPS of a disk can be calculated, ignoring the data transfer time, as IOPS = 1000 ms / (Tseek + Trotation). Assuming an average seek time of 3 ms and disk speeds of 7200, 10K, and 15K rpm, the theoretical maximum IOPS values are:
IOPS = 1000/(3 + 60000/7200/2) ≈ 140
IOPS = 1000/(3 + 60000/10000/2) ≈ 167
IOPS = 1000/(3 + 60000/15000/2) = 200
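The same calculation as a short Python function, using the 3 ms seek time assumed above:

```python
def max_iops(seek_ms, rpm):
    """Theoretical maximum IOPS = 1000 ms / (Tseek + Trotation), ignoring transfer time."""
    rotation_ms = 60_000 / rpm / 2  # average rotational delay: half a revolution
    return 1000 / (seek_ms + rotation_ms)


for rpm in (7200, 10_000, 15_000):
    print(rpm, round(max_iops(3, rpm)))
# prints "7200 140", "10000 167", "15000 200" on separate lines
```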

For a disk array, IOPS depends mainly on the RAID algorithm, the cache hit rate, and the number of disks. Array algorithms vary between arrays. There is no difference in read IOPS between RAID5 and RAID10. However, for the same workload, write IOPS ultimately lands on each individual disk; if the write IOPS limit of each disk is reached, performance suffers. With RAID5, each write actually requires four I/O operations, whereas RAID10 requires only two. Therefore, RAID10 is faster than RAID5 for writes.
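A common rule of thumb turns this write penalty into a capacity estimate: front-end IOPS ≈ (disks × per-disk IOPS) / (read share + write share × penalty), with a penalty of 4 for RAID5 and 2 for RAID10 as stated above. The Python sketch below applies it; the 8-disk, 200 IOPS per disk, 70% read workload is hypothetical, and controller cache, which the text notes also matters, is ignored.

```python
# Back-end I/Os generated by one front-end write (rule-of-thumb values from the text).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4}


def host_iops(disks, per_disk_iops, read_fraction, raid_level):
    """Front-end IOPS an array can sustain before its disks hit their IOPS limit.

    Each read costs one back-end I/O; each write costs WRITE_PENALTY[raid_level].
    Assumption: controller cache is ignored.
    """
    write_fraction = 1.0 - read_fraction
    backend_iops = disks * per_disk_iops
    return backend_iops / (read_fraction + write_fraction * WRITE_PENALTY[raid_level])


# Hypothetical array: 8 disks at 200 IOPS each, 70% reads.
for level in ("RAID5", "RAID10"):
    print(level, round(host_iops(8, 200, 0.7, level)))
```

With only reads the two levels come out equal, and with only writes RAID10 sustains twice the IOPS of RAID5, matching the comparison above.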

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
