Monitoring and Analysis of disk I/O performance in Linux



Tags: performance, monitoring, analysis, Linux, disk I/O, leisure, SuSE Linux
Copyright statement: this is an original work and may not be reprinted; otherwise, legal liability will be pursued.

Over the past two days, I noticed that a test server often showed a high load even though its CPU and memory consumption were very low, which was strange. After diagnosis, it turned out that disk I/O consumption was high because of the large volume of test data: the cached data consists of small files, and there are a lot of them, so I/O consumption becomes very high under high concurrency.
So how can we quickly determine that a high load is caused by high disk I/O overhead?
1. Use the information in the top command to observe

The parameters highlighted in red in the top output are described as follows:
Tasks:
437 total: total number of processes
4 running: number of running processes
430 sleeping: number of sleeping processes
3 stopped: number of stopped processes
0 zombie: number of zombie processes
Cpu(s):
7.1% us: percentage of CPU used by user space
4.2% sy: percentage of CPU used by kernel space
0.0% ni: percentage of CPU used by user-space processes whose priority has been changed
76.8% id: percentage of CPU that is idle
12% wa: percentage of CPU time spent waiting for input/output
A wa value of 12% roughly indicates that the CPU spends a noticeable share of its time waiting for disk I/O, that is, disk I/O requests are too frequent.
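The percentages above can also be derived by hand. On Linux, top computes them from the cumulative counters on the first line of /proc/stat, sampled twice. The following is a minimal sketch of that calculation, not top's actual implementation; the function names and the two sample lines are made up for illustration, with numbers chosen to resemble the values above.

```python
# Field order on the "cpu" line of /proc/stat (first eight counters).
FIELDS = ["us", "ni", "sy", "id", "wa", "hi", "si", "st"]


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (hypothetical helper, Linux only)."""
    with open(path) as f:
        return f.readline()


def cpu_shares(before, after):
    """Percentage of CPU time per state between two samples of the cpu line."""
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    return {name: round(100.0 * d / total, 1) for name, d in zip(FIELDS, deltas)}


# Two invented samples, taken some interval apart.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  171 0 92 1568 169 0 0 0 0 0"

shares = cpu_shares(sample_before, sample_after)
print(shares["us"], shares["sy"], shares["id"], shares["wa"])
```

On a live system, one would call read_cpu_line() twice with a short sleep in between and pass both lines to cpu_shares(); a persistently high wa share is the signal discussed above.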

For further analysis, we trace the key process to locate the program responsible.
# strace -p 28644 (the process with high CPU usage)

The trace shows that, with multiple threads, when concurrent operations are too frequent, the semtimedop call fails and input/output fails.
Look up the program: # ps -ef | grep 28644
 

It can be seen that the ora_lgwr_nms program causes a high read/write overhead.

2. Use the iostat command to observe
Disk I/O performance is an important indicator to measure the overall performance of computers. Linux provides the iostat command to obtain disk input/output (I/O) statistics.
# iostat -x 1: extended statistics, refreshed once per second.

The iowait value is relatively large, indicating frequent reads and writes.
# iostat -p 1: read/write statistics for each partition, refreshed once per second.

Run the # mount command to find the corresponding mount points: sda5 is the /opt partition and sdb8 is the /data partition.
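The device-to-mount-point lookup can also be scripted. The sketch below parses text in the format of /proc/mounts, which lists the same information as the mount command. The sample text is invented: it assumes sda5 is mounted at /opt and sdb8 at /data as described above, and the ext3 filesystem type is purely a placeholder.

```python
def mount_points(mounts_text):
    """Map each /dev/... device to its mount point (format of /proc/mounts)."""
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line is: device mountpoint fstype options dump pass
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            table[fields[0]] = fields[1]
    return table


# Invented sample; on a real system, read the text from /proc/mounts.
sample_mounts = """\
/dev/sda5 /opt ext3 rw 0 0
/dev/sdb8 /data ext3 rw 0 0
proc /proc proc rw 0 0
"""

table = mount_points(sample_mounts)
print(table["/dev/sda5"], table["/dev/sdb8"])
```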

Go to the data partition and look at the database archive. Four files are archived within one minute, and each file is as large as 48 MB, so writes are very frequent, resulting in high disk I/O overhead.

The relatively high disk overhead on the opt partition, on the other hand, is due to FTP transfers.
Once the analysis has located the causes, adjust the FTP transfers and the database archiving accordingly, and then check again.
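The per-partition figures that iostat reports are derived from cumulative kernel counters in /proc/diskstats. As a rough illustration of how two samples turn into per-second rates, here is a minimal sketch; it is not iostat's implementation, the function names are hypothetical, and both samples are invented numbers for the sda5 and sdb8 partitions.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats-style text into per-device cumulative counters."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        # f[2]=name, f[3]=reads completed, f[5]=sectors read,
        # f[7]=writes completed, f[9]=sectors written
        stats[f[2]] = {
            "reads": int(f[3]),
            "sectors_read": int(f[5]),
            "writes": int(f[7]),
            "sectors_written": int(f[9]),
        }
    return stats


def rates(before, after, interval_s):
    """Per-second rates between two samples; diskstats sectors are 512 bytes."""
    out = {}
    for dev, now in after.items():
        prev = before.get(dev)
        if prev is None:
            continue
        out[dev] = {
            "r/s": (now["reads"] - prev["reads"]) / interval_s,
            "w/s": (now["writes"] - prev["writes"]) / interval_s,
            "rkB/s": (now["sectors_read"] - prev["sectors_read"]) * 512 / 1024 / interval_s,
            "wkB/s": (now["sectors_written"] - prev["sectors_written"]) * 512 / 1024 / interval_s,
        }
    return out


# Invented samples taken one second apart.
t0 = """\
   8       5 sda5 1000 0 8000 500 2000 0 16000 900 0 1200 1400
   8      24 sdb8 500 0 4000 300 9000 0 720000 4000 0 3500 4300
"""
t1 = """\
   8       5 sda5 1010 0 8080 510 2150 0 17200 980 0 1300 1500
   8      24 sdb8 502 0 4016 302 9400 0 727200 4200 0 3700 4500
"""

r = rates(parse_diskstats(t0), parse_diskstats(t1), 1.0)
for dev, values in r.items():
    print(dev, values)
```

In this made-up sample the write rates dominate on both partitions, which is the kind of pattern the analysis above attributes to database archiving and FTP transfers.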

Conclusion: top and iostat are common commands. Flexible use of such basic commands makes it convenient to analyze and locate problems; in particular, the choice and use of their parameters is worth studying.
 

Supplement: disk IOPS basics

IOPS (input/output operations per second) is the number of I/O operations (reads and writes) per second, and it is one of the main indicators of disk performance. IOPS refers to the number of I/O requests the system can process per second; I/O requests are generally requests to read or write data. For applications with frequent random reads and writes, such as OLTP (online transaction processing), IOPS is the key indicator. Another important indicator is data throughput, which is the amount of data that can be successfully transferred per unit of time. For applications with a large number of sequential reads and writes, such as VOD (video on demand), throughput matters more.
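The two indicators are linked by the request size: throughput is roughly IOPS multiplied by the size of each I/O. The sketch below illustrates why small random requests are judged by IOPS and large sequential requests by throughput. The function name is hypothetical, and the request sizes (8 KB and 1024 KB) and the figure of 200 IOPS are illustrative assumptions, not measurements.

```python
def throughput_mb_s(iops, io_size_kb):
    """Approximate throughput in MB/s for a given IOPS and request size."""
    return iops * io_size_kb / 1024.0


# Same IOPS, very different throughput depending on request size.
small_random = throughput_mb_s(200, 8)      # OLTP-like small requests
large_sequential = throughput_mb_s(200, 1024)  # VOD-like large requests
print(small_random, large_sequential)
```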

A traditional disk, such as an FC, SAS, or SATA disk, is essentially a mechanical device with a rotational speed of 5400, 7200, 10K, or 15K rpm. The key factor affecting disk performance is the disk service time, that is, the time it takes the disk to complete one I/O request. It consists of three parts: seek time, rotational latency, and data transfer time.
Seek time (Tseek) is the time required to move the read/write head to the correct track. The shorter the seek time, the faster the I/O operation. Currently, the average seek time of a disk is generally 3-15 ms.
Rotational latency (Trotation) is the time required for the platter to rotate until the sector holding the requested data is under the read/write head. It depends on the rotational speed and is usually taken as half the time of one full revolution. For example, the average rotational latency of a 7200 rpm disk is about 60*1000/7200/2 = 4.17 ms, while that of a 15000 rpm disk is about 2 ms.
Data transfer time (Ttransfer) is the time required to transfer the requested data. It depends on the data transfer rate and equals the data size divided by the transfer rate. At present, IDE/ATA can reach 133 MB/s, and SATA II can reach an interface transfer rate of 300 MB/s. The data transfer time is usually far smaller than the first two parts.
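To make the three components concrete, the following sketch adds them up for a single request. It is only an approximation under stated assumptions: the 4 KB request size and the 8 ms seek time are illustrative, the function names are hypothetical, and the 300 MB/s interface rate is used as an optimistic stand-in for the real transfer rate.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one full revolution, in ms."""
    return 60_000.0 / rpm / 2


def transfer_time_ms(size_kb, rate_mb_s):
    """Time to move size_kb of data at rate_mb_s, in ms."""
    return size_kb / 1024.0 / rate_mb_s * 1000.0


def service_time_ms(seek_ms, rpm, size_kb, rate_mb_s):
    """Seek time + rotational latency + data transfer time."""
    return seek_ms + rotational_latency_ms(rpm) + transfer_time_ms(size_kb, rate_mb_s)


rot = rotational_latency_ms(7200)
xfer = transfer_time_ms(4, 300)
total = service_time_ms(8, 7200, 4, 300)
print(f"rotation {rot:.2f} ms, transfer {xfer:.3f} ms, total {total:.2f} ms")
```

With these assumed numbers the transfer time is a small fraction of a millisecond, which is why it is negligible next to the seek time and the rotational latency.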

Therefore, the theoretical maximum IOPS of a disk can be calculated as IOPS = 1000 ms / (Tseek + Trotation), ignoring the data transfer time. Assuming an average physical seek time of 3 ms and rotational speeds of 7200, 10K, and 15K rpm, the theoretical maximum IOPS values are:
IOPS = 1000/(3 + 60000/7200/2) = 140
IOPS = 1000/(3 + 60000/10000/2) = 167
IOPS = 1000/(3 + 60000/15000/2) = 200
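The three results above can be reproduced with a few lines of code. This is a direct transcription of the formula, with a hypothetical function name; the 3 ms seek time is the same assumption used in the text.

```python
def max_iops(seek_ms, rpm):
    """Theoretical maximum IOPS: 1000 ms / (seek time + rotational latency)."""
    rotation_ms = 60_000.0 / rpm / 2  # half a revolution, in ms
    return 1000.0 / (seek_ms + rotation_ms)


for rpm in (7200, 10000, 15000):
    print(rpm, round(max_iops(3, rpm)))
# prints:
# 7200 140
# 10000 167
# 15000 200
```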

In a disk array, IOPS mainly depends on the array algorithm, the cache hit rate, and the number of disks. Array algorithms differ from one array to another. Read IOPS does not differ between RAID 5 and RAID 10. For the same workload, however, write IOPS ultimately lands on the individual disks, and performance suffers once each disk's write IOPS limit is reached. With RAID 5, each write actually results in four I/O operations, whereas RAID 10 needs only two, so RAID 10 writes faster than RAID 5.
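The write amplification described above can be turned into a rough estimate of the I/O load that actually reaches the disks. The sketch below uses the figures from the text (four disk I/Os per RAID 5 write, two per RAID 10 write) and ignores caching entirely; the workload of 700 reads and 300 writes per second is an invented example, and the names are hypothetical.

```python
# Disk I/Os generated per host write, as stated in the text.
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def backend_iops(read_iops, write_iops, level):
    """Disk-level IOPS produced by a host workload (cache effects ignored)."""
    return read_iops + write_iops * WRITE_PENALTY[level]


raid5_load = backend_iops(700, 300, "raid5")
raid10_load = backend_iops(700, 300, "raid10")
print(raid5_load, raid10_load)
```

For the same host workload, the RAID 5 array has to absorb noticeably more disk I/O, so its disks reach their per-disk write limit sooner.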
