Before introducing the disk I/O monitoring commands, we need to understand the disk I/O performance metrics and what each of them reveals about disk performance. Disk I/O performance metrics include:
Metric 1: I/O operations per second (IOPS or TPS)
A single continuous read or write operation on a disk is called one disk I/O. A disk's IOPS is the number of continuous read operations plus the number of continuous write operations it completes per second. This metric is an important reference when small, non-contiguous pieces of data are transferred.
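On Linux, IOPS can be derived from the cumulative counters in `/proc/diskstats`: sample them twice and divide the number of newly completed reads and writes by the interval. The sketch below assumes the field layout from the kernel's iostats documentation (reads completed in the 4th column, writes completed in the 8th); the helper names and the device name used in the usage note are illustrative only.

```python
def parse_diskstats(line: str) -> dict:
    """Parse one /proc/diskstats line into the counters needed for IOPS."""
    f = line.split()
    # Columns: major, minor, name, reads completed, reads merged, sectors read,
    # ms reading, writes completed, ... (per the kernel iostats documentation).
    return {"name": f[2], "reads": int(f[3]), "writes": int(f[7])}


def iops(before: str, after: str, interval_s: float) -> float:
    """I/O operations per second between two snapshots taken interval_s apart."""
    b, a = parse_diskstats(before), parse_diskstats(after)
    completed = (a["reads"] - b["reads"]) + (a["writes"] - b["writes"])
    return completed / interval_s


def read_diskstats_line(device: str) -> str:
    """Return the current /proc/diskstats line for a device (Linux only)."""
    with open("/proc/diskstats") as fh:
        for line in fh:
            if line.split()[2] == device:
                return line
    raise ValueError(f"device {device!r} not found in /proc/diskstats")
```

For example, call `read_diskstats_line("sda")`, sleep one second with `time.sleep(1)`, read the line again, and pass both lines to `iops(..., 1.0)`; `"sda"` is just a placeholder device name.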
Metric 2: Throughput
Throughput is the rate at which the disk transfers data; the transferred data is the sum of data read and data written. It is usually expressed in KB/s or MB/s. This metric is an important reference when large pieces of data are transferred.
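Throughput follows from the "sectors read" and "sectors written" counters in `/proc/diskstats`, which the kernel always reports in 512-byte units. The sketch below assumes you already have the deltas of those two counters over one sampling interval; the function names are hypothetical, and it uses binary units (KiB, MiB) for the conversions.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def throughput_bytes_per_s(sectors_read_delta: int, sectors_written_delta: int,
                           interval_s: float) -> float:
    """Combined read + write throughput over one sampling interval."""
    total_bytes = (sectors_read_delta + sectors_written_delta) * SECTOR_BYTES
    return total_bytes / interval_s


def to_kib_per_s(bps: float) -> float:
    """Bytes per second -> KiB per second."""
    return bps / 1024


def to_mib_per_s(bps: float) -> float:
    """Bytes per second -> MiB per second."""
    return bps / (1024 * 1024)
```

With 2,400 sectors read and 4,000 sectors written in one second, this gives 3,276,800 bytes/s, i.e. 3,200 KiB/s or 3.125 MiB/s.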
Metric 3: Average I/O size
The average I/O size is the throughput divided by the number of I/Os (IOPS). This metric is useful for revealing how the disk is being used. In general, if the average I/O size is smaller than 32 KB, the disk is accessed mainly randomly; if it is larger than 32 KB, the access pattern can be considered mainly sequential.
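The division and the 32 KB rule of thumb can be written down directly. This is a minimal sketch: the threshold constant comes from the text, while the function names and the choice to treat exactly 32 KB as sequential are assumptions.

```python
RANDOM_SEQ_THRESHOLD = 32 * 1024  # 32 KB rule of thumb from the text


def average_io_size(throughput_bps: float, iops: float) -> float:
    """Average bytes per I/O = throughput / IOPS (0 when the disk was idle)."""
    return throughput_bps / iops if iops else 0.0


def access_pattern(avg_bytes: float) -> str:
    """Classify the workload: below 32 KB mostly random, otherwise mostly sequential."""
    return "random" if avg_bytes < RANDOM_SEQ_THRESHOLD else "sequential"
```

For instance, 3,276,800 bytes/s at 800 IOPS averages 4,096 bytes per I/O, which this rule classifies as random access.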
Metric 4: Disk utilization (percentage of time the disk is busy)
This is the percentage of time during which the disk is busy, that is, its utilization. The disk counts as busy while it is transferring data or processing commands (such as seeks). Utilization tracks resource contention and moves opposite to performance: the higher the utilization, the more serious the contention, the worse the performance, and the longer the response time. As a rule of thumb, once disk utilization exceeds 70%, application processes spend a long time waiting for I/O to complete, because most of them are blocked or sleeping while they wait.
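Utilization can be estimated from the "milliseconds spent doing I/Os" counter in `/proc/diskstats` (the 13th column): the fraction of the sampling interval during which at least one request was in flight. The sketch assumes you have that counter's delta; the 70% default comes from the text, and the function names are hypothetical.

```python
def utilization_pct(io_ticks_delta_ms: int, interval_ms: float) -> float:
    """Share of the interval during which the device had at least one I/O in flight."""
    # Clamp to 100% in case sampling jitter makes the delta slightly exceed the interval.
    return min(100.0, 100.0 * io_ticks_delta_ms / interval_ms)


def is_saturated(util_pct: float, threshold: float = 70.0) -> bool:
    """Apply the text's rule of thumb: above ~70% processes wait noticeably on I/O."""
    return util_pct > threshold
```

Keep in mind that this busy-time figure says nothing about parallelism, so for devices that serve many requests at once (RAID arrays, SSDs) a high value does not necessarily mean the device is at its limit.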
Metric 5: Service time
Service time is the time the disk takes to execute a read or write request, including seek time, rotational latency, and data transfer time. It depends mainly on the disk's own performance, although CPU and memory load also affect it, and an excessive number of requests indirectly increases it. If the service time stays above 20 ms for a sustained period, upper-layer applications may be affected.
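A common estimate of service time, the formula behind the `svctm` column of older `iostat` versions, divides the device's busy time by the number of I/Os completed in the same interval. This is a sketch under that assumption: newer sysstat releases deprecated `svctm` because the estimate is unreliable for devices that process requests in parallel, and the "sustained" check below (every sample in a window above 20 ms) is one hypothetical way to read "lasts for more than 20 ms".

```python
def service_time_ms(io_ticks_delta_ms: int, ios_delta: int) -> float:
    """Approximate mean service time per completed I/O (busy ms / completed I/Os)."""
    return io_ticks_delta_ms / ios_delta if ios_delta else 0.0


def service_time_too_high(samples_ms, limit_ms: float = 20.0) -> bool:
    """True when every sample in a sustained window exceeds the limit."""
    return bool(samples_ms) and all(s > limit_ms for s in samples_ms)
```

For example, 600 ms of busy time spread over 800 completed I/Os gives 0.75 ms per I/O.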
Metric 6: I/O queue length
The I/O queue length is the number of I/O requests waiting to be processed. It grows when the I/O load continuously exceeds the disk's processing capacity. If the queue length of a single disk exceeds 2, the disk is generally considered to have an I/O performance problem. Note that if the disk is a logical drive presented by a disk array, you need to divide the value by the number of physical disks that make up the logical drive to obtain the average I/O queue length per physical disk.
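The average queue length over an interval can be taken from the "weighted milliseconds spent doing I/Os" counter in `/proc/diskstats` (the 14th column), which is the basis of `iostat`'s average queue size. The sketch below assumes you have that delta, then applies the per-physical-disk division and the threshold of 2 described above; the function names are hypothetical.

```python
def avg_queue_length(weighted_ms_delta: int, interval_ms: float) -> float:
    """Average number of requests queued or in service over the interval."""
    return weighted_ms_delta / interval_ms


def per_physical_disk(queue_len: float, physical_disks: int = 1) -> float:
    """Spread a logical drive's queue length over the physical disks backing it."""
    return queue_len / physical_disks


def queue_problem(queue_len: float, physical_disks: int = 1, limit: float = 2.0) -> bool:
    """Rule of thumb from the text: more than 2 queued requests per physical disk."""
    return per_physical_disk(queue_len, physical_disks) > limit
```

A queue length of 6 is a problem for a single disk, but for a logical drive built from 4 physical disks it works out to 1.5 per disk, which is within the limit.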
Metric 7: Wait time
Wait time is the time a read or write request spends waiting to be executed, that is, its time in the queue. If I/O requests keep arriving faster than the disk can process them, the requests that cannot be served immediately have to wait in the queue for a long time.
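One way to approximate queue wait time is to compute the mean total time per request (the "ms spent reading" plus "ms spent writing" counters from `/proc/diskstats`, divided by completed I/Os, often called `await`) and subtract the estimated service time. This is a sketch under that assumption, not an exact measurement; the function names are hypothetical.

```python
def await_ms(read_ms_delta: int, write_ms_delta: int, ios_delta: int) -> float:
    """Mean time per I/O from submission to completion (queue wait + service)."""
    return (read_ms_delta + write_ms_delta) / ios_delta if ios_delta else 0.0


def queue_wait_ms(await_value_ms: float, service_value_ms: float) -> float:
    """Estimated time spent waiting in the queue: await minus service time."""
    # On devices that serve requests in parallel the service-time estimate can
    # exceed await, so clamp at zero rather than report a negative wait.
    return max(0.0, await_value_ms - service_value_ms)
```

For example, 2,000 ms of total request time over 100 I/Os gives an `await` of 20 ms; with a 5 ms service time, roughly 15 ms of that was spent waiting in the queue.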
By monitoring the metrics above and comparing them with historical data, empirical values, and the disk's nominal specifications (and, where necessary, with CPU, memory, and swap usage), it is not difficult to spot existing or potential disk I/O problems. But how do we avoid and solve these problems? That requires knowledge and techniques from disk I/O performance optimization. Given the scope and length of this article, we only list some common optimization methods for reference:
1. Adjust the data layout so that I/O requests are spread across all physical disks as evenly as possible.
2. For RAID arrays, try to make the application's I/O size equal to the strip size or a multiple of it, and choose an appropriate RAID level, such as RAID 10 or RAID 5.
3. Increase the queue depth of the disk driver, but do not exceed the disk's processing capacity; otherwise some I/O requests will be lost and have to be resent, which reduces performance.
4. Use caching to reduce the number of times an application accesses the disk. Caching can be applied at the file system level or at the application level.
5. Because most databases already include optimized caching, database I/O should go directly to raw disk partitions or use direct I/O (DIO), which bypasses the file system cache.
6. Memory read/write bandwidth far exceeds direct disk I/O performance, so place frequently accessed files or data in memory.
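Items 1 and 2 both come down to how a request is cut into strips and spread over the disks. The sketch below assumes a RAID 0 style layout with no parity or mirroring (RAID 5 and RAID 10 add those on top) and uses a hypothetical `split_request` helper to map a logical request onto `(disk, offset_on_disk, nbytes)` chunks. A request that starts on a strip boundary and whose size is a multiple of the strip size lands on every disk in whole strips; a misaligned one splits into partial strips.

```python
from typing import List, Tuple


def split_request(offset: int, length: int, strip_size: int,
                  ndisks: int) -> List[Tuple[int, int, int]]:
    """Split a logical request into (disk, offset_on_disk, nbytes) chunks (RAID 0 style)."""
    chunks = []
    end = offset + length
    while offset < end:
        strip = offset // strip_size          # global strip index
        disk = strip % ndisks                 # strips are dealt round-robin across disks
        row = strip // ndisks                 # which strip row on that disk
        within = offset % strip_size          # position inside the current strip
        n = min(strip_size - within, end - offset)
        chunks.append((disk, row * strip_size + within, n))
        offset += n
    return chunks
```

With a 64 KiB strip on 4 disks, a 256 KiB request at offset 0 becomes four full 64 KiB strips, one per disk, while a 64 KiB request starting at 32 KiB touches two disks with half a strip each.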
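Application-level caching (item 4) and keeping hot data in memory (item 6) can be illustrated with a small least-recently-used block cache that only goes to the disk on a miss. This is a minimal sketch: `BlockCache` and the `read_block` callback are hypothetical names, and for simple function results Python's `functools.lru_cache` provides the same LRU policy.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """LRU cache of fixed-size blocks in front of a slow read function."""

    def __init__(self, read_block: Callable[[int], bytes], capacity: int):
        self._read_block = read_block     # called only on a cache miss
        self._capacity = capacity         # maximum number of cached blocks
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.disk_reads = 0               # how many times we actually hit the disk

    def get(self, block_no: int) -> bytes:
        if block_no in self._blocks:
            self._blocks.move_to_end(block_no)   # mark as most recently used
            return self._blocks[block_no]
        data = self._read_block(block_no)        # miss: go to the disk
        self.disk_reads += 1
        self._blocks[block_no] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)     # evict the least recently used block
        return data
```

With a capacity of 2 and the access sequence 1, 2, 1, 1, 3, 1, 2, only four of the seven accesses reach the disk.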
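Direct I/O (item 5) is requested on Linux with the `O_DIRECT` open flag, which requires the buffer, offset, and length to be aligned to the device's logical block size. The sketch below assumes a 4096-byte block size, gets a page-aligned buffer from an anonymous `mmap`, and falls back to ordinary buffered I/O when the platform or file system rejects `O_DIRECT` (for example, `os.O_DIRECT` does not exist on macOS). The `read_direct` helper is hypothetical; databases implement this inside their own storage engines.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed logical block size; O_DIRECT needs aligned buffer, offset and length


def _read_with_flags(path: str, flags: int, nbytes: int, length: int) -> bytes:
    fd = os.open(path, flags)
    try:
        buf = mmap.mmap(-1, length)      # anonymous mapping: page-aligned, writable
        try:
            n = os.readv(fd, [buf])      # read from offset 0 into the aligned buffer
            return buf[:min(n, nbytes)]
        finally:
            buf.close()
    finally:
        os.close(fd)


def read_direct(path: str, nbytes: int = BLOCK) -> bytes:
    """Read up to nbytes from the start of path, bypassing the page cache if possible."""
    length = -(-nbytes // BLOCK) * BLOCK          # round the length up to a block multiple
    direct = getattr(os, "O_DIRECT", 0)           # Linux-specific flag
    if direct:
        try:
            return _read_with_flags(path, os.O_RDONLY | direct, nbytes, length)
        except OSError as exc:
            if exc.errno != errno.EINVAL:         # EINVAL: O_DIRECT not supported here
                raise
    return _read_with_flags(path, os.O_RDONLY, nbytes, length)
```

The fallback keeps the sketch usable everywhere, but it also means a successful call does not prove the page cache was bypassed.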