Some time ago, a problem led me to study I/O queue depth.
On a storage server, setting max_queue_depth to 1024 for the Linux kernel's mpt2sas driver module made the system hang while loading the driver, whereas a value of 512 worked fine.
After reading a lot of material on this topic, I finally figured it out.
We often set max_queue_depth to a large value in pursuit of system performance. However, a larger value does not necessarily mean better performance.
All of the following content is reprinted from other sources. I am so lazy!
(1)
Exploring the Effect of I/O Queues on Disk Performance
Source of the first article: EMC Chinese support forum, https://community.emc.com/go/chinese
Introduction
During data transfer, I/O requests wait in disk queues. Experience shows that as server performance keeps improving, disk I/O queues often become the main bottleneck for disk response time. This article takes AIX as an example to describe how disk I/O queues work, the commands used to monitor them, and how to tune them to improve disk performance.
Why use I/O queues:
Why process disk I/O in parallel? The main purpose is to improve application performance. This is especially important for virtual disks (or LUNs) built from multiple physical disks. If only one I/O is submitted at a time, the response time is short, but system throughput is very low. Submitting multiple I/Os at once lets the disk shorten head movement (through the elevator algorithm) and increases IOPS. Think of an elevator that can carry only one person at a time: each rider reaches the destination quickly (response time), but everyone else waits a long time (queue length). Submitting multiple I/O requests to the disk system at once therefore balances throughput against overall response time.
In theory, the maximum IOPS of a disk is the queue length divided by the average I/O response time. For example, with a queue length of 3 and an average I/O response time of 10 ms, the maximum throughput is 3 / 0.010 s = 300 IOPS.
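The elevator effect only exists when several requests are queued at once; with a queue of one there is nothing to reorder. The following minimal Python sketch compares total head movement for serving a batch of requests in arrival order versus an elevator (LOOK-style) sweep. The cylinder numbers and function names are hypothetical, and real disks and controllers use their own, more elaborate scheduling.

```python
# Minimal sketch: head movement for FIFO service vs. an elevator sweep.
# Cylinder numbers are hypothetical illustration values.

def seek_distance(start: int, order: list[int]) -> int:
    """Total head movement when requests are served in the given order."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total


def elevator_order(start: int, requests: list[int]) -> list[int]:
    """Sweep upward from the head position, then reverse and sweep down."""
    up = sorted(c for c in requests if c >= start)
    down = sorted((c for c in requests if c < start), reverse=True)
    return up + down


if __name__ == "__main__":
    head = 50
    pending = [90, 10, 70, 20, 60]  # hypothetical queued requests
    fifo = seek_distance(head, pending)
    scan = seek_distance(head, elevator_order(head, pending))
    print(f"FIFO: {fifo} cylinders, elevator: {scan} cylinders")
```

With these values the FIFO order moves the head 270 cylinders, while the elevator order moves it 120.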
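The queue-length rule above is simple arithmetic; a small Python helper makes the units explicit. The function name is illustrative only, and the result is an upper bound under the text's assumption that every in-flight I/O takes the average response time.

```python
def max_iops(queue_length: float, avg_response_ms: float) -> float:
    """Upper bound on IOPS: in-flight I/Os divided by the time each one takes.

    avg_response_ms is in milliseconds, so scale by 1000 to get per-second.
    """
    if avg_response_ms <= 0:
        raise ValueError("average response time must be positive")
    return queue_length * 1000.0 / avg_response_ms


print(max_iops(3, 10))  # 300.0, the example from the text
print(max_iops(1, 10))  # 100.0: one I/O at a time caps throughput
```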
Where the I/O queues are located:
Taking AIX as an example, the I/O stack from the application layer down to the physical disk is shown below. I/O traverses the stack from top to bottom:
- Application layer
- File system layer (optional)
- LVM device driver layer (optional)
- SDD, SDDPCM, or other multipath driver layer (if used)
- hdisk device driver layer
- Adapter device driver layer
- Disk interface layer
- Disk subsystem layer
- Disk layer
AIX limits the number of in-flight I/Os at each layer of the stack, so each layer can have its own I/O queue. In general, if the number of I/Os at a layer exceeds that layer's maximum queue length, the excess I/Os are held in a wait queue until the resources they need become available. At the file system layer, file system buffers limit the maximum number of in-flight I/Os for each file system. At the LVM device driver layer, hdisk buffers limit the number of in-flight I/Os. At the SDD layer, I/Os are queued if the qdepth_enable attribute of the dpo device is set to yes, although some SDD releases do not queue I/Os. SDDPCM does not queue I/Os before sending them to the disk device driver layer. Each hdisk limits its in-flight I/Os with the queue_depth attribute, while the FC adapter layer uses num_cmd_elems. The disk subsystem layer also queues I/Os: a single physical disk can accept multiple I/O requests but can service only one at a time.
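The "limit plus wait queue" behavior described above can be modeled in a few lines of Python. This is a hypothetical model of a single layer (for example an hdisk with a given queue_depth), not AIX code; the class name and the full_events counter are illustrative, with full_events only loosely analogous to the sqfull statistic discussed below.

```python
from collections import deque


class QueuedLayer:
    """Hypothetical model of one stack layer with an in-flight limit."""

    def __init__(self, limit: int):
        self.limit = limit              # e.g. queue_depth or num_cmd_elems
        self.in_flight: set[int] = set()
        self.wait_queue: deque[int] = deque()
        self.full_events = 0            # submissions that found the layer full

    def submit(self, io_id: int) -> None:
        if len(self.in_flight) < self.limit:
            self.in_flight.add(io_id)   # sent down to the next layer
        else:
            self.full_events += 1
            self.wait_queue.append(io_id)  # waits until a slot frees up

    def complete(self, io_id: int) -> None:
        self.in_flight.remove(io_id)
        if self.wait_queue:             # promote the oldest waiting I/O
            self.in_flight.add(self.wait_queue.popleft())


layer = QueuedLayer(limit=4)
for i in range(6):
    layer.submit(i)
print(len(layer.in_flight), list(layer.wait_queue), layer.full_events)
```

Submitting six I/Os to a layer with a limit of 4 leaves four in flight and two waiting.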
I/O queue monitoring commands:
Take AIX as an example. On AIX 5.3 and later, the hdisk queues can be monitored with the iostat -D and sar -d commands. The iostat -D output looks like this:
hdisk6 xfer: %tm_act bps tps bread bwrtn
4.7 2.2M 19.0 0.0 2.2M
read: rps avgserv minserv maxserv timeouts fails
0.0 0.0 0.0 0.0 0 0
write: wps avgserv minserv maxserv timeouts fails
19.0 38.9 1.1 190.2 0 0
queue: avgtime mintime maxtime avgwqsz avgsqsz sqfull
15.0 0.0 83.7 0.0 0.0 136
Here, avgwqsz is the average wait queue size, and avgsqsz is the average service queue size. The average time an I/O spends in the wait queue is avgtime. The sqfull value is the number of I/O requests per second submitted to a full queue. For disk subsystems with a cache, I/O service times vary. The iostat -D command displays statistics accumulated since system boot.
From the application's perspective, the total time to process an I/O is the service time plus the time spent in the hdisk wait queue.
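To pull these fields out programmatically, one option is to pair each "section:" header line with the value line that follows it. The Python sketch below assumes the layout of the sample output above (header line, then one value line); real iostat -D output can differ between AIX levels and when several disks are reported, so the parser and the parse_section name are illustrative only.

```python
SAMPLE = """\
hdisk6 xfer: %tm_act bps tps bread bwrtn
4.7 2.2M 19.0 0.0 2.2M
read: rps avgserv minserv maxserv timeouts fails
0.0 0.0 0.0 0.0 0 0
write: wps avgserv minserv maxserv timeouts fails
19.0 38.9 1.1 190.2 0 0
queue: avgtime mintime maxtime avgwqsz avgsqsz sqfull
15.0 0.0 83.7 0.0 0.0 136
"""


def parse_section(text: str, section: str) -> dict[str, str]:
    """Pair the field names after '<section>:' with the values on the next line.

    Values stay strings because some carry unit suffixes such as '2.2M'.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    marker = f"{section}:"
    for i, words in enumerate(lines):
        if marker in words and i + 1 < len(lines):
            names = words[words.index(marker) + 1:]
            return dict(zip(names, lines[i + 1]))
    raise KeyError(section)


queue = parse_section(SAMPLE, "queue")
if float(queue["sqfull"]) > 0:
    print(f"service queue filled (sqfull={queue['sqfull']}); review queue_depth")
```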
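As a worked example of that sum, the sketch below plugs in values from the sample iostat -D output. It assumes that, because the sample shows only writes, the write avgserv (38.9 ms) and the queue avgtime (15.0 ms) describe the same I/Os; the function name is illustrative.

```python
def app_visible_ms(avg_service_ms: float, avg_wait_ms: float) -> float:
    """Time an application waits per I/O: hdisk wait queue time + service time."""
    return avg_wait_ms + avg_service_ms


# Illustrative values from the sample output: write avgserv and queue avgtime.
total = app_visible_ms(avg_service_ms=38.9, avg_wait_ms=15.0)
print(f"{total:.1f} ms")
```

Here the application sees roughly 53.9 ms per I/O, even though the disk reports a 38.9 ms service time.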
The output of the sar -d command looks like this:
16:50:59 device %busy avque r+w/s Kbs/s avwait avserv
16:51:00 hdisk1 0 0.0 0 0 0.0 0.0
hdisk0 0 0.0 0 0 0.0 0.0
avwait and avserv are the average time spent in the wait queue and the average service time, respectively. In AIX 5.3 and later, avque is the average number of I/Os in the wait queue.
Optimization method:
First, do not blindly increase the queue parameters described above. Doing so can overload the disk subsystem or cause device configuration errors at boot. Simply increasing the hdisk queue_depth values is therefore not necessarily the best approach; tuning should be based on the maximum number of I/Os actually submitted to each device. When you increase queue_depth and the number of in-flight I/Os sent to the disk subsystem, I/O service times are likely to rise, but throughput rises too. If I/O service times approach the disk timeout value, you are submitting more I/Os than the disk subsystem can handle. If I/Os time out and the error log shows I/Os failing to complete, there may be a hardware problem, or the queues need to be made smaller.
A general rule for tuning queue_depth: keep increasing it until the service time for small random reads or writes exceeds 15 ms, or until the queues are no longer filling. Once I/O service times start rising, the bottleneck has moved from the AIX disk and adapter queues to the disk subsystem. There are two ways to choose the queue length: 1) based on the number of I/O requests the actual application generates, or 2) by using a test tool to measure what the disk subsystem can handle. Of the two, 1) is the main basis.
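The rule of thumb above can be written as a small predicate. This Python sketch uses the 15 ms threshold from the text and assumes, as its own simplification, that sqfull > 0 is the signal that the queue is actually filling; the function name is hypothetical, and the check is a guide for manual tuning rather than an automatic policy.

```python
SMALL_RANDOM_IO_LIMIT_MS = 15.0  # rule-of-thumb threshold from the text


def may_raise_queue_depth(avg_service_ms: float, sqfull: float) -> bool:
    """Only grow queue_depth while the queue fills (assumed: sqfull > 0)
    and small random I/O service time has not exceeded about 15 ms."""
    queue_is_filling = sqfull > 0
    return queue_is_filling and avg_service_ms <= SMALL_RANDOM_IO_LIMIT_MS


print(may_raise_queue_depth(8.0, 136))   # queue filling, disk still fast
print(may_raise_queue_depth(38.9, 136))  # bottleneck already in the storage
print(may_raise_queue_depth(8.0, 0))     # queue never fills; no benefit
```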
An I/O queue can be in one of the following states:
1. The queue is full, and I/Os are waiting in the hdisk or adapter driver layer.
2. The queue is not full, and I/O service times are short.
3. The queue is not full, but I/O service times are long.
4. The queue is not full, but I/Os are submitted to the storage faster than it can process them, so the storage loses I/Os.
The goal is to tune the queues into state 2 or 3. State 3 indicates that the bottleneck is not in the hdisk driver layer; it is most likely in the disk subsystem itself, but could also be in the adapter driver layer or the SAN.
State 4 should be avoided. All disks and disk subsystems limit how many I/Os they can have in flight, mainly because of the memory available to hold I/O requests and data. When the storage loses an I/O, the host eventually times out and resubmits it, and meanwhile any transactions waiting on that I/O stall. The CPU ends up doing far more work to handle I/Os than necessary, which should be avoided. If the I/O ultimately fails, the application may crash, or worse. Therefore, carefully confirm the processing limits of your storage.
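The four states can be sketched as a simple classifier. In this Python sketch, the enum and function names are hypothetical, "slow" uses the 15 ms threshold from the tuning rule, and I/O timeouts stand in for lost I/Os (the text notes that a lost I/O eventually times out at the host); these mappings are assumptions, not an official diagnostic.

```python
from enum import IntEnum


class QueueState(IntEnum):
    FULL_WAITING_IN_DRIVER = 1
    NOT_FULL_GOOD_SERVICE = 2
    NOT_FULL_SLOW_SERVICE = 3
    NOT_FULL_IO_LOST = 4


TARGET_STATES = {QueueState.NOT_FULL_GOOD_SERVICE, QueueState.NOT_FULL_SLOW_SERVICE}


def classify(queue_full: bool, avg_service_ms: float, io_timeouts: int,
             slow_ms: float = 15.0) -> QueueState:
    """Map observed queue behavior onto the four states (assumed proxies)."""
    if queue_full:
        return QueueState.FULL_WAITING_IN_DRIVER
    if io_timeouts > 0:  # assumption: timeouts indicate I/Os lost by storage
        return QueueState.NOT_FULL_IO_LOST
    if avg_service_ms > slow_ms:
        return QueueState.NOT_FULL_SLOW_SERVICE
    return QueueState.NOT_FULL_GOOD_SERVICE


state = classify(queue_full=False, avg_service_ms=38.9, io_timeouts=0)
print(state.name, state in TARGET_STATES)
```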
Reasonable average I/O service times:
Assuming no I/O is queued, a small read takes 0 to 15 ms, depending on seek time, rotational speed, and data transfer time; the data then moves from the storage to the host. Sometimes the data is already in the disk subsystem's read cache, in which case the I/O service time is about 1 ms. For large disk subsystems under normal load, I/O service times average about 5-10 ms. When small random reads average more than 15 ms, the storage is getting busy.
Writes usually go to the write cache and then average less than 2.5 ms. There are exceptions, though: if the storage synchronously mirrors the data to a remote site, writes take longer, and if the writes are large (64 KB or more), data transfer time increases significantly. Without a cache, writes take about as long as reads.
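These expected ranges can be turned into a quick sanity check. The Python sketch below encodes only the thresholds stated above (15 ms for small random reads, 2.5 ms for cached writes, and the read threshold for uncached writes); the function name and labels are hypothetical, and it deliberately ignores the exceptions for synchronous remote mirroring and large (64 KB or more) writes.

```python
READ_BUSY_MS = 15.0          # small random reads above this: storage is busy
CACHED_WRITE_MS = 2.5        # typical upper bound for writes hitting cache


def assess_service_time(op: str, avg_ms: float, cached_writes: bool = True) -> str:
    """Compare an average service time with the rule-of-thumb ranges.

    Does not model sync remote mirroring or large writes, which are
    legitimately slower.
    """
    if op == "read":
        return "storage busy" if avg_ms > READ_BUSY_MS else "normal"
    if op == "write":
        # Without a write cache, writes take about as long as reads.
        limit = CACHED_WRITE_MS if cached_writes else READ_BUSY_MS
        return "slower than expected" if avg_ms > limit else "normal"
    raise ValueError(f"unknown operation: {op}")


print(assess_service_time("write", 38.9))  # the sample iostat -D write avgserv
```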
For large sequential reads or writes, besides the long transfer time, the I/Os queue at the physical disk layer, so their service times are much higher than average. For example, if an application submits 50 I/Os at once (50 sequential 64 KB reads), the first few I/Os get fast service times, while the last one must wait for the other 49 to complete, which takes a long time.
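The effect of a burst can be shown with a toy calculation. This Python sketch assumes the disk serves the burst strictly one I/O at a time with a constant, hypothetical 1 ms per 64 KB read; real transfer times depend on the hardware, but the shape of the result (the last I/O waits for all the others) is the point.

```python
def burst_response_times(n_ios: int, per_io_ms: float) -> list[float]:
    """Response time of each I/O when a burst is served one at a time.

    The k-th request (1-based) completes after k service slots.
    """
    return [k * per_io_ms for k in range(1, n_ios + 1)]


times = burst_response_times(50, 1.0)  # hypothetical 1 ms per 64 KB read
print(times[0], times[-1], sum(times) / len(times))
```

Under these assumptions the first read completes in 1 ms, the last in 50 ms, and the average is 25.5 ms, far above the 1 ms each read actually spends being serviced.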
(2)
The queue_depth parameter: introduction and adjustment steps
queue_depth parameter size
In an AIX environment, setting the queue depth (queue_depth) of FAStT logical disks is very important for system performance. In large FAStT configurations, many volumes are attached to the host, and this setting becomes even more critical for reliability. A queue depth that is too large can lead to file system loss or host crashes. The following describes how to set the disk queue depth correctly and how to calculate it.
We can use the following formula to determine the maximum queue depth:
512 / (number of hosts * number of LUNs per host)
For example, if a system has four hosts, each with 32 LUNs (the maximum number of LUNs per AIX host), the maximum queue depth should be 4:
512 / (4 * 32) = 4
In this case, set the queue_depth attribute of hdiskX as follows, where X is the corresponding disk number:
# chdev -l hdiskX -a queue_depth=4 -P
You can use iostat -D to view sqfull, which indicates how many times queue_depth has been exceeded since the system was started. IBM engineers recommend that the queue_depth value be between 40 and
How to set:
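The sizing rule can be sketched as a small Python helper that computes the value and builds (but does not run) the matching chdev command. Rounding down with integer division is an assumption made here so the combined depth never exceeds the 512 budget; the function names and the hdisk2 device are illustrative.

```python
def max_queue_depth(hosts: int, luns_per_host: int, budget: int = 512) -> int:
    """512 / (hosts * LUNs per host), rounded down to stay within the budget."""
    if hosts <= 0 or luns_per_host <= 0:
        raise ValueError("hosts and luns_per_host must be positive")
    return budget // (hosts * luns_per_host)


def chdev_command(disk: str, depth: int) -> str:
    """Build the chdev command from the text as a string; nothing is executed."""
    return f"chdev -l {disk} -a queue_depth={depth} -P"


depth = max_queue_depth(hosts=4, luns_per_host=32)
print(depth, chdev_command("hdisk2", depth))
```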
The queue_depth parameter affects disk I/O performance, especially for I/O-intensive applications such as databases. Adjusting it can improve overall application performance. The following are the steps and precautions for adjusting this parameter on AIX 5.3 with an IBM DS4300.
In this example, physical disk hdisk2 is a RAID 5 array on IBM storage and belongs to the volume group datavg.
1. Back up datavg first. Before making any change in a production environment, remember: safety first, and a backup is essential.
# smit savevg
2. View the current queue_depth value of hdisk2.
# lsattr -El hdisk2 | grep queue_depth
3. Unmount the file systems in datavg.
# umount /u2
4. Vary off the VG.
# varyoffvg datavg
5. Unconfigure hdisk2 (rmdev -l without -d moves the device to the Defined state and keeps its definition).
# rmdev -l hdisk2
6. Change the queue_depth attribute of hdisk2 (16 is the new value here; substitute the value you need).
# chdev -l hdisk2 -a queue_depth=16 -P
7. Reconfigure hdisk2.
# mkdev -l hdisk2
8. Vary on the VG.
# varyonvg datavg
9. Mount the file systems in datavg.
# mount /u2
10. Check whether queue_depth was changed successfully.
# lsattr -El hdisk2 | grep queue_depth
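Because the procedure above must run in a fixed order, it can help to generate and review the exact command list before touching a production system. This Python sketch only builds strings for a dry run and executes nothing; the function name is hypothetical, /u2 and datavg are the example values from the steps, and unmounting in reverse order (children before parents, assuming the list is given parent-first) is an added assumption for nested mount points.

```python
def queue_depth_change_plan(disk: str, vg: str, mount_points: list[str],
                            depth: int) -> list[str]:
    """Return the procedure's commands, in order, for review (dry run only)."""
    if depth <= 0:
        raise ValueError("queue_depth must be positive")
    plan = ["smit savevg", f"lsattr -El {disk} | grep queue_depth"]
    # Assumes mount_points is listed parent-first; unmount children first.
    plan += [f"umount {mp}" for mp in reversed(mount_points)]
    plan += [
        f"varyoffvg {vg}",
        f"rmdev -l {disk}",
        f"chdev -l {disk} -a queue_depth={depth} -P",
        f"mkdev -l {disk}",
        f"varyonvg {vg}",
    ]
    plan += [f"mount {mp}" for mp in mount_points]
    plan.append(f"lsattr -El {disk} | grep queue_depth")
    return plan


for cmd in queue_depth_change_plan("hdisk2", "datavg", ["/u2"], 16):
    print("#", cmd)
```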
If lsattr shows that queue_depth has been changed to the required value, the procedure is complete. If circumstances permit, it is best to reboot the machine. Note that an improper value can cause the system to hang or crash. I have personally seen a system misbehave because this value was too large: the init process constantly used about 20% of the CPU, the syscall count stayed above K for a long time, and the wait queue was also high, which seriously hurt system performance. Therefore, keep monitoring the system for a while after adjusting this value, until it has been tuned to a suitable level.
Sources:
http://www.aixchina.net/home/space.php?uid=10273&do=blog&id=19603
http://www-900.ibm.com/cn/support/viewdoc/detail?docid=1314043000000
http://www.360doc.com/content/10/1112/10/2245786_68687833.shtml