A detailed description of iostat, the Linux I/O performance monitoring tool

Source: Internet
Author: User
Tags: disk usage
When a Linux system has performance problems, we can generally use commands such as top, iostat, free, and vmstat for an initial diagnosis. Of these, iostat provides the richest I/O performance data.

1. Basic usage

    $ iostat -d -k 1 10

The -d parameter displays the usage status of the devices (disks). The -k parameter forces columns that would otherwise be reported in blocks to use kilobytes as the unit. 1 10 means the display is refreshed once every 1 second, 10 times in total.

    $ iostat -d -k 1 10
    Device:      tps   kB_read/s   kB_wrtn/s     kB_read   kB_wrtn
    sda        39.29       21.14        1.44   441339807       ...
    ...         0.00        0.00        0.00        1623       ...
    ...         1.32        1.43        4.54    29834273       ...
    ...         6.30        0.85       24.95    17816289       ...
    ...         0.85        0.46        3.40     9543503       ...
    ...         0.00        0.00        0.00         550       ...
    ...         0.00        0.00        0.00         406         0
    sda8        0.00        0.00        0.00         406         0
    sda9        0.00        0.00        0.00         406         0
    sda10      60.68       18.35       71.43   383002263       ...

    Device:      tps   kB_read/s   kB_wrtn/s     kB_read   kB_wrtn
    sda       327.55     5159.18      102.04        5056       100
    sda1        0.00        0.00        0.00           0         0

tps: the number of transfers per second issued to the device ("Indicate the number of transfers per second that were issued to the device."). One transfer means one I/O request. Multiple logical requests may be merged into a single I/O request, and the size of a transfer is indeterminate.
kB_read/s: the amount of data read from the device per second.
kB_wrtn/s: the amount of data written to the device per second.
kB_read: the total amount of data read.
kB_wrtn: the total amount of data written.
All of these are in kilobytes.

In the example above, we can see the statistics for the disk sda and each of its partitions. The total tps of the disk at that time is 39.29; the rows below it give the tps of each partition.
(Because these are instantaneous values, the total tps is not exactly equal to the sum of the tps of the partitions.)

2. Use the -x parameter to obtain more statistics

    $ iostat -d -x -k 1 10
    Device:  rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s    rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
    sda        1.56   28.31    7.80  31.49     42.51     ...    21.26    1.46      1.16      0.03   0.79   2.62    ...

    Device:  rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s    rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
    sda        2.00   20.00  381.00   7.00  12320.00     ...  6160.00  108.00     32.31      1.75   4.50   2.17  84.20

rrqm/s: the number of read requests to this device that are merged per second. (When a system call needs to read data, the VFS passes the request to the file system; if the file system finds that different read requests target the same block, it merges them into one.)
wrqm/s: the number of write requests to this device that are merged per second.
rsec/s: the number of sectors read per second.
wsec/s: the number of sectors written per second.
r/s: the number of read requests issued to the device per second.
w/s: the number of write requests issued to the device per second.
await: the average time (in milliseconds) taken to process each I/O request. It can be understood as the I/O response time. Generally, the I/O response time of a system should be below 5 ms; anything above 10 ms is relatively high.
%util: the time spent processing I/O within the sampling interval, divided by the length of the interval. For example, if the interval is 1 second and the device spends 0.8 seconds processing I/O and 0.2 seconds idle, then %util = 0.8/1 = 80%. This value therefore indicates how busy the device is. Generally, a value of 100% means the device is running close to full capacity (of course, with multiple disks, even if %util is 100%, the disks can work concurrently, so disk usage is not necessarily the bottleneck).
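The two rules of thumb above (await below 5 ms is healthy and above 10 ms is high; %util near 100% means the device is close to saturation) can be sketched in Python. This is only an illustration under stated assumptions: the helper names parse_row and classify are hypothetical, the 80% "busy" cutoff is an assumed threshold rather than anything iostat defines, and the sample row is illustrative, with its wsec/s value assumed to be twice wkB/s.

```python
# Column names as printed in the `iostat -d -x -k` header.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s "
          "avgrq-sz avgqu-sz await svctm %util")

# Illustrative row; wsec/s (216.00) is an assumed value equal to 2 * wkB/s.
SAMPLE = ("sda 2.00 20.00 381.00 7.00 12320.00 216.00 6160.00 108.00 "
          "32.31 1.75 4.50 2.17 84.20")


def parse_row(header, row):
    """Map one data row onto the header's column names."""
    names = header.split()[1:]          # drop the leading "Device:" label
    fields = row.split()
    if len(fields) != len(names) + 1:
        raise ValueError("row does not match header: %r" % row)
    device = fields[0]
    stats = {name: float(value) for name, value in zip(names, fields[1:])}
    return device, stats


def classify(stats, good_await_ms=5.0, high_await_ms=10.0, busy_util=80.0):
    """Apply the rule-of-thumb thresholds; all three cutoffs are adjustable."""
    if stats["await"] < good_await_ms:
        latency = "normal"
    elif stats["await"] <= high_await_ms:
        latency = "elevated"
    else:
        latency = "high"
    load = "busy" if stats["%util"] >= busy_util else "ok"
    return latency, load


device, stats = parse_row(HEADER, SAMPLE)
print(device, classify(stats))
```

The thresholds are keyword arguments because suitable values depend on the storage hardware; a cutoff that is reasonable for a single spinning disk may be far too lenient or too strict elsewhere.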
3. The -c parameter

With -c, iostat can also report CPU status values:

    $ iostat -c 1 10
    avg-cpu:  %user   %nice    %sys  %iowait   %idle
               1.98    0.00    0.35    11.45   86.22

    avg-cpu:  %user   %nice    %sys  %iowait   %idle
               1.62    0.00    0.25    34.46   63.67

4. Common usage

    $ iostat -d -k 1 10       # view TPS and throughput information
    $ iostat -d -x -k 1 10    # view device utilization (%util) and response time (await)
    $ iostat -c 1 10          # view CPU status

5. Example analysis

    $ iostat -d -k 1 | grep sda10
    Device:      tps   kB_read/s   kB_wrtn/s   kB_read    kB_wrtn
    sda10      60.72       18.95       71.53        14   93241908
    sda10     299.02     4266.67      129.41      4352        132
    sda10     483.84     4589.90     4117.17      4544       4076
    sda10     218.00     3360.00      100.00      3360        100
    sda10     546.00     8784.00      124.00      8784        124
    sda10     827.00    13232.00      136.00     13232        136

The output above shows that the disk performs several hundred transfers per second (roughly 200 to 800 across the one-second samples), reads about 5 MB per second, and writes about 1 MB per second.

    $ iostat -d -x -k 1
    Device:  rrqm/s  wrqm/s     r/s     w/s    rsec/s   wsec/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
    sda        1.56   28.31    7.84   31.50     43.65      ...    21.82     1.58      1.19      0.03   0.80   2.61  10.29
    sda        1.98   24.75  419.80    6.93  13465.35   253.47  6732.67   126.73     32.15      2.00   4.70   2.00  85.25
    sda        3.06   41.84  444.90   54.08  14204.08  2048.98  7102.04  1024.49     32.57      2.10   4.21   1.85    ...

Here the average disk response time is below 5 ms and disk utilization is above 80%. The disk is responding normally, but it is already quite busy.

Extended notes:

rrqm/s: the number of merged read operations per second, that is, delta(rmerge)/s
wrqm/s: the number of merged write operations per second, that is, delta(wmerge)/s
r/s: the number of read I/O operations completed per second, that is, delta(rio)/s
w/s: the number of write I/O operations completed per second, that is, delta(wio)/s
rsec/s: the number of sectors read per second, that is, delta(rsect)/s
wsec/s: the number of sectors written per second, that is, delta(wsect)/s
rkB/s: the number of kilobytes read per second. It is half of rsec/s, because each sector is 512 bytes (a derived value).
wkB/s: the number of kilobytes written per second. It is half of wsec/s.
avgrq-sz: the average data size (in sectors) of each device I/O operation, that is, delta(rsect + wsect)/delta(rio + wio)
avgqu-sz: the average I/O queue length, that is, delta(aveq)/s/1000 (because aveq is measured in milliseconds)
await: the average wait time (in milliseconds) of each device I/O operation, that is, delta(ruse + wuse)/delta(rio + wio)
svctm: the average service time (in milliseconds) of each device I/O operation, that is, delta(use)/delta(rio + wio)
%util: the fraction of each second spent on I/O operations, in other words the fraction of time during which the I/O queue is not empty, that is, delta(use)/s/1000 (because use is measured in milliseconds)

If %util is close to 100%, too many I/O requests are being generated, the I/O system is fully loaded, and the disk may be a bottleneck. When idle is below 70%, I/O pressure is already fairly high, and reads generally spend a good deal of time waiting. You can also cross-check with vmstat by looking at the b column (the number of processes waiting for resources) and the wa column (the percentage of CPU time spent waiting for I/O; above 30% indicates high I/O pressure).

In addition, compare await with svctm. If the gap between them is too large, there is very likely an I/O problem.

avgrq-sz also deserves attention when tuning I/O. It is the data size of each individual operation. If there are many operations but each one is small, the overall I/O throughput will be low; if each operation is large, the throughput will be high. The relationship avgrq-sz × (r/s + w/s) = rsec/s + wsec/s also holds, which is to say that the read and write rate is determined by these two factors.

svctm is generally smaller than await (because the wait time of requests that are waiting simultaneously is counted repeatedly). The value of svctm is mainly determined by disk performance, although CPU and memory load also affect it, and too many requests indirectly increase svctm.
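The formulas above can be sketched as a small Python function that turns two snapshots of the per-device counters into the extended statistics. The counter names (rmerge, rio, rsect, ruse, use, aveq, and so on) follow the formulas in the text; the DiskCounters type, the extended_stats function, and the numbers in the usage example are hypothetical and were chosen only to make the arithmetic easy to follow. %util is multiplied by 100 here so that it reads as a percentage.

```python
from dataclasses import dataclass, fields


@dataclass
class DiskCounters:
    """Cumulative per-device counters (hypothetical container type)."""
    rmerge: int  # merged read requests
    wmerge: int  # merged write requests
    rio: int     # completed read I/Os
    wio: int     # completed write I/Os
    rsect: int   # sectors read (512 bytes each)
    wsect: int   # sectors written
    ruse: int    # milliseconds spent on reads
    wuse: int    # milliseconds spent on writes
    use: int     # milliseconds during which the device had I/O in progress
    aveq: int    # weighted milliseconds spent in the queue


def extended_stats(prev, curr, interval_s):
    """Compute iostat-style extended statistics between two snapshots."""
    d = {f.name: getattr(curr, f.name) - getattr(prev, f.name)
         for f in fields(DiskCounters)}
    n_io = d["rio"] + d["wio"]

    def per_io(total):
        # Avoid dividing by zero when the device was idle for the interval.
        return total / n_io if n_io else 0.0

    return {
        "rrqm/s": d["rmerge"] / interval_s,
        "wrqm/s": d["wmerge"] / interval_s,
        "r/s": d["rio"] / interval_s,
        "w/s": d["wio"] / interval_s,
        "rsec/s": d["rsect"] / interval_s,
        "wsec/s": d["wsect"] / interval_s,
        "rkB/s": d["rsect"] / interval_s / 2,   # 512-byte sectors
        "wkB/s": d["wsect"] / interval_s / 2,
        "avgrq-sz": per_io(d["rsect"] + d["wsect"]),
        "avgqu-sz": d["aveq"] / interval_s / 1000,
        "await": per_io(d["ruse"] + d["wuse"]),
        "svctm": per_io(d["use"]),
        "%util": d["use"] / interval_s / 1000 * 100,
    }


# Made-up counters for a 1-second interval, starting from zero.
before = DiskCounters(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
after = DiskCounters(rmerge=2, wmerge=20, rio=300, wio=100,
                     rsect=8000, wsect=4800, ruse=1200, wuse=800,
                     use=800, aveq=2000)
stats = extended_stats(before, after, interval_s=1.0)
for name, value in stats.items():
    print("%-9s %.2f" % (name, value))
```

With these made-up counters, await (5 ms) is larger than svctm (2 ms), which is the pattern the text describes when requests spend part of their time waiting in the queue.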
The value of await generally depends on the service time (svctm), the length of the I/O queue, and the pattern in which I/O requests are issued. If svctm is close to await, I/O spends almost no time waiting. If await is much larger than svctm, the I/O queue is too long and the application's response time becomes slow. If the response time exceeds what users can tolerate, consider switching to a faster disk, tuning the kernel elevator algorithm, optimizing the application, or upgrading the CPU.

The queue length (avgqu-sz) can also be used as an indicator of system I/O load. However, because avgqu-sz is an average over the sampling interval, it cannot reflect momentary bursts of I/O.

A good analogy from someone else (the I/O system as a supermarket queue):

For example, when we queue at a supermarket checkout, how do we decide which register to go to? First we look at the number of people in each line: surely 5 people will be faster than 20? Besides counting heads, we often look at how much the people in front are buying. If there is someone ahead who has bought a week's worth of groceries, we may consider switching lines. Then there is the cashier's speed. If we get a newcomer who cannot even count the money properly, we are in for a wait. Timing also matters a great deal. A register that was packed 5 minutes ago may now be empty, and paying is quick and pleasant, provided, of course, that what we did in those 5 minutes was more worthwhile than queuing (though I have yet to find anything more boring than queuing).
The I/O system has many similarities with supermarket queuing:

r/s + w/s is similar to the total number of people paying.
The average queue length (avgqu-sz) is similar to the average number of people queuing per unit time.
The average service time (svctm) is similar to the cashier's checkout speed.
The average wait time (await) is similar to the average time each person waits.
The average I/O data size (avgrq-sz) is similar to the average number of items each person buys.
The I/O utilization (%util) is similar to the fraction of time during which someone is standing at the register.

Based on these figures, we can analyze the pattern of I/O requests as well as the I/O speed and response time.
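As one way of analyzing the figures along these lines, the relationships between the columns can be checked against a reported row. The Python sketch below uses values copied from the second sda row of the iostat -d -x -k 1 output in the example analysis. The function name cross_check is hypothetical, and the second check (queue length as request rate times wait time) is the usual queuing rule of thumb applied to the supermarket analogy, so it should be read as an approximation rather than as the definition iostat uses.

```python
def cross_check(r_s, w_s, rsec_s, wsec_s, await_ms):
    """Derive avgrq-sz and an approximate avgqu-sz from other columns."""
    iops = r_s + w_s                      # "people paying" per second
    avgrq_sz = (rsec_s + wsec_s) / iops   # sectors per request ("items per person")
    avgqu_sz = iops * await_ms / 1000.0   # requests in flight ("people in line")
    return avgrq_sz, avgqu_sz


# Values copied from the sample row: r/s, w/s, rsec/s, wsec/s, await.
avgrq_sz, avgqu_sz = cross_check(419.80, 6.93, 13465.35, 253.47, 4.70)
print("avgrq-sz ~ %.2f sectors (iostat reported 32.15)" % avgrq_sz)
print("avgqu-sz ~ %.2f (iostat reported 2.00)" % avgqu_sz)
```

If the derived values were far from the reported columns, that would suggest the row was misread or the columns were shifted, which is a cheap sanity check when parsing iostat output by hand.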