How to check system load on Linux

Method 1: Load Average

1.1: What is Load? What is Load Average?
Load is a measure of how much work the computer is doing (Wikipedia: "the system load is a measure of the amount of work that a computer system is doing"); in practice it is the length of the run queue. Load Average is the load averaged over a period of time (1 minute, 5 minutes, 15 minutes).

1.2: Viewing commands: w, uptime, procinfo, or top
load average: 0.02, 0.27, 0.17
These three values are the 1-minute, 5-minute, and 15-minute averages respectively.

1.3: How can I tell whether the system is overloaded?
Generally, judge against the number of CPUs. If the load average stays around 1.2 and the machine has two CPUs, the CPU is basically not a constraint. In other words, the load average should stay below the number of CPUs (see the sketch at the end of this section).

1.4: Load and capacity planning
Capacity planning is generally based on the 15-minute load average first.

1.5: Load misunderstandings:
1: A high system load always means a performance problem. Truth: a high load may simply reflect CPU-intensive computation.
2: A high system load must mean the CPU capacity is too small or the CPU count too low. Truth: a high load only says that too many tasks are queued to run; the tasks in the queue may actually be consuming CPU, I/O, or other resources.
3: If the system load stays high for a long time, the first step is to add CPUs. Truth: load is only a symptom, not the root cause. Adding CPUs may lower the load temporarily, but if the real bottleneck lies elsewhere the problem will remain.
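To make rule 1.3 concrete, here is a minimal sketch (my own illustration, not part of the original article) that reads the load averages from /proc/loadavg and compares the 15-minute value against the CPU count reported by nproc:

#!/bin/sh
# Minimal sketch: warn when the 15-minute load average exceeds the CPU count.
# Assumes a Linux system with /proc/loadavg and the nproc utility available.
read one five fifteen rest < /proc/loadavg
cpus=$(nproc)
echo "load averages: 1min=$one 5min=$five 15min=$fifteen, CPUs=$cpus"
# Compare as floating point with awk, since shell arithmetic is integer-only.
if awk -v load="$fifteen" -v cpus="$cpus" 'BEGIN { exit !(load + 0 > cpus + 0) }'; then
    echo "15-minute load exceeds the CPU count: the system looks overloaded"
else
    echo "15-minute load is below the CPU count: CPU capacity looks sufficient"
fi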
2: How can we identify the system bottleneck when the load average is high? Is the CPU insufficient, is I/O too slow, or is memory insufficient?

2.1: View the system load with vmstat

# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0    100152   2436  97200           0    1    34    45   99   33  0  0 99  0

procs:
The r column shows the number of processes that are running or waiting for a CPU time slice. If it stays greater than 1 for a long time, the CPU is insufficient and you need to add CPUs.
The b column shows the number of processes waiting for resources, for example waiting for I/O or for memory swapping.

cpu: CPU usage status.
The us column shows the percentage of CPU time spent in user mode. A high us value means user processes are consuming a lot of CPU time; if it stays above 50% for a long time, consider optimizing the programs.
The sy column shows the percentage of CPU time spent in kernel mode. The reference value for us + sy is 80%; if us + sy exceeds 80%, the CPU may be insufficient.
The wa column shows the percentage of CPU time spent waiting for I/O. The reference value for wa is 30%; if wa exceeds 30%, I/O wait is severe, which may be caused by a large amount of random disk access, or by a bandwidth bottleneck on the disk or the disk access controller (mainly block operations).
The id column shows the percentage of time the CPU is idle.

system: interrupts and context switches during the sampling interval.
The in column shows the number of device interrupts per second observed during the interval.
The cs column shows the number of context switches per second. If cs is much higher than the disk I/O rate and the network packet rate, further investigation is warranted.

memory:
swpd: the amount of memory swapped out to the swap area (in KB). Even if swpd is non-zero or relatively large, say over 100 MB, system performance is still normal as long as si and so stay at 0 for a long time.
free: the amount of memory currently on the free page list (in KB).
buff: the amount of memory used as buffer cache; reads and writes to block devices generally go through buffers.
cache: the amount of memory used as page cache, generally serving as the file system cache. A large cache means many files are cached; if bi under io is small at the same time, the file system is working efficiently.

swap:
si: the amount of memory swapped in from the swap area per second.
so: the amount of memory swapped out to the swap area per second.

io:
bi: the total amount of data read from block devices (disk reads), in KB per second.
bo: the total amount of data written to block devices (disk writes), in KB per second.
Here the reference value for bi + bo is 1000. If it exceeds 1000 and the wa value is also large, you should consider disk load balancing; analyze further with the iostat output.
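To keep an eye on the vmstat fields discussed above (r, wa, and bi + bo), the following minimal sketch samples vmstat and prints just those values; the header-based column lookup is my own addition, used so the sketch does not depend on a particular vmstat version:

#!/bin/sh
# Minimal sketch: sample vmstat every 2 seconds, 5 times, and print the fields
# discussed above. Column positions are taken from vmstat's own header row.
vmstat 2 5 | awk '
    NR == 2 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # field-name row
    NR > 2  {
        printf "r=%s  wa=%s%%  bi+bo=%d KB/s\n",
               $col["r"], $col["wa"], $col["bi"] + $col["bo"]
    }'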
2.2: View the disk load with iostat

For example, "iostat -d -k -t 2" prints disk I/O statistics every 2 seconds until you press Ctrl + C: the -d option selects disk statistics, -k reports values in KB per second, -t prints the time, and 2 means output every 2 seconds. The first report shows the disk I/O load accumulated since the system booted; each subsequent report is the average I/O load over the interval.

# iostat -x 1 10
Linux 2.6.18-92.el5xen  02/03/2009

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.10   0.00     4.82    39.54

Device:  rrqm/s  wrqm/s    r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    3.50   0.40  2.50      5.60   48.00     18.48      0.00   0.97   0.97   0.28
sdb        0.00    0.00   0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
sdc        0.00    0.00   0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
sdd        0.00    0.00   0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
sde        0.00    0.10   0.30  0.20      2.40    2.40      9.60      0.00   1.60   1.60   0.08
sdf       17.40    0.50 102.00  0.20  12095.20    5.60    118.40      0.70   6.81   2.09  21.36
sdg      232.40    1.90 379.70  0.50  76451.20   19.20    201.13      4.94  13.78   2.45  93.16

rrqm/s: the number of read requests merged per second; that is, delta(rmerge)/s.
wrqm/s: the number of write requests merged per second; that is, delta(wmerge)/s.
r/s: the number of read I/O operations completed per second; that is, delta(rio)/s.
w/s: the number of write I/O operations completed per second; that is, delta(wio)/s.
rsec/s: the number of sectors read per second; that is, delta(rsect)/s.
wsec/s: the number of sectors written per second; that is, delta(wsect)/s.
rkB/s: the number of kilobytes read per second; half of rsec/s, because each sector is 512 bytes.
wkB/s: the number of kilobytes written per second; half of wsec/s.
avgrq-sz: the average size (in sectors) of each device I/O operation; that is, delta(rsect + wsect)/delta(rio + wio).
avgqu-sz: the average I/O queue length; that is, delta(aveq)/s/1000 (aveq is measured in milliseconds).
await: the average wait time (in milliseconds) of each device I/O operation; that is, delta(ruse + wuse)/delta(rio + wio).
svctm: the average service time (in milliseconds) of each device I/O operation; that is, delta(use)/delta(rio + wio).
%util: the percentage of time in one second spent doing I/O, or equivalently, how much of each second the I/O queue is non-empty; that is, delta(use)/s/1000 (use is measured in milliseconds).

If %util is close to 100%, the device is receiving too many I/O requests and the I/O system is running at full capacity; this disk may be a bottleneck. When idle is below 70%, the I/O load is already high, and processes are generally spending their time waiting on reads. You can also combine this with the b column (processes waiting for resources) and the wa column (percentage of CPU time spent waiting for I/O; above 30% indicates heavy I/O pressure) in vmstat. A sketch that applies these rules follows at the end of this section.

In addition, in general svctm < await, because the wait time of requests that are queued at the same time is counted repeatedly. svctm is usually determined by disk performance, but CPU/memory load also affects it, and too many requests will indirectly increase svctm. await depends on the service time (svctm), the length of the I/O queue, and the pattern in which I/O requests are issued. If svctm is close to await, the I/O has almost no wait time; if await is much larger than svctm, the I/O queue is too long and the application's response time suffers. If the response time exceeds what users can tolerate, consider replacing the disk with a faster one, tuning the kernel elevator (I/O scheduler) algorithm, optimizing the application, or upgrading the CPU.

The queue length (avgqu-sz) can also serve as a metric of the system's I/O load, but because avgqu-sz is an average over the sampling interval, it cannot reflect instantaneous I/O bursts.

A good analogy from someone else (the I/O system vs. a supermarket queue): how do we decide which checkout line to join in a supermarket? First we look at how many people are queuing: five people will obviously be faster than twenty. Besides head count, we also look at how much the people in front have bought: if someone has stocked up groceries for a whole week, we may consider switching lines. Then there is the cashier's speed: if you run into a newbie who fumbles even counting the money, you are in for a wait. Timing also matters: a register that was swamped five minutes ago may be empty now, and paying there is a breeze, provided, of course, that what you did in those five minutes was more worthwhile than standing in line (although I have yet to find anything more boring than queuing).

The I/O system has many analogies with supermarket queues:
r/s + w/s is like the total number of people arriving at the registers.
avgqu-sz is like the average number of people queuing per unit time.
svctm (average service time) is like the cashier's speed.
await (average wait time) is like the average wait time of each person.
avgrq-sz (average request size) is like how much each person has bought.
%util is like the proportion of time someone is standing in front of a cashier.

From these data we can analyze the I/O request pattern as well as the I/O speed and response time.
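To make the %util and await/svctm guidelines above easier to apply, here is a minimal sketch of my own; it assumes a sysstat/iostat version whose extended output still carries the await, svctm and %util columns shown in the sample above (newer versions rename or drop some of them), and the "3 x svctm" threshold is only an illustrative choice, not a rule from the article:

#!/bin/sh
# Minimal sketch: take one 2-second iostat -x sample (skipping the first,
# since-boot report) and flag devices that are close to saturation or whose
# await is much larger than svctm.
iostat -dxk 2 2 | awk '
    /^Device/ { hdr++; for (i = 1; i <= NF; i++) col[$i] = i; next }
    hdr == 2 && NF > 1 {
        dev = $1; await = $col["await"]; svctm = $col["svctm"]; util = $col["%util"]
        if (util > 90)
            printf "%s: %%util=%s, device is close to saturation\n", dev, util
        if (svctm > 0 && await > 3 * svctm)
            printf "%s: await=%s is much larger than svctm=%s, the I/O queue is long\n", dev, await, svctm
    }'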
Below is someone else's analysis of an iostat output of this kind:

# iostat -x 1
avg-cpu:  %user  %nice  %sys  %idle
          16.24   0.00  4.31

Device:            rrqm/s  wrqm/s   r/s    w/s  rsec/s  wsec/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
/dev/cciss/c0d0      0.00   44.90  1.02  27.55    8.16  579.59   4.08 289.80     20.57     22.35  78.21   5.00  14.29
/dev/cciss/c0d0p1    0.00   44.90  1.02  27.55    8.16  579.59   4.08 289.80     20.57     22.35  78.21   5.00  14.29
/dev/cciss/c0d0p2    0.00    0.00  0.00   0.00    0.00    0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00

The iostat output above shows that there are 28.57 device I/O operations per second: total I/O per second = r/s (reads) + w/s (writes) = 1.02 + 27.55 = 28.57 (operations per second), with writes dominating (w:r = 27:1).

On average each device I/O operation takes only 5 ms to service (svctm), yet each I/O request waits 78 ms (await). Why? Because so many I/O requests are issued (about 29 per second); if we assume they all arrive at the same time, the average wait time can be estimated as:

average wait time = single I/O service time * (1 + 2 + ... + (total requests - 1)) / total requests

Applied to the example above: average wait time = 5 ms * (1 + 2 + ... + 28) / 29 = 70 ms, which is very close to the 78 ms average wait reported by iostat. This in turn suggests that the I/O requests are indeed issued almost simultaneously.

There are many I/O requests per second (about 29), but the average queue length is not long (only about 2), which indicates that these 29 requests arrive unevenly and the I/O is idle most of the time. In one second, the I/O queue contains requests for only 14.29% of the time (%util); in other words, the I/O system has nothing to do 85.71% of the time, and all 29 I/O requests are processed within 142 milliseconds.

delta(ruse + wuse)/delta(io) = await = 78.21, so delta(ruse + wuse)/s = 78.21 * delta(io)/s = 78.21 * 28.57 = 2232.8, meaning that the I/O requests issued each second wait a total of 2232.8 ms. The average queue length should therefore be 2232.8 ms / 1000 ms = 2.23, yet the avgqu-sz reported by iostat is 22.35. Why? Because this version of iostat has a bug: the avgqu-sz value should be 2.23, not 22.35.
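To check the arithmetic in this analysis, the small sketch below (mine, not part of the original text) redoes the two calculations with the numbers from the sample output: the estimated average wait under the "all requests arrive at once" assumption, and the queue length implied by await * IOPS.

#!/bin/sh
# Minimal sketch reproducing the two calculations above with the sample values
# (svctm = 5 ms, r/s + w/s = 28.57, await = 78.21 ms, about 29 requests).
awk 'BEGIN {
    svctm = 5.0; iops = 28.57; await = 78.21; n = 29

    # Estimated wait if all n requests arrive at the same time:
    # wait = svctm * (1 + 2 + ... + (n - 1)) / n
    sum = n * (n - 1) / 2                      # 1 + 2 + ... + 28 = 406
    printf "estimated average wait: %.1f ms (iostat reports await = 78.21 ms)\n", svctm * sum / n

    # Queue length implied by await * IOPS:
    # total wait per second = await * iops; divide by 1000 ms to get the queue length
    printf "implied average queue length: %.2f (iostat reports avgqu-sz = 22.35)\n", await * iops / 1000
}'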