Linux CPU Analysis: vmstat


Last Update:2018-07-27 Source: Internet

Author: User

Tags cpu usage


I. Preface

Before reading on, it helps to go through a few concepts, tedious as they are. Without them you can run vmstat and read what Linux reports, but you will not understand why the numbers look the way they do.

Start with memory rather than the CPU, because moving data between the two kinds of memory consumes CPU time. Why that movement is needed is explained below.

Memory on a Linux system is divided into physical memory and virtual memory. Physical memory is the real thing, the RAM installed in the machine. Virtual memory, in the sense used here, is disk space used to supplement physical memory (the key point is that the two differ enormously in speed): memory pages that are not currently in use are written to disk so that more physical memory is available to the processes that need it, and they are read back from disk when they are needed again. All of this is transparent to the user. On Linux, this disk-backed memory is typically the swap partition.
Now for the main subject: vmstat. vmstat (virtual memory statistics) is a common Linux monitoring tool that reports on the operating system's virtual memory, processes, CPU activity, and more. It gives a brief summary of how the system's resources are performing; here we mainly use it to look at CPU load.
Every process running on the system needs physical memory, but not every process needs all of its allocated memory all the time. When the running processes need more memory than is physically available, the kernel reclaims some or all of the physical memory that processes hold but are not using, stores that data on disk until the process next needs it, and gives the freed memory to the processes that need it now. This is the memory conversion mentioned above.

Linux memory management performs this scheduling mainly through paging and swapping. Paging writes infrequently used pages to disk and keeps active pages in memory for processes to use. Swapping moves an entire process, not just some of its pages, to disk.

Writing a page to disk is called page-out, and reading a page back from disk into memory is called page-in. When the kernel finds that free memory is running low, it frees physical memory through page-out. Occasional page-out is normal, but if page-out happens frequently, the kernel ends up spending more time managing pages than running programs and system performance drops sharply. At that point the system runs very slowly or appears to hang, a state known as thrashing. This is why, as noted at the start, memory activity consumes CPU.
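Swap-in and swap-out activity can be measured from kernel counters. The following is a minimal sketch in Python, assuming a Linux system whose /proc/vmstat exposes the cumulative pswpin and pswpout counters (pages swapped in and out since boot). The helper names and the sample snapshots are illustrative and do not come from the original article.

```python
def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) from the text of /proc/vmstat."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rate(before_text, after_text, seconds):
    """Pages swapped in and out per second between two snapshots."""
    in0, out0 = parse_swap_counters(before_text)
    in1, out1 = parse_swap_counters(after_text)
    return (in1 - in0) / seconds, (out1 - out0) / seconds


# Illustrative snapshots (made-up numbers), taken 3 seconds apart.
BEFORE = "nr_free_pages 51200\npswpin 1000\npswpout 4000\n"
AFTER = "nr_free_pages 48000\npswpin 1300\npswpout 4900\n"

if __name__ == "__main__":
    rate_in, rate_out = swap_rate(BEFORE, AFTER, 3)
    print(f"swap-in: {rate_in:.1f} pages/s, swap-out: {rate_out:.1f} pages/s")
```

On a live system the two snapshots would come from reading /proc/vmstat twice with a pause in between. Rates that stay high across many intervals are the pattern described above as thrashing.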
II. Sample Output

vmstat 3 5    # print one report every three seconds, five reports in total
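The reports from vmstat can also be read by a script. The sketch below, in Python, parses vmstat's default column layout into dictionaries. The SAMPLE text is invented purely to show the layout (it is not output from the system discussed in this article), and parse_vmstat is a hypothetical helper name.

```python
# Invented sample in vmstat's default layout; the numbers are not real measurements.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812340  95200 640120    0    0     3     7  110  240  2  1 97  0  0
 9  2      0 790112  95200 640500    0    0     0    52  980 2100 71 25  3  1  0
"""


def parse_vmstat(text):
    """Turn vmstat output into a list of {column: int} dicts, one per report."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The column-name line starts with the run-queue column "r".
            header = fields
        elif header and len(fields) == len(header) and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        print("r =", row["r"], "b =", row["b"], "us+sy =", row["us"] + row["sy"])
```

To parse live data, the text could be obtained with subprocess.run(["vmstat", "3", "5"], capture_output=True, text=True).stdout and passed to the same function.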

If you are new to this, the output can be bewildering. Before combining the figures to find bottlenecks, you first need to know what each field means. The points below are the important ones:

III. Practical Analysis

1. r: the number of processes in the run queue

r (run) is the number of runnable processes, that is, processes that are running or waiting for CPU time. b (blocked) is the number of processes in uninterruptible sleep, usually waiting for I/O to complete. When r exceeds the number of CPUs, there is a CPU bottleneck.

To see how many logical CPUs the system has:

    cat /proc/cpuinfo | grep processor | wc -l
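The same count can be obtained from a script. This is a small Python sketch, assuming the usual /proc/cpuinfo layout in which each logical CPU has a line of the form "processor : N". The helper name and the SAMPLE text are illustrative. Unlike the grep pipeline, the helper only counts lines whose key is exactly "processor".

```python
import os


def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in the text of /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


# Illustrative excerpt with two logical CPUs.
SAMPLE = (
    "processor\t: 0\nmodel name\t: Example CPU\n\n"
    "processor\t: 1\nmodel name\t: Example CPU\n"
)

if __name__ == "__main__":
    print("from sample text:", count_processors(SAMPLE))
    # os.cpu_count() reports the number of logical CPUs, or None if unknown.
    print("from os.cpu_count():", os.cpu_count())
```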

Much of the advice on evaluating CPU performance that gets copied around the internet is inaccurate. Do not look only at the figures in top; also look at the run (r) and blocked (b) values that vmstat reports. A run queue that stays longer than the number of CPUs points to a CPU bottleneck, while a persistently large blocked count means processes are stuck waiting, usually on I/O. The load average shown by top and uptime only describes the state of the system over the recent past and is not a direct measure of CPU performance.
When r exceeds the number of CPUs there is a CPU bottleneck. The usual remedies are:
1. Add CPUs or cores, which is the simplest fix.
2. Reschedule work, for example by running large jobs when the system is not busy, to even out the load.
3. Adjust the priority of existing tasks.
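The rule that r exceeding the CPU count signals a bottleneck is best applied to a series of samples instead of a single reading. The Python sketch below is one way to express that. The function name and the min_fraction threshold are assumptions made for illustration; the article itself does not specify how many samples must exceed the CPU count.

```python
def run_queue_saturated(r_samples, cpu_count, min_fraction=0.8):
    """True if r exceeded cpu_count in at least min_fraction of the samples.

    min_fraction is an illustrative threshold, not a value from the article.
    """
    if not r_samples:
        return False
    over = sum(1 for r in r_samples if r > cpu_count)
    return over / len(r_samples) >= min_fraction


if __name__ == "__main__":
    # r values from five hypothetical vmstat reports on an 8-CPU machine.
    print(run_queue_saturated([9, 10, 12, 9, 11], cpu_count=8))
    print(run_queue_saturated([1, 9, 2, 0, 1], cpu_count=8))
```

A single spike above the CPU count is ignored by this check, which matches the idea that only a sustained excess indicates a bottleneck.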
Tip: the CPU figures in vmstat are percentages. When us + sy is close to 100, the CPU is approaching full load. A fully loaded CPU does not in itself indicate a problem, though: Linux always tries to keep the CPU as busy as possible in order to maximize throughput. What identifies a CPU bottleneck is the value of r (the run queue).
2. CPU utilization

If id (the idle percentage) stays below 10% for a long time, CPU resources are already very tight and you should consider optimizing processes or adding CPUs. wa (wait I/O) is the percentage of time the CPU was idle while waiting for I/O. The CPU is doing no useful work during that time, so wa should be as small as possible.
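These two checks can be written down as a short Python sketch. It takes a list of samples, each a dictionary with the id and wa percentages as vmstat reports them. The 10% idle floor comes from the text above; the wa_ceiling value is an arbitrary assumption for illustration, since the article only says that wa should be as small as possible. The function name is hypothetical.

```python
def cpu_pressure(samples, idle_floor=10, wa_ceiling=20):
    """Return a list of findings for samples with 'id' and 'wa' percentages.

    idle_floor follows the 10% rule of thumb; wa_ceiling is an assumed value.
    An empty list means nothing stood out.
    """
    findings = []
    if not samples:
        return findings
    if all(s["id"] < idle_floor for s in samples):
        findings.append("idle below floor in every sample")
    average_wa = sum(s["wa"] for s in samples) / len(samples)
    if average_wa > wa_ceiling:
        findings.append("high average I/O wait")
    return findings


if __name__ == "__main__":
    print(cpu_pressure([{"id": 3, "wa": 1}, {"id": 5, "wa": 2}]))
    print(cpu_pressure([{"id": 60, "wa": 30}, {"id": 50, "wa": 35}]))
```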

Recommended reading:
CPU run queue and system load: http://www.cnblogs.com/hecy/p/4128605.html
vmstat in detail (with worked examples): http://blog.chinaunix.net/uid-20775448-id-3668337.html

2. The sar command

The second tool for checking CPU performance is sar. sar is powerful and can report separately on each aspect of the system. Running it adds some overhead, but the overhead is measurable and has no significant effect on the results. Here is sar's CPU statistics output for one system:
    [root@webserver ~]# sar -u 3 5
    Linux 2.6.9-42.ELsmp (webserver)        11/28/2008      _i686_  (8 CPU)

    11:41:24 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    11:41:27 AM     all      0.88      0.00      0.29      0.00      0.00     98.83
    11:41:30 AM     all      0.13      0.00      0.17      0.21      0.00     99.50
    11:41:33 AM     all      0.04      0.00      0.04      0.00      0.00     99.92
    11:41:36 AM     all      0.29      0.00      0.13      0.00      0.00     99.58
    11:41:39 AM     all      0.38      0.00      0.17      0.04      0.00     99.41
    Average:        all      0.34      0.00      0.16      0.05      0.00     99.45
The columns are as follows:
%user: the percentage of CPU time consumed by user processes.
%nice: the percentage of CPU time consumed by user processes running with an adjusted nice priority.
%system: the percentage of CPU time consumed in kernel (system) mode.
%iowait: the percentage of time the CPU was idle while waiting for I/O.
%steal: the percentage of time a virtual CPU waited involuntarily while the hypervisor serviced another virtual processor.
%idle: the percentage of time the CPU was idle.

This output covers overall CPU usage for the system and each column is easy to read. The final Average row summarizes the rows above it. Note that the first row includes the cost of starting sar itself, so its %user value is slightly high, but this has little effect on the results.

On a multi-CPU system, a single-threaded program can produce a situation where overall CPU usage is low and yet the application responds slowly. A single thread runs on only one CPU at a time, so that CPU can sit at 100% and be unable to take on other requests while the remaining CPUs are idle. The result is low overall usage and a slow application. To diagnose this, query each CPU separately:
    [root@webserver ~]# sar -P 0 3 5
    Linux 2.6.9-42.ELsmp (webserver)        11/29/2008      _i686_  (8 CPU)

    06:29:33 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    06:29:36 PM       0      3.00      0.00      0.33      0.00      0.00     96.67
    06:29:39 PM       0      0.67      0.00      0.33      0.00      0.00     99.00
    06:29:42 PM       0      0.00      0.00      0.33      0.00      0.00     99.67
    06:29:45 PM       0      0.67      0.00      0.33      0.00      0.00     99.00
    06:29:48 PM       0      1.00      0.00      0.33      0.33      0.00     98.34
    Average:          0      1.07      0.00      0.33      0.07      0.00     98.53

This output covers the first CPU. Note that sar numbers CPUs from 0, so "sar -P 0 3 5" reports on the first CPU, "sar -P 4 3 5" reports on the fifth, and so on. As the header shows, this system has eight CPUs.
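Per-CPU usage can also be derived directly from the kernel's counters, which makes the single hot CPU pattern easy to spot. The Python sketch below compares two snapshots of /proc/stat, assuming the standard per-CPU line layout (cpuN user nice system idle iowait irq softirq steal ...). The helper names and the two snapshots are invented for illustration, and iowait is counted as idle time here, which is a design choice of this sketch.

```python
def parse_cpu_times(stat_text):
    """Map 'cpu0', 'cpu1', ... to (busy_ticks, total_ticks) from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and every non-CPU line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            values = [int(v) for v in fields[1:]]
            # Treat idle and iowait as not busy (an assumption of this sketch).
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # Guest time is already included in user time, so stop at steal.
            total = sum(values[:8])
            times[fields[0]] = (total - idle, total)
    return times


def busy_percent(before_text, after_text):
    """Percentage of time each CPU was busy between two snapshots."""
    before = parse_cpu_times(before_text)
    after = parse_cpu_times(after_text)
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return result


# Invented snapshots: cpu0 is saturated by one thread, cpu1 is nearly idle.
BEFORE = (
    "cpu  2000 0 400 17600 0 0 0 0\n"
    "cpu0 1000 0 200 8800 0 0 0 0\n"
    "cpu1 1000 0 200 8800 0 0 0 0\n"
)
AFTER = (
    "cpu  2303 0 400 17897 0 0 0 0\n"
    "cpu0 1300 0 200 8800 0 0 0 0\n"
    "cpu1 1003 0 200 9097 0 0 0 0\n"
)

if __name__ == "__main__":
    for cpu, pct in sorted(busy_percent(BEFORE, AFTER).items()):
        print(f"{cpu}: {pct:.1f}% busy")
```

In the invented data one CPU is fully busy while the other is almost idle, so an average across both would hide the saturated CPU, which is the situation per-CPU reporting is meant to expose.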
3. The iostat command

iostat is mainly used to report disk I/O statistics, but it can also show CPU usage. Its limitation is that it shows only the average across all CPUs. Here is a sample of its output:
    [root@webserver ~]# iostat -c
    Linux 2.6.9-42.ELsmp (webserver)        11/29/2008      _i686_  (8 CPU)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               2.52    0.00    0.30    0.24    0.00   96.96

The -c option restricts the output to CPU statistics. Each column has exactly the same meaning as in the sar output, so it is not repeated here.

4. The uptime command

uptime is one of the most commonly used commands for monitoring system performance. It reports on the current state of the system, in this order: the current time, how long the system has been running since the last boot, the number of users currently logged in, and the load average over the last 1, 5, and 15 minutes. Here is a sample of its output:
    [root@webserver ~]# uptime
     18:52:11 up, 19:44, 2 users, load average: 0.12, 0.08, 0.08
The values to watch are the three load averages. As a rule they should not exceed the number of CPUs. The system in this example has 8 CPUs, so if all three load averages stay above 8 for a long time, the CPUs are busy and heavily loaded and performance may suffer. An occasional reading above 8 is nothing to worry about and generally does not affect performance. Conversely, a load average below the number of CPUs means the CPUs still have spare time slices; in this example the CPUs are almost entirely idle.
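The comparison between load average and CPU count is simple arithmetic, sketched here in Python. The helpers take text in the format of /proc/loadavg, where the first three fields are the 1-, 5-, and 15-minute load averages. The function names are hypothetical, and the fields after the three averages in the example strings are placeholders.

```python
def load_per_cpu(loadavg_text, cpu_count):
    """Divide the 1-, 5- and 15-minute load averages by the CPU count."""
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return one / cpu_count, five / cpu_count, fifteen / cpu_count


def sustained_overload(loadavg_text, cpu_count):
    """True only if all three averages exceed the CPU count."""
    return all(v > 1.0 for v in load_per_cpu(loadavg_text, cpu_count))


if __name__ == "__main__":
    # The load averages from the uptime example, on the 8-CPU system.
    print(sustained_overload("0.12 0.08 0.08 1/123 4567", 8))
    # A brief spike: only the 1-minute average is above the CPU count.
    print(sustained_overload("9.50 3.00 1.00 3/200 999", 8))
```

Requiring all three averages to be above the CPU count separates a sustained overload from an occasional spike. On a live system, os.getloadavg() returns the same three values as a tuple.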

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
