CPU load observation and performance monitoring in CentOS

Last Update:2017-01-04 Source: Internet

Author: User


CPU load and Utilization

CPU load and CPU utilization are two different concepts, although both can be viewed with the top command. CPU utilization is the percentage of CPU time that programs occupy while running, measured in real time, while CPU load is the average number of tasks that are using or waiting for the CPU over a period of time. High CPU utilization does not necessarily mean high CPU load; neither one implies the other.

Common commands:

* uptime

First, we need to understand what the three numbers after "load average" represent: the CPU load averaged over the past one, five, and fifteen minutes respectively. Generally, the most important indicator is the last one, because we want to avoid unexpected situations as much as possible. So what is the normal range for these numbers? They are closely related to the number of CPU cores on your server. With a single core, a value of 1.0 means the CPU is exactly at full capacity. The smaller the number the better, and in theory anything between 0.0 and 1.0 is normal. Based on experience, if the number stays above 0.7, you may need to spend some time investigating. If it stays at or above 1.0 for a long time, you need to fix it! On a multi-core server, these thresholds scale with the number of cores.
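The rule of thumb above can be expressed as a short script. This is only a sketch, not something from the original article: the function name load_status and its output labels are hypothetical, and the 0.7 and 1.0 thresholds are the ones quoted above, applied per core.

```shell
# Classify a load average against the per-core thresholds quoted above.
# load_status is a hypothetical helper name, not a standard command.
# $1 = load average (e.g. the 15-minute value), $2 = number of logical CPUs
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        per_core = load / cores
        if (per_core >= 1.0)      status = "overloaded"
        else if (per_core > 0.7)  status = "investigate"
        else                      status = "ok"
        printf "%.2f %s\n", per_core, status
    }'
}

# Example use on a live system (the 15-minute value is field 3 of /proc/loadavg):
#   load_status "$(cut -d' ' -f3 /proc/loadavg)" "$(nproc)"
```

Dividing by the core count first keeps the same two thresholds usable on any machine size.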

On Linux, the number of logical CPUs can be checked from /proc/cpuinfo or with the nproc command.
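The original command did not survive in this copy of the article, so the following is a stand-in rather than the author's exact command. It counts the "processor" entries in /proc/cpuinfo; the helper name count_logical_cpus and its optional file argument are assumptions made for illustration.

```shell
# Count logical CPUs by counting "processor" entries in cpuinfo-style text.
# count_logical_cpus is a hypothetical helper; the file argument defaults
# to /proc/cpuinfo so the function can also be tried on a saved copy.
count_logical_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Example use on a live system:
#   count_logical_cpus
```

On systems with GNU coreutils, nproc gives a similar figure, although it reports only the CPUs available to the current process.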

* top

The top command not only displays the average load of the current system, but also the usage of resources such as CPU and memory by different processes.

By default, the top command lists processes in descending order of CPU usage. In the top display, press k and enter a PID to kill that process directly.
The -b option of top enables batch mode and prints every refresh to stdout.
The -n option of top specifies how many times the information is refreshed before top exits.
top command output:
Row 1: Same as uptime;
Row 3: Current CPU running status:
us: CPU usage of user processes that have not been niced;
sy: CPU usage of the kernel and kernel processes;
ni: CPU usage of user processes whose priority has been changed (niced);
id: CPU idle rate. If the system is slow while this value is high, the slowness is not caused by CPU load;
wa: the percentage of time the CPU spends waiting for I/O operations to complete. This indicator can be used to troubleshoot disk I/O problems; it is usually read together with id.
hi: percentage of CPU time spent handling hardware interrupts;
si: percentage of CPU time spent handling software interrupts;
st: steal time, the percentage of CPU time taken from this virtual machine by the hypervisor for other tasks;
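Batch mode makes the CPU row easy to capture from a script. The sketch below assumes top output arrives on stdin (for example from top -b -n 1); cpu_summary_line is a hypothetical helper name. Matching on "Cpu(s)" is meant to cover both the older "Cpu(s):" prefix and the newer "%Cpu(s):" prefix.

```shell
# Print the CPU summary line from batch-mode top output read on stdin.
# cpu_summary_line is a hypothetical helper name, not part of top itself.
cpu_summary_line() {
    awk '/Cpu\(s\)/ { print; exit }'
}

# Example use on a live system: one refresh, then exit.
#   top -b -n 1 | cpu_summary_line
```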

A high proportion of user-process time with low wa indicates that the system is slow because processes are consuming a large amount of CPU; this is usually accompanied by a low id, meaning the CPU has very little idle time.
Low wa together with high id rules out a CPU bottleneck.
High wa indicates that the CPU spends much of its time waiting for I/O. Check swap usage first: swap space lives on disk and is far slower than memory, so once memory is exhausted and the system falls back on swap, performance suffers badly. For this reason, we recommend disabling swap on servers with high performance requirements. On the other hand, if memory is sufficient but wa is still very high, check which process is generating the heavy I/O.
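These three cases can be turned into a rough triage helper. The article gives no numeric cut-offs, so the values 70 and 20 below are purely illustrative assumptions, as are the function name cpu_hint and its messages.

```shell
# Rough triage from the us, wa and id percentages of the top CPU line.
# cpu_hint and the cut-off values (70 and 20) are illustrative assumptions,
# not figures from the article; tune them for your own workload.
# $1 = us, $2 = wa, $3 = id
cpu_hint() {
    awk -v us="$1" -v wa="$2" -v id="$3" 'BEGIN {
        if (wa >= 20)                  print "check disk I/O and swap"
        else if (us >= 70 && id < 20)  print "CPU-bound user processes"
        else if (id >= 70)             print "CPU is not the bottleneck"
        else                           print "no clear signal"
    }'
}
```

The wa check comes first because a high wa value points away from the CPU regardless of what us shows.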

* iostat

If %iowait is too high, it indicates a disk bottleneck; if %system is too high, it indicates a kernel bottleneck.

On the Device line, you can see some IO metrics:

tps: the number of I/O transfer requests per second;
Blk_read/s: the number of blocks read per second (with iostat -k the column becomes kB_read/s and is reported in KB);
Blk_wrtn/s: the number of blocks written per second (kB_wrtn/s with -k);
Blk_read: the total number of blocks read;
Blk_wrtn: the total number of blocks written
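When the Blk_ columns are shown, the figures are block counts rather than kilobytes. A small conversion sketch follows, assuming the 512-byte block size documented for iostat on current kernels; blocks_to_kb is a hypothetical helper, and the block size can be overridden with a second argument.

```shell
# Convert a block count reported by iostat into kilobytes.
# blocks_to_kb is a hypothetical helper; the 512-byte default block size
# is an assumption that can be overridden with a second argument.
# $1 = number of blocks, $2 = block size in bytes (optional)
blocks_to_kb() {
    awk -v blocks="$1" -v size="${2:-512}" 'BEGIN {
        printf "%.1f\n", blocks * size / 1024
    }'
}
```

Running iostat -k avoids the conversion altogether by reporting KB directly.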

* sar

The sar command shows historical CPU, memory, and disk records. By default it displays statistics for the current day: CPU statistics when run without options, memory records with -r, and disk I/O records with -b.

View the CPU usage for the current day: sar

View the memory usage for the current day: sar -r

View the I/O statistics for the current day: sar -b

In addition, you can use the -s and -e options to limit the time range, and the -f option to view the statistics of an earlier day in the current month, for example: sar -s 20:00:00; sar -f /var/log/sysstat/sa08
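The data file name ends in the two-digit day of the month, so a script can build the path instead of hard-coding it. In this sketch sa_file is a hypothetical helper. The default directory is the one used in the example above, and many CentOS installations keep these files under /var/log/sa instead, so the directory is left as an optional argument.

```shell
# Build the path of the sysstat data file for a given day of the month.
# sa_file is a hypothetical helper. The default directory is the one used
# in the article; many CentOS installations use /var/log/sa instead.
# $1 = day of month (8 or 08), $2 = data directory (optional)
sa_file() {
    day="${1#0}"    # drop a leading zero so printf does not read it as octal
    printf '%s/sa%02d\n' "${2:-/var/log/sysstat}" "$day"
}

# Example use on a live system: CPU statistics for the 8th, 20:00 to 21:00.
#   sar -f "$(sa_file 8)" -s 20:00:00 -e 21:00:00
```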

* vmstat

Compared with top, vmstat shows the CPU, memory, and I/O usage of the whole machine rather than the CPU and memory usage of each process (a different use case).

vmstat takes two arguments. The first is the sampling interval in seconds, and the second is the number of samples (optional; if it is omitted, vmstat keeps sampling).
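One detail worth knowing when reading interval output: the first report vmstat prints gives averages since the last reboot, not a fresh sample. The sketch below keeps only the interval samples, assuming the default layout with two header lines followed by data rows; interval_samples is a hypothetical helper name.

```shell
# Keep only the interval samples from vmstat output read on stdin:
# skip the two header lines and the first data row, which holds
# averages since the last reboot rather than a fresh sample.
# interval_samples is a hypothetical helper name.
interval_samples() {
    awk 'NR > 3'
}

# Example use on a live system: sample every 2 seconds, 5 reports in total.
#   vmstat 2 5 | interval_samples
```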

r: the run queue (the number of processes running or waiting for CPU time). On the server I tested, the CPU was relatively idle and no programs were running. When this value exceeds the number of CPUs, a CPU bottleneck may occur.

b: the number of blocked processes.

swpd: the amount of virtual memory (swap) in use. If it is greater than 0, the machine's physical memory is insufficient. If a memory leak in a program is not the cause, you should add memory or migrate memory-consuming tasks to other machines.

free: the amount of free physical memory.

buff: memory used as buffers; Linux/Unix systems use it to cache things such as directory contents and permissions.

cache: memory used to cache opened files. Using otherwise idle physical memory as a file cache improves program execution performance.

si: the amount of virtual memory read in from disk per second. If this value is greater than 0, physical memory is insufficient or memory is leaking. Find the memory-consuming process to solve the problem.

so: the amount of virtual memory written out to disk per second. If the value is greater than 0, the same applies as for si.

bi: the number of blocks received from block devices per second. Block devices here means all disks and other block devices in the system. The default block size is 1024 bytes. There was no I/O activity on my test machine, so the value stayed at 0.

bo: the number of blocks sent to block devices per second. For example, writing files makes bo greater than 0. bi and bo should generally be close to 0; otherwise I/O is too frequent and needs tuning.

in: the number of CPU interrupts per second, including timer interrupts.

cs: the number of context switches per second. For example, calling a system function requires a context switch, as do thread switches and process switches. The smaller this value the better; if it is large, consider reducing the number of threads or processes. For example, on a web server such as Apache or nginx, performance tests usually run with thousands or even tens of thousands of concurrent requests; the number of worker processes or threads can be lowered step by step from its peak until cs drops to a relatively small value, and that number of processes and threads is then a suitable setting. The same is true for system calls: every time we call a system function, our code enters kernel space, which causes a context switch. This is expensive, so frequent system calls should be avoided where possible. Too many context switches mean that much of the CPU time is wasted on switching, leaving less time for real work and making poor use of the CPU.

us: user CPU time.

sy: system CPU time. If it is too high, a lot of time is being spent in system calls, for example because of frequent I/O operations.

id: idle CPU time. Generally, id + us + sy = 100, where id is the idle CPU percentage, us the user CPU percentage, and sy the system CPU percentage.

wa: CPU time spent waiting for I/O.
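The two warning signs called out above, a run queue larger than the number of CPUs and non-zero si or so, can be checked mechanically. This is only a sketch under the assumption that the columns follow the default vmstat layout shown in the comment; vmstat_flags and its labels are hypothetical.

```shell
# Flag the conditions described above in vmstat data rows read on stdin.
# vmstat_flags is a hypothetical helper; $1 is the number of logical CPUs.
# Column positions assume the default vmstat layout:
#   r b swpd free buff cache si so bi bo in cs us sy id wa st
vmstat_flags() {
    awk -v cpus="$1" '
        $1 !~ /^[0-9]+$/ { next }              # skip header lines
        {
            flags = ""
            if ($1 > cpus)        flags = flags " run-queue"
            if ($7 > 0 || $8 > 0) flags = flags " swapping"
            if (flags == "")      flags = " ok"
            print "r=" $1 " si=" $7 " so=" $8 ":" flags
        }'
}

# Example use on a live system:
#   vmstat 2 5 | vmstat_flags "$(nproc)"
```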

That summarizes the most commonly used commands for checking CPU load and performance. There are others, such as using taskset to bind a process to a specific CPU, but I have never used them myself, so they are not covered here.

References:

https://zhangge.net/3257.html

http://blog.csdn.net/longxibendi/article/details/44625703

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
