Linux performance monitoring and analysis: CPU

Last Update:2018-12-03 Source: Internet

Author: User

CPU performance indicators

1. CPU usage of user processes

2. CPU usage of system processes

3. I/O wait (wio): the share of time the CPU sits idle while waiting for I/O

4. CPU idle rate

5. Percentage of CPU time spent on context switching

6. nice

7. Real-time

8. Run queue length

9. Load average

Tools commonly used in Linux to monitor CPU performance include:

1. iostat

Shows only the average across all CPUs.

2. vmstat

Shows the average across all CPUs, as well as CPU run queue information.

3. mpstat

Shows statistics for individual CPUs and for all CPUs together.

4. sar

Similar to mpstat.

5. top

6. nmon

iostat

$ iostat
Linux 2.6.18-92.el5          08/30/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.16    0.01    0.62    0.18    0.00   98.03

vmstat

$ vmstat -n 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0     96 1261196 981892 3638872    0    0     0    16    1    1  1  1 98  0  0

The argument 5 makes vmstat refresh every 5 seconds. The -n option prints the header only once instead of repeating it periodically.

procs

r: the number of processes in the run queue. If this value stays above the number of CPUs, the system runs slowly and most processes are waiting for the CPU. If r is more than 4 times the number of CPUs, the system is short of CPU capacity or the CPUs are too slow, and the system as a whole runs slowly.
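
The rule of thumb above can be written down as a small check. This is a minimal sketch in Python and not part of the original article: the function name run_queue_pressure and the labels it returns are illustrative assumptions, and the thresholds are simply the two stated in the text (more than the number of CPUs, more than 4 times the number of CPUs).

```python
def run_queue_pressure(r: int, cpu_count: int) -> str:
    """Classify the vmstat 'r' column using the thresholds from the text.

    The returned labels are hypothetical names chosen for this sketch.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    if r > 4 * cpu_count:
        return "cpu-shortage"   # r is more than 4x the number of CPUs
    if r > cpu_count:
        return "cpu-contended"  # processes are queuing for CPU time
    return "ok"
```

For example, run_queue_pressure(3, 2) reports contention on a two-CPU machine. A single sample says little, so a check like this is best applied to several consecutive vmstat rows.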

system

in: the number of interrupts per second

cs: the number of context switches per second

The larger these two values are, the more CPU time the kernel consumes.

cpu

us: percentage of CPU time consumed by user processes. If it stays high for a long time, the program needs to be optimized.

sy: percentage of CPU time consumed by system (kernel) processes. A persistently high sy value is a warning sign.

wa: percentage of CPU time spent waiting for I/O. A high value means I/O wait is severe, which may be caused by a large number of random disk accesses or by a disk bottleneck.

id: percentage of time the CPU is idle. If it stays at 0 and sy is twice us, the system is short of CPU resources. When this happens, first tune the application so that it uses the CPU more efficiently. Adding more CPUs is another option.
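
As an illustration of how the cpu columns might be checked automatically, the sketch below parses one vmstat data row and applies the "id is 0 and sy is twice us" rule from the text. It is a hedged example, not the article's own method: the helper names are hypothetical, the 17-column layout is taken from the sample output in this article (other vmstat versions may print different columns), and the rule is evaluated on a single sample even though the text talks about a sustained condition.

```python
# Column order as printed in the sample vmstat output in this article.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]


def parse_vmstat_row(line: str) -> dict:
    """Map one vmstat data row onto the 17 column names above."""
    values = [int(v) for v in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected number of columns: %d" % len(values))
    return dict(zip(VMSTAT_COLUMNS, values))


def looks_cpu_starved(row: dict) -> bool:
    """Rule from the text: idle is 0 and sy is at least twice us.

    The extra 'sy > 0' guard is an assumption of this sketch; it keeps a
    row with us == sy == 0 (for example, pure I/O wait) from matching.
    """
    return row["id"] == 0 and row["sy"] > 0 and row["sy"] >= 2 * row["us"]
```

Feeding it the data row from the sample output yields a dictionary with id equal to 98, so looks_cpu_starved returns False for that machine.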

mpstat (multiprocessor statistics)

A real-time monitoring tool. It reads its data from the /proc/stat file.

$ mpstat -P ALL 2 10
Linux 2.6.18-92.el5 ()         08/30/2012

08:16:34 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
08:16:36 PM  all    0.78    0.00    0.26    0.26    0.00    0.26    0.00   98.44   1058.85
08:16:36 PM    0    0.52    0.00    0.52    0.00    0.00    0.52    0.00   98.44   1058.85
08:16:36 PM    1    0.52    0.00    0.00    0.00    0.00    0.00    0.00   99.48      0.00

The command above reports the usage of every CPU at two-second intervals, for a total of 10 samples. Syntax:

mpstat [-P {cpu | ALL}] [interval [count]]

-P selects which CPU to monitor. ALL is generally used.

interval: the sampling interval in seconds

count: the number of samples

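Meaning of the output fields

%user: percentage of CPU time spent in user mode

%nice: percentage of CPU time spent in user mode on processes running with a nice priority

%sys: percentage of CPU time spent in kernel mode

%iowait: percentage of time the CPU was idle while waiting for I/O

%irq: percentage of CPU time spent servicing hardware interrupts

%soft: percentage of CPU time spent servicing software interrupts

%idle: percentage of time the CPU was idle

intr/s: number of interrupts the CPU received per second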
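
Since the data comes from /proc/stat, percentages like the ones above can be approximated by sampling that file twice and dividing each counter's increase by the total increase. The following Python sketch shows the idea. It is not mpstat's actual implementation: the function names are hypothetical, and only the first eight counters of the "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal) are used.

```python
import time

FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> dict:
    """Parse one 'cpu ...' line from /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    ticks = [int(p) for p in parts[1:1 + len(FIELDS)]]
    # Older kernels expose fewer counters; pad the missing ones with 0.
    ticks += [0] * (len(FIELDS) - len(ticks))
    return dict(zip(FIELDS, ticks))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Share of each counter in the ticks elapsed between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between samples")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval: float = 2.0, path: str = "/proc/stat") -> dict:
    """Read the aggregate 'cpu' line twice, 'interval' seconds apart."""
    def read_total() -> dict:
        with open(path) as f:
            return parse_cpu_line(f.readline())  # first line is the all-CPU row
    first = read_total()
    time.sleep(interval)
    return cpu_percentages(first, read_total())
```

Calling sample() on a Linux machine returns a dictionary keyed by counter name. The per-CPU lines (cpu0, cpu1, and so on) can be handled the same way with parse_cpu_line.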

sar

$ sar -u 2 10
Linux 2.6.18-92.el5 ()         08/30/2012

08:28:36 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
08:28:38 PM       all      0.26      0.00      0.00      0.78      0.00     98.97
08:28:40 PM       all      0.52      0.00      0.52      0.00      0.00     98.97

sar [options] [-A] [-o file] t [n]

On the command line, t and n together define the sampling interval and the number of samples. t is the sampling interval and is required; n is the number of samples and is optional, with a default of 1. -o file stores the results in a file in binary format; here file is not a keyword but the name of the file. options stands for the command-line options. sar has many options, and only the common ones are listed below:

-A: the sum of all reports.
-u: CPU utilization.
-v: status of the process, inode, file, and lock tables.
-d: hard disk usage report.
-r: memory and swap space usage statistics.
-g: serial port I/O.
-b: buffer usage.
-a: file read/write activity.
-c: system call activity.
-q: run queue length and system load average.
-R: process activity.
-y: terminal device activity.
-w: system swapping activity.
-x {pid | SELF | ALL}: statistics for the specified process ID. The SELF keyword reports on the sar process itself, and the ALL keyword reports on all system processes.

%user: percentage of CPU time spent in user mode.
%nice: percentage of CPU time spent in user mode on processes running with a nice priority.
%system: percentage of CPU time spent in system (kernel) mode.
%iowait: percentage of time the CPU was idle while waiting for I/O to complete.
%steal: percentage of time the virtual CPU spent in involuntary wait while the hypervisor was servicing another virtual processor.
%idle: percentage of time the CPU was idle.
Of these fields, %iowait and %idle deserve the most attention. A high %iowait indicates a disk I/O bottleneck. A high %idle means the CPU is mostly idle; if %idle is high but the system still responds slowly, the CPU may be waiting for memory allocation, and in that case memory should be added. If %idle stays below 10, the system is short of CPU processing capacity, and the CPU is the resource that most needs attention.
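
The reading guidance above can be sketched as a small check over one sar -u row. This is an illustrative Python example under stated assumptions: the function names are hypothetical, the row layout is the one in the sample output above, and because the text gives no number for a "too high" %iowait, the limit is a required argument that the caller has to choose. Only the %idle threshold of 10 comes from the text.

```python
def parse_sar_cpu_row(line: str) -> dict:
    """Parse one 'sar -u' data row laid out as in the sample output.

    The last six columns are assumed to be
    %user, %nice, %system, %iowait, %steal, %idle.
    """
    names = ["user", "nice", "system", "iowait", "steal", "idle"]
    parts = line.split()
    if len(parts) < len(names):
        raise ValueError("row has too few columns: %r" % line)
    values = [float(v) for v in parts[-len(names):]]
    return dict(zip(names, values))


def assess_cpu(row: dict, iowait_limit: float) -> list:
    """Return the findings the text suggests for a single sample."""
    findings = []
    if row["iowait"] > iowait_limit:   # limit chosen by the caller
        findings.append("possible disk I/O bottleneck")
    if row["idle"] < 10.0:             # threshold stated in the text
        findings.append("CPU is the likely bottleneck")
    return findings
```

Both rows in the sample output have %idle close to 99 and %iowait below 1, so assess_cpu returns an empty list for them with any reasonable limit.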

Analyze the run queue length with sar:
# sar -q 2 10
Linux 2.6.18-53.el5PAE (localhost.localdomain)  03/28/2009

07:58:14 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
07:58:16 PM         0       493      0.64      0.56      0.49
07:58:18 PM         1       491      0.64      0.56      0.49
07:58:20 PM         1       488      0.59      0.55      0.49
07:58:22 PM         0       487      0.59      0.55      0.49
07:58:24 PM         0       485      0.59      0.55      0.49
07:58:26 PM         1       483      0.78      0.59      0.50
07:58:28 PM         0       481      0.78      0.59      0.50
07:58:30 PM         1       480      0.72      0.58      0.50
07:58:32 PM         0       477      0.72      0.58      0.50
07:58:34 PM         0       474      0.72      0.58      0.50
Average:            0       484      0.68      0.57

runq-sz: length of the run queue, that is, the number of processes waiting to run.
plist-sz: number of processes and threads in the process list.
ldavg-1: system load average over the last minute.
ldavg-5: system load average over the last 5 minutes.
ldavg-15: system load average over the last 15 minutes.

A note on the meaning of load average
Load average can be understood as the average number of processes that are running on or waiting for the CPU.
On Linux, commands such as sar -q, uptime, w, and top all report the system load average. What is the system load average?
The system load average is defined as the average number of tasks in the run queue over a specific time interval. A process is in the run queue if it meets the following conditions:
- It is not waiting for the result of an I/O operation.
- It has not voluntarily entered a wait state (that is, it has not called 'wait').
- It is not stopped (for example, waiting to be terminated).
For example:
# uptime
20:55:40 up 24 days, 1 user, load average: 8.13, 5.90, 4.94
The last part of the output gives the average number of processes in the run queue over the past 1, 5, and 15 minutes.
As a rule of thumb, system performance is good as long as the number of active processes per CPU does not exceed 3. More than 5 tasks per CPU indicates a serious performance problem. In the example above, if the system has two CPUs, the current number of tasks per CPU is 8.13 / 2 = 4.065, which means system performance is acceptable.
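
The arithmetic in the example above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the status labels are hypothetical, and the thresholds are the two from the text (up to 3 tasks per CPU is good, more than 5 is a serious problem, anything in between is treated as acceptable).

```python
import re


def parse_load_averages(uptime_output: str) -> tuple:
    """Extract the 1-, 5- and 15-minute load averages from uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_output)
    if match is None:
        raise ValueError("no load average found")
    return tuple(float(g) for g in match.groups())


def per_cpu_load_status(load: float, cpu_count: int) -> tuple:
    """Divide the load by the CPU count and apply the thresholds from the text."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    per_cpu = load / cpu_count
    if per_cpu <= 3:
        status = "good"
    elif per_cpu > 5:
        status = "serious"
    else:
        status = "acceptable"
    return per_cpu, status
```

Applied to the uptime line above with two CPUs, the one-minute value 8.13 gives about 4.065 tasks per CPU and the status "acceptable". On a live system the same three numbers are also available from os.getloadavg() in the Python standard library, which avoids parsing command output.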

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
