CPU Performance Metrics
1. Percentage of CPU time used by user processes
2. Percentage of CPU time used by system (kernel) processes
3. wio (I/O wait): percentage of time the CPU is idle while waiting for I/O
4. CPU idle percentage
5. Percentage of CPU time spent on context switches
6. nice
7. real-time
8. Length of the run queue
9. Load average
The common tools for monitoring CPU performance under Linux are:
1. iostat
Shows only the average across all CPUs.
2. vmstat
Shows the average across all CPUs.
Also shows CPU run queue information.
3. mpstat
Shows information for each individual CPU as well as for all CPUs.
4. sar
Similar to mpstat.
5. top
6. nmon
iostat
$ iostat
Linux 2.6.18-92.el5 08/30/2012
avg-cpu: %user %nice %system %iowait %steal %idle
1.16 0.01 0.62 0.18 0.00 98.03
vmstat
$ vmstat -n 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 1261196 981892 3638872 0 0 0 1 1 1 1 + 0 0
The 5 is the delay: vmstat prints a new report every 5 seconds. The -n option prints the header only once instead of repeating it periodically.
procs
r: the number of processes in the run queue. If this value stays above the number of CPUs in the system, most processes are waiting for the CPU and the system runs slowly. If r is more than 4 times the number of CPUs, the system is short of CPU or the CPUs are too slow, and the system as a whole runs too slowly.
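The rule of thumb for r can be written as a small check. This is a minimal sketch in Python; the function name run_queue_pressure and its labels are hypothetical, and the thresholds are simply the two stated above (more than the number of CPUs, and more than 4 times the number of CPUs).

```python
import os


def run_queue_pressure(r, cpu_count):
    """Classify a vmstat 'r' value using the rule of thumb from the text.

    The returned labels are illustrative; they are not vmstat output.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if r > 4 * cpu_count:
        return "cpu-shortage"   # r is more than 4 times the CPU count
    if r > cpu_count:
        return "cpu-waiting"    # more runnable processes than CPUs
    return "ok"


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    for r in (1, cpus + 1, 4 * cpus + 1):
        print(r, run_queue_pressure(r, cpus))
```

Because the rule is about values that stay high, it makes sense to apply the check to several consecutive samples rather than to a single line of output.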
system
in: the number of interrupts per second
cs: the number of context switches per second
The larger these two values are, the more CPU time the kernel consumes.
cpu
us: the percentage of CPU time consumed by user processes. If it stays high for a long time, the program needs to be optimized.
sy: the percentage of CPU time consumed by system (kernel) processes. A persistently high sy is not a healthy sign.
wa: the percentage of CPU time spent waiting for I/O. A high value indicates serious I/O waiting, possibly caused by a large amount of random disk access or by a disk bottleneck.
id: the percentage of time the CPU is idle. If id stays at 0 and sy is twice us, the system is short of CPU resources. When this happens, tune the application so that it uses the CPU more efficiently, and consider adding more CPUs.
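The id and sy condition above can be checked mechanically over a series of vmstat samples. The following is a minimal sketch; the function name cpu_starved and the sample values are made up for illustration, and "sy is twice us" is read here as sy being at least 2 times us, which is an assumption.

```python
def cpu_starved(samples):
    """Return True if every sample has id == 0 and sy at least twice us.

    `samples` is a list of (us, sy, id) percentages taken from
    consecutive vmstat lines. An empty list returns False.
    """
    if not samples:
        return False
    return all(idle == 0 and sy >= 2 * us for us, sy, idle in samples)


if __name__ == "__main__":
    # Hypothetical samples: (us, sy, id)
    busy = [(30, 70, 0), (25, 75, 0), (33, 67, 0)]
    healthy = [(30, 70, 0), (60, 30, 10)]
    print(cpu_starved(busy))     # every sample meets the condition
    print(cpu_starved(healthy))  # the second sample still has idle time
```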
mpstat (Multiprocessor Statistics)
Real-time monitoring; the information comes from the /proc/stat file.
$ mpstat -P ALL 2
Linux 2.6.18-92.el5 () 08/30/2012
08:16:34 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
08:16:36 PM  all    0.78    0.00    0.26    0.26    0.00    0.26    0.00   98.44   1058.85
08:16:36 PM    0    0.52    0.00    0.52    0.00    0.00    0.52    0.00   98.44   1058.85
08:16:36 PM    1    0.52    0.00    0.00    0.00    0.00    0.00    0.00   99.48      0.00
The command above samples the usage of all CPUs every 2 seconds. Adding a count, as in mpstat -P ALL 2 10, limits the run to 10 samples. The syntax is as follows:
mpstat [-P {cpu | ALL}] [interval [count]]
-P: which CPU to monitor; ALL is the usual choice
interval: the time between samples, in seconds
count: the number of samples
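Meaning of the output fields
%user: percentage of CPU time spent in user mode
%nice: percentage of CPU time spent in user mode by processes running with a nice priority
%sys: percentage of CPU time spent in kernel mode
%iowait: percentage of time spent waiting for I/O
%irq: percentage of CPU time spent servicing hardware interrupts
%soft: percentage of CPU time spent servicing software interrupts
%idle: percentage of time the CPU is idle
intr/s: number of interrupts the CPU receives per second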
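Percentages like these can be derived from two snapshots of the aggregate cpu line in /proc/stat, by dividing the change in each counter by the change in the total. The sketch below illustrates that idea under stated assumptions: the two snapshot lines are made-up values rather than real measurements, the helper names are hypothetical, and only the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal) are used. It is not a description of how mpstat itself is implemented.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("expected at least %d counters" % len(FIELDS))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Return the percentage of time spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Made-up snapshots, as if the first line of /proc/stat were read twice.
first = "cpu  1000 10 500 8000 100 5 15 0 0 0"
second = "cpu  1030 10 510 8155 105 5 15 0 0 0"

usage = cpu_percentages(parse_cpu_line(first), parse_cpu_line(second))
for name in FIELDS:
    print(f"%{name:<8}{usage[name]:7.2f}")
```

On a live system the two lines would come from reading /proc/stat twice with a sleep in between, which is the role the interval argument plays on the command line.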
sar
$ sar -u 2
Linux 2.6.18-92.el5 () 08/30/2012
08:28:36 PM CPU %user %nice %system %iowait %steal %idle
08:28:38 PM all 0.26 0.00 0.00 0.78 0.00 98.97
08:28:40 PM all 0.52 0.00 0.52 0.00 0.00 98.97
sar [options] [-A] [-o file] t [n]
On the command line, t and n together define the sampling interval and the number of samples. t is the sampling interval and is required. n is the number of samples and is optional; on classic Unix sar it defaults to 1, while the sysstat version on Linux keeps reporting until interrupted when n is omitted. -o file stores the results in a file in binary format; file here is not a keyword but the name of the file. options are the command-line options. sar has many of them, and only the common ones are listed below:
-A: the sum of all reports.
-u: CPU utilization.
-v: status of the process, inode, file, and lock tables.
-d: disk usage report.
-r: usage statistics for memory and swap space.
-g: serial I/O activity.
-b: buffer usage.
-a: file read/write activity.
-c: system call activity.
-q: run queue length and system load average.
-R: process activity.
-y: terminal device activity.
-w: system swapping activity.
-x {pid | SELF | ALL}: statistics for the specified process ID. The SELF keyword reports on the sar process itself, and the ALL keyword reports on all system processes.
%user: the percentage of time the CPU spends in user mode.
%nice: the percentage of time the CPU spends in user mode running processes with a nice value.
%system: the percentage of time the CPU spends in system mode.
%iowait: the percentage of time the CPU spends waiting for I/O to complete.
%steal: the percentage of time the virtual CPU waits involuntarily while the hypervisor services another virtual processor.
%idle: the percentage of time the CPU is idle.
Of these fields, pay most attention to %iowait and %idle. A %iowait that is too high indicates an I/O bottleneck on the disk. A high %idle means the CPU is mostly idle; if %idle is high but the system still responds slowly, the CPU may be waiting for memory to be allocated. If %idle stays below 10, the system's CPU capacity is relatively low, and CPU is the resource the system needs most.
Using sar to analyze the run queue length:
# sar -q 2 10
Linux 2.6.18-53.el5pae (localhost.localdomain) 03/28/2009
07:58:14 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
07:58:16 PM 0 493 0.64 0.56 0.49
07:58:18 PM 1 491 0.64 0.56 0.49
07:58:20 PM 1 488 0.59 0.55 0.49
07:58:22 PM 0 487 0.59 0.55 0.49
07:58:24 PM 0 485 0.59 0.55 0.49
07:58:26 PM 1 483 0.78 0.59 0.50
07:58:28 PM 0 481 0.78 0.59 0.50
07:58:30 PM 1 480 0.72 0.58 0.50
07:58:32 PM 0 477 0.72 0.58 0.50
07:58:34 PM 0 474 0.72 0.58 0.50
Average: 0 484 0.68 0.57 0.49
runq-sz: length of the run queue, that is, the number of processes ready to run.
plist-sz: number of processes and threads in the process list.
ldavg-1: system load average over the last minute.
ldavg-5: system load average over the last 5 minutes.
ldavg-15: system load average over the last 15 minutes.
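The Average line that sar prints can be cross-checked by averaging the sample rows. The sketch below does this for the ten samples of the sar -q 2 10 run, embedded as a string; the helper name column_averages is hypothetical, and the parsing assumes the 12-hour time format shown here, where the first two tokens of each row are the time and AM/PM.

```python
SAR_Q_OUTPUT = """\
07:58:14 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
07:58:16 PM 0 493 0.64 0.56 0.49
07:58:18 PM 1 491 0.64 0.56 0.49
07:58:20 PM 1 488 0.59 0.55 0.49
07:58:22 PM 0 487 0.59 0.55 0.49
07:58:24 PM 0 485 0.59 0.55 0.49
07:58:26 PM 1 483 0.78 0.59 0.50
07:58:28 PM 0 481 0.78 0.59 0.50
07:58:30 PM 1 480 0.72 0.58 0.50
07:58:32 PM 0 477 0.72 0.58 0.50
07:58:34 PM 0 474 0.72 0.58 0.50
"""


def column_averages(text):
    """Average each numeric column of sar -q output (header row plus samples)."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    names = header[2:]  # skip the time and AM/PM tokens
    sums = [0.0] * len(names)
    for row in rows:
        for i, value in enumerate(row[2:]):
            sums[i] += float(value)
    return {name: total / len(rows) for name, total in zip(names, sums)}


averages = column_averages(SAR_Q_OUTPUT)
for name, value in averages.items():
    print(f"{name}: {value:.3f}")
```

For these samples runq-sz averages 0.4 and plist-sz averages 483.9, which round to the 0 and 484 that sar reports on its Average line.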
A note on the meaning of load average
Roughly, the load average is the average number of processes waiting for the CPU to run them.
On Linux, the sar -q, uptime, w, and top commands all print the system load average. So what exactly is it?
The system load average is defined as the average number of tasks in the run queue over a given time interval. A process is in the run queue if all of the following are true:
- It is not waiting for the result of an I/O operation
- It has not voluntarily entered a wait state (that is, it has not called 'wait')
- It is not stopped (for example, waiting to terminate)
For example:
# uptime
20:55:40 up, 3:06, 1 user, load average: 8.13, 5.90, 4.94
The last part of the output shows the average number of processes in the run queue over the past 1, 5, and 15 minutes.
As a rule of thumb, system performance is good as long as the number of active processes per CPU does not exceed 3. If the number of tasks per CPU is greater than 5, the machine has a serious performance problem. In the example above, assuming the system has two CPUs, the number of tasks per CPU is 8.13 / 2 = 4.065, which means the system's performance is acceptable.
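The per-CPU arithmetic can be sketched in a few lines of Python. The function names and the labels "good", "acceptable", and "serious" are hypothetical; the thresholds of 3 and 5 tasks per CPU are the ones given in the text, and the CPU count of 2 is the assumption made in the example.

```python
def parse_load_averages(uptime_line):
    """Extract the 1, 5, and 15 minute load averages from a line of uptime output."""
    marker = "load average:"
    if marker not in uptime_line:
        raise ValueError("no load average found in input")
    tail = uptime_line.rpartition(marker)[2]
    return tuple(float(value) for value in tail.replace(",", " ").split())


def load_per_cpu(load_average, cpu_count):
    """Divide a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def classify_load(per_cpu):
    """Apply the rule of thumb: up to 3 is good, above 5 is serious."""
    if per_cpu <= 3:
        return "good"
    if per_cpu > 5:
        return "serious"
    return "acceptable"


if __name__ == "__main__":
    line = "20:55:40 up, 3:06, 1 user, load average: 8.13, 5.90, 4.94"
    one_minute = parse_load_averages(line)[0]
    per_cpu = load_per_cpu(one_minute, 2)  # two CPUs, as assumed in the example
    print(f"{per_cpu:.3f} {classify_load(per_cpu)}")
```

On a real machine the CPU count could be taken from os.cpu_count() instead of being passed in by hand.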