Introduction
Performance optimization is a perennial topic. Typical performance problems include slow page responses, API timeouts, high server load, low concurrency, and frequent database deadlocks. Under today's prevailing "fast and rough" style of Internet development, as traffic grows and code bloats, all kinds of performance problems start to pour in.
At the system level, three factors affect application performance: CPU, memory, and I/O. This article covers CPU performance monitoring and tuning.
CPU Performance Monitoring
When a program responds slowly, first use commands such as top, vmstat, and ps to check whether the system's CPU utilization is abnormal, and thus determine whether a busy CPU is causing the performance problem.
The key metric is us (the percentage of CPU time spent in user processes), which helps identify the abnormal process. When us approaches 100% (in top, a single multithreaded process can even show more than 100% on a multicore machine), you can be fairly sure that the CPU is busy and that this is why responses are slow. Generally speaking, a busy CPU has the following causes:
Infinite or empty loops, non-blocking polling, regular-expression matching, or pure computation inside a thread
Frequent GC
Frequent context switching among many threads
top Command
(Figure: top command output)
On machines with multiple CPUs or cores, the summary line aggregates all of them. To see each core separately, press 1 inside top, as shown in the figure. The fields are:
us: percentage of CPU time spent running un-niced user processes
sy: percentage of CPU time spent in the kernel
ni: percentage of CPU time spent running user processes whose priority was changed with nice
id: percentage of idle CPU time
wa: percentage of CPU time spent waiting for I/O to complete
hi: percentage of CPU time spent servicing hardware interrupts
si: percentage of CPU time spent servicing software interrupts
st: percentage of time a virtual CPU waits for the hypervisor to give it a real CPU (steal time)
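top computes these percentages from the cumulative tick counters that the Linux kernel exposes in /proc/stat. As a minimal Python sketch, assuming the standard column order of the cpu lines (user, nice, system, idle, iowait, irq, softirq, steal), the following turns two snapshots into the same per-state percentages, both for the aggregate cpu line and for each core (the per-core view that pressing 1 shows). The function names are hypothetical, and the guest columns are ignored because the kernel already counts guest time inside user and nice.

```python
import time
from typing import Dict, List

# Order of the first eight counters on a "cpu" line of /proc/stat.
# guest/guest_nice are skipped: the kernel already includes them in user/nice.
FIELDS = ["us", "ni", "sy", "id", "wa", "hi", "si", "st"]


def parse_proc_stat(text: str) -> Dict[str, List[int]]:
    """Map "cpu", "cpu0", "cpu1", ... to their first eight tick counters."""
    counters: Dict[str, List[int]] = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            values = [int(v) for v in parts[1:9]]
            # Pad with zeros in case an older kernel reports fewer columns.
            counters[parts[0]] = values + [0] * (len(FIELDS) - len(values))
    return counters


def cpu_percentages(before: List[int], after: List[int]) -> Dict[str, float]:
    """Share of the elapsed ticks spent in each state between two snapshots."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


def sample_cpu_usage(interval: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Linux only: read /proc/stat twice and return percentages per CPU line."""
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    return {cpu: cpu_percentages(first[cpu], second[cpu])
            for cpu in second if cpu in first}
```

On a Linux host, sample_cpu_usage(1.0)["cpu"]["us"] gives the overall user share over one second, and the "cpu0", "cpu1", ... entries give the per-core values.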
vmstat
(Figure: vmstat command output)
in: number of interrupts per second, including clock interrupts.
cs: number of context switches per second. Lower is better; if it is too high, consider reducing the number of threads or processes. Every system call also switches the CPU from user mode into the kernel, which costs CPU time, so frequent system calls should be avoided. Too many context switches means the CPU spends much of its time switching rather than doing useful work, so it is not fully utilized, which is undesirable.
us: user CPU time.
sy: system (kernel) CPU time. If it is too high, system calls are taking a lot of time, for example because I/O operations are frequent.
id: idle CPU time. The CPU columns us, sy, id, wa, and st together add up to about 100, so id is the idle share, us the user share, and sy the system share.
wa: CPU time spent waiting for I/O.
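vmstat derives its in and cs columns from the intr and ctxt counters in /proc/stat, which only increase from boot onward; a per-second value is the difference between two readings divided by the elapsed time. A minimal Python sketch under that assumption follows; read_counters, rates, and sample_vmstat_rates are hypothetical names. The first line of vmstat output reports averages since boot, whereas this sketch, like vmstat's later lines, measures a single interval.

```python
import time
from typing import Dict


def read_counters(text: str) -> Dict[str, int]:
    """Pull the since-boot interrupt and context-switch totals out of /proc/stat text."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "intr":
            counters["in"] = int(parts[1])  # first number on the line is the total
        elif parts[0] == "ctxt":
            counters["cs"] = int(parts[1])
    return counters


def rates(before: Dict[str, int], after: Dict[str, int], seconds: float) -> Dict[str, float]:
    """Convert two readings into per-second rates, like vmstat's in and cs columns."""
    return {key: (after[key] - before[key]) / seconds for key in ("in", "cs")}


def sample_vmstat_rates(interval: float = 1.0) -> Dict[str, float]:
    """Linux only: measure interrupts and context switches per second over one interval."""
    with open("/proc/stat") as f:
        first = read_counters(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = read_counters(f.read())
    return rates(first, second, time.monotonic() - start)
```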
jstat -gcutil
If a Java process is consuming a lot of CPU, use this command to check whether it is performing GC frequently, as shown in the figure. For example, jstat -gcutil <pid> 1000 prints one sample every 1000 ms.
(Figure: jstat -gcutil output)
S0: utilization of survivor space 0, as a percentage of its capacity
S1: utilization of survivor space 1, as a percentage of its capacity
E: utilization of the Eden space, as a percentage of its capacity
O: utilization of the old generation, as a percentage of its capacity
P: utilization of the permanent generation, as a percentage of its capacity (JDK 7 and earlier; JDK 8 and later show M for Metaspace and CCS for the compressed class space instead)
YGC: number of young-generation GC events from application startup to the time of sampling
YGCT: time in seconds spent in young-generation GC over the same period
FGC: number of full GC events over the same period
FGCT: time in seconds spent in full GC over the same period
GCT: total garbage-collection time in seconds over the same period
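To decide whether GC is "frequent", compare two jstat -gcutil samples: the YGC, FGC, and GCT columns are cumulative, so their differences over an interval give the GC rate and the share of time spent collecting. The Python sketch below parses the text output by column name, so it should work whether the JVM prints P (JDK 7 and earlier) or M and CCS (JDK 8 and later). The helper names are hypothetical, and what counts as "too many" collections is left to the reader.

```python
from typing import Dict


def parse_gcutil(output: str) -> Dict[str, float]:
    """Parse `jstat -gcutil <pid>` text: header line plus the last sample line."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, sample = rows[0], rows[-1]
    # Columns that do not apply (e.g. CCS without compressed class pointers) may print "-".
    return {name: (0.0 if value == "-" else float(value))
            for name, value in zip(header, sample)}


def gc_activity(before: Dict[str, float], after: Dict[str, float],
                seconds: float) -> Dict[str, float]:
    """GC events and share of wall-clock time spent in GC between two samples."""
    gc_seconds = after["GCT"] - before["GCT"]
    return {
        "young_gcs": after["YGC"] - before["YGC"],
        "full_gcs": after["FGC"] - before["FGC"],
        "gc_time_pct": 100.0 * gc_seconds / seconds if seconds > 0 else 0.0,
    }
```

For example, if full_gcs keeps growing between samples taken ten seconds apart, or gc_time_pct stays high, GC is a likely reason the CPU is busy.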
Problem Analysis
Once the commands above have located the problem, we can analyze its cause case by case.
CPU bottlenecks come in two kinds: user-mode CPU bottlenecks and system-mode (kernel-mode) CPU bottlenecks. A bottleneck caused by software running outside the operating-system kernel is a user-mode CPU bottleneck; one caused by code running inside the kernel is a system-mode CPU bottleneck.
As a rule of thumb, a ratio of user-mode to system-mode CPU time between 3:1 and 4:1 is normal. On a bottlenecked system, a ratio above this range means you should find out why user-mode CPU time increased; a ratio below it points instead to increased system-mode time.
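The 3:1 to 4:1 range is a rule of thumb rather than a hard limit. The small Python sketch below applies it to us and sy values read from top or vmstat; the function name and its return labels are hypothetical, and the bounds are parameters so they can be adjusted.

```python
def classify_cpu_split(user_pct: float, system_pct: float,
                       low: float = 3.0, high: float = 4.0) -> str:
    """Compare the user:system CPU time ratio against the 3:1 to 4:1 rule of thumb."""
    if system_pct <= 0:
        # No kernel time at all: either idle or entirely user-mode work.
        return "user-heavy" if user_pct > 0 else "idle"
    ratio = user_pct / system_pct
    if ratio > high:
        return "user-heavy"    # investigate why user-mode time grew (high us)
    if ratio < low:
        return "system-heavy"  # investigate why kernel time grew (high sy)
    return "normal"
```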
us too high
A high us value means that running applications are consuming most of the CPU. For a Java application, the most important task is then to find the code being executed by the CPU-hungry threads, as follows:
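1. Use jstat -gcutil to check whether the JVM is running GC frequently.
2. If GC is not frequent, find out which code the busy threads are executing. A common approach is to list the threads of the process with top -H -p <pid>, convert the ID of the busiest thread to hexadecimal, and search for it as the nid value in the output of jstack <pid>.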
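Matching a Linux thread ID to a Java thread relies on the thread-dump layout: HotSpot dumps have traditionally printed each thread's native ID in hexadecimal as nid=0x... on the thread's header line. The Python sketch below assumes that layout (one thread per blank-line-separated block, header line starting with the quoted thread name), converts a decimal thread ID as reported by top -H, and returns the matching stack. It also accepts a decimal nid, as a hedge for dump formats that print it that way; find_thread is a hypothetical helper name.

```python
import re
from typing import Optional

# Native thread id on a thread header line, e.g. ... tid=0x00007f... nid=0x1a2b runnable
# Decimal values are accepted as well (a hedge for dumps that do not print hex).
NID_RE = re.compile(r"\bnid=(0x[0-9a-fA-F]+|\d+)")


def find_thread(dump: str, tid: int) -> Optional[str]:
    """Return the thread block of a jstack dump whose nid equals a Linux thread id."""
    for block in re.split(r"\n\s*\n", dump):
        block = block.strip()
        if not block.startswith('"'):
            continue  # e.g. the dump banner or a "Locked ownable synchronizers" section
        match = NID_RE.search(block.splitlines()[0])
        if match is None:
            continue
        value = match.group(1)
        nid = int(value, 16) if value.lower().startswith("0x") else int(value)
        if nid == tid:
            return block
    return None
```

For example, if top -H -p shows thread 6699 at the top, find_thread(dump, 6699) returns the block whose header contains nid=0x1a2b, since hex(6699) is "0x1a2b".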
sy too high
When sy is too high, use vmstat to check the number of context switches (cs); Linux is probably spending a lot of time switching threads. In Java applications, the main cause is that many threads have been started and keep blocking (for example on lock waits or I/O waits) and changing execution state, which forces the operating system to keep switching threads and produces a large number of context switches.
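To confirm this pattern for one process, Linux keeps per-thread counters in /proc/<pid>/task/<tid>/status: voluntary_ctxt_switches grows when a thread gives up the CPU because it blocks (for example on a lock or on I/O), while nonvoluntary_ctxt_switches grows when the scheduler preempts it. A high voluntary count therefore matches the blocking behavior described above. The Python sketch below sums both counters over all threads of a process; the helper names are hypothetical, and pidstat -w from the sysstat package reports similar per-process rates.

```python
import os
from typing import Dict


def parse_ctxt_switches(status_text: str) -> Dict[str, int]:
    """Extract the two context-switch counters from a /proc/.../status file."""
    result = {"voluntary": 0, "nonvoluntary": 0}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "voluntary_ctxt_switches":
            result["voluntary"] = int(value)  # int() tolerates the leading tab
        elif key == "nonvoluntary_ctxt_switches":
            result["nonvoluntary"] = int(value)
    return result


def process_ctxt_switches(pid: int) -> Dict[str, int]:
    """Linux only: sum the counters over every thread of a process."""
    totals = {"voluntary": 0, "nonvoluntary": 0}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/status") as f:
                counts = parse_ctxt_switches(f.read())
        except OSError:
            continue  # the thread exited while we were iterating
        for key in totals:
            totals[key] += counts[key]
    return totals
```

Sampling process_ctxt_switches(pid) twice and comparing the deltas shows whether the process's switching is dominated by blocking (voluntary) or by preemption (nonvoluntary).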
In this case, the key for a Java application is to find out why its threads keep switching state. Dump the thread information with kill -3 <pid> (the dump goes to the JVM's standard output) or jstack -l <pid>, examine each thread's state and lock information, and identify the threads that spend too long waiting or that contend heavily for locks.
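Reading a dump by hand is tedious when there are hundreds of threads. The Python sketch below assumes the usual HotSpot dump layout (a java.lang.Thread.State: line per thread, plus "- waiting to lock <addr>" and "- locked <addr>" lines in the stacks). It counts threads per state and the number of threads queued on each monitor, then maps monitor addresses to the threads that appear to hold them. The helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Tuple

STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")
WAITING_RE = re.compile(r"- waiting to lock <(0x[0-9a-fA-F]+)>")
LOCKED_RE = re.compile(r"- locked <(0x[0-9a-fA-F]+)>")
NAME_RE = re.compile(r'"([^"]+)"')


def summarize_dump(dump: str) -> Tuple[Counter, Counter]:
    """Count threads per state, and threads blocked on each monitor address."""
    return Counter(STATE_RE.findall(dump)), Counter(WAITING_RE.findall(dump))


def lock_owners(dump: str) -> Dict[str, str]:
    """Map monitor address -> name of a thread whose stack shows "- locked <addr>".

    Caveat: a thread inside Object.wait() may also list "- locked" for the monitor
    it has released, so treat the result as a hint rather than proof of ownership.
    """
    owners: Dict[str, str] = {}
    for block in re.split(r"\n\s*\n", dump):
        block = block.strip()
        name = NAME_RE.match(block)
        if name is None:
            continue  # not a thread block (banner, lock-synchronizer section, ...)
        for address in LOCKED_RE.findall(block):
            owners[address] = name.group(1)
    return owners
```

Calling contended.most_common() lists the most contended monitors first; looking each address up in lock_owners(dump) points at the thread the others are waiting for.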
CPU Tuning
Setting program execution priority
Use nice and renice to set the scheduling priority of programs: nice sets the priority when a program starts, and renice changes the priority of a process that is already running.
Format: nice [-n value] command. The nice command starts a program with a modified priority: the user specifies a priority level, called the nice value, when launching the program. Nice values range from -20 (highest priority) to 19 (lowest priority).
Only root can set negative values. Ordinary users can also use nice on their own programs, but they can only raise the nice value, that is, lower the priority.
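Python's os.nice, os.setpriority, and os.getpriority wrap the nice(2), setpriority(2), and getpriority(2) system calls that the nice and renice commands are built on. The sketch below is a rough analogue of the two commands: run_niced raises a child's nice value before the child program starts, and renice sets the nice value of a running process. The helper names are hypothetical, and as an unprivileged user you will normally only be able to move the value toward 19.

```python
import os
import subprocess
from typing import List


def run_niced(cmd: List[str], increment: int) -> subprocess.CompletedProcess:
    """Rough analogue of `nice -n increment cmd`."""
    # preexec_fn runs in the child between fork and exec, so the parent is unaffected.
    return subprocess.run(cmd, preexec_fn=lambda: os.nice(increment),
                          capture_output=True, text=True, check=True)


def renice(pid: int, value: int) -> int:
    """Rough analogue of renice: set an absolute nice value, then read it back."""
    os.setpriority(os.PRIO_PROCESS, pid, value)
    return os.getpriority(os.PRIO_PROCESS, pid)
```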
Using ulimit to limit CPU time
Note that ulimit applies to the current shell process and the child processes it spawns, so a script can call ulimit to cap CPU time. For example, running ulimit -t 100 before tar limits tar to 100 seconds of CPU time.
If tar needs more than 100 seconds of CPU time, it is terminated, which may leave the archive incomplete; for this reason, limiting CPU time with ulimit is not recommended. Limits can also be imposed on users by editing the system's /etc/security/limits configuration file.
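ulimit -t corresponds to the RLIMIT_CPU resource limit. According to setrlimit(2), the kernel sends SIGXCPU when a process reaches the soft limit and SIGKILL when it reaches the hard limit, which is why tar is cut off mid-archive. The Python sketch below applies the limit to a single child process only, much as a script would call ulimit before running tar. The helper names are hypothetical, and disabling core dumps is an added assumption so that a SIGXCPU kill does not leave a core file behind.

```python
import resource
import signal
import subprocess
from typing import List


def run_with_cpu_limit(cmd: List[str], seconds: int) -> int:
    """Run cmd with a CPU-time cap, roughly what `ulimit -t seconds` does in a shell."""

    def apply_limits() -> None:
        # Runs in the child just before exec, so the parent keeps its own limits.
        # Soft limit: SIGXCPU at `seconds`; hard limit: SIGKILL one second later.
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds + 1))
        # Assumption for this sketch: no core file if SIGXCPU terminates the child.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

    return subprocess.run(cmd, preexec_fn=apply_limits, timeout=120).returncode


def killed_by_cpu_limit(returncode: int) -> bool:
    """subprocess reports death by signal N as a return code of -N."""
    return returncode in (-signal.SIGXCPU, -signal.SIGKILL)
```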
Using a program's own settings to adjust CPU usage
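Some programs can adjust their own CPU usage. Nginx, for example, lets you bind each worker process to a CPU core in its configuration file with the worker_cpu_affinity directive; with four worker processes: worker_processes 4; worker_cpu_affinity 0001 0010 0100 1000;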
Here 0001, 0010, 0100, and 1000 are bitmasks that select the 1st, 2nd, 3rd, and 4th CPU cores, one per worker process, which spreads CPU usage evenly across the cores.
This optimization is commonly used with Nginx.
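In these masks the rightmost bit stands for CPU 0, so worker i is bound to CPU i. The Python sketch below generates and decodes such masks and shows the same idea for an arbitrary Linux process through os.sched_setaffinity. The helper names are hypothetical; only the worker_cpu_affinity directive itself comes from Nginx.

```python
import os
from typing import List, Set


def affinity_masks(cpus: int) -> List[str]:
    """One bitmask per worker, worker i pinned to CPU i (rightmost bit = CPU 0)."""
    return [format(1 << i, f"0{cpus}b") for i in range(cpus)]


def worker_cpu_affinity_line(cpus: int) -> str:
    """Render the Nginx directive for `cpus` workers, one core each."""
    return "worker_cpu_affinity " + " ".join(affinity_masks(cpus)) + ";"


def mask_to_cpus(mask: str) -> List[int]:
    """Decode a mask such as "0100" back to the CPU indices it selects."""
    return [i for i, bit in enumerate(reversed(mask)) if bit == "1"]


def pin_current_process(cpu: int) -> Set[int]:
    """Linux only: bind the calling process to one CPU and return its new CPU set."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)
```

For example, worker_cpu_affinity_line(4) produces the four-worker directive discussed above, and mask_to_cpus("0100") shows that the third worker runs on CPU 2.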
Original article: Performance monitoring and tuning of Linux system CPUs