Performance monitoring and tuning of Linux system CPU

Last Update:2017-06-20 Source: Internet

Author: User

Objective:

Performance optimization is a perennial topic. Typical performance problems include slow page responses, interface timeouts, high server load, low concurrency, and frequent database deadlocks. Under today's fast-and-rough Internet development model, growing traffic and bloated code bring a steady stream of performance problems.

At the system level, three factors can affect application performance: CPU, memory, and I/O. This article covers CPU performance monitoring and tuning.

CPU Performance Monitoring

When a program's response slows down, first use commands such as top, vmstat, and ps to check whether the system's CPU utilization is abnormal, and so determine whether the performance problem is caused by a busy CPU.

The main indicator is us (the percentage of CPU time spent in user processes), which helps identify the abnormal process. When us approaches 100%, you can be fairly sure that the CPU is busy and that this is why responses are slow. Generally speaking, the CPU is busy for the following reasons:

A thread running an empty infinite loop, non-blocking code, regular expression matching, or pure computation
Frequent GC
Frequent context switches among many threads

Top Command

On a machine with multiple CPUs or cores, the summary line of top aggregates all of them. To see the usage of each core, press 1 after entering the top view.

us represents the percentage of CPU time spent running user processes

sy represents the percentage of CPU time spent running the kernel

ni represents the percentage of CPU time spent running user processes whose priority has been changed with the nice command

id represents the percentage of time the CPU is idle

wa represents the percentage of time the CPU waits for I/O to complete

hi represents the percentage of CPU time spent servicing hardware interrupts

si represents the percentage of CPU time spent servicing software interrupts

st represents the percentage of time a virtual CPU waits for a physical CPU
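To make these fields concrete, the following Python sketch derives the same kind of percentages from two samples of the aggregate cpu line in /proc/stat, which is where Linux exposes these counters. The sample lines and the helper names are invented for illustration, and top may round or group the values slightly differently.

```python
# Field order of the first eight counters on the "cpu" line of /proc/stat,
# labelled with the abbreviations used in the top summary line.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Return the first eight tick counters of a 'cpu ...' line."""
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    return [int(value) for value in parts[1:9]]


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in zip(FIELDS, deltas)}


# Hypothetical samples; on a real system, read the first line of /proc/stat
# twice, a second or so apart.
BEFORE = "cpu  1000 0 500 8000 400 50 50 0 0 0"
AFTER = "cpu  1700 0 700 8050 430 55 65 0 0 0"

usage = cpu_percentages(BEFORE, AFTER)
print("us=%.1f sy=%.1f id=%.1f wa=%.1f" % (usage["us"], usage["sy"], usage["id"], usage["wa"]))
```

With these made-up samples, 70% of the interval was spent in user code, which is the kind of reading that would prompt a closer look at the busiest process.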

Vmstat

in: the number of interrupts per second, including clock interrupts.

cs: the number of context switches per second. The smaller the better; if it is too high, consider reducing the number of threads or processes. Each call to a system function takes the code into kernel space, and blocking calls in particular lead to context switches, which are expensive, so frequent calls to system functions should be avoided. Too many context switches mean that much of the CPU's time is spent on switching rather than on useful work, so the CPU is not fully utilized.

us: user CPU time.

sy: system CPU time. A high value indicates that a lot of time is spent in system calls, for example because I/O operations are frequent.

id: idle CPU time. In general, id + us + sy is close to 100, with the remainder going to I/O wait and stolen time. id is the idle share, us the user share, and sy the system share of CPU time.

wa: CPU time spent waiting for I/O.
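As a rough illustration of how these columns relate, the Python sketch below parses vmstat-style output into one dictionary per sample and checks that the CPU columns add up to 100. The sample text is invented, and the column set can differ between vmstat versions, so treat the parser as an assumption-laden sketch rather than a general tool.

```python
# Hypothetical output in the layout printed by `vmstat 1`.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812340 102400 943200    0    0     5    12  350 9800 35 40 23  2  0
 3  0      0 811900 102400 943260    0    0     0    40  372 10150 38 41 20  1  0
"""


def parse_vmstat(text):
    """Return one dict per data row, keyed by the column names in the header."""
    header = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if "cs" in tokens and "us" in tokens:
            header = tokens  # the line that names the columns
        elif header and len(tokens) == len(header) and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(header, (int(t) for t in tokens))))
    return rows


def cpu_total(row):
    """Sum of the CPU percentage columns of one row."""
    return sum(row[name] for name in ("us", "sy", "id", "wa", "st"))


rows = parse_vmstat(SAMPLE)
for row in rows:
    print("cs=%d us=%d sy=%d id=%d total=%d" % (
        row["cs"], row["us"], row["sy"], row["id"], cpu_total(row)))
```

In this made-up sample sy is higher than us while cs is around ten thousand per second, the pattern the text associates with excessive context switching.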

jstat -gcutil

If you find that a Java process is CPU-intensive, you can use this command to check whether the process performs GC frequently.

S0 - percentage of survivor space 0 on the heap that is in use

S1 - percentage of survivor space 1 on the heap that is in use

E - percentage of Eden space on the heap that is in use

O - percentage of the old generation on the heap that is in use

P - percentage of the permanent generation (perm space) that is in use; on Java 8 and later, jstat reports M (metaspace) instead

YGC - number of young GCs from application startup to the time of sampling

YGCT - time (in seconds) spent in young GC from application startup to the time of sampling

FGC - number of full GCs from application startup to the time of sampling

FGCT - time (in seconds) spent in full GC from application startup to the time of sampling

GCT - total time (in seconds) spent in garbage collection from application startup to the time of sampling
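Because these counters are cumulative, what matters is how fast they grow between samples. The Python sketch below parses two rows of jstat -gcutil output and reports the difference; the sample values are invented, and what counts as "frequent" is left to the reader because the source gives no threshold.

```python
# Hypothetical output of `jstat -gcutil <pid> 1000 2` on a pre-Java 8 JVM
# (two samples taken one second apart).
SAMPLE = """\
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  45.12  67.30  98.75  59.80    120    1.250    35   42.700   43.950
  0.00  45.12  12.10  98.90  59.80    121    1.262    37   45.100   46.362
"""


def parse_gcutil(text):
    """Return one dict per sample, keyed by the column names in the header."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, (float(value) for value in row))) for row in rows]


def gc_delta(samples):
    """GC activity between the first and the last sample."""
    first, last = samples[0], samples[-1]
    return {
        "young_gcs": int(last["YGC"] - first["YGC"]),
        "full_gcs": int(last["FGC"] - first["FGC"]),
        "gc_seconds": round(last["GCT"] - first["GCT"], 3),
    }


print(gc_delta(parse_gcutil(SAMPLE)))
```

In this sample the old generation stays near 99% full and two full GCs take more than two seconds within a one-second sampling window, so the JVM is spending essentially all of its time collecting.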

Problem Analysis

Once the commands above have located the problem, we can analyze its cause case by case.

CPU bottlenecks show up in two forms: user-mode CPU bottlenecks and kernel-mode (system) CPU bottlenecks. A bottleneck caused by running software outside the operating system kernel is a user-mode CPU bottleneck; one caused by running the operating system kernel is a kernel-mode CPU bottleneck.

A ratio of user-mode to kernel-mode CPU time between 3:1 and 4:1 is normal. If a system with a bottleneck shows a ratio above this range, analyze why user-mode CPU time has increased.

US High

When the us value is too high, the running application is consuming most of the CPU. In this case, the most important thing for a Java application is to find the code being executed by the CPU-consuming threads, which can be done as follows.

1. Use jstat -gcutil to see whether the JVM performs GC frequently.
2. If jstat -gcutil shows that GC is not frequent, check what code the busy threads are executing while CPU usage is high to locate the problem.
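The source does not spell out how to see what a busy thread is executing. One common approach, assumed here, is to take the decimal thread ID reported by top -H -p <pid>, convert it to hexadecimal, and look for the matching nid=0x... entry in a jstack thread dump. The Python sketch below does that lookup on an invented dump; the class names and IDs are hypothetical.

```python
import re

# Hypothetical excerpt of a jstack thread dump.
DUMP = '''\
"main" #1 prio=5 os_prio=0 tid=0x00007f3c1c009800 nid=0x4e21 waiting on condition [0x00007f3c23d5e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.example.App.main(App.java:9)

"worker-1" #12 prio=5 os_prio=0 tid=0x00007f3c1c0a1000 nid=0x4e2f runnable [0x00007f3c0a5f4000]
   java.lang.Thread.State: RUNNABLE
    at com.example.Busy.spin(Busy.java:17)
    at java.lang.Thread.run(Thread.java:748)
'''


def find_thread(dump, native_tid):
    """Return the dump entry whose nid matches the decimal OS thread ID."""
    wanted = format(native_tid, "x")  # jstack prints the native ID in hex
    for block in dump.strip().split("\n\n"):
        match = re.search(r"nid=0x([0-9a-f]+)\b", block)
        if match and match.group(1) == wanted:
            return block
    return None


# 20015 stands in for the thread ID that top -H would show as the busiest.
print(find_thread(DUMP, 20015))
```

The stack of the matching entry shows which method the thread was running when the dump was taken; taking several dumps a few seconds apart makes the result more trustworthy.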

SY High

When the sy value is too high, use vmstat to check the number of context switches. It is likely that Linux is spending a lot of time on thread switching. In Java applications, the main cause is that many threads have been started and these threads keep blocking (on lock waits or I/O waits, for example) and changing execution state, so the operating system constantly switches between threads and produces a large number of context switches.

In this case, the most important thing for a Java application is to find out why the threads keep changing state. You can use kill -3 <pid> or jstack -l <pid> to dump the application's thread information, examine the thread states and lock information, and identify the threads that wait too often or compete too heavily for locks.
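A thread dump can be long, so it helps to summarize it first. The Python sketch below counts thread states and the locks that threads are waiting for in a jstack-style dump; the dump text, thread names, and lock address are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical excerpt of `jstack -l <pid>` output.
DUMP = '''\
"pool-1-thread-1" #11 prio=5 os_prio=0 tid=0x00007f8a4c0d1000 nid=0x1a03 runnable [0x00007f8a2b7f6000]
   java.lang.Thread.State: RUNNABLE
    at com.example.Cache.refresh(Cache.java:42)
    - locked <0x00000000e0a1b2c8> (a java.lang.Object)

"pool-1-thread-2" #12 prio=5 os_prio=0 tid=0x00007f8a4c0d3000 nid=0x1a04 waiting for monitor entry [0x00007f8a2b6f5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Cache.get(Cache.java:27)
    - waiting to lock <0x00000000e0a1b2c8> (a java.lang.Object)

"pool-1-thread-3" #13 prio=5 os_prio=0 tid=0x00007f8a4c0d5000 nid=0x1a05 waiting for monitor entry [0x00007f8a2b5f4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Cache.get(Cache.java:27)
    - waiting to lock <0x00000000e0a1b2c8> (a java.lang.Object)
'''


def thread_states(dump):
    """Count how many threads are in each java.lang.Thread.State."""
    return Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", dump))


def contended_locks(dump):
    """Count how many threads are waiting to acquire each lock address."""
    return Counter(re.findall(r"waiting to lock <(0x[0-9a-f]+)>", dump))


print(thread_states(DUMP).most_common())
print(contended_locks(DUMP).most_common(1))
```

A lock address with many waiters points at the monitor to investigate; the thread that holds it appears in the dump with a "- locked <address>" line.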

CPU Tuning

Set priority for program execution

You can use nice and renice to set the priority with which a program runs.

Format: nice [-n value] command

The nice command changes the priority with which a program runs. It lets the user specify a priority level, called the nice value, when starting the program. The value ranges from -20 (highest priority) to 19 (lowest priority).

Only root may set negative values. Ordinary users can also use nice to adjust the priority of their programs, but they can only raise the nice value, that is, lower the priority.
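The same adjustment can be made from a program. As a minimal sketch, assuming a Unix-like system, the Python code below starts a child process with its nice value raised, roughly what nice -n 10 command does in a shell. The helper names are hypothetical; os.nice and the preexec_fn parameter are standard library features.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10):
    """Run cmd with its nice value raised by `increment` (lower priority)."""
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),  # runs in the child before exec
        capture_output=True,
        text=True,
        check=True,
    )


def child_nice_value(increment):
    """Ask a child Python process which nice value it ended up with."""
    probe = [sys.executable, "-c", "import os; print(os.nice(0))"]
    return int(run_niced(probe, increment).stdout)


# os.nice(0) returns the current nice value without changing it.
print("parent:", os.nice(0), "child:", child_nice_value(5))
```

An unprivileged process can only pass a positive increment here, which matches the rule that ordinary users may raise the nice value but not lower it.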

Using ulimit to limit CPU time

Note that ulimit restricts the current shell process and the child processes derived from it. You can therefore call ulimit in a script to limit CPU time. For example, running ulimit -t 100 before tar in a script limits tar to 100 seconds of CPU time.

If tar needs more than 100 seconds of CPU time, it is terminated, which may leave the archive incomplete, so using ulimit to limit CPU time is not recommended. In addition, users can be restricted by modifying the system's /etc/security/limits.conf configuration file.
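The effect of a CPU-time limit can be reproduced from a program as well. The following Python sketch, which assumes a Unix-like system, applies the limit to a single child process through the standard resource module, much as ulimit -t would in a shell. The helper names are hypothetical, and the busy loop stands in for a long-running command such as tar.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(cmd, seconds):
    """Run cmd with at most `seconds` of CPU time and return its exit status."""
    def limit():
        # Like `ulimit -t`, set both the soft and the hard limit. This fails
        # if `seconds` is above a hard limit that is already in place.
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds))
    return subprocess.run(cmd, preexec_fn=limit, timeout=60).returncode


def killed_by_cpu_limit(returncode):
    """True if the child was terminated by SIGXCPU or SIGKILL."""
    return returncode in (-signal.SIGXCPU, -signal.SIGKILL)


# A busy loop that would never finish on its own.
status = run_with_cpu_limit([sys.executable, "-c", "while True: pass"], 1)
print("killed by CPU limit:", killed_by_cpu_limit(status))
```

The child is stopped mid-work with no chance to clean up, which is the reason the text advises against this technique for jobs like archiving.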

Use the program's own function to adjust the CPU usage

Some programs can adjust their own CPU usage. The Nginx server is one example: the worker_cpu_affinity directive in its configuration file binds each worker process to a CPU.

In a value such as 0001 0010 0100 1000, each group is a bitmask: the four masks select the 1st, 2nd, 3rd, and 4th CPU cores respectively, which spreads the CPU load evenly across the cores.

This optimization is common when running Nginx.
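To show how such masks are formed, the Python sketch below generates one mask per core and assembles the corresponding directives as text. It assumes one worker process per core; the function names are hypothetical, while worker_processes and worker_cpu_affinity are existing nginx directives.

```python
def affinity_masks(cores):
    """One bitmask per core; the rightmost bit stands for the first core."""
    return [format(1 << index, "0{}b".format(cores)) for index in range(cores)]


def affinity_directives(cores):
    """Build the two nginx directives as text, one worker per core."""
    return "worker_processes {};\nworker_cpu_affinity {};".format(
        cores, " ".join(affinity_masks(cores)))


print(affinity_directives(4))
# prints:
# worker_processes 4;
# worker_cpu_affinity 0001 0010 0100 1000;
```

Each mask has exactly one bit set, so every worker is pinned to its own core and no two workers compete for the same one.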

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.