Causes and troubleshooting of excessive CPU load

Last Update:2018-07-26 Source: Internet

Author: User


In an interview I was once asked what causes excessive CPU load and how to troubleshoot it quickly. This post summarizes the relevant knowledge.

What is the CPU load value

The load average shown by the top command is the system's average load over the last 1 minute, 5 minutes, and 15 minutes.
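On Linux, the same three numbers can also be read from /proc/loadavg. The following is a minimal sketch, assuming a Linux system; parse_loadavg is a hypothetical helper name used only for illustration.

```shell
# parse_loadavg LINE
# Hypothetical helper: takes one /proc/loadavg-style line and prints the
# 1-, 5-, and 15-minute load averages with labels.
parse_loadavg() {
    # Split the line on whitespace into positional parameters.
    # Field layout: load1 load5 load15 runnable/total last_pid
    set -- $1
    printf '1min=%s 5min=%s 15min=%s\n' "$1" "$2" "$3"
}

# Read the live values only where the file exists (Linux).
if [ -r /proc/loadavg ]; then
    parse_loadavg "$(cat /proc/loadavg)"
fi
```

The fourth field of /proc/loadavg (runnable/total) gives the number of currently runnable scheduling entities, which is the instantaneous counterpart of the averaged values.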

The system load average is defined as the average number of processes in the run queue (running on the CPU or waiting to run) during a given time interval. A process is in the run queue if it meets all of the following conditions:

it is not waiting for the result of an I/O operation
it has not voluntarily entered a wait state (that is, it has not called 'wait')
it has not been stopped (for example, while waiting to be terminated)

This is the traditional Unix definition. Linux additionally counts processes in uninterruptible sleep (typically waiting on disk I/O) toward the load average.

In Linux, processes can be divided into three states: blocked processes (waiting for data from an I/O device or for a system call to return), runnable processes (ready to run and waiting for CPU time), and running processes (currently executing on a CPU).

When a process is runnable, it sits in the run queue and competes with other runnable processes for CPU time. The system load is the total number of running and runnable processes. For example, if the system has 2 running processes and 3 runnable processes, the system load is 5. The load average is that load averaged over a given period of time.

What factors make up the CPU load

CPU load is measured by the load value, which reflects how much work the computer system is carrying; put simply, it is the length of the process queue. When requests exceed the current processing capacity, processes have to wait, and the load rises.
Consider a load average such as 0.21 0.10 0.03.

Many people read it this way: the three numbers are the average system load over three periods of time (one minute, five minutes, and 15 minutes), and the smaller they are, the better. The higher the numbers, the heavier the load on the server, which may be a sign that something is wrong with it. That is not the whole story, though. What determines the size of the load average, and how do you tell whether a value is "good" or "bad"? At what point should an abnormal value start to worry you?

Before you answer these questions, you first need to know some of the facts behind these values. Let's start with the simplest example of a server with a single core processor.

Driving across the bridge

A single-core processor can be likened to a single-lane bridge. Imagine you are the toll collector on this bridge, busy handling the vehicles that want to cross. The first thing you need to know is how heavy the traffic is, for example how many vehicles are waiting to cross. If no vehicle is waiting ahead, you can tell arriving drivers to go straight through. If many vehicles are waiting, you need to tell them that they may have to wait a while.

So you need some specific numbers to describe the current traffic conditions, for example:

0.00 means there is no traffic on the bridge. In fact, anything between 0.00 and 1.00 is the same situation: arriving vehicles can cross without waiting.

1.00 means the bridge is exactly at capacity. This is not bad yet, but traffic is starting to back up, and any additional vehicles will slow things down.

Above 1.00, the bridge is over capacity and vehicles are queuing. How bad is it? At 2.00, there is twice as much traffic as the bridge can carry: the bridge is full, and one more bridge's worth of vehicles is waiting. At 3.00 it is worse still: the bridge is full, and two more bridges' worth of vehicles are waiting.

This is very similar to how processor load works. The time a car spends on the bridge is like the time the processor actually spends working on a thread. Unix defines the run-queue length as the number of processes currently running on all processor cores plus the number of processes waiting in the queue.

Like the toll collector, you certainly hope your cars never have to wait. So ideally you want the load average to stay below 1.00. Occasional peaks above 1.00 are fine, but if the load stays there over the long term, you have a problem and should start to worry.
"So you say the ideal load is 1.00?" Well, not exactly. A load of 1.00 means the system has no spare capacity left. In practice, an experienced system administrator draws the line at 0.70. If your load stays around 0.70 for a long time, take the time to find out why before things get worse.

Multi-core processors: load average is relative to the number of processor cores
On a multi-core system, the load average should not exceed the total number of processor cores. To continue the bridge analogy: on a dual-core CPU, a load of 2.00 means the system is fully loaded. The following is uptime output from a dual-core processor:

uptime
17:57 up  ,  8:29, 3 users, load averages: 2.04 2.04 2.01
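The rules of thumb above (divide by the number of cores, look into it at 0.70, treat anything above 1.00 as overload) can be sketched as a small check. This is only an illustration under those assumptions; classify_load and its three status labels are hypothetical names, not part of any standard tool.

```shell
# classify_load LOAD CORES
# Hypothetical helper: normalizes a load average by the core count and
# applies the article's thresholds (0.70 and 1.00 per core).
classify_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        per_core = load / cores
        if (per_core > 1.00)       status = "overloaded"
        else if (per_core >= 0.70) status = "investigate"
        else                       status = "ok"
        printf "%.2f %s\n", per_core, status
    }'
}

# The dual-core uptime sample above: 2.04 / 2 cores
classify_load 2.04 2    # prints: 1.02 overloaded
```

On a live Linux system, the first argument could come from /proc/loadavg and the second from a core count such as the output of nproc, assuming that command is available.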

High CPU load: causes and troubleshooting

Causes of high CPU load: at the programming-language level, an increase in the number of full GCs or an infinite loop in the code is likely to drive the CPU load up.

The troubleshooting procedure, in one sentence:

First find out which threads are consuming the CPU, then look up those threads by thread ID in a stack dump to see what they are doing.

Find the process using the most CPU: list running processes with the command ps ux, or with top -c (press P to sort by CPU usage).

Find the threads using the most CPU: top -Hp <process ID> lists the threads of the given process (press P to sort by CPU usage).
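For scripted use, where an interactive top session is not convenient, picking the busiest thread can be sketched as below. This is a minimal sketch: hottest_tid is a hypothetical helper name, and it only assumes its input is lines of "thread ID, CPU percentage" pairs.

```shell
# hottest_tid
# Hypothetical helper: reads "TID %CPU" pairs on stdin and prints the TID
# with the highest %CPU (the first one wins on a tie).
hottest_tid() {
    awk '$2 + 0 > max || NR == 1 { max = $2 + 0; tid = $1 }
         END { if (NR > 0) print tid }'
}

# Synthetic sample input; real input would come from a thread listing.
printf '101 1.5\n102 97.3\n103 0.0\n' | hottest_tid    # prints: 102

# Possible live usage on Linux with procps ps (an assumption, not from the article):
#   ps -L -o tid= -o pcpu= -p "$pid" | hottest_tid
```

Note that the %CPU reported by ps is averaged over the lifetime of the thread, so for a spike that has just started, the interactive top -Hp view described above is the more reliable source.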
If the process is a Java process and you need to see exactly which code is driving the CPU load up, you can use the jstack tool that ships with the JDK to inspect the stack of the thread IDs obtained above.

Because thread IDs in the stack dump are shown in hexadecimal, first convert the thread ID obtained above to hexadecimal.

jstack <Java process ID> | grep <hexadecimal thread ID> -C5 --color
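The decimal-to-hexadecimal conversion can be done with printf. The sketch below assumes a JDK whose jstack output labels the native thread ID as nid=0x followed by lowercase hexadecimal, as the article describes; some newer JDK releases may print this field differently. tid_to_nid is a hypothetical helper name.

```shell
# tid_to_nid TID
# Hypothetical helper: converts a decimal thread ID (as shown by top -Hp)
# into the "nid=0x..." form to search for in a jstack dump.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

tid_to_nid 12345    # prints: nid=0x3039

# Possible usage, with pid and tid set to the values found in the previous steps:
#   jstack "$pid" | grep -C5 --color "$(tid_to_nid "$tid")"
```

Searching for the full nid=0x... token instead of the bare hexadecimal number avoids accidental matches against memory addresses elsewhere in the dump.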

Part of this content is referenced from: http://blog.csdn.net/u011183653/article/details/19489603

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
