Understanding the Linux Load Average
You may already have some idea of what load averages in Linux mean. You can see them in the output of the uptime or top command, and they look something like this:
Load average: 0.09, 0.05, 0.01
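For example, the first line of top's display shows them alongside the uptime (the timestamps and values here are illustrative, not taken from a real machine):

top - 10:14:57 up 12 days, 3:18, 2 users, load average: 0.09, 0.05, 0.01
...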
Many people read these numbers as follows: the three values represent the system's average load over different periods of time (one, five, and fifteen minutes), and smaller is better. Higher numbers mean a heavier server load, which may be a signal that something is wrong with the server.
That is not quite the whole story. What makes up the load average, and how do we tell whether the current value is "good" or "bad"? When should we worry about an abnormal value?
Before answering those questions, we need to understand what lies behind these values. Let's start with the simplest example: a server with a single single-core processor.
The bridge
Think of a single-core processor as a single-lane bridge. Imagine you are the toll collector for this bridge, busy directing the vehicles that want to cross. First you need some information, such as how heavy the traffic is and how many vehicles are waiting to cross. If no vehicle is waiting, you can wave drivers straight through; if a long queue has formed, you need to tell them the crossing may take a while.
So we need some kind of notation for the current traffic conditions, for example:
0.00 means there is no traffic on the bridge at all. In fact, everything between 0.00 and 1.00 amounts to the same thing: traffic flows smoothly, and arriving vehicles can cross without waiting.

1.00 means the bridge is exactly at capacity. This is not yet a bad situation, but traffic will start to back up if it gets any heavier, and crossings will become slower and slower.

Above 1.00, the bridge is over capacity and congestion sets in. How bad is it? At 2.00, there is twice as much traffic as the bridge can carry: one bridge's worth of vehicles is crossing and another bridge's worth is waiting. 3.00 is even worse: the bridge is nearly overwhelmed, with two more bridges' worth of vehicles queued up behind the ones crossing.
This situation maps closely onto processor load. The time a car spends crossing the bridge is like the time the processor spends actually executing a process. Unix defines the load in terms of runnable processes: those currently being executed on the processor cores plus those waiting in the queue for their turn.
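These numbers come straight from the kernel. On Linux you can read them from /proc/loadavg; the fourth field (runnable processes / total processes) reflects the queue just described. The values below are illustrative:

$ cat /proc/loadavg
0.09 0.05 0.01 1/120 2034

The first three fields are the one-, five-, and fifteen-minute load averages, and the last field is the PID of the most recently created process.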
Like the toll collector on the bridge, you want your cars (processes) never to have to wait anxiously in line. So ideally, the load average stays below 1.00. The occasional peak above 1.00 is fine, but if it stays there for a long time, there is a problem, and that is when you should worry.
"So what do you say is the ideal load of 1.00 ?"
Well, not exactly. A load of 1.00 means the system has no headroom left. In practice, experienced system administrators draw the line at 0.70:
"Requires investigation rules": if your system load is around 0.70 for a long time, you need to spend some time understanding the cause before it gets worse.
"Now we need to repair the rule": 1.00. If your server system load lingers at 1.00 for a long time, you should solve this problem immediately. Otherwise, you will receive a call from your boss in the middle of the night. This is not a pleasant task.
"Exercise at half past three a.m.": 5.00. If your server load exceeds 5.00, you will lose your sleep, and you have to explain the cause in the meeting. In short, never let it happen.
So what about multiple processors? My load average says 3.00, but the system is running fine!

Ah, you have a quad-processor host? Then a load average of 3.00 is perfectly healthy.
On a multi-processor system, the load is measured relative to the number of processor cores available. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core system, and 4.00 on a quad-core system.
Back to our bridge metaphor. A load of 1.00 means "one lane's worth of traffic". On a one-lane bridge, 1.00 means the bridge is completely full. On a dual-processor system, the capacity is doubled, so 1.00 means 50% of the resources are still free, because there is another lane available.

So while a single processor is fully loaded at 1.00, a dual-processor system is fully loaded only at 2.00: it has twice the resources available.
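A quick way to express the current load as a percentage of total capacity is to divide the one-minute average by the core count. Here is a one-liner sketch, assuming the nproc utility from GNU coreutils is available:

$ awk -v cores="$(nproc)" '{ printf "%.0f%% of capacity\n", $1 / cores * 100 }' /proc/loadavg

On a dual-processor system under a load of 1.00, this would print 50% of capacity, matching the half-full bridge above.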
Multi-core and multi-processor
Let's also take a brief look at the difference between multi-core and multi-processor. For load purposes, a machine with one dual-core processor can basically be treated the same as a machine with two single-core processors. Of course, the reality is more complicated: different amounts of cache, different processor frequencies, and other factors can cause performance differences.
However, even if actual performance differs slightly because of those factors, the system still calculates the load average based on the total number of processor cores. This gives us two new rules:
"How many cores is the load" rule: in multi-core processing, your system average value should not be higher than the total number of processor cores.
"Core" principle: it is not important that the core is distributed in several physical processes. In fact, two quad-core processors are equal to four dual-core processors and eight single-core processors. Therefore, it should have eight processor kernels.
Examining our own system
Let's take a look at the output of uptime:

~ $ uptime
up 14 days, 7 users, load averages: 0.65 0.42 0.36

This is a dual-core machine, and the numbers show it has plenty of idle capacity. In fact, I never worried about its load even when it peaked at 1.70.
But which of the three numbers should be worrying? We know that 0.65, 0.42, and 0.36 are the system's average load over the last one minute, five minutes, and fifteen minutes, respectively. That raises another question:

Which number should we look at? One minute? Five minutes? Or fifteen minutes?
For the thresholds we've been discussing, you should focus on the five- or fifteen-minute averages. Frankly, a load of 1.00 over the last minute can still mean the server is fine; it may just be a momentary spike. But if the fifteen-minute average sits at 1.00, it deserves attention (in my experience, that is when you should think about adding processors).
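If you want to watch just the fifteen-minute average from a script, it is the third field of /proc/loadavg:

$ awk '{print "15-minute load:", $3}' /proc/loadavg
15-minute load: 0.36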
So how do I know how many cores my system has?

On Linux, you can run

cat /proc/cpuinfo

to get information about every processor core in your system. If you just want the count, use:

grep 'model name' /proc/cpuinfo | wc -l
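On systems with GNU coreutils, nproc is a shorter alternative for the same count (this command is an addition to the article's examples):

$ nproc
2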