Original link: http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
You probably have a general sense of what the Linux load average is. The load averages are reported by the uptime and top commands, and they look something like this:
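For example, the output of uptime on a lightly loaded Linux machine looks roughly like this (the numbers below are purely illustrative; they are not from the original article):

    $ uptime
     10:14:35 up 10 days,  2:32,  3 users,  load average: 0.09, 0.05, 0.01

The three values after "load average" are the numbers this article is about.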
Most people have a rough idea of what they mean: the three numbers are the average system load over the last one, five, and fifteen minutes, and the smaller they are the better. Higher numbers mean a heavier load on the server, which may be a sign that something is wrong.
But it is not quite that simple. What actually goes into the load average, how do you tell whether a given value is "good" or "bad", and at what point should an abnormal value start to worry you?
Before answering those questions, you need to understand what lies behind these values. Let's start with the simplest case: a server with a single-core processor.
Driving across the bridge
A single-core processor can be likened to a single lane of traffic over a bridge. Imagine you are the bridge operator, collecting tolls and keeping the vehicles moving. You need to know a few things, such as how heavy the traffic is and how many vehicles are waiting to cross. If no cars are waiting, you can wave drivers straight through; if a lot of vehicles are queued up, you have to tell them they may need to wait a while.
So you need some kind of scale to describe the current traffic conditions, for example:
- *0.00 means there is no traffic on the bridge.* In fact, anything between 0.00 and 1.00 means the same thing: traffic flows freely, and an arriving vehicle can cross without waiting.
- *1.00 means the bridge is exactly at capacity.* This is not bad in itself, but traffic is getting a little congested, and if it stays this way things may well start to slow down.
- *Above 1.00 means the bridge is over capacity and traffic is backing up.* How bad is it? A load of 2.00 means there are two lanes' worth of cars in total: one lane's worth on the bridge and another lane's worth waiting to get on. At 3.00 things are worse still: one lane's worth on the bridge and twice that many vehicles waiting behind it.
Processor load works much the same way. A car's time on the bridge corresponds to the time the processor spends actually running a thread, and the load Unix reports counts both the processes currently running on the processor cores and the threads waiting in the run queue for their turn.
Like the bridge operator, you want your cars (processes) never to be stuck waiting. Ideally, then, the load average should stay below 1.00. Occasional peaks above 1.00 are fine, but if the load stays there for long it means trouble, and you should start worrying.
"So you say the ideal load is 1.00?" 」
Well, not exactly. A load of 1.00 means the system has no headroom left. In practice, experienced system administrators draw the line at 0.70:
- "The law of investigation is needed":* if your system loads up to 0.70 in the long term, then you need to take some time to understand the cause before things get worse.
- "Now it's time to fix the law":1.00. * If your server system load hovers for a long time at 1.00, then you should solve this problem immediately. Otherwise, you'll get a call from your boss in the middle of the night, which is not a pleasant thing to do.
- 3:30 A.M. Exercise the body rule ":5.00. * If your server load exceeds 5.00 This number, then you will lose your sleep, but also in the meeting to explain the cause of the situation, in short, do not let it happen.
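As a concrete illustration of these thresholds, here is a minimal shell sketch (my own illustration, not from the original article) that reads the 1-minute load from /proc/loadavg and reports which rule applies. It assumes a single-core machine, since that is the case discussed so far:

    #!/bin/sh
    # Read the 1-minute load average (the first field of /proc/loadavg).
    load1=$(cut -d ' ' -f 1 /proc/loadavg)

    # Map the load onto the rule-of-thumb thresholds above (single-core machine).
    awk -v l="$load1" 'BEGIN {
        msg = "all clear"
        if (l >= 0.70) msg = "time to investigate"
        if (l >= 1.00) msg = "fix it now"
        if (l >= 5.00) msg = "3:30 a.m. territory"
        print msg " (1-minute load: " l ")"
    }'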
What about multiple processors? My load average is 3.00, but the system is running fine!
Oh, you're on a quad-processor system? Then a load average of 3.00 is perfectly healthy.
On multiprocessor systems, the load is relative to the number of available processor cores. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core system, and 4.00 on a quad-core system.
Back to the bridge analogy. A load of 1.00 means "one lane's worth of traffic." On a one-lane bridge, that means the bridge is completely full; on a two-lane bridge, only half the capacity is in use, because there is still an entire second lane free.

So a load that saturates a single processor (1.00) leaves a dual-processor system with a whole processor's worth of capacity to spare; the dual-processor system is only fully loaded at 2.00.
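To see where your own machine falls on that scale, you can divide the current load by the number of cores. Here is a small sketch of the idea (my own illustration, assuming nproc from GNU coreutils and a Linux /proc filesystem are available):

    # 1-minute load divided by core count: values near 1.00 mean "every lane is full".
    cores=$(nproc)
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    awk -v l="$load1" -v c="$cores" 'BEGIN { printf "load per core: %.2f\n", l / c }'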
Multi-core and multi-processor
While we're on the subject, let's touch on the difference between multi-core and multi-processor. For load purposes, a machine with one multi-core processor and a machine with the same total number of single-core processors can be treated as roughly equivalent. The reality is more complicated, of course: differences in cache size, processor frequency, and other factors can all affect performance.
But even though those factors make real-world performance differ slightly, the system still computes the load average against the number of processor cores. That gives us two more rules:
- "How many cores are the number of loads" rule: in multicore processing, your system's average value should not be higher than the total number of processor cores.
- "Core core" principle: the core distribution in a few individual physical processing is not important, in fact, two quad-core processor equals four dual-core processor equals eight single processor. So, it should have eight processor cores.
Looking at our own systems
Let's take a look at the output of uptime:

    ~ $ uptime
    23:05 up + days, 6:08, 7 users, load averages: 0.65 0.42 0.36
This is from a dual-core machine, and the numbers show plenty of idle capacity. In fact, even when it has briefly peaked at 1.7, I haven't given the load a second thought.
But wait, there are three numbers, which can be confusing. We know that 0.65, 0.42, and 0.36 are the average loads over the last minute, the last five minutes, and the last fifteen minutes respectively. That raises a question:

- *Which number should we look at? One minute? Five minutes? Fifteen minutes?*

We've actually already covered most of what these numbers mean, and my advice is to watch the five-minute or fifteen-minute average. Frankly, if the one-minute load is 1.00, the server can still be perfectly fine. But if the fifteen-minute value is sitting at 1.00, that deserves attention (in my experience, that's the point at which you should think about adding processor cores).
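If you want the machine to watch this for you, a tiny cron-able sketch along these lines could do it. This is my own illustration (not from the original article), using the rule of thumb that the 15-minute average should stay below the number of cores:

    #!/bin/sh
    # Compare the 15-minute load average (the third field of /proc/loadavg)
    # against the number of available processor cores.
    cores=$(nproc)
    load15=$(cut -d ' ' -f 3 /proc/loadavg)
    awk -v l="$load15" -v c="$cores" 'BEGIN {
        status = (l >= c) ? "WARNING" : "OK"
        print status ": 15-minute load " l " on " c " cores"
    }'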
- *How do I know how many cores my system has?*
Under Linux, you can run

    cat /proc/cpuinfo

to see information about every processor core in your system. If you just want the count, pipe it through a filter:

    grep 'model name' /proc/cpuinfo | wc -l
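If all you need is the count and your system has GNU coreutils, nproc prints the number of available processing units directly (an alternative of my own, not mentioned in the original article):

    nproc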