CPU Metric: Load Average
1. Concept Introduction
1.1 Linux process states
In Linux, a process can be in one of the following states: runnable (ready); blocked, waiting for an event to complete (for example, an I/O operation that will return data, or the result of a system call); or running (executing).
A process in the runnable state is waiting, together with the other runnable processes, for CPU time instead of receiving it immediately. This is the "ready" state: the process could be executed at any moment but is not executing now, so it consumes no CPU time. The queue these processes form is called the run queue; the longer the queue, the longer the wait. The Linux scheduler selects the next process to execute from the run queue; that process then receives CPU time and enters the running state.
In the Linux kernel, the runnable and running states of a process are both represented by the same state value, TASK_RUNNING. TASK_RUNNING means that the process is either being executed by a CPU or is ready to be dispatched by the scheduler at any time. If it is not executing on a CPU at the moment, it is in the ready (runnable) state; if it is executing on a CPU, it is in the execution (running) state. When a process runs kernel code we say it is in kernel mode, and when it executes the user's own code we say it is in user mode. When a system resource that a blocked process was waiting for becomes available, the process is woken up and is ready to run, which is again the ready state. All of these are represented identically in the kernel as TASK_RUNNING, and a newly created process also starts in TASK_RUNNING. The load average being monitored is based on the number of TASK_RUNNING processes, that is, the sum of the running and runnable processes. For example, if 2 processes are running and 3 are runnable, the system load is 5. (On Linux, processes in uninterruptible sleep, typically waiting on disk I/O, are also counted, which is why a high load does not always mean that the CPU itself is busy.)
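One way to see that running and runnable share a single state is to look at the state letter in /proc/<pid>/stat, where both appear as R. The sketch below only parses text in that format; the sample lines are hypothetical and trimmed to a few fields, and the helper names are made up for illustration.

```python
from collections import Counter


def state_of(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Count how many processes are in each state."""
    return Counter(state_of(line) for line in stat_lines)


# Hypothetical sample lines (real lines have many more fields).
samples = [
    "101 (cc1) R 1 101 101 0",          # running or runnable
    "102 (cc1) R 1 102 102 0",          # running or runnable
    "103 (my app) S 1 103 103 0",       # interruptible sleep
    "104 (kworker/0:1) D 2 0 0 0",      # uninterruptible sleep
]
counts = count_states(samples)
print("running or runnable:", counts["R"])
```

On a real system the same function could be fed the contents of each /proc/<pid>/stat file; the kernel does not distinguish "running" from "runnable" in this field.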
1.2 Load Average Overview
Load average is the load on the CPU. It does not measure CPU usage; it is a statistic of how many processes the CPU was processing, or had waiting to be processed, over a period of time. In other words, it is the length of the queue for the CPU, or simply the length of the process queue. The system samples the number of active processes every 5 seconds and computes the load average values from those samples with a specific algorithm.
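The algorithm applied to the 5-second samples is, in the Linux kernel, an exponentially damped moving average. The sketch below is a floating-point simplification of that idea (the kernel itself uses fixed-point arithmetic); the function names are illustrative, not kernel APIs.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples, as described in the text


def decay_factor(window_seconds: float) -> float:
    """Weight kept from the previous average at each sample."""
    return math.exp(-SAMPLE_INTERVAL / window_seconds)


def update_load(previous: float, active: int, window_seconds: float) -> float:
    """Blend the old average with the newly sampled number of active processes."""
    e = decay_factor(window_seconds)
    return previous * e + active * (1.0 - e)


def simulate(active_counts, window_seconds: float = 60.0, start: float = 0.0) -> float:
    """Feed a sequence of 5-second samples into the average."""
    load = start
    for n in active_counts:
        load = update_load(load, n, window_seconds)
    return load


# One minute (12 samples) with 2 active processes, starting from an idle system.
one_minute = simulate([2] * 12)
print(round(one_minute, 2))
```

Starting from idle, one minute of constant load 2 brings the 1-minute figure only to about 63% of 2, which is why the reported values lag behind sudden changes.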
CPU load and CPU utilization are two different concepts. CPU utilization shows the percentage of the CPU that programs consume in real time while they run, whereas CPU load shows the average number of tasks that are using or waiting for the CPU over a period of time. High CPU utilization does not necessarily mean that the load is high.
For example, picture a public telephone kiosk with one person on the phone and four people waiting. Each person may use the phone for at most one minute; anyone who has not finished within the minute must hang up and wait for the next round. The phone corresponds to the CPU, and the people who are calling or waiting to call correspond to tasks. While the kiosk is in use, some people finish their calls and leave, some give up and rejoin the queue, and new people arrive; these changes in headcount correspond to tasks being added and removed.
To measure the average load, we count the number of people every 5 seconds and average the counts over 1, 5, and 15 minutes, which gives the 1-minute, 5-minute, and 15-minute load averages. Some people pick up the phone and talk for the whole minute; others spend the first 30 seconds looking up the number or hesitating, and only really talk during the last 30 seconds. If the phone is the CPU and the people are tasks, we say that the first person (task) has high CPU utilization and the second person (task) has low CPU utilization.
Of course, a CPU does not literally idle for 30 seconds and then work for 30 seconds. The point is simply that some programs involve a lot of computation, so their CPU utilization is high, while others involve little computation, so their CPU utilization is naturally low. Either way, whether CPU utilization is high or low does not necessarily say anything about how many tasks are queued behind.
2. How to view the current load average
You can use the system command w to view it.
You can also use the output of the uptime command.
Meaning of the values:
First value, 1.30: the average load over the last 1 minute
Second value, 1.48: the average load over the last 5 minutes
Third value, 1.69: the average load over the last 15 minutes
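As a minimal sketch, the three values can be pulled out of uptime-style output with a regular expression. Only the numbers 1.30, 1.48, and 1.69 come from the text above; the rest of the sample line is a hypothetical illustration of what uptime prints.

```python
import re


def parse_load_averages(uptime_output: str):
    """Extract the 1-, 5- and 15-minute load averages from uptime-style text."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)",
        uptime_output,
    )
    if match is None:
        raise ValueError("no load average found in input")
    return tuple(float(value) for value in match.groups())


# Hypothetical sample line; only the three load values are from the article.
sample = " 10:15:02 up 12 days,  3:41,  2 users,  load average: 1.30, 1.48, 1.69"
one, five, fifteen = parse_load_averages(sample)
print(one, five, fifteen)
```

In a Python program running on Linux, os.getloadavg() returns the same three numbers directly without parsing any command output.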
3. How to determine whether the system is overloaded
In theory, the lower the average CPU load over the last 15 minutes, the better.
Common guideline: load < number of CPUs * cores per CPU * 0.7
Some use a stricter guideline: load < number of CPUs * cores per CPU * 0.5
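Note: Linux has a /proc directory, a virtual file system that reflects the state of the running system; its cpuinfo file holds information about the CPUs. The relevant fields are: processor, the identifier of a logical CPU; model name, the model of the physical CPU; physical id, the identifier of the physical CPU (socket); cpu cores, the number of cores in the physical CPU.
Use the command grep 'model name' /proc/cpuinfo | wc -l to view the number of CPU cores. (The count is of logical CPUs, so with hyper-threading enabled it can exceed the number of physical cores.)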
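The fields in the note can be combined into the "number of CPUs * cores" figure used by the guideline. The following is a rough sketch that parses cpuinfo-style text; the sample text is hypothetical (one socket, two cores, four logical CPUs), and field layout can differ between architectures.

```python
def summarize_cpuinfo(text: str):
    """Return (logical CPUs, physical CPUs, total physical cores)."""
    logical = 0
    cores_per_package = {}  # physical id -> cpu cores
    physical_id = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            physical_id = None
        elif key == "physical id":
            physical_id = value
        elif key == "cpu cores" and physical_id is not None:
            cores_per_package[physical_id] = int(value)
    return logical, len(cores_per_package), sum(cores_per_package.values())


def load_limit(total_cores: int, factor: float = 0.7) -> float:
    """Guideline from the text: load should stay below CPUs * cores * factor."""
    return total_cores * factor


# Hypothetical cpuinfo excerpt: 4 logical CPUs on 1 socket with 2 cores.
sample = "\n".join(
    f"processor\t: {i}\nmodel name\t: Example CPU\nphysical id\t: 0\ncpu cores\t: 2\n"
    for i in range(4)
)
logical, packages, cores = summarize_cpuinfo(sample)
print(logical, packages, cores, load_limit(cores))
```

For this sample the guideline gives a limit of 2 * 0.7 = 1.4, even though grep on model name would report 4 logical CPUs.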
3.1 Single-core processors
Suppose our system has a single CPU with a single core. Compare it to a one-way bridge, and compare the tasks for the CPU to cars.
Load < 1: there are few cars and the bridge has room to spare.
Load = 1: the cars occupy the whole bridge.
Load > 1: the bridge is full and more cars are queued up waiting to get on.
3.2 Multi-core processors
We often find that a server with load > 1 is still doing fine, because the server has a multi-core processor. If the server's CPU has 2 cores, we have 2 roads, and only when load = 2 are all the roads full of vehicles.
Load = 2: the roads are full.
# View the number of CPU cores
grep 'model name' /proc/cpuinfo | wc -l
3.3 Which load average values to watch out for
0.7 < load < 1: This is a good state; if more cars arrive, your bridge can still cope.
load = 1: Your bridge is about to jam and there are no resources left for extra tasks; take a look at what is going on.
load > 5: Very serious congestion; the bridge is so busy that no car can move quickly.
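The bands above can be written down as a small classifier. This is only a sketch: dividing the load by the core count so that the single-bridge thresholds apply to multi-core machines is an assumption, as is treating everything between 1 and 5 as "investigate"; the labels are made up for illustration.

```python
def classify(load: float, cores: int = 1) -> str:
    """Map a load average onto the bands described in the text."""
    per_core = load / cores  # assumption: thresholds apply per core
    if per_core < 1.0:
        return "ok"           # the bridge can still take more cars
    if per_core <= 5.0:
        return "investigate"  # at or above capacity; a queue is forming
    return "severe"           # very serious congestion


for value in (0.8, 1.0, 6.0):
    print(value, classify(value))
print(2.0, classify(2.0, cores=4))
```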
3.4 Three load values: which one should I look at?
Usually we look at the 15-minute load first. If it is high, we then look at the 1-minute and 5-minute loads to see whether there is a downward trend. If only the 1-minute load is above 1, there is no need to worry yet, but if the 15-minute load is above 1, we should look into what is going on right away. In practice, the three values have to be read together according to the actual situation.
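The reading order described here can be sketched as a short function. The threshold of one unit of load per core and the wording of the results are assumptions made for illustration.

```python
def assess(one: float, five: float, fifteen: float, cores: int = 1) -> str:
    """Read the 15-minute value first, then use the others to judge the trend."""
    limit = float(cores)  # assumption: one unit of load per core is "full"
    if fifteen <= limit:
        return "ok"  # a high 1-minute value alone is just a short spike
    if one < five < fifteen:
        return "high but falling"
    return "high and not falling"


print(assess(1.30, 1.48, 1.69))  # the values shown in section 2
print(assess(3.0, 0.9, 0.5))
```

With the values from section 2 on a single-core machine, the 15-minute load is above 1 but the shorter averages are lower, so the load is already coming down.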
4. Load average's role in stress testing
4.1 CPU time slices
To make programs run more efficiently, many applications use multiple threads, so that work that was originally executed serially can be executed in parallel. Decomposing tasks and running them in parallel can greatly improve a program's efficiency. But that is performance at the code level; how does the hardware support it? The answer lies in the CPU's time-slice mode. Every instruction a program executes competes for the CPU, the most valuable resource, and no matter how many threads your program is divided into, they must all queue for that resource in order to compute. Consider the single-CPU case first. The following two figures describe how threads execute in non-time-slice mode and in time-slice mode:
As Figure 1 shows, if every thread has to queue until the thread currently using the CPU has finished, multithreading has no practical benefit. The CPU manager in Figure 2 is a role invented here for illustration; it allocates and manages CPU usage. With it, every thread gets a chance at the CPU while it is running, which is how threads appear to be processed in parallel even on a single CPU.
A multi-CPU system is simply an extension of the single-CPU case: when all CPUs are running at full load, each CPU is time-sliced in the same way to improve efficiency.
Figure 1: Non-time slice mode thread execution
Figure 2: Time slice mode thread execution
In the model of Linux scheduling described here, each process is given a fixed time slice by default in which to execute (the default is 1/100 second). During that time the process is assigned to a CPU and has exclusive use of it. If the process finishes its work before the time slice expires, it gives up the CPU voluntarily; if the time slice expires before the work is done, the CPU is taken back and the process is suspended until its next time slice.
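The behaviour described above can be sketched as a simple round-robin simulation on one CPU. This is an illustration under simplifying assumptions (no context-switch cost, no priorities, all processes ready at time 0), not how the Linux scheduler is implemented; the process names and work amounts are made up.

```python
from collections import deque


def round_robin(work_ms: dict, slice_ms: int) -> dict:
    """Return the time (in ms) at which each process finishes."""
    queue = deque(work_ms.items())
    clock = 0
    finished = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(remaining, slice_ms)
        clock += run
        if remaining > run:
            # Time slice expired before the work was done: back of the queue.
            queue.append((name, remaining - run))
        else:
            # Finished within the slice: the CPU is given up voluntarily.
            finished[name] = clock
    return finished


# Hypothetical workloads with a 10 ms (1/100 second) time slice.
result = round_robin({"A": 25, "B": 5, "C": 12}, slice_ms=10)
print(result)
```

The short job B finishes after 15 ms instead of waiting behind all 25 ms of A, which is the practical benefit of time slicing for short tasks.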
Note on choosing the time slice length: switching from one process to another takes a certain amount of time (saving and loading register values and memory maps, updating various tables and queues, and so on). Suppose a process switch (sometimes called a context switch) takes 5 milliseconds and the time slice is set to 20 milliseconds. After every 20 milliseconds of useful work, the CPU then spends 5 milliseconds on switching, so 20% of CPU time is wasted on administrative overhead. To improve CPU efficiency, the time slice could be set to 500 milliseconds, and only about 1% of the time would be wasted. But in a time-sharing system, what happens if 10 interactive users press Enter almost simultaneously? Assuming all the other processes use their full time slices, the unlucky last process has to wait 5 seconds before it gets a chance to run. Most users cannot tolerate a short command taking 5 seconds to respond. The same problem can occur on a personal computer that supports multiprogramming. In short, a time slice that is too short causes too many process switches and lowers CPU efficiency, while one that is too long may cause poor response to short interactive requests. A time slice of around 100 milliseconds is typically a reasonable compromise.
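The arithmetic in this note is easy to check. The sketch below reproduces the 20%, roughly 1%, and 5-second figures; the function names are illustrative, and the wait calculation follows the text's rough estimate by counting ten full time slices and ignoring switch time.

```python
def switch_overhead(slice_ms: float, switch_ms: float) -> float:
    """Fraction of CPU time spent on context switches."""
    return switch_ms / (slice_ms + switch_ms)


def worst_case_wait_ms(ready_processes: int, slice_ms: float) -> float:
    """Rough upper bound on the wait when every process uses its full slice."""
    return ready_processes * slice_ms


print(f"{switch_overhead(20, 5):.1%}")    # 20 ms slice, 5 ms switch
print(f"{switch_overhead(500, 5):.1%}")   # 500 ms slice, 5 ms switch
print(worst_case_wait_ms(10, 500) / 1000, "seconds")
```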
4.2 The difference between CPU utilization and load average
CPU utilization, as the name implies, describes how much the CPU is used. It is a statistic of CPU usage over a period of time and shows how much of that period the CPU was occupied. If the occupied share is high, we need to consider whether the CPU is being run in an overloaded state. Long-term overloaded operation is harmful to the machine itself, so CPU utilization should be kept within a certain proportion to ensure that the machine runs normally.
Load average is the load on the CPU. It does not measure CPU usage; it is a statistic of how many processes the CPU was processing, or had waiting to be processed, over a period of time, in other words the length of the queue for the CPU.
Let us compare the CPU to a telephone booth and each process to a person who needs to make a call. There are 4 telephone booths (just as our machine has 4 cores) and 10 people who need to make calls. The rule is that an administrator gives each person, in order, 1 minute of telephone time. A user who finishes within the minute returns the phone to the administrator immediately; a user who has not finished when the minute is up has to rejoin the queue and wait to be allocated another turn.
Figure 3: Phone usage scenario
The telephone users are also divided into categories: 1min means the user needs the phone for at most 1 minute, 2min means the user needs it for at most 2 minutes, and so on. Under the usage rules, 1min users need only one allocation to complete their call, while the other two types of users have to queue two or three times.
Utilization of the telephone = sum(active use CPU time) / period
That is, the sum of the time that each user assigned a phone actually spends using it is divided by the length of the statistical period. Note that the total time the telephone is actively used (sum(active use CPU time)) is different from the total time it is occupied (sum(occupy CPU time)). For example, a user is granted one minute, spends 10 seconds on a call, then 20 seconds looking up a number, and then the remaining 30 seconds on another call. The user occupied the phone for 1 minute but actually used it for only 40 seconds.
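The example can be worked through with the formula above. This is a minimal sketch; the function and variable names are illustrative.

```python
def utilization(active_seconds, period_seconds: float) -> float:
    """Utilization = sum(active use time) / period."""
    return sum(active_seconds) / period_seconds


occupied = 60       # the user held the phone for the full 1-minute slice
active = [10, 30]   # two calls; the 20 s spent looking up a number is idle
util = utilization(active, occupied)
print(f"occupied {occupied} s, used {sum(active)} s, utilization {util:.0%}")
```

The phone counts as taken for the whole minute, so anyone waiting still adds to the load, while utilization for that minute is only about two thirds.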
The average load of the telephone is the average number of people who are using a phone or waiting to be assigned one during a statistical period.
Telephone utilization reflects how heavily the phones are used. If a phone is in use for a long time without adequate rest, the hardware is running overloaded and the frequency of use needs to be adjusted. The average load describes phone usage from another angle: the higher the average load, the fiercer the competition for phones and the scarcer the phone resource. Requesting and managing a resource also carries a considerable cost, so when the average load is high, sustained heavy competition for the phones is a strain on the hardware as well.
Can the load average be high while utilization is low? Once the difference between occupancy time and usage time is understood, the answer is clear: after a time slice is allocated, whether it is actually used depends entirely on the user, so low utilization together with a high load average is quite possible. For this reason, CPU utilization alone is not enough to judge whether the CPU is overloaded; it must be combined with the load average to see the whole picture of how the CPU is being used and requested.
5. Common Mistakes
5.1 A high system load must be a performance problem
The truth: a high system load may simply be the result of CPU-intensive computation (such as compiling).
5.2 A high system load must mean the CPUs are too weak or too few
The truth: a high load only means that too many tasks have accumulated in the queue of work waiting to run. The tasks in the queue may be CPU-bound, I/O-bound, or held up by other factors.
5.3 If the system load stays high for a long time, the first choice is to add CPUs
The truth: load is only a symptom, not the underlying cause. Adding CPUs will make the load drop for a while, but it treats the symptom without curing the cause.