Yesterday, while reviewing Nagios alerts, we found that one of our servers, a CentOS machine, was overloaded. The alert read:
- 2011-2-15 (Tuesday) 17:50
- WARNING - load average: 9.73, 10.67, 10.49
Similar alerts had been issued during the preceding two hours:
- 2011-2-15 (Tuesday) 16:50
- WARNING - load average: 10.52, 10.10, 10.06
- 2011-2-15 (Tuesday) 15:40
- WARNING - load average: 8.27, 9.23, 9.48
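
For scripting against alerts like the ones above, the three comma-separated figures can be pulled out of the message text. This is a minimal sketch, assuming the message contains the literal text `load average:` followed by three numbers; the function name `parse_load` is hypothetical and not part of Nagios.

```shell
#!/bin/sh
# parse_load: hypothetical helper, not part of Nagios.
# Reads one alert line on stdin and prints the three load figures
# separated by single spaces. Lines without "load average:" print nothing.
parse_load() {
    sed -n 's/.*load average:[[:space:]]*\([0-9.]*\),[[:space:]]*\([0-9.]*\),[[:space:]]*\([0-9.]*\).*/\1 \2 \3/p'
}

echo 'Warning-load average:9.73, 10.67, 10.49' | parse_load   # prints 9.73 10.67 10.49
```

The pattern tolerates optional whitespace after the colon and after each comma, so it should cope with minor differences in how the alert text is spaced.
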
First, what do the three numbers in the alert mean? 9.73, 10.67 and 10.49 are the average CPU load over the previous 1, 5 and 15 minutes. The most important figure is the last one, the 15-minute average, and the smaller it is the better. CPU load is the length of the task queue over a period of time; in plain terms, it is the total number of tasks that are using the CPU or waiting to use it during that period.

Second, besides Nagios, what other tools show the CPU load? The top and uptime commands both do. top is especially powerful, because it shows much more than the CPU load.

Third, how should CPU load be understood? Is it the same as CPU utilization? No: they are two different concepts, although top displays both. CPU utilization is the percentage of CPU time a program consumes in real time while it runs, whereas CPU load is the average number of tasks using or waiting for the CPU over a period of time. High CPU utilization does not necessarily mean high load.

There is a well-known analogy online that uses a telephone to explain the difference; here it is in my own words. Picture a public telephone booth with one person on the phone and four people waiting. Each caller is limited to one minute. Anyone who has not finished within the minute must hang up and rejoin the queue for another turn. The phone is the CPU, and the people who are calling or waiting to call are the tasks. While the booth is in use, some people finish their call and leave, some do not finish and queue again, and new people join the line. These changes in the number of people correspond to the number of tasks rising and falling.
To compute the average load, we count the number of people every 5 seconds and average those counts over the last 1, 5 and 15 minutes, which gives the 1-, 5- and 15-minute load averages.

Some people pick up the phone and talk for the whole minute. Others spend the first 30 seconds looking up the number or hesitating, and only really make the call in the last 30 seconds. If the phone is the CPU and the people are tasks, we say the first person (task) has high CPU utilization and the second has low CPU utilization. Of course, a CPU does not literally work for 30 seconds and then rest for 30; this is only a manner of speaking. The point is that some programs involve a lot of computation, so their CPU utilization is high, while others involve little, so theirs is low. Either way, whether CPU utilization is high or low has no necessary connection with how many tasks are queued behind.

Fourth, now that we understand what CPU load means, how can we reduce the CPU load on a server? The simplest way is to move to a server with better performance. Upgrading the CPU alone is not enough, because a CPU needs the cooperation of the other hardware and software to perform at its best. As for the rest of the server configuration, the number of CPUs and the number of cores per CPU both affect the CPU load, because every task is ultimately assigned to a CPU core for processing. Two CPUs are better than one, and dual-core is better than single-core. So, differences in CPU performance aside, keep in mind that CPU load is judged relative to the number of cores. As the saying goes, "as many cores, as much load".

Fifth, how is the CPU load quoted at the start of this article shared among the CPUs? That depends on how many cores this server has.
Linux has a /proc directory, a virtual view of the running operating system. One of its files is cpuinfo, which holds information about the CPU. We can open it directly or filter it by keyword; because the file is long, we usually filter. /proc/cpuinfo lists information per logical CPU, not per physical CPU. Each logical CPU occupies one paragraph, and logical CPU IDs start at 0. Keep this in mind for now; what a logical CPU is will be explained further on. To read the CPU information in the file, a few fields need to be understood:
- processor: the ID of the logical CPU
- model name: the model of the physical CPU
- physical id: the ID of the physical CPU
- cpu cores: the number of cores in the physical CPU
Filtering on these fields gives:
- $> grep 'model name' /proc/cpuinfo | uniq
- model name : Intel(R) Xeon(R) CPU E5320 @ 1.86GHz
- $> grep 'physical id' /proc/cpuinfo | sort | uniq | wc -l
- 2
- $> grep 'cpu cores' /proc/cpuinfo | uniq
- 2
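
The separate greps can also be combined into one small script. This is a sketch, assuming the `physical id` and `cpu cores` fields are present (some virtual machines and non-x86 systems omit them) and that every physical CPU reports the same number of cores. `count_cores` is a hypothetical helper name. It reads cpuinfo-formatted text on stdin, so it can be tried on a saved copy of the file as well as on the live one.

```shell
#!/bin/sh
# count_cores: hypothetical helper. Reads /proc/cpuinfo-style text on
# stdin and prints "<physical CPUs> <cores per CPU> <total cores>".
# Assumes every physical CPU reports the same "cpu cores" value.
count_cores() {
    awk -F: '
        # Strip the tabs and spaces around the field name and its value.
        { key = $1; val = $2
          gsub(/^[ \t]+|[ \t]+$/, "", key)
          gsub(/^[ \t]+|[ \t]+$/, "", val) }
        # Count each distinct physical id once.
        key == "physical id" { if (!(val in seen)) { seen[val] = 1; sockets++ } }
        key == "cpu cores"   { cores = val }
        END { print sockets + 0, cores + 0, sockets * cores }
    '
}

# On a Linux machine:
#   count_cores < /proc/cpuinfo
```

Given the values reported for this server (2 physical CPUs with 2 cores each), it should print `2 2 4`.
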
As can be seen, the server's CPU model is Intel(R) Xeon(R) CPU E5320, with two CPUs of two cores each, so the server has 4 cores in total.

Since CPU load is judged relative to the number of cores, take the 15-minute average of 10.49 as an example: the load per CPU is 10.49 / 2 = 5.245, and spread across the cores it is 10.49 / 4, or about 2.6 per core. Is this load reasonable? That depends on what the ideal CPU load is.

Sixth, how much CPU load is ideal? This is debated and everyone has their own view; I personally agree that a load of 0.7 or less per core is the ideal state. However good a particular CPU is and however many tasks it can handle per second, we can treat that as irrelevant here, even though in reality it matters. When we evaluate CPU load, we only count the length of the task queue every 5 seconds. If the queue length is 1 at every 5-second count, the CPU load is 1. On a single-core CPU, a load that stays at 1 means no task is waiting in the queue, which is not bad.

The server above has two dual-core CPUs, or 4 cores; at a load of 1 per core the total load is 4. So if this server's CPU load stayed around 4 for a long time, that would be acceptable. In fact the load has reached more than 9, which is a real problem.

But a load of 1 per core is still not ideal, because it means the CPU is always busy and never idle. The common advice online is that the ideal is about 0.7 per core, and I agree: 0.7 times the number of cores gives the ideal CPU load for a server. For this server that is 0.7 × 4 = 2.8, so a load under about 3.0 is fine.

Seventh, a note on logical CPUs, taken entirely from online sources. Today's servers generally use Hyper-Threading (HT) technology to improve CPU performance.
Hyper-Threading lets a single CPU run multiple programs at the same time while they share that CPU's resources; in theory it behaves like two CPUs executing two threads at the same moment. Although Hyper-Threading can execute two threads simultaneously, each logical CPU does not have its own separate resources the way two physical CPUs do. When both threads need the same resource at once, one of them is paused and yields until the resource is free again. Hyper-threaded performance is therefore not equal to that of two CPUs. CPUs with Hyper-Threading also have some other limitations.
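
Because a hyper-threaded logical CPU is not a full core, the per-core arithmetic from point six is done here with the physical core count, as the article does. The following is a minimal sketch of that calculation; the helper names `per_core_load`, `ideal_load` and `is_overloaded` are hypothetical, and the 0.7 factor is the article's rule of thumb, not a kernel constant.

```shell
#!/bin/sh
# per_core_load: hypothetical helper. Prints the load per core,
# rounded to two decimals.
#   $1 = load average (e.g. the 15-minute figure), $2 = number of cores
per_core_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# ideal_load: hypothetical helper. Prints the rule-of-thumb ceiling
# of 0.7 per core for the whole machine.
#   $1 = number of cores
ideal_load() {
    awk -v cores="$1" 'BEGIN { printf "%.1f\n", 0.7 * cores }'
}

# is_overloaded: hypothetical helper. Succeeds (exit status 0) when the
# load exceeds 0.7 per core, and fails otherwise.
is_overloaded() {
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load > 0.7 * cores) }'
}

per_core_load 10.49 4                         # prints 2.62
ideal_load 4                                  # prints 2.8
is_overloaded 10.49 4 && echo "overloaded"    # prints overloaded
```

With the figures from this article (a 15-minute load of 10.49 on 4 cores), the load per core is well above the 0.7 target, which matches the conclusion reached in point six.
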