Http://www.dbanotes.net/arch/unix_linux_load.html
Almost every engineer who works with Unix-like operating systems knows how to check the system load. How that number is actually produced is less widely understood. I compared several sources, added my own understanding, and wrote up these notes.
What is load? What is load average?
Load is a measure of how much work a computer is doing (Wikipedia: "The system load is a measure of the amount of work that a computer system is doing"). Put simply, it is the length of the process queue. The load average is the load averaged over a period of time (1, 5, and 15 minutes). [Best reference: Unix load average Part 1: How It Works]
Here is sample output from the uptime command:
$ uptime
18:57:48 up 423 days, 3:55, 2 users, load average: 1.16, 1.12, 1.20
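When the three figures are needed in a script or monitoring agent rather than on screen, they can be pulled out of that line. The following is a minimal sketch, assuming the Linux-style "load average: a, b, c" format shown above (other systems print the fields differently); parse_uptime_load is a hypothetical helper name, not an existing API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the 1-, 5- and 15-minute load averages
 * from one line of Linux-style uptime output.
 * Returns 0 on success, -1 if the line does not contain them. */
static int parse_uptime_load(const char *line, double avg[3])
{
    const char *marker = "load average:";
    const char *p = strstr(line, marker);

    if (p == NULL)
        return -1;
    p += strlen(marker);

    /* The values are separated by a comma and a space. */
    if (sscanf(p, " %lf, %lf, %lf", &avg[0], &avg[1], &avg[2]) != 3)
        return -1;
    return 0;
}
```

Called on the sample line above, the helper fills avg with the 1-, 5- and 15-minute values in that order.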
The definitions given by different sources vary, but one thing is certain: you cannot obtain the exact load at the current instant. The finest granularity is 5 seconds: calc_load recomputes the averages every 5*HZ ticks (LOAD_FREQ in the code), where HZ is the kernel's tick-rate constant, so 5*HZ ticks amount to 5 seconds. See the Linux kernel code:
    count -= ticks;
    if (unlikely(count < 0)) {
        active_tasks = count_active_tasks();
        do {
            CALC_LOAD(avenrun[0], EXP_1, active_tasks);
            CALC_LOAD(avenrun[1], EXP_5, active_tasks);
            CALC_LOAD(avenrun[2], EXP_15, active_tasks);
            count += LOAD_FREQ;
        } while (count < 0);
    }
}
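The arithmetic behind CALC_LOAD can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code. The constants FSHIFT, FIXED_1, EXP_1, EXP_5 and EXP_15 are taken to be the values from include/linux/sched.h in kernels of that era (treat them as an assumption and check your own tree), and calc_load_step and simulate_load are hypothetical names introduced only for illustration. Each 5-second sample applies one exponentially damped update.

```c
/* Fixed-point constants, assumed to match include/linux/sched.h
 * in kernels of the same era as the excerpt. */
#define FSHIFT   11                /* bits of fractional precision   */
#define FIXED_1  (1UL << FSHIFT)   /* 1.0 in fixed point (2048)      */
#define EXP_1    1884UL            /* FIXED_1 / exp(5 sec / 1 min)   */
#define EXP_5    2014UL            /* FIXED_1 / exp(5 sec / 5 min)   */
#define EXP_15   2037UL            /* FIXED_1 / exp(5 sec / 15 min)  */

/* One update step, same arithmetic as the CALC_LOAD macro:
 * load = load * decay + active * (1 - decay), all in fixed point.
 * `active` must already be scaled by FIXED_1. */
static unsigned long calc_load_step(unsigned long load,
                                    unsigned long decay,
                                    unsigned long active)
{
    load *= decay;
    load += active * (FIXED_1 - decay);
    return load >> FSHIFT;
}

/* Hypothetical helper: start from an idle system (load 0) and feed in
 * `samples` 5-second samples with a constant number of active tasks.
 * Returns the resulting load average as a floating-point value. */
static double simulate_load(unsigned long decay, unsigned long n_tasks,
                            int samples)
{
    unsigned long load = 0;
    unsigned long active = n_tasks * FIXED_1;
    int i;

    for (i = 0; i < samples; i++)
        load = calc_load_step(load, decay, active);
    return (double)load / (double)FIXED_1;
}
```

With one task constantly runnable, simulate_load(EXP_1, 1, 12), that is one minute of samples starting from an idle system, gives roughly 0.63 rather than 1. This is why the 1-minute figure lags behind a sudden change in load, and the 5- and 15-minute figures lag even more.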
How can I tell whether the system is overloaded?
For a typical system, judge by the number of CPUs. In the example above, if the load average stays below 1.2 and the machine has two CPUs, there is no shortage of CPU capacity. In other words, the system is fine as long as the load average is smaller than the number of CPUs.
This is the evaluation method recommended by the book Solaris Performance and Tools. [I would like to recommend this book here. It does not cover load in as much detail as I had hoped, but it contains a great deal of performance information and is required reading for every DBA and architect.]
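The rule of thumb above comes down to one comparison. The following is a minimal sketch of it; load_per_cpu and cpu_overloaded are hypothetical helper names, and the caller is assumed to supply the CPU count.

```c
/* Hypothetical helper: load average normalized by the CPU count.
 * A non-positive CPU count is treated as a single CPU. */
static double load_per_cpu(double load_avg, long ncpu)
{
    if (ncpu < 1)
        ncpu = 1;
    return load_avg / (double)ncpu;
}

/* Returns 1 when the load average exceeds the number of CPUs
 * (more than one queued task per CPU on average), otherwise 0. */
static int cpu_overloaded(double load_avg, long ncpu)
{
    return load_per_cpu(load_avg, ncpu) > 1.0;
}
```

For the uptime sample earlier, a 15-minute load of 1.20 on a two-CPU machine works out to 0.6 per CPU, so cpu_overloaded(1.20, 2) returns 0.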
This raises two further questions:
1. How should multi-core CPUs and hyper-threading machines be judged?
For such machines, my suggestion is to go by how the operating system sees the hardware, that is, by the number of logical CPUs the system identifies. If you want to take a performance coefficient into account, refer to how Oracle treats multi-core CPUs under different architectures.
2. How should thread-based applications be judged?
This really comes down to the threading model in use, for example M:N. Which model does your system use? Take that into account.
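For the first question, the number of logical CPUs the operating system recognizes can be queried from a program. The following is a minimal sketch, assuming a system whose C library provides the _SC_NPROCESSORS_ONLN name for sysconf (Linux with glibc and Solaris do, but it is not part of the POSIX standard); logical_cpu_count is a hypothetical helper name.

```c
#include <unistd.h>

/* Hypothetical helper: number of logical CPUs currently online,
 * as reported by the operating system. Hyper-threads and cores
 * each count as one. Falls back to 1 if the query fails. */
static long logical_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? n : 1;
}
```

The value returned here is the one to compare the load average against when following the suggestion in question 1.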
In most cases, an excessive load does not necessarily point to a lack of CPU capacity. The application scenario can be an exception, though: try to run a high-concurrency web server on a single-CPU machine and trouble will follow.
Load and capacity planning
Any relatively mature site will use a tool such as Cacti (RRDtool) for capacity planning. The load data it collects comes as three series: the 1-, 5-, and 15-minute values. Which of the three should be used? The 15-minute value is the first choice.
Load and system alerts
Many environments with high availability requirements have email or SMS alerting in place, but the load alert threshold is often set unreasonably. I recommend setting the critical value (if you use a tool such as Nagios you will know what this means) to the number of physical CPUs. You can of course set it lower, but setting it higher makes the alert of little use. For example, if a database server has 4 CPUs, an alert should fire when the load goes above 4. Setting a higher ratio is not advisable, because someone still has to receive the alert and respond to it...
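The threshold rule above can be written as a small check. The following is a minimal sketch under stated assumptions: the return values follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL), the critical threshold equals the CPU count as recommended above, and the warning threshold is a caller-chosen fraction of it (warn_ratio is my own illustrative parameter, not something the recommendation specifies). check_load is a hypothetical name.

```c
/* Exit states in the usual Nagios plugin convention. */
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2 };

/* Hypothetical check: critical when the load average exceeds the
 * number of CPUs, warning when it exceeds warn_ratio times that
 * number (warn_ratio is an illustrative assumption, e.g. 0.75). */
static int check_load(double load_avg, long ncpu, double warn_ratio)
{
    double critical = (double)ncpu;
    double warning = critical * warn_ratio;

    if (load_avg > critical)
        return STATE_CRITICAL;
    if (load_avg > warning)
        return STATE_WARNING;
    return STATE_OK;
}
```

For the 4-CPU database server in the example, check_load(4.5, 4, 0.75) reports STATE_CRITICAL, while a load of 1.0 reports STATE_OK.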
Misunderstanding 1: A high system load always means a performance problem.
Truth: A high system load may simply be the result of CPU-intensive computation (such as compilation).
Misunderstanding 2: A high system load must mean insufficient CPU capacity or too few CPUs.
Truth: A high load only indicates that too many tasks are queued to run. The tasks in the queue may actually be consuming CPU, waiting on I/O, or held up by other factors.
Misunderstanding 3: If the system load stays high for a long time, adding CPUs should be the first step.
Truth: Load is a symptom, not the underlying cause. Adding CPUs may temporarily lower the system load in some cases, but that treats the symptom rather than the root cause.
Load looks like a small topic, but there is a lot behind it. The English-language material is actually quite comprehensive; I have tried to add a little new information in this article. If anything here is wrong or you disagree with something, please let me know.
--EOF--
FAQ
1: The database server's CPU suddenly becomes 100% busy. What should I do?
In general, this is caused by bad SQL. I recommend capturing the slow query log and looking for SQL statements with high I/O overhead (focusing on full table scans). As a rule of thumb, each CPU core can process 100-400 MB of data per second. With a large number of concurrent I/O operations, the CPU can be "maxed out" even though the storage throughput is not that high.
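The rule of thumb above allows a rough estimate of how much CPU time a scan will cost. The following is a back-of-the-envelope sketch only; scan_cpu_seconds is a hypothetical helper, and the 100 and 400 MB per second bounds are the experience figures quoted above, not measurements.

```c
/* Hypothetical estimate: CPU core-seconds needed to process
 * scanned_mb megabytes at mb_per_core_sec megabytes per second
 * per core. Returns -1.0 for a non-positive rate. */
static double scan_cpu_seconds(double scanned_mb, double mb_per_core_sec)
{
    if (mb_per_core_sec <= 0.0)
        return -1.0;
    return scanned_mb / mb_per_core_sec;
}
```

Under these assumptions, a full scan of a 2048 MB table costs about 5 core-seconds at 400 MB per second and about 20 core-seconds at 100 MB per second. A handful of such queries arriving every second is enough to keep several cores fully busy.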