Explanation of top and load in CentOS

Last Update:2014-12-25 Source: Internet

Author: User


The top command is a common performance analysis tool in Linux. It can display the resource usage of various processes in the system in real time, similar to the Windows Task Manager. The following describes how to use it.
top - 01:06:48 up 1:22, 1 user, load average: 0.06, 0.60, 0.48
Tasks: 29 total, 1 running, 28 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3% us, 1.0% sy, 0.0% ni, 98.7% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 191272k total, 173656k used, 17616k free, 22052k buffers
Swap: 192772k total, 0k used, 192772k free, 123988k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 1379 root  16   0  7976 2456 1980 S  0.7  1.3 0:11.03 sshd
14704 root  16   0  2128  980  796 R  0.7  0.5 0:02.72 top
    1 root  16   0  1992  632  544 S  0.0  0.3 0:00.90 init
    2 root  34  19     0    0    0 S  0.0  0.0 0:00.00 ksoftirqd/0
    3 root  RT   0     0    0    0 S  0.0  0.0 0:00.00 watchdog/0
The first five lines form the statistics area, which summarizes the system as a whole. The first line is the task queue information, the same as the output of the uptime command. Its fields are:
01:06:48                        current time
up 1:22                         system uptime, in hours:minutes
1 user                          number of users currently logged in
load average: 0.06, 0.60, 0.48  system load, that is, the average length of the task queue. The three values are the averages over the last 1 minute, 5 minutes, and 15 minutes.
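
The three load values in this first line can also be picked out with a short script. The following is a minimal sketch and not part of top itself: load_averages is a hypothetical helper name, and it assumes an uptime-style line that ends with "load average:" (or "load averages:", as some systems print) followed by the three values.

```shell
# load_averages: read one uptime-style line on stdin and print the
# 1-, 5- and 15-minute load averages separated by single spaces.
load_averages() {
    # Drop everything up to and including the "load average(s):" label,
    # then turn commas into spaces and squeeze repeated spaces.
    sed -n 's/.*load averages\{0,1\}:[[:space:]]*//p' | tr ',' ' ' | tr -s ' '
}
```

For example, `uptime | load_averages` should print the three values on one line, ready to be split by a script.
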
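The second and third lines show process and CPU information. On a system with multiple CPUs, this may take more than two lines. The fields are:
Tasks: 29 total   total number of processes
1 running         number of running processes
28 sleeping       number of sleeping processes
0 stopped         number of stopped processes
0 zombie          number of zombie processes
Cpu(s): 0.3% us   percentage of CPU time spent in user space
1.0% sy           percentage of CPU time spent in kernel space
0.0% ni           percentage of CPU time spent on user processes whose priority has been changed (niced)
98.7% id          percentage of CPU time spent idle
0.0% wa           percentage of CPU time spent waiting for input/output
0.0% hi           percentage of CPU time spent servicing hardware interrupts
0.0% si           percentage of CPU time spent servicing software interrupts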
The last two lines show memory information. The fields are:
Mem: 191272k total   total physical memory
173656k used         physical memory in use
17616k free          free physical memory
22052k buffers       memory used for kernel buffers
Swap: 192772k total  total swap space
0k used              swap space in use
192772k free         free swap space
123988k cached       memory used as page cache for file data. top prints this value on the Swap line, but it describes physical memory (the Cached figure in /proc/meminfo), not swap space whose contents have been read back into memory.
The process information area, below the statistics area, shows detailed information about each process. First, the meaning of each column:
Key  Column   Meaning
a    PID      process ID
b    PPID     parent process ID
c    RUSER    real user name
d    UID      user ID of the process owner
e    USER     user name of the process owner
f    GROUP    group name of the process owner
g    TTY      name of the terminal the process was started from; processes not started from a terminal show ?
h    PR       priority
i    NI       nice value; a negative value means higher priority, a positive value lower priority
j    P        last CPU used; only meaningful in a multi-CPU environment
k    %CPU     percentage of CPU time used since the last update
l    TIME     total CPU time used by the process, in seconds
m    TIME+    total CPU time used by the process, in 1/100 seconds
n    %MEM     percentage of physical memory used by the process
o    VIRT     total virtual memory used by the process, in kb; VIRT = SWAP + RES
p    SWAP     part of the process's virtual memory that is swapped out, in kb
q    RES      physical memory used by the process that is not swapped out, in kb; RES = CODE + DATA
r    CODE     physical memory occupied by executable code, in kb
s    DATA     physical memory occupied by everything other than executable code (data segment + stack), in kb
t    SHR      shared memory, in kb
u    nFLT     number of page faults
v    nDRT     number of pages modified since they were last written out
w    S        process status: D = uninterruptible sleep, R = running, S = sleeping, T = traced/stopped, Z = zombie
x    COMMAND  command name or command line
y    WCHAN    if the process is sleeping, the name of the system function it is sleeping in
z    Flags    task flags; for details, refer to sched.h
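
The status letters D, R, S, T, and Z are the same ones that the Tasks line summarizes. As a rough sketch of that tally outside top, the hypothetical function count_states below counts state letters read from standard input. It assumes one state string per line, such as the procps command `ps -e -o stat=` prints; any other state letters a newer kernel may report are simply not listed.

```shell
# count_states: read one process state per line on stdin (for example from
# `ps -e -o stat=`) and print a count for each of the five states above.
count_states() {
    awk '
        # The first character is the state; ps may append modifier characters.
        { n[substr($1, 1, 1)]++ }
        END {
            split("D R S T Z", order, " ")
            for (i = 1; i <= 5; i++)
                printf "%s=%d\n", order[i], n[order[i]]
        }'
}
```

A typical call would be `ps -e -o stat= | count_states`.
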
By default, only the more important columns are displayed: PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, and COMMAND. You can change what is displayed with the following shortcut keys.
Press f to choose which columns are displayed. A list of columns appears; press a-z to show or hide the corresponding column, then press Enter to confirm.
Press o to change the order of the columns. A lowercase a-z moves the corresponding column to the right, and an uppercase A-Z moves it to the left. Press Enter to confirm.
Press uppercase F or O, then a-z, to sort processes by the corresponding column. Uppercase R reverses the current sort order.
Command usage
1. Tool (command) name
top
2. Function of the tool (command)
Displays the processes currently running on the system along with other system status. top is a dynamic display: the user can refresh the current state by pressing keys. If it is run in the foreground, it occupies the foreground until the user terminates it. More precisely, top provides real-time monitoring of the system's processors. It lists the most CPU-intensive tasks on the system, and it can sort tasks by CPU usage, memory usage, or execution time. Many of its features can be set through interactive commands or in a personal configuration file.
3. Environment
Linux.
4. Usage
4.1 Format
top [-] [d] [p] [q] [c] [C] [S] [s] [n]
4.2 Parameter description
d  Specifies the interval between two screen refreshes. You can also change it with the s interactive command.
p  Monitors only the process with the given process ID.
q  Makes top refresh without any delay. If the caller has superuser permission, top runs at the highest possible priority.
S  Enables cumulative mode.
s  Runs top in secure mode, which removes the potential danger of interactive commands.
i  Makes top not display any idle or zombie processes.
c  Displays the entire command line, not just the command name.
4.3 Others
The following interactive commands can be used while top is running. In practice, mastering these commands matters more than mastering the options. Each command is a single letter. If the s option was given on the command line, some of them may be disabled.
Ctrl+L  Erases and redraws the screen.
h or ?  Displays the help screen with a brief summary of commands.
k  Terminates a process. top prompts for the PID of the process to terminate and the signal to send to it. Normally signal 15 is used to terminate a process. If the process does not end normally, use signal 9 to force it to end. The default is signal 15. This command is disabled in secure mode.
i  Ignores idle and zombie processes. This is a toggle.
q  Quits the program.
r  Resets the priority of a process. top prompts for the PID of the process to change and the priority value to set. A positive value lowers the priority, and a negative value raises it. The default value is 10.
S  Toggles cumulative mode.
s  Changes the delay between two refreshes. top prompts for a new time in seconds. A fractional value is converted to milliseconds. Entering 0 makes top refresh continuously. The default is 5 s. Note that if the value is too small, the display refreshes too fast to read and the system load rises considerably.
f or F  Adds or removes fields from the current display.
o or O  Changes the order of the displayed fields.
l  Toggles the display of the load average and uptime information.
m  Toggles the display of the memory information.
t  Toggles the display of the process and CPU status information.
c  Toggles between the command name and the complete command line.
M  Sorts by resident memory size.
P  Sorts by CPU usage percentage.
T  Sorts by time/cumulative time.
W  Writes the current settings to the ~/.toprc file. This is the recommended way to write a top configuration file.
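
The P and M sort keys can be imitated in a script when an interactive session is not available. This is only a sketch under stated assumptions: top_by is a hypothetical helper, and it expects rows of the form "PID USER %CPU %MEM COMMAND" on standard input, which is roughly what `ps -e -o pid= -o user= -o pcpu= -o pmem= -o comm=` prints on Linux.

```shell
# top_by KEY [COUNT]: read rows of "PID USER %CPU %MEM COMMAND" on stdin and
# print the COUNT highest rows (default 5) by CPU (KEY=P) or memory (KEY=M).
top_by() {
    key=$1
    count=${2:-5}
    case $key in
        P) col=3 ;;   # %CPU column
        M) col=4 ;;   # %MEM column
        *) echo "usage: top_by P|M [count]" >&2; return 1 ;;
    esac
    # Numeric, descending sort on the chosen column only; the C locale
    # keeps "." as the decimal separator.
    LC_ALL=C sort -k "${col},${col}nr" | head -n "$count"
}
```

For example, `ps -e -o pid= -o user= -o pcpu= -o pmem= -o comm= | top_by M 10` lists the ten rows with the highest memory percentage.
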
-----
1. Obtain CPU usage

[root@localhost utx86]# top -n 1 | grep Cpu
Cpu(s): 1.9% us, 1.3% sy, 0.0% ni, 95.9% id, 0.6% wa, 0.1% hi, 0.2% si, 0.0% st

Explanation: 1.9% us is the percentage of CPU used by user space.

1.3% sy is the percentage of CPU used by kernel (system) space.
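
A single "busy" figure is often more convenient than the individual fields. The sketch below derives it as 100 minus the idle value. The function name cpu_busy is hypothetical, and the parsing assumes comma-separated fields with "." as the decimal separator, as in the output above; in a script, top's batch option (`top -b -n 1`) gives plain text that is easier to parse.

```shell
# cpu_busy: read top summary text on stdin and print the busy percentage of
# the Cpu(s) line, computed as 100 minus the idle ("id") value.
cpu_busy() {
    awk '/Cpu\(s\)/ {
        sub(/^[^:]*:/, "")                   # drop the label before the colon
        n = split($0, part, ",")             # one chunk per field
        for (i = 1; i <= n; i++)
            if (part[i] ~ /id[ \t]*$/) {     # the idle field
                gsub(/[^0-9.]/, "", part[i]) # keep only the number
                printf "%.1f\n", 100 - part[i]
            }
    }'
}
```

For the sample line above, `top -b -n 1 | cpu_busy` would print 4.1, that is, 100 minus 95.9.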

2. Obtain memory usage

[root@localhost utx86]# top -n 1 | grep Mem
Mem: 2066240k total, 1515784k used, 550456k free, 195336k buffers
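
The Mem line can be reduced to a percentage in the same way. This is a minimal sketch: mem_used_pct is a hypothetical name, and it assumes the line carries comma-separated "total" and "used" fields as shown above. On this version of top the used figure includes buffers and cache, so a high percentage is not necessarily a problem.

```shell
# mem_used_pct: read top summary text on stdin and print used physical
# memory as a percentage of total, taken from the Mem line.
mem_used_pct() {
    awk '/Mem[ \t]*:/ {
        sub(/^[^:]*:/, "")                   # drop the label before the colon
        n = split($0, part, ",")
        for (i = 1; i <= n; i++) {
            value = part[i]
            gsub(/[^0-9.]/, "", value)       # keep only the number
            if (part[i] ~ /total[ \t]*$/) total = value + 0
            if (part[i] ~ /used[ \t]*$/)  used = value + 0
        }
        if (total > 0) printf "%.1f\n", used * 100 / total
    }'
}
```

For the sample line above this gives 73.4, since 1515784 / 2066240 is about 0.734.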

You may run into many questions when learning the Linux operating system. This part explains the load average in Linux so that you can understand it fully. The load average is shown by the uptime and top commands, and looks like this:

load average: 0.09, 0.05, 0.01

Many people understand the load average as follows: the three numbers are the average system load over different periods (one minute, five minutes, and fifteen minutes), and the smaller the number, the better. A higher number means a more heavily loaded server, which may be a sign of a problem on the server.

This is not exactly the case. What makes up the load average, and how can we tell whether the current value is "good" or "bad"? When should an abnormal value worry us?

Before answering these questions, you need to understand what lies behind these values. The simplest case to describe is a server with a single single-core processor.

Crossing the bridge

A single-core processor is like a single-lane road. Imagine that you collect the toll for a bridge on this road and are busy handling the vehicles that want to cross. First you need some information, such as the traffic load and how many vehicles are waiting to cross. If no vehicles are waiting, you can tell drivers to go straight across. If many vehicles are waiting, you need to tell them that it may take a while.

So you need some specific numbers to describe the current traffic conditions, such as:

0.00 means there is no traffic on the bridge at all. In fact, any value between 0.00 and 1.00 describes the same situation: traffic flows freely, and arriving vehicles cross without waiting.

1.00 means the bridge is exactly at capacity. This is not bad yet, but there is no room left, and any additional traffic will make things slower and slower.

Above 1.00, the bridge is over capacity and traffic is seriously congested. How bad is it? 2.00 means that traffic is twice what the bridge can carry, so as many vehicles are waiting as are on the bridge. 3.00 is even worse: the bridge is full, and twice its capacity is waiting.

This is very similar to processor load. The time a car spends on the bridge is like the time the processor actually spends running a thread. Unix calls this measure the run-queue length: the number of processes currently running on the processor cores plus the number waiting in the queue to run.

Like the toll collector, you certainly hope your cars never have to wait. Ideally, then, the load average stays below 1.00. Occasional peaks above 1.00 are nothing unusual, but if the load stays there for a long time, something is wrong and you should start to worry.

"So you are saying the ideal load is 1.00?"

Well, not exactly. A load of 1.00 means the system has no resources to spare. In practice, experienced system administrators draw the line at 0.70:

"Needs investigation" rule: 0.70. If your system load stays around 0.70 for a long time, take the time to find out why before things get worse.

"Fix it now" rule: 1.00. If your server load lingers at 1.00 for a long time, solve the problem immediately. Otherwise you will get a call from your boss in the middle of the night, which is not pleasant.

"Exercise at half past three a.m." rule: 5.00. If your server load exceeds 5.00, you will lose sleep, and you will have to explain the cause in a meeting. In short, never let it happen.

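The rules of thumb above can be written down as a small shell function. This is only an illustration: load_status and its four labels are hypothetical names, the cut-offs 0.70, 1.00, and 5.00 are the ones quoted above, and treating each cut-off as inclusive is an assumption. The value passed in is the load of a single-core system.

```shell
# load_status LOAD: print a rule-of-thumb label for a single-core load value.
load_status() {
    awk -v load="$1" 'BEGIN {
        l = load + 0                          # force a numeric comparison
        if      (l >= 5.00) print "emergency"
        else if (l >= 1.00) print "fix now"
        else if (l >= 0.70) print "investigate"
        else                print "ok"
    }'
}
```

For example, `load_status 0.65` prints ok, and `load_status 1.20` prints fix now.
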
So what about multiple processors? My load average is 3.00, but the system is running normally!

Ah, you have a four-processor host? Then a load average of 3.00 is perfectly normal.

On a multi-processor system, the load average has to be read relative to the number of cores. Full load, that is 100% utilization, is 1.00 on a single processor, 2.00 on two processors, and 4.00 on a host with four processors.

Back to the bridge metaphor. A load of 1.00 is what I called a "one-lane road": on a single lane, 1.00 means the bridge is full of cars. On a dual-processor system the capacity is doubled, so at 1.00 there are still 50% of the system's resources left, because another lane is available.

Therefore, where a single processor is fully loaded at 1.00, a dual-processor system is fully loaded at 2.00, because it has twice the resources available.

Multi-core and multi-processor

Let's look at the difference between a multi-core processor and multiple processors. In terms of performance, a machine with one multi-core processor can be considered roughly equal to a machine with the same number of single-core processors. The reality is more complicated, of course: cache sizes, processor frequency, and other factors can cause performance differences.

However, even though these factors make actual performance differ slightly, the system still calculates the load average based on the number of processor cores. This gives us two new rules:

"Number of cores = maximum load" rule: on a multi-core system, your load average should not be higher than the total number of processor cores.

"Cores are cores" rule: how the cores are distributed across physical processors does not matter. Two quad-core processors are equal to four dual-core processors, which are equal to eight single-core processors. In every case there are eight processor cores.

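Since the load average is judged against the number of cores, dividing one by the other gives a figure that can be read like the single-core values discussed earlier. A minimal sketch, with per_core_load as a hypothetical helper name:

```shell
# per_core_load LOAD CORES: print LOAD divided by CORES with two decimals.
per_core_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (cores + 0 < 1) {
            print "cores must be at least 1" > "/dev/stderr"
            exit 1
        }
        printf "%.2f\n", load / cores
    }'
}
```

With the numbers from the question above, `per_core_load 3.00 4` prints 0.75, which is below full load. On Linux, the live values could be supplied as `per_core_load "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"`, assuming the coreutils nproc command is installed.
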
Examining our own system

Let's take a look at the uptime output.

~ $ uptime

up 14 days, 7 users, load averages: 0.65 0.42 0.36

This is a dual-core processor, and the output shows that plenty of resources are idle. In practice, even when its load has peaked at 1.7, I have never had to worry about it.

So why are there three numbers? That can be confusing. We know that 0.65, 0.42, and 0.36 are the average system load over the last minute, the last five minutes, and the last fifteen minutes. This raises another question:

Which number should we use? One minute? Five minutes? Or fifteen minutes?

We have already said a lot about these numbers. I think you should focus on the five-minute or fifteen-minute average. Frankly, if the load over the last minute is 1.00, the server is still fine. However, if the fifteen-minute value stays at 1.00, it deserves attention (in my experience, this is the point at which you should add processors).

So how do I know how many processor cores my system has?

In Linux, you can use

cat /proc/cpuinfo

to obtain information about each processor on your system. If you only want the number, use the following command:

grep 'model name' /proc/cpuinfo | wc -l
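
The count can also be taken from other fields of the same file. The two functions below are a sketch with hypothetical names: count_cpus counts "processor" entries, which are logical CPUs and so include hyper-threads where enabled, and count_sockets counts distinct "physical id" values. Both assume the field names used on x86 Linux; other architectures may lay out /proc/cpuinfo differently.

```shell
# count_cpus: read /proc/cpuinfo text on stdin and print the number of
# logical CPUs, that is, the number of "processor : N" entries.
count_cpus() {
    grep -c '^processor[[:space:]]*:'
}

# count_sockets: read /proc/cpuinfo text on stdin and print the number of
# distinct "physical id" values, that is, physical processor packages.
count_sockets() {
    awk -F: '/^physical id/ { seen[$2] = 1 }
             END { n = 0; for (k in seen) n++; print n }'
}
```

They would be called as `count_cpus < /proc/cpuinfo` and `count_sockets < /proc/cpuinfo`.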


The above is the content of the Linux server Load average.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
