Linux system monitoring and diagnostic tools

Source: Internet
Author: User

Most Linux users are familiar with the top command (other systems may provide an equivalent under a different name, such as topas on IBM AIX). It is mainly used to monitor the system's real-time load, the resource usage of each process, and other aspects of system state. Let's walk through top's output. (1) System and task statistics: the first five lines are the overall statistics of the system. The first line is the task queue information, which is the same as the output of the uptime command. Its contents are as follows:

01:06:48: current time
up: how long the system has been running, in hours:minutes
1 user: number of users currently logged in
load average: 0.06, 0.60, 0.48: system load, that is, the average length of the task queue. The three values are the averages over the last 1, 5, and 15 minutes. Note: these three values can be used to judge whether the system load is too high. If they stay above the number of CPUs in the system, you need to optimize your program or architecture.
(2) Process and CPU statistics: lines 2 and 3 show process and CPU information. When there are multiple CPUs, this information may take more than two lines. The contents are:
Tasks: 29 total: total number of processes
1 running: number of running processes
28 sleeping: number of sleeping processes
0 stopped: number of stopped processes
0 zombie: number of zombie processes
Cpu(s): 0.3% us: percentage of CPU time spent in user space
1.0% sy: percentage of CPU time spent in kernel space
0.0% ni: percentage of CPU time spent in user-space processes whose priority (nice value) has been changed
98.7% id: percentage of CPU time spent idle
0.0% wa: percentage of CPU time spent waiting for input/output
0.0% hi: percentage of CPU time spent servicing hardware interrupts (hardware IRQs)
0.0% si: percentage of CPU time spent servicing software interrupts (software IRQs)
Notes:
(1) IRQ stands for Interrupt Request.
(2) st (steal time): Steal time is the percentage of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor. It is only relevant in virtualized environments. It represents time when the real CPU was not available to the current virtual machine: it was "stolen" from that VM by the hypervisor (either to run another VM, or for its own needs). So, relatively speaking, what does this mean? A high steal percentage may mean that you are outgrowing your virtual machine with your hosting company. Other virtual machines may have a larger slice of the CPU's time, and you may need to ask for an upgrade in order to compete. A high steal percentage may also mean that your hosting company is overselling virtual machines on your particular server. If you upgrade your virtual machine and your steal percentage doesn't drop, you may want to seek another provider. A low steal percentage can mean that your applications are working well with your current virtual machine. Since your VM is not constantly wrestling with other VMs for CPU time, it will be more responsive. This may also suggest that your hosting provider is underselling their servers, which is definitely a good thing.
(3) The last two lines show memory information:
Mem: 191272k total: total physical memory
173656k used: physical memory in use
17616k free: free physical memory
22052k buffers: memory used as kernel buffers
Swap: 192772k total: total swap space
0k used: swap space in use
192772k free: free swap space
123988k cached: memory used as page cache (file data cached in RAM). Although older versions of top print this value on the Swap line, it is a memory figure, which is why it is added back in the calculations below.
PS: How do you calculate available memory and used memory? Besides free -m, you can read the figures from top, for example:
Mem:    255592k total,   167568k used,    88024k free,    25068k buffers
Swap:   524280k total,        0k used,   524280k free,    85724k cached
3.1 How much memory is actually available to programs? free + (buffers + cached): 88024k + (25068k + 85724k) = 198816k
3.2 How much memory do programs actually use? used - (buffers + cached): 167568k - (25068k + 85724k) = 56776k
3.3 How can you tell whether the system is short of memory? If the swap used value is greater than 0, that is usually a sign that you have hit a memory bottleneck: either optimize your code or add memory.
(4) Process information area: detailed information about each process is shown below the statistics area. The meaning of each column is as follows:
Letter  Column   Meaning
a       PID      process ID
b       PPID     parent process ID
c       RUSER    real user name
d       UID      user ID of the process owner
e       USER     user name of the process owner
f       GROUP    group name of the process owner
g       TTY      name of the terminal the process was started from; processes not started from a terminal show ?
h       PR       priority
i       NI       nice value; a negative value means a higher priority, a positive value means a lower priority
j       P        last CPU used; meaningful only on multi-CPU systems
k       %CPU     percentage of CPU time used since the last update
l       TIME     total CPU time used by the process, in seconds
m       TIME+    total CPU time used by the process, in hundredths of a second
n       %MEM     percentage of physical memory used by the process
o       VIRT     total virtual memory used by the process, in KB; VIRT = SWAP + RES
p       SWAP     portion of the process's virtual memory that has been swapped out, in KB
q       RES      physical memory used by the process and not swapped out, in KB; RES = CODE + DATA
r       CODE     physical memory occupied by executable code, in KB
s       DATA     physical memory occupied by everything other than executable code (data segment + stack), in KB
t       SHR      shared memory size, in KB
u       nFLT     number of page faults
v       nDRT     number of pages modified since they were last written out
w       S        process state: D = uninterruptible sleep, R = running, S = sleeping, T = traced/stopped, Z = zombie
x       COMMAND  command name/command line
y       WCHAN    if the process is sleeping, the name of the kernel function it is sleeping in
z       Flags    task flags; see sched.h
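The S column comes straight from the kernel. As a minimal sketch, assuming a Linux system with /proc mounted, the Python below reads the state letter of a process from /proc/<pid>/stat, the per-process file that top parses. The helper names (state_from_stat, read_state) are hypothetical. Newer kernels can also report letters not listed in the table above (for example I for idle kernel threads), so unknown letters are not treated as errors.

```python
# State letters as described for the S column above.
STATES = {
    "D": "uninterruptible sleep",
    "R": "running",
    "S": "sleeping",
    "T": "traced or stopped",
    "Z": "zombie",
}


def state_from_stat(stat_line):
    """Return the state letter from one /proc/<pid>/stat line.

    The line looks like "pid (comm) state ppid ...". The command name may
    itself contain spaces or ')' characters, so split on the LAST ')'.
    """
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def read_state(pid):
    """Read the state letter of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return state_from_stat(f.read())


if __name__ == "__main__":
    import os

    letter = read_state(os.getpid())
    print(letter, STATES.get(letter, "other state"))
```

Splitting on the last parenthesis rather than on whitespace keeps the parser correct for processes whose names contain spaces.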
(5) By default, only the most important columns are displayed: PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, and COMMAND.
You can use the following keys to change what is displayed:
5.1 f: choose which columns are displayed. Press f to show the list of columns, press a-z to show or hide the corresponding column, then press Enter to confirm.
5.2 o: change the order of the columns. A lowercase a-z moves the corresponding column to the right, and an uppercase A-Z moves it to the left. Press Enter to confirm.
5.3 F or O: choose the sort column. Press uppercase F or O, then press a-z to sort processes by the corresponding column. Uppercase R reverses the current sort order.
(6) Common interactive commands: in practice, mastering these commands is more useful than mastering the command-line options. The commands are single letters. If the s option is given on the command line (secure mode), some of these commands may be disabled.
Ctrl+L  erase and redraw the screen.
h or ?  display the help screen, with a brief summary of the commands.
k  kill a process. top prompts for the PID of the process to terminate and the signal to send to it. Signal 15 is normally used to terminate a process; if the process does not exit normally, use signal 9 to force it to end. The default is signal 15. This command is disabled in secure mode.
i  ignore idle and zombie processes. This is a toggle.
q  quit the program.
r  change the priority (nice value) of a process. top prompts for the PID of the process and the priority value to set. Entering a positive value lowers the priority; a negative value gives the process a higher priority. The default value is 10.
S  toggle cumulative mode.
s  change the delay between two refreshes. top prompts for a new time in seconds; fractional values are accepted. Entering 0 makes top refresh continuously. The default interval depends on the version of top. Note that if the delay is set too small, the screen refreshes so often that you cannot read it, and the system load rises noticeably.
f or F  add or remove columns from the current display.
o or O  change the order of the displayed columns.
l  toggle the load average and uptime line.
m  toggle the memory information lines.
t  toggle the process and CPU status lines.
c  toggle between the command name and the full command line.
M  sort by resident memory size.
P  sort by CPU usage percentage.
T  sort by time/cumulative time.
W  write the current settings to ~/.toprc. This is the recommended way to create a top configuration file.
(7) Tips for using top
1. Press uppercase P to sort the processes by CPU usage in descending order.
2. Press uppercase M to sort the processes by memory usage in descending order.
3. Press 1 to show the load of every CPU core.
4. top -d 5 refreshes the display every 5 seconds. The default interval depends on the version of top.
5. top -p 4360,4358 monitors only the specified processes.
6. top -U johndoe: -U matches the real, effective, saved, or filesystem user name.
7. top -u 500: -u matches the effective user ID.
8. top -bn 1 prints information about all processes once in batch mode, which is suitable for piping into other commands; top -n 1 shows only one screen.
9. top -M shows the memory summary in megabytes instead of kilobytes (only some versions of top support this option).
10. top -p 25097 -n 1 -b: -b avoids control characters and garbled output when the output is piped.
11. Page through the output of top: top -bn1 | less
12. An enhanced version of top: htop, a more powerful interactive process manager.
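The rules of thumb from sections (1) and (3) can be applied to batch-mode output (tips 8 and 11). The following is a minimal Python sketch, assuming the older procps layout used in this article: Mem and Swap lines with values in k, and cached printed on the Swap line. Newer versions of top print lines such as "KiB Mem :" or "MiB Mem :" with different fields, which this parser does not handle. The function names are hypothetical, and treating the load as persistently high only when all three averages exceed the CPU count is an assumption of the sketch.

```python
import os
import re

# Sample lines assembled from the figures quoted in this article (older layout).
SAMPLE = """load average: 0.06, 0.60, 0.48
Mem:    255592k total,   167568k used,    88024k free,    25068k buffers
Swap:   524280k total,        0k used,   524280k free,    85724k cached
"""


def parse_top_header(text):
    """Return the load averages and the Mem/Swap fields (in KB) from top -b output."""
    info = {}
    m = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", text)
    if m:
        info["load"] = tuple(float(x) for x in m.groups())
    for label in ("Mem", "Swap"):
        line = re.search(rf"^{label}:(.*)$", text, re.MULTILINE)
        if line:
            # Fields look like "167568k used": map each field name to its value.
            pairs = re.findall(r"(\d+)\s*k\s+(\w+)", line.group(1))
            info[label.lower()] = {name: int(value) for value, name in pairs}
    return info


def memory_summary(info):
    """Apply free + (buffers + cached) and used - (buffers + cached)."""
    mem, swap = info["mem"], info["swap"]
    # Older top versions print the page-cache size ("cached") on the Swap line.
    cache = mem["buffers"] + swap["cached"]
    return {
        "available_kb": mem["free"] + cache,
        "program_used_kb": mem["used"] - cache,
        "swap_used": swap["used"] > 0,
    }


def load_persistently_high(info, cpus):
    """Assumption: 'persistently' means all three averages exceed the CPU count."""
    return all(avg > cpus for avg in info["load"])


if __name__ == "__main__":
    info = parse_top_header(SAMPLE)
    print(memory_summary(info))
    # {'available_kb': 198816, 'program_used_kb': 56776, 'swap_used': False}
    print(load_persistently_high(info, os.cpu_count() or 1))
```

To try it on a live system, feed it the text printed by top -bn1, provided your version of top still uses this layout.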
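The advice under the k command in (6), send signal 15 first and fall back to signal 9 only if the process does not exit, can be sketched in Python for a process that your own program started. This assumes a POSIX system; stop_process and the grace period are hypothetical choices for illustration, not part of top.

```python
import signal
import subprocess


def stop_process(proc, grace=5.0):
    """Send SIGTERM (15); if the process is still alive after `grace` seconds, send SIGKILL (9).

    Returns the return code; a negative value -N means the process was ended by signal N.
    """
    proc.send_signal(signal.SIGTERM)  # polite request: the process may clean up and exit
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        return proc.wait()


if __name__ == "__main__":
    import sys

    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))  # -15: the child was terminated by SIGTERM
```

Waiting with a timeout gives a well-behaved process the chance to exit cleanly on signal 15, while still guaranteeing that a process which ignores it is removed with signal 9.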
