Common commands for Linux system monitoring

Source: Internet
Author: User

Original article: http://blog.sina.com.cn/s/blog_68f1c17001016uvy.html

Linux provides many tools for monitoring the system. These tools help locate the bottlenecks that reduce system performance. Slow system response is not always caused by a slow or overloaded CPU. It can also be caused by slow disks, too little installed memory, network congestion, or other slow system components.

I. top - Process activity monitoring

The most widely used system performance monitoring tool is top. Once started, top refreshes its screen every few seconds (five in the version described here; the default interval varies between implementations) and shows the system status in real time.

The output of top is shown as follows:

The top line shows the system name and summary information. The rest of the output of the top command is divided into three main parts: CPU, memory, and processes.

The CPU area displays the following information:

The following information is displayed in the memory area:

Process data consists of many columns. The rows are sorted in descending order of CPU utilization, so the processes using the most CPU appear at the top.

The process area displays the following information:

  • CPU: number of the CPU on which the process is executing
  • TTY: terminal of the process
  • PID: process ID
  • USERNAME: owner of the process
  • PRI: process priority
  • NI: nice value
  • SIZE: total size of the process in memory
  • RES: resident size of the process (an approximate value)
  • STATE: current state of the process
  • TIME: CPU time consumed by the process
  • %WCPU: weighted CPU usage percentage of the process
  • %CPU: raw CPU usage percentage of the process
  • COMMAND: command used to start the process

To exit top, press q.

II. vmstat - Collect system activity, hardware, and system information

The vmstat command displays statistics about virtual memory. It reports process and paging activity, including page faults, and can also be used to view CPU and disk I/O information.

A. Fields under the procs subheading:

· r: number of processes in the run queue

· b: number of processes blocked on resources (possibly waiting for I/O or memory)

· w: number of runnable processes that have been swapped out of main memory (due to memory shortage)

Note that the w field counts swapped-out processes. If the value in this column is not 0, the system has a memory problem, typically a shortage.

B. Fields under the memory subheading:
· avm: active virtual memory, that is, the memory pages allocated to processes
· free: the number of memory pages actually available

C. Fields under the page subheading:

· re: page reclaims; a large number indicates insufficient memory

· at: address translation faults

· pi: pages paged in

· po: pages paged out

· fr: pages freed per second

· de: anticipated short-term memory shortfall

· sr: pages scanned per second (scan rate) by the page daemon while it looks for free memory

Of these fields, the most important are pi, po, de, and sr. Page-in activity under pi while a program is starting is normal. However, if a process is still paging in from disk after it has started, that is a bad sign. po shows the system paging processes out to make room in memory for other processes; activity here is also a bad sign. A de value other than 0 indicates a serious problem: the system anticipates a memory shortage.

D. Fields under the faults subheading (trap and interrupt rates over the last 5 seconds):

· in: device interrupts per second

· sy: system calls per second

· cs: CPU context switches per second

E. Fields under the cpu subheading:

· us: user time for normal and low-priority processes

· sy: system time

· id: idle time

You can give vmstat an interval and a count. For example, vmstat 5 3 collects data once every five seconds and prints three reports in total.
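The warning signs described in this section (nonzero w, po, or de) can be checked mechanically. The following Python sketch is illustrative only: the function name check_vmstat_sample and the sample numbers are hypothetical, and the column order assumes the vmstat variant described here. vmstat on other systems prints different columns, in which case the absent checks are simply skipped.

```python
def check_vmstat_sample(header_line, value_line):
    """Return warnings for one vmstat sample, using the rules from this section.

    Columns are matched by position because some names (such as sy) appear
    twice, once under faults and once under cpu.
    """
    names = header_line.split()
    values = [int(v) for v in value_line.split()]
    if len(names) != len(values):
        raise ValueError("header and value line have different column counts")

    def first(name):
        # Value of the first column with this name, or None if it is absent.
        return values[names.index(name)] if name in names else None

    warnings = []
    if first("w"):
        warnings.append("w is nonzero: runnable processes are swapped out")
    if first("po"):
        warnings.append("po is nonzero: pages are being paged out")
    if first("de"):
        warnings.append("de is nonzero: a memory shortfall is anticipated")
    return warnings


# Hypothetical sample in the column order described in this section.
HEADER = "r b w avm free re at pi po fr de sr in sy cs us sy id"
SAMPLE = "1 0 0 51234 8192 0 0 2 0 0 0 0 110 540 60 3 2 95"

for line in check_vmstat_sample(HEADER, SAMPLE) or ["no memory pressure signs"]:
    print(line)
```

In practice the header and value lines would come from real vmstat output rather than from hard-coded strings.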

III. uptime - How long has the system been running?

The simplest command for checking system load is uptime, which is usually used to see how long the machine has been running.

It provides three pieces of information. First, it shows how long the server has been running. If the server has been up only a short time and you did not schedule a restart recently, the server may have a fault that caused it to reboot on its own. Next comes the number of users. Because application and database users do not log in to the server through the operating system, this number does not reflect how many people are actually using the server, but unusually large or small values still deserve attention. Finally, it shows the system load averages. In this example, the load average over the past minute is 0.04, over the past five minutes 0.11, and over the past 15 minutes 0.14.
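As an illustration of reading the load averages programmatically, the sketch below extracts the three values from a line of uptime output. The function name parse_load_averages and the sample line are hypothetical; only the three load values are taken from the text.

```python
import re


def parse_load_averages(uptime_output):
    """Extract the 1-, 5-, and 15-minute load averages from uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_output
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_output)
    return tuple(float(value) for value in match.groups())


# Hypothetical uptime line; only the three load values come from the text.
sample = " 10:15:02 up 12 days,  3:41,  2 users,  load average: 0.04, 0.11, 0.14"
one, five, fifteen = parse_load_averages(sample)
print(one, five, fifteen)
```

On Linux and other Unix systems, the Python standard library function os.getloadavg() returns the same three values without any parsing.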

IV. w - Find out which users are accessing the system and what they are doing

The w command displays the users currently logged in to the machine and their processes.

V. ps - Display process information

Use the ps command to list processes. Without options, ps lists only the processes associated with the terminal from which the command was invoked. The typical output of this command is as follows:

To list all processes, use the ps command with the -ef option.

VI. iostat - CPU average load and disk activity statistics

The iostat command can be used to monitor the I/O behavior of disk drives:

  • device: the disk device being reported on
  • bps: kilobytes transferred per second
  • sps: number of seeks per second
  • msps: average milliseconds per seek

For example, iostat 5 3 prints I/O statistics every 5 seconds, three times in total.

VII. sar - Collect and report system activity

You can use sar (System Activity Reporter) to check disk I/O.

Given an interval of 3 seconds and a count of 5, sar reports disk I/O statistics every 3 seconds, five times in total. The buffer activity report contains the following fields:

  • bread/s: reads per second from disk into the buffer cache
  • lread/s: reads per second from the buffer cache
  • %rcache: buffer cache hit rate for read operations
  • bwrit/s: writes per second from the buffer cache to disk
  • lwrit/s: writes per second to the buffer cache
  • %wcache: buffer cache hit rate for write operations
  • pread/s: reads per second from raw devices
  • pwrit/s: writes per second to raw devices

Two questions help when reading the per-disk figures: is the %busy value of a disk often greater than 50, and is the disk's avwait greater than its avserv? (Because disk performance depends on the balance between physical and logical I/O and on the buffer cache, paging and swap space, and asynchronous read/write settings, it is difficult to identify a disk bottleneck from a single factor. The 50% figure is only a rough guideline and should be weighed against the specific situation. Sometimes a disk is already a bottleneck at a %busy of only 20, while at other times a disk works normally at a %busy of 80.)
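The two rough checks above can be written down as a small helper. This is a minimal sketch, not a diagnostic tool: the function name disk_warning_signs and the sample figures are hypothetical, and the 50% threshold is only the rule of thumb from the text, which is why it is left adjustable.

```python
def disk_warning_signs(busy_pct, avwait, avserv, busy_threshold=50.0):
    """List which of the two rough disk checks from the text are triggered."""
    signs = []
    if busy_pct > busy_threshold:
        signs.append("%%busy is above %g" % busy_threshold)
    if avwait > avserv:
        signs.append("avwait is greater than avserv")
    return signs


# Hypothetical figures for two disks: (%busy, avwait, avserv).
print(disk_warning_signs(20.0, 1.0, 5.0))
print(disk_warning_signs(80.0, 12.0, 6.0))
```

Because the threshold is only a guideline, a triggered sign is a reason to look closer, not proof of a bottleneck.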

You can also use the sar command to monitor CPU load. The -u option of sar displays CPU statistics: the output shows how CPU time is divided among the user, system, waiting-for-I/O, and idle states. For example, sar -u 3 5 reports CPU statistics every 3 seconds, five times in total.

CPU utilization is reported as percentages: %sys for system processes, %usr for user processes, and %idle for idle time. In addition, %wio shows how much time is spent waiting for disk I/O. If the CPU is mostly idle, nothing needs to be done. However, if %idle stays below 5 for a long time, CPU utilization is very high and the CPU may be a bottleneck, so further analysis is needed.

In general, system processes should not take a large share of the CPU; the CPU should mostly serve user processes. A reasonable split is 20% to 30% for system processes and 70% to 80% for user processes.

If %usr stays above 80 for a long period, CPU resources are almost entirely consumed by user processes and the CPU is a clear bottleneck.

If %usr is below 80, any bottleneck may lie somewhere among the CPU, memory, and I/O.

If %wio is greater than 15, it is a sign that the disk is a bottleneck.
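The thresholds above can be collected into one function. The sketch below is illustrative: the name cpu_warning_signs and the sample values are hypothetical, and because the text speaks of values sustained over a long period, the inputs are assumed to be averages over several sar samples rather than a single reading.

```python
def cpu_warning_signs(usr_pct, sys_pct, wio_pct, idle_pct):
    """Apply the rule-of-thumb thresholds from the text to averaged sar -u figures."""
    signs = []
    if idle_pct < 5:
        signs.append("%idle below 5: CPU utilization is very high")
    if usr_pct > 80:
        signs.append("%usr above 80: user processes dominate the CPU")
    if sys_pct > 30:
        signs.append("%sys above 30: system share exceeds the suggested range")
    if wio_pct > 15:
        signs.append("%wio above 15: the disk may be a bottleneck")
    return signs


# Hypothetical averaged figures: (%usr, %sys, %wio, %idle).
for sample in [(60, 25, 5, 10), (85, 10, 3, 2), (40, 20, 30, 10)]:
    print(sample, cpu_warning_signs(*sample))
```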

VIII. netstat - Network status statistics

The netstat command monitors network behavior such as received and sent traffic, protocol usage, and the IP addresses assigned to network interfaces.

netstat -i prints a status report for the network interfaces.

netstat -in displays IP addresses rather than host names in the Address column.


Which commands are commonly used for Linux system monitoring?

ps and top.
 
How is the output of top, a common Linux analysis tool, interpreted?

The top command is a common performance analysis tool in Linux. It displays the resource usage of each process in the system in real time, similar to the Windows Task Manager. The following describes how to read its output.

top - 01:06:48 up, 1 user, load average: 0.06, 0.60, 0.48

Tasks: 29 total, 1 running, 28 sleeping, 0 stopped, 0 zombie

Cpu(s): 0.3% us, 1.0% sy, 0.0% ni, 98.7% id, 0.0% wa, 0.0% hi, 0.0% si

Mem: 191272k total, 173656k used, 17616k free, 22052k buffers

Swap: 192772k total, 0k used, 192772k free, 123988k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

1379 root 16 0 7976 2456 S 1980 0.7. 03 sshd

14704 root 16 0 2128 980 R 796 0.7. 72 top

1 root 16 0 1992 632 S 544 0.0. 90 init

2 root 34 19 0 0 S 0.0 0.0. 00 ksoftirqd/0

3 root RT 0 0 0 S 0.0 0.0. 00 watchdog/0

The first five lines form the summary area, which holds overall statistics for the system. The first line shows task queue information, the same as the output of the uptime command. Its contents are as follows:

01:06:48: current time

up: system uptime, in hours:minutes format

1 user: number of users currently logged in

load average: 0.06, 0.60, 0.48: system load, that is, the average length of the task queue.

The three values are averages over the past 1 minute, 5 minutes, and 15 minutes.

The second and third lines show process and CPU information. On a system with multiple CPUs, this part may span more than two lines. Its contents are as follows:

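Tasks: 29 total: total number of processes

1 running: number of running processes

28 sleeping: number of sleeping processes

0 stopped: number of stopped processes

0 zombie: number of zombie processes

Cpu(s): 0.3% us: percentage of CPU used in user space

1.0% sy: percentage of CPU used in kernel space

0.0% ni: percentage of CPU used by user-space processes whose priority has been changed with nice

98.7% id: percentage of CPU idle

0.0% wa: percentage of CPU time spent waiting for I/O

0.0% hi: percentage of CPU time spent servicing hardware interrupts

0.0% si: percentage of CPU time spent servicing software interrupts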
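To work with these figures in a script, the Cpu(s) line can be split into named percentages. The sketch below is a minimal example that assumes the older top layout shown in the sample output; the function name parse_cpu_line is hypothetical, and other top versions may format this line differently.

```python
import re


def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' summary line into {state: percentage}."""
    _, _, rest = line.partition(":")
    fields = re.findall(r"([\d.]+)\s*%?\s*([a-z]+)", rest)
    return {state: float(value) for value, state in fields}


# The Cpu(s) line from the sample output in this article.
CPU_LINE = "Cpu(s): 0.3% us, 1.0% sy, 0.0% ni, 98.7% id, 0.0% wa, 0.0% hi, 0.0% si"

cpu = parse_cpu_line(CPU_LINE)
busy = round(100.0 - cpu["id"], 1)  # everything that is not idle time
print("idle:", cpu["id"], "busy:", busy)
```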

The last two lines show memory information. Their contents are as follows:

Mem: 191272k total: total physical memory

173656k used: physical memory in use

17616k free: free physical memory

22052k buffers: memory used for kernel buffers

Swap: 192772k total: total swap space

0k used: swap space in use

192772k free: free swap space

123988k cached: memory used as page cache.

Although this value is printed on the Swap line, in this version of top it is the Cached figure from /proc/meminfo, that is, file data held in memory.

It does not measure an amount of swap space.
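The Mem and Swap lines can be parsed the same way to compute simple ratios. This is a sketch under the assumption that the lines follow the older top layout from the sample output; the function name parse_kb_fields is hypothetical.

```python
import re


def parse_kb_fields(line):
    """Parse a top 'Mem:' or 'Swap:' line into {label: kilobytes}."""
    _, _, rest = line.partition(":")
    pairs = re.findall(r"(\d+)\s*k\s+(\w+)", rest)
    return {label: int(kb) for kb, label in pairs}


# The Mem and Swap lines from the sample output in this article.
MEM_LINE = "Mem: 191272k total, 173656k used, 17616k free, 22052k buffers"
SWAP_LINE = "Swap: 192772k total, 0k used, 192772k free, 123988k cached"

mem = parse_kb_fields(MEM_LINE)
swap = parse_kb_fields(SWAP_LINE)
used_pct = 100.0 * mem["used"] / mem["total"]
print("physical memory used: %.1f%%" % used_pct)
print("swap in use:", swap["used"] > 0)
```

In the sample, used plus free equals total, so the used figure includes memory held by buffers. A high used percentage on its own therefore does not prove a memory shortage; the swap figures and the vmstat paging fields give a better indication.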
