Original: http://blog.sina.com.cn/s/blog_68f1c17001016uvy.html
Linux provides a number of system monitoring tools that can be used to identify the bottlenecks behind reduced system performance. A slow system is not always caused by a slow CPU; it may also be caused by a slow disk, too little installed memory, network congestion, or some other slow-responding system component.
First, top – Process Activity Monitoring
The most widely used system performance monitoring tool is top. Once started, it refreshes a full screen of information every five seconds (the default interval varies by implementation), showing the status of the system in real time.
The output of top is shown below:
The top line shows the system name and the time at which the information was collected. The rest of the output of the top command is divided into 3 main sections: CPU, memory, and process.
The CPU section displays the following information:
- The three values after "load average": the average load over the last 1 minute, the last 5 minutes, and the last 15 minutes. This information is useful for spotting abrupt changes in system load.
- The number of processes active in the system.
- The number of processes in each state. In the example, there are 254 processes, of which 220 are sleeping and 34 are running.
- The percentage of CPU time spent in each state. If the system has more than one CPU, there is one row per CPU. In the example, 26% of CPU time is idle overall, but CPU0 is idle only 0.4% of the time, so it appears to be busy.
The Memory Area section displays the following information:
- Total amount of physical memory installed
- Active physical memory
- Virtual memory
- Available virtual memory
- Total available memory
The process data consists of a number of columns, sorted in descending order of CPU utilization, with the most CPU-intensive processes at the top.
The Process Area section displays the following information:
- CPU: number of the CPU on which the process is executing
- TTY: terminal used by the process
- PID: process ID
- USERNAME: name of the owner of the process
- PRI: process priority
- NI: nice value
- SIZE: total size of the process in memory
- RES: resident size of the process, which is an approximate value
- STATE: current state of the process
- TIME: CPU time consumed by the process
- %WCPU: weighted CPU utilization percentage of the process
- %CPU: raw CPU utilization percentage of the process
- COMMAND: name of the command that started the process
To exit top, press the q key.
Second, vmstat – Collect system activity, hardware, and system information
The vmstat command reports virtual memory statistics, including information about processes and paging. It can also be used to look at CPU and disk I/O activity.
A. Fields under the procs sub-heading:
- r: the number of processes that are running
- b: the number of processes that are blocked on a resource (they may be waiting for I/O or memory)
- w: the number of runnable processes that have been swapped out of main memory (due to memory shortage)
Note that the w field gives the number of processes that have been swapped out. If the value in this column is not 0, the system has a memory problem.
B. Fields under the memory sub-heading:
- avm: active virtual memory, that is, the memory pages assigned to processes
- free: the number of memory pages that are actually available
C. Fields under the page sub-heading:
- re: page reclaims; a large number indicates insufficient memory
- at: address translation faults
- pi: pages paged in
- po: pages paged out
- fr: pages freed per second
- de: anticipated short-term memory shortage
- sr: the number of pages scanned (scan rate) by the page daemon while looking for available memory
The most important of these fields are pi, po, de, and sr. When a program starts, page-in activity under pi is normal. However, if pages are still being read in from disk after the program has started, that is not a good sign. po shows that the system is paging processes out to free memory for other processes; any activity here is also a bad sign. A nonzero value in the de field indicates a significant problem: a memory shortage is expected.
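The rules of thumb above for w, po, and de can be sketched as a small check. This is only an illustration: the function name `paging_warnings` and the sample values are hypothetical, and the pi rule is left out because it depends on whether a program has just started.

```python
def paging_warnings(sample):
    """Return warnings for one vmstat sample given as a dict of field -> value."""
    warnings = []
    # w: runnable processes that were swapped out of main memory
    if sample.get("w", 0) > 0:
        warnings.append("w > 0: processes are swapped out, memory is short")
    # po: pages being paged out to make room for other processes
    if sample.get("po", 0) > 0:
        warnings.append("po > 0: pages are being paged out to free memory")
    # de: anticipated short-term memory shortage
    if sample.get("de", 0) > 0:
        warnings.append("de > 0: a short-term memory shortage is expected")
    return warnings


# Hypothetical sample values, not taken from a real system.
sample = {"r": 1, "b": 0, "w": 0, "pi": 0, "po": 12, "de": 0, "sr": 40}
for warning in paging_warnings(sample):
    print(warning)
```

A single nonzero sample is only a hint; the values are worth watching over several intervals before drawing conclusions.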
D. Fields under the faults sub-heading (trap and interrupt rates per second, averaged over the last 5 seconds):
- in: device interrupts per second
- sy: system calls per second
- cs: CPU context switches per second
E. Fields under the cpu sub-heading:
- us: user time for normal and low-priority processes
- sy: system time
- id: idle time
You can run vmstat in the following form to collect performance data every 5 seconds and display it 3 times in total:
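In the usual vmstat syntax, the interval comes first and the count second, so the invocation described here is `vmstat 5 3`. A minimal Python sketch for building and optionally running such sampling commands follows; the helper names `sampling_command` and `run_sampling` are made up for this example, and the placement of options before the interval is an assumption that matches common usage.

```python
import shutil
import subprocess


def sampling_command(tool, interval, count, *options):
    """Build an argument list such as ['vmstat', '5', '3']."""
    if interval <= 0 or count <= 0:
        raise ValueError("interval and count must be positive")
    return [tool, *options, str(interval), str(count)]


def run_sampling(tool, interval, count, *options):
    """Run the tool if it is installed and return its output, else None."""
    if shutil.which(tool) is None:
        return None
    cmd = sampling_command(tool, interval, count, *options)
    # Blocks for roughly interval * count seconds while the tool samples.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# Only builds and prints the command line; nothing is executed here.
print(" ".join(sampling_command("vmstat", 5, 3)))
```

Calling `run_sampling("vmstat", 5, 3)` would actually execute the command and return its text output, or `None` when vmstat is not installed.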
Third, uptime – See how long the system has been running
The simplest command for checking system load is uptime, which is usually used to see how long the machine has been running:
It provides three pieces of information. First, it shows how long the server has been running. If the uptime is short and no restart was scheduled recently, the server may have a problem that caused it to reboot on its own. Next is the number of users. Because application and database users do not access the server directly through the operating system, this figure does not really reflect how many people are using the server, but an unusually large or small number still deserves attention. Finally, it shows the system load average: in the example, the load average is 0.04 over the last 1 minute, 0.11 over the last 5 minutes, and 0.14 over the last 15 minutes.
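As an illustration, the three load averages can be pulled out of an uptime line with a regular expression. The sample line below is hypothetical apart from the three load values quoted in the text, and the exact layout of uptime output differs between systems, so the pattern is an assumption that may need adjusting.

```python
import re

# Accepts "load average:" or "load averages:", with or without commas.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_averages(uptime_line):
    """Return the (1-minute, 5-minute, 15-minute) load averages as floats."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


# Hypothetical line; only the load values come from the text above.
line = " 10:15am  up 12 days,  3:02,  2 users,  load average: 0.04, 0.11, 0.14"
one, five, fifteen = parse_load_averages(line)
print(one, five, fifteen)
```

Comparing the 1-minute value with the 15-minute value gives a quick indication of whether the load is rising or falling.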
Fourth, w – Find out which users are accessing the system and what they are doing
The w command displays information about the users currently on the machine and their processes.
Fifth, ps – Show process information
The ps command lists processes. Without parameters, it lists only the processes that belong to the user who invoked it and are attached to the current terminal. The typical output of this command is as follows:
To list all processes, use the ps command with the -ef option.
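As a rough sketch of how such a listing might be post-processed, the snippet below counts processes per owner from `ps -ef` style text. The sample listing is invented for illustration, and the function name `processes_per_user` is hypothetical; it assumes the first column holds the owner, as in the usual `UID PID PPID C STIME TTY TIME CMD` layout.

```python
from collections import Counter


def processes_per_user(ps_ef_output):
    """Count processes per owner in `ps -ef` style text (first column = UID)."""
    lines = ps_ef_output.strip().splitlines()
    counts = Counter()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 7)  # keep the command with its arguments
        if fields:
            counts[fields[0]] += 1
    return counts


# Invented sample listing, not output from a real system.
sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:00 ?        00:00:01 init
root       412     1  0 09:00 ?        00:00:00 syslogd
alice     2301  2299  0 10:12 pts/0    00:00:00 -bash
"""
print(processes_per_user(sample).most_common())
```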
Sixth, iostat – CPU average load and disk activity statistics
The iostat command can be used to monitor the I/O behavior of disk drives:
- device: the disk device that the report line refers to
- bps: kilobytes transferred per second
- sps: number of seeks per second
- msps: average number of milliseconds per seek
The following command displays I/O statistics 3 times, once every 5 seconds.
Seventh, sar – Collecting and reporting system activity
You can use sar, the system activity reporter, to check disk I/O.
The following sar command displays disk I/O statistics every 3 seconds, 5 times in total.
- bread/s: number of reads per second from disk into the buffer cache
- lread/s: number of reads per second from the buffer cache
- %rcache: buffer cache hit rate for read operations
- bwrit/s: number of writes per second from the buffer cache to disk
- lwrit/s: number of writes per second to the buffer cache
- %wcache: buffer cache hit rate for write operations
- pread/s: number of reads per second from raw devices
- pwrit/s: number of writes per second to raw devices
Is the %busy value of a disk often greater than 50? For that disk, is avwait greater than avserv? Because of the balance between physical and logical I/O configuration, as well as buffering, page/swap space, and asynchronous read and write issues, a disk bottleneck is hard to judge from a single factor. 50% is only a general guideline and has to be combined with a full analysis of the specific situation. Sometimes a disk is already a bottleneck at a %busy of just 20, while on another system a disk that is working well may have a %busy value of 80.
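The two questions above can be expressed as a small check. This is a sketch only: the function name `disk_flags` and the sample numbers are hypothetical, and the 50% threshold is a parameter precisely because the text treats it as a rough guideline rather than a hard limit.

```python
def disk_flags(busy_pct, avwait, avserv, busy_threshold=50.0):
    """Flag a disk based on %busy and on whether avwait exceeds avserv."""
    flags = []
    # The disk is busy more often than the guideline allows.
    if busy_pct > busy_threshold:
        flags.append("busy")
    # Requests wait in the queue longer than they take to be serviced.
    if avwait > avserv:
        flags.append("queueing")
    return flags


# Hypothetical values for one disk, not taken from a real report.
print(disk_flags(busy_pct=62.0, avwait=8.5, avserv=6.1))
```

A disk that raises both flags over many samples is a stronger candidate for closer analysis than one that raises either flag occasionally.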
You can also monitor the CPU load with the sar command. The -u option of sar displays CPU statistics, and the output shows how CPU time is divided among the user, system, I/O wait, and idle states. The following command displays CPU statistics 5 times, once every 3 seconds.
CPU utilization is reported as percentages: %sys for system processes, %usr for user processes, and %idle for idle time; %wio shows how much time is spent waiting for disk I/O. If %idle is very high, nothing needs to be done. But if %idle stays below 5 for a long time, CPU utilization is high and the CPU is likely to be a bottleneck, so further analysis is needed.
Generally, system processes should not take up a large share of the CPU; the CPU should mostly serve user processes. As a rule of thumb, system processes should account for 20% to 30% and user processes for 70% to 80%.
If %usr stays above 80 for a long time, CPU resources are mostly consumed by user processes and the CPU is a noticeable bottleneck.
If %usr is rarely above 80, any bottleneck may lie in the CPU, memory, or I/O.
If %wio is above 15, this is a sign that the disk is a bottleneck.
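The thresholds above can be applied to a series of sar samples, for example as follows. This is a sketch under assumptions: the samples are hypothetical, the function name `cpu_findings` is made up, and averaging over the samples is just one simple way to approximate "for a long time".

```python
from statistics import mean


def cpu_findings(samples):
    """samples: list of dicts with 'usr', 'sys', 'wio', 'idle' percentages."""
    avg = {key: mean(s[key] for s in samples)
           for key in ("usr", "sys", "wio", "idle")}
    findings = []
    if avg["idle"] < 5:
        findings.append("CPU is almost never idle")
    if avg["usr"] > 80:
        findings.append("user processes consume most of the CPU")
    if avg["wio"] > 15:
        findings.append("high I/O wait suggests a disk bottleneck")
    return findings


# Hypothetical samples, not taken from a real sar report.
samples = [
    {"usr": 40, "sys": 10, "wio": 30, "idle": 20},
    {"usr": 35, "sys": 15, "wio": 25, "idle": 25},
]
print(cpu_findings(samples))
```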
Eighth, netstat – Network status statistics
netstat is used to monitor network behavior such as received and transmitted traffic, protocol usage, the IP addresses assigned to network interface adapters, and so on.
netstat -i outputs a status report for the network interfaces.
netstat -in displays IP addresses in the address column rather than host names.
Common commands for Linux system monitoring