Use ten commands to check Linux server performance within one minute
If the load on your Linux server suddenly surges and an alert lands on your phone, how do you track down the performance problem in the shortest possible time? This post follows a blog post by the Netflix performance engineering team, which describes the ten commands they use to diagnose a machine's performance problems within one minute.
Overview
Running the following commands gives you a rough picture of system resource usage within one minute.
- uptime
- dmesg | tail
- vmstat 1
- mpstat -P ALL 1
- pidstat 1
- iostat -xz 1
- free -m
- sar -n DEV 1
- sar -n TCP,ETCP 1
- top
Some of these commands require the sysstat package; others are provided by the procps package. Their output helps you locate performance bottlenecks quickly by checking the utilization, saturation, and errors of every resource (CPU, memory, disk I/O, and so on). This is the USE method.
The sections below introduce the commands one by one. For the full set of options and their meaning, refer to each command's man page.
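As a rough sketch, the ten checks can be wrapped in a small script so that none of them is forgotten under pressure. The names list_checks and run_checks are hypothetical. The sample count of 5 and the use of top in batch mode are assumptions added here so that each command stops on its own instead of running until Ctrl-C.

```shell
#!/bin/sh
# Sketch only: list_checks and run_checks are hypothetical helper names.
# The trailing "5" and "top -b -n 1" are assumptions that bound each run.

list_checks() {
cat <<'EOF'
uptime
dmesg | tail
vmstat 1 5
mpstat -P ALL 1 5
pidstat 1 5
iostat -xz 1 5
free -m
sar -n DEV 1 5
sar -n TCP,ETCP 1 5
top -b -n 1 | head -n 20
EOF
}

run_checks() {
  list_checks | while IFS= read -r check; do
    tool=${check%% *}              # first word is the binary to look for
    echo "=== $check ==="
    if command -v "$tool" >/dev/null 2>&1; then
      sh -c "$check" </dev/null 2>&1
    else
      echo "(skipped: $tool is not installed)"
    fi
  done
}

# Print the plan by default; pass "run" as the first argument to execute it.
if [ "${1:-}" = "run" ]; then run_checks; else list_checks; fi
```

Skipping missing tools rather than failing keeps the script usable on hosts where sysstat has not been installed yet.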
uptime
$ uptime
 23:51:26 up 21:31,  1 user,  load average: 30.02, 26.43, 19.02
This command gives a quick view of the server's load. On Linux, the load averages count the processes waiting for CPU time plus the processes blocked in uninterruptible I/O (process state D). They give a high-level idea of how much demand there is for system resources.
The output shows the load averages over the last 1, 5, and 15 minutes. Comparing the three tells you whether the load is rising or falling. If the 1-minute average is high and the 15-minute average is low, the load has risen recently and you should look further into CPU consumption. If the 15-minute average is high and the 1-minute average is low, the period of CPU shortage may already have passed.
In the example above, the 1-minute average is very high and well above the 15-minute average, so we need to find out which processes are currently consuming resources. The vmstat, mpstat, and other commands described below help with that.
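The comparison between the 1-minute and 15-minute averages can be sketched with a small awk filter. The function name load_summary is hypothetical, and dividing the load by the core count is a common rule of thumb added here as an assumption; uptime itself does not report it.

```shell
#!/bin/sh
# Sketch: summarize an uptime line read from stdin.
# load_summary is a hypothetical name; $1 is the number of CPU cores.
# The per-core figure is a rule of thumb, not something uptime reports.
load_summary() {
  awk -v cores="$1" -F'load average: ' '
    NF == 2 {
      split($2, la, /, */)          # la[1]=1 min, la[2]=5 min, la[3]=15 min
      trend = (la[1] + 0 > la[3] + 0) ? "rising" : "flat-or-falling"
      printf "1min=%s 15min=%s per_core=%.2f trend=%s\n", la[1], la[3], la[1] / cores, trend
    }'
}

# Sample line from this article, on the 32-CPU host shown in later output.
echo "23:51:26 up 21:31, 1 user, load average: 30.02, 26.43, 19.02" | load_summary 32
# prints: 1min=30.02 15min=19.02 per_core=0.94 trend=rising
```

On a live host the same filter could be fed with `uptime | load_summary "$(nproc)"`, assuming nproc is available.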
dmesg | tail
$ dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[...]
[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request. Check SNMP counters.
This command prints the last 10 messages from the kernel log. The example shows the kernel's OOM killer terminating a process and TCP dropping a request because of possible SYN flooding. Messages like these can point directly at the cause of a performance problem, so never skip this step.
vmstat 1
$ vmstat 1
procs ---------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd      free  buff  cache   si   so    bi    bo    in    cs us sy id wa st
34  0    0 200889792 73708 591828    0    0     0     5     6    10 96  1  3  0  0
32  0    0 200889920 73708 591860    0    0     0   592 13284  4282 98  1  1  0  0
32  0    0 200890112 73708 591860    0    0     0     0  9501  2154 99  1  0  0  0
32  0    0 200889568 73712 591856    0    0     0    48 11900  2459 99  0  0  0  0
32  0    0 200890208 73712 591860    0    0     0     0 15898  4840 98  1  1  0  0
^C
vmstat(8) prints one line of key system metrics per interval, giving a more detailed picture of the system's state. The argument 1 makes it report once per second. The header names each column; the columns most relevant to performance tuning are:
- r: Number of processes waiting for CPU time. This is a better indicator of CPU load than the load average because it does not include processes waiting for I/O. If the value is greater than the number of CPU cores, the CPUs are saturated.
- free: Amount of free memory, in kilobytes. If too little memory is left, system performance may suffer. The free command described below gives a more detailed view of memory usage.
- si, so: Amount of memory swapped in and swapped out. If these values are not 0, the system is using swap and the machine is short of physical memory.
- us, sy, id, wa, st: Breakdown of CPU time into user time, system (kernel) time, idle time, I/O wait time, and stolen time (usually consumed by other virtual machines).
These CPU times show at a glance whether the CPU is busy. If user time plus system time is high, the CPU is busy executing instructions. If I/O wait time stays high, the bottleneck may be disk I/O.
In the sample output, most CPU time is spent in user mode, that is, in application code. This is not necessarily a performance problem in itself; it needs to be interpreted together with the r column.
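The two checks described for r and for si/so can be sketched as an awk filter over vmstat output. The function name vmstat_flags is hypothetical, and the field positions assume the 17-column layout shown in this article; other vmstat versions may order columns differently.

```shell
#!/bin/sh
# Sketch: apply the r and si/so checks to vmstat output read from stdin.
# vmstat_flags is a hypothetical name; $1 is the number of CPU cores.
# Field positions assume the layout in this article: r=$1, si=$7, so=$8.
vmstat_flags() {
  awk -v cores="$1" '
    $1 ~ /^[0-9]+$/ && NF >= 8 {          # skip the two header lines
      cpu  = ($1 + 0 > cores + 0) ? "yes" : "no"
      swap = ($7 + $8 > 0)        ? "yes" : "no"
      printf "r=%d cores=%d cpu_saturated=%s swapping=%s\n", $1, cores, cpu, swap
    }'
}

# First data line of the sample output, on the 32-CPU host from this article.
echo "34 0 0 200889792 73708 591828 0 0 0 5 6 10 96 1 3 0 0" | vmstat_flags 32
# prints: r=34 cores=32 cpu_saturated=yes swapping=no
```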
mpstat -P ALL 1
$ mpstat -P ALL 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

07:38:49 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
07:38:50 PM  all  98.47   0.00   0.75    0.00   0.00   0.00   0.00   0.00   0.00   0.78
07:38:50 PM    0  96.04   0.00   2.97    0.00   0.00   0.00   0.00   0.00   0.00   0.99
07:38:50 PM    1  97.00   0.00   1.00    0.00   0.00   0.00   0.00   0.00   0.00   2.00
07:38:50 PM    2  98.00   0.00   1.00    0.00   0.00   0.00   0.00   0.00   0.00   1.00
07:38:50 PM    3  96.97   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   3.03
[...]
This command shows the CPU time breakdown for each CPU. If a single CPU is much busier than the others, a single-threaded application is a likely cause.
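To pick out the busiest CPUs from a long mpstat listing, one option is to compute 100 minus %idle per CPU. This is a sketch: the function name busiest_cpus and its threshold argument are hypothetical, and columns are located by header name because their positions vary between sysstat versions and time formats.

```shell
#!/bin/sh
# Sketch: list CPUs whose busy share (100 - %idle) meets a threshold.
# busiest_cpus is a hypothetical name; $1 is the threshold in percent
# (default 90, an arbitrary assumption).
busiest_cpus() {
  awk -v min="${1:-90}" '
    /%idle/ { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header line
    ("CPU" in col) && NF >= col["%idle"] && $(col["CPU"]) != "all" {
      busy = 100 - $(col["%idle"])
      if (busy >= min + 0) printf "cpu=%s busy=%.2f%%\n", $(col["CPU"]), busy
    }'
}

# Header and three rows from the sample output in this article.
busiest_cpus 98.5 <<'EOF'
07:38:49 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
07:38:50 PM all 98.47 0.00 0.75 0.00 0.00 0.00 0.00 0.00 0.00 0.78
07:38:50 PM 0 96.04 0.00 2.97 0.00 0.00 0.00 0.00 0.00 0.00 0.99
07:38:50 PM 1 97.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 2.00
EOF
# prints: cpu=0 busy=99.01%
```

A single CPU standing out in this list while the others stay idle is the single-threaded pattern described above.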
pidstat 1
$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:03 PM     0         9    0.00    0.94    0.00    0.94     1  rcuos/0
07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave
07:41:03 PM     0      4354    0.94    0.94    0.00    1.89     8  java
07:41:03 PM     0      6521 1596.23    1.89    0.00 1598.11    27  java
07:41:03 PM     0      6564 1571.70    7.55    0.00 1579.25    28  java
07:41:03 PM 60004     60154    0.94    4.72    0.00    5.66     9  pidstat

07:41:03 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:04 PM     0      4214    6.00    2.00    0.00    8.00    15  mesos-slave
07:41:04 PM     0      6521 1590.00    1.00    0.00 1591.00    27  java
07:41:04 PM     0      6564 1573.00   10.00    0.00 1583.00    28  java
07:41:04 PM   108      6718    1.00    0.00    0.00    1.00     0  snmp-pass
07:41:04 PM 60004     60154    1.00    4.00    0.00    5.00     9  pidstat
^C
The pidstat command shows CPU usage per process. It prints continuously without overwriting earlier output, which makes it easy to watch how the system changes over time. In the output above, two java processes each use nearly 1600% CPU, that is, about 16 CPU cores apiece.
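The conversion from %CPU to cores is a division by 100, since %CPU is summed across all CPUs. A minimal sketch, with the hypothetical function name cpu_hogs, lists every process that uses at least one full core. Columns are looked up by header name because some pidstat versions print additional columns.

```shell
#!/bin/sh
# Sketch: report processes using one core or more, from pidstat output on stdin.
# cpu_hogs is a hypothetical name; %CPU / 100 gives the number of cores in use.
cpu_hogs() {
  awk '
    /%CPU/ { for (i = 1; i <= NF; i++) col[$i] = i; next }    # header line
    ("%CPU" in col) && NF >= col["Command"] && $(col["%CPU"]) + 0 >= 100 {
      printf "pid=%s cmd=%s cores=%.1f\n", $(col["PID"]), $(col["Command"]), $(col["%CPU"]) / 100
    }'
}

# Header and three rows from the sample output in this article.
cpu_hogs <<'EOF'
07:41:02 PM UID PID %usr %system %guest %CPU CPU Command
07:41:03 PM 0 4214 5.66 5.66 0.00 11.32 15 mesos-slave
07:41:03 PM 0 6521 1596.23 1.89 0.00 1598.11 27 java
07:41:03 PM 0 6564 1571.70 7.55 0.00 1579.25 28 java
EOF
# prints:
# pid=6521 cmd=java cores=16.0
# pid=6564 cmd=java cores=15.8
```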
iostat -xz 1
$ iostat -xz 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21

Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
xvdc        0.01     0.00    1.02    8.86   127.79   595.94   146.50     0.00    0.45    1.82    0.30   0.27   0.26
dm-0        0.00     0.00    0.69    2.32    10.47    31.69    28.01     0.01    3.23    0.71    3.98   0.13   0.04
dm-1        0.00     0.00    0.00    0.94     0.01     3.78     8.00     0.33  345.84    0.04  346.81   0.01   0.00
dm-2        0.00     0.00    0.09    0.07     1.35     0.36    22.50     0.00    2.55    0.23    5.62   1.78   0.03
[...]
^C
The iostat command is mainly used to view a machine's disk I/O. The output columns have the following meanings:
- r/s, w/s, rkB/s, wkB/s: Number of reads and writes per second, and kilobytes read and written per second. A very heavy read/write workload can by itself cause performance problems.
- await: Average time for an I/O operation, in milliseconds. This is the time the application experiences, including both time spent queued and time being serviced. A value that is too large may indicate a saturated or faulty device.
- avgqu-sz: Average number of requests issued to the device. A value greater than 1 may indicate saturation, although devices that front multiple back-end disks can usually handle requests in parallel.
- %util: Device utilization, showing how busy the device is. As a rule of thumb, values above 60% tend to hurt I/O performance (which should also show up in await). A value close to 100% usually means the device is saturated.
If the device shown is a logical device, its utilization does not mean that the physical hardware behind it is saturated. Also note that poor disk I/O performance does not necessarily mean poor application performance: techniques such as read-ahead and write buffering let applications perform I/O asynchronously.
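The thresholds in the list above can be applied mechanically. In this sketch the function name io_flags is hypothetical. The 60% utilization and queue-length-above-1 limits come from the text, while the default await limit of 100 ms is an arbitrary assumption, since the right value depends on the device. Column names are those of the sysstat version shown in this article; newer versions may rename or split some of them.

```shell
#!/bin/sh
# Sketch: flag devices in "iostat -xz" output (stdin) that exceed the limits.
# io_flags is a hypothetical name; $1 is an await limit in ms (assumed: 100).
io_flags() {
  awk -v max_util=60 -v max_queue=1 -v max_await="${1:-100}" '
    $1 ~ /^Device/ { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header
    !("%util" in col) || NF < col["%util"] { next }                  # not a device row
    {
      why = ""
      if ($(col["%util"]) + 0 > max_util + 0) why = why " util"
      if (("avgqu-sz" in col) && $(col["avgqu-sz"]) + 0 > max_queue + 0) why = why " queue"
      if (("await" in col) && $(col["await"]) + 0 > max_await + 0) why = why " await"
      if (why != "") print $1 ":" why
    }'
}

# Header and two device rows from the sample output in this article.
io_flags <<'EOF'
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.23 0.21 0.18 4.52 2.08 34.37 0.00 9.98 13.80 5.42 2.44 0.09
dm-1 0.00 0.00 0.00 0.94 0.01 3.78 8.00 0.33 345.84 0.04 346.81 0.01 0.00
EOF
# prints: dm-1: await
```

Here dm-1 is flagged only for its long await; with utilization near zero it is a logical device with very little traffic, which is why the flags are a prompt for a closer look rather than a verdict.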
free -m
$ free -m
             total       used       free     shared    buffers     cached
Mem:        245998      24545     221453         83         59        541
-/+ buffers/cache:      23944     222053
Swap:            0          0          0
The free command shows system memory usage; the -m option reports it in megabytes. The last two columns, buffers and cached, show the memory used for the block-device I/O buffer cache and for the file system page cache. Linux uses otherwise idle memory for these caches, which can make the Mem: row look fuller than it really is. When applications need memory, the caches are reclaimed and handed to them, so this memory is generally counted as available. The second row, -/+ buffers/cache, shows used and free memory with that adjustment applied.
If very little memory is available, the system may start using swap (if configured), which adds I/O overhead (visible in iostat) and degrades performance.
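The relationship between the two rows can be checked with simple arithmetic: free plus buffers plus cached on the Mem: row equals the free value on the -/+ buffers/cache row. The sketch below assumes the older six-column layout shown in this article (newer versions of free print a different layout); the function name mem_available_mb is hypothetical.

```shell
#!/bin/sh
# Sketch: compute reclaimable-plus-free memory from "free -m" output on stdin.
# mem_available_mb is a hypothetical name.
# Assumes the layout in this article: Mem: total used free shared buffers cached
mem_available_mb() {
  awk '$1 == "Mem:" { print $4 + $6 + $7 }'    # free + buffers + cached
}

# Mem: row from the sample output in this article.
echo "Mem: 245998 24545 221453 83 59 541" | mem_available_mb
# prints: 222053  (221453 + 59 + 541, matching the -/+ buffers/cache row)
```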
sar -n DEV 1
$ sar -n DEV 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

12:16:48 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:16:49 AM      eth0  18763.00   5032.00  20686.42    478.30      0.00      0.00      0.00      0.00
12:16:49 AM        lo     14.00     14.00      1.36      1.36      0.00      0.00      0.00      0.00
12:16:49 AM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

12:16:49 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:16:50 AM      eth0  19763.00   5101.00  21999.10    482.56      0.00      0.00      0.00      0.00
12:16:50 AM        lo     20.00     20.00      3.25      3.25      0.00      0.00      0.00      0.00
12:16:50 AM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
^C
Here the sar command is used to view network interface throughput. When troubleshooting, compare the throughput with the interface's capacity to judge whether the network device is saturated. In the example output, the eth0 NIC receives about 22 MB/s, or 176 Mbit/s, well short of a 1 Gbit/s hardware limit.
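The conversion from rxkB/s to Mbit/s and to a share of link capacity can be sketched as follows. The function name rx_utilization is hypothetical, and the link speed is passed in by the caller because this sketch does not try to detect it. To match the rounding used in the text, 1 kB is treated as 1000 bytes; if your sysstat version counts 1024-byte kilobytes, the real figure is slightly higher.

```shell
#!/bin/sh
# Sketch: convert rxkB/s from "sar -n DEV" output (stdin) to Mbit/s and
# to a percentage of the link speed.
# rx_utilization is a hypothetical name; $1 is the link speed in Mbit/s.
# Assumption: 1 kB = 1000 bytes, to match the rounding in the article.
rx_utilization() {
  awk -v link="$1" '
    /IFACE/ { for (i = 1; i <= NF; i++) col[$i] = i; next }    # header line
    ("rxkB/s" in col) && NF >= col["rxkB/s"] {
      mbit = $(col["rxkB/s"]) * 8 / 1000
      printf "%s rx=%.0fMbit/s %.1f%%\n", $(col["IFACE"]), mbit, mbit / link * 100
    }'
}

# Header and the eth0 row from the sample output, against a 1 Gbit/s link.
rx_utilization 1000 <<'EOF'
12:16:49 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
12:16:50 AM eth0 19763.00 5101.00 21999.10 482.56 0.00 0.00 0.00 0.00
EOF
# prints: eth0 rx=176Mbit/s 17.6%
```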
sar -n TCP,ETCP 1
$ sar -n TCP,ETCP 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)

12:17:19 AM  active/s passive/s    iseg/s    oseg/s
12:17:20 AM      1.00      0.00  10233.00  18846.00

12:17:19 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:20 AM      0.00      0.00      0.00      0.00      0.00

12:17:20 AM  active/s passive/s    iseg/s    oseg/s
12:17:21 AM      1.00      0.00   8359.00   6039.00

12:17:20 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:21 AM      0.00      0.00      0.00      0.00      0.00
^C
Here the sar command is used to view TCP connection activity, including:
- active/s: Number of locally initiated TCP connections per second, that is, connections created through a connect call.
- passive/s: Number of remotely initiated TCP connections per second, that is, connections created through an accept call.
- retrans/s: Number of TCP retransmissions per second.
The connection rates give a rough measure of server load and help you judge whether a performance problem stems from too many connections being established, and whether those connections are initiated locally or accepted from remote peers. TCP retransmissions may be caused by a poor network or by an overloaded server that is dropping packets.
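Because sar interleaves two tables here, it can help to pull out just the three counters discussed above. This sketch uses the hypothetical function name tcp_summary and re-reads the column positions at every header line, since the two tables have different layouts.

```shell
#!/bin/sh
# Sketch: extract active/s, passive/s and retrans/s from
# "sar -n TCP,ETCP" output read from stdin.
# tcp_summary is a hypothetical name.
tcp_summary() {
  awk '
    $0 ~ "active/s|retrans/s" {            # either of the two header lines
      split("", col)                       # forget the previous table layout
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    ("active/s" in col) && NF >= col["passive/s"] {
      printf "active=%s passive=%s\n", $(col["active/s"]), $(col["passive/s"])
    }
    ("retrans/s" in col) && NF >= col["retrans/s"] {
      printf "retrans=%s\n", $(col["retrans/s"])
    }'
}

# First sample interval from the output in this article.
tcp_summary <<'EOF'
12:17:19 AM active/s passive/s iseg/s oseg/s
12:17:20 AM 1.00 0.00 10233.00 18846.00

12:17:19 AM atmptf/s estres/s retrans/s isegerr/s orsts/s
12:17:20 AM 0.00 0.00 0.00 0.00 0.00
EOF
# prints:
# active=1.00 passive=0.00
# retrans=0.00
```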
top
$ top
top - 00:15:40 up 21:56,  1 user,  load average: 31.09, 29.87, 29.92
Tasks: 871 total,   1 running, 868 sleeping,   0 stopped,   2 zombie
%Cpu(s): 96.8 us,  0.4 sy,  0.0 ni,  2.7 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  25190241+total, 24921688 used, 22698073+free,    60448 buffers
KiB Swap:        0 total,        0 used,        0 free.   554208 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 20248 root      20   0  0.227t 0.012t  18748 S  3090  5.2  29812:58 java
  4213 root      20   0 2722544  64640  44232 S  23.5  0.0 233:35.37 mesos-slave
 66128 titancl+  20   0   24344   2332   1172 R   1.0  0.0   0:00.07 top
  5235 root      20   0 38.227g 547004  49996 S   0.7  0.2   2:02.74 java
  4299 root      20   0 20.015g 2.682g  16836 S   0.3  1.1  33:14.42 java
     1 root      20   0   33620   2920   1496 S   0.0  0.0   0:03.82 init
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
     3 root      20   0       0      0      0 S   0.0  0.0   0:05.35 ksoftirqd/0
     5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
     6 root      20   0       0      0      0 S   0.0  0.0   0:06.94 kworker/u256:0
     8 root      20   0       0      0      0 S   0.0  0.0   2:38.05 rcu_sched
The top command covers much of what the previous commands check: system load (uptime), memory usage (free), and CPU usage (vmstat). It therefore gives a combined view of where the load comes from. top can also sort by different columns, which makes it easy to find the processes using the most memory or the most CPU.
However, top shows a snapshot that is refreshed in place, whereas the previous commands print a rolling history. If you are not watching at the right moment, you can miss a clue. You may need to pause the refresh to record and compare the data.
Summary
Many more tools exist for troubleshooting Linux server performance, but the commands described above help locate problems quickly. In the example output, several pieces of evidence show that java processes are consuming most of the CPU, so the next step is performance tuning at the application level.