Use ten commands to check Linux server performance within one minute

If the load on your Linux server suddenly spikes and an alert lands on your phone, how do you find the performance problem in the shortest possible time? This article follows a blog post by the Netflix performance engineering team, which describes the ten commands they use to diagnose a machine's performance problems within one minute.

Overview

Running the following commands gives you a rough picture of system resource usage within one minute.

  • uptime
  • dmesg | tail
  • vmstat 1
  • mpstat -P ALL 1
  • pidstat 1
  • iostat -xz 1
  • free -m
  • sar -n DEV 1
  • sar -n TCP,ETCP 1
  • top

Some of these commands require the sysstat package; others are provided by the procps package. Their output helps you locate performance bottlenecks quickly by checking the utilization, saturation, and errors of every resource (CPU, memory, disk I/O, and so on), which is the USE method.

The sections below introduce these commands one by one. For more options and details, refer to each command's man page.

uptime
$ uptime
23:51:26 up 21:31,  1 user,  load average: 30.02, 26.43, 19.02

This command gives a quick view of the server load. On Linux, the load averages count the processes waiting for CPU time plus the processes blocked in uninterruptible I/O (process state D). They give a high-level picture of the demand on system resources.

The output shows the load averages over the last 1, 5, and 15 minutes. Comparing the three tells you whether the load is rising or falling. If the 1-minute average is high and the 15-minute average is low, the load has risen recently and you should look further into CPU consumption. If the 15-minute average is high and the 1-minute average is low, the period of CPU shortage may already have passed.

In the example above, the 1-minute load average is very high and well above the 15-minute value, so we need to keep looking for the processes that are consuming resources. The vmstat, mpstat, and other commands described below help with that.
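The comparison of the 1-minute and 15-minute averages can be sketched in a few lines of Python. This is only an illustration: the function names and the 10% tolerance are assumptions made for this sketch, not part of uptime or of the original post.

```python
import re

def parse_load_averages(uptime_line):
    """Return the (1-minute, 5-minute, 15-minute) load averages from an uptime line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

def load_trend(one_min, fifteen_min, tolerance=0.1):
    """Classify the trend by comparing the 1-minute and 15-minute averages.

    tolerance is a hypothetical threshold chosen for this sketch.
    """
    if one_min > fifteen_min * (1 + tolerance):
        return "rising"
    if one_min < fifteen_min * (1 - tolerance):
        return "falling"
    return "steady"

# Sample line taken from the uptime output shown above.
sample = "23:51:26 up 21:31,  1 user,  load average: 30.02, 26.43, 19.02"
one, five, fifteen = parse_load_averages(sample)
print(load_trend(one, fifteen))  # prints "rising"
```

On a live host, Python's os.getloadavg() returns the same three numbers without parsing any text.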

dmesg | tail
$ dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[...]
[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child
[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB
[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request.  Check SNMP counters.

This command shows the last 10 messages in the kernel log. The example output shows an OOM kill and TCP requests being dropped. These messages can help you troubleshoot performance problems, so never skip this step.
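As a rough sketch of how such messages could be picked out automatically, the Python below scans kernel log lines for the two kinds of events seen in the example. The pattern names and the function are hypothetical, and the patterns cover only the messages quoted above; real kernels emit many other messages worth reading.

```python
import re

# Hypothetical pattern set for this sketch; it covers only the example messages.
PATTERNS = {
    "oom_kill": re.compile(r"Out of memory|oom-killer", re.IGNORECASE),
    "syn_flood": re.compile(r"Possible SYN flooding"),
}

def classify_kernel_messages(lines):
    """Group kernel log lines by the first-level problem they point to."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line)
    return hits

# Lines copied from the dmesg output shown above.
sample_lines = [
    "[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0",
    "[1880957.563400] Out of memory: Kill process 18694 (perl) score 246 or sacrifice child",
    "[1880957.563408] Killed process 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-rss:0kB",
    "[2320864.954447] TCP: Possible SYN flooding on port 7001. Dropping request.  Check SNMP counters.",
]

for name, matched in classify_kernel_messages(sample_lines).items():
    print(name, len(matched))
```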

vmstat 1
$ vmstat 1
procs ---------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0    0 200889792  73708 591828    0    0     0     5    6   10 96  1  3  0  0
32  0    0 200889920  73708 591860    0    0     0   592 13284 4282 98  1  1  0  0
32  0    0 200890112  73708 591860    0    0     0     0 9501 2154 99  1  0  0  0
32  0    0 200889568  73712 591856    0    0     0    48 11900 2459 99  0  0  0  0
32  0    0 200890208  73712 591860    0    0     0     0 15898 4840 98  1  1  0  0
^C

Each line that vmstat(8) prints contains a set of key system metrics, which give a more detailed view of the system state. The argument 1 tells it to print statistics once per second. The header names each column; the columns most relevant to performance tuning are:

  • r: Number of processes running on a CPU or waiting for one. This is a better indicator of CPU saturation than the load averages because it does not include processes waiting for I/O. If the value is greater than the number of CPU cores, the CPUs are saturated.
  • free: Free memory in kilobytes. If too little memory is left, system performance may suffer. The free command described below gives a more detailed view of memory usage.
  • si, so: Swap-ins and swap-outs. If these values are nonzero, the system is swapping and the machine is short of physical memory.
  • us, sy, id, wa, st: Breakdown of CPU time: user time, system (kernel) time, idle time, I/O wait time, and stolen time (usually consumed by other virtual machines).

The CPU time breakdown shows quickly whether the CPUs are busy. Generally, if user time plus system time is high, the CPUs are busy executing instructions. If I/O wait time is consistently high, the bottleneck may be disk I/O.

The example output shows that most CPU time is spent in user mode, that is, in application code. This is not necessarily a performance problem; it needs to be interpreted together with the r column.
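The rule of comparing the r column with the CPU count can be illustrated with a small Python sketch. The helper names are hypothetical, and the CPU count of 32 is taken from the "(32 CPU)" banner printed by the sysstat commands later in this article.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The column-name row starts with "r"; the "procs ..." banner is skipped.
            header = fields
        elif header and len(fields) == len(header) and fields[0].isdigit():
            rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows

def cpu_saturated(row, cpu_count):
    """True when more processes are runnable than there are CPUs."""
    return row["r"] > cpu_count

def swapping(row):
    """True when any swap-in or swap-out activity was recorded."""
    return row["si"] > 0 or row["so"] > 0

# First two data rows of the vmstat output shown above.
sample = """
procs ---------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0    0 200889792  73708 591828    0    0     0     5    6   10 96  1  3  0  0
32  0    0 200889920  73708 591860    0    0     0   592 13284 4282 98  1  1  0  0
"""

for row in parse_vmstat(sample):
    print(row["r"], cpu_saturated(row, cpu_count=32), swapping(row))
```

With 32 CPUs, a value of 34 counts as saturated while 32 does not, which matches the "greater than the number of CPU cores" rule stated above.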

mpstat -P ALL 1
$ mpstat -P ALL 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)
07:38:49 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
07:38:50 PM  all  98.47   0.00   0.75    0.00   0.00   0.00    0.00    0.00    0.00   0.78
07:38:50 PM    0  96.04   0.00   2.97    0.00   0.00   0.00    0.00    0.00    0.00   0.99
07:38:50 PM    1  97.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   2.00
07:38:50 PM    2  98.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   1.00
07:38:50 PM    3  96.97   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00   3.03
[...]

This command shows the usage of each CPU. If one CPU is much busier than the rest, a single-threaded application may be the cause.
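One way to look for that pattern is sketched below in Python: compute each CPU's busy percentage as 100 minus %idle, then check whether exactly one CPU is hot while the others are quiet. The function names and both thresholds (90% and 50%) are assumptions made for this illustration, and the parser assumes the 12-hour time format shown above, where the CPU id is the third field.

```python
def per_cpu_busy(text):
    """Map CPU id -> busy percentage (100 - %idle) from mpstat -P ALL output."""
    busy = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: time, AM/PM, CPU id, ..., %idle as the last column.
        if len(fields) < 4 or not fields[2].isdigit():
            continue  # skips the header row and the "all" summary row
        busy[int(fields[2])] = 100.0 - float(fields[-1])
    return busy

def lone_hot_cpu(busy, hot=90.0, quiet=50.0):
    """Return the id of a single busy CPU among quiet ones, or None.

    hot and quiet are hypothetical thresholds for this sketch.
    """
    hot_cpus = [cpu for cpu, pct in busy.items() if pct >= hot]
    if len(hot_cpus) != 1:
        return None
    others = [pct for cpu, pct in busy.items() if cpu != hot_cpus[0]]
    if others and max(others) < quiet:
        return hot_cpus[0]
    return None

# Rows copied from the mpstat output shown above.
sample = """
07:38:49 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
07:38:50 PM  all  98.47   0.00   0.75    0.00   0.00   0.00    0.00    0.00    0.00   0.78
07:38:50 PM    0  96.04   0.00   2.97    0.00   0.00   0.00    0.00    0.00    0.00   0.99
07:38:50 PM    1  97.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   2.00
07:38:50 PM    2  98.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   1.00
07:38:50 PM    3  96.97   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00   3.03
"""

print(lone_hot_cpu(per_cpu_busy(sample)))                 # prints None: every CPU is busy
print(lone_hot_cpu({0: 99.0, 1: 3.0, 2: 2.0, 3: 4.0}))    # prints 0 for this made-up data
```

In the article's example every CPU is busy, so the load is spread across cores rather than pinned to one.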

pidstat 1
$ pidstat 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015    _x86_64_    (32 CPU)
07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:03 PM     0         9    0.00    0.94    0.00    0.94     1  rcuos/0
07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave
07:41:03 PM     0      4354    0.94    0.94    0.00    1.89     8  java
07:41:03 PM     0      6521 1596.23    1.89    0.00 1598.11    27  java
07:41:03 PM     0      6564 1571.70    7.55    0.00 1579.25    28  java
07:41:03 PM 60004     60154    0.94    4.72    0.00    5.66     9  pidstat
07:41:03 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:04 PM     0      4214    6.00    2.00    0.00    8.00    15  mesos-slave
07:41:04 PM     0      6521 1590.00    1.00    0.00 1591.00    27  java
07:41:04 PM     0      6564 1573.00   10.00    0.00 1583.00    28  java
07:41:04 PM   108      6718    1.00    0.00    0.00    1.00     0  snmp-pass
07:41:04 PM 60004     60154    1.00    4.00    0.00    5.00     9  pidstat
^C

The pidstat command prints per-process CPU usage. It keeps printing new lines instead of clearing the screen, which makes it easy to watch how the system changes over time. The output above shows two java processes each using nearly 1600% CPU, that is, each consuming about 16 CPU cores.
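The conversion from %CPU to cores is simply a division by 100, since 100% corresponds to one fully used CPU. A minimal Python sketch follows; the function name is hypothetical, and the parser assumes the ten-field layout shown above (12-hour time with an AM/PM field), which will differ in other locales.

```python
from collections import defaultdict

def average_cpu_by_pid(text):
    """Return {pid: (command, mean %CPU)} over all intervals in pidstat output."""
    samples = defaultdict(list)
    names = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: time, AM/PM, UID, PID, %usr, %system, %guest, %CPU, CPU, Command
        if len(fields) != 10 or not fields[3].isdigit():
            continue  # skips header rows, where the PID field is the text "PID"
        pid = int(fields[3])
        samples[pid].append(float(fields[7]))
        names[pid] = fields[9]
    return {pid: (names[pid], sum(v) / len(v)) for pid, v in samples.items()}

# A subset of the pidstat rows shown above, covering two intervals.
sample = """
07:41:02 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:03 PM     0      4214    5.66    5.66    0.00   11.32    15  mesos-slave
07:41:03 PM     0      6521 1596.23    1.89    0.00 1598.11    27  java
07:41:03 PM     0      6564 1571.70    7.55    0.00 1579.25    28  java
07:41:03 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
07:41:04 PM     0      4214    6.00    2.00    0.00    8.00    15  mesos-slave
07:41:04 PM     0      6521 1590.00    1.00    0.00 1591.00    27  java
07:41:04 PM     0      6564 1573.00   10.00    0.00 1583.00    28  java
"""

usage = average_cpu_by_pid(sample)
for pid, (command, pct) in sorted(usage.items(), key=lambda item: -item[1][1]):
    # 100% CPU corresponds to one fully used core.
    print(pid, command, "about %.1f cores" % (pct / 100.0))
```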

iostat -xz 1
$ iostat -xz 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015  _x86_64_ (32 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          73.96    0.00    3.73    0.03    0.06   22.21
Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
xvdc        0.01     0.00    1.02    8.86   127.79   595.94   146.50     0.00    0.45    1.82    0.30   0.27   0.26
dm-0        0.00     0.00    0.69    2.32    10.47    31.69    28.01     0.01    3.23    0.71    3.98   0.13   0.04
dm-1        0.00     0.00    0.00    0.94     0.01     3.78     8.00     0.33  345.84    0.04  346.81   0.01   0.00
dm-2        0.00     0.00    0.09    0.07     1.35     0.36    22.50     0.00    2.55    0.23    5.62   1.78   0.03
[...]
^C

The iostat command is mainly used to examine disk I/O on a machine. The output columns have the following meanings:

  • r/s, w/s, rkB/s, wkB/s: Reads and writes per second, and kilobytes read and written per second. A heavy read/write load may cause performance problems.
  • await: Average time for an I/O operation, in milliseconds. This is the time the application waits, including both the time queued and the time being serviced. If the value is too large, the device may be saturated or faulty.
  • avgqu-sz: Average number of requests issued to the device. A value greater than 1 can indicate that the device is saturated (although devices that front multiple back-end disks can usually handle requests in parallel).
  • %util: Device utilization, showing how busy the device is. As a rule of thumb, values greater than 60% tend to hurt I/O performance (which shows up in the average wait time). A value of 100% usually means the device is saturated.

If the device shown is a logical device, its utilization does not mean that the physical hardware behind it is saturated. Note also that poor I/O performance does not necessarily mean poor application performance: techniques such as read-ahead and write buffering can keep the application responsive.
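The two numeric rules of thumb above (utilization above 60% and average queue length above 1) can be applied to parsed iostat rows as in the following Python sketch. The function names are hypothetical, the limits are just the rule-of-thumb values from the text exposed as parameters, and the column names are those of the sysstat version shown in the example; other versions may name the columns differently.

```python
def parse_iostat_devices(text):
    """Map device name -> {column: value} from the device table of iostat -x."""
    header = None
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            header = fields[1:]
        elif header and len(fields) == len(header) + 1:
            devices[fields[0]] = dict(zip(header, (float(f) for f in fields[1:])))
    return devices

def device_warnings(stats, util_limit=60.0, queue_limit=1.0):
    """Return rule-of-thumb warnings for one device; the limits are adjustable."""
    warnings = []
    if stats.get("%util", 0.0) > util_limit:
        warnings.append("utilization above %.0f%%" % util_limit)
    if stats.get("avgqu-sz", 0.0) > queue_limit:
        warnings.append("average queue length above %.1f" % queue_limit)
    return warnings

# Three device rows copied from the iostat output shown above.
sample = """
Device:   rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.23    0.21    0.18     4.52     2.08    34.37     0.00    9.98   13.80    5.42   2.44   0.09
xvdb        0.01     0.00    1.02    8.94   127.97   598.53   145.79     0.00    0.43    1.78    0.28   0.25   0.25
dm-1        0.00     0.00    0.00    0.94     0.01     3.78     8.00     0.33  345.84    0.04  346.81   0.01   0.00
"""

for name, stats in parse_iostat_devices(sample).items():
    print(name, stats["await"], device_warnings(stats))
```

None of the example devices triggers a warning. The high await on dm-1 is worth a look, but the text gives no numeric limit for await, so the sketch does not invent one.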

free -m
$ free -m
             total       used       free     shared    buffers     cached
Mem:        245998      24545     221453         83         59        541
-/+ buffers/cache:      23944     222053
Swap:            0          0          0

The free command shows system memory usage; the -m option reports values in megabytes. The last two columns show the memory used for the block device I/O buffer cache (buffers) and for the file system page cache (cached). The used value on the Mem row can look large because it includes these caches. That is how Linux manages memory: it uses spare memory for caching whenever possible and reclaims it as soon as applications need it. The second row, -/+ buffers/cache, shows used and free memory with the caches counted as free, so its free value is generally what you should treat as available memory.

If available memory is very low, the system may start using swap (if configured), which adds I/O overhead (visible in iostat) and degrades performance.
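The relationship between the Mem row and the -/+ buffers/cache row is plain arithmetic, shown here as a short Python sketch. The function names are hypothetical, and the parser assumes the older free layout used in this article, which has separate buffers and cached columns; newer versions of free print different columns.

```python
def parse_free_mem_row(free_output):
    """Return the columns of the 'Mem:' row as a dict of integers (MB for free -m)."""
    lines = free_output.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:]]
            return dict(zip(header, values))
    raise ValueError("no 'Mem:' row found")

def without_caches(mem):
    """Recompute used and free memory with buffers and cache counted as free."""
    used = mem["used"] - mem["buffers"] - mem["cached"]
    free = mem["free"] + mem["buffers"] + mem["cached"]
    return used, free

# The free -m output shown above.
sample = """
             total       used       free     shared    buffers     cached
Mem:        245998      24545     221453         83         59        541
-/+ buffers/cache:      23944     222053
Swap:            0          0          0
"""

print(without_caches(parse_free_mem_row(sample)))  # prints (23945, 222053)
```

The computed free value matches the second row exactly. The used value differs from the printed 23944 by 1 MB, presumably because free rounds each value to whole megabytes.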

sar -n DEV 1
$ sar -n DEV 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015     _x86_64_    (32 CPU)
12:16:48 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:16:49 AM      eth0  18763.00   5032.00  20686.42    478.30      0.00      0.00      0.00      0.00
12:16:49 AM        lo     14.00     14.00      1.36      1.36      0.00      0.00      0.00      0.00
12:16:49 AM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:16:49 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
12:16:50 AM      eth0  19763.00   5101.00  21999.10    482.56      0.00      0.00      0.00      0.00
12:16:50 AM        lo     20.00     20.00      3.25      3.25      0.00      0.00      0.00      0.00
12:16:50 AM   docker0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
^C

The sar command can be used to check network interface throughput. When troubleshooting, compare the throughput with the interface's capacity to decide whether the interface is saturated. In the example output, the eth0 NIC receives about 22 Mbytes/s, which is 176 Mbits/s and well below the hardware limit of 1 Gbit/s.
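The unit conversion behind that statement is worked through in the Python sketch below. The function names are hypothetical. The sketch follows the article's arithmetic and treats 1 kB as 1000 bytes by default; that is an assumption, so the factor is a parameter in case you prefer 1024.

```python
def kb_per_s_to_mbit_per_s(kb_per_s, bytes_per_kb=1000):
    """Convert a kB/s rate to Mbit/s (8 bits per byte, 10**6 bits per Mbit)."""
    return kb_per_s * bytes_per_kb * 8 / 1_000_000

def link_utilization_percent(kb_per_s, link_mbit_per_s, bytes_per_kb=1000):
    """Throughput as a percentage of the link capacity."""
    return 100.0 * kb_per_s_to_mbit_per_s(kb_per_s, bytes_per_kb) / link_mbit_per_s

# rxkB/s for eth0 in the second interval of the sar output shown above.
rx_kb_per_s = 21999.10

print(round(kb_per_s_to_mbit_per_s(rx_kb_per_s)))              # prints 176
print(round(link_utilization_percent(rx_kb_per_s, 1000), 1))   # prints 17.6
```

At roughly 18% of a 1 Gbit/s link, the interface in the example is far from saturated.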

sar -n TCP,ETCP 1
$ sar -n TCP,ETCP 1
Linux 3.13.0-49-generic (titanclusters-xxxxx)  07/14/2015    _x86_64_    (32 CPU)
12:17:19 AM  active/s passive/s    iseg/s    oseg/s
12:17:20 AM      1.00      0.00  10233.00  18846.00
12:17:19 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:20 AM      0.00      0.00      0.00      0.00      0.00
12:17:20 AM  active/s passive/s    iseg/s    oseg/s
12:17:21 AM      1.00      0.00   8359.00   6039.00
12:17:20 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:21 AM      0.00      0.00      0.00      0.00      0.00
^C

With these options, sar reports TCP connection statistics, including:

  • active/s: Number of locally initiated TCP connections per second, that is, connections created with connect().
  • passive/s: Number of remotely initiated TCP connections per second, that is, connections created with accept().
  • retrans/s: Number of TCP retransmits per second.

The connection rates show whether a performance problem coincides with an unusually large number of new connections, and whether those connections are initiated locally or accepted from remote hosts. TCP retransmits indicate packet loss, which may be caused by poor network conditions or by an overloaded server dropping packets.
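Because sar prints the TCP and ETCP tables as alternating header and value rows, reading the three counters described above takes a little bookkeeping. The Python sketch below merges one interval into a single dict; the function name is hypothetical, and the parser assumes the 12-hour time format shown above, where the first two fields are the time and AM/PM.

```python
def parse_sar_tcp(text):
    """Merge the header/value row pairs of `sar -n TCP,ETCP` into one dict.

    If the text covers several intervals, later values overwrite earlier ones.
    """
    merged = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        columns = fields[2:]  # drop the time and AM/PM fields
        if columns[0].endswith("/s"):
            header = columns
        elif header and len(columns) == len(header):
            merged.update(zip(header, (float(c) for c in columns)))
    return merged

# First interval of the sar output shown above.
sample = """
12:17:19 AM  active/s passive/s    iseg/s    oseg/s
12:17:20 AM      1.00      0.00  10233.00  18846.00
12:17:19 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
12:17:20 AM      0.00      0.00      0.00      0.00      0.00
"""

stats = parse_sar_tcp(sample)
print(stats["active/s"], stats["passive/s"], stats["retrans/s"])
```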

top
$ top
top - 00:15:40 up 21:56,  1 user,  load average: 31.09, 29.87, 29.92
Tasks: 871 total,   1 running, 868 sleeping,   0 stopped,   2 zombie
%Cpu(s): 96.8 us,  0.4 sy,  0.0 ni,  2.7 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  25190241+total, 24921688 used, 22698073+free,    60448 buffers
KiB Swap:        0 total,        0 used,        0 free.   554208 cached Mem
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 20248 root      20   0  0.227t 0.012t  18748 S  3090  5.2  29812:58 java
  4213 root      20   0 2722544  64640  44232 S  23.5  0.0 233:35.37 mesos-slave
 66128 titancl+  20   0   24344   2332   1172 R   1.0  0.0   0:00.07 top
  5235 root      20   0 38.227g 547004  49996 S   0.7  0.2   2:02.74 java
  4299 root      20   0 20.015g 2.682g  16836 S   0.3  1.1  33:14.42 java
     1 root      20   0   33620   2920   1496 S   0.0  0.0   0:03.82 init
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd
     3 root      20   0       0      0      0 S   0.0  0.0   0:05.35 ksoftirqd/0
     5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
     6 root      20   0       0      0      0 S   0.0  0.0   0:06.94 kworker/u256:0
     8 root      20   0       0      0      0 S   0.0  0.0   2:38.05 rcu_sched

The top command includes many of the metrics checked by the earlier commands, such as system load (uptime), memory usage (free), and CPU usage (vmstat). It therefore gives a combined view of where the system load comes from. top also supports sorting by different columns, which makes it easy to find the processes using the most memory or the most CPU.

However, unlike the rolling output of the earlier commands, top shows a snapshot that is overwritten on each refresh. If you are not watching closely, you may miss a clue. You may need to pause the refresh in order to record and compare data.

Summary

Many more tools are available for troubleshooting Linux server performance problems. The commands described above help you locate a problem quickly. In the example output, several pieces of evidence show that a java process is consuming a large amount of CPU, so the next step is to optimize that application.
