Check the Linux server performance commands in detail

Last Update:2016-08-30 Source: Internet

Author: User


If your Linux server suddenly comes under heavy load, how can you pinpoint the performance problem in the shortest possible time?

You can get a general idea of system resource usage within one minute by running the following commands.

uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top

Some of these commands require the sysstat package; others are provided by the procps package. Their output helps to locate performance bottlenecks quickly by checking the utilization, saturation, and error metrics of every resource (CPU, memory, disk I/O, and so on), an approach known as the USE method.

uptime: the Linux uptime command shows the system load.

# uptime -V
[...] 3.2.8

procps is a utility package that includes ps, top, kill, and other programs used mainly to display and control system information, process state, and similar data.

[...] up [...], 2 users, load average: 30.02, 26.43, 19.02

For a typical output line, the fields read as follows:

Current time: 04:03:58
System uptime: 10 days, 13 hours and 19 minutes
Users currently logged in: 1
Load average: 0.54, 0.40, 0.20 (the system load over the last 1, 5, and 15 minutes)

These three figures show whether the load on the server is building up or easing off. If the 1-minute average is high and the 15-minute average is low, the server is currently under heavy load, and you need to investigate further where the CPU resources are being consumed. Conversely, if the 15-minute average is high and the 1-minute average is low, the period of CPU contention may already have passed.

In the example output above, the load average over the last minute is very high and well above the 15-minute figure, so we need to keep investigating which processes are consuming so many resources. vmstat, mpstat, and similar commands can be used for this.
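The comparison of the 1-minute and 15-minute averages can be sketched in a few lines of Python. This is only an illustration: the function name load_trend and the ratio threshold of 1.5 are assumptions made for this example, not something uptime defines.

```python
import re


def load_trend(uptime_line, ratio=1.5):
    """Classify the load as 'rising', 'easing' or 'steady'.

    ratio is a hypothetical threshold: how many times larger one
    average must be than the other before we call it a trend.
    """
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found in: %r" % uptime_line)
    one, _five, fifteen = (float(value) for value in match.groups())
    if one > fifteen * ratio:
        return "rising"   # load is building up right now
    if fifteen > one * ratio:
        return "easing"   # the busy period may already have passed
    return "steady"


# Load averages taken from the example output in this article.
print(load_trend("load average: 30.02, 26.43, 19.02"))  # prints: rising
```

A "rising" result corresponds to the case described above, where the next step is to look at vmstat and mpstat.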

cat /proc/loadavg

The most direct way to view the system load average.

# cat /proc/loadavg
0.10 0.06 0.01 1/[...] 29632

The first three numbers are the load averages. The fourth field is a fraction: the numerator is the number of processes currently running, and the denominator is the total number of processes on the system. The last number is the ID of the most recently created process.
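As a rough sketch of how these five fields could be read programmatically, the Python below splits a /proc/loadavg line into named values. The names LoadAvg and parse_loadavg are made up for this example, and the total of 185 in the sample string is a placeholder, since the real value is missing from the output above.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    load1: float    # 1-minute load average
    load5: float    # 5-minute load average
    load15: float   # 15-minute load average
    running: int    # numerator of the fourth field
    total: int      # denominator of the fourth field
    last_pid: int   # most recently created process ID


def parse_loadavg(text):
    """Parse the single line found in /proc/loadavg."""
    fields = text.split()
    if len(fields) < 5:
        raise ValueError("unexpected /proc/loadavg content: %r" % text)
    running, total = fields[3].split("/")
    return LoadAvg(
        float(fields[0]), float(fields[1]), float(fields[2]),
        int(running), int(total), int(fields[4]),
    )


# The total (185) is a hypothetical value used only for illustration.
print(parse_loadavg("0.10 0.06 0.01 1/185 29632"))
```

On a live Linux system the same function could be fed the result of open("/proc/loadavg").read().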

What is the system load?

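The system load average is defined as the average number of processes in the run queue during a specific time interval. A process is in the run queue if it meets all of the following conditions:

It is not waiting for the result of an I/O operation
It has not voluntarily entered a wait state (that is, it has not called "wait")
It has not been stopped (for example, it is not waiting to be terminated)

This is the classic definition. On Linux, the load average additionally counts processes in uninterruptible sleep, which are usually waiting on disk I/O.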

As a rule of thumb, system performance is good as long as the number of active processes per CPU core does not exceed 3. The figure is per core: on a quad-core host, a value below 12 in the load averages printed at the end of the uptime output means the load is not serious. If the value reaches 20, the system is very heavily loaded, and web scripts will be very slow to execute.
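The per-core rule of thumb can be written down as a small Python helper. This is a sketch only: the name load_severity and the labels it returns are invented here, and the two thresholds (3 per core, and 5 per core from the "20 on four cores" example) are just the article's rough guidance.

```python
def load_severity(load_average, cpu_cores, ok_per_core=3.0, severe_per_core=5.0):
    """Rate a load average relative to the number of CPU cores.

    The default thresholds are assumptions taken from the rule of
    thumb above: below 3 per core is fine, 5 per core is severe.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    per_core = load_average / cpu_cores
    if per_core < ok_per_core:
        return "ok"
    if per_core < severe_per_core:
        return "high"
    return "severe"


print(load_severity(11.5, 4))  # prints: ok
print(load_severity(20.0, 4))  # prints: severe
```

On a real host, os.cpu_count() from the standard library could supply the cpu_cores argument.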

dmesg | tail

# dmesg | tail
[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[1880957.563400] [...] 18694 [...] 246 [...]
[1880957.563408] [...] 18694 (perl) total-vm:1972392kB, anon-rss:1953348kB, file-[...]
[2320864.954447] [...] 7001. Dropping request. Check SNMP counters.

The command prints the last 10 lines of the kernel log. In this example you can see an OOM kill and a dropped TCP request. These messages can help you troubleshoot performance issues, so don't skip this step.
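When the kernel log is long, a small filter can pull out the kinds of messages mentioned above. The Python sketch below is an assumption-laden illustration: the pattern table and the name classify_kernel_lines are invented for this example, and real kernel messages vary between versions, so the patterns would need adjusting.

```python
import re

# Hypothetical patterns for the two kinds of message seen in the example.
PATTERNS = {
    "oom": re.compile(r"oom-killer|Out of memory", re.IGNORECASE),
    "tcp_drop": re.compile(r"Dropping request", re.IGNORECASE),
}


def classify_kernel_lines(lines):
    """Return (label, line) pairs for lines matching a known pattern."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
                break  # one label per line is enough here
    return hits


sample = [
    "[1880957.563150] perl invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0",
    "[2320864.954447] [...] 7001. Dropping request. Check SNMP counters.",
]
for label, line in classify_kernel_lines(sample):
    print(label, "|", line)
```

The input could come from the output of dmesg, split into lines.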

vmstat 2 10

vmstat is one of the most common Linux/Unix monitoring tools. It shows the state of a server at a given sampling interval, including CPU utilization, memory usage, swap activity, and I/O reads and writes.

# vmstat 2 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0      0 655992  18808 115428    0    0    10     2  ...  ...  0  0 ...  0  0
 0  0      0 655992  18816 115428    0    0     0   ...  ...   98  0  1 ...  0  0
 0  0      0 655992  18816 115428    0    0     0     0  ...  ...  0  1 ...  0  0
 0  0      0 655992  18824 115424    0    0     0    10  ...  ...  0  1 ...  0  0
 0  0      0 655992  18824 115428    0    0     0     0  ...  ...  0  0 ...  0  0

Here 2 means the server state is sampled every two seconds, and 5 means only five samples are collected.

In practice we usually monitor for an open-ended period and simply stop vmstat when we are done, for example:

# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 200889792  73708 591828    0    0     0     5    6   10  1  3  0  0
 0  0 200889920  73708 591860    0    0     0   592 13284 4282 98  1  1  0  0
 0  0 200890112  73708 591860    0    0     0     0 9501 2154  1  0  0  0  0
 0 200889568  73712 591856    0    0     0    48 11900 2459 [...]  0  0  0  0
 0  0 200890208  73712 591860    0    0     0     0 15898 4840 98  1  1  0  0

Here vmstat samples every 2 seconds and keeps going until the program is interrupted; I stopped it after five samples.

Field descriptions:

r: The run queue, that is, the number of processes running or waiting for CPU time. The server I am testing is currently idle, with nothing running. When this value exceeds the number of CPUs, there is a CPU bottleneck. It is also related to the load shown by top: as a general guide, a load above 3 is fairly high, above 5 is high, and above 10 is abnormal and means the server is in a dangerous state. The load in top is similar to the run queue sampled each second. A large run queue means the CPU is very busy, which generally shows up as high CPU usage.

b: The number of blocked processes, that is, processes that cannot run because they are waiting, typically for I/O.

swpd: The amount of virtual memory (swap) in use. A value greater than 0 means the machine has run short of physical memory. If a memory leak in a program is not the cause, you should add memory or move memory-hungry tasks to other machines.

free: The amount of free physical memory. My machine has 8 GB in total, with 3415 MB remaining.

buff: Memory that Linux/Unix uses to cache directory contents, permissions, and similar metadata. On my machine this is a little over 300 MB.

cache: Memory used to cache the contents of the files we open. On my machine this is about 300 MB. (This is a smart feature of Linux/Unix: spare physical memory is used to cache files and directories to improve program performance, and when programs need memory, the buffers and cache are reclaimed quickly.)

si: The amount of memory swapped in from disk per second. A value greater than 0 means physical memory is insufficient or memory is leaking, and you should track down the process that is consuming it. My machine has plenty of memory, so everything is fine.

so: The amount of memory swapped out to disk per second. If this value is greater than 0, the same applies as for si.

bi: The number of blocks received from block devices per second, where block devices means all disks and other block devices on the system. The default block size is 1024 bytes. There is no I/O on this machine, so the value stays at 0, but on a machine copying a large amount of data (2-3 TB) I have seen it reach 140000/s, which is a disk throughput of almost 140 MB per second.

bo: The number of blocks sent to block devices per second. For example, when we write a file, bo is greater than 0. bi and bo should generally be close to 0; otherwise I/O is too frequent and needs tuning.

in: The number of CPU interrupts per second, including clock interrupts.

cs: The number of context switches per second. Context switches happen when we call system functions, when threads switch, and when processes switch. The smaller this value the better. If it is too large, consider reducing the number of threads or processes. For example, with web servers such as Apache and Nginx, performance tests usually run thousands or even tens of thousands of concurrent requests. You can lower the peak number of server processes or threads step by step and repeat the load test until cs drops to a relatively small value; that number of processes and threads is then a reasonable setting. System calls matter as well: every time a system function is called, our code enters kernel space, which causes a context switch. This is expensive, so frequent system calls should also be avoided. Too many context switches mean that most of the CPU is wasted on switching, leaving less time for real work, so the CPU is not used effectively.

us: User CPU time. On a server where I once ran encryption and decryption very frequently, us was close to 100 and the r run queue reached 80 (the machine was under a stress test and performing poorly).

sy: System CPU time. If it is too high, system calls are taking a long time, for example because I/O operations are frequent.

id: Idle CPU time. In general, id + us + sy = 100. I usually think of id as the idle CPU percentage, us as the user CPU percentage, and sy as the system CPU percentage.

wa: CPU time spent waiting for I/O.

si, so: The amounts of memory swapped in and swapped out. If these values are not 0, the system is actively using swap and the machine is short of physical memory.
us, sy, id, wa, st: These break down CPU time into user time (user), system (kernel) time (sys), idle time (idle), I/O wait time (wait), and stolen time (stolen, typically consumed by other virtual machines).
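The bi and bo figures are counts of blocks, so turning them into a throughput is simple arithmetic. The Python sketch below works through the 140000 blocks per second example from the bi description, assuming the 1024-byte block size stated there; the function name blocks_to_mb_per_s is made up for this illustration.

```python
def blocks_to_mb_per_s(blocks_per_s, block_size=1024):
    """Convert a vmstat bi/bo rate to megabytes per second.

    block_size defaults to the 1024-byte block mentioned in the text;
    1 MB is taken as 1,000,000 bytes.
    """
    return blocks_per_s * block_size / 1_000_000


# 140000 blocks/s * 1024 bytes = 143,360,000 bytes/s
print(blocks_to_mb_per_s(140000))  # prints: 143.36
```

This is consistent with the figure of roughly 140 MB per second quoted for that machine.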

These CPU times let us quickly see whether the CPU is busy. In general, if user time plus system time is very large, the CPU is busy executing instructions. If I/O wait time is high, the bottleneck may be disk I/O.

In the sample output, a large share of CPU time is spent in user mode, meaning user applications are consuming the CPU. This is not necessarily a performance problem and needs to be analyzed together with the r run queue.
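To tie the field descriptions together, here is a minimal Python sketch that parses one vmstat data row and applies the checks discussed above (run queue versus CPU count, swap activity, I/O wait). The function names, the 20 percent I/O wait limit, and both sample rows are assumptions for illustration; they are not taken from the outputs in this article.

```python
VMSTAT_FIELDS = [
    "r", "b", "swpd", "free", "buff", "cache", "si", "so",
    "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st",
]


def parse_vmstat_row(row):
    """Turn one 17-column vmstat data row into a dict keyed by field name."""
    values = [int(value) for value in row.split()]
    if len(values) != len(VMSTAT_FIELDS):
        raise ValueError("expected %d columns, got %d"
                         % (len(VMSTAT_FIELDS), len(values)))
    return dict(zip(VMSTAT_FIELDS, values))


def vmstat_warnings(sample, cpu_count, wa_limit=20):
    """Apply the rules of thumb from the text; wa_limit is a made-up threshold."""
    warnings = []
    if sample["r"] > cpu_count:
        warnings.append("run queue exceeds CPU count")
    if sample["si"] > 0 or sample["so"] > 0:
        warnings.append("swapping in progress")
    if sample["wa"] > wa_limit:
        warnings.append("high I/O wait")
    return warnings


# Hypothetical rows, not copied from the article's output.
busy = parse_vmstat_row("34 0 0 200889792 73708 591828 0 0 0 5 6 10 96 1 3 0 0")
print(vmstat_warnings(busy, cpu_count=32))

swapping = parse_vmstat_row("1 0 2008 89792 73708 591828 4 0 0 5 6 10 10 5 45 40 0")
print(vmstat_warnings(swapping, cpu_count=4))
```

A high us value with no warnings would match the case described above: busy, but not necessarily a problem unless the run queue is also long.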

mpstat

mpstat is short for Multiprocessor Statistics and is a real-time system monitoring tool. It reports CPU statistics, which are read from the /proc/stat file. On a multi-CPU system it can show not only the average status of all CPUs but also the information for a specific CPU.

Syntax:

mpstat [-P {cpu | ALL}] [interval [count]]

Parameters:

(1) -P {cpu | ALL}: which CPU to monitor; cpu takes a value in [0, number of CPUs - 1];

(2) interval: the time between two adjacent samples;

(3) count: the number of samples; count can only be used together with interval;

Note: With no parameters, mpstat displays the averages since the system started. With an interval, the first line is the average since the system started, and from the second line on, each line is the average for the preceding interval.

# mpstat -P ALL 1
Linux 3.13.0-[...]-generic (TITANCLUSTERS-XXXXX)  [...]  _x86_64_  ([...] CPU)

[...] PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft  %steal  %guest  %gnice  %idle
[...] PM  all  98.47   0.00   0.75    0.00   0.00   0.00    0.00    0.00    0.00   0.78
[...] PM    0  96.04   0.00   2.97    0.00   0.00   0.00    0.00    0.00    0.00   0.99
[...] PM    1  97.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   2.00
[...] PM    2  98.00   0.00   1.00    0.00   0.00   0.00    0.00    0.00    0.00   1.00
[...] PM    3  96.97   0.00   0.00    0.00   0.00   0.00    0.00    0.00    0.00   3.03
[...]

(1) %usr: the percentage of CPU time spent in user mode during the interval, excluding niced processes; the value is (usr/total)*100;

(2) %nice: the percentage of CPU time spent in user mode by niced processes (those with a positive nice value, that is, lowered priority) during the interval; the value is (nice/total)*100;

(3) %sys: the percentage of CPU time spent in the kernel during the interval; the value is (system/total)*100;

(4) %iowait: the percentage of time spent waiting for disk I/O during the interval; the value is (iowait/total)*100;

(5) %irq: the percentage of time spent servicing hardware interrupts during the interval; the value is (irq/total)*100;

(6) %soft: the percentage of time spent servicing software interrupts during the interval; the value is (softirq/total)*100;

(7) %idle: the percentage of time during the interval in which the CPU was idle for reasons other than waiting for disk I/O; the value is (idle/total)*100;

(8) intr/s: the number of interrupts received by the CPU per second during the interval; the value is intr/interval;

This command shows the utilization of each CPU. If a single CPU shows particularly high utilization, a single-threaded application may be the cause.
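The (field/total)*100 formulas above can be illustrated with two snapshots of a CPU line from /proc/stat. The Python below is a simplified sketch, not how mpstat itself is implemented: it uses only the first eight counters, ignores the guest columns, and the counter values in the sample lines are invented.

```python
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse a 'cpu' or 'cpuN' line from /proc/stat into (name, counters)."""
    parts = line.split()
    counters = [int(value) for value in parts[1:1 + len(CPU_FIELDS)]]
    return parts[0], dict(zip(CPU_FIELDS, counters))


def cpu_percentages(before, after):
    """Compute (delta/total)*100 for each field between two snapshots."""
    deltas = {
        field: after.get(field, 0) - before.get(field, 0)
        for field in CPU_FIELDS
    }
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {field: 100.0 * delta / total for field, delta in deltas.items()}


# Hypothetical counters for one CPU, sampled one interval apart.
_, first = parse_cpu_line("cpu0 1000 0 200 8000 50 0 10 0 0 0")
_, second = parse_cpu_line("cpu0 1098 0 201 8001 50 0 10 0 0 0")
percentages = cpu_percentages(first, second)
print(percentages["user"], percentages["system"], percentages["idle"])
```

Running the same calculation for every cpuN line would show whether one CPU is far busier than the rest, which is the single-threaded pattern mentioned above.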


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
