Objective
This article covers common Linux performance anomalies and how to trace them down to the process level. Put simply: when a Linux system has a performance problem, we need to determine which processes are causing it.
This article deals primarily with the common performance dimensions of Linux: CPU, memory, disk I/O, and network.
Tools Involved
top: general-purpose, leaning toward CPU and memory
dstat: general-purpose, leaning toward disk
iostat: disk I/O, global
iotop: disk I/O, down to the process (pidstat is a similar tool)
iftop: network, real-time refresh (similar tools: nload, ifstat)
nethogs: process-level traffic
ss: network, fast, low resource consumption (replaces netstat)
pidstat: general-purpose
free: memory usage ...
CPU
For CPU, the performance metrics to focus on are:
(1) CPU utilization: user, system, etc.
(2) Cumulative CPU time
(3) Interrupts, context switches, etc. (rarely needed)
Many tools report CPU metrics; this section mainly covers top and dstat, with a brief look at pidstat.
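To make the "user" and "system" utilization figures less of a black box, here is a minimal Python sketch of how such percentages can be derived from two samples of the aggregate cpu line in /proc/stat. The sample lines and the function names (parse_cpu_line, cpu_percentages) are made up for illustration; this is not how top or dstat is actually implemented, only the same basic idea.

```python
# Tick counters on the aggregate "cpu" line of /proc/stat, in order.
FIELD_NAMES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into a dict of tick counters."""
    fields = line.split()
    return dict(zip(FIELD_NAMES, (int(v) for v in fields[1:1 + len(FIELD_NAMES)])))


def cpu_percentages(before, after):
    """Share of each CPU state between two samples, in percent."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total == 0:
        return {name: 0.0 for name in delta}
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


# Hypothetical samples taken a moment apart (not from a real machine).
SAMPLE_T0 = "cpu  1000 0 500 8000 100 0 0 0"
SAMPLE_T1 = "cpu  1100 0 550 8800 150 0 0 0"

if __name__ == "__main__":
    pct = cpu_percentages(parse_cpu_line(SAMPLE_T0), parse_cpu_line(SAMPLE_T1))
    for name in ("user", "system", "iowait", "idle"):
        print(f"{name:>7}: {pct[name]:.1f}%")
```

On a real system the two lines would come from reading /proc/stat twice with a short sleep in between; the counters only grow, so only the difference between samples is meaningful.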
1. top
I will not go through top's output line by line; it is easy to use. For tracking down high CPU usage, the key interactive commands are P and T.
Press P (uppercase): sort by CPU usage
Press T (uppercase): sort by cumulative CPU time, to see which processes have consumed the most CPU over their lifetime
This way we can find which process is consuming too much CPU.
In the example screenshot, ntpd has consumed the most cumulative CPU time, and its current CPU usage is also the highest (though actually only 1.3%).
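The difference between sorting with P and sorting with T can be shown with a small Python sketch. It assumes top's TIME+ column is in the common minutes:seconds.hundredths form; apart from ntpd's 1.3%, every process name and number below is invented for illustration.

```python
def time_plus_to_seconds(text):
    """Convert a TIME+ value such as '5:32.10' (min:sec.hundredths) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


# Hypothetical snapshot of a few rows from top.
PROCESSES = [
    {"command": "backup.sh", "cpu": 45.0, "time_plus": "0:30.02"},
    {"command": "ntpd", "cpu": 1.3, "time_plus": "95:12.40"},
    {"command": "zabbix_agentd", "cpu": 0.3, "time_plus": "40:03.15"},
    {"command": "puppet", "cpu": 0.0, "time_plus": "12:47.90"},
]


def sort_by_cpu(procs):
    """Like pressing P: highest current CPU usage first."""
    return sorted(procs, key=lambda p: p["cpu"], reverse=True)


def sort_by_time(procs):
    """Like pressing T: highest cumulative CPU time first."""
    return sorted(procs, key=lambda p: time_plus_to_seconds(p["time_plus"]), reverse=True)


if __name__ == "__main__":
    print("by %CPU :", [p["command"] for p in sort_by_cpu(PROCESSES)])
    print("by TIME+:", [p["command"] for p in sort_by_time(PROCESSES)])
```

The two orderings answer different questions: the first finds what is busy right now, the second finds what has been expensive over the long run, which is why it is worth checking both.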
You can also use the following handy keys inside top:
1: toggle per-core display
m: toggle the memory summary
M: sort by memory
H (Shift+h): toggle thread mode
x: highlight the sort column (press b first)
Shift+< or Shift+>: change the sort column
2. dstat
dstat is also a fairly comprehensive tool.
The following options show which process has the highest CPU usage:
dstat -lcm --top-cpu
In the example, ntpd (the time service) and the Zabbix agent use the most CPU (in fact not much, just 0.x%).
3. pidstat
Running pidstat (or pidstat -l, which also prints the full command line) shows per-process CPU usage data.
Memory
For memory, we generally focus on two indicators:
(1) Whether a large amount of swap space is in use: heavy swap usage indicates a problem (insufficient memory, or a misbehaving process?)
(2) How much memory each process consumes
The simplest tool for viewing memory is free, along with dstat and top.
1. free
In the example, free -m shows that swap is unused and memory is plentiful.
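The swap check can also be scripted. The Python sketch below parses text in the format of /proc/meminfo and reports the percentage of swap in use. The two sample texts are invented, and the function names are only illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def swap_used_percent(info):
    """Percentage of swap space in use; 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info["SwapFree"]) / total


# Hypothetical samples: a healthy machine and one that is swapping heavily.
HEALTHY = """MemTotal:        8000000 kB
MemFree:         3000000 kB
SwapTotal:       2000000 kB
SwapFree:        2000000 kB
"""
SWAPPING = """MemTotal:        8000000 kB
MemFree:          100000 kB
SwapTotal:       2000000 kB
SwapFree:         500000 kB
"""

if __name__ == "__main__":
    for label, text in (("healthy", HEALTHY), ("swapping", SWAPPING)):
        print(f"{label}: {swap_used_percent(parse_meminfo(text)):.1f}% swap used")
```

On a live system the text would come from reading /proc/meminfo. What counts as "too much" swap depends on the workload, so any alert threshold is a judgment call rather than a fixed rule.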
2. dstat
To see which process consumes the most memory, run:
dstat -lcmd --top-mem
In the example, puppet consumes the most memory, 59M (normal).
3. top
After starting top, press b, then x, then Shift+> or Shift+< to move the sort column to the memory column and see processes ordered by memory consumption.
Disk
For disk, we generally focus on two indicators:
(1) Read/write volume per second: dstat and iostat (global), iotop or pidstat (process level)
(2) Latency of each disk read/write: iostat (global)
dstat and iostat show the global I/O picture; to get down to the process, use iotop or pidstat.
dstat shows the amount read from and written to disk per second (the units are already scaled, so this is the tool I generally use for read/write volume).
iostat shows disk read and write volume as well as I/O latency (this is the tool I generally use for latency).
1. dstat
dstat shows how much data (in B, K, or M) is read and written per second.
Other commonly used dstat option combinations:
dstat -g -l -m -s --top-mem
dstat -c -y -l --proc-count --top-cpu
2. iostat
iostat shows similar information, in more detail.
svctm is normally less than await, because await also includes the time requests spend waiting in the queue.
If svctm is close to await, I/O has almost no waiting time.
If await is much larger than svctm, the I/O queue is too long.
await: the average wait time (in milliseconds) of each device I/O operation, i.e. delta(ruse+wuse)/delta(rio+wio)
svctm: the average service time (in milliseconds) of each device I/O operation, i.e. delta(use)/delta(rio+wio)
%util: the fraction of each second spent on I/O operations, in other words the fraction of time the I/O queue is non-empty, i.e. delta(use)/s/1000 (because use is measured in milliseconds)
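The three formulas above can be worked through in a few lines of Python. This sketch keeps the article's names (rio, wio, ruse, wuse, use) for the per-device counters, which on Linux are the kind of values exposed in /proc/diskstats. The DiskSample type, the io_metrics function, and the sample numbers are assumptions made for illustration, not iostat's actual code.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    rio: int   # read requests completed
    wio: int   # write requests completed
    ruse: int  # milliseconds spent on reads
    wuse: int  # milliseconds spent on writes
    use: int   # milliseconds during which the device had I/O in progress


def io_metrics(before, after, interval_s):
    """Return (await_ms, svctm_ms, util_percent) for the interval between two samples."""
    ios = (after.rio - before.rio) + (after.wio - before.wio)
    wait_ms = (after.ruse - before.ruse) + (after.wuse - before.wuse)
    busy_ms = after.use - before.use
    await_ms = wait_ms / ios if ios else 0.0
    svctm_ms = busy_ms / ios if ios else 0.0
    # delta(use)/s/1000 is a fraction of the interval; scale by 100 for a percentage.
    util_percent = 100.0 * busy_ms / (interval_s * 1000.0)
    return await_ms, svctm_ms, util_percent


# Hypothetical counters sampled one second apart.
T0 = DiskSample(rio=1000, wio=2000, ruse=5000, wuse=20000, use=8000)
T1 = DiskSample(rio=1100, wio=2300, ruse=5800, wuse=27200, use=8400)

if __name__ == "__main__":
    await_ms, svctm_ms, util = io_metrics(T0, T1, interval_s=1.0)
    print(f"await={await_ms:.1f} ms  svctm={svctm_ms:.1f} ms  util={util:.1f}%")
```

With these made-up numbers, await (20 ms) is far above svctm (1 ms) while the device is only 40% busy, which is the "queue is too long" pattern described above.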
3. iotop
iotop shows directly which processes use the most I/O; just run the iotop command.
4. pidstat
pidstat -d also shows which processes read and write the most disk data.
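For a rough idea of where per-process disk figures come from, the Python sketch below computes read and write rates from two samples of /proc/<pid>/io, using its read_bytes and write_bytes counters, and ranks processes by total throughput. The sample texts are abbreviated and invented, the process names are placeholders, and the functions are illustrative rather than a description of pidstat's internals.

```python
def parse_proc_io(text):
    """Parse /proc/<pid>/io-style 'key: value' lines into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def io_rates_kb(before, after, interval_s):
    """Return (read_kB_per_s, write_kB_per_s) between two counter samples."""
    read_rate = (after["read_bytes"] - before["read_bytes"]) / 1024.0 / interval_s
    write_rate = (after["write_bytes"] - before["write_bytes"]) / 1024.0 / interval_s
    return read_rate, write_rate


def rank_by_io(samples, interval_s):
    """samples maps a process name to (text_before, text_after); busiest first."""
    rows = []
    for name, (t0, t1) in samples.items():
        rd, wr = io_rates_kb(parse_proc_io(t0), parse_proc_io(t1), interval_s)
        rows.append((name, rd, wr))
    return sorted(rows, key=lambda row: row[1] + row[2], reverse=True)


# Hypothetical samples taken two seconds apart (only the relevant lines shown).
SAMPLES = {
    "ntpd": ("read_bytes: 0\nwrite_bytes: 4096\n",
             "read_bytes: 0\nwrite_bytes: 6144\n"),
    "rsync": ("read_bytes: 4096\nwrite_bytes: 8192\n",
              "read_bytes: 1052672\nwrite_bytes: 212992\n"),
}

if __name__ == "__main__":
    for name, rd, wr in rank_by_io(SAMPLES, interval_s=2.0):
        print(f"{name:<8} read={rd:8.1f} kB/s  write={wr:8.1f} kB/s")
```

Reading another user's /proc/<pid>/io generally requires elevated privileges, which is one reason to run tools like iotop as root.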
Network
For the network, we mainly focus on these performance indicators:
(1) Inbound and outbound traffic
(2) Connection states: ESTABLISHED, TIME_WAIT, and so on
(3) Local listening ports, the number of ports in use, etc.
Three tools are introduced next.
1. ss
The ss command works much like netstat and is used to view network connection status.
ss consumes few resources and is faster than netstat. The reason, quoting another source: "The secret of ss is that it uses tcp_diag in the TCP protocol stack. tcp_diag is a module for analysis and statistics that obtains first-hand information from the Linux kernel, which is what makes ss fast and efficient. If tcp_diag is not available on your system, ss still runs normally, just somewhat slower."
In the example, there are currently 33 established connections and 866 in TIME_WAIT (worth considering some tuning).
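Counting connections per state is easy to script as well. The Python sketch below tallies the first column of output in the style of ss -tan (TCP sockets, all states, numeric addresses), where ss abbreviates the states as ESTAB, TIME-WAIT, and so on. The sample output uses documentation addresses and is invented; count_states is just an illustrative name.

```python
from collections import Counter


def count_states(ss_output):
    """Count sockets per state from 'ss -tan'-style text (first line is the header)."""
    counts = Counter()
    for line in ss_output.strip().splitlines()[1:]:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


# Hypothetical output; a real listing would be much longer.
SAMPLE_SS = """State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN     0      128    0.0.0.0:22           0.0.0.0:*
ESTAB      0      0      192.0.2.10:22        198.51.100.7:51234
ESTAB      0      0      192.0.2.10:80        198.51.100.8:40112
TIME-WAIT  0      0      192.0.2.10:80        198.51.100.9:40200
TIME-WAIT  0      0      192.0.2.10:80        198.51.100.9:40201
TIME-WAIT  0      0      192.0.2.10:80        198.51.100.9:40202
"""

if __name__ == "__main__":
    for state, number in count_states(SAMPLE_SS).most_common():
        print(f"{state:<10} {number}")
```

A large and growing TIME-WAIT count relative to ESTAB, as in the article's 866 versus 33, is the kind of ratio this tally makes easy to watch over time.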
ss can also show which ports are open (listening) locally.
2. iftop
With iftop, you can see which IPs or domain names exchange the most traffic with this machine.
Just run iftop to see it.
The main iftop options are described below:
-i: set the network interface to monitor, e.g. # iftop -i eth1
-B: show traffic in bytes (bits is the default), e.g. # iftop -B
-n: show host information as IP addresses instead of resolving names, e.g. # iftop -n
-N: show port numbers directly instead of service names, e.g. # iftop -N
-F: show traffic for a specific network, e.g. # iftop -F 10.10.1.0/24 or # iftop -F 10.10.1.0/255.255.255.0
-h: help; displays the option summary
3. nethogs
nethogs shows process-level traffic directly.
Running nethogs em2 (where em2 is the network interface) shows the traffic between local ports and remote ports, so you can tell which processes consume a lot of network traffic.
Summary
Across the CPU, memory, disk, and network dimensions, the tools above show how much of each resource individual processes consume, which is very helpful for tracing a system anomaly down to the process level.
Now for the next question: once a performance anomaly has been traced to a process, how do you go further, even down to the code level (when the problem is in the code rather than the hardware)?
Good operations engineers will not stop there; they need to dig deeper ^_^
This article is from the "H2O's Operation & Development Road" blog; for reprints, please contact the author.
Tracing Linux performance anomalies to the process level