Locating Linux Performance Anomalies at the Process Level

Last Update:2015-03-17 Source: Internet

Author: User

Tags cpu usage


Objective

This article covers common performance anomalies on Linux systems and how to trace them to the process level. Put simply: when Linux performance degrades, we need to determine which processes are responsible.

This article covers the main performance dimensions on Linux: CPU, memory, disk I/O, and network.

Tools covered

top: general-purpose, mainly CPU and memory

dstat: general-purpose, mainly disk

iostat: disk I/O, system-wide

iotop: disk I/O, per process (pidstat is a similar tool)

iftop: network traffic, refreshed in real time (nload and ifstat are similar tools)

nethogs: per-process network traffic

ss: network connections; fast and light on resources (a replacement for netstat)

pidstat: general-purpose, per process

free: memory usage

CPU

The CPU metrics of interest are:

(1) CPU utilization: user, system, and so on

(2) Cumulative CPU time

(3) Interrupts, context switches, and so on (used less often)

Many tools report CPU metrics; this section mainly covers top and dstat, with a brief look at pidstat.
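To make the user/system breakdown concrete, here is a minimal Python sketch of how utilization percentages can be derived from two readings of the aggregate cpu line in /proc/stat. The function names and the sample numbers are made up for illustration, and this is not necessarily how top or dstat compute their figures internally.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    values = [int(v) for v in line.split()[1:1 + len(names)]]
    return dict(zip(names, values))


def cpu_percentages(prev, curr):
    """Return each counter's share of the ticks elapsed between two samples."""
    deltas = {k: curr[k] - prev[k] for k in curr}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * v / total for k, v in deltas.items()}


if __name__ == "__main__":
    # Two made-up samples; on a real system, read the first line of
    # /proc/stat twice with a short sleep in between.
    before = parse_cpu_line("cpu  1000 0 500 8000 100 0 0 0")
    after = parse_cpu_line("cpu  1300 0 600 8500 200 0 0 0")
    for name, pct in cpu_percentages(before, after).items():
        print(f"{name:8s} {pct:5.1f}%")
```

The counters are cumulative since boot, so only the difference between two samples says anything about current load.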

1. top

I will not walk through each row of top's output, since it is easy to read. When CPU usage is too high, the two most useful interactive keys are P and T.

P (uppercase P): sort by CPU usage

T (uppercase T): sort by cumulative CPU time, to see which processes have consumed the most CPU over their lifetime

This way we can find which process is consuming too much CPU.

In the author's example (screenshot not reproduced here), the ntpd process had the most cumulative CPU time, and its current CPU usage was also the highest (though actually only 1.3%).
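The cumulative-time ranking can also be approximated in a script. The sketch below assumes the standard Linux /proc/<pid>/stat layout, where fields 14 and 15 hold user and system time in clock ticks. The function names are illustrative and are not part of top.

```python
import os


def parse_stat_line(line):
    """Return (pid, command, cpu_ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split around the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    rest = line[right + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    utime, stime = int(rest[11]), int(rest[12])
    return pid, comm, utime + stime


def top_cpu_time(limit=5, proc_root="/proc"):
    """List the processes with the most cumulative CPU time, in seconds."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    rows = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, ticks = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the line was unreadable
        rows.append((ticks / ticks_per_sec, pid, comm))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for seconds, pid, comm in top_cpu_time():
            print(f"{pid:>7} {comm:<20} {seconds:10.2f}s")
```

This mirrors what sorting by T in top shows: a process that is quiet now can still rank first if it has accumulated a lot of CPU time.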

top also offers other handy interactive keys:

1: toggle the per-CPU (multi-core) display

m (lowercase): toggle the memory summary display

M (uppercase): sort by memory usage

H (shift+h): toggle thread mode

x: highlight the sort column (press b first)

shift+< or shift+>: change the sort column

2. dstat

dstat is another general-purpose tool.

The following options show the process with the highest CPU usage:

dstat -lcm --top-cpu

In the author's example (screenshot not reproduced here), ntpd (the time service) and the Zabbix agent consumed the most CPU (in fact not much, only 0.x%).

3. pidstat

Running pidstat on its own (or pidstat -l, which also shows the full command line) displays CPU usage data for each process.

Memory

For memory, we generally focus on these indicators:

(1) Whether a large amount of swap space is in use. If so, something is wrong: is memory insufficient, or is a process misbehaving?

(2) How much memory each process consumes.

The simplest tool for checking memory is free, supplemented by dstat and top.

1. free

With free -m, the author's example (screenshot not reproduced here) showed that swap was unused and memory was plentiful.
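The swap check can be scripted as well. The following Python sketch assumes the usual /proc/meminfo format of "Key: value kB" lines; the function names and the sample figures are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        parts = value.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def swap_used_percent(info):
    """Return the percentage of swap in use, or 0.0 if no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info["SwapFree"]) / total


if __name__ == "__main__":
    # Made-up sample; on a real system use open("/proc/meminfo").read().
    sample = (
        "MemTotal:        8000000 kB\n"
        "MemFree:         3000000 kB\n"
        "SwapTotal:       2000000 kB\n"
        "SwapFree:        1500000 kB\n"
    )
    print(f"swap used: {swap_used_percent(parse_meminfo(sample)):.1f}%")
```

A persistently high value is the signal described above: either memory is short or some process is misbehaving.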

2. dstat

To see which process consumes the most memory, run:

dstat -lcmd --top-mem

In the author's example (screenshot not reproduced here), puppet consumed the most memory, 59 MB (normal).

3. top

After starting top, press b, then x, and then use shift+> or shift+< to move the sort column to the memory column; processes are then sorted by memory consumption.

Disk

For disk, we generally focus on these indicators:

(1) Read/write volume per second: dstat and iostat (system-wide), iotop or pidstat (per process)

(2) Latency of each disk read/write: iostat (system-wide)

dstat shows the amount read from and written to disk per second, with units already scaled, so I generally use it to check disk throughput.

iostat shows disk reads and writes as well as I/O latency, so I generally use it to check latency.

1. dstat

dstat shows how many bytes, KB, or MB are read and written per second (screenshot not reproduced here).

Other commonly used dstat option combinations:

dstat -g -l -m -s --top-mem

dstat -c -y -l --proc-count --top-cpu

2. iostat

iostat shows similar information in more detail (screenshot not reproduced here). The key columns are:

await: the average time, in milliseconds, that each I/O operation on the device takes, including time spent waiting. Computed as delta(ruse + wuse) / delta(rio + wio).

svctm: the average service time, in milliseconds, of each I/O operation on the device. This time is also counted in await.

%util: the fraction of each second spent on I/O operations, that is, the fraction of time the I/O queue is non-empty. Computed as delta(use) / s / 1000, where s is the sampling interval in seconds and use is measured in milliseconds.

svctm is normally less than await, because await also counts the time requests spend waiting in the queue.

If svctm is close to await, I/O requests spend almost no time waiting.

If await is much larger than svctm, the I/O queue is too long.
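The await and %util formulas can be worked through on two samples of /proc/diskstats. This Python sketch assumes the standard field order of that file and reuses the counter names from the formulas (rio, ruse, wio, wuse, use). The function names and sample lines are made up for illustration.

```python
def parse_diskstats_line(line):
    """Pick the counters used by the formulas out of one /proc/diskstats line."""
    f = line.split()
    return {
        "name": f[2],
        "rio": int(f[3]),    # reads completed
        "ruse": int(f[6]),   # milliseconds spent reading
        "wio": int(f[7]),    # writes completed
        "wuse": int(f[10]),  # milliseconds spent writing
        "use": int(f[12]),   # milliseconds the device had I/O in progress
    }


def io_metrics(prev, curr, interval_s):
    """Return (await in ms, %util) between two samples interval_s apart."""
    d_ios = (curr["rio"] - prev["rio"]) + (curr["wio"] - prev["wio"])
    d_wait = (curr["ruse"] - prev["ruse"]) + (curr["wuse"] - prev["wuse"])
    d_use = curr["use"] - prev["use"]
    await_ms = d_wait / d_ios if d_ios else 0.0
    # delta(use) / s / 1000 gives a fraction; multiply by 100 for a percentage.
    util_pct = 100.0 * d_use / (interval_s * 1000.0)
    return await_ms, util_pct


if __name__ == "__main__":
    # Two made-up samples of the same device, taken one second apart.
    prev = parse_diskstats_line("8 0 sda 1000 0 8000 2000 500 0 4000 3000 0 1500 5000")
    curr = parse_diskstats_line("8 0 sda 1100 0 8800 2600 600 0 4800 3800 0 1900 6400")
    await_ms, util_pct = io_metrics(prev, curr, interval_s=1)
    print(f"await={await_ms:.1f} ms  util={util_pct:.1f}%")
```

With these numbers, 200 operations account for 1400 ms of read and write time, so await is 7 ms, and the device was busy for 400 ms of the 1000 ms interval, so %util is 40.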

3. iotop

Running iotop directly lists the processes with the highest I/O usage (screenshot not reproduced here).

4. pidstat

pidstat -d also shows which processes read and write the most disk data (screenshot not reproduced here).
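For a rough idea of where per-process disk figures can come from, Linux exposes cumulative counters in /proc/<pid>/io (reading another user's file usually requires root). The Python sketch below turns two samples of that file into read and write rates. The function names and sample values are made up for illustration, and no claim is made that iotop or pidstat work exactly this way.

```python
def parse_proc_io(text):
    """Parse /proc/<pid>/io-style text into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            counters[key.strip()] = int(value)
    return counters


def io_rate_kb(prev, curr, interval_s):
    """Return (read KB/s, write KB/s) between two samples interval_s apart."""
    read_kb = (curr["read_bytes"] - prev["read_bytes"]) / 1024 / interval_s
    write_kb = (curr["write_bytes"] - prev["write_bytes"]) / 1024 / interval_s
    return read_kb, write_kb


if __name__ == "__main__":
    # Two made-up samples of one process, taken two seconds apart.
    first = parse_proc_io("read_bytes: 4096\nwrite_bytes: 8192\n")
    second = parse_proc_io("read_bytes: 1052672\nwrite_bytes: 10240\n")
    read_kb, write_kb = io_rate_kb(first, second, interval_s=2)
    print(f"read {read_kb:.1f} KB/s, write {write_kb:.1f} KB/s")
```

Sampling every process this way and sorting by rate gives a crude equivalent of the per-process view described above.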

Network

For the network, we generally focus on these indicators:

(1) Inbound and outbound traffic

(2) Connection states: ESTABLISHED, TIME_WAIT, and so on

(3) Local listening ports, the number of ports in use, and so on

Three tools are introduced next.

1. ss

The ss command views network connection states, much like netstat.

ss uses few resources and is faster than netstat. The reason, quoting a reposted explanation: "The secret of ss is that it uses tcp_diag in the TCP protocol stack. tcp_diag is a module for statistics analysis that obtains first-hand information from the Linux kernel, which is what makes ss fast and efficient. If tcp_diag is not available on your system, ss still works, but somewhat more slowly."

In the author's example (screenshot not reproduced here), there were 33 established connections and 866 in TIME_WAIT (worth considering for tuning).
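A per-state tally like the one above can be produced from a full connection listing. This Python sketch assumes output shaped like that of ss -ant, with a header row and the state in the first column. The function name and the sample rows are made up for illustration.

```python
from collections import Counter


def count_states(ss_output):
    """Count connections per state in 'ss -ant'-style output."""
    states = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "State":
            continue
        states[fields[0]] += 1
    return states


if __name__ == "__main__":
    # Made-up sample rows in the shape ss -ant prints.
    sample = """State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    *:22                *:*
ESTAB      0      0      10.0.0.1:22         10.0.0.2:51234
TIME-WAIT  0      0      10.0.0.1:80         10.0.0.3:40000
TIME-WAIT  0      0      10.0.0.1:80         10.0.0.4:40001
"""
    for state, count in count_states(sample).most_common():
        print(f"{state:<10} {count}")
```

On a live system, the input could be taken from subprocess.run(["ss", "-ant"], capture_output=True, text=True).stdout.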

A second example from the author (screenshot not reproduced here) showed which ports were open locally.

2. iftop

iftop shows which IP addresses or host names exchange the most traffic with this machine.

Run iftop directly to see this (screenshot not reproduced here).

The main iftop options are:

-i: select the network interface to monitor, for example: # iftop -i eth1

-B: show traffic in bytes (the default is bits), for example: # iftop -B

-n: show hosts as IP addresses instead of resolving host names, for example: # iftop -n

-N: show port numbers instead of resolving service names, for example: # iftop -N

-F: show traffic for a specific network, for example: # iftop -F 10.10.1.0/24 or # iftop -F 10.10.1.0/255.255.255.0

-h: display help, including option information

3. nethogs

nethogs shows process-level traffic directly.

Running nethogs em2 (em2 being the network interface) shows the traffic between local ports and remote ports together with the owning process, so you know which processes consume the most network traffic (screenshot not reproduced here).

Summary

The tools above show, across the CPU, memory, disk, and network dimensions, how much of each resource every process consumes. This is very helpful for tracing a system anomaly to the process level.

That leads to the next question: once a performance anomaly has been traced to a process, how do you go further, even down to the code level (when the fault is in the code rather than the hardware)?

Good operations staff and engineers will not stop there; they keep digging deeper ^_^

This article is from the "H2O's Operation & Development Road" blog. Please contact the author before reprinting.
