Linux System Performance Monitoring: CPU Utilization

Source: Internet
Author: User

When analyzing a system, one of the first and most basic measurements is often a simple check of its CPU utilization. Linux and most Unix-based operating systems provide a command, uptime, that displays the system's load average:

[huangc@v-02-01-00860 ~]$ uptime
 11:18:05 up ... days,  1:17,  1 user,  load average: 0.20, 0.13, 0.12

The load average is the average number of runnable tasks over the last 1, 5, and 15 minutes. Runnable tasks include tasks that are currently running and tasks that could run but are waiting for a processor to become free. On Linux, tasks in uninterruptible sleep (typically waiting for disk I/O) are also counted, and the "average" is an exponentially damped moving average rather than a simple mean. In this example the system has only two CPUs, which you can confirm by viewing the contents of /proc/cpuinfo:

[huangc@v-02-01-00860 ~]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping        : 4
cpu MHz         : 2500.000
cache size      : 25600 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat xsaveopt pln pts dts fsgsbase smep
bogomips        : 5000.00
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping        : 4
cpu MHz         : 2500.000
cache size      : 25600 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm ida arat xsaveopt pln pts dts fsgsbase smep
bogomips        : 5000.00
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
Here the load averages (0.20, 0.13, 0.12) are well below the number of processors (two). On average, the processors are doing less work than they can handle. At a higher level, the machine has less work to do than it could perform. As a rule of thumb, if uptime on a dual-CPU machine shows a load average below 2.00, the processors still have idle cycles to spare. On a 4-CPU machine the same holds for a load average below 4.00, and so on. However, the load average alone does not explain every problem.
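The same check can be scripted. The Python sketch below counts the processor entries in /proc/cpuinfo and divides each load average from /proc/loadavg by that count. It assumes the usual x86 layout of both files: one "processor : N" line per logical CPU, and the three averages as the first fields of /proc/loadavg. The function names are illustrative only.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float
    five: float
    fifteen: float


def count_processors(cpuinfo_text: str) -> int:
    """Count logical CPUs: each one begins with a 'processor : N' line."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


def parse_loadavg(text: str) -> LoadAvg:
    """Parse the 1-, 5- and 15-minute averages (first three fields of /proc/loadavg)."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return LoadAvg(one, five, fifteen)


def load_per_cpu(load: LoadAvg, ncpus: int) -> LoadAvg:
    """Divide each average by the CPU count; values below 1.0 mean idle cycles remain."""
    return LoadAvg(load.one / ncpus, load.five / ncpus, load.fifteen / ncpus)


def current_load_per_cpu() -> LoadAvg:
    """Linux only: read both files for the local machine."""
    with open("/proc/cpuinfo", encoding="utf-8") as f:
        ncpus = count_processors(f.read())
    with open("/proc/loadavg", encoding="utf-8") as f:
        load = parse_loadavg(f.read())
    return load_per_cpu(load, ncpus)
```

For the sample above, load_per_cpu(LoadAvg(0.20, 0.13, 0.12), 2) gives roughly 0.10, 0.065 and 0.06 per CPU, all well under 1.0. Python's standard os.getloadavg() and os.cpu_count() provide the same two inputs without parsing the files.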

Although uptime shows how much demand there is for the CPUs, it does not show what the system is doing or why it is busy. If the system's response time is acceptable to its users, there may be no reason to dig deeper into its performance.

Simple tools such as uptime are often the first shortcut users reach for when trying to explain a perceived slowdown. If the load average suggests that slow response times may be caused by one or more overloaded processors, many other tools can help narrow down the cause of the load.

To dig deeper into processor usage, the three tools described below each give a different view of CPU utilization: vmstat, top, and sar. Each focuses on a different aspect of system monitoring, but all of them show how the processors are currently being used. In particular, the next step is to determine where the processors spend their time:

- mainly in the operating system (kernel space),
- in applications (user space), or
- idle.

If the processors are idle, understanding why they are idle is the key to any further performance analysis. There are many possible reasons. The most obvious is that a process that should be doing work is not runnable. This may sound too obvious, but if some component of the workload (such as a particular process or task) is not running, performance can suffer. In some cases a component has a caching or fallback mechanism that lets applications keep running, although at reduced throughput. For example, Internet domain name service is often configured to query a local named daemon or an off-host service. Suppose a name server, such as the one in the first nameserver statement of /etc/resolv.conf, is not running. Each lookup may then wait for a time-out before other name servers are queried. To the user this looks like an unexplained delay in the application. Meanwhile, someone monitoring the system with uptime may see a load average that does not look high at all. In this case, however, vmstat output can help narrow down the problem.
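The fallback order described above comes from /etc/resolv.conf. The Python sketch below lists the nameserver entries in the order the resolver tries them, which helps when checking for a dead first server. It assumes glibc behavior as documented in resolv.conf(5):

- comment lines start with ';' or '#';
- at most three nameservers (MAXNS) are used;
- without options such as rotate, servers are tried in file order;
- the resolver waits up to its time-out (5 seconds by default) before moving to the next server.

The function names are illustrative.

```python
from typing import List

MAXNS = 3  # glibc uses at most three nameserver lines (see resolv.conf(5))


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return nameserver addresses in the order the resolver tries them."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers[:MAXNS]


def read_nameservers(path: str = "/etc/resolv.conf") -> List[str]:
    """Read and parse the local resolver configuration."""
    with open(path, encoding="utf-8") as f:
        return nameservers(f.read())
```

If the first address returned here does not answer, every lookup that reaches it pays roughly one time-out before the next server is asked.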
One, vmstat
vmstat is a real-time performance monitoring tool. Its data helps you spot unusual system activity, such as excessive page faults or context switches, that can degrade performance. You can set how often it reports. Sample vmstat output:

[huangc@v-02-01-00860 ~]$ vmstat 
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0 3853948 1386860  43092 5049692    1    6     5     1 ...

vmstat reports the following information.

The procs section gives two counts at the time of the report: runnable processes, whether running or waiting to run (r), and processes blocked in uninterruptible sleep (b). Use these figures to check whether the numbers of running and blocked processes match expectations. If they do not, examine the application and kernel parameters, the process scheduler and I/O scheduler, the distribution of processes across the available processors, and so on.

The memory section reports the following, in kilobytes (KB):

- swpd: virtual memory swapped out
- free: free memory
- buff: memory used as buffers for I/O data structures
- cache: memory used as a cache for files read from disk

The value of swpd reflects kswapd activity.

The swap section gives the amount of memory swapped in from disk (si) and swapped out to disk (so), in KB/s. so reflects kswapd activity as data is written out to the swap area, while si reflects page faults as pages are swapped back into physical memory.

The io section gives the number of blocks received from a block device (bi) and sent to a block device (bo) per second. Pay particular attention to these two values when running I/O-intensive applications. The system section gives the number of interrupts per second (in) and context switches per second (cs).
The cpu section breaks total CPU time into percentages:

- us: time spent in user code
- sy: time spent in the kernel
- id: truly idle time
- wa: time waiting for I/O to complete
- st: on virtual machines, time stolen by the hypervisor

CPU utilization is probably the most commonly used metric. If wa is high, examine the I/O subsystem, for example to decide whether more I/O controllers and disks are needed to reduce I/O latency.
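These percentages are derived from the kernel's cumulative CPU time counters in /proc/stat. The Python sketch below takes two samples of the aggregate cpu line and turns the differences into us/sy/id/wa/st percentages. It is a rough illustration, not vmstat's actual source. It assumes the field order documented in proc(5): user, nice, system, idle, iowait, irq, softirq, steal, followed by guest fields that are already counted in user and nice. Grouping nice into us and irq/softirq into sy is an assumption about how to bucket them.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict:
    """Map the first eight counters of a /proc/stat 'cpu' line to their names."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in parts[1:]] + [0] * len(FIELDS)  # pad older kernels
    # Guest time (fields 9 and 10) is already included in user/nice, so zip drops it.
    return dict(zip(FIELDS, values))


def cpu_percentages(before: str, after: str) -> dict:
    """us/sy/id/wa/st percentages of CPU time between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in FIELDS}
    total = sum(delta.values()) or 1  # avoid dividing by zero on identical samples

    def pct(ticks: int) -> float:
        return 100.0 * ticks / total

    return {
        "us": pct(delta["user"] + delta["nice"]),
        "sy": pct(delta["system"] + delta["irq"] + delta["softirq"]),
        "id": pct(delta["idle"]),
        "wa": pct(delta["iowait"]),
        "st": pct(delta["steal"]),
    }
```

To use it, read the first line of /proc/stat, sleep for the desired interval, read it again, and pass both lines in.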
Note that uptime gives another view of the number of runnable processes, averaged over three time ranges (1, 5, and 15 minutes). Therefore, if uptime reports a load average of about 1 over some time range, the number of runnable processes reported by vmstat (r) should also be close to 1 during that period.
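To compare vmstat's r column with uptime's load average over a run, you can parse the output by column name. The Python sketch below assumes the default vmstat layout shown above: a banner line, then a header line starting with r, then numeric rows. The helper names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def parse_vmstat(output: str) -> List[Dict[str, int]]:
    """Turn vmstat text into one {column: value} dict per sample line."""
    header: List[str] = []
    samples = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            header = fields  # column-name line; vmstat repeats it periodically
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(header, map(int, fields))))
    return samples


def run_queue_stats(output: str) -> Tuple[float, int]:
    """Mean and peak of the runnable-task column r across all samples."""
    r_values = [sample["r"] for sample in parse_vmstat(output)]
    return mean(r_values), max(r_values)
```

The input can come from subprocess.run(["vmstat", "5", "10"], capture_output=True, text=True).stdout. A mean near the load average with an occasional much higher peak is the expected pattern.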
vmstat can also report repeatedly at a fixed interval, which gives a dynamic view of the system:
vmstat 5 10
This command prints vmstat information every 5 seconds, 10 times in total. If uptime shows a load average of 1 over the past 1, 5, or 15 minutes, each output line would normally show about one runnable task. Occasional spikes to 5, 7, or even 20 in the vmstat output are not surprising, because the load average is a computed average rather than an instantaneous snapshot. Each view has its own advantages for performance analysis.
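You can see numerically why a spike barely shows in the load average. The Python below is a floating-point sketch of the exponentially damped average that Linux uses. Every 5 seconds it applies load = load * e + n * (1 - e), with e = exp(-5/60) for the 1-minute figure. The kernel does the same calculation in fixed-point arithmetic, where its constant EXP_1 = 1884 approximates 2048 * e. The run-queue samples are made up for illustration.

```python
import math
from typing import Iterable, List


def damped_load(samples: Iterable[float], window_seconds: float = 60.0,
                interval_seconds: float = 5.0, start: float = 0.0) -> List[float]:
    """Exponentially damped moving average of run-queue samples.

    Each step applies load = load * e + n * (1 - e), where e = exp(-interval / window).
    """
    decay = math.exp(-interval_seconds / window_seconds)
    load = start
    history = []
    for n in samples:
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history


# Hypothetical run queue: one runnable task, with a single 5-second spike to 20.
queue = [1] * 5 + [20] + [1] * 5
one_minute = damped_load(queue, window_seconds=60, start=1.0)
fifteen_minute = damped_load(queue, window_seconds=900, start=1.0)
```

After the spike, the 1-minute value peaks at only about 2.5, and the 15-minute value moves even less. This is why vmstat can show 20 runnable tasks while uptime still reports a low figure.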
Suppose a user reports slow response times for a workload. Checking with uptime shows a very low load average, perhaps even below the established baseline. vmstat also shows very few runnable tasks, and the high percentage of CPU idle time indicates that the system is relatively idle. An analyst might conclude that a key process has exited or is blocked waiting for an event that has not completed. For example, some applications use a semaphore to hand out work and then wait for it to finish. Perhaps the work was handed to a back-end server or another application that has stopped processing for some reason. As a result, the application closest to the user blocks, becomes non-runnable, and waits for a completion message before it can return information to the user. The system administrator would then focus on the server application to find out why the requests queued to it are not being completed.

In another scenario, suppose the load average is above 1, possibly above the established baseline. vmstat shows that one or two processes are always runnable, and the percentage of user time stays near 100% over an extended period. Other tools, such as ps(1) or top(1), may then be needed to find which processes are consuming the CPU. ps(1) lists all current processes, or a subset selected by command options. top(1) (or gtop(1)) gives a continuously updated view of the most active processes, that is, the processes currently consuming the most CPU time. This data can help identify runaway processes that consume CPU without doing useful work.
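The ps(1) step might be sketched in Python as below: run `ps -eo pid,pcpu,comm` and return the busiest processes. Note that ps's %CPU is CPU time divided by the process's lifetime. A long-running process that has only just become busy may therefore rank lower than it would in top. The function names are illustrative, and the parsing assumes the three requested columns with a header line.

```python
import subprocess
from typing import List, Tuple


def parse_ps(ps_output: str) -> List[Tuple[int, float, str]]:
    """Parse `ps -eo pid,pcpu,comm` output (header first) into (pid, pcpu, command)."""
    rows = []
    for line in ps_output.splitlines()[1:]:
        parts = line.split(None, 2)  # the command name may contain spaces
        if len(parts) == 3:
            rows.append((int(parts[0]), float(parts[1]), parts[2].strip()))
    return rows


def top_cpu_consumers(ps_output: str, n: int = 5) -> List[Tuple[int, float, str]]:
    """Return the n processes with the highest %CPU, busiest first."""
    return sorted(parse_ps(ps_output), key=lambda row: row[1], reverse=True)[:n]


def busiest_processes(n: int = 5) -> List[Tuple[int, float, str]]:
    """Run ps on the local machine (procps syntax) and rank the results."""
    out = subprocess.run(["ps", "-eo", "pid,pcpu,comm"],
                         capture_output=True, text=True, check=True).stdout
    return top_cpu_consumers(out, n)
```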
The next step depends on where vmstat(1) reports the time is going:

- Mostly user space: the system administrator may want to attach a debugger such as gdb(1) to the process. Breakpoints, execution tracing, or other debugging methods can then show what the application is doing.
- Mostly system time: tools such as strace(1) can show which system calls are being made.
- A significant share waiting for I/O: the sar tool can show which devices are in use. It may also reveal which applications or file systems are involved and whether the system is swapping or paging.
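The decision logic above can be written down as a small lookup. In this Python sketch, the 50% threshold and the suggested command lines are assumptions chosen for illustration, not rules from the original text. gdb -p and strace -p attach to a running process by PID.

```python
def next_tool(us: float, sy: float, wa: float, threshold: float = 50.0) -> str:
    """Suggest a follow-up tool from vmstat's us/sy/wa percentages."""
    category, share = max(
        (("user", us), ("system", sy), ("iowait", wa)), key=lambda item: item[1]
    )
    if share < threshold:  # illustrative cut-off, tune for your own baseline
        return "no dominant category: keep watching vmstat and top"
    return {
        "user": "gdb -p PID (inspect what the application is doing)",
        "system": "strace -p PID (see which system calls are made)",
        "iowait": "sar (see which devices are busy, and any swapping or paging)",
    }[category]
```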


Two, top and gtop tools
The top and gtop tools are very helpful for finding the tasks and processes behind the high-level figures that vmstat or uptime display. They show which processes are active and which consume the most processing time or memory. The top command gives a continuously updated overview of all running processes and of the system load, as the snapshot below shows. This includes CPU load, overall memory usage, and per-process memory usage. Like uptime(1), top shows the load average. In addition, it gives subtotals for the number of processes that exist but are sleeping and the number that are running. A sleeping task is blocked waiting for some activity, for example:

- a keystroke from the user,
- data from a pipe or socket,
- a request from another host (such as a web server waiting for a client to request content).

top(1) can also show CPU utilization for each processor separately, which helps identify imbalances in task scheduling. By default, top refreshes its output frequently and sorts tasks by the percentage of CPU time they consume. Other sort orders are available, such as cumulative CPU time or percentage of memory used.
[huangc@v-02-01-00860 ~]$ top
top - 15:17:44 up ... days,  5:17,  ... users,  load average: 0.00, 0.10, 0.12
Tasks: 227 total,   1 running, 225 sleeping,   1 stopped,   0 zombie
Cpu(s):  7.8%us, 12.1%sy,  0.0%ni, 79.9%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   8193720k total,  6315000k used,  1878720k free,    47436k buffers
Swap:  4128760k total,  3852496k used,   276264k free,  5054928k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2301 seeproxy  20   0 2196m  13m  660 S 15.5  0.2  15285:39 /home/seeproxy/seeproxy/python27/bin/python /home/seeproxy/seeproxy/seeproxy.p
30824 huangc    20   0  872m 1744 1096 S  7.3  0.0  64:05.38 hsserver -f warmstandby_hc.xml -start proxysvr -t ar -s 1 -status 0
32729 zhouds    20   0 1958m  41m  936 S  4.6  0.5  62:54.17 hsserver -f front_demo.xml -start mainsvr -t ar -s 0 -status 0
32740 zhouds    20   0 2818m  35m  736 S  4.6  0.4  71:29.01 /home/zhouds/linux.x64/bin/hsserver -f queue_demo.xml -t ar -s 0 -d /home/zhou
 1307 shizj     20   0  144g 335m  ... ...  4.2  4.0  60:50.38 hsserver -f shizj_uft.xml -start mainsvr -t ar -s 0 -uft_status
  498 zhouds    20   0 2722m  48m 5920 S  3.3  0.6  46:57.94 hsserver -f uft_demo.xml -start mainsvr -t ar -s 0 -status 0 -system_type 0 -l
 1496 shizj     20   0 26.6g  11m 1188 S  3.3  0.1  43:37.45 hsserver -f shizj_init.xml -start mainsvr -t ar -s 0 -status 0
30829 huangc    20   0 1355m  19m  796 S  3.3  0.2  45:27.38 /home/huangc/linux.x64/bin/hsserver -f warmstandby_hc.xml -t ar -s 1 -d /home/
 1303 shizj     20   0 1286m 6120  728 S  2.3  0.1  30:22.74 hsserver -f shizj_arb.xml -start arbsvr -t ar -s 0 -status 0
32723 zhouds    20   0 1234m  17m  668 S  2.3  0.2  33:27.00 hsserver -f arb_demo.xml -start arbsvr -t ar -s 0 -status 0
32725 zhouds    20   0  614m  632  392 S  1.6  0.0  17:38.62 hsserver -f queue_demo.xml -start proxysvr -t ar -s 0 -status 0
13284 huangc    20   0 15036 1336  948 R  1.0  0.0   0:00.10 top
 1341 root      20   0 82292 1412 1104 S  0.3  0.0 154:38.71 /usr/sbin/vmtoolsd
 1750 root      20   0 13584  ...  ... ...  0.3  0.0  15:07.26 lldpad -d
    1 root      20   0 19364  312  136 S  0.0  0.0   0:15.99 /sbin/init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.68 [kthreadd]
    3 root      RT   0     0    0    0 S  0.0  0.0   0:10.29 [migration/0]
    4 root      20   0     0    0    0 S  0.0  0.0  36:04... [ksoftirqd/0]
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 [migration/0]
[huangc@v-02-01-00860 ~]$
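The Cpu(s) line of this snapshot can be extracted for logging, for example from `top -b -n 1` (batch mode, one iteration). The Python sketch below accepts either the older `7.8%us` layout shown above or the newer `7.8 us` layout. The regular expression is an illustration, not a documented format.

```python
import re
from typing import Dict

# A number, an optional '%', then one of top's two-letter CPU state labels.
_CPU_FIELD = re.compile(r"([\d.]+)\s*%?\s*(us|sy|ni|id|wa|hi|si|st)\b")


def parse_top_cpu_line(line: str) -> Dict[str, float]:
    """Parse top's CPU summary line into {state: percent}."""
    return {name: float(value) for value, name in _CPU_FIELD.findall(line)}
```

Applied to the snapshot above, it yields 7.8 for us and 79.9 for id, and the eight values sum to 100.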

The results of the top output include the following information:
Line 1 shows uptime information: the current time, how long the system has been running since the last reboot, the number of users currently logged in, and the three load averages over the previous 1, 5, and 15 minutes.

Line 2 gives process statistics: the total number of processes at the last update, and how many of them are sleeping, running, stopped, or zombies.

Line 3 shows CPU statistics, as percentages of CPU time spent on:

- user processes (us)
- system (kernel) work (sy)
- niced processes (ni)
- idle time (id)
- I/O wait (wa)
- hardware and software interrupts (hi, si)
- stolen time (st)

When the per-CPU display is enabled, there is one such line per processor.

The next line gives memory statistics: total memory, the amount used, the amount free, and the amount used as buffers.

The line after that shows swap statistics: total swap space, swap used, free swap, and the amount of memory used as page cache (cached).

The remaining lines show statistics for individual processes. Some of the more useful top parameters are listed below:
-d  Delay between screen updates.
-p  Monitor only the specified processes; up to 20 process IDs can be given.
-S  Cumulative mode: each process is listed with the CPU time used by it and its dead children.
-i  Do not display idle processes.
-H  Show the individual threads of each process.
-n  Number of iterations to produce before exiting.
top also offers an interactive mode for changing what is reported. For example, press f to open the field-selection screen, then press j to add a column (P) showing the processor each task last ran on.
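When you need to capture top output rather than watch it, you can combine the options above with batch mode (-b). The Python sketch below builds the argument list. The helper and its parameter names are hypothetical, and the flags assume the procps version of top.

```python
from typing import List, Optional, Sequence


def top_command(delay: Optional[float] = None, pids: Sequence[int] = (),
                iterations: Optional[int] = None, threads: bool = False,
                hide_idle: bool = False, cumulative: bool = False) -> List[str]:
    """Build an argv list for a batch-mode run of procps top."""
    if len(pids) > 20:
        raise ValueError("top -p accepts at most 20 process IDs")
    argv = ["top", "-b"]  # batch mode: plain text suitable for logging
    if delay is not None:
        argv += ["-d", str(delay)]
    if iterations is not None:
        argv += ["-n", str(iterations)]
    if pids:
        argv += ["-p", ",".join(str(pid) for pid in pids)]
    if threads:
        argv.append("-H")
    if hide_idle:
        argv.append("-i")
    if cumulative:
        argv.append("-S")
    return argv
```

For example, subprocess.run(top_command(delay=5, iterations=3), capture_output=True, text=True).stdout collects three reports taken 5 seconds apart.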
Three, the sar tool
sar is part of the sysstat package. It collects and reports a wide range of operating system activity, including:

- CPU utilization
- context switch and interrupt rates
- paging and swapping rates
- shared memory and buffer usage
- network usage

sar(1) is especially useful because it can collect system activity continuously and record it in a set of log files. This makes it possible to assess performance both before and after a reported degradation. sar is often used to pinpoint when an event occurred and to identify specific changes in system behavior. Like vmstat, sar can report repeatedly at a fixed interval: given count and interval arguments, it prints count reports, interval seconds apart. sar can also report averages over the collected data points.
1. CPU Utilization
[huangc@v-02-01-00860 ~]$ sar -u -P ALL -c 5
Linux 2.6.32-431.el6.x86_64 (v-02-01-00860)   10/12/16   _x86_64_   (2 CPU)

15:50:19  CPU  %user  %nice  %system  %iowait  %steal  %idle
15:50:24  all   3.38   0.00     5.28     0.00    0.00  91.34
15:50:24    0   3.39   0.00     5.30     0.00    0.00  91.31
15:50:24    1   3.36   0.00     5.25     0.00    0.00  91.39

15:50:24  CPU  %user  %nice  %system  %iowait  %steal  %idle
15:50:29  all   3.17   0.00     4.66     0.00    0.00  92.17
15:50:29    0   3.40   0.00     4.25     0.00    0.00  92.36
15:50:29    1   2.75   0.00     5.08     0.00    0.00  92.16

15:50:29  CPU  %user  %nice  %system  %iowait  %steal  %idle
15:50:34  all   2.95   0.00     5.16     0.00    0.00  91.89
15:50:34    0   3.36   0.00     5.25     0.00    0.00  91.39
15:50:34    1   2.75   0.00     4.86     0.00    0.00  92.39
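To summarize a run like this one, you can average the per-interval rows for each CPU. The Python sketch below assumes the column layout shown above: a header row containing CPU, followed by data rows that end in the CPU name and its values. It skips sar's own Average: rows. It also reads values from the end of each line, so 12-hour timestamps with AM/PM do not shift the columns. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterator, Tuple


def sar_cpu_rows(output: str) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Yield (cpu, {column: value}) for each data row of `sar -u -P ALL` output."""
    columns = None
    for line in output.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("Average"):
            continue  # blank separator or sar's own summary rows
        if "CPU" in fields:
            columns = fields[fields.index("CPU") + 1:]  # %user %nice ... %idle
            continue
        if columns is None or len(fields) < len(columns) + 2:
            continue
        try:
            values = [float(v) for v in fields[-len(columns):]]
        except ValueError:
            continue
        yield fields[-len(columns) - 1], dict(zip(columns, values))


def mean_idle(output: str) -> Dict[str, float]:
    """Average %idle per CPU ('all', '0', '1', ...) over every interval."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for cpu, row in sar_cpu_rows(output):
        totals[cpu] += row["%idle"]
        counts[cpu] += 1
    return {cpu: totals[cpu] / counts[cpu] for cpu in totals}
```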
 
Servicing network and disk I/O is one of the system activities that consume CPU. When the operating system generates I/O activity, the corresponding device subsystem responds. It raises a hardware interrupt to signal that the I/O request has completed. The operating system counts these interrupts, and the counts help visualize the rate of network and disk I/O activity. sar(1) reports this information. A performance baseline can be used to track the system's interrupt rate. The rate is both another source of operating system overhead and a possible indicator of changes in system performance. The -I SUM option reports the total number of interrupts per second, as shown below. The -I ALL option provides similar information (not shown) for each interrupt source.
2. Interrupt Rate

10:53:53        INTR    intr/s
10:53:58         sum   4477.60
10:54:03         sum   6422.80
10:54:08         sum   6407.20
10:54:13         sum   6111.40
10:54:18         sum   6095.40
10:54:23         sum   6104.81
10:54:28         sum   6149.80
...
Average:         sum   4416.5
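The intr/s figure is a rate computed from the kernel's running interrupt count. The Python sketch below does the same calculation. It assumes the intr line of /proc/stat described in proc(5), whose first number is the total of all interrupts serviced since boot. It shows the arithmetic and does not try to reproduce sar's exact output.

```python
import time


def total_interrupts(proc_stat_text: str) -> int:
    """Interrupts serviced since boot: first number on the 'intr' line of /proc/stat."""
    for line in proc_stat_text.splitlines():
        if line.startswith("intr "):
            return int(line.split()[1])
    raise ValueError("no 'intr' line found")


def interrupt_rate(before: str, after: str, interval_seconds: float) -> float:
    """Interrupts per second between two /proc/stat snapshots."""
    return (total_interrupts(after) - total_interrupts(before)) / interval_seconds


def sample_interrupt_rate(interval_seconds: float = 5.0, path: str = "/proc/stat") -> float:
    """Linux only: take two snapshots interval_seconds apart."""
    with open(path, encoding="utf-8") as f:
        before = f.read()
    time.sleep(interval_seconds)
    with open(path, encoding="utf-8") as f:
        after = f.read()
    return interrupt_rate(before, after, interval_seconds)
```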

A per-CPU view of the interrupt distribution can be obtained with sar -A (the following example is an excerpt from the full output). Note that the system's IRQ values are 0, 1, 2, 9, 12, 14, 17, 18, 21, 23, 24, and 25.

3. Interrupt Distribution
Studying the distribution of interrupts may reveal an imbalance in how interrupts are handled. The next step is to analyze the scheduler. One way to correct such an imbalance is to bind IRQ handling to one processor or a set of processors by setting the CPU affinity of a particular device's interrupt (IRQ). For example, writing the mask 0x0001 to /proc/irq/ID/smp_affinity (where ID is the device's IRQ number) makes CPU 0 alone handle that device's IRQ. Writing 0x000f makes CPUs 0 through 3 handle it. For some workloads, this technique can reduce contention on a particular, heavily used processor. I/O interrupts are then handled more efficiently, which improves I/O performance accordingly.
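Both steps can be sketched in Python:

- reading per-CPU counts from /proc/interrupts to spot an IRQ handled almost entirely by one CPU;
- building the hexadecimal mask for /proc/irq/ID/smp_affinity, where bit n set means CPU n may handle the IRQ.

The sketch assumes the usual /proc/interrupts layout: a CPU0 CPU1 ... header, then one line per IRQ. Writing the mask requires root, and an irqbalance daemon, if running, may later change it. The function names are illustrative.

```python
from typing import Dict, Iterable, List


def per_cpu_interrupts(text: str) -> Dict[str, List[int]]:
    """Map each IRQ label in /proc/interrupts to its per-CPU counts.

    Rows such as ERR: carry a single total instead of one count per CPU.
    """
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        for token in rest.split()[:ncpus]:
            if not token.isdigit():
                break  # reached the description column early
            counts.append(int(token))
        table[label.strip()] = counts
    return table


def busiest_cpu_share(counts: List[int]) -> float:
    """Fraction of an IRQ's interrupts handled by its busiest CPU (1.0 = all on one)."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


def affinity_mask(cpus: Iterable[int]) -> str:
    """Hex mask for /proc/irq/ID/smp_affinity: bit n set allows CPU n."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")
```

affinity_mask([0]) gives "1" and affinity_mask(range(4)) gives "f", matching the 0x0001 and 0x000f examples above.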

