Linux System CPU and Performance Monitoring





Performance optimization is the process of finding and removing bottlenecks in a system. This article was translated by sanotes.net webmaster Tonnyom in August 2009 from the Linux System and Performance Monitoring series; it is the first installment, covering CPU performance monitoring.


Tonnyom | Source: sanotes.net | 2010-12-24 13:25





Preface: There are already many articles like this on the Internet, so why this one? Several reasons motivated my translation. First, although the concepts and content are old-fashioned, everything is explained very thoroughly and comprehensively. Second, the theory is practical, and the case analyses are good. Third, it is not fancy: the tools and commands are the most basic ones, which helps with actual hands-on work. My own knowledge is limited, and most of this translation is based on my own understanding of the original text; you are welcome to go find the original yourselves. If there are any major discrepancies, please leave a reply - many thanks!


1.0 Introduction to Performance monitoring


Performance optimization is the process of finding bottlenecks in a system's processing and removing them. Many administrators believe performance optimization can be done from some "cook book" - that problems can usually be solved with a few kernel configuration tweaks - but that does not hold for every environment. Performance optimization is really about achieving a balance among the OS subsystems, which include:


CPU, Memory, IO, Network


These subsystems are interdependent, and a high load on any one of them can cause problems for the others. For example:


    1. A flood of paging requests congests the memory queues
    2. High throughput on a network card leads to more CPU overhead
    3. Heavy CPU load in turn drives more memory requests
    4. A large volume of disk writes from memory leads to more CPU and IO load


So the key to optimizing a system is finding the real bottleneck: what looks like a problem in one subsystem may in fact be caused by another.



1.1 Determining the type of application



To know where to start looking for bottlenecks, the first important step is to understand and analyze the behavior of the current system. The applications most systems run fall into two broad types:



IO Bound: Applications in this category place a heavy load on memory and the storage system. An IO-bound application is, in effect, a process churning through large amounts of data. It does not place much additional demand on the CPU or the network (unless the storage itself is networked, as with NAS). IO-bound applications typically use CPU resources only to generate IO requests and then enter the kernel scheduler's sleep state. Database software (MySQL, Oracle, and so on) is usually considered IO bound.



CPU Bound: Applications in this category place a heavy load on the CPU. A CPU-bound application is characterized by batches of CPU-heavy requests and mathematical computation. Services such as web servers and mail servers are typically considered CPU bound.
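To make the distinction concrete, here is a small shell sketch (not from the original article) that applies rule-of-thumb thresholds to vmstat-style CPU percentages; the function name and cut-off values are my own illustrative choices:

```shell
#!/bin/sh
# Classify a workload from vmstat-style CPU percentages.
# Thresholds are illustrative, not authoritative.
classify_workload() {
    us=$1; sy=$2; wa=$3          # user%, system%, IO-wait%
    if [ "$wa" -ge 25 ]; then
        echo "IO bound"          # threads mostly blocked waiting on IO
    elif [ $((us + sy)) -ge 70 ]; then
        echo "CPU bound"         # the processor itself is the bottleneck
    else
        echo "mixed/idle"
    fi
}

classify_workload 90 8 0     # heavy computation -> prints "CPU bound"
classify_workload 5 6 60     # heavy IO wait     -> prints "IO bound"
```

In practice the percentages would come from the us, sy, and wa columns of vmstat, covered later in this article.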



1.2 Determining baseline Statistics



How heavily a system should be utilized is generally a judgment based on the administrator's experience and on how the system is used. The one thing that must be made clear is what the optimization is meant to achieve: what needs to be optimized, and against what reference values? For that, a baseline is established. The baseline statistics must be captured while the system is performing acceptably, so they can be compared against statistics captured while performance is unacceptable.



In the following example, a baseline snapshot of system performance is compared with a snapshot taken under high load.


# vmstat 1
procs                   memory      swap          io     system        cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa  id
 1  0 138592  17932 126272 214244    0    0     1    18  109    19  2  1  1  96
 0  0 138592  17932 126272 214244    0    0     0     0  105    46  0  1  0  99
 0  0 138592  17932 126272 214244    0    0     0     0  198    62 40 14  0  45
 0  0 138592  17932 126272 214244    0    0     0     0  117    49  0  0  0 100
 0  0 138592  17924 126272 214244    0    0     0   176  220   938  3  4 13  80
 0  0 138592  17924 126272 214244    0    0     0     0  358  1522  8 17  0  75
 1  0 138592  17924 126272 214244    0    0     0     0  368  1447  4 24  0  72
 0  0 138592  17924 126272 214244    0    0     0     0  352  1277  9 12  0  79

# vmstat 1
procs                   memory      swap          io     system        cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa  id
 2  0 145940  17752 118600 215592    0    1     1    18  109    19  2  1  1  96
 2  0 145940  15856 118604 215652    0    0     0   468  789   108 86 14  0   0
 3  0 146208  13884 118600 214640    0              498    71 91  9  0   0
 2  0 146388  13764 118600 213788    0  340     0   340  672    41 87 13  0   0
 2  0 147092  13788 118600 212452    0  740     0  1324  620    61 92  8  0   0
 2  0         13848 118600 211580    0  720     0   720  690    41 96  4  0   0
 2  0 147912  13744 118192 210592    0  720     0   720  605    44 95  5  0   0
 2  0 148452  13900 118192 209260    0  372     0   372  639    45 81 19  0   0
 2  0 149132  13692 117824 208412    0  372     0   372  457    47 90 10  0   0


As you can see from the first set of results, the last column (id) is the idle time; during the baseline the CPU is idle 79%-100% of the time. In the second set, the system is 100% occupied, with no idle time at all. From this comparison we can determine whether it is CPU usage that should be optimized.
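The baseline comparison above can also be scripted. The sketch below is my own (the helper name and sample rows are invented, and the output is embedded as a here-document so the example is self-contained); it averages the idle column. On a live system you would pipe `vmstat 1 5` into it instead:

```shell
#!/bin/sh
# Average the idle column (last field) of vmstat output,
# skipping the two header lines.
avg_idle() {
    awk 'NR > 2 { sum += $NF; n++ } END { printf "%d\n", sum / n }'
}

# Abbreviated sample rows; a live run would use: vmstat 1 5 | avg_idle
avg_idle <<'EOF'
procs ... cpu
 r  b ... us sy wa id
 1  0 ...  2  1  1 96
 0  0 ...  0  1  0 99
 0  0 ...  9 12  0 79
EOF
# prints 91 for the three samples above: (96 + 99 + 79) / 3, truncated
```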


2.0 Installing Monitoring Tools


Most *nix systems ship with a set of standard monitoring commands that have been part of *nix from the beginning. Linux provides additional monitoring tools through its base installation and supplemental packages, most of which are available in the various Linux distributions. Although further open-source and third-party monitoring software exists, this document only discusses tools included in Linux distributions.



This chapter discusses what tools are in place to monitor system performance.


Tool      Description                                   In base install?  In repository?
vmstat    all-purpose performance tool                  yes               yes
mpstat    provides statistics per CPU                   no                yes
sar       all-purpose performance monitoring tool       no                yes
iostat    provides disk statistics                      no                yes
netstat   provides network statistics                   yes               yes
dstat     monitoring statistics aggregator              no                in most distributions
iptraf    traffic monitoring dashboard                  no                yes
netperf   network bandwidth tool                        no                in some distributions
ethtool   reports on Ethernet interface configuration   yes               yes
iperf     network bandwidth tool                        no                yes
tcptrace  packet analysis tool                          no                yes

3.0 CPU Introduction


How the CPU is utilized depends heavily on what is trying to access it. The kernel scheduler is responsible for scheduling two kinds of consumers: threads (single or multiple) and interrupts. The scheduler assigns different priorities to these consumers. The following list is ranked from highest to lowest:



Interrupts - devices use interrupts to notify the kernel that they have finished processing. For example, a NIC signals the arrival of a network packet, or a piece of hardware completes an IO request.



Kernel (System) Processes - all kernel processing runs at this priority level.



User Processes - this is the "userland" portion; all software programs run in this user space. It sits at the low end of the kernel's scheduling priorities.



From the above we can see how the kernel arbitrates between these different consumers. A few key concepts need introducing; the following sections cover context switches, run queues, and utilization.



3.1 Context Switches



Most modern processors can run only one process (a single thread) at a time, although processors with Hyper-Threading can run multiple threads. The Linux kernel also treats each core of a dual-core chip as an independent processor: for example, on a dual-core processor, the kernel reports two separate processors.



A standard Linux kernel can run anywhere from 50 to 50,000 process threads. With only one CPU, the kernel must schedule and balance these threads. Each thread is allotted a quantum of time to spend on the processor. A thread either uses up its quantum or is preempted by something of higher priority (such as a hardware interrupt), at which point it is placed back in the run queue while the higher-priority thread takes its place on the processor. This switching of threads is the context switch we mentioned.
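On Linux, the kernel keeps a running total of context switches since boot in the "ctxt" line of /proc/stat; the switch rate is the difference between two reads. A minimal sketch (the helper name and counter values below are my own invention for illustration):

```shell
#!/bin/sh
# Context switches per second = delta of the /proc/stat "ctxt"
# counter across a 1-second interval.
ctxt_per_sec() {
    a=$(echo "$1" | awk '{print $2}')   # first sample,  e.g. "ctxt 115373"
    b=$(echo "$2" | awk '{print $2}')   # second sample, one second later
    echo $(( b - a ))
}

# On a live system: s1=$(grep ctxt /proc/stat); sleep 1; s2=$(grep ctxt /proc/stat)
ctxt_per_sec "ctxt 115373" "ctxt 115821"   # prints 448
```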



Every time the kernel performs a context switch, resources are spent saving the thread's state out of the CPU registers and placing it on the queue. The more context switching there is on a system, the more work the kernel does just managing the scheduling of the processors.



3.2 The Run Queue



Each CPU maintains a run queue of threads. Ideally, the scheduler is constantly running and executing threads. A process thread is either asleep (blocked, waiting on IO) or runnable. If the CPU subsystem is under high load, the kernel scheduler can fail to keep up with the system's demands, and runnable processes start to fill up the run queue. The larger the run queue gets, the longer process threads wait to get executed.



The popular term for this is "load", which summarizes the state of the run queue. The system load is the number of threads waiting in the CPU queue plus the number currently executing. If a dual-core system has 2 threads executing and 4 in the run queue, its load is 6. The load averages shown by top are this load averaged over 1, 5, and 15 minutes.
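On a live system the same figures can be read from /proc/loadavg, whose first three fields are the 1-, 5-, and 15-minute averages. A small sketch (the function name is mine, and the sample line is invented to match the load-6 example above):

```shell
#!/bin/sh
# Extract the 1-minute load average from a /proc/loadavg-style line.
loadavg_1min() {
    echo "$1" | awk '{print $1}'
}

# On a live system: loadavg_1min "$(cat /proc/loadavg)"
loadavg_1min "6.03 4.92 3.87 6/142 2812"   # prints 6.03
```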



3.3 CPU Utilization



CPU utilization is defined as the percentage of the CPU that is in use. It is one of the most important metrics for evaluating a system. Most performance monitoring tools break CPU utilization into the following categories:



User Time - the percentage of CPU time spent executing processes in user space.



System Time (kernel and interrupt time) - the percentage of CPU time spent on kernel threads and on handling interrupts.



Wait IO (I/O wait time) - the percentage of CPU time spent idle because all runnable process threads are blocked waiting for IO requests to complete.



Idle - the percentage of time the CPU sits completely idle.
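These four categories correspond to counters the kernel exposes on the "cpu" lines of /proc/stat (user, nice, system, idle, iowait, measured in jiffies); each percentage is that counter over the total. A sketch with invented counter values (the helper name is mine):

```shell
#!/bin/sh
# Percentage of one /proc/stat cpu field over the total of the first
# five counters (user nice system idle iowait).
# Field numbers: 2=user, 4=system, 5=idle, 6=iowait.
cpu_pct() {
    echo "$1" | awk -v f="$2" '{ t = $2+$3+$4+$5+$6; printf "%d\n", $f * 100 / t }'
}

line="cpu 350 0 150 450 50"   # invented jiffy counters
cpu_pct "$line" 2   # user time: prints 35
cpu_pct "$line" 5   # idle time: prints 45
```

A real monitor would take the delta between two reads a second apart, rather than the totals since boot.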


4.0 CPU Performance monitoring


Understanding the relationship between run queues, utilization, and context switching is key to optimizing CPU performance. As mentioned earlier, performance is relative to baseline data. On most systems, the performance typically expected includes:



Run Queues - each processor should have no more than 1-3 threads in its run queue. For example, a dual-processor system should have no more than 6 threads in its run queues.



CPU Utilization - if a CPU is fully used, the split between the utilization categories should be:


65% - 70% User Time
30% - 35% System Time
 0% -  5% Idle Time
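The target ratios can be encoded as a quick check. The function and its output strings are my own; the thresholds are the ones quoted above:

```shell
#!/bin/sh
# Check whether a fully used CPU matches the target user/system/idle split.
cpu_balanced() {
    us=$1; sy=$2; id=$3
    if [ "$us" -ge 65 ] && [ "$us" -le 70 ] \
       && [ "$sy" -ge 30 ] && [ "$sy" -le 35 ] \
       && [ "$id" -le 5 ]; then
        echo "balanced"
    else
        echo "unbalanced"
    fi
}

cpu_balanced 68 32 0    # prints "balanced"
cpu_balanced 40 15 45   # prints "unbalanced" (the CPU is not fully used)
```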


Context Switches - the number of context switches is directly related to CPU utilization. A large number of context switches is acceptable as long as CPU utilization stays within the balanced ratios above.



Many tools on Linux report these values; the first two we will look at are vmstat and top.



4.1 Using the vmstat Tool



The vmstat tool imposes very little overhead on the system, which makes it practical to keep running in a console window to watch a system's health even on very heavily loaded servers. The tool runs in two modes: average mode and sample mode. Sample mode measures the statistics over a specified interval, which is most useful for understanding performance under a sustained load.



Here is an example of vmstat running at a 1-second interval:


# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 104300  16800  95328  72200    0    0     5    26    7    14  4  1 95  0
 0  0 104300  16800  95328  72200    0    0     0    24 1021    64  1  1 98  0
 0  0 104300  16800  95328  72200    0    0     0     0 1009        1  1 98  0

Table 1: The vmstat CPU statistics

Field  Description
r      The number of threads in the run queue: threads that are runnable, but for which no CPU is available to execute them.
b      The number of processes blocked, waiting for IO requests to finish.
in     The number of interrupts being processed.
cs     The number of context switches currently happening on the system.
us     The percentage of user CPU utilization.
sys    The percentage of kernel and interrupt utilization.
wa     The percentage of idle processor time spent because all runnable threads are blocked, waiting on IO.
id     The percentage of time the CPU is completely idle.
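As a worked example of the table, the fields of a single vmstat data row can be pulled out by position (for the column order shown in the sample output above: in is column 11, cs column 12, us 13, sy 14, id 15, wa 16). The helper function is my own:

```shell
#!/bin/sh
# Print one 1-based column of a vmstat data row.
vmstat_field() {
    echo "$1" | awk -v c="$2" '{print $c}'
}

row="0 0 104300 16800 95328 72200 0 0 0 24 1021 64 1 1 98 0"
vmstat_field "$row" 11   # in: prints 1021 (interrupts per second)
vmstat_field "$row" 12   # cs: prints 64 (context switches per second)
```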


4.2 Case Study: Continuous CPU utilization



In this case study, the system's CPU is fully utilized.


# vmstat 1
procs                   memory      swap          io     system        cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa  id
 3  0 206564  15092  80336 176080    0    0     0     0  718    26 81 19  0   0
 2  0 206564  14772  80336 176120    0    0     0     0  758    23 96  4  0   0
 1  0 206564  14208  80336 176136    0    0     0     0  820    20 96  4  0   0
 1  0 206956  13884  79180 175964    0  412     0  2680 1008    80 93  7  0   0
 2  0 207348  14448  78800 175576    0  412     0   412  763    70 84 16  0   0
 2  0 207348  15756  78800 175424    0    0     0     0  874    25 89 11  0   0
 1  0 207348  16368  78800 175596    0    0     0     0  940    24 86 14  0   0
 1  0 207348  16600  78800 175604    0    0     0     0  929    27 95  3  0   2
 3  0 207348  16976  78548 175876    0    0     0  2508  969    35 93  7  0   0
 4  0 207348  16216  78548 175704    0    0     0     0  874    36 93  6  0   1
 4  0 207348  16424  78548 175776    0    0     0     0  850    26 77 23  0   0
 2  0 207348  17496  78556 175840    0    0     0     0  736    23 83 17  0   0
 0  0 207348  17680  78556 175868    0    0     0     0  861    21 91  8  0   1


Based on the observed values, we can get the following conclusions:



1. There are a large number of interrupts (in) and relatively few context switches (cs). This suggests a single process is generating requests to a hardware device.



2. Further evidence of a single application: user time (us) is consistently at 85% or more, and given the few context switches, that application stays on the processor.



3. The run queue is mostly within the acceptable performance range; at 2 points it climbs beyond the allowable limit.



4.3 Case study: Overload scheduling



In this example, the kernel scheduler is saturated with context switches.


# vmstat 1
procs                   memory      swap          io     system        cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa  id
 2  1 207740  98476  81344 180972    0    0  2496     0  900  2883  4 12 57  27
 0  1 207740  96448  83304 180984    0    0  1968   328  810  2559  8  9 83   0
 0  1 207740  94404  85348 180984    0    0  2044     0  829  2879  9  6 78   7
 0  1 207740  92576  87176 180984    0    0  1828     0  689  2088  3  9 78  10
 2  0 207740  91300  88452 180984    0    0  1276     0  565  2182  7  6 83   4
 3  1 207740  90124  89628 180984    0    0  1176     0  551  2219  2  7 91   0
 4  2 207740  89240  90512 180984    0    0   880   520  443   907 22 10 67   0
 5  3 207740  88056  91680 180984    0    0  1168     0  628  1248 12 11 77   0
 4  2 207740  86852  92880 180984    0    0  1200     0  654  1505  6  7 87   0
 6  1 207740  85736  93996 180984    0    0  1116     0  526  1512  5 10 85   0
 0  1 207740  84844  94888 180984    0    0   892     0  438  1556  6  4 90   0


Based on the observed values, we can get the following conclusions:



1. The number of context switches is higher than the number of interrupts, indicating that the kernel is spending a considerable amount of time context-switching threads.



2. The heavy context switching leads to an unhealthy balance of CPU utilization: the IO-wait percentage (wa) is very high and the user-time percentage (us) is very low.



3. Because the CPU is blocked waiting for IO requests, a number of runnable threads sit in the run queue waiting to be executed.
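The contrast between the two case studies (4.2: in well above cs; 4.3: cs well above in) can be captured in a crude heuristic. This sketch is my own; the sample numbers are taken from the first rows of the two case studies:

```shell
#!/bin/sh
# Crude heuristic: compare interrupts (in) and context switches (cs)
# from one vmstat row to guess which pattern a sample shows.
switch_pattern() {
    ints=$1; cs=$2
    if [ "$cs" -gt "$ints" ]; then
        echo "scheduler saturated (cs > in)"
    else
        echo "interrupt driven (in >= cs)"
    fi
}

switch_pattern 718 26     # case 4.2: prints "interrupt driven (in >= cs)"
switch_pattern 900 2883   # case 4.3: prints "scheduler saturated (cs > in)"
```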



4.4 Using the mpstat Tool



If your system has multiple processor cores, you can use the mpstat command to monitor each core individually. The Linux kernel treats a dual-core processor as 2 CPUs, so a dual-processor system with dual cores reports 4 available CPUs.



The CPU utilization statistics given by mpstat are roughly consistent with vmstat's, but mpstat breaks them down per processor.


# mpstat -P ALL 1
Linux 2.4.21-20.ELsmp (localhost.localdomain)  05/23/2006

05:17:31 PM  CPU   %user   %nice %system   %idle    intr/s
05:17:32 PM  all    0.00    0.00    3.19   96.53     13.27
05:17:32 PM    0    0.00    0.00    0.00  100.00      0.00
05:17:32 PM    1    1.12    0.00   12.73   86.15     13.27
05:17:32 PM    2    0.00    0.00    0.00  100.00      0.00
05:17:32 PM    3    0.00    0.00    0.00  100.00      0.00


4.5 Case Study: underutilized throughput



In this example there are 4 CPU cores available. Two of them mainly handle process load (CPU 0 and 1). A third handles all kernel and other system functions (CPU 3), and the fourth sits idle (CPU 2).



Using the top command, you can see that 3 processes are each consuming almost an entire CPU core.


# top -d 1
top - 23:08:53 up 8:34, 3 users, load average: 0.91, 0.37, 0.13
Tasks: 190 total, 4 running, 186 sleeping, 0 stopped, 0 zombie
Cpu(s): 75.2% us, 0.2% sy, 0.0% ni, 24.5% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2074736k total, 448684k used, 1626052k free, 73756k buffers
Swap: 4192956k total, 0k used, 4192956k free, 259044k cached

  PID USER    PR NI VIRT  RES SHR S %CPU %MEM  TIME+    COMMAND
15957 nobody     0 2776  280 224 R       20.5  0:25.48  php
15959 mysql      0 2256  280 224 R       38.2  0:17.78  mysqld
15960 apache     0 2416  280 224 R       15.7  0:11.20  httpd
15901 root       0 2780 1092  16 R    1   0.1  0:01.59  top
    1 root       0 1780  660 572 S    0   0.0  0:00.64  init

# mpstat -P ALL 1
Linux 2.4.21-20.ELsmp (localhost.localdomain)  05/23/2006

05:17:31 PM  CPU   %user   %nice %system   %idle    intr/s
05:17:32 PM  all   81.52    0.00   18.48   21.17    130.58
05:17:32 PM    0   83.67    0.00   17.35    0.00    115.31
05:17:32 PM    1   80.61    0.00   19.39    0.00     13.27
05:17:32 PM    2    0.00    0.00   16.33   84.66      2.01
05:17:32 PM    3   79.59    0.00   21.43    0.00      0.00

05:17:32 PM  CPU   %user   %nice %system   %idle    intr/s
05:17:33 PM  all   85.86    0.00   14.14   25.00    116.49
05:17:33 PM    0   88.66    0.00   12.37    0.00    116.49
05:17:33 PM    1   80.41    0.00   19.59    0.00      0.00
05:17:33 PM    2    0.00    0.00    0.00  100.00      0.00
05:17:33 PM    3   83.51    0.00   16.49    0.00      0.00

05:17:33 PM  CPU   %user   %nice %system   %idle    intr/s
05:17:34 PM  all   82.74    0.00   17.26   25.00    115.31
05:17:34 PM    0   85.71    0.00   13.27    0.00    115.31
05:17:34 PM    1   78.57    0.00   21.43    0.00      0.00
05:17:34 PM    2    0.00    0.00    0.00  100.00      0.00
05:17:34 PM    3   92.86    0.00    9.18    0.00      0.00

05:17:34 PM  CPU   %user   %nice %system   %idle    intr/s
05:17:35 PM  all   87.50    0.00   12.50   25.00    115.31
05:17:35 PM    0   91.84    0.00    8.16    0.00    114.29
05:17:35 PM    1   90.82    0.00   10.20    0.00      1.02
05:17:35 PM    2    0.00    0.00    0.00  100.00      0.00
05:17:35 PM    3   81.63    0.00   15.31    0.00      0.00


You can also use the ps command to check which CPU a process is running on, by watching the PSR column.


# while :; do ps -eo pid,ni,pri,pcpu,psr,comm | grep 'mysqld'; sleep 1; done
  PID  NI PRI %CPU PSR COMMAND
15775   0     86.0   3 mysqld
  PID  NI PRI %CPU PSR COMMAND
15775   0     94.0   3 mysqld
  PID  NI PRI %CPU PSR COMMAND
15775   0     96.6   3 mysqld
  PID  NI PRI %CPU PSR COMMAND
15775   0     98.0   3 mysqld
  PID  NI PRI %CPU PSR COMMAND
15775   0     98.8   3 mysqld
  PID  NI PRI %CPU PSR COMMAND
15775   0     99.3   3 mysqld
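The same check can be scripted if you want to flag a process migrating between CPUs. This helper is my own; it just extracts the PSR column from one line of that ps output (the PRI value in the sample line is an invented placeholder):

```shell
#!/bin/sh
# Extract the PSR column (second-to-last field) from a
# "pid ni pri %cpu psr comm" line produced by ps -eo.
psr_of() {
    echo "$1" | awk '{print $(NF-1)}'
}

psr_of "15775 0 15 86.0 3 mysqld"   # prints 3: mysqld last ran on CPU 3
```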


4.6 Conclusion



Monitoring CPU performance consists of the following actions:



1. Check the system's run queues and make sure there are no more than 3 runnable threads per processor.



2. Make sure the CPU utilization is split between user and system at roughly 70/30.



3. When the CPU spends more time in system mode than that, it is overloaded, and rescheduling priorities should be attempted.



4. As I/O processing grows, CPU-bound applications will suffer.



Original: http://www.sanotes.net/html/y2009/370.html






