Linux performance monitoring and tuning (CPU)

There are already many articles on this topic on the Internet, so why translate this one? There are a few reasons. First, although the concepts and content are old, they are thorough and comprehensive. Second, it combines theory with practice, and the case studies are good. Third, it is not fancy: the tools and commands used are the most basic ones, which makes it directly useful in day-to-day operations. I am not entirely fluent with the original text, so most of this translation reflects my own understanding of it. You can also go to oscan to find the original text; if you spot any major discrepancies, please send me a message. Many thanks!


1.0 performance monitoring

Performance optimization is the process of finding the bottlenecks in a system and removing them. Many administrators believe performance optimization is a matter of reading a "cookbook": apply a few kernel settings and the problem is solved. In reality, those settings rarely suit every environment. Performance tuning is really about striking a balance among the OS subsystems, which include:

CPU

Memory

IO

Network

These subsystems depend on one another, and a high load on any one of them can easily cause problems in the others. For example:

Large numbers of page-in requests can fill up the memory queues;

High throughput on a network card can create extra CPU overhead;

Keeping the CPU busy can drive more memory allocation requests;

Flushing large amounts of data from memory to disk can create more CPU and IO load.

Therefore, the key to optimizing a system is finding the real bottleneck. A subsystem that appears to be the problem may only be suffering from the load another subsystem puts on it.

1.1 determine the application type

First of all, it is important to understand and analyze the features of the current system. Most systems run two types of applications:

IO bound: an IO-bound application makes heavy use of memory and the underlying storage system; it is typically an application that processes large amounts of data. An IO-bound application does not make heavy demands on the CPU or the network (unless the storage itself is network-attached, such as a NAS). IO-bound applications generally use the CPU only to generate IO requests and then go into the kernel scheduler's sleep state. Database servers (for example MySQL or Oracle) are usually considered IO-bound applications.

CPU bound: a CPU-bound application makes heavy use of the CPU, for example batch jobs and mathematical computation. Web servers, mail servers, and similar services are generally considered CPU-bound applications.

1.2 determine baseline statistics

What counts as acceptable utilization is usually decided from the administrator's experience and the purpose of the system. The important thing is to know what result you expect from tuning, which aspects need tuning, and what the reference values are. For this you need a baseline: a set of statistics captured while system performance is acceptable, which you can later compare against statistics captured while performance is not acceptable.
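As a minimal sketch of capturing such a baseline (the file paths and sample counts here are only illustrative, not part of the original article):

# Capture 60 one-second vmstat samples while performance is acceptable.
vmstat 1 60 > /tmp/vmstat-baseline.txt

# Capture the same number of samples while the problem is occurring,
# then compare the two files (for example the idle and run-queue columns).
vmstat 1 60 > /tmp/vmstat-underload.txt
diff /tmp/vmstat-baseline.txt /tmp/vmstat-underload.txt | less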

In the following example, a vmstat snapshot taken as the baseline is compared with a vmstat snapshot taken while the system is under heavy load.

# vmstat 1

procs memory swap io system cpu

r b swpd free buff cache si so bi bo in cs us sy wa id

1 0 138592 17932 126272 214244 0 0 1 18 109 19 2 1 96

0 0 138592 17932 126272 214244 0 0 0 105 46 0 1 0 99

0 0 138592 17932 126272 0 0 0 214244 62 40 14 0 45

0 0 138592 17932 126272 0 0 0 214244 49 0 0 117

0 0 138592 17924 126272 214244 0 0 176 220 3 4 13 80

0 0 138592 17924 126272 214244 0 0 0 358 8 17 0 75

1 0 138592 17924 126272 214244 0 0 0 368 4 24 0 72

0 0 138592 17924 126272 214244 0 0 0 352 9 12 0 79

# vmstat 1

procs memory swap io system cpu

r b swpd free buff cache si so bi bo in cs us sy wa id

2 0 145940 17752 118600 215592 0 1 1 18 109 19 2 1 96

2 0 145940 15856 118604 215652 0 0 468 789 86 14 0 0

2 0 146388 13764 118600 213788 0 340 340 41 87 13 0 0

2 0 147092 13788 118600 212452 0 740 1324 61 92 8 0 0

2 0 147912 13744 118192 210592 0 720 720 605 44 95 5 0 0

2 0 148452 13900 118192 209260 0 372 372 639 45 81 19 0 0

2 0 149132 13692 117824 208412 0 372 372 47 90 10 0 0

In the first output, the last column (id) shows the idle time. During the baseline period, CPU idle time ranges roughly between 79% and 100%. The second output shows the system running at 100% utilization with no idle time at all. Comparing the two makes it clear that CPU utilization is what needs to be examined and tuned.

2.0 install monitoring tools

Most *nix systems ship with a set of standard monitoring commands that have been part of *nix from the very beginning. On Linux, these and other monitoring tools come either with the base installation or from additional packages, and most of them are available in the various Linux distributions. Although other open-source and third-party monitoring software exists, this document only discusses tools that ship with Linux distributions.

This chapter describes the tools used to monitor system performance.

Tool      Description                                   Base  Repository

vmstat    all-purpose performance tool                  yes   yes
mpstat    provides statistics per CPU                   no    yes
sar       all-purpose performance monitoring tool       no    yes
iostat    provides disk statistics                      no    yes
netstat   provides network statistics                   yes   yes
dstat     monitoring statistics aggregator              no    in most distributions
iptraf    traffic monitoring dashboard                  no    yes
netperf   network bandwidth tool                        no    in some distributions
ethtool   reports on Ethernet interface configuration   yes   yes
iperf     network bandwidth tool                        no    yes
tcptrace  packet analysis tool                          no    yes
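Several of these tools are not installed by default. As a hedged example (exact package names vary between distributions and releases, so treat this as a sketch rather than an exact recipe), the sysstat package provides mpstat, sar, and iostat on most distributions:

# RPM-based distributions (RHEL, CentOS, Fedora):
yum install sysstat iptraf

# Debian-based distributions (Debian, Ubuntu):
apt-get install sysstat dstat iperf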

3.0 CPU Introduction

How CPU utilization breaks down depends mainly on what is trying to use the CPU. The kernel scheduler services two kinds of consumers: threads (single-threaded and multithreaded programs) and interrupts. The scheduler assigns them different priorities; from highest to lowest priority they are:

Interrupts - devices use interrupts to tell the kernel that something needs processing. For example, a network card signals the arrival of a packet, or a disk controller signals completion of an IO request.

Kernel (system) processes - all kernel processing runs at this priority level, below interrupts.

User processes - this is "userland"; all application software runs in user space and has the lowest priority in the kernel scheduling mechanism.
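As a rough way of looking at these layers on a live system (a minimal sketch added here for illustration, not part of the original article), the per-device interrupt counters and the per-process scheduling attributes can be read directly:

# Per-CPU interrupt counts, broken down by IRQ/device.
cat /proc/interrupts

# Scheduling class (cls), priority (pri) and nice value (ni) for each process;
# kernel threads and real-time tasks appear with higher priority than
# ordinary user-space processes.
ps -eo pid,cls,pri,ni,comm | head -20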

Having seen how the kernel prioritizes the different consumers of the CPU, there are a few more key concepts to cover. The following sections introduce context switches, run queues, and CPU utilization.

3.1 Context switching

Most processors can run only one process or thread at a time; hyperthreaded and multi-core processors can run more than one. The Linux kernel treats each core of a dual-core chip (and each hyperthread) as an independent processor; for example, a dual-core system is reported by the kernel as two separate processors.

A standard Linux kernel can support anywhere from roughly 50 to 50,000 process threads. With a single CPU, the kernel has to schedule and balance these threads fairly. Each thread is allotted a quantum of time on the processor. When a thread has used up its quantum, or is preempted by something with a higher priority (a hardware interrupt, for example), it is placed back on the run queue while the higher-priority thread takes its place on the processor. This switching of threads on and off the processor is what we call a context switch.

Every context switch costs resources: the kernel has to move the departing thread's state out of the CPU registers and place the thread back on the run queue. The more context switches a system performs, the more work the kernel has to do just to manage scheduling on the processors.
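As a small illustrative sketch (the PID 1234 below is just a placeholder for a real process ID), the kernel's context-switch counters can be read from /proc:

# Total context switches since boot (the "ctxt" line in /proc/stat).
grep ctxt /proc/stat

# Voluntary vs. non-voluntary context switches for a single process;
# replace 1234 with the PID you are interested in.
grep ctxt_switches /proc/1234/status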

3.2 running queue

Each CPU maintains a run queue of threads. Ideally the scheduler is constantly running and executing threads. Process threads are either sleeping (blocked and waiting, typically on IO) or runnable. When the CPU subsystem is heavily loaded, the kernel scheduler can no longer keep up with the demands of the system, and runnable processes start to pile up in the run queues. The bigger the run queue grows, the longer process threads have to wait before they get to execute.

The popular term for this is "load", which summarizes the state of the run queues. The system load is the number of threads currently executing on the CPUs plus the number of threads waiting in the run queues. If a dual-core system is executing two threads and four more threads are waiting in the run queues, its load is 6. The load averages reported by top are this load averaged over 1, 5, and 15 minutes.
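The load averages themselves can be checked at any time with uptime or read directly from the kernel (a minimal example):

# 1-, 5- and 15-minute load averages as maintained by the kernel.
uptime

# The first three fields of /proc/loadavg are the same three load averages.
cat /proc/loadavg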

3.3 CPU utilization

CPU utilization is the percentage of the CPU that is in use, and it is one of the most important metrics for evaluating a system. Most performance monitoring tools break CPU utilization down into the following categories:

User time - the percentage of time the CPU spends executing process threads in user space.

System time - the percentage of time the CPU spends executing kernel threads and servicing interrupts.

Wait IO (iowait) - the percentage of time the CPU sits idle because every runnable thread is blocked waiting for an IO request to complete.

Idle - the percentage of time the CPU is completely idle.
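These categories can also be derived from the cumulative counters in /proc/stat. The sketch below is only illustrative: because the counters are totals since boot, it reports an average since boot rather than a live sample.

# Print user, nice, system, idle and iowait time as percentages of total
# CPU time, using the aggregate "cpu" line of /proc/stat.
awk '/^cpu /{t=0; for(i=2;i<=NF;i++) t+=$i;
     printf "user %.1f%%  nice %.1f%%  system %.1f%%  idle %.1f%%  iowait %.1f%%\n",
            100*$2/t, 100*$3/t, 100*$4/t, 100*$5/t, 100*$6/t}' /proc/stat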

4.0 CPU performance monitoring

Understanding the relationship between run queues, utilization, and context switching is key to understanding CPU performance. As mentioned earlier, performance is always relative to a baseline. On most systems, however, the following rules of thumb for expected performance apply:

Run queues - each processor should have no more than 1-3 threads in its run queue. A dual-core system, for example, should not have more than 6 threads in its run queues.

CPU utilization - if a CPU is fully utilized, the balance between the utilization categories should be roughly:

65%-70% user time

30%-35% system time

0%-5% idle time

Context switches - the number of context switches is tied directly to CPU utilization. A large number of context switches is acceptable as long as the CPU utilization breakdown stays within the balance described above.

Many Linux tools report these statistics; we will start with vmstat and top.
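Before looking at the individual tools, here is a quick, hedged sketch of checking the run-queue rule of thumb; it assumes vmstat and nproc are available, which is the case on most modern distributions:

# Take five one-second vmstat samples and flag any sample where the run
# queue (r column) exceeds roughly 3 runnable threads per processor.
cpus=$(nproc)
vmstat 1 5 | tail -n +3 | awk -v c="$cpus" '{printf "r=%s per-cpu=%.1f %s\n", $1, $1/c, ($1/c > 3 ? "OVER" : "ok")}'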

4.1 Use of vmstat

The vmstat tool provides a low-overhead way of observing system performance. Precisely because its overhead is so low, it remains practical to keep it running in a console window even on a heavily loaded server that needs continuous monitoring. The tool runs in two modes: average mode and sample mode. In sample mode it reports statistics at a specified interval, which is the most useful mode for understanding performance under a sustained load. Below is an example of vmstat running at one-second intervals:

# vmstat 1

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----

r b swpd free buff cache si so bi bo in cs us sy id wa

0 0 104300 16800 95328 72200 0 0 5 26 7 14 4 1 95 0

0 0 104300 16800 95328 0 0 0 24 72200 64 1 1 98 0

0 0 104300 16800 95328 0 0 0 0 72200 59 1 1 98 0

Table 1: The vmstat CPU statistics

Field   Description

r       The number of threads in the run queue: threads that are runnable, but for which no CPU is currently available to execute them.

b       The number of processes blocked and waiting for IO requests to finish.

in      The number of interrupts being processed.

cs      The number of context switches currently happening on the system.

us      The percentage of CPU utilization spent in user space.

sys     The percentage of CPU utilization spent in the kernel and servicing interrupts.

wa      The percentage of idle CPU time during which all runnable threads are blocked waiting on IO.

id      The percentage of time the CPU is completely idle.
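The average mode mentioned at the start of this section is simply vmstat invoked without an interval; it prints a single line of averages since the last reboot, which can serve as a very rough baseline but will not show spikes (a small illustrative example):

# With no arguments, vmstat reports one line of averages since boot.
vmstat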

4.2 Case Study: continuous CPU utilization

In this example, the system is fully utilized.

# vmstat 1

procs memory swap io system cpu

r b swpd free buff cache si so bi bo in cs us sy wa id

3 0 206564 15092 80336 176080 0 0 0 718 26 81 19 0 0

2 0 206564 14772 80336 176120 0 0 0 758 23 96 4 0 0

1 0 206564 14208 80336 176136 0 0 0 820 20 96 4 0 0

1 0 206956 13884 79180 175964 0 412 2680 1008 80 93 7 0 0

2 0 207348 14448 78800 175576 0 412 412 763 70 84 16 0 0

2 0 207348 15756 78800 175424 0 0 0 874 89 11 0 0

1 0 207348 16368 78800 175596 0 0 0 940 24 86 14 0 0

1 0 207348 16600 78800 175604 0 0 0 929 27 95 3 0 2

3 0 207348 16976 78548 0 0 0 175876 35 93 7 0 0

4 0 207348 16216 78548 175704 0 0 0 874 36 93 6 0 1

4 0 207348 16424 78548 175776 0 0 0 850 26 77 23 0 0

2 0 207348 17496 78556 175840 0 0 0 736 23 83 17 0 0

0 0 207348 17680 78556 0 0 0 175868 21 91 8 0 1

Based on the observed values, we can draw the following conclusions:

1. There is a large number of interrupts (in) and a relatively small number of context switches (cs), which suggests that a single process is generating a large number of requests to a hardware device.

2. User time (us) is consistently at 85% or higher while context switches remain low, which further suggests a single application that gets on the processor and stays on it.

3. The run queue is mostly within the acceptable range, although a couple of samples exceed the limit.

4.3 Case Study: overload Scheduling

In this example, the kernel scheduler is saturated with context switching.

# vmstat 1

procs memory swap io system cpu

r b swpd free buff cache si so bi bo in cs us sy wa id

2 1 207740 98476 81344 180972 0 0 2496 900 2883 4 12 57 27

0 1 207740 96448 83304 180984 0 0 1968 328 810 8 9 83 0

0 1 207740 94404 85348 180984 0 0 2044 829 9 6 78 7

0 1 207740 92576 87176 180984 0 0 1828 689 3 9 78 10

2 0 207740 91300 88452 180984 0 0 1276 565 2182 7 6 83 4

3 1 207740 90124 89628 180984 0 0 1176 551 2 7 91 0

4 2 207740 89240 90512 180984 0 0 880 520 443 22 10 67 0

5 3 207740 88056 91680 180984 0 0 1168 628 12 11 77 0

4 2 207740 86852 92880 180984 0 0 1200 654 6 7 87 0

6 1 207740 85736 93996 180984 0 0 1116 526 5 10 85 0

0 1 207740 84844 94888 180984 0 0 892 438 6 4 90 0

Based on the observed values, we can draw the following conclusions:

1. The number of context switches is higher than the number of interrupts, which means the kernel is spending a considerable amount of its time switching between threads.

2. The large volume of context switching is causing an unhealthy CPU utilization balance: the IO wait percentage (wa) is very high and the user time percentage (us) is very low.

3. Because the CPU is stalled waiting on IO, the run queue starts to fill with runnable threads waiting to execute.

4.4 use of the mpstat Tool

If your system has multiple processor cores, you can use the mpstat command to monitor each of them individually. The Linux kernel treats a dual-core chip as 2 CPUs, so a system with two dual-core chips reports 4 available CPUs.

The CPU utilization statistics provided by mpstat are roughly the same as those provided by vmstat, but mpstat can break them down per processor.

# mpstat -P ALL 1

Linux 2.4.21-20.ELsmp (localhost.localdomain) 05/23/2006

05:17:31 PM CPU %user %nice %system %idle intr/s

05:17:32 pm all 0.00 0.00 3.19 96.53 13.27

05:17:32 PM 0 0.00 0.00 0.00 100.00 0.00

05:17:32 pm 1 1.12 0.00 12.73 86.15

05:17:32 PM 2 0.00 0.00 0.00 100.00 0.00

05:17:32 PM 3 0.00 0.00 0.00 100.00 0.00

4.5 Case Study: insufficient processing capacity

In this example, 4 CPU cores are available. two of the CPU processes run (CPU 0 and 1 ). the 3rd cores process all kernels and other system functions (CPU 3 ). 4th cores are in idle (CPU 2 ).

Using the top command, we can see that three processes are each consuming almost an entire CPU core.

# top -d 1

top - 23:08:53 up, 3 users, load average: 0.91, 0.37, 0.13

Tasks: 190 total, 4 running, 186 sleeping, 0 stopped, 0 zombie

Cpu(s): 75.2% us, 0.2% sy, 0.0% ni, 24.5% id, 0.0% wa, 0.0% hi, 0.0% si

Mem: 2074736k total, 448684k used, 1626052k free, 73756k buffers

Swap: 4192956k total, 0k used, 4192956k free, 259044k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

15957 nobody 25 0 2776 280 R 224 100. 48 PHP

15959 MySQL 25 0 2256 280 R 224 100. 78 mysqld

15960 Apache 25 0 2416 280 R 224 100. 20 httpd

15901 root 16 0 2780 1092 800 r 1 0.1. 59 top

1 root 16 0 1780 660 572 S 0 0.0. 64 init

# mpstat -P ALL 1

Linux 2.4.21-20.ELsmp (localhost.localdomain) 05/23/2006

05:17:31 PM CPU %user %nice %system %idle intr/s

05:17:32 pm all 81.52 0.00 18.48 21.17 130.58

05:17:32 PM 0 83.67 0.00 17.35 0.00 115.31

05:17:32 pm 1 80.61 0.00 19.39 0.00

05:17:32 PM 2 0.00 0.00 16.33 84.66 2.01

05:17:32 PM 3 79.59 0.00 21.43 0.00 0.00

05:17:33 PM CPU %user %nice %system %idle intr/s

05:17:33 pm all 85.86 0.00 14.14 25.00 116.49

05:17:33 PM 0 88.66 0.00 12.37 0.00 116.49

05:17:33 pm 1 80.41 0.00 19.59 0.00

05:17:33 PM 2 0.00 0.00 0.00 100.00 0.00

05:17:33 PM 3 83.51 0.00 16.49 0.00 0.00

05:17:33 PM CPU %user %nice %system %idle intr/s

05:17:34 pm all 82.74 0.00 17.26 25.00 115.31

05:17:34 PM 0 85.71 0.00 13.27 0.00 115.31

05:17:34 pm 1 78.57 0.00 21.43 0.00

05:17:34 PM 2 0.00 0.00 0.00 100.00 0.00

05:17:34 PM 3 92.86 0.00 9.18 0.00 0.00

05:17:34 PM CPU %user %nice %system %idle intr/s

05:17:35 pm all 87.50 0.00 12.50 25.00 115.31

05:17:35 PM 0 91.84 0.00 8.16 0.00 114.29

05:17:35 pm 1 90.82 0.00 10.20 0.00

05:17:35 PM 2 0.00 0.00 0.00 100.00 0.00

05:17:35 PM 3 81.63 0.00 15.31 0.00 0.00

You can also use the ps command, checking the PSR column, to see which processor a process is currently running on.

# while :; do ps -eo pid,ni,pri,pcpu,psr,comm | grep 'mysqld'; sleep 1; done

PID  NI PRI %CPU PSR COMMAND

15775 0 15 86.0 3 mysqld

PID  NI PRI %CPU PSR COMMAND

15775 0 14 94.0 3 mysqld

PID  NI PRI %CPU PSR COMMAND

15775 0 14 96.6 3 mysqld

PID  NI PRI %CPU PSR COMMAND

15775 0 14 98.0 3 mysqld

PID  NI PRI %CPU PSR COMMAND

15775 0 14 98.8 3 mysqld

PID  NI PRI %CPU PSR COMMAND

15775 0 14 99.3 3 mysqld

4.6 conclusion

CPU performance monitoring consists of the following parts:

1. Check the system run queues and make sure there are no more than 3 runnable threads per processor.

2. Check that CPU utilization is split roughly 70/30 between user time and system time.

3. If the CPU spends more of its time in system mode, it is overloaded with scheduling and interrupt work, and the workload and priorities should be re-examined.

4. As IO load increases, CPU-bound applications will suffer.
