Performance Tuning--CPU Performance analysis

Last Update:2018-07-24 Source: Internet

Author: User


1: CPU architecture and how it works
2: Operating systems and processes
3: Metrics to measure how busy the CPU is
4: Symptoms of CPU resources becoming a system performance bottleneck
5: Which processes are the big consumers of CPU resources?
6: Using the sar tool to analyze CPU utilization
7: Using the sar tool to analyze the run queue length
8: Using the sar tool to analyze system calls
9: Using the time command to test the execution efficiency of a command or program
10: Using the top command to view the processes that consume the most CPU resources
11: Using the uptime command to view the overall system situation
12: Using Glanceplus to analyze system CPU resource utilization
13: Performance tuning for CPU-intensive systems

CPU architecture and how it works

The term CPU generally refers to the microprocessor. The main components of a CPU are:

CPU (central processing unit)
Cache: high-speed memory with a typical access time of 10-20 nanoseconds (ns), which lets the CPU access the cache within one clock cycle, whereas main memory has an average access time of 80-90 ns. The size of the cache has a great impact on CPU performance.
TLB (translation lookaside buffer): a high-speed cache that stores recently accessed virtual addresses together with their corresponding physical addresses, so that a virtual address can be converted to a physical address quickly. The TLB is a subset of the system translation tables held in memory. A TLB entry usually points to a memory page rather than a single memory address, and the size of the TLB has a significant impact on CPU performance.
Coprocessor
Different CPUs typically have different clock frequencies and cache capacities.

The CPU can typically fetch an instruction from the cache and execute it within a single clock cycle. In theory, therefore, the higher the CPU's clock frequency, the more instructions it can execute per unit of time. Some CPUs can execute multiple instructions in a single clock cycle; the PA8500, for example, can execute 4 instructions.
The size of the cache limits the CPU's execution efficiency: however fast the CPU is, if it cannot get its data in time it can only sit idle. Cache size is therefore important. The cache is divided into a data cache and an instruction cache, which hold the data and the instructions, respectively, that are fetched from memory ahead of execution.
Virtual addressing

In general, the virtual address space of a system is much larger than its physical address space. For example, a 64-bit system can theoretically address 2 to the power of 64 bytes (2**64 bytes, about 18,447 PB), but because of cost, the actual physical memory is typically only a little more than 10 GB.

Each process has its own virtual address space. For a process to run, however, its virtual addresses must be mapped to physical addresses, which requires the TLB, the cache, and memory to work together. If the required information is not in memory, a page fault occurs.
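
The 64-bit figure quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the function names are made up for illustration, PB is taken as a decimal petabyte (10**15 bytes), and the 16 GiB of physical memory is an assumed example value, not a number from the text.

```python
def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 2 ** bits


def to_decimal_petabytes(num_bytes):
    """Convert bytes to decimal petabytes (1 PB = 10**15 bytes)."""
    return num_bytes / 10 ** 15


virtual = address_space_bytes(64)
print(virtual)                               # 18446744073709551616
print(round(to_decimal_petabytes(virtual)))  # 18447

# Assumed example: 16 GiB of physical memory (hypothetical value).
physical = 16 * 2 ** 30
print(virtual // physical)                   # 1073741824
```

Under that assumption the virtual address space is about a billion times larger than physical memory, which is why the mapping step through the TLB matters.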

Pipeline (pipelining)

The TLB and cache try to supply the CPU with the information it needs within one clock cycle. Even so, utilization is not 100%: if the CPU must first spend one clock cycle fetching the next instruction and then another clock cycle executing it, its utilization is only 50%. To keep the CPU busier, the usual approach is pipelining. For example, the PA8500 has a 7-stage pipeline.
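
The 50% figure, and the effect of pipelining on it, can be illustrated with a small idealized model. This is a sketch under strong assumptions: every stage takes exactly one clock cycle and there are no stalls, cache misses, or branch penalties. The function names are hypothetical and do not describe any real CPU.

```python
def cycles_unpipelined(n_instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions, stages):
    """The first instruction needs `stages` cycles; each later one finishes one cycle after."""
    return stages + n_instructions - 1


def instructions_per_cycle(n_instructions, stages, pipelined):
    """Completed instructions per clock cycle under the idealized model."""
    if pipelined:
        cycles = cycles_pipelined(n_instructions, stages)
    else:
        cycles = cycles_unpipelined(n_instructions, stages)
    return n_instructions / cycles


# Two stages (fetch, execute) without pipelining gives the 50% case from the text.
print(instructions_per_cycle(1000, 2, pipelined=False))  # 0.5
# With pipelining the rate approaches one instruction per cycle.
print(instructions_per_cycle(1000, 2, pipelined=True) > 0.99)  # True
```

Real pipelines fall short of this ideal because of the stalls the model ignores.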

Operating systems and processes

HP-UX is a multi-user, multitasking Unix operating system. Its performance depends on the number of users, the type of user tasks, and the configuration of hardware and software components.

HP-UX has two levels of operation:

User level: system users interact with the operating system here, for example by running applications and system commands. The user level accesses the kernel level through the system call interface.
Kernel level: the operating system automatically runs a number of functions here, which operate primarily on the hardware.
In the operating system, user programs run as processes. A process can be in one of the following states:

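SRUN: running or runnable
SSLEEP: sleeping, waiting for an event
SZOMB: terminated (zombie)
SIDL: intermediate state during process creation
SSTOP: stopped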
Scheduling of CPUs

Once the data a process requires has been brought into memory, the process waits for the CPU scheduler to allocate CPU time to it. Typically, in HP-UX, each process runs for a fixed time slice of one tenth of a second (1/10 second).

Because HP-UX is a multitasking operating system, it needs a mechanism to control the order in which processes execute, and that mechanism is the interrupt. The clock interrupt handler is the system software that handles clock interrupts. Specifically, it collects system and accounting statistics and performs context switches. System performance also depends on how frequently such interrupts occur.

Process priority

Each process has its own priority:
Real-time priority: -32~127. To run a process at a real-time priority, the priority must be set with the rtprio command;
CTSS priority: 128~177;
Time-sharing user priority: 178~251;
Priorities 252~255 are used by the system's virtual memory management for process deactivation.
The initial priority of a time-sharing process is assigned by the system and is a fixed value. The user can change the priority of a time-sharing process by changing its nice value. While a process is executing, its priority is lowered according to its nice value; while it is waiting to execute, its priority is raised according to its nice value. The system default nice value is 20.
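
The priority ranges listed above can be written down as a small lookup table. The following is a minimal sketch that only restates the ranges given in this article; the names `PRIORITY_BANDS` and `classify_priority` are hypothetical and not part of any HP-UX interface.

```python
# (lowest, highest, label) taken from the ranges listed in the text.
PRIORITY_BANDS = [
    (-32, 127, "real-time"),
    (128, 177, "CTSS"),
    (178, 251, "time-sharing user"),
    (252, 255, "virtual memory management"),
]


def classify_priority(priority):
    """Return the label of the band that contains `priority`."""
    for lowest, highest, label in PRIORITY_BANDS:
        if lowest <= priority <= highest:
            return label
    raise ValueError(f"priority {priority} is outside the documented ranges")


print(classify_priority(100))  # real-time
print(classify_priority(200))  # time-sharing user
```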
In system performance analysis, we care not only about how long a process takes to complete, but also about where that time is spent and how much is spent in each place.

Metrics to measure the CPU's busy level

To judge whether the system's CPU resources are sufficient, we need to know who is using the CPU, how much of it, and for how long. The following are commonly used metrics that measure how busy the CPU is:

1) CPU usage by users

CPU running regular user processes
CPU running niced processes
CPU running real-time processes
2) CPU usage by the system

For system calls
For I/O management: interrupts and drivers
For memory management: paging and swapping
For process management: context switches and process startup
3) WIO: the percentage of time the CPU is idle because processes are waiting for I/O, mainly block I/O, raw I/O, and VM paging/swap-ins;

4) CPU idle rate: the percentage of time the CPU is idle, excluding the WIO idle time above;

5) CPU ratio for context switching (context switch CPU utilization)

6) Nice

7) Real-time

8) The length of the run queue, that is, the number of processes that are runnable; what we really care about is how long they wait for the CPU to schedule them;

9) Average load (load average)

CPU resources become a symptom of system performance bottlenecks

The CPU is like the human brain: it carries out the various tasks entrusted to it. If there are too many tasks, the CPU cannot keep up and its efficiency drops. Just as an illness has typical symptoms, a system whose CPU resources have become a performance bottleneck shows typical symptoms:

Very slow response times (slow response time)
CPU idle time is 0 (zero percent idle CPU)
Excessive user CPU time (high percent user CPU)
Excessive system CPU time (high percent system CPU)
Long run queue (large run queue size sustained over time)
Processes blocked on priority
Note that these symptoms do not necessarily mean that CPU resources are insufficient. Some of them may well be caused by a shortage of other resources. For example, if memory is short, the CPU is kept busy with memory management, so on the surface CPU utilization is 100% or the CPU even appears insufficient. It would be a big mistake to simply assume that adding CPUs will solve the problem.

So, as always, a conclusion can be drawn only after analyzing the system from several angles with different tools. Even then, experience plays an irreplaceable role.

Which processes are the big players that consume CPU resources?

In the operating system, not all processes use CPU resources in the same way. Typically, some processes need more CPU time slices than others to complete their tasks. Here are some typical CPU-intensive activities:

Process creation
Terminal character processes (MUX- and LAN-based)
Compute-intensive processes and real-time processes
X-terminal and X-server processes

Using the sar tool to analyze CPU utilization

The command forms for analyzing CPU utilization with sar:

#sar -u, when the data is collected by sa1 in the background;
#sar -u 5 100, sampling every 5 seconds, 100 times;
sar -u: Report CPU utilization (the default); portion of time running in one of several modes. On a multi-processor system, if the -M option is used together with the -u option, per-CPU utilization as well as the average CPU utilization of all the processors are reported. If the -M option is not used, only the average CPU utilization of all the processors is reported:

cpu: CPU number (only on a multi-processor system with the -M option);
%usr: user mode;
%sys: system mode;
%wio: idle with some process waiting for I/O (only block I/O, raw I/O, or VM pageins/swapins indicated);
%idle: otherwise idle;
Analysis of the results

First, look at the value of the %idle column. If it is close to 0, look at the value of the corresponding %wio column. If %wio is greater than 7, the system's disks or other I/O may have a problem, and further analysis is needed:

Use the iostat command to analyze how busy each disk is, for example #iostat -t 5 2, sampling every 5 seconds, 2 times;
Use the sar -d command to analyze the activity of each block device (disk, tape);
Use the sar -b command to analyze the buffer cache activity of the system;
Use the sar -w command to analyze the system's process deactivation/reactivation and switching activity;
If the %idle column is small and the corresponding %wio column is also small, look at the %usr and %sys columns. If %usr is large, user processes are consuming a lot of CPU time; if %sys is large, system management is taking a lot of CPU time. Further analysis is required:

Use Glanceplus to find the processes that consume the most CPU time and analyze why each consumes so much.
If the value of the %sys column is large, use the sar -c command to break down the system calls and see what they are mainly doing. At the same time, check for other bottlenecks: paging, for example, can also make %sys very large. In that case, use sar -q to view the length of the system's run queue, and use Glanceplus and vmstat to view memory usage;
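
The decision steps above can be sketched as a small function that takes one sample of the four sar -u columns and returns the suggested follow-up checks. This is an illustration, not a tool: the name `next_steps` is hypothetical, the cutoff of 5 for "%idle close to 0" is an assumption, and "large" is approximated by comparing %usr with %sys. Only the %wio threshold of 7 comes from the text.

```python
def next_steps(usr, sys_pct, wio, idle, idle_low=5.0, wio_high=7.0):
    """Suggest follow-up checks for one sar -u sample.

    idle_low is an assumed cutoff for "%idle close to 0";
    wio_high is the %wio threshold of 7 quoted in the text.
    """
    if idle > idle_low:
        return []  # CPU still has idle time; nothing to follow up here
    if wio > wio_high:
        # CPU is idle mostly because processes wait for I/O
        return ["iostat -t", "sar -d", "sar -b", "sar -w"]
    if usr >= sys_pct:
        # Assumption: treat the larger of %usr and %sys as the "large" one
        return ["Glanceplus: find the processes using the most CPU"]
    return ["sar -c", "sar -q", "Glanceplus / vmstat: check memory"]


print(next_steps(usr=10, sys_pct=5, wio=85, idle=0))
print(next_steps(usr=20, sys_pct=75, wio=3, idle=2))
```

A single sample is rarely conclusive, so in practice the same check would be applied to readings taken over a period of time.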

Using the sar tool to analyze the run queue length

The command forms for analyzing the run queue length with sar:

#sar -q, when the data is collected by sa1 in the background;
#sar -q 5 100, sampling every 5 seconds, 100 times;
sar -q: Report average queue length while occupied, and percent of time occupied. On a multi-processor machine, if the -M option is used together with the -q option, the per-CPU run queue as well as the average run queue of all the processors are reported. If the -M option is not used, only the average run queue information of all the processors is reported:

cpu: CPU number (only on a multi-processor system with the -M option);
runq-sz: average length of the run queue(s) of processes (in memory and runnable);
%runocc: the percentage of time the run queue(s) were occupied by processes (in memory and runnable);
swpq-sz: average length of the swap queue of runnable processes (processes swapped out but ready to run);
%swpocc: the percentage of time the swap queue of runnable processes (processes swapped out but ready to run) was occupied.
Analysis of the results:

The smaller these values, the better.

If runq-sz is greater than 4, or if %swpocc is greater than 5, there may be a problem with the CPU or memory of the system, and further analysis is required:

Use the sar -u command to analyze CPU usage;
Use the sar -w command to analyze the system's process deactivation/reactivation and switching activity;
Glanceplus can also be used;
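
Because a long run queue matters most when it is sustained over time, the two thresholds above are best applied to a series of samples rather than to a single reading. The following is a minimal sketch: `flagged_samples` is a hypothetical helper, and the sample values are invented for illustration rather than taken from real sar -q output.

```python
def flagged_samples(samples, runq_limit=4.0, swpocc_limit=5.0):
    """Return the indexes of (runq_sz, swpocc) samples that exceed either threshold."""
    return [
        index
        for index, (runq_sz, swpocc) in enumerate(samples)
        if runq_sz > runq_limit or swpocc > swpocc_limit
    ]


# Invented readings: (runq-sz, %swpocc) pairs.
readings = [(1.2, 0.0), (5.5, 0.0), (3.0, 12.0), (6.1, 8.0)]
flagged = flagged_samples(readings)
print(flagged)                       # [1, 2, 3]
print(len(flagged) / len(readings))  # 0.75
```

A high fraction of flagged samples suggests that the follow-up checks listed above are worth running.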

Using the sar tool to analyze system calls

The command forms for analyzing system calls with sar:

#sar -c, when the data is collected by sa1 in the background;
#sar -c 5 100, sampling every 5 seconds, 100 times;
sar -c: Report system calls:

scall/s: number of system calls of all types per second;
sread/s: number of read() and/or readv() system calls per second;
swrit/s: number of write() and/or writev() system calls per second;
fork/s: number of fork() and/or vfork() system calls per second;
exec/s: number of exec() system calls per second;
rchar/s: number of characters transferred by read system calls (block devices only) per second;
wchar/s: number of characters transferred by write system calls (block devices only) per second.
Analysis of the results:

If the value of the scall/s column is large, then the reason for so many system calls must be analyzed carefully.

We can look at the values of the fork/s and exec/s columns to see whether the system is creating a lot of new processes.

Use the time command to test the execution efficiency of a command and program

We can use the time command to test the execution efficiency of a command with the following syntax:

time command

command is executed. Upon completion, time prints the elapsed time during the command, the time spent in the system, and the time spent executing the command. Times are reported in seconds.

Execution time can depend on the performance of the memory in which the program is running.

When we suspect that a process performs poorly, the simplest first step is to use the time command to see how its time is distributed during execution, and then analyze it further with other tools.
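
The three figures that time reports (elapsed, user, and system time) can also be collected from a script. The following is a minimal sketch using only the Python standard library on a Unix-like system. It is not how the time command itself is implemented, `time_command` is a hypothetical helper name, and the user and system figures are the change in the totals for waited-for child processes.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return its elapsed, user, and system time in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=False)
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,                             # wall-clock time
        "user": after.ru_utime - before.ru_utime,  # CPU time in user mode
        "sys": after.ru_stime - before.ru_stime,   # CPU time in kernel mode
    }


if __name__ == "__main__":
    # Time a short CPU-bound child process as an example.
    timings = time_command([sys.executable, "-c", "sum(range(10**6))"])
    for name, seconds in timings.items():
        print(f"{name} {seconds:.3f}")
```

A process whose real time is much larger than user plus sys spent most of its time waiting rather than computing, which points away from the CPU as the bottleneck.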

Use the top command to view the processes that consume the most CPU resources

We can use the top command to see the processes that consume the most CPU resources. The top display is updated dynamically according to how much CPU each process consumes.

The syntax is:

top [-s time] [-d count] [-q] [-u] [-h] [-n number]

The meanings of these options are:

-s time: refresh the screen every time seconds; the default interval is 5 seconds;
-d count: after refreshing the screen count times, top exits;
-q: This option runs the top program at the same priority as if it were executed via a nice -20 command so that it will execute faster (see nice(1)). This can be very useful in discovering any system problem when the system is very sluggish. This option is accessible only to users who have appropriate privileges.
-u: User ID (uid) numbers are displayed instead of usernames. This improves execution speed by eliminating the additional time required to map uid numbers to user names.
-h: Hides the individual CPU state information for systems having multiple processors. Only the average CPU status will be displayed.
-n number: Show only number processes per screen. Note that this option is ignored if number is greater than the maximum number of processes that can be displayed per screen.
While the top command is running, we can page through the screen with the following keys:

j: page forward;
k: page backward;
t: back to the first page;
Analysis of the results:

With the top command, we can quickly understand the system's current CPU resource usage. The processes that consume the most CPU resources are the ones we must pay attention to.

The RES column (the current size of the process resident in memory) shows the amount of memory consumed by each process.

We can tell by the Nice column whether the system uses the Nice value to adjust the workload balance for that process.

Use the uptime command to view the overall system situation

uptime prints the current time, the length of time the system has been up, the number of users logged on to the system, and the average number of jobs in the run queue over the last 1, 5, and 15 minutes.

w is linked to uptime and prints the same output as uptime -w, displaying a summary of the current activity on the system.

The syntax is:

uptime [-hlsuw] [user]

w [-hlsuw] [user]

The meanings of these options are:

-h: Suppress the first line and the heading line. This option should not be used with the -u option. This option assumes the use of the -w option to uptime.
-l: Use long output. This option assumes the use of the -w option to uptime.
-s: Use the short form of output for displaying terminal information. The terminal name is abbreviated; the login time and CPU times are suppressed.
-u: Print the first line describing the overall state of the system. This is the default for the uptime command.
-w: Print a summary of the current activity on the system for each user. This is the default for the w command.
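
The three load averages that uptime prints can also be read from a script. The sketch below uses Python's standard os.getloadavg() on a Unix-like system and divides each value by the number of CPUs. The per-CPU normalization is an assumption added here to make values comparable between machines, not something uptime does, and `load_per_cpu` is a hypothetical helper.

```python
import os


def load_per_cpu(load_averages, cpu_count):
    """Divide each load average by the number of CPUs."""
    return tuple(load / cpu_count for load in load_averages)


if __name__ == "__main__":
    loads = os.getloadavg()     # averages over the last 1, 5, and 15 minutes
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    print("load averages:", loads)
    print("per CPU:", load_per_cpu(loads, cpus))
```

For instance, `load_per_cpu((4.0, 2.0, 1.0), 4)` returns `(1.0, 0.5, 0.25)`: on that hypothetical 4-CPU machine, a 1-minute load of 4.0 corresponds to about one runnable job per CPU.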

Using Glanceplus to analyze system CPU resource utilization

With HP's Glanceplus tool, you can analyze in detail both the overall state of the system and a single process.

1) Analysis of the overall use of the CPU:

Start Glanceplus;
Press the key to enter the online Help interface;
Press the C key to enter the CPU detail screen;
Press the B key to page back, press the F key to page forward;
The CPU detail screen shows how CPU time is distributed: how much is used by users and how much by the system.

2) Analysis of CPU resource usage for a single process:

Start Glanceplus;
Press the key to enter the online Help interface;
Press the G key to enter the process list screen;
Press the S key to enter the process selection screen; the busiest process is usually offered as the default;
Enter the process number you want to view;
Press the B key to page back, press the F key to page forward;
When analyzing a single process, we typically focus on the following values:

CPU usage;
User CPU;
System CPU;
Priority;
Logical and physical reads and writes;
Total RSS/VSS;
Blocked on (obtained by pressing shift+>);

Performance tuning for CPU demand-intensive systems

1) Hardware-based approaches:

Upgrade to a faster CPU;
Upgrade to a larger cache;
Increase the number of CPUs;
Distribute applications across multiple systems;
Use no intertwining point;
Add a floating-point processor;
2) Software-based approaches:

Run batch jobs during off-peak hours;
Use nice to lower the priority of unimportant applications;
Use the rtprio command to help important applications;
Use the plock command to help important applications;
Turn off system accounting;
Consider using Taskbroker or DCE;
Optimize applications;
Consider using Process Resource Manager (PRM), which is available only on the HP-UX platform.
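
As an illustration of the nice-based approach in the list above, the following sketch starts a child process with a raised nice value (that is, a lower scheduling priority) using only the Python standard library on a Unix-like system. The helper name `run_niced` and the default increment of 10 are assumptions chosen for the example, not values from this article.

```python
import os
import subprocess
import sys


def run_niced(argv, increment=10):
    """Run argv as a child process whose nice value is raised by `increment`."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: os.nice(increment),  # runs in the child before exec
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # The child simply reports its own nice value.
    child = [sys.executable, "-c", "import os; print(os.nice(0))"]
    result = run_niced(child)
    print("parent nice value:", os.nice(0))
    print("child nice value:", result.stdout.strip())
```

Raising the nice value only changes how the scheduler shares the CPU when there is contention, so an unimportant job started this way still runs at full speed on an otherwise idle system.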

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
