Detailed explanation of operating system performance counters

Source: Internet
Author: User

Performance counters are also called performance monitors. How do we assess a person's health? We obtain the relevant indicators through various health checks, such as blood pressure, heart rate, and lung capacity. In the same way, monitoring the software and hardware of the entire system is an essential part of performance testing, and the data obtained by monitoring is the main basis for analyzing system performance.

Across the system, we monitor different indicators for different software and hardware components, just as the employees of a company have different responsibilities and are therefore assessed against different criteria. Next, we analyze each aspect of the system in turn.

Operating system performance counters

The operating system monitor mainly monitors system performance at the operating system level. Here we analyze the two most common operating systems, Windows and Linux.

Main performance counters of the Windows operating system

Windows OS performance monitoring:

Windows provides many counters. The main ones are as follows:

Main performance counters of the Linux/Unix operating systems

Linux and Unix commands differ in some respects. On Unix systems, the main commands for monitoring counters are vmstat, iostat, top, sar, and memory (graphical mode; X server support is required). On Linux, iostat is often not installed by default (it is typically provided by the sysstat package). In addition, the output of these commands differs slightly between the two systems.
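As a rough illustration of where such figures come from on Linux, the kernel exposes cumulative CPU tick counts in /proc/stat, and a utilization percentage can be derived from two samples. The sketch below is only an assumption about how one might do this by hand; the function names are hypothetical and are not part of any of the tools named above.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def read_cpu_sample(path="/proc/stat"):
    """Read the aggregate CPU counters (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


def cpu_utilization(before, after):
    """Return the busy percentage between two samples.

    Field order: user nice system idle iowait irq softirq steal ...
    Only the first eight fields are summed, because guest time is
    already included in user and nice. idle and iowait count as not busy.
    """
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - idle) / total
```

In practice, one would call read_cpu_sample() twice, a second or so apart, and pass both results to cpu_utilization().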

The sections above introduced the performance counters of Windows and Linux. To analyze the performance of an operating system, you check these indicators. The operating system, however, runs on the system hardware, so hardware performance directly affects operating system performance. The following is a brief analysis of the main hardware components: CPU, memory, and disk.

CPU Analysis

CPU performance plays a leading role in the overall performance of a computer. Early computers were even named after their CPUs, such as the 386, 486, Pentium III, and Pentium 4.

Therefore, the most direct way to evaluate CPU performance is to look at the CPU's operating frequency, that is, its clock frequency, measured in Hz. As CPUs have developed, clock speeds have risen from the MHz range to today's GHz range.

(1 GHz = 1000 MHz = 1000000 kHz = 1000000000Hz)

In addition to clock speed, two other concepts are closely related to the processor: the multiplier (frequency doubling) and the external frequency. The external frequency is the base frequency of the CPU, measured in MHz, and is the speed at which the CPU and the motherboard run in sync. In most computer systems, the external frequency is also the speed at which the memory and the motherboard run in sync, so the external frequency can be understood as linking the CPU directly to the memory and keeping the two running synchronously. The multiplier is the ratio between the CPU clock frequency and the external frequency.

Clock speed = External frequency × Multiplier
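The relationship above can be written as a small calculation. This is a minimal sketch; the function names and the sample values (a 200 MHz external frequency and a multiplier of 15) are hypothetical and chosen only for illustration.

```python
def clock_speed_mhz(external_frequency_mhz, multiplier):
    """Clock speed = external frequency x multiplier (result in MHz)."""
    return external_frequency_mhz * multiplier


def mhz_to_ghz(mhz):
    """Convert MHz to GHz (1 GHz = 1000 MHz)."""
    return mhz / 1000.0


# Hypothetical example values, not taken from any specific CPU.
speed_mhz = clock_speed_mhz(200, 15)
speed_ghz = mhz_to_ghz(speed_mhz)
```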

How to analyze the CPU?

1) Check the value of the System\% Total Processor Time performance counter.

This counter reflects the overall processor utilization of the server. On a multi-processor system, it reflects the average utilization of all CPUs. If the value is greater than 90%, the CPU may be a bottleneck.

2) Check Processor\% User Time for each CPU.

Processor\% User Time is the CPU time consumed by non-kernel (user-mode) operations. If the value is large, consider optimizing algorithms to reduce it. If the server is a database server, a large Processor\% User Time value is probably caused by database sorting or function operations consuming too much CPU time, in which case you can consider optimizing the database.

3) Check Processor\% Processor Time and System\Processor Queue Length.

Look at the System\Processor Queue Length counter. When its value is greater than the total number of CPUs plus 1, the CPU is congested. In that case the Processor\% Processor Time value is not necessarily large, so you must investigate the cause of the CPU congestion.

4) Check % DPC Time.

% DPC Time is another counter that requires attention. The lower its value, the better. On a multi-CPU system, if this value is greater than 50% and the Processor\% Processor Time value is very high, consider adding a network interface card (NIC) to improve performance.
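The CPU checks above can be sketched as a simple rule function. This is only an illustration under stated assumptions: the function name and the finding labels are hypothetical, and the 90% cutoff used for a "very high" Processor\% Processor Time is an assumption, since the text gives no number for it. The % User Time check is left out because the text gives no threshold for "large".

```python
def analyze_cpu(total_processor_time_pct, processor_time_pct,
                processor_queue_length, dpc_time_pct, cpu_count,
                high_processor_time_pct=90.0):
    """Apply the rules of thumb from the text to one set of counter values.

    high_processor_time_pct is an assumed cutoff for "very high";
    the text does not give a number for it.
    """
    findings = []

    # Rule 1: overall utilization above 90% suggests a CPU bottleneck.
    if total_processor_time_pct > 90.0:
        findings.append("cpu_bottleneck")

    # Rule 3: queue length greater than CPU count plus 1 means congestion.
    if processor_queue_length > cpu_count + 1:
        findings.append("cpu_congestion")

    # Rule 4: on a multi-CPU system, high DPC time together with high
    # processor time suggests adding a NIC.
    if (cpu_count > 1 and dpc_time_pct > 50.0
            and processor_time_pct > high_processor_time_pct):
        findings.append("consider_adding_nic")

    return findings
```

An empty list means none of the rules fired for the sampled values; it does not prove that the CPU is healthy.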

Disk I/O Analysis

Hard disks are probably the slowest-improving devices in computer hardware, and many common bottlenecks are caused by slow disk reads and writes. Improving hard disk read/write performance comes down to increasing the rotation speed, increasing the per-platter capacity, enlarging the cache, and upgrading the interface. Because traditional hard disks read and write data through physical rotation, increasing the rotation speed is quite difficult. Increasing the per-platter capacity also faces a technical bottleneck, and it will take time for per-platter capacity to break through 1 TB.

A traditional Winchester hard drive can reach only about 120 MB/s. That speed holds for reading and writing large files, but when a server reads or writes a large number of small files, the speed can drop to a surprisingly low level of less than 1 MB/s, and the corresponding IOPS (the number of disk reads/writes per second) is low as well. A large amount of data then queues up to be read from the hard disk into memory, after which operations complete at memory bandwidth. This is why systems with large memory are faster. However, although memory is much faster than the hard disk, it has a fatal disadvantage: once the power is cut, all data in memory is lost.

IOPS (input/output operations per second) is one of the main indicators of disk performance. It refers to the number of I/O requests that the system can process per second.

Another important metric is throughput, which refers to the amount of data that can be successfully transferred per unit of time. Applications that perform a large number of sequential reads and writes pay more attention to throughput.

The time a traditional Winchester hard drive takes to complete an I/O request consists of seek time, rotational latency, and data transfer time.

* Seek time: the time required to move the read/write head to the correct track. At present, the average seek time of a disk is generally 3 to 15 ms.

* Rotational latency: the time required for the disk to rotate the requested data sector under the read/write head. On average this is half a rotation, so for a 7200 rpm disk the average rotational latency is 60 × 1000 / 7200 / 2 ≈ 4.17 ms.

* Data transfer time: the time required to transfer the requested data. Currently, the SATA II interface can reach a data transfer rate of 300 MB/s, so the data transfer time is usually far smaller than the first two components.
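The three components above can be combined into a rough estimate of how many random I/O requests a single disk can serve per second. The sketch below assumes one outstanding request at a time and the common approximation IOPS ≈ 1000 / (seek time + rotational latency + transfer time, all in ms); that formula and the function names are assumptions added for illustration, not something the text states.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half a rotation, in milliseconds."""
    return 60 * 1000 / rpm / 2


def transfer_time_ms(request_kb, rate_mb_per_s):
    """Time to move request_kb kilobytes at rate_mb_per_s megabytes/s."""
    return request_kb / 1024 / rate_mb_per_s * 1000


def estimated_iops(seek_ms, rpm, request_kb, rate_mb_per_s):
    """Rough upper bound on random IOPS for a single disk.

    Assumes requests are served one at a time with no caching or
    command queuing, which real disks and controllers do provide.
    """
    service_time_ms = (seek_ms
                       + avg_rotational_latency_ms(rpm)
                       + transfer_time_ms(request_kb, rate_mb_per_s))
    return 1000 / service_time_ms


# Example: 8.5 ms average seek (within the 3 to 15 ms range above),
# 7200 rpm, 4 KB requests, 300 MB/s interface rate.
iops = estimated_iops(8.5, 7200, 4, 300)
```

With these inputs the transfer time is a tiny fraction of the total, which matches the observation that seek time and rotational latency dominate.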

How to analyze disk I/O

1) Analyze % Disk Time together with Processor\% Privileged Time.

If, among the disk counters, only % Disk Time is large and the other values are moderate, the hard disk may be a bottleneck. If several values are relatively large and consistently exceed 80%, a memory leak may be the cause.

2) Analyze based on Avg. Disk sec/Transfer.

As a general rule, an Avg. Disk sec/Transfer value of less than 15 milliseconds is excellent, 15 to 20 milliseconds is good, and 30 to 60 milliseconds is acceptable. If it exceeds 60 milliseconds, you need to consider replacing the hard disk or changing the RAID mode. (Note: the calculation differs for different RAID levels.)
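The latency bands in step 2 can be sketched as a small classifier. The function name and labels are hypothetical. Two assumptions are worth stating: the counter is reported in seconds, so the sketch converts it to milliseconds first, and the text leaves the 20 to 30 ms range unclassified, so the sketch reports that range as "unspecified" rather than guessing.

```python
def classify_transfer_latency(avg_disk_sec_per_transfer):
    """Classify an Avg. Disk sec/Transfer sample (given in seconds)."""
    latency_ms = avg_disk_sec_per_transfer * 1000

    if latency_ms < 15:
        return "excellent"
    if latency_ms <= 20:
        return "good"
    if latency_ms < 30:
        # The text does not say how to rate 20 to 30 ms.
        return "unspecified"
    if latency_ms <= 60:
        return "acceptable"
    # Above 60 ms: consider replacing the disk or changing the RAID mode.
    return "poor"
```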

An SSD is an electronic device with no moving parts, so it avoids the seek time and rotational latency of traditional hard disks. The overhead of addressing a storage unit is greatly reduced, which is why its IOPS is high.

Memory Analysis

Why can't an SSD reach memory access speeds? Because an SSD and memory work on different principles: an SSD is ROM-type storage, while memory is RAM.

Memory has developed as far as fifth-generation DDR (usually used on graphics cards), while motherboards usually use third-generation DDR memory. The main memory indicators are read speed, write speed, and bandwidth, and the factors that most affect bandwidth are the number of memory channels and the memory frequency.

A common memory type is DDR3 1333 MHz. Memory bandwidth can be increased by switching to memory with a higher frequency or lower timings. (Memory timings are parameters that describe the performance of a memory module.)

Memory frequency is the easier of the two to understand. Current enthusiast-grade memory can run at 2400 MHz, almost double the default 1333 MHz, which can raise bandwidth from 16 GB/s to 24 GB/s. Lowering the timings improves the result further. (For more information about memory timings, see other documents.)

Another way to improve bandwidth is to add channels. In short, this allows multiple memory modules to communicate with the memory controller in parallel, which multiplies the throughput. Readers who are familiar with memory will recognize the terms dual-channel, triple-channel, and even quad-channel.
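To see how frequency and channel count combine, the theoretical peak bandwidth can be estimated as transfer rate × bus width × number of channels. The sketch below assumes a 64-bit (8-byte) bus per channel and treats the rated speed (for example, the "1333" in DDR3 1333) as millions of transfers per second; the function name is hypothetical. Measured bandwidth is normally lower than this theoretical peak.

```python
def peak_bandwidth_gb_per_s(mega_transfers_per_s, channels=1, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes each channel has a bus_width_bits-wide data bus and that
    all channels are used in parallel.
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = mega_transfers_per_s * 1_000_000 * bytes_per_transfer * channels
    return bytes_per_second / 1_000_000_000


single_1333 = peak_bandwidth_gb_per_s(1333, channels=1)
dual_1333 = peak_bandwidth_gb_per_s(1333, channels=2)
dual_2400 = peak_bandwidth_gb_per_s(2400, channels=2)
```

Under these assumptions, going from one channel to two doubles the theoretical peak, and raising the rated speed scales it proportionally.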

Memory analysis indicators

1) Check the Memory\Available MBytes counter.

This counter directly describes the memory available to the system. When analyzing memory at the operating system level, start with this counter to form a first impression and to check whether the system still has enough available memory during the performance test.

If the value is small, the system may have a memory problem.

2) Check the Pages/sec, Pages Read/sec, and Page Faults/sec counters.

The operating system often swaps to disk to increase the amount of available memory or to use memory more efficiently. These three counters directly reflect how often the operating system swaps to disk.

If the Pages/sec counter stays above several hundred, there may be a memory problem. However, a large Pages/sec value does not necessarily indicate a memory problem; it may be caused by running programs that use memory-mapped files. The Page Faults/sec value is the number of page faults per second. The more page faults occur, the more often the operating system has to read pages into memory. You also need to check the Pages Read/sec counter, whose threshold is 5. If its value exceeds 5, you can conclude that there is a memory problem.

3) Analyze the performance bottleneck based on the Physical Disk counter values.

Analysis of the Physical Disk counters covers Pages Read/sec, % Disk Time, and Average Disk Queue Length. If Pages Read/sec is very low while % Disk Time and Average Disk Queue Length are both high, there may be a disk bottleneck. However, if the queue length grows while Pages Read/sec does not fall, the cause is insufficient memory.
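The paging and disk rules above can be sketched as two small functions. Only the Pages Read/sec threshold of 5 comes from the text. Every other cutoff here is an assumption added for illustration: 300 stands in for "several hundred" pages per second, and 80% disk time and a queue length of 2 stand in for "high". The function names are hypothetical, and the second function simplifies the "queue grows while Pages Read/sec does not fall" rule to a single-sample comparison.

```python
PAGES_READ_THRESHOLD = 5  # from the text


def paging_findings(pages_per_sec, pages_read_per_sec, pages_per_sec_limit=300):
    """Flag possible memory problems from the paging counters.

    pages_per_sec_limit is an assumed stand-in for "several hundred".
    """
    findings = []
    if pages_per_sec > pages_per_sec_limit:
        # Not conclusive: memory-mapped files can also raise Pages/sec.
        findings.append("pages_per_sec_high")
    if pages_read_per_sec > PAGES_READ_THRESHOLD:
        findings.append("memory_problem")
    return findings


def disk_or_memory(pages_read_per_sec, disk_time_pct, avg_disk_queue_length,
                   high_disk_time_pct=80.0, high_queue_length=2.0):
    """Distinguish a disk bottleneck from a memory shortage.

    The two "high" cutoffs are assumptions; the text gives no numbers.
    """
    disk_busy = (disk_time_pct > high_disk_time_pct
                 and avg_disk_queue_length > high_queue_length)
    if not disk_busy:
        return "no_disk_pressure"
    if pages_read_per_sec <= PAGES_READ_THRESHOLD:
        # Disk is busy but little paging: the disk itself is the limit.
        return "possible_disk_bottleneck"
    # Disk is busy because pages are being read in: memory is short.
    return "possible_memory_shortage"
```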

------------------------------

Due to space constraints, process analysis and network analysis are omitted here. They will be added later if necessary.
