Three CPU metrics to measure in stress tests: CPU utilization, load average, and context switch rate

Last Update: 2017-01-13 Source: Internet

Author: User



CPU utilization is easy to understand: it is the percentage of time the CPU is busy. Above 75% is considered high (some say the threshold is 80%). This metric should not be read alone, though. Look at it together with load average and context switch rate, because high CPU utilization may itself be caused by those two metrics being high.

Load average is harder to assess. I searched online and found few reasonable explanations. In my test with 100 concurrent users, the two values were 77.534% and 6.108: CPU utilization was on the high side, and load average also seemed a bit high. I later found two blog posts. The first, "Understanding Load Average for Stress Testing", says: "Load average is the CPU load. The information it contains is not the CPU utilization, but a statistic, over a period of time, of the number of processes the CPU is running plus the number waiting for the CPU, that is, a statistic of the length of the CPU's run queue." This basically explains the rationale behind multi-process and multi-thread programs. The second, "Understanding Linux CPU Load" (a translation), can be summed up in one line:

Load Average < number of CPUs * number of cores * 0.7

For example, for one single-core CPU, load average must be < 1 * 1 * 0.7 = 0.7; for one 4-core CPU, load average must be < 1 * 4 * 0.7 = 2.8.

To view CPU information: grep 'model name' /proc/cpuinfo
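The rule of thumb above can be sketched in a few lines of Python. This is an illustration rather than anything from the original posts: the function names `load_threshold` and `is_overloaded` are hypothetical, and the 4-core machine in the example is an assumption, since the article does not say how many cores the tested server had.

```python
def load_threshold(cpus: int, cores_per_cpu: int, factor: float = 0.7) -> float:
    """Upper bound for a healthy load average: CPUs * cores * 0.7."""
    return cpus * cores_per_cpu * factor


def is_overloaded(load_avg: float, cpus: int, cores_per_cpu: int) -> bool:
    """True when the measured load average reaches or exceeds the bound."""
    return load_avg >= load_threshold(cpus, cores_per_cpu)


# The article's two examples: one 1-core CPU and one 4-core CPU.
print(round(load_threshold(1, 1), 2))  # 0.7
print(round(load_threshold(1, 4), 2))  # 2.8

# The measured load average of 6.108, checked against an ASSUMED
# single 4-core CPU (the real core count is not given in the article).
print(is_overloaded(6.108, cpus=1, cores_per_cpu=4))  # True
```

On a real machine the measured value could come from `os.getloadavg()`, which returns the 1, 5, and 15 minute load averages on Unix-like systems.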

Context switch rate is the rate of process (thread) switches. If there are too many switches, the CPU is kept busy switching, which hurts throughput. Section 2 of the article "High-Performance Server Architecture" discusses exactly this problem. How many switches are acceptable? I searched Google at length and found no definitive answer. Context switches come largely from two sources: interrupts and process (including thread) switches. One interrupt can cause one switch, and process (thread) creation, activation, and so on can also cause switches. The CS value is also related to TPS (transactions per second). Assuming each call causes N context switches, we get:

Context Switch Rate = Interrupt Rate + TPS * N

CSR minus IR gives the process/thread switches. If the main process receives a request and hands it to a thread, and the thread processes it and returns the result to the main process, that is 2 switches. You can also plug the measured CSR, IR, and TPS values into the formula to get the number of switches each transaction causes. To reduce CSR, therefore, you have to work on the switches caused per transaction: only when N comes down can CSR come down. Ideally N = 0, but in any case, if N >= 4 you should investigate. As for the CSR < 5000 rule quoted online, I do not think the standard should be so one-size-fits-all.
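Solving the formula for N is a one-line rearrangement, N = (CSR - IR) / TPS. The sketch below shows it in Python; the function name and the sample measurements are hypothetical and chosen only to illustrate the arithmetic, they are not figures from the test described in this article.

```python
def switches_per_transaction(csr: float, ir: float, tps: float) -> float:
    """Solve CSR = IR + TPS * N for N (context switches per transaction)."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return (csr - ir) / tps


# Hypothetical measurements: 5000 context switches/s, 1000 interrupts/s,
# and 800 transactions/s.
n = switches_per_transaction(csr=5000, ir=1000, tps=800)
print(n)       # 5.0
print(n >= 4)  # True: above the article's "N >= 4, investigate" threshold
```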

Additional Information:

These three metrics can be monitored in LoadRunner. On Linux, you can also use vmstat to view r (the run queue length, which load average reflects), in (interrupts), and cs (context switches):

# vmstat 1 5

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
 0  0 244644  29156 415720 2336484    0    0     1    49    2    1  1  0  98  0
 0  0 244644  29140 415720 2336484    0    0     0    28    9  115  0  0  99  1
 0  0 244644  29140 415720 2336484    0    0     0    24   62  256  0  0 100  0
 0  0 244644  29140 415720 2336484    0    0     0     0    5   93  0  0 100  0
 0  0 244644  29140 415720 2336484    0    0     0     0   58  255  0  0 100  0
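To turn a vmstat run like the one above into IR and CSR figures, the `in` and `cs` columns can be averaged. The following Python sketch does this on the sample output; `parse_vmstat` and `mean_rate` are hypothetical helper names. It skips the first data row because vmstat's first report gives averages since the last reboot rather than a one-second sample.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
 0  0 244644  29156 415720 2336484    0    0     1    49    2    1  1  0  98  0
 0  0 244644  29140 415720 2336484    0    0     0    28    9  115  0  0  99  1
 0  0 244644  29140 415720 2336484    0    0     0    24   62  256  0  0 100  0
 0  0 244644  29140 415720 2336484    0    0     0     0    5   93  0  0 100  0
 0  0 244644  29140 415720 2336484    0    0     0     0   58  255  0  0 100  0
"""


def parse_vmstat(text):
    """Return one dict per data row, keyed by vmstat column name."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # The column-name row is the first one containing both r and cs.
            if "r" in fields and "cs" in fields:
                header = fields
            continue
        if len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def mean_rate(rows, column):
    """Average a column, skipping the first row (averages since boot)."""
    samples = rows[1:]
    if not samples:
        raise ValueError("need at least two vmstat rows")
    return sum(row[column] for row in samples) / len(samples)


rows = parse_vmstat(SAMPLE)
print(mean_rate(rows, "in"))  # 33.5   interrupts per second
print(mean_rate(rows, "cs"))  # 179.75 context switches per second
```

The machine in this sample is almost idle, so both rates are far lower than they would be under a stress test.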

The interrupt rate includes the timer interrupts the kernel uses for process time slicing. (In the Linux 2.6 kernel, the system clock fires an interrupt every 1 millisecond. This frequency is expressed by the HZ macro, defined as 1000, that is, 1000 interrupts per second. The value differs between systems and kernel configurations; 100 and 250 are also used.)

The kernel's clock frequency can be found with the following command:

cat /boot/config-`uname -r` | grep '^CONFIG_HZ='

CONFIG_HZ=100

Total number of clock interrupts per second = number of CPUs * number of cores * CONFIG_HZ
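As a worked example of this formula, the Python sketch below reads CONFIG_HZ from kernel config text and multiplies it out. The helper names are hypothetical, and the single 4-core CPU is an assumption made for illustration.

```python
def parse_config_hz(config_text: str) -> int:
    """Extract the CONFIG_HZ value from kernel config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Match CONFIG_HZ= exactly, not options such as CONFIG_HZ_100=y.
        if line.startswith("CONFIG_HZ="):
            return int(line.split("=", 1)[1])
    raise ValueError("CONFIG_HZ not found")


def timer_interrupts_per_second(cpus: int, cores_per_cpu: int, hz: int) -> int:
    """Number of CPUs * number of cores * CONFIG_HZ."""
    return cpus * cores_per_cpu * hz


hz = parse_config_hz("CONFIG_HZ=100")
# Assumed hardware: one CPU with 4 cores.
print(timer_interrupts_per_second(cpus=1, cores_per_cpu=4, hz=hz))  # 400
```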

cat /proc/interrupts

           CPU0       CPU1       CPU2       CPU3
LOC:   97574747   52361843  105207680   69447653   Local timer interrupts
RES:     107368     257510      98635     186294   Rescheduling interrupts
CAL:      14174      14206      14164        194   Function call interrupts
TLB:    1007949     853117     992546     591410   TLB shootdowns

The output shows the type and number of interrupts on each CPU.
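The per-CPU counters can be summed per interrupt type with a small parser. The Python sketch below runs on the sample output; `parse_interrupts` is a hypothetical helper name. Note that these counters are cumulative since boot, so a rate would need two readings taken a known interval apart.

```python
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
LOC:   97574747   52361843  105207680   69447653   Local timer interrupts
RES:     107368     257510      98635     186294   Rescheduling interrupts
CAL:      14174      14206      14164        194   Function call interrupts
TLB:    1007949     853117     992546     591410   TLB shootdowns
"""


def parse_interrupts(text):
    """Map each interrupt label to its count summed over all CPUs."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())  # the first row lists CPU0, CPU1, ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break  # reached the description text
            counts.append(int(token))
        totals[label.strip()] = sum(counts)
    return totals


totals = parse_interrupts(SAMPLE)
print(totals["LOC"])  # 324591923 local timer interrupts since boot
print(totals["RES"])  # 649807 rescheduling interrupts since boot
```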

