Understanding load average during stress testing

Source: Internet
Author: User
Tags: memcached, memory usage, CPU usage

The indicator we had overlooked was load average. A colleague and I had been paying attention only to CPU and memory usage during stress tests, not to this indicator, and the values we saw differed from the usual limit the test department applies (around 10). As a result, the throughput in the final test report was lower than before. I had not dug into the stress test itself, but I felt this would affect future machine sizing and capacity planning, so I asked our DBA and SA; their answers differed widely, and it seemed we would have to find the root of the problem ourselves.

The following sections work out, step by step, the real role of load average in stress testing.

CPU Time Slice

To improve execution efficiency, many applications use multithreading, which turns serial execution into parallel execution: decomposing a task and running the parts in parallel can greatly improve a program's throughput. But that is only the code-level view; how the hardware supports it comes down to the CPU's time-slice model. Every instruction a program executes must compete for the most valuable resource, the CPU: no matter how many threads your program splits its tasks into, they all have to queue up for that resource before they can compute and process their instructions. Consider the single-CPU case first. The following two diagrams show thread execution without and with time slicing:


Figure 1 Thread execution without time slicing


Figure 2 Thread execution with time slicing

As Figure 1 shows, if every thread must queue for the CPU and hold it until it finishes, then so-called multithreading has no practical significance. The CPU manager in Figure 2 is just a role I invented for illustration: it allocates and manages CPU usage, so every thread gets a chance at the CPU while it runs, achieving parallel multithreaded processing on a single CPU.

Multiple CPUs are simply an extension of the single-CPU case: once all the CPUs are running at full capacity, each of them uses time slices to improve efficiency.

In the Linux kernel, each process is by default given a fixed time slice in which to execute (1/100 of a second by default). During that slice the process is assigned to a CPU and uses it exclusively. If it finishes its work before the slice expires, it voluntarily gives up the CPU; if the slice expires before its work is done, its right to the CPU is taken back, and the process is interrupted and suspended to wait for its next slice.
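The time-slice mechanism described above can be sketched as a simple round-robin simulation (a minimal illustration: the job names and durations below are made up, and the real Linux scheduler is far more sophisticated than a plain FIFO queue):

```python
from collections import deque

def round_robin(jobs, slice_ms=10):
    """Simulate a single CPU handing out fixed time slices.

    jobs: dict of name -> remaining work in ms (hypothetical values).
    Returns the order in which the jobs finish.
    """
    queue = deque(jobs.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()    # next process gets the CPU
        remaining -= slice_ms                # it runs for at most one slice
        if remaining <= 0:
            finished.append(name)            # done early: gives the CPU back
        else:
            queue.append((name, remaining))  # slice used up: back of the queue
    return finished

print(round_robin({"A": 10, "B": 25, "C": 15}))  # ['A', 'C', 'B']
```

Job A finishes first because it fits inside a single slice; B needs the most slices and so must queue repeatedly, just like the longer phone calls in the analogy below.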

The Difference Between CPU Utilization and Load Average

Stress testing requires not only simulating the concurrent-user pressure of the business scenario, but also monitoring the machine's condition throughout the test to ensure the test is valid. If the server stays overloaded for a long time, the pressure it absorbs is not what we would call acceptable pressure. It is like a project manager who, when estimating workload, plans for a person to work 12 hours a day: the plan is not reasonable, the person will collapse sooner or later, and the whole project schedule suffers.

CPU utilization is what we laymen used to treat as the criterion for whether a machine had reached full load: seeing 50%-60% usage meant the machine had been pushed to its limit. CPU utilization is, as the name implies, a statistic of how much of the CPU was in use over a period of time. If it is very high, you need to consider whether the CPU is already overloaded; long-term overload damages the machine itself, so CPU utilization must be kept within a certain proportion to ensure normal operation.

Load average is the CPU's load. The information it carries is not the CPU's usage, but a statistic, over a period of time, of the number of processes the CPU is handling plus the number waiting for the CPU; in other words, the length of the CPU's run queue. Why collect this statistic, and what does it mean for stress testing? An analogy will explain the difference between CPU utilization and load average and their implications for stress testing.
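Treating the load average as the average run-queue length, a toy calculation might look like this (the sample values are invented; real kernels compute exponentially damped 1-, 5- and 15-minute averages rather than a plain mean, but a mean is enough to show the idea):

```python
def load_average(samples):
    """Average number of processes running on or waiting for the CPU.

    samples: per-tick counts of (running + runnable) processes,
    i.e. the CPU run-queue length at each sampling point.
    """
    return sum(samples) / len(samples)

# Ten hypothetical one-second samples of the run-queue length:
queue_lengths = [4, 6, 5, 7, 6, 5, 4, 6, 7, 10]
print(load_average(queue_lengths))  # 6.0
```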

Let's compare the CPU to a phone booth, with each process being a person who needs to make a call. There are 4 phone booths (just as our machine has 4 cores) and 10 people who need to call. The rule is that the administrator grants each person 1 minute of phone use, in arrival order. If a user finishes within 1 minute, the phone is returned to the administrator immediately; if 1 minute is not enough, the user must queue again and wait to be allocated another turn.


Figure 3 Phone usage scenarios

In the image above, the phone users are grouped by how long they need: "1min" means the user needs at most 1 minute of phone time, "2min" at most 2 minutes, and so on. Under the allocation rules, 1min users need only a single allocation to complete their call, while the other two kinds of users must queue two or three times.

Phone utilization = SUM(time actively in use) / statistical period

Sum the time each user actually spends using the phone and divide by the statistical period. The point to note is that this is the sum of time actively in use (sum of active use time), which differs from the sum of time occupied (sum of occupied time). For example, a user granted a one-minute turn may call for 10 seconds, spend 20 seconds looking up a number, and then use the remaining 30 seconds to make another call: the phone is occupied for 1 minute, but actively used for only 40 seconds.
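The occupied-versus-actively-used distinction can be put into numbers (a sketch using the figures from the example above):

```python
def utilization(active_seconds, period_seconds):
    """Phone (CPU) utilization = time actively used / statistical period."""
    return active_seconds / period_seconds

# The user from the text: granted a 60-second turn ("occupied" time),
# but only 10 s + 30 s of it is actual talking ("active use").
occupied = 60
active = 10 + 30
print(utilization(active, occupied))  # about 0.67, not the 1.0 that
                                      # the occupied time alone suggests
```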

The phone's load average reflects the average number of people using a phone plus those waiting for one to be allocated, over a given statistical period.

Phone utilization statistics reflect how heavily the phones are used: if a phone runs for a long time without enough rest, its hardware is being overworked and the usage frequency needs to be adjusted. The phone's load average describes usage from another angle: the higher the load average, the fiercer the competition for phones and the scarcer the resource. Acquiring and holding a resource also has a cost, so under a high load average, long-term "hot competition" for the phones is also a kind of wear on the hardware.

Can the load average be high while utilization is low? Once you understand the difference between occupied time and actively-used time, the answer is clear: after a time slice has been allocated, whether it is fully used depends entirely on the user, so low utilization with a high load average is entirely possible. From this point of view, judging whether the CPU is overloaded from CPU utilization alone is not enough; it must be combined with the load average to see the overall picture of CPU usage and demand.

So, back to the test department's requirement for load average: on our 8-CPU machine, keeping the load at about 10 means each CPU is processing one request while 2 more wait to be processed. Many online introductions give a simple rule of thumb: acceptable load is roughly 2 * (number of CPUs) minus 1 or 2 (this is just an internet rule, not necessarily a standard).
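The arithmetic behind that target can be spelled out (a sketch; the "rule of thumb" values are the internet figures quoted above, not a standard):

```python
def decompose_load(load, ncpus):
    """Split a load average into (running, waiting-for-CPU), under the
    simplification that at most one process runs per CPU at a time."""
    running = min(load, ncpus)
    return running, load - running

# The test department's target on our 8-CPU machine:
print(decompose_load(10, 8))  # (8, 2): every CPU busy, 2 requests queued

# The internet rule of thumb: 2 * CPUs minus 1 or 2
ncpus = 8
print(2 * ncpus - 2, 2 * ncpus - 1)  # 14 15
```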

Additional points:

1. Judging performance problems from CPU utilization and load average together. Low CPU utilization does not mean the CPU is not a bottleneck: a CPU run queue that stays long for an extended period is also a sign of CPU overload. For applications that spend time on I/O, sockets, and the like, consider whether the speed of that hardware limits overall efficiency.

The best example is a phenomenon I found in testing. To improve processing efficiency, SIP keeps its control policy and counters in a memcached cache. After I doubled the memcached cache configuration, both CPU utilization and load went down: while processing tasks, waiting for sockets to return had also been affecting the competition for the CPU.

2. The importance of multi-CPU programming in the future. Servers now ship with multiple CPUs, and per-core processing power no longer grows according to Moore's law. In the phone-booth scenario above, with three kinds of users needing different amounts of time, the load average changes with the allocation order. Suppose the statistical period is 2 minutes: if phones are allocated in the order 1min users, then 2min users, then 3min users, the load average will be lowest; other orders give different results. So multi-CPU programming done well can make better use of the CPUs and make programs run faster.

CPU utilization is easy to understand: above 75% is considered high (some say 80% or higher). Besides this indicator, it should be read together with the load average and the context switch rate; a high CPU may be caused by either of those two being high.

Load average is harder to judge. Searching online, I did not find many reasonable explanations. In my 100-concurrent-user test the two values were 77.534% and 6.108: CPU utilization is high, and the load average also looks a bit high. Later I found two blog posts. "Understanding load average during stress testing" explains: "load average is the CPU's load; the information it carries is not the CPU's usage, but a statistic, over a period of time, of the number of processes the CPU is handling plus those waiting for the CPU, i.e. the length of the CPU's run queue." That basically explains the behavior of multi-process, multi-thread programs. "Understanding Linux processor load averages" (a translation) puts it simply: load average < number of CPUs * 0.7.
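That 0.7 rule is easy to turn into a check (a sketch; the post never states how many cores the test machine had, so the core counts below are hypothetical):

```python
def load_ok(load_avg, ncpus, factor=0.7):
    """Rule of thumb from the post: load average should stay below
    (number of CPUs) * 0.7 to leave some headroom."""
    return load_avg < ncpus * factor

# The 100-concurrent-user test reported a load average of 6.108.
# On a hypothetical 4-core box the 0.7 rule gives a ceiling of 2.8:
print(load_ok(6.108, 4))   # False: well over the threshold
print(load_ok(6.108, 16))  # True on a 16-core machine (ceiling 11.2)
```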

To view CPU information: grep 'model name' /proc/cpuinfo

Context switch rate is the rate of process (thread) switches. If there are too many switches, the CPU stays busy switching, which also hurts throughput. Section 2 of the article "High-Performance Server Architecture" discusses exactly this problem. How much is appropriate? I googled around without finding a definitive answer. Context switches consist largely of two parts: interrupts, and process (thread) switches. One interrupt causes one switch, and process (thread) creation, activation, and so on also cause switches. The context switch rate is also related to TPS (transactions per second): assuming each transaction causes N switches, we get

Context Switch Rate = Interrupt Rate + TPS * N

CSR minus IR is the process/thread switching rate. If the main process receives a request, hands it to a thread, and the thread processes it and returns it to the main process, that is 2 switches. You can also plug measured values of CSR, IR, and TPS into the formula to compute the number of switches each transaction causes. Therefore, to reduce CSR you must work on the switches caused by each transaction: only by bringing N down can CSR be reduced. Ideally N = 0, but in any case, if N >= 4 you should investigate. There is also an online rule of CSR < 5000; I don't think the standard should be that absolute.
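Solving the formula for N with made-up measurements (the CSR, IR, and TPS numbers below are hypothetical, chosen only to land on the N >= 4 threshold mentioned above):

```python
def switches_per_transaction(csr, ir, tps):
    """Solve N in: Context Switch Rate = Interrupt Rate + TPS * N.

    csr: context switches per second, ir: interrupts per second,
    tps: transactions per second.
    """
    return (csr - ir) / tps

# Hypothetical measurements: 3000 switches/s, 1000 interrupts/s, 500 TPS
n = switches_per_transaction(3000, 1000, 500)
print(n)  # 4.0 -- right at the "should check" threshold
```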

Additional information:

All three metrics can be monitored in LoadRunner; on Linux you can also use vmstat to view r (the run queue, which underlies the load average), in (interrupts), and cs (context switches).

#vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs  us sy  id wa
 0  0 244644  29156 415720 2336484   0    0     1    49    2    1   1  0  98  0
 0  0 244644  29140 415720 2336484   0    0     0    28    9  115   0  0  99  1
 0  0 244644  29140 415720 2336484   0    0     0    24   62  256   0  0 100  0
 0  0 244644  29140 415720 2336484   0    0     0     0    5   93   0  0 100  0
 0  0 244644  29140 415720 2336484   0    0     0     0   58  255   0  0 100  0

The interrupt rate includes the kernel's time-slice interrupts of processes. (In Linux 2.6 the system clock generates an interrupt every 1 millisecond; the frequency is expressed by the HZ macro, defined as 1000, i.e. 1000 interrupts per second. Different systems and kernel configurations differ: 100 and 250 also exist.)

The kernel's clock frequency can be found with the following command:

cat /boot/config-$(uname -r) | grep '^CONFIG_HZ='

CONFIG_HZ=100

Total clock interrupts per second = number of CPUs * number of cores per CPU * CONFIG_HZ
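As a quick sanity check of that formula (the socket and core counts below are hypothetical):

```python
def clock_interrupts_per_second(cpus, cores_per_cpu, hz):
    """Total timer interrupts/s = CPUs * cores per CPU * CONFIG_HZ."""
    return cpus * cores_per_cpu * hz

# E.g. a 2-socket machine with 4 cores per socket, kernel built
# with CONFIG_HZ=100 (as in the output above):
print(clock_interrupts_per_second(2, 4, 100))  # 800
```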

cat /proc/interrupts

          CPU0       CPU1       CPU2       CPU3
LOC:  97574747   52361843  105207680   69447653   Local timer interrupts
RES:    107368     257510      98635     186294   Rescheduling interrupts
CAL:     14174      14206      14164        194   Function call interrupts
TLB:   1007949     853117     992546     591410   TLB shootdowns

You can view the type and number of interrupts
