Understanding load average for stress testing

Source: Internet
Author: User

Reprinted from: http://www.blogjava.net/cenwenchu/archive/2008/06/30/211712.html

The fourth phase of SIP is complete. Because the control strategies had become much richer, the earlier stress test results no longer reflected SIP's health under high concurrency and heavy pressure, so the stress tests had to be redone. After the testers spent a week on stress testing, the report was formally released and the matter seemed closed. The next day, however, the testers said the report had to be revised: it was the first time these colleagues had run a stress test, one indicator had been overlooked, and several test results needed to change. The overlooked indicator was the load average. Like me, they had started out watching CPU and memory usage and paid little attention to this indicator, which was outside the limit they normally apply (around ten). Because of this indicator the retest results were lowered, and the final report was not as good as the original. I was not deeply involved in the stress testing, but not understanding this indicator would affect future machine configuration and capacity expansion, so I asked the DBAs and SAs. Their answers varied widely, so it seemed I would have to get to the root of the question myself.

The following sections work out, step by step, the real role of load average in stress testing.

CPU time slices

To make programs run more efficiently, many applications use multithreading, so that work that used to run serially can run in parallel. Decomposing tasks and running them in parallel can greatly improve a program's efficiency. But this is all at the code level; how does the hardware support it? The answer lies in the CPU's time slice mode. Every instruction a program executes has to compete for the CPU, the most valuable resource, and no matter how many threads your program is split into, they must all queue for that resource to do their computation. Let's look at the single-CPU case first. The following two figures show thread execution in non-time slice mode and in time slice mode:


Figure 1 non-time slice thread execution


Figure 2 time slice thread execution

As Figure 1 shows, if each thread has to wait in line until the thread ahead of it has finished with the CPU, then so-called multithreading has no practical significance. The CPU manager in Figure 2 is just a role I made up; its job is to allocate and manage CPU usage. With it, every thread gets a chance at the CPU while it is running, and multithreaded parallel processing is actually achieved even on a single CPU.

Multiple CPUs are just an extension of the single-CPU case: when all CPUs are running at full load, each CPU still relies on time slicing to improve efficiency.

In the Linux kernel, each process is by default given a fixed time slice in which to execute (the default is 1/100 of a second). During that time the process is assigned the CPU and uses it exclusively. If the process finishes before the time slice is used up, it gives up the CPU voluntarily. If the time slice runs out before the work is done, the CPU is taken back and the process is suspended until its next time slice.
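The time slice rule described above can be sketched as a small round-robin simulation. This is only an illustration of the idea, not how the Linux scheduler is actually implemented; the function name `round_robin`, the burst times, and the millisecond units are all assumptions made for the example.

```python
from collections import deque

def round_robin(burst_times, time_slice):
    """Simulate round-robin scheduling on a single CPU.

    burst_times: CPU time each process needs (hypothetical values, e.g. ms).
    time_slice: longest a process may run before it is preempted.
    Returns a list of (process_index, start, end) run segments.
    """
    queue = deque(enumerate(burst_times))
    clock = 0
    timeline = []
    while queue:
        pid, remaining = queue.popleft()
        run = min(remaining, time_slice)
        timeline.append((pid, clock, clock + run))
        clock += run
        if remaining > run:
            # Slice expired before the work was done: preempt and requeue.
            queue.append((pid, remaining - run))
        # Otherwise the process finished early and gives up the CPU.
    return timeline

# Three hypothetical processes needing 5, 25 and 10 ms, with a 10 ms slice.
for pid, start, end in round_robin([5, 25, 10], time_slice=10):
    print(f"process {pid}: {start}-{end}")
```

Process 0 finishes inside its first slice and leaves early, while process 1 is interrupted twice and has to wait for the others before it completes.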

The difference between CPU utilization and load average

A stress test not only has to simulate the business scenario's load parameters, such as the number of concurrent users, but also has to watch the machine's own performance during the test to make sure the results are valid. If a server is overloaded for a long time, the pressure it appears to withstand is not pressure it can acceptably sustain. It is like a project manager who, when estimating workload, assumes everyone will work excessive hours every day: the project plan is not reasonable, and sooner or later people will burn out and the overall schedule will suffer.

CPU utilization is what we laymen used to rely on to judge whether a machine was fully loaded: on seeing 50%-60% utilization we assumed the machine had been pushed to its limit. CPU utilization, as the name implies, is how much the CPU is used. It is a statistic over a period of time, showing how much of that period the CPU was occupied. If the occupied share is high, the CPU may already be overloaded, and long-term overload is damaging to the machine itself, so CPU utilization must be kept within a certain proportion to ensure normal operation.

Load average is the CPU's count of the processes it is handling and waiting to handle over a period of time, in other words a statistic of the length of the CPU's run queue. Why collect this statistic, and what does it mean for stress testing? An analogy helps explain the difference between CPU utilization and load average.
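To make "a statistic over a period of time" concrete, here is a minimal sketch of how utilization can be derived from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. The helper names and the sample lines are assumptions made for illustration, and counting iowait as idle time is a choice made for this example, not the only possible one.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) ticks."""
    fields = [int(x) for x in line.split()[1:]]
    # Fields: user nice system idle iowait irq softirq steal ...
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
    total = sum(fields[:8])  # guest time is already counted inside user
    return total - idle, total

def utilization(before, after):
    """Share of the interval between two (busy, total) samples spent busy."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return busy / total if total else 0.0

# Two made-up samples taken some time apart (not real measurements).
first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
second = parse_cpu_line("cpu  160 0 90 1200 50 0 0 0 0 0")
print(f"utilization over the interval: {utilization(first, second):.0%}")
```

On a real machine the two lines would come from reading the first line of `/proc/stat` twice, a few seconds apart.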

Think of a CPU as a telephone booth and each process as a person who needs to make a call. There are 4 phone booths in total (just as our machine has 4 cores) and ten people who need to make calls. The rule is that an administrator hands out the phones in order, giving each person 1 minute of phone time. A user who finishes within the minute can return the phone to the administrator right away; a user who has not finished when the minute is up has to rejoin the queue and wait to be allocated a phone again.


Figure 3 phone usage scenario

In Figure 3 the phone users are also divided into classes: 1min means the user needs the phone for at most 1 minute, 2min means at most 2 minutes, and so on. Under the usage rules, a 1min user needs only one allocation to complete the call, while the other two classes need to queue two or three times.

utilization of the phone = SUM (active use CPU time)/period

That is, the time each user actually spends using an allocated phone is summed and divided by the statistical period. Note that what is summed is the time the phone is actively used (sum(active use CPU time)), which is different from the time the phone is occupied (sum(occupy CPU time)). For example, a user is granted one minute, spends ten seconds on a call, then spends some time looking up a number, and uses the rest of the minute for another call. The phone was occupied for the full minute, but it was actually in use for only part of that time.

The load average of the phones is the average number of people, over a statistical period, who are using a phone or waiting to be allocated one.

Phone utilization reflects how heavily the phones are used. When a phone is in use for long stretches without enough rest in between, the hardware is running overloaded and the frequency of use needs to be adjusted. The phones' load average describes the usage from another angle: the higher the load average, the fiercer the competition for phones and the scarcer they are. Requesting and managing a resource also carries a real cost, so under a high load average, long-term "hot competition" for the phones is also damaging to the hardware.

Can load average be high while utilization is low? Once occupancy and actual usage time are understood as different things, the answer is clear: after a time slice is allocated, whether it is actually used depends entirely on the user, so low utilization with a high load average is quite possible. From this point of view, CPU utilization alone is not enough to judge whether the CPU is overloaded; it must be combined with the load average to see the overall picture of CPU usage and demand.
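The phone booth analogy can be turned into a small simulation that reports both numbers side by side. This is a rough sketch under stated assumptions: the function name `simulate_booths`, the mix of callers, and the `active_fraction` parameter (the share of each occupied minute spent actually talking) are all invented for the illustration.

```python
from collections import deque

def simulate_booths(call_minutes, booths=4, active_fraction=1.0):
    """Simulate the phone booth analogy in one-minute slices.

    call_minutes: minutes of booth time each caller needs (hypothetical data).
    booths: number of phones, standing in for CPU cores.
    active_fraction: share of each occupied minute spent actually talking.
    Returns (utilization, average_load) over the whole run.
    """
    queue = deque(call_minutes)
    minutes = 0
    active_time = 0.0
    load_samples = []
    while queue:
        # Load counts everyone in the system: on a phone or waiting for one.
        load_samples.append(len(queue))
        served = [queue.popleft() for _ in range(min(booths, len(queue)))]
        for remaining in served:
            active_time += active_fraction
            if remaining > 1:
                # Minute used up with the call unfinished: back of the line.
                queue.append(remaining - 1)
        minutes += 1
    utilization = active_time / (booths * minutes)
    average_load = sum(load_samples) / minutes
    return utilization, average_load

callers = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]  # assumed mix of ten callers
for fraction in (1.0, 0.25):
    util, load = simulate_booths(callers, active_fraction=fraction)
    print(f"active fraction {fraction}: utilization {util:.2f}, load {load:.2f}")
```

With callers who talk for the whole of every minute, utilization comes out at 0.95 with an average load of 5.8. When the same callers talk for only a quarter of each minute they hold the phone, utilization drops to about 0.24 while the average load stays at 5.8, which is the low utilization, high load case described above.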

Back to the test department's load average requirement: on our 8-CPU machines, load is to be kept at about ten, meaning every CPU is processing a request while 2 more are waiting to be processed. Many write-ups on the Internet give a simple rule of thumb that load should be the number of CPUs minus 1 to 2 (this is only what is said online, not necessarily a standard).
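A check like the one the test department applies can be sketched with Python's standard library. `os.getloadavg()` and `os.cpu_count()` are real calls (the former is available on Unix-like systems only); the helper names and the per-core limit of 1.25, which simply restates "load ten on 8 CPUs", are assumptions for the example rather than a recommended threshold.

```python
import os

def load_per_core(load, cores):
    """Return the load average divided by the number of CPU cores."""
    return load / cores

def is_overloaded(load, cores, per_core_limit=1.25):
    """Flag a load above a per-core limit.

    per_core_limit=1.25 mirrors the example in the text (load 10 on 8 CPUs);
    it is an illustrative value, not a standard.
    """
    return load_per_core(load, cores) > per_core_limit

if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages.
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-min load {one:.2f} on {cores} cores: {load_per_core(one, cores):.2f} per core")
    print("overloaded:", is_overloaded(one, cores))
```

Dividing by the core count makes the figure comparable across machines, which a raw limit such as "ten" is not.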

Additional points:

1. Performance problems should be judged from both CPU utilization and CPU load average. First, low CPU utilization does not mean the CPU is not a bottleneck: a persistently long queue of processes competing for the CPU is itself a sign of CPU overload. For applications that may spend time on I/O, sockets, and so on, consider whether the speed of those resources affects overall efficiency.

The best example is something I found during testing. To improve processing efficiency, SIP keeps its control strategies and counter information in a memcached cache. Once I scaled up the memcached configuration, both CPU utilization and load went down. In other words, while tasks are being processed, waiting for socket responses also affects competition for the CPU.

2. The importance of multi-CPU programming in the future. Servers today have multiple CPUs, and server processing power no longer grows according to Moore's law. Take the phone booth scenario above: for users with three different time requirements, different allocation orders produce different load averages. Suppose load is measured over a 2-minute period. If the phones are allocated to 1min users first, then 2min users, then 3min users, the load average will be the lowest; other orders give different results. So multi-CPU programming can make better use of the CPU and make programs run faster.

Not everything above is necessarily accurate or correct. If you spot any mistakes, please point them out so that unclear concepts can be corrected.
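The effect of allocation order in point 2 can be checked with a short sketch that uses the same rule as the phone booth scenario (one-minute turns, unfinished callers rejoin the queue). The function name `average_load` and the particular mix of ten callers are assumptions; the original does not say how many users fall into each class.

```python
from collections import deque

def average_load(order, booths=4, window=2):
    """Average number of callers in the system over the first `window` minutes.

    order: minutes each caller needs, in the order phones are handed out.
    Rule: one-minute turns; unfinished callers rejoin the back of the line.
    """
    queue = deque(order)
    samples = []
    for _ in range(window):
        samples.append(len(queue))
        served = [queue.popleft() for _ in range(min(booths, len(queue)))]
        queue.extend(m - 1 for m in served if m > 1)
    return sum(samples) / window

callers = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]  # assumed mix of ten callers
print("shortest first:", average_load(sorted(callers)))
print("longest first: ", average_load(sorted(callers, reverse=True)))
```

Under these assumptions, handing phones to the 1min users first gives an average load of 8 over the two-minute window, while serving the longest calls first leaves all ten callers in the system for both minutes.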
