Imagine the following scenario: you are about to wrap up six months of development on a complex Internet application or Web service and are ready to deploy it. The development team carefully designed a loosely coupled n-tier Web application. From the first day of work, everything needed for a scalable, stable, high-performance application was built into the system architecture. The Quality Assurance team has tested the system thoroughly, fixed the most serious defects, and triaged the remaining known issues. So the rollout should go smoothly, right? Think again.
Did you include load testing as part of the development effort? If not, chances are that somewhere in your complex design a bottleneck is waiting to cause performance, scalability, or reliability problems. A bottleneck is any part of the system that chokes overall throughput under normal traffic. Good design is critical to a successful Web application, but experience tells us that most defects of this kind are discovered only when the system is under significant load; they are not the sort of problems you can find during development as a single user walking through the test system. Putting a load test plan in place early in development helps minimize surprises later.
In this article we introduce a testing approach grounded in hands-on experience, one that departs in places from traditional load testing strategy. Having led a number of load testing teams, we have learned some lessons that may help you.
We will discuss the advantages of starting testing early and outline precautions for setting up the test environment. We will help you identify the metrics that suit your implementation and introduce some tools for capturing them. We will also explain why a question like "Can my site handle X simultaneous users?" is too vague to answer accurately. Finally, we will cover important considerations in choosing a load testing tool appropriate to your needs and offer some suggestions for tracking test results.
We use the term "load testing" to cover performance, scalability, and reliability testing. The term "scalability testing" is often stretched to describe all three, but what your team does probably is not limited to scalability. Figure 1 describes these goals.
Start as Early as Possible
You should plan for load testing at the design stage. Based on our experience, we recommend a "no surprises" approach to development: it is always better to discover problems yourself than to have them surface in production. The increasingly complex architecture of distributed Web applications and Web Services makes it easy for potential problems to hide in the application design.
On a recent engagement, load testing during the development of a complex n-tier Web application proved very effective. Even so, we misjudged two things. First, we underestimated the number of problems that would surface once testing began: our very first test failed after only two users and 100 processed orders. Second, we underestimated the time required to set up the test environment. Fortunately, because we started planning the tests early enough, we had time to resolve, or at least mitigate, the problems we found before the deployment date. With close attention to the design and the first few problems fixed, the scalability of the system improved quickly.
Defining the test environment is the starting point for planning the tests. Depending on the scale of the test, this can be a significant task in its own right.
Define the Environment
The first task in defining the test environment is to estimate how much work is involved. A general guideline for resourcing is to budget 15 to 20 percent of total implementation time for testing, with about one third of that devoted to load testing.
It is important to create a separate test environment that closely resembles production. If the machines differ in configuration, speed, or settings, you cannot extrapolate lab results to production performance. You may still be able to answer directional questions such as "does adding hardware improve scalability?", but you will not be able to answer a question like "how many users can a single Web server handle in production?" with any accuracy. One of your main tasks is to reduce uncertainty and answer such questions with hard evidence. Without comparable hardware, the best you can do is guess from past experience, if you have any.
You may balk at the cost of putting production-class machines in a load testing environment. Weigh it, however, against the cost of chasing hardware-related problems in production, and against the value of accurately predicting the load a single Web server can handle. Variables such as processor speed and available RAM affect available system resources and can change scalability behavior on their own. In the lab, such variables must be controlled; otherwise there are too many of them to pin down the root cause of a problem. If a separate environment is simply not possible, consider accelerating the purchase of production hardware and using it first in the load testing lab. Once the system is deployed, the lab machines can double as backups for the production machines. A further benefit is that you can shake out hardware defects before release day.
There are several reasons not to use the development environment for load testing; see the sidebar "Don't Use Your Dev Environment for Load Testing." The same goes for the system test environment used by the Quality Assurance team. That environment serves single-user tests aimed at functional defects unrelated to system load, which relaxes the constraints on the hardware it needs, and it takes software updates from the development team far more often. In the load testing environment, only builds that affect system performance should be installed, to minimize the time spent revising the load scripts.
Beyond the resources needed to run a scalability lab, the success of load testing depends on other roles in the organization. Figure 2 summarizes these roles.
The most important role outside the lab is a database administrator (DBA) with elevated permissions; this point cannot be overemphasized. The most likely sources of scalability problems are the database, the data access strategy (such as stored procedures, prepared statements, or inline SQL), and the data access technology (such as ADO or ODBC). A DBA can help identify and resolve database-related problems such as inefficient indexing, excessive locking, and transaction timeouts. Ideally, you should have a dedicated, competent DBA available as a full-time resource during the critical phases of load testing.
We also recommend rotating development team members through the test lab so that each of them takes part in testing. Doing so produces excellent cross-training and keeps fresh ideas flowing into the lab.
Define the Test Strategy
You have probably been in this meeting: the customer leans across a large conference table and asks, "Can this system handle thousands of users?" The traditional load testing approach has you write scripts and run tests that try to answer this question precisely. To do so, you must define what "handle" means and what 1,000 typical users actually do during a site visit. You define test cases representing the various user activities, such as buying a stock or registering a new account. Next, you estimate how users are distributed across these test cases. Then you make assumptions about think time, the pauses that simulate a real user's interaction with the application, so that activity during the load test roughly mirrors what the same number of real users would generate on the live site.
This method has several shortcomings. First, the results are only as good as your assumptions; obviously, incorrect assumptions skew the results.
Second, simulating real users requires a great deal of client hardware. Given the processing power and memory each virtual user consumes, a typical client machine can drive roughly 200 virtual users. Testing a concurrency level of 2,000 users therefore requires ten client machines, a major investment. Testing a site over HTTPS demands far more client hardware still.
Finally, this approach makes it hard to give your development team actionable information. When a failure occurs, it is often difficult to reproduce.
As an alternative, we recommend designing test cases around the following key questions:
• Where is the system bottleneck, and how many synchronized concurrent requests can it handle?
• How many nonsynchronized super users can one machine handle before response times become unacceptable?
• Do the results scale linearly as additional hardware is added?
• Are there stability problems that would prevent the site from operating in production?
Notice that these questions fold in extra information from the development team: the areas where they suspect problems may occur. Pay attention to those areas. In the earlier example, the suspected bottleneck might be the order submission path, from which you can derive more specific questions, such as "how many simultaneous requests can the submission process handle?" Attacking these specific areas is the fastest, least expensive way to give the development team actionable information they can use to improve the system. When using this approach, keep the following suggestions in mind.
Focus your load tests. As mentioned above, build scripts first for the areas most likely to harbor bottlenecks and stability problems. This "data first, assumptions second" approach lets you collect raw data from the application and then derive higher-level results by layering your assumptions on top. Don't bother writing scripts for low-risk areas of the site; the help pages or a read-only documentation area, for example, are unlikely to produce a system bottleneck.
Synchronize requests. Attack a potential bottleneck with synchronized requests. The idea is to simulate the worst case: users on the site hitting the bottleneck at exactly the same moment. Synchronizing the users also makes the test repeatable; without synchronization, a failure is hard to reproduce. You can achieve this with synchronization points, a feature of most robust (read: not free) testing tools. A synchronization point forces each virtual user to wait until the remaining users reach a defined point in the script before issuing the next request. It lets you determine precisely and repeatably how many concurrent users the potential bottleneck area of the site can handle; for example, the breaking point might be seven synchronized concurrent users, as in the sketch below.
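Commercial tools provide synchronization points as a scripting primitive, but the mechanism is easy to illustrate. The following is a minimal sketch in Python, not any particular tool's API; the target URL and user count are placeholders. Each virtual user blocks at a barrier, so all requests are released at the same instant:

```python
import threading
import urllib.request

NUM_USERS = 7                       # synchronized virtual users (placeholder)
TARGET = "http://testlab/submit"    # hypothetical bottleneck URL

barrier = threading.Barrier(NUM_USERS)

def virtual_user(user_id: int) -> None:
    # Per-user setup (login, session cookies, form data) would go here.
    barrier.wait()  # synchronization point: block until every user arrives
    try:
        with urllib.request.urlopen(TARGET, timeout=30) as resp:
            print(f"user {user_id}: HTTP {resp.status}")
    except Exception as exc:
        print(f"user {user_id}: FAILED ({exc})")

threads = [threading.Thread(target=virtual_user, args=(i,))
           for i in range(NUM_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```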
Create looping test case scripts. In other words, the site should be in the same state after each iteration of a test case as it was before it. This lets you run a test case repeatedly over a long period; a sketch follows.
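In a tool's scripting language this is usually just a loop around the recorded steps. The Python sketch below shows the shape of the idea; place_order, order_was_recorded, and cancel_order are hypothetical stubs standing in for the real site interactions:

```python
import itertools

# Hypothetical stand-ins for the real site interactions under test.
_ids = itertools.count(1)
_orders = set()

def place_order(item: str, qty: int) -> int:
    oid = next(_ids); _orders.add(oid); return oid

def order_was_recorded(oid: int) -> bool:
    return oid in _orders

def cancel_order(oid: int) -> None:
    _orders.discard(oid)

def run_order_test_case(iterations: int) -> None:
    for _ in range(iterations):
        order_id = place_order("sku-1234", 1)   # exercise the suspect area
        assert order_was_recorded(order_id)     # verify the work happened
        cancel_order(order_id)                  # undo it: the site returns to
                                                # its starting state every pass
    # The 1st and the 10,000th iteration therefore run against identical
    # state, so the loop can run for hours without polluting the database.

run_order_test_case(10_000)
```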
Use what we call super users: virtual users whose think time is set to zero. Recall that think time assumptions exist in conventional tests so that virtual users approximate real ones, but if you halve a virtual user's think time, you double the effective load on the server. Put another way, all the server really cares about is requests per second; the number of virtual users and their think times simply combine to produce that load.
Let's do some math to make the concept clearer. The following formula gives the load, in requests per second, generated by real users of a site:

requests per second = (concurrent users × requests per page) / (page download time + think time)

For example, if a site has 100 concurrent users, a page download time of 10 seconds, and a think time of 30 seconds, the users generate 2.5 pages per second (100 / 40). If we assume three requests per page, that translates to 7.5 requests per second at the Web server.
When you run the test with super users, observe the requests per second and compare it with the value you just calculated. In our experience, the ratio of real users to super users usually works out to about 15 to 1. For the same example, that means roughly seven super users (100/15) generate the same load as 100 regular users. For another example, suppose response times become unacceptable beyond 10 super users. Note the requests per second at that point and convert it back to a number of real users. Now you can apply whatever think time assumptions you like, and even change them later, without rerunning the test. After a few days of testing you will find yourself converting between super users and real users intuitively. This approach keeps the user counts manageable, which reduces the client hardware required and holds down the cost of load testing software.
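To make the arithmetic concrete, here is a small Python sketch of both conversions; the formula is the one reconstructed above, and the 15:1 ratio is our rule of thumb, not a universal constant:

```python
def requests_per_second(users: int, requests_per_page: float,
                        download_time_s: float, think_time_s: float) -> float:
    """Load generated by real users, per the formula above."""
    pages_per_second = users / (download_time_s + think_time_s)
    return pages_per_second * requests_per_page

def real_user_equivalent(super_users: float, ratio: float = 15.0) -> float:
    """Convert super users (zero think time) to approximate real users."""
    return super_users * ratio

# The example from the text: 100 users, 3 requests/page, 10 s download,
# 30 s think time.
print(requests_per_second(100, 3, 10, 30))   # -> 7.5 requests per second
print(real_user_equivalent(10))              # 10 super users ~ 150 real users
```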
Super user test cases are also useful for multiple-machine testing. To test how the site scales, add a second Web server and a load balancer and repeat the super user test. Ideally, you will be able to double the number of super users before seeing the same response times.
To answer the stability question, run a reasonable number of nonsynchronized super users for an extended period. On a previous project we ran such tests for many hours at a stretch, sometimes a full 24; the right duration depends on the application. We call this a "burn-in" test. Once you have identified a bottleneck and taken steps to fix or mitigate it, repeat the synchronization point test to see whether the breaking point has risen, then rerun the burn-in test at the newly supported level of concurrency. Repeat the cycle, raising that number each time, until you reach the quality bar.
But How Many Users?
Although this method gives the development team valuable information, it makes the question in the conference room harder to answer directly. You can, however, estimate an answer. Suppose, for example, that the worst-case bottleneck test shows response times exceeding 10 seconds beyond 20 super users per machine. Using the ratio established earlier, that is roughly 300 real users (20 super users × 15). At this point you can apply the same kinds of assumptions used in conventional test cases: what percentage of users are in this area of the site at any one time? If you assume 50 percent of users are in this area, with the rest spread across less demanding areas such as browsing documentation, then a system with one Web server should handle roughly 600 users.
So far we have discussed what to do when suspicion points clearly to one bottleneck area of the site, but what if more than one area affects performance? The answer is to create a separate test script for each area. Run the scripts in isolation and then together, and compare the results to see what impact one area has on another.
Metrics
The next step is to define your metrics clearly. Examples include the number of orders processed per minute or the number of milliseconds required to execute a request for an ASP page. Metrics let you quantify the results of changes between test runs and give you a comparison against the standards defined for your Web application.
Settling on the metrics to track takes a series of steps. You need to define the questions to be answered, define a quality bar for each question, and then determine which metrics are needed to compare the test results against the quality bar.
The first step is straightforward. For example, you may want to know the response time for checking out. Keep the list of questions tied to your test strategy so that you do not end up with fuzzy questions you cannot test.
The next step is to define the quality bar for each question. Take a typical order submission process: we might decide that the site must process 10 orders per minute under peak load, and that no user should wait more than 30 seconds for a request to complete. You can draw on a number of sources to set such a bar. Start by consulting the people who know the business to learn what level of system performance is acceptable. Bringing historical data to these meetings informs the discussion and often helps manage expectations. If a version of the site is already in production, you can gather data by making short-term projections from current site activity and expected traffic growth, or by querying the existing database for activity trends.
Armed with a list of questions and a quality bar for each, you can now determine which metrics to track. Continuing the earlier example, orders per minute and the total orders completed in a given test run would be good high-level metrics, serving as indicators of whether the site meets the quality bar. These are the metrics to report to management as they are updated over the course of testing.
Low-level metrics measure performance and help you track down and resolve, or at least mitigate, system bottlenecks and stability problems. Gains at this level can have a direct impact on the high-level metrics; reducing the transaction time of a particular activity, for example, may increase the number of orders processed per minute.
Most test tools let you set timers on individual pages or groups of pages and report the average time to run a test case. Both measurements let you compare the high-level metrics from one test run to the next, but neither tells you where improvement is needed.
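If your tool lacks such timers, the measurement itself is simple to approximate. A minimal Python sketch (the checkout URL is a placeholder) that times one page request, including the download:

```python
import time
import urllib.request

def timed_get(url: str) -> float:
    """Fetch a page and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()  # include the body download in the measurement
    return (time.perf_counter() - start) * 1000.0

elapsed_ms = timed_get("http://testlab/checkout")  # hypothetical page
print(f"checkout page: {elapsed_ms:.0f} ms")
```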
This is where Windows performance counters are useful. For example, you can monitor the Process: Private Bytes counter for the DLLHOST process to detect a memory leak in a server package. A suitably detailed description of the Microsoft Internet Information Services (IIS) counters can be found in "The Art and Science of Web Server Tuning with Internet Information Services 5.0." Figure 3 describes the main counters used in load testing and the trends to watch for.
Performance counters, however, are only useful for identifying the symptoms of a problem, not its cause. If your system breaks down with 20 concurrent users, the Active Server Pages: Requests Timed Out counter can confirm that at least one user timed out, but the reason for the timeout is still a needle in a haystack. That is because performance counter data mostly provides information at the operating system and network level; to locate the cause of a problem, you need data from the application level. The key is to build a distributed logging system that retrieves error and performance data from the application and stores it centrally. You can then tell at a glance whether the system is healthy, and when it is not, you have the information needed to isolate the problem area.
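The point here is architectural rather than tied to any product, so here is a minimal sketch of the idea, using SQLite as a stand-in for the central database; the table layout and field names are ours:

```python
import sqlite3
import time

conn = sqlite3.connect("loadtest_log.db")   # stand-in for a central DB server
conn.execute("""CREATE TABLE IF NOT EXISTS app_log (
                    logged_at REAL,   -- Unix timestamp of the event
                    host      TEXT,   -- which server wrote the record
                    area      TEXT,   -- application area, e.g. 'order-submit'
                    level     TEXT,   -- 'ERROR' or 'PERF'
                    detail    TEXT    -- message text or elapsed milliseconds
                )""")

def log(host: str, area: str, level: str, detail: str) -> None:
    """Called from every tier of the application under test."""
    conn.execute("INSERT INTO app_log VALUES (?, ?, ?, ?, ?)",
                 (time.time(), host, area, level, detail))
    conn.commit()

def recent_errors(seconds: int = 300) -> list:
    """One query answers 'is the system healthy right now?'"""
    cutoff = time.time() - seconds
    return conn.execute(
        "SELECT * FROM app_log WHERE level = 'ERROR' AND logged_at >= ? "
        "ORDER BY logged_at", (cutoff,)).fetchall()

log("web01", "order-submit", "PERF", "elapsed_ms=430")
log("web01", "order-submit", "ERROR", "transaction timeout after 30s")
print(recent_errors())
```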
Interpreting the Metrics
With all these metrics in place, you now have access to a mountain of data. So how do you make sense of it efficiently? We will discuss three options for interpreting performance counter data: Performance Monitor, PerfCol, and performance data integrated with your load testing tool.
Performance Monitor in Windows 2000 lets you display each counter graphically over time. A useful feature is the ability to capture readings to a log file, so that you can visually review an entire test run once it completes. Figure 4 shows how site activity on an online ordering application might look in Performance Monitor.
The Windows DNA Performance Kit Beta includes a tool called PerfCol that works along the same lines as Performance Monitor, except that it stores its sample data in a database rather than writing it to a file.
Some load testing tools, such as Microsoft Application Center Test (ACT) and the e-TEST suite from Empirix, have performance counter monitoring built in and can record counter values for the duration of a test run, writing the data to a database for later access. ACT, which is included with Visual Studio .NET, integrates Performance Monitor counters so that all the data ends up in a single repository.
Even if your load testing tool integrates some form of performance counter monitoring, you may find you still need a tool such as Performance Monitor, particularly when the servers generating the load do not have the security access needed to monitor the application servers, a situation that arises frequently when the environment includes a firewall.
Whichever monitoring tool you choose, the key is to store the metrics from every test run for future evaluation. Being able to go back to past data is critical to understanding how the system responds to change.
For the application-level data generated by the logging system, we recommend building a viewer that gives you immediate access to the error and performance information in one place. It is well worth it, considering the alternative: composing an ad hoc SQL query at the command line every time feedback is needed. (The recent_errors query in the logging sketch above is the seed of such a viewer.)
Selecting the Right Load Testing Tool
To implement this test strategy, you need to select an appropriate load testing tool. A complete evaluation of the available tools is beyond the scope of this article, but we can help you frame the options and the considerations in choosing one.
The first option to consider is a free tool such as the Windows Application Stress Tool (WAST). At the other end of the spectrum are more flexible tools such as ACT or Empirix's e-TEST suite. Figure 5 shows the interface of e-Load, the load generation component of the e-TEST suite.
There are significant functional differences between these tools. WAST is a good tool for small, uncomplicated sites: you can easily test the two or three key pages and get a feel for response rates. But a tool that can only test pages in isolation is not enough for a multi-page site, and WAST lacks several features essential for testing complex sites (including some of the techniques suggested in this article). To get sophisticated results with WAST you would have to customize the application itself for load testing, which is clearly not what you want.
For the tests we recommend on complex sites, a more robust tool such as ACT or the e-TEST suite is far more effective. If you are developing in .NET, ACT integrates into the overall development cycle, although producing powerful test scripts requires programming skill and knowledge of the ACT object model. If you opt for a tool like e-TEST, you will need to pay license fees.
Figure 6 ACT Results Page
A quality tool must not only drive the tests effectively but also report the results in a useful form. Both ACT and e-TEST provide detailed reporting environments that let you graph results as needed. The ACT results page is shown in Figure 6. Figure 7 summarizes the common features each class of tool should provide.
If you determine that a more robust tool is necessary, do not underestimate the time needed to get it up and running. Some tools claim that you can write the scripts needed to begin testing in just a few hours, and that may be true if you have prior experience with the tool or a similar one. Preparing to test your own site, however, may take days or even weeks, depending on its complexity; our first test took about three weeks to get up and running. You may find that even after working through the sample tutorials there are tricks you can learn only by doing, and that you call the support line more often than you would like. A few hours of formal training on the tool, or an experienced consultant, can be worth far more than the cost. And if you are starting your testing late in the development phase, you cannot afford the lost time; in that case we strongly recommend using one or both of those resources.
Tracking Test History
The cadence of testing can vary from day to day and week to week. If you are tuning the Web server, you may decide to run a series of tests every hour; if your goal is to test application stability, a test may run overnight. Either way, unless you keep a documented history, it becomes difficult to track the variables and the progress from one test to the next. It is important to be able to determine easily which tests have been run, what was found, and what to test next.
At a minimum, record the start and end times of each run, the number of virtual users in the test, a description going in of the test's goal and of anything changed since the last run, and a description coming out of the results. A sketch of such a record follows.
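Even a flat file is enough to keep this history. A minimal Python sketch of one such record; the field names and example values are ours, purely illustrative:

```python
import csv
from datetime import datetime

def record_test_run(path: str, started: str, ended: str, virtual_users: int,
                    goal_and_changes: str, results: str) -> None:
    # Append-only: past runs are never overwritten, so they stay comparable.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([started, ended, virtual_users,
                                goal_and_changes, results])

record_test_run(
    "test_history.csv",
    started=datetime(2002, 3, 4, 9, 0).isoformat(),
    ended=datetime(2002, 3, 4, 11, 30).isoformat(),
    virtual_users=10,
    goal_and_changes="Synchronized order-submit test; added index on Orders",
    results="Breaking point rose from 7 to 9 synchronized users",
)
```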
Summary
To deploy a complex Web application successfully, you must first adopt a "no surprises" testing approach that goes beyond system testing. Load testing, which comprises scalability, performance, and stability testing, is the only way to uncover major problems hiding in the architecture. Doing it well requires a separate environment with production-like hardware, a robust load testing tool, and the cooperation of several people across the organization.
Well-chosen metrics give you a way to determine whether the system meets the quality bar. And for the scalability lab team, nothing is more valuable than the error and performance data captured by a distributed logging system, because it provides information at the application level.
Apply the suggestions discussed in this article, document your work as you go, and you will be well placed to deploy smoothly and on schedule.
Reprinted from: http://bbs.51testing.com/thread-125398-1-3.html