Interface Performance Test Scenario White Paper V1.0

Source: Internet
Author: User
Tags: benchmark, website, performance

I. Performance Testing Terminology
1. Response Time
Response time is the time from when the client issues a request until it receives the last byte of the response. It can be broken down according to the characteristics of the software: for C/S software, for example, it can be subdivided into network transmission time, application server processing time, and database server processing time. The client itself also adds parsing time, interface rendering time, and so on.

Response time is the performance metric viewed primarily from the client's perspective; it is the metric users care about most and perceive most directly.
2. Throughput rate
Throughput rate is the number of requests the system processes per unit of time. From a business perspective it can be measured in requests per second, transactions per second, pages per second, queries per second, and so on; from a network perspective it can also be measured in bytes per second.
Throughput is a performance metric viewed primarily from the server's perspective, and it measures the processing power of the entire system. For a cluster or cloud platform, throughput metrics reflect the external load the server cluster can withstand, and they are easier to compare than user counts.
Note: throughput = throughput rate × unit time
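As a quick illustration of this relationship, the following sketch derives both figures from a hypothetical request log; the request count and window length are made-up sample values:

```python
# Minimal sketch: throughput rate is requests per unit time;
# throughput is the total handled over the window.
requests_handled = 8      # requests observed in the window (hypothetical)
window_seconds = 3.0      # length of the observation window

throughput_rate = requests_handled / window_seconds   # requests per second
throughput = throughput_rate * window_seconds         # equals requests_handled
print(round(throughput_rate, 2), throughput)
```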
3. Number of users
A server cluster or cloud platform is almost always a multi-user system, so how many users it can serve normally is an important measure. We distinguish these users by their state of engagement with the system as follows:
System users: the total number of user accounts the system holds.
Online users: the number of users in a normal usage state after authenticating.
Concurrent users: the number of users using the system within a given time frame.
Strictly concurrent users: the number of users performing the same operation at the same moment.
During performance testing we simulate real users sending requests. To put greater pressure on the server, however, the simulated behavior differs somewhat from real user behavior (for example, simulated users send requests more frequently than real users do), and it is difficult to convert between the number of simulated users and the number of real users. Therefore, when measuring server cluster capability, throughput metrics are more practical than user-count metrics.
II. Performance Testing Methods and Objectives
1. Performance test Methods
1.1 Benchmark Test (Benchmark testing)
A benchmark test runs a single business operation, or a business mix in the same proportions as actual user operations, against a data set of a given scale. The purpose is to quantify metrics such as response time and throughput rate for later follow-up.
The method is to run several groups of tests under different scenarios, observe the results, and record a few key data points for later performance comparison and evaluation.
1.2 Performance test (performance testing)
Verify that the system meets production performance requirements by simulating the business pressure and usage scenarios of production operation.
(1) The main purpose is to verify that the system has the claimed capability.
(2) Requires understanding the typical scenarios of the system under test and having definite performance targets.
(3) Must be carried out in a defined environment.
1.3 Load test (load testing)
Increase the pressure on the system under test until a performance metric such as response time exceeds its predetermined threshold or some resource reaches saturation.
(1) The main purpose is to find the limit of the system's processing capability.
(2) Must be carried out in a given test environment; usually the business pressure and typical scenarios of the system under test should also be considered, so that the results have business significance.
(3) Generally used to understand the system's performance capacity, or used in combination with performance tuning.
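The ramp-up idea above can be sketched in a few lines of Python. This is a toy illustration, not a real load tool: work() is a stand-in for one request, and the load levels and threshold are arbitrary assumptions.

```python
# Toy sketch of a load test: keep raising the simulated load until the
# measured average response time exceeds a predefined threshold.
import time
from concurrent.futures import ThreadPoolExecutor

def work():
    time.sleep(0.001)  # placeholder for one request round-trip

def ramp_until(threshold_seconds, levels=(1, 2, 4, 8, 16, 32)):
    for users in levels:
        requests = users * 10
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=users) as pool:
            for _ in pool.map(lambda _: work(), range(requests)):
                pass
        avg = (time.perf_counter() - start) / requests
        if avg > threshold_seconds:
            return users          # processing limit found at this load level
    return levels[-1]             # threshold never exceeded within the levels tried
```

A real load test would replace work() with an actual protocol request and drive the load from a dedicated generator, as discussed in the tool-selection section.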
1.4 Stress Test (Stress Testing)
Test whether the system can still handle sessions, and whether it produces errors, when some resource such as CPU or memory is saturated.
(1) The main purpose is to examine how the system behaves under pressure.
(2) Generally achieved by simulating load so that the system's resource usage reaches a high level.
(3) Generally used to test the stability of the system.
1.5 Configuration Test (Configuration Testing)
By adjusting the software/hardware environment of the system under test, determine how strongly each environmental factor influences system performance, and find the optimal allocation of each resource.
(1) The main purpose is to understand how much various factors affect system performance, so as to identify the most worthwhile tuning operations.
(2) Generally performed after a preliminary understanding of the system's performance has been gained.
(3) Generally used for performance tuning and capacity planning.
1.6 Concurrency Test (Concurrency testing)
Test for deadlocks or other performance issues when multiple users concurrently access the same application, module, or data records, by simulating concurrent user access.
(1) The main purpose is to discover concurrency problems that may be hidden in the system.
(2) Focuses on possible concurrency problems such as memory leaks, thread locks, and resource contention.
(3) Can be used at various stages of development; requires support from appropriate testing tools.
1.7 Reliability Test (reliability testing)
Load the system with a certain business pressure (for example, resource utilization at 70%~90%) and keep the application running for a period of time, to test whether the system runs stably under that condition.
(1) The main purpose is to verify whether the system supports long-term stable operation.
(2) Needs to run under pressure for a period of time.
(3) Requires attention to the system's health throughout the run.
1.8 Failure Recovery Test (Failover testing)
Designed for systems with redundant backups and load balancing. Verifies that users can continue to use the system when part of it fails, and if so, how strongly users will be affected.
(1) The main purpose is to verify that the system remains usable in the event of a partial failure.
(2) Should also state how many users can still be supported when the problem occurs, and what contingency measures to take.
(3) Generally required only for systems with explicit continuous-operation requirements.
2. Performance Test Objectives
In overview, they can be divided into four areas:
2.1 Capability Verification
In system testing or acceptance testing, we need to evaluate the capability of the system and measure its performance. Capability can be the number of concurrent users accommodated or the system's throughput; performance metrics can be response time, or CPU, memory, disk, and network usage.
(1) Must be carried out in a defined environment.
(2) Test scenarios and cases must be designed according to typical scenarios.
Common methods: performance testing, stress testing, reliability testing, failover testing.
2.2 Capacity Planning
Assess whether a system can support user growth over time, or how the system configuration should be adjusted so that the system can meet the needs of a growing user base.
(1) An exploratory test.
(2) Used to understand the system's performance and to find ways to extend it, such as system expansion planning. Capacity can be user capacity, data capacity, or throughput (the system's processing power). For cluster services we more often use throughput as the capacity measure.
The method is: ① first test the performance of each subsystem and component and find the optimal ratio between them; ② then, by scaling each link horizontally, calculate the machine ratio for overall expansion.
Common methods: load testing, stress testing, configuration testing.
2.3 Performance Tuning
To better realize the system's potential, locate system bottlenecks and optimize the system accordingly.
The method: before tuning, run a benchmark test; then repeatedly adjust hardware and software settings while comparing the performance data, until the system delivers its best performance. In practice we select key metrics to optimize and may have to sacrifice other performance metrics in return. If the goal is to optimize response time, for example, we may choose a space-for-time strategy at the cost of more memory or a larger cache; we need to find a balance among the performance metrics.
General adjustments to the system include the following 3 areas:
(1) Adjustment of the hardware environment
(2) Adjustment of system settings
(3) Application-level adjustments
The general approach is: benchmark testing, load testing, stress testing, configuration testing, and failover testing.
2.4 Defect Discovery
Like other tests, a performance test can also uncover defects: in particular, whether resource contention under strictly concurrent access causes slow responses, or whether a crash occurs when a large number of users access the program.
The method is to set rendezvous points for strictly concurrent user access, or to design test cases such as a sudden burst of access by a very large number of users.
The common approach is: concurrency testing.
III. Performance Requirements Analysis
1. Performance Requirements Acquisition
1.1 Functional Specification
1.2 System Design Documentation
1.3 Business Plan
1.4 User Behavior Analysis Records
2. Selecting Key Performance Points
Selection is made mainly along the following four dimensions:
2.1 Business Analysis
Determine whether the interface under test is a key business interface, or analyze the key business to indirectly identify the interfaces it accesses.
2.2 Statistical analysis
If access logs exist for the interface system, obtain the interfaces with high daily access directly from log analysis; otherwise, choose a third-party log analysis tool appropriate to how the interface is published and obtain them indirectly.
(1) IIS Log Analysis tool: Log Parser v2.2

(2) Tomcat log analysis tool: AWStats v7.3

(3) Nginx Log analysis tool: Goaccess v0.9

If an interface application server such as IIS or Tomcat sits behind Nginx for load balancing, the logged traffic depends on the load balancing; beware of interfaces whose responses are cached in Nginx (that is, not forwarded to the backend), which makes the statistics incorrect.

2.3 Technical Analysis
(1) Interfaces whose logic is highly complex (for example, too many decision branches, or CPU/IO-intensive operations)
(2) Interfaces that place high demands on system hardware resources (memory, CPU, disk IO) and network IO
Note: If the interface is refactored due to logic changes, it needs to be re-analyzed.
2.4 Operational analysis
Interfaces whose daily access volume surges as a result of operational promotions.
Note: If your business plan is adjusted, you need to re-analyze it.
3. Description of performance indicators
3.1 Response Time
In general, the average response time of weakly interactive interfaces should be under 1 second, and the average response time of strongly interactive interfaces under 200 milliseconds.
3.2 Success Rate
In general, the interface response success rate must reach 99.99% or higher.
3.3 System Resources
Under optimal load, system CPU and memory usage is recommended to stay in the interval [50%, 80%]; otherwise it is recommended not to exceed 50%.
3.4 Processing Power
The project explicitly requires, for example: under XX pressure (concurrency), TPS must reach XX, or the interface system must support XX million users online in real time.
3.5 Stability
Under the actual system operating pressure, the system should run stably for N × 24 hours (generally N >= 7). Under 1× the actual system operating pressure, it should run stably for 12 hours.
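The targets above can be checked mechanically against collected run data. The sketch below uses hypothetical sample numbers for the counters and utilization:

```python
# Hypothetical sample run data checked against the indicator targets above.
total_requests, failures = 120_000, 8
success_rate = (total_requests - failures) / total_requests
assert success_rate >= 0.9999            # target: success rate >= 99.99%

avg_response_ms = 180                    # strongly interactive interface (sample)
assert avg_response_ms < 200             # target: average response < 200 ms

cpu_usage = 0.65                         # sample CPU utilization under optimal load
assert 0.50 <= cpu_usage <= 0.80         # recommended interval [50%, 80%]
print("all indicator targets met")
```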
IV. Performance Test Scope
1. Scope of Business
Key business function point description.
2. Design scope
Description of the components under test and their topology: the network access layer, interface layer, middleware, storage layer, and so on.
V. Concurrency Calculation Methods
Engineers who have just started performance testing often ask how to calculate the concurrency of one interface or one class of interfaces. In fact, concurrency can be derived from two angles: the user's business and the server.
1. The 80/x Principle
Scope of application: Unlimited
Take one project as a case. Given the interface server's traffic distribution on Mother's Day, shown below, how do we calculate that day's average concurrency and peak concurrency?

The Mother's Day UV curve seen on the Baidu Analytics platform is linearly related to the request volume, as follows:

Using the idea of calculus, treat each time point as a rectangle; summing the rectangles gives the total area of the graph, as shown below:

In fact, the width of each rectangle is 1 (one hour), so the area depends only on its height, that is, on the request volume in that hour.
According to the 80/x principle, find the points that together cover 80% of the total area, choosing the largest points first. Here 7 points suffice, which is 30% of the total time span, so the 80/x principle becomes the 80/30 principle, as follows:

Therefore, average concurrency (average requests per second) = 80% × daily request volume / (1 day × 30%)
The ratio of the peak to the average concurrency was then measured to be 2.25.
Therefore, peak concurrency (peak requests per second) = 2.25 × average concurrency =
2.25 × 80% × daily request volume / (1 day × 30%) = 6 × daily request volume / 1 day
Because the UV curve and the request-volume curve are linearly related, daily request volume = 9.25 × daily UV.
Therefore, peak concurrency = 6 × 9.25 × daily UV / 1 day = 55.5 × daily UV / 1 day
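The 80/30 arithmetic above can be written out as a short script. The constants 9.25 (requests per UV) and 2.25 (peak-to-average ratio) are the article's measured values for this one project; daily_uv is a hypothetical input, and other systems will need their own constants:

```python
# Sketch of the 80/30 estimate: 80% of the traffic arrives within 30% of the day.
SECONDS_PER_DAY = 24 * 3600

def average_concurrency(daily_requests, traffic_share=0.80, busy_fraction=0.30):
    """Average requests per second over the busy portion of the day."""
    return daily_requests * traffic_share / (SECONDS_PER_DAY * busy_fraction)

def peak_concurrency(daily_requests, peak_ratio=2.25):
    """Peak requests per second, using the measured peak-to-average ratio."""
    return peak_ratio * average_concurrency(daily_requests)

daily_uv = 100_000                      # hypothetical daily unique visitors
daily_requests = 9.25 * daily_uv        # linear UV-to-request relationship
print(round(average_concurrency(daily_requests), 1))   # average requests/sec
print(round(peak_concurrency(daily_requests), 1))      # peak requests/sec
```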
2. Formula method
Scope of application: Web Class Access
Formula (1) calculates the average number of concurrent users: C = n × L / T
where C is the average number of concurrent users, n is the number of login sessions, L is the average length of a login session, and T is the length of the period examined.

Formula (2) estimates the peak number of concurrent users: C′ ≈ C + 3 × √C

Here C′ is the peak number of concurrent users and C is the average number from formula (1). The formula is derived under the assumption that user login sessions follow a Poisson distribution.

Example 1:
Suppose an OA system has 3,000 registered users, of whom about 400 access it on an average day. A typical user spends 4 hours in the system between login and logout, and users only use the system during an 8-hour window each day.
C = 400 × 4 / 8 = 200
C′ ≈ 200 + 3 × √200 ≈ 242
To better understand the formula above, convert it into the following:
Formula (3): number of concurrent users = throughput × scenario business time / unit time period
Example 2:
An OA system has 8,000 users logging in within 1 hour. To log in, each user opens the login page, then enters a username and password to reach the home page. The operation typically takes the user 5 seconds, and the home page is required to display fully within 5 seconds of clicking the Login button.
Throughput = 8000 × 2 (the whole operation loads 2 pages)
Scenario business time = 5 + 5 = 10 seconds
Unit time period = 1 hour = 3,600 seconds
Number of concurrent users (login scenario) = (8000 × 2) × 10 / 3600 ≈ 45
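Formulas (1) to (3) and both worked examples can be reproduced in a few lines:

```python
import math

# Sketch of formulas (1)-(3), using the article's own OA-system examples.
def avg_concurrent(n_sessions, session_length, period):
    """Formula (1): C = n * L / T."""
    return n_sessions * session_length / period

def peak_concurrent(c):
    """Formula (2): C' ≈ C + 3*sqrt(C), assuming Poisson-distributed logins."""
    return c + 3 * math.sqrt(c)

# Example 1: 400 users/day, 4-hour sessions, 8-hour usage window
c = avg_concurrent(400, 4, 8)
print(c)                          # 200.0
print(round(peak_concurrent(c)))  # 242

# Example 2 via formula (3): throughput * scenario time / unit time period
login_concurrency = avg_concurrent(8000 * 2, 10, 3600)
print(math.ceil(login_concurrency))  # 45
```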
Once we have the business concurrency by the method above, we can further analyze which interfaces the business accesses; we then only need to simulate how and when those interfaces are invoked.
Sometimes we need to calculate the concurrency of one interface or one class of interfaces. We can analyze it as follows:
(1) Enumerate the business scenarios that access the interface under test and the number of interface calls made in each scenario.
(2) Calculate the number of concurrent users of each business scenario by the method above.
Interface concurrency = scenario-1 concurrent users × scenario-1 interface calls + scenario-2 concurrent users × scenario-2 interface calls + ...
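A small, entirely hypothetical illustration of this weighted sum (the scenario concurrencies and per-scenario call counts are made-up inputs):

```python
# Interface concurrency = sum over scenarios of
#   (scenario concurrent users * calls to this interface per scenario pass).
scenarios = [
    (45, 2),   # scenario 1: 45 concurrent users, interface called twice per pass
    (30, 1),   # scenario 2: 30 concurrent users, interface called once per pass
]
interface_concurrency = sum(users * calls for users, calls in scenarios)
print(interface_concurrency)  # 120
```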

If a system needs to support 100,000 online users, how do we derive the number of concurrent users through performance requirements analysis? Think it through independently using the content above.

VI. Performance Test Cases and Scenarios

    1. Script templates
    2. Scenario templates

VII. Performance Test Tool Selection
1. Data Modeling Tools
DataFactory is a powerful data generator that lets developers and QA easily generate millions of rows of meaningful, correct test data. It supports DB2, Oracle, Sybase, and SQL Server databases as well as ODBC connections; MySQL cannot be used directly, but it can be supported indirectly.
2. Script Development Tools
(1) If script execution efficiency is the concern, consider a low-level language such as C, or a language with asynchronous communication support such as JavaScript; for these we can develop in LoadRunner or a Node.js IDE environment.
(2) If script development efficiency and code reusability are the concern, choose an object-oriented language such as C# or Java; for these we can use VS2008 or later with the corresponding controls, or Eclipse 4.0 or later with JDK 1.7 or later.
For the script development process for HTTP, socket, and other protocol interfaces, see the attachments:
Script development and performance analysis of HTTP interface performance test.pdf
Using controls to complete performance test scripting.pdf
Node.js learning starter manual.pdf
3. Pressure Simulation Tool
(1) For Java interfaces with per-machine concurrency within 500, choose JMeter or LoadRunner.
(2) For WebService interfaces with per-machine concurrency within 500, choose SoapUI or LoadRunner.
(3) For per-machine concurrency above 500 and within 5000, choose LoadRunner.
(4) For per-machine concurrency above 5000, a load cluster is recommended, that is, a "controller + multiple load generators" deployment.
4. Performance monitoring Tools
4.1 Application Monitoring
On both Windows and Linux platforms, an application is typically one process instance or a set of them, which we can monitor with LoadRunner or nmon. Sometimes, to obtain specific metrics of the application under test, you can use the performance toolset or monitoring system that ships with the tested component. Recommended monitoring tools for common application servers are as follows:

4.2 Platform Monitoring
The monitoring machine mainly monitors the service or resource usage of the cluster servers under test. Options include various open-source monitoring tools (MRTG for traffic monitoring; Cacti for traffic alerts and performance reports; SmokePing for IDC quality control) and integrated monitoring systems such as Nagios, Zenoss, Ganglia, Zabbix, SiteScope, and Hyperic HQ.
VIII. Performance Test Result Analysis
1. Indicators Analysis
Performance test metrics can be divided into two kinds: product metrics and resource metrics. For testers, the demand for performance testing comes from three sides: users, developers, and operations. Users and developers focus on the product metrics tied to business requirements, while operations staff care about the resource metrics tied to hardware consumption.

(1) Indicators of concern from the user's point of view
Users care about the experience of a single business operation: how fast the response is, whether the request succeeds, whether the connection fails, and so on. The corresponding single-business metrics include:
A. Success rates B. Failure rate C. Response time
(2) Indicators of concern from a development perspective
Developers are more concerned with metrics at the system level.
A. Capacity: what is the maximum user access volume the system can carry? What is its maximum business processing volume?
B. Stability: whether the system supports 7×24 (one week of continuous) business access.
(3) Indicators of concern from operation and maintenance perspective
Operations personnel are more concerned about the consumption of hardware resources.

The above shows that testers need to think from the user's perspective when selecting metrics. In addition, to analyze problems well later, they also need to master the performance metrics tied to the characteristics or operating principles of the relevant components.

For example, an interface system usually accesses a database layer directly or indirectly (such as MySQL, Oracle, or SQL Server). We then need to consider the performance impact the interface system produces on the storage layer while under pressure; the metrics we usually choose to analyze are:
(1) Number of connections (Connections)
(2) Queries per second / transactions per second (QPS/TPS)
(3) Disk IOs per second (IOPS)
(4) Cache hit ratio (Buffer Hits)
(5) Deadlocks per second (Deadlocks/sec)
(6) Bytes read/written per second (Read/Write Bytes/sec)
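Most of these are read directly from counters; derived ones such as the cache hit ratio are simple arithmetic over raw counters. A sketch with hypothetical numbers (for MySQL/InnoDB the raw counters would be Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads):

```python
# Deriving the cache hit ratio indicator from raw counters (hypothetical values).
logical_reads = 1_000_000   # read requests served from cache or disk
physical_reads = 12_000     # reads that had to go to disk (cache misses)
hit_ratio = 1 - physical_reads / logical_reads
print(f"{hit_ratio:.2%}")   # 98.80%
```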
For monitoring and analyzing specific metrics on Windows or Linux platforms, see the attachments:
Windows operating system performance monitoring tools and metrics analysis V1.0.pdf
Linux operating system performance monitoring and analysis manual V1.0.docx
2. Modeling Analysis
2.1 Barber Shop Model

The figure shows a standard software performance model. Its three curves represent resource utilization (including hardware and software resources), throughput (transactions per second), and response time. The horizontal axis shows the number of concurrent users increasing from left to right.
From the diagram we can see that at first, as the number of concurrent users grows, resource usage and throughput rise correspondingly while response time changes little. When concurrency reaches a certain level, resource consumption saturates, throughput growth slows or even stops, and response time lengthens further. If concurrency keeps growing, the hardware and software resources remain saturated but throughput begins to fall, response time clearly exceeds what users will accept, and users ultimately abandon their requests or leave.
Based on this behavior, the figure is divided into three regions: light load, heavy load, and the buckle zone (where users can no longer stand the wait and abandon their requests). The number of concurrent users at the boundary between the light-load and heavy-load regions is called the optimum number of concurrent users, and the number at the boundary between the heavy-load region and the buckle zone is called the maximum number of concurrent users.
When the system load equals the optimum number of concurrent users, the system's overall efficiency is highest: no resources are wasted and users do not need to wait. When the load is between the optimum and maximum numbers, the system can keep working, but users wait longer, satisfaction begins to decline, and some users will eventually give up. When the load exceeds the maximum number, some users will inevitably find the response time intolerable and abandon the system. We should therefore ensure that the optimum number of concurrent users is greater than the system's average load.
2.2 Pressure Change model

As traffic per unit time keeps increasing, the pressure on the system under test rises and server resources are continuously consumed; the TPS value changes with these factors and follows a recognizable pattern.

In the diagram:
Point A: performance meets expectations
Point B: above expectations; system resources are at a critical point
Point C: above expectations; the inflection point
Point D: the system crashes under overload
2.3 Capacity Calculation model

Take a website performance test as a case:
1. From operational data analysis, we know how many PV the system currently handles per hour.
2. From load testing, we know the maximum number of PV the system can handle per hour.

Thus:

System hourly PV headroom = maximum PV handled per hour - PV currently handled per hour

Assume the site's user load grows roughly linearly. The system currently has 700,000 users, and according to the operations promotion plan it will reach 10 million within 1 year, about a 14-fold increase. Thus:

System hourly PV growth = current hourly PV × 14 - current hourly PV

Daily system load growth rate = 100% / 365 ≈ 0.274% (note: the load at the one-year target user count is defined as 100%)

System daily PV growth = system hourly PV growth × daily load growth rate × 24

So, under normal load conditions, we know:

Days of operation the system can support = system hourly PV headroom × 24 / system daily PV growth

Suppose the number of days needed to deploy an upgrade of this site is known; then the number of days remaining before the upgrade must begin is:

Days of operation the system can support - days needed to deploy the upgrade.
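The whole calculation chain above can be sketched as follows; all inputs are hypothetical sample figures:

```python
# Sketch of the capacity-planning arithmetic; all inputs are hypothetical.
current_pv_per_hour = 50_000    # from operational data
max_pv_per_hour = 200_000       # from load testing
growth_multiple = 14            # 0.7M users -> 10M users over one year

pv_headroom_per_hour = max_pv_per_hour - current_pv_per_hour
daily_load_growth_rate = 1.0 / 365     # the one-year target load is defined as 100%
pv_growth_per_day = (current_pv_per_hour * growth_multiple
                     - current_pv_per_hour) * daily_load_growth_rate * 24

days_supported = pv_headroom_per_hour * 24 / pv_growth_per_day
print(round(days_supported, 1))        # days before the headroom is consumed
```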
IX. Performance Test Pass Criteria
1. All planned tests have been completed.
2. All planned performance data has been collected.
3. All performance bottlenecks have been improved and the design requirements are met.
X. Recommended Performance Testing Books

XI. Performance Test Report Template
See attachment:
Performance test report template.doc

Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.
