Benchmarking is a stress test for system design. Its goals are to understand the system's performance, to reproduce a particular system state, or to perform reliability testing of new hardware.
1 Why run benchmarks
Benchmarking is the only convenient and effective way to learn what happens to a system under a given workload. A benchmark can observe the system's behavior under different levels of pressure, evaluate the system's capacity, show what changes will occur, and reveal how the system handles different kinds of data.
Because a benchmark can create fictitious scenarios, it can apply loads beyond what the system actually experiences. Benchmarks can be used to:
- Verify assumptions you have made about the system and confirm that they match reality
- Reproduce abnormal behavior seen in the system and confirm that the anomalies have been resolved
- Measure how the system performs under its current operating conditions
- Simulate a higher load than the current one to identify the bottlenecks the system will hit as pressure increases
- Plan for future business growth: assess what hardware, network capacity, and other resources a future project will need, which reduces the risk of system upgrades and major changes
- Test the system's ability to adapt to a changing environment, e.g. its performance under randomly varying concurrency, or the performance differences between differently configured servers
- Test different hardware, software, and operating-system configurations, such as whether RAID 5 or RAID 10 is better for the current system
- Verify that newly purchased equipment is configured correctly
- Create unit tests
The main problem with benchmarking is that it is not a real stress test. The load a benchmark applies is usually much simpler than real load, which is unpredictable and variable, and sometimes too complex to characterize. So benchmarking against a fully realistic workload can make it difficult to draw correct conclusions from the results.
The benchmark load differs from the real load in several ways: many factors affect the test, such as data volume and the distribution of data and queries; most importantly, benchmarks usually require the task to finish as fast as possible, so they put unrealistically intense pressure on the system.
A benchmark can therefore only approximately determine how much headroom the system has (that is, the peak TPS it can withstand).
2 Benchmarking strategies
There are two main benchmarking strategies:
- Integrated (full-stack): test the entire system as a whole
- Single-component: test MySQL in isolation
Reasons to run an integrated test:
- It exercises the entire application, including web servers, application code, network, and database. Users care about the performance of the whole application, not just MySQL.
- MySQL is not always the application's bottleneck, and an overall test can reveal this.
- Only by testing the application as a whole can you discover how its parts affect one another.
- It reveals the application's real performance.
However, an integrated benchmark is hard to build and even harder to set up correctly; if it is designed badly, it will not reflect the real situation.
Reasons to test a single component:
- You need to compare schema or query performance across different scenarios
- You want to test a specific problem in the application
- You want quick feedback from a short benchmark cycle instead of a lengthy full benchmark
Setting up a benchmark with realistic data is time-consuming. If you want to test how the application will perform after the data set grows, the only way is to simulate a large volume of data and load.
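Simulating a large data set is usually scripted. A minimal sketch (the table layout and column names here are invented for illustration) that writes synthetic rows to a CSV file, which could then be bulk-loaded into a test table:

```python
import csv
import random
import string

def random_row(i):
    """One synthetic row; the columns here are invented for illustration."""
    name = "".join(random.choices(string.ascii_lowercase, k=8))
    return [i, name, random.randint(18, 90), random.choice(["US", "DE", "CN", "BR"])]

def generate_dataset(path, n_rows):
    """Write n_rows of synthetic data, e.g. for bulk-loading into a test table."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "age", "country"])
        writer.writerows(random_row(i) for i in range(1, n_rows + 1))

generate_dataset("customers.csv", 10_000)  # scale n_rows up for real tests
```

Randomly generated data rarely matches the real data distribution, which is one of the benchmark pitfalls listed later, so a production snapshot is preferable when available.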
3 Test metrics
The metrics you measure determine which test tools and techniques to choose.
Different metrics are measured in different ways:
3.1 Throughput
Throughput is the number of transactions per unit of time, a classic metric for database applications, used by standard benchmarks such as TPC-C.
This type of benchmark mainly targets online transaction processing (OLTP) throughput and suits multi-user interactive applications.
The common unit is transactions per second (TPS); transactions per minute (TPM) is also sometimes used.
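The relationship between the two units is simple; a tiny helper (the function name and figures are my own) for reducing a run's raw counts:

```python
def throughput(completed_transactions, elapsed_seconds):
    """Reduce a benchmark run to (TPS, TPM)."""
    tps = completed_transactions / elapsed_seconds
    return tps, tps * 60

# e.g. 18,000 transactions completed during a 60-second measurement window
tps, tpm = throughput(18_000, 60)
print(f"{tps:.0f} TPS = {tpm:.0f} TPM")  # 300 TPS = 18000 TPM
```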
3.2 Response time (latency)
Response time measures the total time needed to complete a task. Depending on the benchmark, different time units may be appropriate. From the raw timings you can derive the average, minimum, and maximum response times as well as percentile response times.
The maximum response time is usually not very meaningful on its own. Use a percentile response time instead: for example, a 95th-percentile response time of 5 ms means that 95% of tasks complete within 5 ms.
Charts help interpret test results: plot the results as a line chart or a scatter plot to visualize how the result set is distributed.
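A sketch of computing these statistics from raw per-task timings, using the nearest-rank definition of a percentile (the sample values are invented):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value >= p% of the samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Invented response times in ms; one slow outlier dominates the maximum
latencies = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 40]

print("min:", min(latencies), "max:", max(latencies))
print("avg:", sum(latencies) / len(latencies))
print("p95:", percentile(latencies, 95))  # 5 -- the outlier barely moves it
```

This is exactly why the text prefers percentiles to the maximum: one 40 ms outlier defines the maximum, but 95% of tasks still finished within 5 ms.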
3.3 Concurrency
Concurrency is often described as how many users are accessing the site at the same time, measured by the number of open sessions. But HTTP is stateless, so a more accurate description of a web server's concurrency is the number of requests being handled simultaneously at a given moment.
An application can measure concurrency at several points. High concurrency at the web server generally leads to high concurrency at the database, but the language and toolset the server uses also affect this. A well-designed application may open hundreds of connections to the MySQL server while only a handful of them run queries at any one time; that is, a web server serving hundreds of users might send only a dozen or so concurrent requests to the database.
A concurrency benchmark is concerned with concurrency while work is in progress: the number of threads or connections doing work at the same time. As concurrency increases, measure whether throughput drops and whether response time grows.
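A sketch of sweeping concurrency levels while watching throughput and response time. The workload here is a stub sleep; in a real test each task would issue a query through a MySQL client:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query():
    """Stand-in for one database query; replace with a real client call."""
    time.sleep(0.001)

def run_at_concurrency(n_threads, total_requests=200):
    """Push total_requests through n_threads workers.

    Returns (throughput in req/s, average latency in seconds)."""
    latencies = []
    def timed():
        t0 = time.perf_counter()
        fake_query()
        latencies.append(time.perf_counter() - t0)  # list.append is thread-safe
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(total_requests):
            pool.submit(timed)
    elapsed = time.perf_counter() - start
    return total_requests / elapsed, sum(latencies) / len(latencies)

for n in (1, 4, 16):
    tput, avg = run_at_concurrency(n)
    print(f"{n:2d} threads: {tput:7.0f} req/s, avg latency {avg * 1000:.2f} ms")
```

Against a real server the interesting point is where throughput stops rising (or falls) and latency climbs as the thread count grows.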
3.4 Scalability
Scalability means that, ideally, giving the system twice as much work yields twice the throughput; equivalently, doubling the system's hardware resources should double the throughput.
Testing scalability is necessary when the business pressure on the system may change.
Scalability metrics are useful for capacity planning, and they provide information other tests cannot, helping to uncover application bottlenecks.
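How close a measured run comes to that ideal linear scaling can be reduced to one number. A small helper (the TPS figures below are hypothetical):

```python
def scaling_efficiency(baseline_tps, scaled_tps, resource_factor):
    """Compare measured scaling against linear scaling.

    resource_factor: how many times more resources (or load) were applied,
    e.g. 2 after doubling the hardware.
    Returns (speedup, efficiency); efficiency 1.0 means perfectly linear."""
    speedup = scaled_tps / baseline_tps
    return speedup, speedup / resource_factor

# Hypothetical: doubling the hardware raised throughput from 300 to 510 TPS
speedup, eff = scaling_efficiency(300, 510, 2)
print(f"speedup {speedup:.2f}x, {eff:.0%} of linear scaling")
```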
4 Benchmarking methods
The following mistakes can make benchmark results inaccurate:
- Using only a subset of the real data
- Using the wrong data distribution
- Testing with a single user when the real scenario is multi-user
- Testing a distributed application on a single server
- A workload that does not match real user behavior
- Failing to check the results for errors
- Ignoring the system's warm-up: after a restart it takes a while for the system to reach its normal state, so allow for warm-up time
- Using the server's default configuration
- Running the test for too short a time
4.1 Designing and planning benchmarks
The first step in planning a benchmark is to pose a concrete question and set a clear goal. Then decide whether to use a standard benchmark or to design a dedicated test.
If you use a standard benchmark, make sure you choose one that fits your scenario.
Designing a dedicated benchmark is complex and iterative. Start by taking a snapshot of the production data set; the snapshot should be easy to restore for subsequent test runs.
You can record queries at different levels (that is, in a single benchmark, record both integrated and single-component results).
Even if you do not build a dedicated benchmark, record the test plan in detail. The test may need to be run repeatedly, so the test process must be precisely reproducible.
4.2 Benchmark run time
The benchmark should run long enough, because the system needs time to warm up.
If you need to test the system in its steady state, the benchmark must run until that steady state is reached, which can take a very long time.
Most systems keep some headroom to handle emergencies, absorb performance spikes, and defer some work until after the peak.
Even after warm-up, I/O activity may take three or four hours to stabilize.
4.3 Capturing system performance state
Write a script that uses system tools to collect the system's current CPU, I/O, memory, and other statistics during the run.
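A minimal sampling loop along those lines (load average only; a real script would also shell out to tools such as vmstat and iostat for CPU, I/O, and memory detail):

```python
import os
import time

def sample_load(duration_s=3, interval_s=1.0):
    """Record the 1-minute load average once per interval (Unix only)."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        load1, _, _ = os.getloadavg()
        samples.append((time.time(), load1))
        time.sleep(interval_s)
    return samples

for ts, load1 in sample_load():
    print(f"{ts:.0f} load1={load1:.2f}")
```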
4.4 Getting accurate results
Accurate benchmark results require:
- Choosing the right benchmark
- Correct test data
- Correct test criteria
- Reproducible results
- The system being in the same state for each run
- The system being configured correctly for the test. Benchmarking MySQL with its default configuration is meaningless: the defaults are tuned for applications that use very little memory.
4.5 Analyzing results
In general, automated tests give more accurate results, because they prevent testers from occasionally skipping steps or making mistakes during a run.
Automated test procedures are typically written in a scripting language.
Plot the performance figures in chronological order as a chart to make the results easier to analyze.
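A minimal automated runner of that kind (names invented) that logs a timestamp and duration per iteration, so the results can later be charted in chronological order:

```python
import csv
import time

def run_benchmark(task, iterations, out_path):
    """Run task() repeatedly, logging wall-clock time and duration per iteration."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "seconds"])
        for _ in range(iterations):
            t0 = time.perf_counter()
            task()
            writer.writerow([time.time(), time.perf_counter() - t0])

# Stub workload; a real run would execute a query mix against the server
run_benchmark(lambda: sum(range(10_000)), iterations=50, out_path="results.csv")
```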
5 Benchmarking tools
If you do not want to develop your own benchmark system, you can use the following tools.
5.1 Integrated Test Tools
ab, webbench, http_load, tcp_copy
5.2 Single-Component test tools
mysqlslap: simulates client load on the server and reports timing information.
MySQL Benchmark Suite
Super Smack
Database Test Suite
Percona's tpcc-mysql tool
sysbench
5.3 MySQL's built-in BENCHMARK() function
BENCHMARK(count, expression) executes the given expression count times. For example, SELECT BENCHMARK(1000000, MD5('test')) evaluates MD5('test') one million times; the function itself returns 0, and the elapsed time reported by the client shows how fast the expression executes.
6 Test Cases