Eight Common Mistakes in System Performance Optimization
I. Throughput and response time
System throughput reflects a system's capacity under load, and many teams use it as their primary performance metric, while response time is easy to overlook. In my view, throughput speaks more to a system's stability under a given level of pressure, whereas response time better describes the performance a user actually experiences. If a request's response time does not meet requirements, high throughput is meaningless. For an ordinary website page, responding within a few hundred milliseconds is very good and responding within 2 seconds is acceptable, but if a response takes 20 seconds, hardly anyone will use the site. For LAN applications, such as billing operations at a business counter, even a 2-second response time will make the clerk visibly uncomfortable.
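To make this concrete, here is a minimal sketch in plain Python, using made-up latency samples, showing how a system can report a healthy average throughput while its tail response times are already unacceptable:

```python
import random

# Hypothetical latency samples (seconds) for 1,000 requests: most are fast,
# but 5% are pathologically slow.
random.seed(42)
latencies = [random.uniform(0.05, 0.3) for _ in range(950)] + \
            [random.uniform(10.0, 25.0) for _ in range(50)]

def percentile(samples, p):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    return ordered[max(0, int(len(ordered) * p / 100) - 1)]

# Little's Law: throughput = concurrency / average response time.
concurrency = 100
avg = sum(latencies) / len(latencies)
print(f"throughput : {concurrency / avg:6.1f} req/s")
print(f"avg latency: {avg:6.2f} s")
print(f"p50 latency: {percentile(latencies, 50):6.2f} s")
print(f"p99 latency: {percentile(latencies, 99):6.2f} s")
# The throughput figure alone hides the 5% of users waiting 10-25 seconds.
```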
II. Ignoring system environment differences
We often see systems that test well offline but perform poorly online, or that run well in environment A and poorly in environment B. In most such cases there are differences between the environments: different hardware, different configuration parameters, different data scales, different cache hit rates, and so on. Before performance testing, thoroughly analyze the details of the production environment, then simulate them deliberately in the test environment.
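One practical habit is to record the key characteristics of each environment and diff them before trusting any test result. A minimal sketch, with hypothetical values in place of data collected from monitoring and configuration management:

```python
# Hypothetical environment profiles; in practice, collect these rather
# than hard-coding them.
production = {
    "cpu_cores": 32,
    "memory_gb": 64,
    "db_rows": 500_000_000,
    "cache_hit_rate": 0.92,
    "disk": "SAS 15K RAID10",
}
staging = {
    "cpu_cores": 8,
    "memory_gb": 16,
    "db_rows": 1_000_000,      # tiny data set: indexes fit in memory
    "cache_hit_rate": 0.99,    # unrealistically high hit rate
    "disk": "single SATA",
}

# Report every dimension where the test environment differs from production.
for key in production:
    if production[key] != staging[key]:
        print(f"{key}: production={production[key]}, staging={staging[key]}")
```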
III. Useless performance tests
Performance testing is complex and is among the jobs that most test an engineer's all-round skills. It is not just about learning tools such as LoadRunner or JMeter; what matters more is analyzing users and business scenarios, estimating and verifying the system's performance capacity, identifying performance bottlenecks, and resolving them. Proficiency with the tools merely improves efficiency. The reason some people advance the "performance testing is useless" theory is that the production environment is too complex to simulate effectively, so tests fail to reproduce production bottlenecks. That is precisely the hard part of performance testing: how to reproduce performance bottlenecks in a different environment. An ordinary test engineer may only build a test environment from the business logic and report the results; a senior test engineer should understand the system architecture, application logic, business scenarios, data distribution, hardware performance, and so on, and from all of that produce meaningful test scenarios and data.
Data distribution and cache hit rate are the factors most often ignored in performance tests. The data distribution in production can be obtained by sampling online data; where no production data exists, it can only be estimated from the business. For example, how many pending tickets does the average user have in a workflow application? How many review records does a hot product have in an e-commerce application? Data distribution has a significant impact on performance test results.
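Real distributions are usually heavily skewed: a few hot products carry most of the review records. Here is a small sketch of generating skewed test data with NumPy's Zipf sampler; the parameter values are assumptions for illustration, not measurements:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Draw 100,000 review events whose product IDs follow a Zipf distribution:
# a handful of hot products receive most of the reviews.
product_ids = rng.zipf(a=1.5, size=100_000)
product_ids = product_ids[product_ids <= 10_000]   # cap the catalogue size

counts = Counter(product_ids)
print("hottest products (id, reviews):", counts.most_common(5))
print("median reviews per product:", int(np.median(list(counts.values()))))
# Uniformly distributed test data would miss the hot rows that dominate
# real query load, and the test results would flatter the system.
```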
The impact of cache hit rate on performance test results is even more dramatic, easily a factor of 10 or more: the CPU cache in front of memory, memory caching disk data, memcached caching DB data, the browser caching remote data locally. We need to analyze the actual cache hit data carefully, then simulate the worst, typical, and best cases, and finally assess how the cache hit rate affects real-world performance.
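The arithmetic is worth doing explicitly. A sketch of the standard effective-latency calculation, with illustrative (assumed) latency numbers:

```python
# Effective latency = hit_rate * hit_latency + (1 - hit_rate) * miss_latency.
# Illustrative numbers: a memcached hit ~1 ms, a fall-through to the DB ~50 ms.
CACHE_MS, DB_MS = 1.0, 50.0

def effective_latency_ms(hit_rate):
    return hit_rate * CACHE_MS + (1 - hit_rate) * DB_MS

for hit_rate in (0.99, 0.95, 0.80, 0.50, 0.00):
    print(f"hit rate {hit_rate:4.0%} -> {effective_latency_ms(hit_rate):5.1f} ms")
# Dropping from a 99% to an 80% hit rate makes the average request roughly
# 7x slower -- a difference a naively populated test environment easily hides.
```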
IV. Lack of performance quantification
Performance quantification means computing concrete performance figures for system functions and for the main hardware. For a query request, that means all of its overhead: network overhead, application server overhead, and database server overhead, or in more detail, CPU, memory, and I/O overhead. It also means knowing the hardware the system runs on: CPU performance, memory capacity and speed, disk bandwidth and IOPS, network bandwidth and latency, and so on. Without such baseline data it is impossible to quantify performance; you can only run superficial tests and report impressionistic numbers, and you cannot accurately evaluate the system's overall performance and capacity. Without solid quantification data, it is hard to know what a piece of logic costs at each stage, and performance optimization can only proceed by feel or by experience.
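As a minimal illustration, with hypothetical per-component costs, quantification means being able to write down the budget of a single request and derive a capacity ceiling from it:

```python
# Hypothetical cost breakdown for one query, in milliseconds.
overhead_ms = {
    "network round trips": 2.0,
    "app server CPU":      5.0,
    "DB server CPU":       3.0,
    "DB disk I/O":        12.0,
}
total_ms = sum(overhead_ms.values())
for component, ms in overhead_ms.items():
    print(f"{component:<22}{ms:5.1f} ms ({ms / total_ms:4.0%})")
print(f"{'total':<22}{total_ms:5.1f} ms")

# Once the bottleneck is identified (disk I/O here), capacity follows:
# a disk sustaining ~150 IOPS at ~2 I/Os per request caps out near 75 req/s.
disk_iops, ios_per_request = 150, 2
print(f"estimated ceiling: {disk_iops / ios_per_request:.0f} req/s per disk")
```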
V. Hardware costs
In the eyes of many IT people, hardware is expensive and software is cheap or even free: hardware has to be purchased, while software can be open source, written in-house, or even pirated. So when a performance problem appears, the first instinct is to optimize the software. But in an era of rising labor costs, falling hardware prices, and hardware performance and capacity growing along Moore's Law, we should also take hardware upgrades seriously as an optimization approach.
If the bottleneck is the server's network, upgrading the interface speed (for example, from 100 Mbit/s to 1 Gbit/s) or bonding multiple ports can solve the problem quickly. If a single SATA disk's throughput is insufficient, switching to a 15K RPM SAS disk roughly doubles it; if that is still not enough, multiple disks in a RAID array give a nearly linear throughput increase within an order of magnitude. If disk IOPS is the limit, a SATA SSD raises IOPS by more than 10x, and a PCIe SSD by more than 100x. If memory is insufficient, add capacity; at present, 4 GB and 8 GB modules are the most cost-effective. CPU upgrades are generally troublesome: the CPU architecture constrains what fits, CPUs evolve quickly, and the cost is high, so there is little to be done; for an old server with insufficient CPU, the usual choice is simply to retire it and buy a new one.
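A rough sketch of this back-of-envelope comparison, using ballpark figures typical of such hardware rather than measured benchmarks:

```python
# Ballpark figures for the options above (assumptions, not benchmarks).
baseline = {"mb_per_s": 60, "iops": 100}             # single SATA disk

options = {
    "SAS 15K disk":      {"mb_per_s": 120,   "iops": 180},
    "8-disk SAS RAID10": {"mb_per_s": 450,   "iops": 700},
    "SATA SSD":          {"mb_per_s": 250,   "iops": 2_000},
    "PCIe SSD":          {"mb_per_s": 1_000, "iops": 12_000},
}

# Show the multiplier each upgrade buys over the single-disk baseline.
for name, spec in options.items():
    print(f"{name:<18} throughput x{spec['mb_per_s'] / baseline['mb_per_s']:5.1f}"
          f"   IOPS x{spec['iops'] / baseline['iops']:6.1f}")
```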
Upgrading hardware can solve a performance problem quickly and buys predictable capacity at a known price. However, top-of-the-line or brand-new hardware is expensive, and the newest hardware often carries unknown bugs, so upgrades should generally avoid architectures less than a year old; hardware that has matured on the market for two years or more is usually the better choice. Hardware upgrades also have limits, and high-end hardware is often less cost-effective, so after an upgrade resolves the immediate problem, you still need to analyze what will drive hardware costs as the business grows. Choosing between software optimization and hardware optimization is a cost-balancing decision, and sometimes the software also needs to be tuned specifically for the hardware.
VI. The all-powerful cache
Caching is a good thing; perhaps 90% of system architecture optimization revolves around using caches well. Caches really are everywhere. On the hardware side: hard disk cache, RAID card cache, storage-array cache, main memory, NUMA locality, and the CPU's L1/L2/L3 caches. On the software side: global data caches, private data caches, connection pools, application server caches, web server caches, CDN caches, client file caches, and client memory caches. Almost every large system carries many caches; without them, it would take an enormous hardware investment to reach the same performance.

Hardware caches are usually smart enough that in 99% of cases we do not need to touch their configuration, and even when we do, the gain is usually small, unless the software has obvious defects and you deeply understand both the hardware and software characteristics. Software cache layers bring performance gains but also real costs: architectural complexity, data synchronization, data staleness, higher maintenance effort, and harder debugging. Any cache layer in a software architecture therefore needs thorough analysis to decide whether it is worth having. In my view, adding a cache layer should yield at least a 5x improvement; otherwise its cost is hard to justify. For small and medium systems I do not recommend a complex cache architecture, because letting the system evolve faster matters more than squeezing out performance, and a complicated cache architecture usually demands more labor.
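The following sketch, with a plain dict standing in for memcached, shows the simplest read-through cache and, in its comments, exactly the cost warned about above: every write path must now remember to invalidate, and a missed invalidation silently serves stale data:

```python
import time

db = {"user:1": {"name": "Alice", "balance": 100}}   # stand-in for the real DB
cache = {}                                           # stand-in for memcached
TTL_SECONDS = 60

def get_user(key):
    """Read-through: serve from cache, fall back to the DB and repopulate."""
    entry = cache.get(key)
    if entry and time.time() - entry["at"] < TTL_SECONDS:
        return entry["value"]                        # cache hit
    value = db[key]                                  # cache miss: hit the DB
    cache[key] = {"value": value, "at": time.time()}
    return value

def update_balance(key, new_balance):
    db[key] = {**db[key], "balance": new_balance}
    # The hidden cost: every write path must invalidate, or readers see stale
    # data until the TTL expires. Forgetting this line is the classic
    # cache-consistency bug that makes cache layers expensive to maintain.
    cache.pop(key, None)

print(get_user("user:1"))             # miss -> reads from the DB
update_balance("user:1", 50)
print(get_user("user:1"))             # correct only because we invalidated
```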
VII. Customer first
I don't know who first proposed "customer first", but the phrase cannot be taken literally, or it will lead us astray. First, who are our customers? Who are the direct customers, who are the end customers, and who actually pays us? When one type of customer conflicts with another, which do we put first? How do we handle conflicts between our most important customers and the majority of our customers? These questions seem unrelated to performance optimization, yet they shape every aspect of system design. Should our work make the system feel more complex or harder to use for most users, just to optimize the workflow or performance of a few individual customers?
VIII. Over-optimization
We all know performance optimization matters, so many of us are eager to optimize, and that sometimes tips into the mistake of over-optimization. Some people boast that their optimizations delivered 10x, 100x, even 1000x improvements, but sometimes we simply do not need them; often they are more an exercise in the optimizer's own skills. When optimizing, we should ask ourselves more questions:
Does your optimization make the system more complex and raise its future maintenance cost?
Is your optimization effective for most business scenarios, or does it speed up a small set of scenarios while making the main business paths slower or more complex?
Is your optimization actually perceptible in the customer's experience?
Is your optimization sustainably effective, or does it break as soon as the business logic changes slightly?
......
I believe system optimization must have a goal beyond raw performance, because optimization always adds some complexity to the system. Once performance meets requirements, attention should turn to whether the change makes the architecture more scalable, and whether it helps or hurts stability, maintainability, and similar concerns.
Ye Zhengsheng
2012-03-20
My Sina Weibo: http://weibo.com/yzsind