Summary of common performance optimization strategies


Note added by Zhj: Personally, I feel performance optimization comes down to analyzing which factors affect performance and ranking them by the size of their impact.

Then, for each factor, analyze further why it affects performance, identify the underlying causes, and rank those by impact as well.

After these two layers of analysis, it is basically enough; then consider solutions for those factors.

1. Database tier

Our goal is to reduce IO, or to balance IO across multiple servers and compute in parallel.

1.1 Database data lives on the hard disk, and disk access is far slower than memory access, i.e. it is IO-bound

1.2 A large data volume means more records are scanned, which indirectly causes more IO

1.3 All database accesses are concentrated on a single database server, which bears a heavy load

2. Application Layer

Load Balancing

3. Front End

Original: http://tech.meituan.com/performance_tunning.html

To begin this article, I would like to thank one of the reviewers in the job-level evaluation process, who suggested refining and summarizing the various performance optimization cases and solutions done before, setting them down in document form, and sharing them internally, striving for the following effects:

1. Form a set of practical, reusable performance optimization options and the considerations for choosing among them, paired with concrete real cases, so that others who encounter similar problems do not have to start from scratch.

2. Broaden horizons: beyond performance optimization itself, provide general-purpose ideas and selection considerations for technology choices, helping readers develop awareness, a way of thinking, and the ability to make various trade-offs during technical selection.

When shared internally, the article resonated strongly and won the recognition and praise of many colleagues and friends, who felt it gave good guidance for daily work. Since these experiences may also help industry peers, it is now published on the Meituan-Dianping technical team blog.

Common performance optimization strategies, by category

Code

The reason code comes first is that it is the aspect most easily overlooked by technical staff. After receiving a performance optimization request, many engineers immediately talk about caching, asynchronization, JVM tuning, and so on. In fact, the first step should be to analyze the relevant code, find the corresponding bottleneck, and only then consider specific optimization strategies. Some performance problems exist purely because the code is badly written and can be solved by fixing the code directly: a for loop that iterates too many times, a pile of meaningless conditional checks, the same logic repeated several times, and so on.
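
To make this concrete, here is a minimal, made-up illustration in Java (the names are hypothetical; loadAllRules stands in for any expensive, loop-invariant call):

    import java.util.List;
    import java.util.Map;

    public class LoopInvariantExample {

        // imagine a DB or RPC call here: expensive, and its result does not change per item
        static Map<String, Integer> loadAllRules() {
            return Map.of("a", 1, "b", 2);
        }

        // Before: reloads the rules once per item (N expensive calls).
        static int totalSlow(List<String> items) {
            int total = 0;
            for (String item : items) {
                Map<String, Integer> rules = loadAllRules(); // invariant recomputed inside the loop
                total += rules.getOrDefault(item, 0);
            }
            return total;
        }

        // After: hoist the invariant call out of the loop (one expensive call).
        static int totalFast(List<String> items) {
            Map<String, Integer> rules = loadAllRules();
            int total = 0;
            for (String item : items) {
                total += rules.getOrDefault(item, 0);
            }
            return total;
        }

        public static void main(String[] args) {
            List<String> items = List.of("a", "b", "a");
            System.out.println(totalSlow(items) + " == " + totalFast(items)); // 4 == 4
        }
    }

The behavior is identical, but the expensive call count drops from N to 1; it is profiling the code first that reveals such a call, rather than the cache or the JVM, to be the bottleneck.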

Database

The tuning of the database is generally divided into the following three parts:

SQL tuning

This is the most common kind of tuning, and every engineer should master basic SQL tuning skills (including methods, tools, auxiliary systems, and so on). Taking MySQL as an example, the usual approach is to locate the problematic SQL through the slow query log or an open-source slow query analysis system, then tune it step by step with tools such as EXPLAIN and PROFILE, and finally verify the effect through testing before releasing it online. For details, refer to "MySQL indexing principles and slow query optimization".
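
As a minimal sketch of that workflow, assuming the MySQL JDBC driver is on the classpath and a hypothetical orders table (the connection details are placeholders), this runs EXPLAIN on a suspect query and prints the access type, estimated scanned rows, and chosen index, using MySQL's traditional EXPLAIN column names:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExplainExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/test", "user", "pass");
                 Statement stmt = conn.createStatement();
                 // prefix the suspect query with EXPLAIN to see how MySQL plans to execute it
                 ResultSet rs = stmt.executeQuery(
                     "EXPLAIN SELECT * FROM orders WHERE customer_id = 42")) {
                while (rs.next()) {
                    // type=ALL plus a large rows estimate usually means a full scan: add or fix an index
                    System.out.println("type=" + rs.getString("type")
                        + " rows=" + rs.getLong("rows")
                        + " key=" + rs.getString("key"));
                }
            }
        }
    }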

Architecture-level tuning

This type of tuning includes read/write separation, load balancing across multiple slaves, horizontal and vertical sharding of databases and tables, and so on. It generally requires large changes, is done far less frequently than SQL tuning, and usually needs a DBA involved. So when are these measures needed? We can use an internal monitoring and alerting system (such as Zabbix) to regularly track whether certain metrics are approaching a bottleneck; once a metric reaches its bottleneck or alert value, these measures need to be considered. Typically, DBAs also monitor these metric values regularly.

Connection Pooling Tuning

To access database connections efficiently and to throttle database connections, applications usually adopt a connection pool, with each application node managing a pool of connections to each database. As business traffic or data volume grows, the original connection pool parameters may no longer meet the requirements. At that point, you need to make an overall judgment by combining how the current pool works in principle, the pool's concrete monitoring data, and the current business volume, and arrive at the final parameters through several rounds of tuning.
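
The article does not name a specific pool implementation; as one sketch, here is how the parameters above map onto HikariCP (a substitution of mine, not the article's choice; the URL, credentials, and all numbers are placeholders to be derived from your own monitoring data):

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    public class PoolExample {
        public static void main(String[] args) {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:mysql://localhost:3306/test"); // placeholder
            config.setUsername("user");
            config.setPassword("pass");
            // the knobs you iterate on while watching the pool's monitoring data:
            config.setMaximumPoolSize(20);      // the upper bound doubles as a flow limiter
            config.setMinimumIdle(5);           // warm connections kept ready
            config.setConnectionTimeout(3_000); // ms to wait for a connection before failing
            try (HikariDataSource ds = new HikariDataSource(config)) {
                // hand 'ds' to the DAO layer / ORM
            }
        }
    }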

Cache

Classification

Local caches (HashMap/ConcurrentHashMap, Ehcache, Guava Cache, etc.) and cache services (Redis/Tair/Memcache, etc.).

Usage Scenarios

Under what conditions is caching appropriate? Consider the following two scenarios:

    • The same data is queried repeatedly within a short time and updated infrequently. In this case, query the cache first, and on a miss load from the database and write the result back to the cache. This scenario is well suited to single-machine caching.
    • Hotspot data is queried with high concurrency and the backend database cannot withstand the pressure; a cache can be used to carry the load.
Selection considerations
    • If the data volume is small and does not grow and get cleared frequently (which would cause frequent garbage collection), choose a local cache. Specifically, if you need some policy support (such as an eviction strategy when the cache is full), consider Ehcache; if not, a HashMap is enough; if you must handle multi-threaded concurrency, consider ConcurrentHashMap. (A local-cache sketch follows this list.)
    • In other cases, consider a cache service. At present, weighing resource investment, operability, dynamic scalability, and supporting facilities, we give priority to Tair. Only when Tair does not yet support something we need (e.g. distributed locks, hash-typed values) do we consider Redis.
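
A minimal local-cache sketch using Guava Cache, which combines the size-bounded eviction and per-entry TTL discussed here (the capacity and TTL values are placeholders):

    import java.util.concurrent.TimeUnit;
    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    public class LocalCacheExample {
        public static void main(String[] args) {
            Cache<String, String> cache = CacheBuilder.newBuilder()
                    .maximumSize(10_000)                   // eviction policy once the cache fills up
                    .expireAfterWrite(5, TimeUnit.MINUTES) // TTL per entry
                    .build();
            cache.put("poi:1", "some-value");
            String v = cache.getIfPresent("poi:1"); // null on a miss: load from the DB and put it back
            System.out.println(v);
        }
    }
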
What are the key design points when updating the cache? How do we ensure updates are reliable and timely?

The cache update policy must be analyzed case by case. Here, the cached data of store POIs is used as an example to illustrate an update strategy for the cache-service type. There are currently about 100,000 POI records cached in Tair, updated in two specific ways:

    • Receive store-change messages and update the cache in real time.
    • Set a 5-minute expiration time on each POI cache entry; after expiry, the data is loaded from the DB and written back to the cache. This strategy is a powerful complement to the first one: it covers failures of the first strategy caused by manual DB changes that emit no message, temporary errors in the message updater, and so on. Through this double-insurance mechanism, the reliability and timeliness of the POI cache data are effectively ensured. (A sketch of the second, read-through leg follows.)
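
A minimal sketch of that read-through leg, using the Jedis client purely for illustration (the production system uses Tair; loadPoiFromDb is a hypothetical DB loader):

    import redis.clients.jedis.Jedis;

    public class PoiCacheExample {
        static final int EXPIRE_SECS = 5 * 60; // the 5-minute expiry per POI entry

        // hypothetical DB loader
        static String loadPoiFromDb(String poiId) { return "{...poi json...}"; }

        static String getPoi(Jedis redis, String poiId) {
            String key = "poi:" + poiId;
            String value = redis.get(key);
            if (value == null) {                      // expired, or never cached
                value = loadPoiFromDb(poiId);
                redis.setex(key, EXPIRE_SECS, value); // write back with the TTL
            }
            return value;
        }
    }
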
Will the cache fill up, and what happens when it does?

For a cache service, capacity is limited, so in theory, as cached data keeps growing, the cache is bound to fill up one day. How to respond?
① For the cache service, choose an appropriate eviction algorithm, such as the most common one, LRU.
② For the configured capacity, set an appropriate alert threshold; for example, for a 10G cache, start alerting when cached data reaches 8G, so problems can be investigated or capacity expanded early.
③ For keys that do not need to be kept long-term, set expiration times wherever possible.

Is cached data allowed to be lost? What if it is lost?

Whether loss is acceptable depends on the business scenario. If it is not, a cache service with persistence is required, such as Redis or Tair. In more detail, you can choose a specific persistence strategy, such as Redis's RDB or AOF, based on how much data loss the business can tolerate.

Cache is "Breakdown"Problem

For keys with an expiration time set, if they are accessed with extremely high concurrency at certain points in time, they are very "hot" data. Here another issue must be considered: the cache being "broken down".

  • Concept: at the moment a key expires, a large number of concurrent requests for that key arrive; seeing the cache miss, they all typically load the data from the backend DB and write it back to the cache, and this burst of concurrent requests can instantly overwhelm the backend DB.
  • How to solve it: the common industry practice is to use a mutex. Simply put, when the cache misses (the value is empty), instead of loading from the DB immediately, first use one of the cache tool's operations with a success return value (such as Redis's SETNX or Memcache's ADD) to set a mutex key. Only when that operation succeeds do we load from the DB and reset the cache; otherwise, the whole get-cache method is retried. Code similar to the following:

      public String get(String key) {
          String value = redis.get(key);
          if (value == null) { // the cached value has expired
              // set a 3-minute timeout on the mutex key, so that if the del below fails,
              // the next cache expiry can still load from the DB
              if (redis.setnx(key_mutex, 1, 3 * 60) == 1) { // set succeeded: this thread holds the mutex
                  value = db.get(key);
                  redis.set(key, value, expire_secs);
                  redis.del(key_mutex);
              } else {
                  // other threads have already loaded from the DB and refilled the cache;
                  // sleep briefly, then retry getting the cached value
                  sleep(50);
                  return get(key); // retry
              }
          }
          return value;
      }
Asynchronous

Usage scenarios

For some client requests, the server may need to do follow-up things connected to those requests that the user does not care about, or whose results the user does not need immediately. Such things are well suited to asynchronous handling.

Benefits
    • Shorten the interface response time so that the user's request returns quickly and the user experience is better.
    • Avoid threads running for long periods, which would leave the thread pool short of available threads for long stretches, in turn growing the thread pool's task queue, blocking more request tasks, and leaving more requests unable to be handled in time.
    • Long-running threads may also cause a series of problems such as high system load and CPU usage, degraded overall machine performance, and even an avalanche. The asynchronous approach can effectively solve this problem without adding machines or CPUs.
Common practices

One approach is to spawn additional threads: a single extra thread or a thread pool can take the corresponding task off the IO thread (the one handling the request and response), letting the response return first on the IO thread.
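
A minimal sketch of this approach (the follow-up task body is a placeholder):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncExample {
        // a small pool dedicated to follow-up work, separate from the request (IO) threads
        private static final ExecutorService FOLLOW_UP = Executors.newFixedThreadPool(4);

        static String handleRequest(String orderId) {
            FOLLOW_UP.submit(() -> {
                // things the user does not need to wait for: notifications, statistics, ...
                System.out.println("post-processing order " + orderId);
            });
            return "ok"; // the response returns immediately on the request thread
        }

        public static void main(String[] args) {
            System.out.println(handleRequest("42"));
            FOLLOW_UP.shutdown();
        }
    }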

If the asynchronous task involves a very large amount of data, a blocking queue (BlockingQueue) can be introduced for further optimization. The practice is to let a group of asynchronous threads keep throwing data into the blocking queue, and then add a processing thread that repeatedly takes batches of a preset size from the queue and processes them in bulk (for example, sending one bulk remote service request), which further improves performance.
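
A minimal sketch of the queue-plus-batching idea (the queue capacity and batch size are placeholders):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class BatchingExample {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);
            final int BATCH = 100;

            // the added processing thread: drain up to BATCH items at a time, process them together
            Thread consumer = new Thread(() -> {
                List<String> buffer = new ArrayList<>(BATCH);
                try {
                    while (true) {
                        buffer.add(queue.take());         // block until at least one item arrives
                        queue.drainTo(buffer, BATCH - 1); // then grab whatever else is ready
                        // e.g. send one bulk remote request instead of buffer.size() requests
                        System.out.println("processing a batch of " + buffer.size());
                        buffer.clear();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumer.setDaemon(true);
            consumer.start();

            // the asynchronous threads keep throwing data into the blocking queue
            for (int i = 0; i < 500; i++) queue.put("item-" + i);
            Thread.sleep(200); // give the consumer a moment before the demo exits
        }
    }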

Another approach is to use message queue (MQ) middleware, since MQ is inherently asynchronous. Some follow-up tasks may not need to be handled by our own system at all, but by other systems. In that case, the task can be encapsulated as a message and thrown into the message queue; the reliability of the message middleware guarantees delivery to the systems that care about it, and those systems then do the corresponding processing.

For example, when a C-side user completes placing an order, other sides may need to do a series of things as a result, but the results of those things will not immediately affect the C-side user. Then the response to the C-side order request can be returned to the user first, and a message sent to MQ afterwards. Moreover, those tasks should not be the C side's responsibility anyway, so solving this with MQ is the most appropriate choice here. (A sketch follows.)
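
A sketch against a deliberately abstract producer (MessageProducer and the topic name are hypothetical, standing in for whatever MQ client is actually in use):

    public class OrderPlacedExample {
        // hypothetical abstraction over the real MQ client
        interface MessageProducer {
            void send(String topic, String payload);
        }

        static String placeOrder(MessageProducer mq, String orderId) {
            // ... persist the order itself, the only thing the C side is responsible for ...
            // hand the follow-up work to the systems that care, then return at once
            mq.send("order.placed", "{\"orderId\":\"" + orderId + "\"}");
            return "ok";
        }
    }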

NoSQL

Difference from caching

First of all, note that this section differs from the cache section: although the same data storage products may be used (such as Redis or Tair), the way they are used is different. This section describes using them as a DB. When used as a DB, the availability and reliability of the data store must be effectively guaranteed.

Usage Scenarios

This must be judged against the specific business scenario: whether the data involved is suitable for NoSQL storage, whether the way the data is manipulated fits the NoSQL style, and whether any extra NoSQL features are needed (such as atomic increment and decrement).

If the business data does not need to be joined with other data, requires no support such as transactions or foreign keys, and may be written exceptionally frequently, then it is well suited to NoSQL (such as HBase).

For example, Meituan-Dianping has an internal system for monitoring exceptions. If an application system fails seriously, it may generate a large amount of exception data in a short time. Using MySQL here would cause a momentary spike in MySQL write pressure, and problems such as rapidly deteriorating MySQL server performance and master-slave replication lag would be likely to follow; such a scenario is better suited to HBase-like NoSQL storage. (A write sketch follows.)
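
A minimal sketch of such a write-heavy insert path with the standard HBase client (the table name, column family, and row-key scheme are placeholders of mine, not details from the article):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ExceptionSinkExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("exception_log"))) {
                // app name plus reversed timestamp: a common row-key pattern for time-series writes
                Put put = new Put(Bytes.toBytes(
                        "my-app:" + (Long.MAX_VALUE - System.currentTimeMillis())));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("stacktrace"), Bytes.toBytes("..."));
                table.put(put); // no joins, no transactions: just a high-throughput append
            }
        }
    }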

JVM tuning

When to tune?

Through a monitoring system (if there is no ready-made one, build a simple reporting and monitoring system), watch and alert on key machine metrics (GC time, GC count, memory size changes of each generation, machine load and CPU usage, JVM thread count, etc.). Combined with the output of the GC log and commands such as jstat, plus the performance data and latency experienced on key interfaces of the service running in the JVM process, you can basically determine whether the current JVM has a problem and whether tuning is needed.

What to tune?
    1. If you find that CPU usage and load are high at peak times, observe the JVM thread count and the GC count (likely mostly the young GC count). If both values are larger than before (compare against a historical baseline), the problem can basically be identified as an excessively high young GC frequency, which can be addressed by appropriately increasing the size or proportion of the young generation.
    2. If you find that key interface response times are slow, look at the GC time together with the stop-the-world pauses in the GC log to see whether the application's total pause time is excessive. If so, you may need to reduce total GC time; concretely there are two dimensions, reducing the number of GCs and reducing the time of a single GC, and the two generally trade off against each other. You need to adjust the corresponding parameters according to actual monitoring data (such as the ratio of the young generation to the old generation, the Eden-to-survivor ratio, the MaxTenuringThreshold value, the old-generation occupancy threshold that triggers a CMS collection, etc.) to reach an optimum.
    3. If full GCs or old-generation CMS GCs are very frequent, this usually lengthens stop-the-world pauses, which also slows interface response times. In this case the odds are there is a "memory leak". A memory leak in Java means objects that should be released are not released (references to them are still held). So how are these objects produced? Why are they not released? Is the corresponding code problematic? The crux is to understand this, find the relevant code, and then fix it. So the key is to turn the problem into finding those objects. How? By using jmap and MAT, you can basically pinpoint the specific code.
Multithreading and distribution

Usage scenarios

Offline tasks, asynchronous tasks, big-data tasks, and long-running tasks can all, when handled appropriately this way, achieve an acceleration effect.

Note: for online services with high response-time requirements, use multithreading sparingly, especially where the service thread must wait for task threads (many serious incidents are related to this). If you must use it, set a maximum waiting time for the service thread.

Common practices

If a single machine's processing power can meet the actual business need, then use single-machine multithreading as much as possible to reduce complexity; otherwise, a multi-machine, multithreaded approach is required.

For single-machine multithreading, a thread pool mechanism can be introduced, which serves two purposes:

    • Improving performance by saving the overhead of thread creation and destruction
    • Rate limiting: give the thread pool a fixed capacity; after that capacity is reached, incoming tasks enter a queue, which guarantees the machine stays stable under peak pressure. When using the JDK's own thread pool, you must carefully understand the meaning of each constructor parameter, such as core pool size, max pool size, keepAliveTime, and the worker queue, and optimize by continuously testing and adjusting these parameter values on the basis of that understanding. (A sketch follows this list.)
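
A minimal sketch of those constructor parameters with the JDK's ThreadPoolExecutor (every number here is a placeholder to be tuned through testing, not a recommendation):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class PoolParamsExample {
        public static void main(String[] args) {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    4,                                  // core pool size
                    8,                                  // max pool size
                    60, TimeUnit.SECONDS,               // keepAliveTime for threads above core size
                    new ArrayBlockingQueue<>(1_000),    // bounded worker queue: this is the flow limit
                    new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when the queue is full

            for (int i = 0; i < 10; i++) {
                final int n = i;
                pool.execute(() ->
                        System.out.println("task " + n + " on " + Thread.currentThread().getName()));
            }
            pool.shutdown();
        }
    }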

If a single machine's processing power cannot meet the demand, a multi-machine, multithreaded approach is needed, which requires some knowledge of distributed systems. First, introduce a separate node as the scheduler, with the other machine nodes acting as executor nodes. The scheduler is responsible for splitting tasks and distributing them to suitable executor nodes; the executor nodes perform the tasks in a multithreaded way (or possibly single-threaded). At this point, the whole task system turns from a single machine into a cluster, where different machine nodes play different roles, each with its own function, and the nodes interact with one another. Besides multithreading and thread pools, mechanisms such as RPC and heartbeat network communication are also indispensable here. Later I will publish a simple distributed scheduling framework as a follow-up.

Measurement system (monitoring, alerting, service dependency management)

Strictly speaking, the measurement system is not a category of performance optimization, but it is closely related to it: it provides strong data reference and support for optimization work. Without a measurement system, there is basically no way to locate problems in the system, nor to effectively measure the effect of an optimization. Many people do not value this aspect, but I consider it the cornerstone of system stability and performance.

Key processes

If you were to design such a system, which key processes generally need to be designed?
① Determining the metrics
② Collecting the data
③ Computing over the data and storing the results
④ Presentation and analysis

Which metric data needs monitoring and alerting? Which needs attention?

Based on the requirements, two kinds of metrics are mainly needed:

    1. Interface performance metrics, including QPS, response time, and call volume for individual interfaces and for all of them (the finer-grained the statistical time dimension, the better; ideally the data can be viewed both per node and per service cluster). This also touches the management of service dependencies, which calls for a service dependency management system.
    2. Metrics of individual machine nodes, including CPU usage, load, memory usage, NIC traffic, and so on. If a node runs a special type of service (such as MySQL, Redis, or Tair), key metrics specific to that service can also be monitored.
Data acquisition method

Asynchronous reporting is usually adopted, in two concrete forms: first, send the data to a local flume port and let the flume process collect it into a remote Hadoop or Storm cluster for computation; second, compute locally first, then send the results to the monitoring server using an asynchronous sender and a local queue.

Data calculation

You can use offline computation (MapReduce/Hive) or real-time/near-real-time computation (Storm/Spark), storing the results in MySQL or HBase; in some cases, you can skip computation and send the collected data straight to the monitoring server.

Presentation and Analysis

Provide a unified presentation and analysis platform, which needs reports (lists/charts) plus monitoring and alerting functions.

Real case analysis

Case one: merchant and control-area relationship refresh job

Background

This is a job that runs hourly to refresh the relationship between merchants and control areas. The concrete rule is: check whether a merchant's delivery ranges (there can be several) intersect a control area; if there is an intersection, the merchant is assigned to that control area.

Business requirements

The shorter this process, the better; ideally it should stay within 20 minutes.

Optimization process

The main processing flow of the original code is:

    1. Fetch the delivery-range list of all stores and the list of all control areas.
    2. Traverse the control-area list; for each control area:
      A. Traverse the merchants' delivery ranges to find the set of ranges that intersect this control area.
      B. Traverse the delivery ranges found above, de-duplicate the merchant IDs, and save them into a set.
      C. Batch-fetch the corresponding merchants by the merchant ID set above.
      D. Traverse the merchant set and handle each merchant object (deciding, based on conditions such as whether it is a popular merchant, self-operated, supports online payment, etc., whether the merchant-to-control-area relationship needs to be inserted or updated).
      E. Delete the merchant relationships the control area currently holds that should no longer exist.

Analyzing the code showed that steps A and B of step 2, which find the delivery ranges intersecting a control area and de-duplicate the merchant IDs, could be optimized with an R-tree spatial index. The specific approach is:

    • At the start of the task, update the R-tree, then use the R-tree's structure and matching algorithm to get the list of delivery-range IDs that intersect the control area.
    • Then batch-fetch the delivery ranges by that ID list.
    • Then, for this batch of delivery ranges (now very few), apply the original polygon-intersection matching to filter further, and de-duplicate the merchant IDs of the filtered results. (An R-tree sketch follows this list.)
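
The article does not say which R-tree implementation was used; as one sketch, the STRtree from the JTS Topology Suite supports exactly this build-once, query-many pattern (the coordinates below are made up):

    import java.util.List;
    import org.locationtech.jts.geom.Envelope;
    import org.locationtech.jts.index.strtree.STRtree;

    public class RTreeExample {
        public static void main(String[] args) {
            STRtree index = new STRtree();
            // step 1: rebuild the index from the current delivery ranges (as bounding boxes)
            index.insert(new Envelope(0, 10, 0, 10), "range-1");
            index.insert(new Envelope(20, 30, 20, 30), "range-2");

            // step 2: coarse filter, i.e. candidates whose boxes intersect the control area
            Envelope controlArea = new Envelope(5, 8, 5, 8);
            @SuppressWarnings("unchecked")
            List<Object> candidates = index.query(controlArea);

            // step 3 (not shown): exact polygon intersection on the small candidate set,
            // then de-duplicate the merchant IDs
            System.out.println(candidates); // [range-1]
        }
    }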

This first phase of optimization went live, and the whole process dropped from over 40 minutes to under 20 minutes.

After the first-phase switch to the R-tree, the job ran for a while, and as data volume grew, performance began to degrade again, deteriorating to over 50 minutes after a month. So we went further into code analysis, found two more optimization points, and scheduled the second phase of optimization and release.

These two optimization points are:

    • In step C of step 2, merchants were originally batch-fetched from the DB by the store ID list; this can now be changed to a batch MGET from the cache (by this time the merchant data had been cached), as sketched after this list.
    • In step D of step 2, the judgment of whether the merchant-to-control-area relationship needs inserting or updating is made on conditions such as whether it is a popular merchant, self-operated, or supports online payment.
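
A minimal sketch of that batch read (Jedis's MGET shown for illustration; the production system uses Tair, and the host, port, and keys are placeholders):

    import java.util.List;
    import redis.clients.jedis.Jedis;

    public class MgetExample {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                // one round trip for the whole merchant-ID batch instead of one DB query per ID
                List<String> merchants = jedis.mget("poi:101", "poi:102", "poi:103");
                // entries are null for missing keys; fall back to the DB only for those
                System.out.println(merchants);
            }
        }
    }
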
Effect after going live

Observing the logs, execution time dropped from over 50 minutes to under 15 minutes. Below is an excerpt of that day's logs from the 4 machines (times in milliseconds):

As you can see, the effect is still very obvious.

Case two: POI cache design and implementation

Background

In Q4 2014, database reads of POI-related data (here a POI can be simply understood as a takeaway store) rose sharply. Although adding slave nodes could solve part of the problem, adding nodes eventually hits a limit, and at that limit master-slave replication becomes a bottleneck and may produce inconsistent data. Therefore, a new technical solution was urgently needed to share the database's load and reduce the read traffic for POI-related data. Besides, any scheme of simply adding DB slaves causes a certain waste of resources.

Implementation scheme

Based on proven technical solutions, I chose Tair as the cache storage scheme to help the DB carry the read traffic of POI data from each application side. The reasons were a combination of availability, high performance, scalability, whether it had withstood the test of large-scale data and high-concurrency traffic online, the availability of a professional operations team, and mature tooling.

Detailed design

First edition design

For the cache update policy, based on the characteristics of the business, the existing technical solutions, and the implementation cost, we chose to receive POI change messages via MQ to trigger cache updates. But this process can fail, so the key expiration policy is enabled as well: the caller first checks whether the entry has expired and, if so, loads the data from the backend DB, writes it back to the cache, and then returns. The availability of the cached data is ensured by these two mechanisms together.

Second Edition design

After the first edition ran for a while, we found two problems:

    1. In some cases real-time data consistency cannot be guaranteed (for example, a technician manually alters DB data, or the MQ-based cache update fails), and one can only wait for the 5-minute expiration, which some businesses cannot accept.
    2. Adding the expiration time causes another problem: at the moment of a cache miss, Tair tries to load the data from its hard disk, and if it is not on disk either, it then loads from the DB. This undoubtedly lengthens Tair's response time further, which not only raises the service's timeout ratio but also degrades Tair's own performance.

To solve these problems, we learned from the colleagues responsible for infrastructure that Databus could solve the problem of inconsistent cache data in these situations, and would let us remove the expiration mechanism, thereby improving query efficiency and avoiding Tair's disk lookups on memory misses. And to prevent a Databus single point of failure from affecting our business, we kept the previous MQ message-based cache update scheme, adding a switch so we can fail over to it for fault tolerance. The overall structure is as follows:

Effect after going live

After launch, continuous monitoring of the data showed that as the number of calls grew, the traffic reaching the DB dropped significantly, greatly reducing DB pressure. The response times of these data interfaces also dropped significantly. The double-guarantee mechanism for cache updates has also basically ensured the availability of the cached data. See the figure:

Case three: performance optimization of business operations back-office pages

Background

With the rapid development of the business, visits and data volume increased rapidly, and through our monitoring systems we found that the performance of some pages in the system was starting to deteriorate. Feedback from users also confirmed this. At this point, it was necessary to schedule quickly and develop with agility to tune these pages.

Welcome page
    • Requirement background: the welcome page is the home page through which ground-promotion staff and even headquarters staff of various roles enter the back office. It shows the core data people most want to see and care about most, so its importance is self-evident, and deteriorating performance on this page seriously hurts the user experience. Therefore, the first thing to optimize was the welcome page. Through positioning and analysis, two main causes of the deterioration were found: the data interface layer and the computation/presentation layer.
    • Solution: treat both symptom and cause, and divide and conquer. After careful investigation, analysis, and positioning, the data interface layer was effectively optimized by batching interface calls and using asynchronous RPC calls; for the computation/presentation layer, we decided to pre-compute and then cache the computed results to improve query speed. For the cache, Redis was chosen according to the business scenario and technical characteristics. Once the plan was set, it was rapidly developed and released.
    • Effect after going live: the before/after performance comparison chart follows:
Organization structure page
    • Requirement background: the organization structure page uses a four-level tree diagram loaded all at once, and the first release turned out to perform very poorly. Users were eager to have this page's performance tuned.
    • Solution: after analyzing the code, we located a rather classic problem: too many SQL queries, each fetching a small amount of data. The fix was to merge the multiple SQL statements into one large SQL statement and then use a local cache to cache the data, reasonably estimating the data volume and performance, and going live after thorough testing.
    • Effect after going live: the before/after performance comparison chart follows:
Order-related building page
    • Requirement background: as order volume grew, the order table accumulated more and more data, and the performance of the order-related building page kept deteriorating (response time rising linearly). This page is tightly tied to the ground-promotion staff's performance, so they use it very frequently, and the deterioration greatly hurt their user experience.
    • Solution: after analysis and design, we decided to use the existing order secondary-index monthly table in place of the original order table for front-end query requests, and to restrict the filter's time conditions so that the start and end times cannot span months (communicated with users beforehand: acceptable, and it meets their basic needs). This way, only one month's index table is involved, and the performance tuning is achieved through an appropriate functional restriction. Concretely, the set of order IDs for the final page is obtained from the secondary-index monthly table according to the various query conditions, and then the corresponding order data is fetched from the order database by those order IDs.
    • Effect after going live: with call volume almost unchanged, the performance improvement was obvious, as shown:
Other

In addition to the above, optimization work also involves the front end, distributed file systems, CDNs, full-text indexes, spatial indexes, and so on. Space is limited, so we will leave those for future introductions.
