A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
For high-performance database operation, it is not enough to design the optimal library table structure, and to establish the best index, but also need reasonable design query. If the query is poorly written, high performance cannot be achieved even if the library table structure is reasonable and the index is appropriate. Query optimization, index optimization, library table structure optimization need to go hand in hand, one does not fall.
6.1 Why query speed is slow
In general, the life cycle of a query can be roughly in order: from client >> server >> parse on server >> generate execution Plan >> execute >> return results to client. Execution can be thought of as the most important phase of the entire life cycle, including a large number of calls to retrieve data to the storage engine, as well as post-invocation data processing, including sorting, grouping, and so on. Understanding the life cycle of a query and the time consuming of a clear query can make a lot of sense for optimizing queries.
6.2 Optimizing data access
The most basic reason for poor query performance is that there is too much data to access. Most poorly performing queries can be optimized by reducing the amount of data accessed.
1. Verify that the application is retrieving a large amount of data than is needed. This usually means that too many rows have been accessed, but sometimes it is possible to access too many columns.
2. Verify that the MySQL server layer is analyzing a large number of data rows that are more than needed.
6.2.1 If unwanted data was requested from the database
Requesting extra data can add additional burden to the MySQL server, increase network overhead, and also consume the CPU memory and resources of the application server. Here are some typical cases:
1, the query does not need the record: for example, in the news site to remove 100 records, but only on the page display 10. In fact, MySQL will query out all the results crow, the client's application will receive all the result set data, and then discard most of the data. The simplest and most effective solution is to add a limit after such a query.
2. Returns all columns when associated with multiple tables, for example:
3, always take out all the columns: every time you see a SELECT * you need to wonder if you really need to return all the columns? By removing all columns, the main optimizer is unable to complete an index overlay scan such optimizations, and it also brings additional IO, memory, and CPU consumption to the server. If your application uses some kind of caching mechanism, or if you have other considerations, it might be beneficial to get more data than you need, but don't forget what it costs. The query that gets and caches all the columns may be more beneficial than multiple independent queries that fetch only a subset of the columns.
4. Repeatedly query the same data: Do not repeatedly execute the same query, and then return the exact same data each time. It is obviously better to cache this data when it is first queried and take it out of the cache when needed.
6.2.2 MySQL is scanning for additional records
For MySQL, the simplest measure of query overhead is three metrics: response time, number of rows scanned, number of rows returned. These three metrics are recorded in a slow log of MySQL, so checking for slow logging is a good way to find queries that scan too many rows.
Response time is the sum of two parts: service time and queue time, generally common and important wait is IO and lock wait.
Number of rows scanned and number of rows returned
When you parse a query, it is helpful to view the number of rows scanned by the query. To a certain extent, it can be explained that the query finds the data it needs to be inefficient. Ideally, the number of rows scanned and the number of rows returned should be the same. Of course, this is just the ideal situation. Generally, the ratio of scanned rows to the number of rows returned is usually very small, typically between 1:1 and 10:1.
Number of rows scanned and type of access
There are several ways to access MySQL to find and return a row of results. Some access methods may require many rows to be scanned to return a row of results, and some access methods may return results without having to scan.
The access type is returned in the Type column of the explain statement. If the query has no way to find the right type of access, the best way to solve it is usually to add a suitable index. The index allows MySQL to find the records that are needed in the most efficient and minimal way to scan rows.
In general, MySQL is able to apply the Where condition in the following three ways, from good to bad:
1. Use the Where condition in the index to filter for unmatched records. This is done at the storage engine level.
2. Use index overlay Scan (using index in extra column) to return records, filter unwanted records directly from the index and return hit results. This is done at the MySQL server layer, but does not require a return table query record.
3. Return data from the data table and filter records that do not meet the criteria (the using where is present in the extra column). This is done at the MySQL server layer, and MySQL needs to read the records from the data table and then filter.
6.3 How to Refactor queries
6.3.1 a complex query or multiple simple queries
MySQL has the ability to scan millions of rows of data per second in memory, compared to the MySQL response data to the client is much slower. When other conditions are the same, it is certainly better to use as few queries as possible. But sometimes it is also necessary to break up a large query into multiple small queries.
6.3.2 Slicing Query
Sometimes for a big query we need to "divide and conquer", for the deletion of old data, if a large statement once completed, you may need to lock up a lot of data at once, fill the entire transaction log, exhaustion of system resources, blocking many small but important queries. Splitting a large DELETE statement into multiple smaller queries can affect MySQL performance as little as possible while reducing the latency of MySQL replication. For example, we need to run the following query every month:
Then you can do the same thing in the following way:
6.3.3 Decomposition Association Query
At first glance, this does not do any good, but it has the following advantages:
1, make the cache more efficient. For MySQL query caching, if a table in the association changes, then the query cache cannot be used, and if a table is rarely changed after splitting, the query cache for that table can be reused.
2, after the query, the execution of a single query can reduce the competition of the lock.
3, the query performance has also improved, using in () instead of the associated query, you can let MySQL in order to query the ID sequence, which is more efficient than the random association.
6.4 Basics of query execution
When you want MySQL to be able to run queries with higher performance, the best way is to figure out how MySQL optimizes and executes queries.
6.4.1 MySQL client/server-side communication protocol
The communication protocol between MySQL client and server is "half duplex", at any one time, either the server sends the data to the client to the server, or the client sends the data to the servers, the two actions cannot occur simultaneously.
Once the client sends the request, what it can do is wait for the result, and if the query is too large, the server will refuse to receive more data and throw the corresponding error, so parameter max_allowed_packet is especially important. Instead, the average server responds to the user with a lot of data, consisting of multiple packets. When the server starts responding to a client request, the client must completely receive the entire return result, rather than simply taking the previous few results, and then the master server stops sending the data. In this case, it is not a good idea for the client to receive the full result and then take the previous few results, or to receive a few results and then rudely disconnect the connection. This is also the reason why you need to add limit limits to your queries when necessary.
Another way to explain this behavior is that when the client pulls data from the server, it looks like a process of pulling data, but it is actually the process of MySQL pushing the data to the client. Clients are constantly receiving data pushed from the server, and the client is unable to stop the server.
When retrieving data from MySQL using most of the library functions that connect to MySQL, the result looks like fetching data from the MySQL server, and actually fetching the data from the cache of the library function. This is not a problem in most cases, but it is not good if you need to return a large result set, because the library function spends a lot of time and memory to store all the result sets. If you can start processing this data as early as possible, you can greatly reduce the intrinsic consumption, in which case the results can be processed without using the cache to record the result. PHP mysql_query (), at which time the data is in the PHP cache, and Mysql_unbuffered_query () does not cache the results.
Query Status: You can use the show full Processlist command to view the execution status of the query. Sleep, Query, Locked, analyzing and statistics, Copying to TMP table[on disk], sorting result, sending data
6.4.2 Query Cache
Before parsing a query statement, if the query cache is open, MySQL will first check whether the query hits the data in the query cache. This is done by checking for a hash lookup that is case sensitive. If the query cache is hit by the current query, MySQL checks the user permissions before returning the query results. If there is no problem with the permissions, MySQL skips the execution phase and gets the result directly from the cache and returns it to the client.
6.4.3 Query Optimization Processing
The next step in the query lifecycle is to convert a SQL into an execution plan, and MySQL interacts with the execution plan and the storage engine. This includes multiple sub-stages: Parsing SQL, preprocessing, and optimizing the SQL execution plan.
1, syntax parser and preprocessing first MySQL through the keyword to parse the SQL statement, and generate a parse tree. The MySQL parser will validate and parse the query using MySQL syntax rules. For example, if the wrong keyword is used, or if the order of the keywords is correct, the quotation marks can be correctly matched before and after.
2, the preprocessor is based on some MySQL rules to further check whether the parse tree is legitimate, such as checking the existence of data tables and data columns, but also resolve names and aliases to see if they are ambiguous.
3, the next step preprocessing will verify the permissions.
query optimizer : A statement can be executed in many ways, and the same result is returned at the end. The optimizer's role is to find the best execution plan. MySQL uses a cost-based optimizer that tries to predict the cost of a query when it uses some kind of execution plan, and selects one of the least expensive. The smallest unit of cost is the cost of randomly reading a 4K data page, and adding some factors to estimate the cost of a certain invocation. You can know the cost of the current query for MySQL calculation by querying the value of the current session's Last_query_cost.
This is calculated based on a series of statistics: the number of pages per table or index, the cardinality of the index (the number of different values in the index), the length of the index and data rows, and the distribution of the indexes.
Of course, many reasons cause the MySQL optimizer to choose the wrong execution plan: for example, inaccurate statistics or cost estimates in execution plans are not equivalent to actual execution costs.
How MySQL executes the associated query : MySQL performs a nested loop association operation on any association, that is, MySQL first loops through a single table of data, then nesting loops into a table to find matching rows, and then down until the matching behavior is found. The columns that are required in the query are then returned based on the rows that match each table. (nested Loop Association)
Execution Plan : MySQL generates a command tree for the query and then executes the command tree through the storage engine and returns the results. The final execution plan contains all the information for the refactoring query. If you execute explain EXTENDED on a query and then execute show WARNINGS, you can see the refactored query.
The MySQL execution plan is a tree with a left depth priority.
However, if you have an association of more than n tables, you need to check the factorial association order of N. We call this the "search space" for all possible execution plans. In fact, the "greedy" search mode is chosen when the associated table exceeds the optimizer_search_depth limit.
Sort Optimization : Regardless of the sort is a costly operation, so from a performance perspective, you should avoid sorting as much as possible or avoid sorting large amounts of data as much as possible. If the amount of data that needs to be sorted is less than the sort buffer, MySQL uses memory for a "quick sort" operation. If there is not enough memory to sort, then MySQL will first block the data, use "quick sort" for each individual block, sort the results of each block, and then merge the sorted blocks to return the sorted results.
There are two ways to sort MySQL:
Two transfers sort (legacy), read the row pointer and the field that needs to be sorted, sort it, and then read the data rows that you want based on the sort results. Obviously two transmissions, especially when reading sorted data (second time) a large number of random I/O, so two times the cost of transmission is high.
A single transfer order (new version), one read out all the required or SQL query specified columns, and then according to the sorting, sort, directly return the sorted results. Sequential I/O, Cons: Extra space if there are more columns.
The temporary storage space that MySQL needs to use when sorting files can be much larger than expected, because when MySQL is sorted, it allocates a long enough fixed space for each sort record to store. This fixed-length space must be sufficient to accommodate the longest string.
If you need to sort the query, MySQL will handle this sort of file in two different cases. If all the columns of the ORDER BY clause are from the first associated table, then MySQL sorts the files when the first table is associated with it. If so, you can see that the extra field will have a using filesort in the MySQL explain results. In all other cases, MySQL stores the associated results in a temporary table, and then sorts the files after all the associations have finished. In this case the extra field can see the using temporary; Using Filesort. If there is a limit in the query, the limit is also applied after the sort, so even if you need to return less data, the temporary table and the amount of data that needs to be sorted are still very large.
6.4.4 Query Execution engine
Query execution is simpler than query optimization, and MySQL executes only incrementally according to the instructions output from the execution plan. Directives are done by calling the storage engine's API, which is generally called the handler API, in fact, the MySQL optimization phase creates a handler instance for each table, with the handler instance getting information about the table (column name, index statistics, and so on).
The storage Engine interface is very rich in functionality, but there are only dozens of of the underlying interfaces, which can do most of the work of the query, just like building blocks. For example, there is an interface that queries the first row of an index, and then a function that queries the next purpose of an index condition, with these two functions to complete the full index scan operation.
6.4.5 returns the result to the client
The final stage of query execution is to return the results to the client. Even though the query does not need to return a result set to the client, MySQL will still return some information about the query, such as the number of rows affected by the query.
MySQL returning the result set to the client is an incremental, stepwise process of return. Once the server has finished processing the last associated table and starts generating the first result, MySQL can begin to return the result set to the client incrementally.
There are two benefits to this process: the server does not need to store too many results, and it does not consume too much memory because it returns too many results. In addition, this processing also allows the MySQL client to get the results returned the first time.
Of course, the optimizer has its limitations, as well as some specific types of optimizations that are interesting to find in the book.
High-performance MySQL reading notes-query performance optimization
Start building with 50+ products and up to 12 months usage for Elastic Compute Service