A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
High Performance MySQL Reading Notes-query performance optimization, high performance mysql
For high-performance database operations, it is not enough to design the optimal database table structure and build the best index. You also need to design the query reasonably. If the query is poorly written, high performance cannot be achieved even if the database table structure is reasonable and the index is appropriate. Query Optimization, index optimization, and database/table structure optimization must go hand in hand.
6.1 why query speed is slow
Generally, the query life cycle can be viewed in the order: from the client> Server> Resolution on the server> Generate an execution plan> execute> return the result to the client. Execution can be considered as the most important stage in the entire lifecycle, including a large number of calls to the storage engine for data retrieval and data processing after calling, including sorting and grouping. Understanding the query lifecycle and clear query time consumption are of great significance for optimizing the query.
6.2 Data Access Optimization
The most basic reason for poor query performance is that too much data is accessed. Most low-performance queries can be optimized by reducing the amount of data accessed.
1. Check whether the application is retrieving a large amount of data that exceeds the required data. This usually means that too many rows are accessed, but sometimes too many columns are accessed.
2. Check whether the MySQL server layer is analyzing a large number of data rows that exceed the requirements.
6.2.1 whether unnecessary data is requested from the database
Requests for excess data will impose additional burden on the MySQL server, increase network overhead, and consume the CPU memory and resources of the application server. Here are some typical cases:
1. query unnecessary records: for example, retrieve 100 records from a news website, but only display 10 records on the page. In fact, MySQL will query all the results, and the client application will receive all the result set data, and then discard most of the data. The simplest and most effective solution is to add LIMIT after such a query.
2. When multiple tables are joined, all columns are returned, for example:
3. Always retrieve all columns: Do you need to doubt whether to return all columns every time you see SELECT? If you retrieve all the columns, the primary optimizer will not be able to perform index overwrite scanning. This will also cause additional IO, memory, and CPU consumption on the server. If an application uses a caching mechanism or has other considerations, that exceeds the required amount may also benefit, but do not forget what the cost is. Retrieving and caching all column queries is more advantageous than multiple independent queries that only obtain some columns.
4. Repeat the same data query: Do not repeatedly execute the same query, and then return the same data each time. This data is cached at the time of the first query and retrieved from the cache when needed, which obviously delivers better performance.
6.2.2 whether MySQL is scanning additional records
For MySQL, the three most simple indicators for measuring query overhead include response time, number of scanned rows, and number of returned rows. All these three indicators are recorded in the slow log of MySQL. Therefore, checking slow log records is a good way to find a query with too many lines scanned.
Response time is the sum of two parts: service time and queue time. Generally, IO and lock wait are common and important.
Number of scanned rows and number of returned rows
It is very helpful to view the number of scanned rows in the query. To some extent, the efficiency of finding the required data for this query is not high. Ideally, the number of scanned rows is the same as the number of returned rows. Of course, this is only an ideal situation. In general, the ratio of the number of scanned rows to the number of returned rows is usually very small, generally between and.
Number of scanned rows and access types
MySQL has several access methods to search for and return a row of results. Some access methods may require scanning many rows to return a row of results, and some access methods may return results without scanning.
The access TYPE is returned in the TYPE column of the EXPLAIN statement. If you cannot find an appropriate access type for a query, the best solution is to add a suitable index. The index allows MySQL to find the desired records in the most efficient and least-scanned rows.
In general, MySQL can use the following three methods to apply the WHERE condition, from good to bad:
1. Use the WHERE condition in the index to filter unmatched records. This is done at the storage engine layer.
2. Use index overwrite scan (using index appears in the extra column) to return records, filter unwanted records directly from the index, and return hit results. This is done at the MySQL server layer, but there is no need to query records in the table.
3. return data from the data table and filter records that do not meet the conditions (using where appears in the extra column ). This is done at the MySQL server layer. MySQL needs to read records from the data table and then filter the records.
6.3 rebuilding the query method
6.3.1 A complex query or multiple simple queries
MySQL can scan millions of rows of data in the memory per second. In contrast, MySQL returns data to the client much slower. When other conditions are the same, it is better to use as few queries as possible. However, it is also necessary to break a large Query into multiple small queries.
6.3.2 split Query
Sometimes we need to "divide and conquer" a large query. If a large statement is used to delete old data, you may need to lock a lot of data, occupy the entire transaction log, exhaust system resources, and block many small but important queries. Splitting a large DELETE statement into multiple smaller queries can affect MySQL performance as little as possible, and reduce the delay of MySQL replication. For example, we need to run the following query every month:
You can do the same job in a way similar to the following:
6.3.3 decomposition association query
At first glance, this does not have any advantages, but it has the following advantages:
1. Make the cache more efficient. For MySQL query cache, if a table in the association changes, the query cache cannot be used. After splitting, if a table rarely changes, the query cache of the table can be reused.
2. After querying, executing a single query can reduce the lock competition.
3. the query performance is also improved. Using IN () instead of association query can make MySQL query by ID, which is more efficient than random association.
6.4 basic query execution
When you want MySQL to run queries with higher performance, the best way is to find out how MySQL optimizes and executes queries.
6.4.1 MySQL client/server communication protocol
The communication protocol between the MySQL client and the server is "Half Duplex". At any time, either the server sends data to the client or the client sends data to the server, these two actions cannot occur simultaneously.
Once the client sends a request, it can only wait for the result. If the query is too large, the server rejects receiving more data and throws an error. Therefore, the max_allowed_packet parameter is particularly important. On the contrary, the server usually responds to a large amount of data, which is composed of multiple data packets. When the server starts to respond to the client request, the client must completely receive the entire returned result, instead of simply taking the first few results, and then the master server stops sending data. In this case, it is not a good idea for the client to receive the complete results and then take the first few results, or to receive several results and then roughly disconnect. This is also the reason why limit restrictions need to be added to the query when necessary.
Another way is to explain this behavior: when the client retrieves data from the server, it looks like a data pulling process, but it is actually the process that MySQL is pushing data to the client. The client continuously receives data pushed from the server, and the client cannot stop the server.
When most database functions connected to MySQL are used to from MySQL, the results seem to be getting data from the MySQL server, but they actually get data from the cache of the database function. In most cases, this is fine, but it is not good if a large result set needs to be returned, because the library function will spend a lot of time and memory to store all the result sets. If you can start to process the data as soon as possible, you can greatly reduce the internal consumption. In this case, you can directly process the results instead of using the cache to record the results. Mysql_query () in PHP, the data has been stored in the PHP cache, while mysql_unbuffered_query () does not cache the results.
Query status: You can use the show full processlist command to view the query execution status. Sleep, Query, Locked, Analyzing and statistics, Copying to tmp table [on disk], Sorting result, Sending data
6.4.2 query Cache
Before parsing a query statement, if the query cache is opened, MySQL will first check whether the query hits the data in the query cache. This check is implemented through a case-sensitive hash search. If the current query hits the query cache, MySQL checks the user permissions before returning the query results. If the permission is correct, MySQL will skip the execution phase and directly obtain the result from the cache and return it to the client.
6.4.3 Query Optimization
The next step for querying the lifecycle is to convert an SQL statement into an execution plan. MySQL then interacts with the storage engine according to the execution plan. This includes multiple sub-stages: SQL parsing, preprocessing, and SQL Execution Plan optimization.
1. Syntax parser and preprocessing MySQL first parses SQL statements using keywords and generates a parsing tree. The MySQL parser uses MySQL syntax rules to verify and parse queries. For example, whether incorrect keywords are used, whether the order of keywords is correct, and whether quotation marks can be correctly matched before and after.
2. The pre-processor further checks whether the resolution tree is valid according to some MySQL rules, such as checking whether data tables and data columns exist, and parsing the names and aliases to see if they are ambiguous.
3. In the next step, the pre-processing will verify the permissions.
Query Optimizer: A statement can be executed in many ways, and the same result is returned at the end. The optimizer is used to find the best execution plan. MySQL uses the cost-based optimizer, which will try to predict the cost of a query using a certain execution plan, and select one with the lowest cost. The minimum unit of cost is the cost of randomly reading a 4 K data page, and some factors are added to estimate the cost of a pilot operation. You can query the Last_query_cost value of the current session to find out the cost of the current query calculated by MySQL.
This is calculated based on a series of statistical information: the number of pages for each table or index, and the base of the index (the number of different values in the index), index and data row length, index distribution.
Of course, many reasons may cause the MySQL optimizer to select an incorrect execution plan: for example, inaccurate statistics or cost estimates in the execution plan are not the same as actual execution costs.
How to execute association query in MySQL: MySQL performs nested circular join operations on any association. That is, MySQL first extracts a single piece of data from a table cyclically, and then nested loops into a table to find matching rows, continue until a matched row is found. Then, return the columns required for the query based on the matching rows of each table.(Nested loop Association)
Execution Plan: MySQL generates a query command tree, and then runs it through the storage engine to complete the command tree and return results. The final execution plan contains all information about the restructured query. If you execute explain extended for a query and then execute show warnings, you can see the restructured query.
The execution plan of MySQL is a left-side depth-first tree.
However, if there are more than n tables associated, you need to check the factorial Association Sequence of n. We call it the "search space" for all possible execution plans ". In fact, when the table to be joined exceeds the optimizer_search_depth limit, the "greedy" search mode will be selected.
Sorting Optimization: No matter how you sort data, it is a very high-cost operation. Therefore, from the performance perspective, you should try to avoid sorting or try to avoid sorting large amounts of data. If the amount of data to be sorted is smaller than the sorting buffer, MySQL uses the memory to perform the "quick sorting" operation. If the memory is not sorted enough, MySQL first sorts the data in blocks, uses "Quick Sort" for each independent block, and stores the sorting results of each block on the disk, then, the sorted blocks are merged and the sorting results are returned at the beginning.
MySQL has two sorting methods:
Two sort transfers (old version): Read the row pointer and the field to be sorted, sort it, and then read the required data row according to the sorting result. It is obviously two transfers, especially when reading the sorted data (the second time) a large number of random I/O, so the cost of two transfers is high.
Sort by single transmission (new version). Read and retrieve all required or specified columns in SQL query at a time, and then sort and return the sorted results directly. Sequential I/O. disadvantage: extra space is occupied if multiple columns exist.
The temporary storage space required by MySQL for File Sorting may be much larger than expected, because during sorting, each sorting record is allocated a long enough fixed-length space for storage. This fixed-length space must be sufficient to accommodate the longest string.
If you need to sort related queries, MySQL will process such sorting in two cases. If all the columns in the order by clause come from the first joined table, MySQL sorts the files when associating the first table. If so, you can see that the Extra field has Using filesort In the MySQL EXPLAIN result. In all other cases, MySQL stores the associated results in a temporary table, and sorts the files after all the associations are completed. In this case, the Extra field shows Using temporary; Using filesort. If there is a LIMIT in the query, the LIMIT will be applied after sorting, so even if a small amount of data needs to be returned, the temporary table and the amount of data to be sorted will still be very large.
6.4.4 query execution engine
Compared with query optimization, query execution is simpler. MySQL only executes the commands output according to the execution plan step by step. Commands are completed by calling the storage engine API, which is generally called the handler API. In fact, a handler instance is created for each table in the MySQL optimization stage, use a handler instance to obtain table information (such as column names and index statistics ).
The storage engine interfaces have rich functions, but there are only dozens of underlying interfaces. These interfaces can perform most of the query operations like building blocks. For example, there is an interface for querying the first row of an index, and there is another function for querying the next row of an index condition. With these two functions, you can complete the full index scan operation.
6.4.5 return the result to the client
The last stage of query execution is to return the result to the client. Even if the query does not need to return the result set to the client, MySQL returns some information about the query, such as the number of rows affected by the query.
MySQL returns the result set to the client incrementally and incrementally. Once the server finishes processing the last joined table and starts to generate the first result, MySQL can start to gradually return the result set to the client.
There are two advantages: the server does not need to store too many results, so it will not consume too much memory because it will return too many results. In addition, such processing also allows the MySQL client to obtain the returned results immediately.
Of course, the optimizer has its limitations and some specific optimization types. If you are interested, you can find the answer in the book.
Start building with 50+ products and up to 12 months usage for Elastic Compute Service