This article is a set of reading notes on *High Performance MySQL*.
Slow Query Basics: Optimizing data access
The most basic reason for poor query performance is that too much data is accessed. For inefficient queries, the following two analysis steps are almost always effective:
- Verify that the application is not retrieving more data than it needs. This usually means too many rows are accessed, but sometimes it means too many columns are accessed.
- Verify that the MySQL server layer is not analyzing more rows than it needs.
Is unwanted data being requested from the database?
Some queries request more data than is actually needed, and the extra data is then discarded by the application. This places an additional burden on the MySQL server, increases network overhead, and consumes CPU and memory on the application server.
Some typical cases:
- Fetching more rows than needed, for example retrieving many rows but using only the first few;
- Fetching all columns from a multi-table join when only a few are needed;
- Always fetching all columns with SELECT *;
- Repeatedly querying the same data instead of caching it.
Is MySQL scanning extra records?
Once you have confirmed that the query returns only the data you need, the next step is to check whether the query scans too much data in order to return that result. For MySQL, the three simplest metrics for measuring query cost are:
- Response time
- Number of rows scanned
- Number of rows returned
These three metrics are recorded in MySQL's slow query log, so reviewing the slow query log is a good way to find queries that scan too many rows.
Number of rows scanned and number of rows returned:
Scanning far more rows than are returned indicates that the query is finding its data inefficiently; in practice, it usually means the indexes are not being used effectively.
In the optimal query, the number of rows scanned equals the number of rows returned.
Number of rows scanned and type of access:
Another cause of excessive scanning is not choosing the optimal access type to reduce the number of rows scanned.
When evaluating query overhead, consider the cost of finding a single row in a table. MySQL has several access methods for finding and returning a row; some require scanning many rows to return one result row, while others can return the result without scanning at all.
The type column in EXPLAIN output reflects the access type. There are many access types, ranging from a full table scan, to index scans, range scans, unique index lookups, and constant references. Listed in that order, they run from slowest to fastest and from most rows scanned to fewest. You don't need to memorize all of these access types, but you should understand the concepts of scanning a table, scanning an index, range access, and single-value access.
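MySQL's EXPLAIN is the tool for inspecting the access type. As a rough, portable illustration of the same concept (not MySQL's actual EXPLAIN output), SQLite's EXPLAIN QUERY PLAN distinguishes a full table scan from an index search; the table and column names here are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.execute("CREATE INDEX idx_email ON users(email)")

# A leading-wildcard LIKE cannot use the index: full scan.
scan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email LIKE '%@example.com'"
).fetchone()[3]

# An equality predicate on an indexed column: index search.
search = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@example.com'"
).fetchone()[3]

print(scan)    # e.g. "SCAN users"
print(search)  # e.g. "SEARCH users USING ... idx_email (email=?)"
```

The exact wording of the plan text varies by SQLite version, but the scan-versus-search distinction mirrors the slow-to-fast spectrum described above.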
Create a proper index & query only the columns you need.
How to Refactor queries
Simplify complex SQL, split large queries, and decompose join queries.
MySQL Query execution process
View the current state of query threads with the SHOW FULL PROCESSLIST command.
Status enumeration value:
- Sleep: The thread is waiting for the client to send a new request.
- Query: The thread is executing the query or sending the result to the client.
- Locked: At the MySQL server layer, the thread is waiting for a table lock. Locks implemented at the storage-engine level, such as InnoDB row locks, are not reflected in the thread state.
- Analyzing and statistics: The thread is collecting storage-engine statistics and generating an execution plan for the query.
- Copying to tmp table [on disk]: The thread is executing the query and copying its result set into a temporary table, typically for a GROUP BY, a filesort, or a UNION. If the state ends with "on disk", MySQL is writing an in-memory temporary table to disk.
- Sorting result: The thread is sorting the result set.
- Sending data: The thread may be transferring data between stages, generating the result set, or returning data to the client.
Before parsing a query, if the query cache is enabled, MySQL first checks whether the query hits data in the cache. The check is a case-sensitive hash lookup: if the query differs from a cached query by even a single byte, it will not match the cached result, and the query moves on to the next stage of processing.
If the current query hits the query cache, MySQL checks user permissions before returning the result. It can do this without parsing the SQL, because the query cache already stores the table information the query needs to access. If the permissions check passes, MySQL skips all other stages, fetches the result directly from the cache, and returns it to the client. In this case the query is never parsed, no execution plan is generated, and nothing is executed.
Query optimization processing
The next step in the life cycle of a query is to convert the SQL into an execution plan; MySQL then interacts with the storage engine according to that plan. This stage can be divided into several sub-stages: parsing the SQL, preprocessing, and optimizing the execution plan.
Syntax Parser and preprocessing:
First, MySQL tokenizes the SQL statement by keyword and generates a corresponding "parse tree". The parser validates the query against MySQL's syntax rules: for example, it checks that no invalid keywords are used, that keywords appear in the correct order, and that quotation marks are correctly paired.
The preprocessor then further checks whether the parse tree is legal according to additional MySQL rules, for example verifying that the referenced tables and columns exist, and resolving names and aliases to detect ambiguity.
A query can be executed in many different ways, all returning the same result in the end. The optimizer's job is to find the best execution plan among them.
The MySQL query optimizer is a very complex component that applies many optimization strategies to generate an optimal execution plan. These strategies fall into two broad categories: static optimization and dynamic optimization. Static optimization can be performed by analyzing the parse tree directly. For example, the optimizer can convert a WHERE condition into an equivalent form through simple algebraic transformations. Static optimizations do not depend on particular values, such as constants supplied in the WHERE condition; once performed they remain valid, and re-executing the query with different parameters does not change them. You can think of this as "compile-time optimization."
In contrast, dynamic optimizations are related to the query's context and may depend on many other factors, such as the values in the WHERE condition or the number of rows matching an index entry. They must be re-evaluated on every execution; you can think of them as "runtime optimization."
Here are some of the types of optimizations that MySQL can handle:
- Reorder the joined tables;
- Convert outer joins into inner joins;
- Apply equivalence transformation rules (e.g., 5=5 AND a>5 is rewritten as a>5);
- Optimize COUNT(), MIN(), and MAX();
- Evaluate expressions and convert them to constants;
- Use covering index scans;
- Optimize subqueries;
- Terminate queries early;
- Propagate equalities;
- Compare IN() lists efficiently;
How MySQL executes the associated query
The meaning of "association" (join) in MySQL is broader than the usual sense. In general, MySQL treats every query as an "association", not only queries that need to match rows between two tables; so in MySQL every query, every fragment (including subqueries, and even a SELECT on a single table) can be an association.
Union queries are a good example. For a UNION, MySQL first puts the results of the individual queries into a temporary table, and then reads that temporary table again to complete the UNION. In MySQL's terms, each query is an association, so reading the resulting temporary table is an association too.
MySQL's current join execution strategy is simple: it runs a nested-loop join for any association. MySQL loops over the rows of one table one at a time, nest-loops into the next table to find the matching rows, and continues in this way until matching rows have been found in every table of the join. It then builds the columns needed for the result from the matched rows of each table. MySQL tries to find all matching rows in the last associated table; when no more rows can be found there, it backtracks to the previous table to see whether more matching records exist, and repeats this process.
In this way, MySQL finds a record in the first table, nests into the next associated table, and then backtracks to the previous table; this is implemented with nested loops in MySQL.
Assuming MySQL associates the tables in the order they appear in the query, the process can be expressed in pseudocode:
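The book's original pseudocode is not reproduced above, so here is a minimal Python sketch of the nested-loop strategy, with made-up two-table data; it mirrors a query like SELECT ... FROM t1 INNER JOIN t2 ON t1.c = t2.c:

```python
def nested_loop_join(t1_rows, t2_rows, key):
    """Nested-loop join: for each row of the outer table, scan the
    inner table for rows matching the join column, one match at a time."""
    result = []
    for outer in t1_rows:                  # loop over the first (driving) table
        for inner in t2_rows:              # nested loop over the next table
            if outer[key] == inner[key]:   # join condition t1.c = t2.c
                # merge the matched rows into one result row
                result.append({**outer, **inner})
    return result

t1 = [{"c": 1, "a": "x"}, {"c": 2, "a": "y"}]
t2 = [{"c": 1, "b": "p"}, {"c": 1, "b": "q"}, {"c": 3, "b": "r"}]
result = nested_loop_join(t1, t2, "c")
print(result)  # two result rows, both matching c = 1
```

With more tables, each additional table simply adds another level of nesting, and backtracking happens naturally as each inner loop is exhausted.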
Optimizing specific query types: COUNT() queries
COUNT() can count either column values or rows. When it counts column values, only non-NULL values are counted (NULLs are not). If a column or expression is given inside COUNT()'s parentheses, it counts how many times that expression is non-NULL.
The other job of COUNT() is to count the rows in the result set, which is what happens when MySQL knows the expression in parentheses can never be NULL. The simplest case is COUNT(*): the wildcard does not expand into all columns as one might suppose; it ignores the columns entirely and simply counts rows.
A common mistake is to specify a column inside the parentheses while actually wanting the number of rows in the result set. If you want the row count, use COUNT(*): it states the intent clearly and performs well.
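The difference between COUNT(col) and COUNT(*) is standard SQL behavior and easy to verify. The snippet below uses SQLite for a self-contained demo (the table t and its data are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [("a",), (None,), ("b",)])

count_col = con.execute("SELECT COUNT(col) FROM t").fetchone()[0]
count_star = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(count_col)   # 2 -- the NULL value is not counted
print(count_star)  # 3 -- counts rows, ignoring column values
```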
MyISAM's COUNT(*) is very fast only when there is no WHERE clause, because then no rows actually need to be counted: MySQL can read the exact row count that the storage engine maintains. If MySQL knows that a column col cannot be NULL, it internally optimizes COUNT(col) into COUNT(*). With a WHERE clause, MyISAM counts result rows no faster than other storage engines.
Optimizing Associated Queries
- Make sure there is an index on the columns in the ON or USING clause, and consider the join order when creating it. When table A and table B are joined on column C, and the optimizer's join order is B, A, there is no need to index the column on table B; an unused index only adds overhead. In general, you only need an index on the corresponding column of the second table in the join order, unless there is another reason to add more.
- Make sure that any expression in group by and order by involves only the columns in one table, so that MySQL can use the index to optimize the process.
Use associative queries instead of subqueries whenever possible.
Optimizing GROUP BY queries
The optimization of group by is mainly divided into indexed and non-indexed cases.
When no index is available, GROUP BY uses one of two strategies to group: a temporary table or a filesort. In effect, MySQL scans the table, filters the data into a temporary table, and then sorts it by the columns specified in GROUP BY; in that temporary table the rows of each group end up contiguous. Once the sort is done, all the groups can be identified and the aggregate functions applied. This is why EXPLAIN often shows "Using temporary; Using filesort".
If no ORDER BY clause specifies the row order, a query with GROUP BY automatically sorts the result set by the grouping columns (in MySQL 5.7 and earlier; MySQL 8.0 no longer sorts GROUP BY results implicitly). If you do not care about the order and the implicit sort forces a filesort, use ORDER BY NULL so that MySQL skips the sort. You can also attach DESC or ASC directly to the columns in the GROUP BY clause to sort the grouped result in the desired direction.
Optimizing Limit Paging queries
Consider a paging statement of the general form SELECT * FROM tbl LIMIT offset, rows.
This is a typical LIMIT statement. A common scenario: a query returns a great many rows, the client's processing capacity is limited, and we only want to take a subset of the results at a time.
The implementation mechanism of the above SQL statement is:
- Read the first offset+rows records from the table.
- Discard the first offset records and return the remaining rows as the final result.
This mechanism has a drawback: although only rows records are needed, the offset records before them must still be accessed. When paging deep into a table with a large amount of data, offset can be very large, and the LIMIT query becomes very inefficient.
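The two steps above can be sketched in Python as a toy model of the server's work (not MySQL internals); the numbers are made up:

```python
def limit_offset(all_rows, offset, rows):
    # Step 1: the server must read offset + rows records...
    accessed = all_rows[: offset + rows]
    # Step 2: ...then discard the first offset of them.
    returned = accessed[offset:]
    return accessed, returned

# Page deep into a 100-row "table": 95 rows touched for a 5-row page.
accessed, page = limit_offset(list(range(1, 101)), 90, 5)
print(len(accessed))  # rows that had to be accessed
print(page)           # rows actually returned
```

The wasted work grows linearly with offset, which is exactly why deep pages get slow.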
Use an overlay index to optimize:
Whenever possible, use a covering index scan to determine only the primary keys of the rows to return, and then fetch the required columns by joining back on those keys. This optimization is typically done with a subquery.
For example, suppose we page through a student table with a statement like SELECT * FROM student LIMIT 1000, 10. We can first exploit the speed of a covering-index query to fetch only the ids for the desired page, and then join back on those ids; since the second step joins directly on the primary key, it is also very fast. The optimized SQL looks like:
SELECT * FROM student AS stu INNER JOIN (SELECT id FROM student LIMIT 1000, 10) AS tmp ON stu.id = tmp.id;
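As a self-contained sanity check that the deferred-join form returns the same page as the plain offset query, the snippet below uses SQLite, which accepts the same MySQL-style LIMIT offset, count syntax; the student table and its contents are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?)",
                [(i, "s%d" % i) for i in range(1, 2001)])

# Plain deep-offset page.
plain = con.execute(
    "SELECT * FROM student ORDER BY id LIMIT 1000, 10").fetchall()

# Deferred join: the subquery fetches only ids, then joins back by primary key.
deferred = con.execute(
    "SELECT stu.* FROM student AS stu "
    "INNER JOIN (SELECT id FROM student ORDER BY id LIMIT 1000, 10) AS tmp "
    "ON stu.id = tmp.id ORDER BY stu.id").fetchall()

print(plain == deferred)  # same 10 rows either way
```

SQLite has no covering-index cost difference to show here; the point of the demo is only that the rewritten query is equivalent.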
Determine the page start value to reduce the number of scanned rows
We can remember where the last fetch ended and, next time, start scanning directly from that position for the specified number of rows. Assuming the last student id obtained was 20180131200, the query becomes:
SELECT * FROM student WHERE id > 20180131200 ORDER BY id LIMIT 10;
This SQL sorts by the primary key; because of the clustered index, the primary-key ids are already stored in order in the index tree, so the ORDER BY here is very fast, and filtering on the primary key is also very fast.
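This seek approach can be sketched in a self-contained way (SQLite again, with a made-up student table): remember the last id of each page and filter past it on the next request.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?)",
                [(i, "s%d" % i) for i in range(1, 101)])

def next_page(last_id, size=10):
    # Seek directly past the last id we saw: the primary-key index
    # makes both the filter and the ORDER BY cheap, with no skipped rows.
    return con.execute(
        "SELECT * FROM student WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size)).fetchall()

page1 = next_page(0)
page2 = next_page(page1[-1][0])  # resume after the last id of page 1
print(page1[0][0], page2[0][0])  # first ids of consecutive pages
```

Unlike offset paging, each page scans only the rows it returns, no matter how deep into the table it is.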
Further reading:
- MySQL large-data-volume LIMIT optimization
- MySQL query performance optimization