Analysis of MySQL Query Optimizer (1) 1 definition
The Mysql Query optimizer selects an appropriate execution path for the query statement. The code of the query optimizer is usually changed, which is not the same as that of the storage engine. Therefore, you need to understand how the query optimizer of the latest version is organized. For more information, see the source code. On the whole, the Optimizer has many similarities. The optimizer of a single mysql version can be mastered as a whole. it is similar to the optimizer of a new mysql version and other databases.
The Optimizer converts query statements into equivalent query statements. For example, the optimizer converts the following statement:
SELECT... WHERE 5 =;
The converted equivalent statement is:
SELECT... WHERE a = 5;
Because the result sets of these two statements are consistent, the two statements are equivalent.
Here I need to note that order by is not included in the query statement. The result of the query statement 1 is (), (), and the result of the query statement 2 is (), (). we will think this is equivalent, because the query statements without order by are unordered, they can all be sorted.
2. code organization
In the kernel, the handle_select () function is the top-level function that processes query statements. There are two branches in it. one is to handle the case with union and the other is to process the case without union, here we only list a simple path for easy description. for details about the call level, see.
Handle_select ()
Mysql_select ()
JOIN: prepare ()
Setup_fields ()
JOIN: optimize ()/* optimizer is from here ...*/
Optimize_cond ()
Opt_sum_query ()
Make_join_statistics ()
Get_quick_record_count ()
Choose_plan ()
/* Find the best way to access tables */
/* As specified by the user .*/
Optimize_straight_join ()
Best_access_path ()
/* Find a (sub-) optimal plan among all or subset */
/* Of all possible query plans where the user */
/* Controlls the exhaustiveness of the search .*/
Greedy_search ()
Best_extension_by_limited_search ()
Best_access_path ()
/* Perform an exhaustive search for an optimal plan */
Find_best ()
Make_join_select ()/*... to here */
JOIN: exec ()
The indentation above indicates the mutual call relationship between functions. Therefore, we can see that handle_select () calls mysql_select (), mysql_select () calls JOIN: prepare (), and so on.
Mysql_select () first calls the JOIN function: prepare () for statement analysis, metadata settings, subquery conversion, and so on. Then call the JOIN: optimize () function to optimize and select the final execution plan. Finally, the JOIN: exec () function is called to execute the execution plan.
Despite the word "JOIN", these optimization functions serve all query statements, no matter what type of query you are.
The optimize_cond () and opt_sum_query () functions perform some conversion operations. The make_join_statistics () function analyzes the statistics of all available indexes.
3. constant conversion
The following expressions can be converted:
WHERE column1 = column2 AND column2 = 'x ';
Because we know that if A = B and B = C, then A = C. Therefore, the above expression can be converted:
WHERE column1 = 'x' AND column2 = 'x ';
For column1 column2, similar conversions can be performed as long as it belongs to one of the following operators:
=, <,>, <=, >=, <>, <=>, LIKE
We can also see that there is no conversion for.
4. Exclude invalid code
See the following expression:
WHERE 0 = 0 AND column1 = 'y'
Because the first condition is always true, you can remove it and change it:
WHERE column1 = 'y'
See the following expression again:
WHERE (0 = 1 AND s1 = 5) OR s1 = 7
Because the expression in the first bracket is always false, you can remove the expression and change it:
WHERE s1 = 7
In some cases, you can even remove the entire WHERE clause. See the following expression:
WHERE (0 = 1 AND s1 = 5)
We can see that the WHERE clause is always FALASE, so the WHERE condition is impossible. Of course, we can also say that the WHERE condition is optimized.
If the column definition is not allowed to be NULL, then:
WHERE not_null_column IS NULL
This condition is always false:
WHERE not_null_column IS NOT NULL
This condition is always true, so such expressions can also be deleted from the condition expression.
Of course, there are special cases. for example, in out join, a column defined as not null may also contain NULL values. In this case, the is null condition IS retained.
Of course, the optimizer does not detect all the situations, because it is too complicated. For example:
Create table Table1 (column1 CHAR (1 ));
...
SELECT * FROM Table1 WHERE column1 = 'Canada ';
The optimizer does not remove it even if the condition is invalid.
5. constant calculation
The following expression:
WHERE columb1 = 1 + 2
Convert:
WHERE columb1 = 3
6. constants and constant tables
The constant table is defined as follows:
1) a table has only 0 rows or 1 row of data.
2) The WHERE clause contains the condition column = constant, and these columns are primary keys, or these columns are UNIQUE (assuming that the UNIQUE is also defined as not null ). The generated query result can also be a constant table.
If the table Table0 definition contains:
... Primary key (column1, column2)
Let's look at the following syntax:
FROM Table0... WHERE column1 = 5 AND column2 = 7...
This statement returns a constant table.
To give a simpler example, the definition of building Table1 includes:
... Unique_not_null_column INT NOT NULL UNIQUE
Let's look at the following syntax:
FROM Table1... WHERE unique_not_null_column = 5
This statement also returns a constant table.
From the example, we can see that the constant table has only one row at most. MySQL evaluates the constant table in advance, finds this value, and then introduces this value to the query statement for optimization. for example:
SELECT Table1.unique _ not_null_column, Table2.any _ column
FROM Table1, Table2
WHERE Table1.unique _ not_null_column = Table2.any _ column
AND Table1.unique _ not_null_column = 5;
When evaluating this query statement, MySQL first finds that Table1.unique _ not_null_column is a constant table. Then, retrieve the value.
If the retrieval operation fails (no row in Table 1 meets the condition unique_not_null_column = 5), the constant table contains 0 rows. if you execute the EXPLAIN operation on the statement, a message is displayed:
Impossible WHERE noticed after reading const tables
Another case is that the retrieval operation is successful (only one row in Table 1 meets the condition unique_not_null_column = 5). then, the constant table contains a piece of data and MySQL converts the query statement:
SELECT 5, Table2.any _ column
FROM Table1, Table2
WHERE 5 = Table2.any _ column
AND 5 = 5;
In fact, this example is a complex example, where constant conversion is also used.
7. access type
When we evaluate a conditional expression, MySQL determines the access type of the expression. Below are some access types arranged in the order from the optimum to the worst:
System... System table and constant table
Const... Constant table
Eq_ref... Unique/primary index, and '=' is used for access
Ref... The index is accessed using '= '.
Ref_or_null... The index is accessed using '=' and may be NULL.
Range... Indexes are accessed using BETWEEN, IN, >=, and LIKE.
Index... Full Index scan
ALL... Full table scan
The Optimizer selects the appropriate driver expression based on the access type. Consider the following query statement:
The following is a reference clip:
SELECT *
FROM Table1
WHERE indexed_column = 5 AND unindexed_column = 6
Because indexed_column has a better access type, it is more likely to use this expression as the driver expression. Here we only consider simple cases, not special cases.
So what does the driver expression mean? There are two possible execution methods for this query statement:
1) bad execution path: read each row of the table (called "full table scan"). For each row read, check whether the value meets the conditions of indexed_column and unindexed_column.
2) good execution path: Search for the B tree using the key value indexed_column = 5. for each row that meets the condition, determine whether the condition corresponding to unindexed_column is met.
In general, index search requires less access paths than full table scan, especially when the table data volume is large and the index type is UNIQUE. Therefore, it is called a good execution path. the indexed_column column is used as the driver expression.