When you issue a query, MySQL analyzes it to see whether any optimizations can be applied to process it more quickly. This section describes how the query optimizer works. For details on the specific optimization methods MySQL uses, see the MySQL Reference Manual.
The MySQL query optimizer takes advantage of indexes, of course, but it also uses other information. For example, if you issue the following query, MySQL executes it very quickly no matter how large the table is:
SELECT * FROM tbl_name WHERE 0;
In this example, MySQL examines the WHERE clause, recognizes that no rows can possibly satisfy the condition, and therefore does not bother to search the table. You can see this by issuing an EXPLAIN statement, which tells MySQL to display information about how it would execute a SELECT query without actually executing it. To use EXPLAIN, put the word EXPLAIN in front of the SELECT statement:
mysql> EXPLAIN SELECT * FROM tbl_name WHERE 0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE
In general, EXPLAIN returns more information than this, including non-NULL values such as the indexes that could be used to scan the tables, the join types used, and an estimate of the number of rows that must be examined in each table.
How the optimizer works
The MySQL query optimizer has several goals, but its primary aims are to use indexes whenever possible and to use the most restrictive index so that as many rows as possible are eliminated as early as possible. Your own goal in issuing a SELECT statement is to find rows, not to reject them, but the optimizer works this way because the faster it can eliminate rows from consideration, the faster it can find the rows that do match. A query runs faster if the most restrictive test is performed first. Assume that your query tests two columns, each of which has an index:
SELECT col3 FROM mytable
WHERE col1 = 'some value' AND col2 = 'some other value';
Assume that the test on col1 matches 900 rows, the test on col2 matches 300 rows, and both tests together match only 30 rows. If col1 is tested first, 900 rows must be examined to find the 30 that also match the col2 value, so 870 of those tests fail. If col2 is tested first, only 300 rows must be examined to find the 30 that also match the col1 value, so only 270 of those tests fail and less computation and disk I/O are required. As a result, the optimizer tests col2 first, because doing so costs less.
You can use the following guidelines to help the optimizer take advantage of indexes:
Try to compare columns that have the same data type. When you use indexed columns in comparisons, use columns of the same type. Identical data types perform better than dissimilar ones. For example, INT is different from BIGINT. CHAR(10) is considered the same as CHAR(10) or VARCHAR(10), but different from CHAR(12) or VARCHAR(12). If the columns you are comparing have different types, you can use ALTER TABLE to modify one of them so that the types match.
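As a minimal sketch of that last point, suppose two tables are joined on a customer ID that was declared BIGINT in one table and INT in the other. The table and column names (customers, orders, id, customer_id) are assumptions made up for this illustration; they do not come from the text above.

```sql
-- Hypothetical schema: customers.id is BIGINT, orders.customer_id is INT.
-- Inspect the current column definitions before changing anything.
SHOW CREATE TABLE customers;
SHOW CREATE TABLE orders;

-- Widen the narrower column so both sides of the join have the same type.
-- MODIFY replaces the whole column definition, so restate attributes
-- such as NOT NULL that you want to keep.
ALTER TABLE orders MODIFY customer_id BIGINT NOT NULL;

-- The join now compares BIGINT with BIGINT.
SELECT o.*
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id;
```

Changing a column type can rebuild the table, so on a large table it is worth trying this on a copy first.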
Try to make indexed columns stand alone in comparison expressions. If you use a column in a function call or as part of a more complex arithmetic expression, MySQL cannot use the index because it must compute the value of the expression for every row. Sometimes this is unavoidable, but in many cases you can rewrite a query so that the indexed column appears by itself.
The WHERE clauses below illustrate this. They are logically equivalent, but they differ greatly in how well they can be optimized:
WHERE mycol < 4 / 2
WHERE mycol * 2 < 4
For the first line, the optimizer simplifies the expression 4/2 to the value 2, and then uses an index on mycol to quickly find values less than 2. For the second expression, MySQL must retrieve the value of mycol for each row, multiply it by 2, and then compare the result with 4. In this case the index cannot be used, because every value in the column must be retrieved to evaluate the left-hand side of the comparison.
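One way to see the difference on your own server is to run EXPLAIN on both forms. The sketch below assumes a small hypothetical table; the table definition and the index name idx_mycol are illustrative, and the exact plan reported depends on the MySQL version and the data in the table.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE mytbl (
  id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  mycol INT NOT NULL,
  INDEX idx_mycol (mycol)
);

-- Column stands alone: 4 / 2 is folded to a constant, so the optimizer
-- can consider a range scan on idx_mycol.
EXPLAIN SELECT * FROM mytbl WHERE mycol < 4 / 2;

-- Column is inside an expression: mycol * 2 must be computed for every
-- row, so idx_mycol cannot be used to narrow the search.
EXPLAIN SELECT * FROM mytbl WHERE mycol * 2 < 4;
```

Compare the type, key, and rows columns of the two results rather than relying on any one expected value.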
Let's look at another example. Suppose you have indexed the date_col column. If you submit a query as follows, this index is not used:
SELECT * FROM mytbl WHERE YEAR(date_col) < 1990;
This expression does not compare 1990 with the indexed column. It compares 1990 with a value calculated from the column, and that value must be computed for every row. As a result, the index on date_col is not used, because the query requires a full table scan. The fix is to use a literal date, which allows the index on date_col to be used to find matching values in the column:
WHERE date_col < '1990-01-01'
However, suppose you do not have a specific date in mind. You might instead want to find rows whose dates lie within a certain number of days of today. There are several ways to express this kind of comparison, and they are not equally efficient. Here are three:
WHERE TO_DAYS(date_col) - TO_DAYS(CURDATE()) < cutoff
WHERE TO_DAYS(date_col) < cutoff + TO_DAYS(CURDATE())
WHERE date_col < DATE_ADD(CURDATE(), INTERVAL cutoff DAY)
The first line cannot use the index, because the column must be retrieved for every row so that TO_DAYS(date_col) can be calculated. The second line is better. Both cutoff and TO_DAYS(CURDATE()) are constants, so the optimizer can calculate the right-hand side of the comparison once before processing the query, rather than once per row. However, date_col still appears inside a function call, which prevents the index from being used. The third line is the best of the three. Again, the right-hand side can be calculated once as a constant before the query is executed, but now its value is a date. That value can be compared directly with date_col values, which no longer need to be converted to days. In this case, the index can be used.
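The following sketch contrasts the second and third forms with a concrete cutoff of 30 days. The table name events and index name idx_date are hypothetical, and the plans MySQL reports will vary with version and data.

```sql
-- Hypothetical table; date_col is indexed.
CREATE TABLE events (
  id       INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  date_col DATE NOT NULL,
  INDEX idx_date (date_col)
);

-- Function wrapped around the column: the right-hand side is a constant,
-- but TO_DAYS(date_col) still has to be evaluated for each row.
EXPLAIN SELECT * FROM events
WHERE TO_DAYS(date_col) < 30 + TO_DAYS(CURDATE());

-- Bare column compared with a constant date: idx_date can be considered
-- for a range scan.
EXPLAIN SELECT * FROM events
WHERE date_col < DATE_ADD(CURDATE(), INTERVAL 30 DAY);
```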
Do not use wildcards at the beginning of a LIKE pattern. Some string searches use a WHERE clause like this:
WHERE col_name LIKE '%string%'
That is the correct thing to do if you want to find the string no matter where it occurs in the column. However, do not put "%" on both sides of the string simply out of habit. If you are really looking for the string only when it occurs at the beginning of the column, leave out the leading "%". Suppose you want to find names that begin with "Mac", such as MacGregor or MacDougall. In that case, write the WHERE clause like this:
WHERE last_name LIKE 'mac%'
The optimizer looks at the literal text at the beginning of the pattern and uses the index to find matching rows, as though you had written the following expression, which is in a form that allows an index on last_name to be used:
WHERE last_name >= 'mac' AND last_name < 'mad'
This optimization cannot be applied to pattern matching using the REGEXP operator. REGEXP expressions will never be optimized.
Help the optimizer make better estimates of index effectiveness. By default, when you compare values in an indexed column with a constant, the optimizer assumes that key values are distributed evenly within the index. When deciding whether to use an index for a comparison with a constant, the optimizer also makes a quick check of the index to estimate how many entries will be used. For MyISAM, InnoDB, and BDB tables, you can use ANALYZE TABLE to tell the server to analyze the key values, which gives the optimizer better information.
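A minimal sketch of that step, using mytbl as a stand-in table name:

```sql
-- Recompute key distribution statistics for the table.
ANALYZE TABLE mytbl;

-- Inspect the result: the Cardinality column is the server's estimate of
-- the number of distinct values in each index.
SHOW INDEX FROM mytbl;
```

Statistics drift as rows are inserted and deleted, so it can be worth repeating this after large changes to a table.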
Use EXPLAIN to verify what the optimizer does. The EXPLAIN statement tells you whether an index is used. This information is helpful when you are trying different ways of writing a statement or checking whether adding an index improves query execution.
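A before-and-after check might look like the sketch below. The table members, the column last_name, and the index name idx_last_name are hypothetical names chosen for the illustration.

```sql
-- 1. Look at the plan before the index exists.
EXPLAIN SELECT * FROM members WHERE last_name = 'MacGregor';

-- 2. Add the candidate index.
ALTER TABLE members ADD INDEX idx_last_name (last_name);

-- 3. Run EXPLAIN again and compare the type, key, and rows columns
--    with the first result.
EXPLAIN SELECT * FROM members WHERE last_name = 'MacGregor';
```

If the key column still shows NULL after step 3, the optimizer has decided not to use the new index for this query.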
Give the optimizer hints when necessary. Normally, the MySQL optimizer is free to determine the order in which to scan tables so as to retrieve rows as quickly as possible. On occasion, the optimizer makes a less than optimal choice. If you notice this happening, you can override the optimizer's choice with the STRAIGHT_JOIN keyword. A join performed with STRAIGHT_JOIN is like a cross join, but forces the tables to be joined in the order named in the FROM clause.
STRAIGHT_JOIN can be specified in two places in a SELECT statement. You can specify it between the SELECT keyword and the selection list, where it affects all cross joins in the statement. You can also specify it in the FROM clause. The following two statements are equivalent:
SELECT STRAIGHT_JOIN ... FROM t1, t2, t3 ... ;
SELECT ... FROM t1 STRAIGHT_JOIN t2 STRAIGHT_JOIN t3 ... ;
Run the query both with and without STRAIGHT_JOIN. MySQL may have a good reason for not using indexes in the order you think best (you can use EXPLAIN to check the execution plan MySQL uses for each statement).
You can also use FORCE INDEX, USE INDEX, or IGNORE INDEX to give the server guidance about which indexes to use.
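The three index hints are written after the table name in the FROM clause. In this sketch the column a and the index name idx_a are hypothetical:

```sql
-- Suggest that the optimizer consider only idx_a for this table.
SELECT * FROM t1 USE INDEX (idx_a) WHERE a = 1;

-- Like USE INDEX, but a table scan is treated as very expensive, so it is
-- chosen only if idx_a cannot be used at all.
SELECT * FROM t1 FORCE INDEX (idx_a) WHERE a = 1;

-- Tell the optimizer not to use idx_a.
SELECT * FROM t1 IGNORE INDEX (idx_a) WHERE a = 1;
```

As with STRAIGHT_JOIN, it is worth comparing the EXPLAIN output with and without the hint before keeping it in a query.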
Take advantage of the areas in which the optimizer is more mature. MySQL can execute both joins and subqueries, but subquery support is the more recent of the two, having been added in MySQL 4.1. In many cases, the optimizer has been tuned better for joins than for subqueries. This is a practical hint to keep in mind when a subquery runs slowly. Some subqueries can be rewritten as logically equivalent joins. Where that is feasible, rewrite the subquery as a join and see whether it runs faster.
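As a hedged illustration of such a rewrite, the two statements below return the same customers, assuming hypothetical tables customers(id, name) and orders(id, customer_id) in which customers.id is the primary key:

```sql
-- Subquery form: customers that have at least one order.
SELECT c.id, c.name
FROM customers AS c
WHERE c.id IN (SELECT o.customer_id FROM orders AS o);

-- Join form: DISTINCT removes the duplicates that arise when a customer
-- has more than one order.
SELECT DISTINCT c.id, c.name
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id;
```

Whether the join is actually faster depends on the MySQL version and the data, so time both forms on your own tables.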
Test alternative forms of a query, and run each of them more than once. When you test alternative forms of a query (for example, a subquery versus an equivalent join), run each form several times. If you run each form only once, you will often find that the second query is faster than the first simply because information from the first query is still in the cache, so the second query does not actually need to read it from disk. You should also run the queries when the system load is relatively stable, so that other activity on the system does not affect the results.
Avoid overuse of MySQL's automatic type conversion. MySQL performs automatic type conversion, but you get better performance if you can avoid it. For example, if num_col is an integer column, the following queries return the same result:
SELECT * FROM mytbl WHERE num_col = 4;
SELECT * FROM mytbl WHERE num_col = '4';
However, the second query involves a type conversion. The conversion itself costs some performance, because the integer and the string are both converted to double-precision values for the comparison. More seriously, if num_col is indexed, a comparison that involves type conversion may prevent the index from being used.
The opposite kind of comparison (comparing a string column with a numeric value) also prevents the use of an index. Suppose you have written the following query:
SELECT * FROM mytbl WHERE str_col = 4;
In this example, str _