MySQL query optimizer

Source: Internet
Author: User

The purpose of this article is mainly by telling you what the query optimizer does for us, how we do it, to make the query optimizer optimize our SQL, and to enlighten us on how SQL statements are written to be more efficient. So what exactly is MySQL going to do to optimize that, following the following aspects to explore:

1. Constant conversion

It is able to convert constants in SQL statements, such as the following expression: WHERE col1 = col2 and col2 = ' x '; Based on transitivity: if A=b and b=c, then a=c can be obtained. So the above expression MySQL query optimizer can be optimized as follows: WHERE col1 = ' x ' and col2 = ' x '; For Col1 col2, a similar conversion can be performed as long as one of the following operators is part of the operator: =,<,>,<=,>=,<>,<=>,like

From this we can also see that the case of between is not converted. This may be related to its specific implementation.

2. Exclusion of Invalid code

The query optimizer filters out some useless conditions, such as WHERE 0=0 and column1= ' Y ' because the first condition is always true, so you can remove the condition and change to: where column1= ' Y ' goodbye to the following expression: where (0=1 and s1=5) OR s1=7 because the expression in the previous parenthesis is always false, you can remove the expression and change to: WHERE s1=7

In some cases, you can even remove the entire where clause, see the following expression: where (0=1 and s1=5) we can see that WHERE clause is always falase, then the where condition is not possible. Of course we can also say that the where condition is optimized out.

If a column is defined as NULL, then: where not_null_column is null the condition is always false, and then see: where Not_null_column is not null the condition is always true, Therefore, such expressions can also be removed from the conditional expression.

Of course, there are special cases, such as in out joins, columns that are defined as NOT NULL may also contain null values. In this case, the is null condition is reserved.

Of course, the optimizer does not detect all the cases, because it is too complex. As an example: CREATE TABLE Table1 (column1 CHAR (1));

SELECT * from Table1 WHERE column1 = ' Canada '; Although the condition is an invalid condition, the optimizer does not remove it.

3. Constant calculation

The following expression: where col1 = 1 + 2 is converted to: where col1 = 3 MySQL evaluates a constant expression and then generates the result condition

4. Access type

When we evaluate a conditional expression, MySQL determines the type of access to the expression. Here are some access types, arranged in order from best to worst:

system table, and is a constant table

Const constant Table

Eq_ref Unique/primary Index, and use ' = ' for access

Ref index using ' = ' for access

The ref_or_null index is accessed using ' = ' and may be null

The range index is accessed using between, in, >=, like, and so on

Index index full scan

All table Full Scan

The optimizer chooses the appropriate drive expression based on the access type. Consider the following query statement, which is the reference fragment:

SELECT *  from Table1  WHERE indexed_column=5and  unindexed_column=6  

The Indexed_column has a better access type, so it is more likely to use the expression as a driving expression. Here only the simple case is considered, regardless of special circumstances. So what does it mean to drive an expression? Given that there are two possible ways to execute this query statement:

1) Bad execution path: reads each row of the table (called "Full table Scan"), and for each row read, checks to see if the corresponding value satisfies the indexed_column and unindexed_column corresponding conditions.

2) Good execution path: Find the B-tree by key-value indexed_column=5, and for each row that meets the criteria, determine whether the unindexed_column corresponds to the condition.

In general, index lookups require fewer access paths than full-table scans, especially when the amount of table data is large and the type of the index is unique. So call it a good execution path, using the Indexed_column column as the driver expression.

5. Scope Access Type

Some expressions can use indexes, but range lookups that belong to an index. These expressions usually correspond to the operators:>, >=, <, <=, in, like, between.

For the optimizer, the following expression:

Column1 in (All-in-all)

The expression is equivalent to the following expression:

Column1 = 1 or column1 = 2 or Column1 = 3

And MySQL also thinks that they are equivalent, so there is no need to manually change in to or, or change or to in.

The optimizer uses an index range lookup for the following expression: Column1 like ' X ', but the following expression will not be used for the index: Column1 '%x ', this is because when the first character is a wildcard, there is no way to use the index for scope lookups.

For the optimizer, the following expression: Column1 between 5 and 7 the expression is equivalent to the following expression: Column1 >= 5 and Column1 <= 7 Similarly, MySQL considers them to be equivalent.

If you need to examine too many index key values, the optimizer discards the use of index range lookups instead of using full table scans. This is often the case when the index is a multilevel level two index, with the query condition ' < ' and ' > '.

6. Index access type

The

considers the following query statement: select Column1 from Table1; if Column1 is an indexed column, the optimizer is more likely to choose an index full scan instead of a full scan of the table. This is because the index overrides the columns we need to query.  Consider the following query statement: SELECT column1,column2 from Table1; If the index is defined as follows, then you can use the index full scan: CREATE index ...  On Table1 (COLUMN1,COLUMN2); In other words, all columns that need to be queried must appear in the index. However, the following query can only go through the full table scan: select Col3 from Table1; since col3 is not indexed, it can only go through a full table scan. In fact, the index established in our CN table actually has some problems:

PRIMARY KEY   (' CID '),UNIQUEkey  ' idx_cn_cname ' (' CNAME '),key  ' Index_cn_ Cid_uid ' (' CID ', ' CUSTOMERID '),key  ' Index_cn_prodtype ' (' Prodtype '),key  ' Index_cn_p_c ' (' Prodtype ', ' cnstatus '),KEY ' index_cn_uid ' (' CUSTOMERID ')

For example, the CID is a unique index, and the CID has been able to uniquely determine a record, so it is actually superfluous to index the CID and CustomerID. Similarly, it is problematic to establish a composite index of Prodtype and Cnstatus, and then to build Prodtype indexes, even if you use the Prodtype field as a conditional query, and you do not necessarily use the Prodtype index because they have the same prefix. So the optimizer simply doesn't know which index you want to use, so try to avoid indexing the same prefix.

7. Conversion

MySQL supports conversions for simple expressions. For example, the following syntax: WHERE-5 = Column1 converted to: WHERE Column1 = 5 Nonetheless, there is no conversion to the existence of a mathematical operation. For example, the following syntax: where 5 =-column1 is not converted to: where Column1 =-5, so minimize the operation on the column, and put the operation on the constant. For example, we write SQL when consciously will 5=-columb1=> column1=-5;

8.AND

The query with and is in the format: and, consider the following query statement:

WHERE column1='x' and Column2='y ' 

Steps to optimize:

1) If none of the two columns are indexed, use a full table scan.

2) Otherwise, if one of the columns has a better access type (for example, one has an index and the other has no index; or, one is a unique index and the other is a non-unique index), the column is used as the driver expression.

3) Otherwise, if the two columns each have an index, and the access type corresponding to the two conditions is consistent, then the index is defined first when the index is defined.

Examples are as follows:

CREATE TABLETable1 (S1INT, S2INT);CREATE INDEXIndex1 onTable1 (S2);CREATE INDEXIndex2 onTable1 (S1);SELECT *  fromTable1WHERES1=5  andS2=5;

The optimizer chooses s2=5 as the driver expression because the index on S2 is created earlier.

9.OR

The query format with or is: or, consider the following query statement: WHERE column1= ' x ' OR column2= ' y '

The optimizer makes the choice of using a full table scan. Of course, in some specific cases, you can use index merging, which is not explained here. If the column that is designed in two conditions is the same column, then another case, consider the following query statement: where column1= ' x ' OR column1= ' y ' in this case, the query statement takes an indexed range lookup.

10.UNION

All the query statements with union are optimized separately, consider the following query statement, and here is the reference fragment:

SELECT *   from  Table1   WHERE  column1='x'UNIONAll  SELECT* from Table1  wher  column2='y  '

If both Column1 and Column2 have indexes, each query uses an index query and then merges the result set.

11.not,<>

Consider the following expression: column1<> 5 Logically, the expression is equivalent to the following expression:

Column1<5 OR column1>5 However, MySQL does not make such a conversion. If you think it would be better to use a range lookup, you should convert it manually.

Consider the following expression: where not (column1!=5) logically, the expression is equivalent to the following expression: where column1=5 Likewise, MySQL does not make such conversions

12.ORDER by

In general, the role of order by is to make the result set sorted in a certain order, and if you can produce sequential results without this operation, you can skip the order by operation. Consider the following query statement:

SELECT  from ORDER  by ' x ';

The optimizer removes the ORDER BY clause because the ORDER BY clause here does not make sense. Consider another query statement:

SELECT  from ORDER  by Column1;

In this case, if an index exists on the Column1 class, the optimizer uses that index for a full scan, resulting in an ordered result set that does not require an order by operation.

Consider another query statement: SELECT column1 from Table1 ORDER by column1+1; Assuming an index exists on the column1, we might think that the optimizer will scan the Column1 index completely, and do not perform an order by operation. In fact, this is not the case, the optimizer uses an index on the Column1 column to perform a full sweep, simply because the index full scan is more efficient than the table full scan. The order by sort operation is still performed for the result set of the index full scan.

13.GROUP by

The methods for optimizing the GROUP BY clause and related set functions are listed here:

1) If an index exists, GROUP by uses the index.

2) If there is no index, the optimizer will need to sort, usually using the hash table method.

3) If the situation is similar to "GROUP by x Order by X", the optimizer will find that the ORDER BY clause is not necessary because the result set produced by the group by is sorted by X.

4) Try to elevate the conditions in the HAVING clause to the WHERE clause as much as possible.

5) for MyISAM table, "Select COUNT (*) from Table1;" Returns the result directly without the need for a full scan of the table. However, for InnoDB tables, this rule is not appropriate. To add, if the definition of column1 is not NULL, then the statement "select COUNT (column1) from Table1;" Equivalent to "SELECT COUNT (*) from Table1;".

6) Consider the optimization of Max () and Min (). Consider the following query statement: The following is a reference fragment:

SELECT MAX (Column1)  from Table1 WHERE column1<'a';

If an index exists on the Column1 column, the optimizer uses ' a ' for index positioning and then returns to the previous record.

7) Consider the following query statement:

SELECT DISTINCT  from Table1;

In a particular case, the statement can be translated into:

SELECT  from GROUP  by Column1;

The precondition for conversion is that there is an index on the column1, there is only one single table on the From, there is no where condition and there is no limit condition.

Reference:

Http://blog.sina.com.cn/s/blog_78dab8d20100x973.html

MySQL query optimizer

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.