MySQL query optimizer working principle analysis

Last Update:2016-05-30 Source: Internet

Author: User

Overview of the query optimizer in the manual

The task of the query optimizer is to find the best plan for executing a SQL query. Most query optimizers, including the MySQL query optimizer, perform a more or less exhaustive search for the best plan among all possible query evaluation plans. For join queries, the number of possible plans investigated by the MySQL optimizer grows exponentially with the number of tables referenced in the query. For a small number of tables (typically fewer than 7 to 10), this is not an issue. However, when larger queries are submitted, the time spent on query optimization can easily become a major bottleneck for server performance.
A more flexible approach to query optimization is to let the user control how exhaustively the optimizer searches for the best query evaluation plan. The general idea is that the fewer plans the optimizer investigates, the less time it takes to compile a query. On the other hand, because the optimizer skips some plans, it might miss the best one.
The optimizer's behavior with regard to the number of plans it evaluates can be controlled by two system variables:

The optimizer_prune_level variable tells the optimizer to skip some plans based on an estimate of the number of rows accessed per table. Experience shows that this kind of "educated guess" rarely misses the best plan and can significantly reduce query compilation time. This is why this option is on (optimizer_prune_level=1) by default. However, if you think the optimizer missed a better query plan, this option can be turned off (optimizer_prune_level=0), at the risk that query compilation takes much longer. Note that even with this heuristic, the optimizer still explores a roughly exponential number of plans.
The optimizer_search_depth variable tells the optimizer how far into the "future" of each incomplete plan it should look to assess whether the plan should be expanded further. A small optimizer_search_depth value can greatly reduce query compilation time. For example, if optimizer_search_depth is close to the number of tables in the query, queries with 12, 13, or more tables can easily take hours or even days to compile. By contrast, if the same query is compiled with optimizer_search_depth equal to 3 or 4, compilation can take less than a minute. If a reasonable optimizer_search_depth value cannot be determined, the variable can be set to 0, which tells the optimizer to determine the value automatically.
We can view these variables with SHOW VARIABLES.
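As a minimal sketch of inspecting and adjusting the two variables discussed above (the values chosen here are only illustrative, and the SESSION scope is my assumption so that other connections are not affected):

```sql
-- List optimizer-related system variables, including
-- optimizer_prune_level and optimizer_search_depth.
SHOW VARIABLES LIKE 'optimizer%';

-- Let the optimizer choose the search depth automatically
-- for the current session only.
SET SESSION optimizer_search_depth = 0;

-- Turn off heuristic pruning for this session if you suspect a better
-- plan is being skipped; query compilation may take much longer.
SET SESSION optimizer_prune_level = 0;

-- Restore the default pruning behavior.
SET SESSION optimizer_prune_level = 1;
```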

Note (manual URL: http://doc.mysql.cn/mysql5/refman-5.1-zh.html-chapter)

Personal understanding

From the official manual, it can be understood that MySQL uses a cost-based optimizer to determine the most efficient way to process a query. That is, before executing the query, it selects the plan it judges to be optimal and executes that plan to obtain the result. In many cases MySQL is able to work out the best possible query plan, but in some cases it does not have enough information about the data, or has too much, and its estimates are less reliable.
However, the manual does not seem to say how MySQL actually finds the best plan.
After consulting other material, my personal understanding is as follows.
One of the main goals of the MySQL optimizer is to use an index whenever possible, and to use the most restrictive index so that as many non-matching rows as possible are eliminated. Put plainly, its job is to choose how to use indexes, although the optimizer is also affected by other factors. To make this more concrete, here is an example.
Create a table:

CREATE TABLE t8(id1 INT NOT NULL ,id2 INT NOT NULL,KEY id1_key(`id1`),KEY id2_key(`id2`)) ENGINE=MYISAM DEFAULT CHARSET=utf8;

Insert a few rows of data as follows:

How does the query optimizer optimize when I execute the following query statement?

select * from t8 where id1=1 and id2=0;

Of course, MySQL is not so naive as to start from the first row of t8 and compare id1 and id2 row by row. The optimizer analyzes the table first and learns that there are two indexes, id1_key and id2_key. If it evaluates id1_key first, it needs to exclude 3 rows out of 4; if it evaluates id2_key first, it needs to exclude 1 row out of 2. To a person there is little difference between the two approaches, but for a program it matters: evaluating id2_key first requires less computation and less disk I/O. Therefore the query optimizer decides to examine the id2_key index first and pick out the rows where id2 is 0.
From the EXPLAIN output we can see that the candidate indexes are id1_key and id2_key, but the only index actually used is id2_key.
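The check described above can be sketched roughly as follows, assuming the t8 table created earlier has been populated. The plan the optimizer picks depends on the rows in the table, so no output is shown here.

```sql
-- Refresh the key distribution statistics the optimizer relies on.
ANALYZE TABLE t8;

-- Show the indexes on t8; the Cardinality column is an estimate of
-- the number of distinct values in each index.
SHOW INDEX FROM t8;

-- Ask the optimizer which plan it would choose. Compare the
-- possible_keys column (candidates) with the key column (actual choice).
EXPLAIN SELECT * FROM t8 WHERE id1 = 1 AND id2 = 0;
```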

The same is true if you swap the conditions and execute select * from t8 where id2=0 and id1=1; the order in which the conditions are written makes no difference.

Of course, if the statement is changed to the following:

select * from t8 where id1=5 and id2=0;

In this case the optimizer may use the id1_key index instead.

Now suppose you create a composite index:

ALTER TABLE t8 ADD KEY id1_id2_key(`id1`,`id2`);

At this point, when select * from t8 where id1=1 and id2=0; is executed, the optimizer will certainly consider using the id1_id2_key index.
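A quick way to confirm this, sketched under the assumption that the ALTER TABLE statement above has been run:

```sql
-- Re-run EXPLAIN after adding the composite index. id1_id2_key is
-- expected to appear in possible_keys next to id1_key and id2_key;
-- the key column shows which one the optimizer actually chose.
EXPLAIN SELECT * FROM t8 WHERE id1 = 1 AND id2 = 0;
```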

From the above example, you can see that the query optimizer chooses the index it considers most appropriate for a query. This also reminds us to be careful about which indexes we create. For example, three indexes (id1_key, id2_key, id1_id2_key) were created above, but the optimizer can only choose the most appropriate one at a time. Creating too many indexes not only slows down updates and inserts, it also gives the optimizer more work to do.

Analyzing information about the optimization process

In fact, we have already been looking at information about the optimization process above: it is nothing more than the output of EXPLAIN. Here we go through the meaning of each column in that output:

id: The sequence number of the SELECT within the execution plan chosen by the MySQL query optimizer. It indicates the order in which SELECT clauses or table operations are executed in the query: the higher the id, the higher the priority and the earlier it is executed. Rows with the same id are executed from top to bottom.
select_type: The query type, such as SIMPLE, PRIMARY, UNION, DEPENDENT UNION, and so on.
table: Shows which table this row of output is about.
type: This is an important column that shows which join type is used. From best to worst, the join types are const, eq_ref, ref, range, index, and ALL.
possible_keys: The indexes that could be applied to this table. If it is empty, no index is applicable. In that case, consider indexing a suitable column referenced in the WHERE clause.
key: The index actually used. If NULL, no index is used. In rare cases MySQL chooses a poorly suited index. In that case, USE INDEX (index_name) or FORCE INDEX (index_name) can be used in the SELECT statement to steer MySQL toward an index, or IGNORE INDEX (index_name) to make MySQL ignore an index.
key_len: The length of the index used. The shorter the better, as long as accuracy is not lost.
ref: Shows which columns or constants are compared with the index named in the key column.
rows: The number of rows that MySQL estimates it must examine to return the requested data.
Extra: Additional information about how MySQL resolves the query.
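To see several of these columns vary within one plan, a small sketch using the t8 table from earlier (the choice of a UNION is mine, purely for illustration):

```sql
-- A UNION produces several rows in the EXPLAIN output: the first SELECT
-- is reported with select_type PRIMARY and the second with UNION.
EXPLAIN
SELECT id1 FROM t8 WHERE id1 = 1
UNION
SELECT id2 FROM t8 WHERE id2 = 0;
```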

Tuning the MySQL optimizer

Influencing the selection of indexes

When we execute select * from t8 where id1=1 and id2=0; the optimizer uses the id1_id2_key index, but we can influence the choice of index with FORCE INDEX (or USE INDEX) and IGNORE INDEX.

FORCE INDEX

Use FORCE INDEX (index1[, index2]) or USE INDEX (index1[, index2]) to specify which index to use. You can also list several indexes for the optimizer to pick from.
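For example, a sketch against the t8 table and the indexes defined earlier in this article:

```sql
-- Force the optimizer to use id1_key; with FORCE INDEX a full table
-- scan is treated as very expensive.
SELECT * FROM t8 FORCE INDEX (id1_key) WHERE id1 = 1 AND id2 = 0;

-- Restrict the optimizer to a list of candidate indexes and let it
-- pick among them.
SELECT * FROM t8 USE INDEX (id1_key, id2_key) WHERE id1 = 1 AND id2 = 0;
```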

IGNORE INDEX

You can use IGNORE INDEX (index1[, index2]) to exclude certain indexes, so that the optimizer does not have to consider all of them, which reduces optimization time.
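A short sketch, assuming the composite index id1_id2_key from earlier exists on t8 (the statement fails if the named index does not exist):

```sql
-- Exclude the composite index from consideration for this query only,
-- then check the key column to see what the optimizer falls back to.
EXPLAIN SELECT * FROM t8 IGNORE INDEX (id1_id2_key) WHERE id1 = 1 AND id2 = 0;
```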

Influencing the order in which the optimizer uses tables

In general, the MySQL optimizer decides for itself in which order to scan the tables so that data is retrieved as quickly as possible. Because the optimizer does not always make the best judgment, we can force it to use the tables in a specific order with STRAIGHT_JOIN. The guiding principle is to have the most restrictive selection performed first. STRAIGHT_JOIN can be placed right after SELECT or in the FROM clause.

In my test, table t6 is read first whether the query says from t8,t6 or from t6,t8. With STRAIGHT_JOIN, however, the tables are read in the order given in the SQL.
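The two placements can be sketched as follows. The definition of t6 is not shown in this article, so the schema below (a single indexed id1 column) and the join condition are hypothetical and only serve to make the statements complete.

```sql
-- Hypothetical definition of t6; the real schema is not given in the article.
CREATE TABLE IF NOT EXISTS t6 (
  id1 INT NOT NULL,
  KEY id1_key (id1)
) ENGINE=MYISAM DEFAULT CHARSET=utf8;

-- Without a hint, the optimizer chooses the join order itself.
EXPLAIN SELECT * FROM t8, t6 WHERE t8.id1 = t6.id1;

-- STRAIGHT_JOIN after SELECT: tables are joined in the order listed in FROM.
EXPLAIN SELECT STRAIGHT_JOIN * FROM t8, t6 WHERE t8.id1 = t6.id1;

-- STRAIGHT_JOIN in the FROM clause: the left table (t8) is read before
-- the right table (t6).
EXPLAIN SELECT * FROM t8 STRAIGHT_JOIN t6 ON t8.id1 = t6.id1;
```

Comparing the row order of the three EXPLAIN outputs shows whether the hint changed the join order.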

Why does the optimizer choose to read t6 first? A major reason is that t6 contains less data.

If you delete a few rows from t8, the order in which the MySQL optimizer reads the tables can change accordingly.

Controlling the precedence of SQL statements

On high-concurrency sites, because MySQL gives writes priority by default, some read operations may not get a chance to execute in time. HIGH_PRIORITY can be used in SELECT and INSERT statements to tell MySQL that the operation takes precedence.

LOW_PRIORITY can be used in INSERT and UPDATE statements to tell MySQL that the operation has a lower priority.
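A minimal sketch of where these modifiers go, using the t8 table from earlier (the inserted values are arbitrary placeholders). Note that these modifiers only affect storage engines that use table-level locking, such as MyISAM, which t8 uses.

```sql
-- Let this read jump ahead of pending writes on the table.
SELECT HIGH_PRIORITY * FROM t8 WHERE id1 = 1;

-- Give this insert precedence.
INSERT HIGH_PRIORITY INTO t8 (id1, id2) VALUES (6, 0);

-- Delay these writes until no other clients are reading from the table.
INSERT LOW_PRIORITY INTO t8 (id1, id2) VALUES (7, 0);
UPDATE LOW_PRIORITY t8 SET id2 = 1 WHERE id1 = 7;
```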

INSERT DELAYED tells MySQL that the insert may be deferred.
With INSERT DELAYED INTO, the client submits data to MySQL and MySQL returns an OK status to the client right away. The data is not inserted into the table immediately; instead it is held in memory in a queue, and it is inserted when MySQL is free. Another important benefit is that inserts from many clients are bundled together and written in one block, which is much faster than performing many separate inserts because it requires fewer I/O operations. The downside is that the auto-increment ID cannot be returned, and if the system crashes before MySQL has had time to insert the data, that data is lost.
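In its simplest form this looks like the sketch below (placeholder values again). As far as I know, DELAYED was deprecated in MySQL 5.6, and from MySQL 5.7 on the keyword is accepted but ignored, so the statement runs as an ordinary INSERT with a warning. It therefore only behaves as described on older servers and on engines such as MyISAM.

```sql
-- Queue the row for insertion and return to the client immediately.
INSERT DELAYED INTO t8 (id1, id2) VALUES (8, 0);
```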

Controlling the query cache

In real-world development, some data has very strict real-time requirements, or is used only rarely (a query may run once or twice every few days). For such queries the cache should be bypassed: regardless of whether the SQL statement has been executed before, the server does not look for the data in the cache and reads it from disk each time. If real-time requirements are strict, cached results might not reflect the latest data; and if the data is rarely used, caching it only consumes memory.
In my.ini, query_cache_type controls the query cache. This variable has three values, 0, 1, and 2, which represent OFF, ON, and DEMAND respectively.
0: The query cache is off.
1: The query cache is on. Results of cacheable SELECT statements are stored and reused, except for statements that specify SQL_NO_CACHE, whose results are not cached.
2: Results are cached, and served from the cache, only for queries that specify SQL_CACHE.
On my local machine the query cache is off.
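The current setting and the two per-query modifiers can be tried with something like the following sketch. It applies to older servers only: to my knowledge the query cache was removed in MySQL 8.0, where SQL_CACHE is no longer accepted.

```sql
-- Show the query cache settings, including query_cache_type
-- and query_cache_size.
SHOW VARIABLES LIKE 'query_cache%';

-- With query_cache_type = 2 (DEMAND), request caching explicitly.
SELECT SQL_CACHE * FROM t8 WHERE id1 = 1 AND id2 = 0;

-- With query_cache_type = 1 (ON), opt out of caching for one query.
SELECT SQL_NO_CACHE * FROM t8 WHERE id1 = 1 AND id2 = 0;
```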

For more about the MySQL query cache, see:
(http://blog.csdn.net/hsd2012/article/details/51526707)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
