How the MySQL query optimizer works

Source: Internet
Author: User
The manual's overview of the query optimizer describes its task: finding the best plan for executing an SQL query. Most query optimizers, including MySQL's, perform a more or less exhaustive search for the best plan among all possible query evaluation plans. For join queries, the number of possible plans the MySQL optimizer examines grows exponentially with the number of tables referenced in the query.

Overview of the query optimizer in the manual

The task of the query optimizer is to find the best plan for executing an SQL query. Most query optimizers, including MySQL's, perform a more or less exhaustive search for the best plan among all possible query evaluation plans. For join queries, the number of possible plans the MySQL optimizer examines grows exponentially with the number of tables referenced in the query. This is not a problem for a small number of tables (typically fewer than 7 to 10). However, when larger queries are submitted, the time spent on query optimization can easily become the main bottleneck in server performance.
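To get a feel for this growth, here is a rough count. It assumes, as a simplification that the manual does not state, that the optimizer only has to choose an order in which to join n tables and that every order counts as a separate candidate plan:

```latex
\[
  N(n) = n!, \qquad
  N(7) = 5040, \qquad
  N(10) = 3\,628\,800, \qquad
  N(13) = 6\,227\,020\,800 .
\]
```

Choosing an access method for each table multiplies these counts further, which is why limiting the search matters once a query joins more than a handful of tables.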
A more flexible approach to query optimization lets the user control how exhaustively the optimizer searches for the best query evaluation plan. The general idea is that the fewer plans the optimizer investigates, the less time it takes to compile a query. On the other hand, because the optimizer skips some plans, it may miss the optimal one.
Two system variables control how many plans the optimizer evaluates:

  • The optimizer_prune_level variable tells the optimizer to skip certain plans based on estimates of the number of rows accessed in each table. Our experience shows that this kind of "educated guess" rarely misses the best plan and can greatly reduce query compilation time. This is why the option is on (optimizer_prune_level = 1) by default. However, if you believe the optimizer is missing a better plan, you can switch the option off (optimizer_prune_level = 0), at the risk that query compilation takes much longer. Note that, even with this heuristic, the optimizer still explores a roughly exponential number of plans.

  • The optimizer_search_depth variable tells the optimizer how far into the "future" of each incomplete plan it should look to decide whether the plan should be expanded further. Smaller values of optimizer_search_depth can reduce query compilation time by orders of magnitude. For example, a query with 12, 13, or more tables can take hours or even days to compile if optimizer_search_depth is close to the number of tables in the query. If the same query is compiled with optimizer_search_depth equal to 3 or 4, compilation can take less than a minute. If you cannot determine a reasonable optimizer_search_depth value, set the variable to 0 to tell the optimizer to determine the value automatically.
    We can use SHOW VARIABLES to view these parameters.
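As a minimal sketch, the statements below show how these two variables might be inspected and changed for the current session. The values are only examples, not tuning recommendations.

```sql
-- List the optimizer-related system variables and their current values.
SHOW VARIABLES LIKE 'optimizer%';

-- Turn the pruning heuristic off for this session only (the default is 1).
SET SESSION optimizer_prune_level = 0;

-- Let the optimizer choose the search depth automatically.
SET SESSION optimizer_search_depth = 0;

-- Turn the pruning heuristic back on.
SET SESSION optimizer_prune_level = 1;
```

Using SET SESSION keeps the change local to the current connection, so other clients are not affected while you experiment.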

    Remarks (manual URL: http://doc.mysql.cn/mysql5/refman-5.1-zh.html-chapter)

Personal Understanding

From the official manual, we can gather that MySQL uses a cost-based optimizer to decide how best to process a query: before the query is executed, MySQL picks the plan it considers best and then runs that plan to obtain the results. In many cases MySQL can work out the best possible plan, but when it has too little information about the data, or too much of it to weigh, its estimates can be poor.
However, the manual does not explain how MySQL actually finds the optimal plan.
After consulting other material, my understanding is as follows:
One major goal of the MySQL optimizer is to use an index whenever possible, and to use the most restrictive index so that rows that do not satisfy the conditions are eliminated as early as possible. Put bluntly, it comes down to choosing which index to use. Of course, the optimizer is also influenced by other factors. To make this concrete, consider the following example.
Create a table:

CREATE TABLE t8(id1 INT NOT NULL ,id2 INT NOT NULL,KEY id1_key(`id1`),KEY id2_key(`id2`)) ENGINE=MYISAM DEFAULT CHARSET=utf8;

Insert several rows of data as follows:
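The rows from the original screenshot are not reproduced here. The following hypothetical data set is an assumption, chosen only to match the row counts discussed in this section: four rows have id1=1, two rows have id2=0, and one row has id1=5.

```sql
-- Hypothetical sample rows (not the author's original data).
INSERT INTO t8 (id1, id2) VALUES
  (1, 0),
  (1, 1),
  (1, 2),
  (1, 3),
  (5, 0),
  (2, 4);
```

With a table this small, the plan MySQL actually picks can vary with the server version and the index statistics.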

When I execute the following query statement, how does the query optimizer optimize it?

select * from t8 where id1=1 and id2=0;

Of course, MySQL is not so naive as to start at the first row of t8 and compare id1 and id2 row by row. The optimizer first analyzes the table and finds that the indexes id1_key and id2_key exist. If it uses id1_key first, it has to examine four rows and discard three of them. If it uses id2_key first, it has to examine two rows and discard one. Both approaches return the same result, but using id2_key requires less computation and less disk I/O. The optimizer therefore checks the id2_key index first and picks out the rows where id2 is 0, then filters those by id1.
In the EXPLAIN output, the available indexes include id1_key and id2_key, but only id2_key is actually used.
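The plan can be inspected by prefixing the query with EXPLAIN. This is a sketch of the statement only; the output is omitted because it depends on the data, the index statistics, and the MySQL version.

```sql
-- possible_keys lists the candidate indexes; key shows the one chosen.
EXPLAIN SELECT * FROM t8 WHERE id1 = 1 AND id2 = 0;
```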

If you change the SQL statement to select * from t8 where id1=1 and id2=0; the same is true for execution. For example:

Of course, if you change the statement to the following

select * from t8 where id1=5 and id2=0;

Analyzing the result in the same way, you can see that the id1_key index is used.

Of course, if you create a composite index

ALTER TABLE t8 ADD KEY id1_id2_key(`id1`,`id2`)

Run select * from t8 where id1=1 and id2=0; again and, of course, the id1_id2_key index will be considered.

The example above shows how the query optimizer picks the most suitable index for a query. It is also a reminder to create indexes with care. Suppose three indexes (id1_key, id2_key, and id1_id2_key) have been created: for a given query, the optimizer will generally use only the most suitable one. Creating too many indexes not only slows down updates and inserts but also gives the optimizer more work to do.

Analyzing the optimizer's decisions

In fact, all of the information about the optimization process shown above was obtained with EXPLAIN. The meaning of each column in its output is as follows:

id: The sequence number of the SELECT within the execution plan chosen by the MySQL query optimizer. It indicates the order in which SELECT clauses or table operations are executed. Rows with a higher id have higher priority and are executed first; rows with the same id are executed from top to bottom.
select_type: The query type, such as SIMPLE, PRIMARY, UNION, and DEPENDENT UNION.
table: The table that this row of output refers to.
type: An important column that shows the join type. From best to worst, the join types are const, eq_ref, ref, range, index, and ALL.
possible_keys: The indexes that could be used for this table. If it is NULL, there is no relevant index. You may be able to add a suitable index on the columns used in the WHERE clause.
key: The index actually used. If it is NULL, no index is used. In rare cases, MySQL chooses a poorly optimized index. In that case, you can use USE INDEX (indexname) in the SELECT statement to make MySQL use an index, or IGNORE INDEX (indexname) to make MySQL ignore one.
key_len: The length, in bytes, of the index key used. The shorter the length, the better.
ref: Shows which column is compared against the index. If possible, it is a constant.
rows: The number of rows MySQL estimates it must examine to return the requested data.
Extra: Extra information about how MySQL resolves the query.
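To see values other than SIMPLE in the select_type column, a query with more than one SELECT is needed. The following is a minimal sketch against the t8 table from earlier, using UNION purely as an illustration:

```sql
-- Typically reports the first SELECT as PRIMARY and the second as UNION,
-- plus a UNION RESULT row for the temporary table that merges them.
EXPLAIN
SELECT * FROM t8 WHERE id1 = 1
UNION
SELECT * FROM t8 WHERE id2 = 0;
```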

Influencing the optimizer's index selection

When we execute select * from t8 where id1=1 and id2=0; the optimizer uses the id1_id2_key index, but we can use FORCE INDEX and IGNORE INDEX to influence which index is selected.

Forced Index

You can use FORCE INDEX (index1 [, index2]) or USE INDEX (index1 [, index2]) to specify which index to use. You can also name several indexes and let the optimizer choose among them.
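A minimal sketch of both hints, using the t8 table and index names from the earlier example. The hint goes directly after the table name:

```sql
-- Restrict the optimizer to id1_key; a table scan is treated as very expensive.
SELECT * FROM t8 FORCE INDEX (id1_key) WHERE id1 = 1 AND id2 = 0;

-- Offer two candidate indexes and let the optimizer pick between them.
SELECT * FROM t8 USE INDEX (id1_key, id2_key) WHERE id1 = 1 AND id2 = 0;
```

Prefixing either statement with EXPLAIN shows whether the hint changed the value in the key column.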

Ignore Index

You can use IGNORE INDEX (index1 [, index2]) to ignore certain indexes. The optimizer then does not consider them, which reduces the time it spends on optimization.
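For instance, assuming the composite index id1_id2_key from the earlier ALTER TABLE statement exists, it could be excluded like this:

```sql
-- Exclude the composite index so that only the single-column indexes are considered.
EXPLAIN SELECT * FROM t8 IGNORE INDEX (id1_id2_key) WHERE id1 = 1 AND id2 = 0;
```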

Order in which the optimizer reads tables

Generally, the MySQL optimizer decides for itself the order in which tables are scanned so that data is retrieved as quickly as possible. However, because the optimizer does not always make the best choice, we can use STRAIGHT_JOIN to force it to read the tables in a specific order. The guiding principle is to perform the most restrictive selection first. STRAIGHT_JOIN can be placed right after SELECT or in the FROM clause.
For example


It can be seen that whether we write from t8, t6 or from t6, t8, table t6 is read first. With STRAIGHT_JOIN, however, the tables are read in the order given in the SQL statement.
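The screenshots for this comparison are not reproduced here. As a sketch, assuming a second table t6 already exists (its definition is not given in this article, so the statements below use no join condition), the cases could be compared like this:

```sql
-- The optimizer is free to choose the join order in both of these.
EXPLAIN SELECT * FROM t8, t6;
EXPLAIN SELECT * FROM t6, t8;

-- STRAIGHT_JOIN after SELECT: tables are read in the order listed (t8, then t6).
EXPLAIN SELECT STRAIGHT_JOIN * FROM t8, t6;

-- STRAIGHT_JOIN in the FROM clause: the left table (t8) is read before the right one.
EXPLAIN SELECT * FROM t8 STRAIGHT_JOIN t6;
```

The rows of the EXPLAIN output appear in the order in which MySQL reads the tables, so the first row shows which table is read first.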

Why does the optimizer read t6 first? One major reason is that t6 contains less data.

If you delete several rows from t8, the order in which the MySQL optimizer reads the tables changes.

Control the priority of SQL statements

On high-concurrency websites, because MySQL gives writes priority over reads by default, some read operations may not be executed within an acceptable time. HIGH_PRIORITY can be used in SELECT and INSERT statements to tell MySQL that the operation should take priority.

LOW_PRIORITY can be used in INSERT and UPDATE statements to tell MySQL to lower the priority of the operation.
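A minimal sketch of where these modifiers go, using the t8 table from earlier with made-up values. Note that they only affect storage engines that use table-level locking, such as MyISAM:

```sql
-- Ask for this read to be served ahead of queued writes.
SELECT HIGH_PRIORITY * FROM t8 WHERE id1 = 1;

-- Delay this insert until no other clients are reading from the table.
INSERT LOW_PRIORITY INTO t8 (id1, id2) VALUES (3, 7);

-- Delay this update in the same way.
UPDATE LOW_PRIORITY t8 SET id2 = 8 WHERE id1 = 3;
```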

INSERT DELAYED tells MySQL that the insert may be deferred.
INSERT DELAYED INTO: when the client submits data, MySQL returns OK to the client immediately. The row is not yet inserted into the table at that point; it is held in memory in a queue and inserted later, when MySQL is free. Another important benefit is that inserts from many clients are bundled together and written in one block. This is much faster than executing many separate inserts because it needs fewer I/O operations. The disadvantages are that the auto-increment ID cannot be returned, and that if the system crashes, MySQL loses any rows that have not yet been written.
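A minimal sketch of the syntax, using the t8 table from earlier (a MyISAM table, one of the engines that supports delayed inserts) with made-up values. INSERT DELAYED was deprecated in MySQL 5.6, and newer servers accept the keyword but perform an ordinary insert, so this applies mainly to the older versions this article is based on.

```sql
-- The server acknowledges immediately and queues the row for a later write.
INSERT DELAYED INTO t8 (id1, id2) VALUES (4, 9);
```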

Controlling the query cache

In real-world development, some data must be highly up to date, or is queried only rarely (perhaps once or twice every few days). In such cases you should bypass the cache, so that the server does not look in the cache and reads the data from disk every time, regardless of whether the same SQL statement has been executed before. If freshness requirements are very strict, the worry is that cached results might lag behind what is on disk. If the data is rarely used, caching it wastes memory.
In my.ini, query_cache_type controls the query cache. This variable has three values, 0, 1, and 2, which stand for OFF, ON, and DEMAND.
0: The query cache is disabled.
1: Query results are always cached, except for statements that begin with SELECT SQL_NO_CACHE.
2: Query results are cached only for statements that begin with SELECT SQL_CACHE.
On my local machine, for example, the query cache is disabled.
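As a sketch, the current settings can be checked and the per-statement modifiers applied as follows. These statements assume a server version that still includes the query cache; it was removed in MySQL 8.0.

```sql
-- Show the current query cache settings (query_cache_type, query_cache_size, ...).
SHOW VARIABLES LIKE 'query_cache%';

-- With query_cache_type = 1: skip the cache for this one statement.
SELECT SQL_NO_CACHE * FROM t8 WHERE id1 = 1;

-- With query_cache_type = 2: cache the result of this one statement.
SELECT SQL_CACHE * FROM t8 WHERE id1 = 1;
```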

For more information about the MySQL query cache, see http://blog.csdn.net/hsd2012/article/details/51526707
