MySQL join query

Contents
    • 1. MySQL select Process
    • 2. MySQL optimization rules
    • 3. Suggestions and summary

Generally, if you want to design a small database whose code must still meet the performance demands of massive data volumes and heavy access, the most effective approach is to choose one or a few high-performance core algorithms as the engine for the main application scenarios, and then fold the secondary scenarios into that engine as special cases or variants of those algorithms.
MySQL and PostgreSQL both take this approach. In MySQL's SELECT processing, the core algorithm is the join algorithm, and other kinds of queries are brought as close to a join as possible: a single-table query is treated as a special case of a join, subqueries are converted to joins wherever possible, and so on.
Based on the MySQL 5.0 source code, this article briefly analyzes the process MySQL follows to handle join queries and the ideas behind it.

1. MySQL select Process

[Figure: main function call flow after a SELECT statement is passed to the MySQL server]
In the figure, each rectangle represents a function. An arrow starts at the caller and ends at the called function; an arrow pointing to a large box means that the caller calls the group of functions inside it, roughly in top-to-bottom order.
Note: the SQL statements discussed here do not contain a UNION clause. MySQL processes such statements with a separate UNION engine, and UNION queries are rarely used in general development.

Among these functions, mysql_execute_command() is MySQL's unified entry point for processing all kinds of SQL statements. After basic syntax parsing, a statement is passed to this function, which analyzes it further and calls the corresponding handler for subsequent processing. For SELECT statements, handle_select() and mysql_select() are called.
handle_select() handles SELECT statements that contain a UNION clause; for a simple statement without UNION, it calls mysql_select() directly.
mysql_select() is the entry point for SELECT statements without a UNION clause. Normally, each call invokes the JOIN class's prepare(), optimize(), and exec() functions in sequence; together they perform the preprocessing, optimization, and execution (including result output) of the SELECT statement.
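The dispatch path just described can be modeled as a toy Python call chain. The function names mirror those in the MySQL 5.0 source (mysql_union is assumed here as the name of the separate UNION path), but the bodies are hypothetical placeholders that only record the order of calls; they do no real parsing or execution.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Join:
    """Toy stand-in for MySQL's JOIN class; each step only records its name."""
    query: dict
    trace: List[str] = field(default_factory=list)

    def prepare(self) -> None:
        self.trace.append("JOIN::prepare")    # checks, name resolution, subquery conversion

    def optimize(self) -> None:
        self.trace.append("JOIN::optimize")   # rule-based rewrites, table order

    def exec(self) -> None:
        self.trace.append("JOIN::exec")       # do_select() and result output


def mysql_select(query: dict, trace: List[str]) -> None:
    trace.append("mysql_select")
    join = Join(query, trace)                 # shares the caller's trace list
    join.prepare()
    join.optimize()
    join.exec()


def mysql_union(query: dict, trace: List[str]) -> None:
    trace.append("mysql_union")               # separate UNION engine, not modeled


def handle_select(query: dict, trace: List[str]) -> None:
    trace.append("handle_select")
    if query.get("union"):
        mysql_union(query, trace)
    else:
        mysql_select(query, trace)            # no UNION: go straight to mysql_select()


def mysql_execute_command(query: dict) -> List[str]:
    trace = ["mysql_execute_command"]         # unified entry point after parsing
    if query.get("type") == "select":
        handle_select(query, trace)
    return trace


print(mysql_execute_command({"type": "select"}))
# ['mysql_execute_command', 'handle_select', 'mysql_select', 'JOIN::prepare', 'JOIN::optimize', 'JOIN::exec']
```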
JOIN::prepare() is the preprocessing function. It mainly checks the validity of parameters, performs further syntax analysis, builds a more precise internal description of the query, opens the tables involved, and converts subqueries.
JOIN::optimize() is the key to the entire SELECT process. It optimizes the description structures produced by the previous step, based on a large number of rules that are described in the next section. Rather than walking through the individual functions that JOIN::optimize() calls, we summarize the optimization rules they implement.
JOIN::exec() also performs some runtime optimization, so the actual execution may differ from what EXPLAIN shows. In most cases, however, JOIN::exec() follows the plan produced by JOIN::optimize(), so the join process is essentially determined during optimization.
The part of JOIN::exec() most relevant to joins is its call to do_select() to fetch data. do_select() calls sub_select(), which works recursively: following the dependency order of the tables, it joins each table with the next one to produce the final result set.
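The recursion in do_select()/sub_select() is a nested-loop join. The following Python sketch shows the idea under simplifying assumptions: each table is a list of dicts, each plan step carries a hypothetical predicate for the conditions attached to that table, and rows are merged into one dict keyed by "table.column". A single-table query is simply a plan with one step, which is why it can be treated as a special case of a join.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# (table name, rows, condition checked once this table's row is added)
PlanStep = Tuple[str, List[Row], Callable[[Row], bool]]


def sub_select(plan: List[PlanStep], depth: int, partial: Row, out: List[Row]) -> None:
    """Read table `depth`; for each row that passes its condition, recurse."""
    if depth == len(plan):
        out.append(dict(partial))             # every table joined: emit a result row
        return
    name, rows, cond = plan[depth]
    for row in rows:
        # merge this table's columns into the partial row as "table.column"
        candidate = {**partial, **{f"{name}.{col}": val for col, val in row.items()}}
        if cond(candidate):
            sub_select(plan, depth + 1, candidate, out)


def do_select(plan: List[PlanStep]) -> List[Row]:
    out: List[Row] = []
    sub_select(plan, 0, {}, out)
    return out


t1 = [{"a": 1}, {"a": 2}]
t2 = [{"a": 1, "b": 10}, {"a": 1, "b": 11}, {"a": 3, "b": 12}]
two_tables = [("t1", t1, lambda r: True),
              ("t2", t2, lambda r: r["t2.a"] == r["t1.a"])]
one_table = [("t1", t1, lambda r: r["t1.a"] > 1)]   # single-table query
print(do_select(two_tables))   # [{'t1.a': 1, 't2.a': 1, 't2.b': 10}, {'t1.a': 1, 't2.a': 1, 't2.b': 11}]
print(do_select(one_table))    # [{'t1.a': 2}]
```

Passing the partial row down the recursion is what lets a condition on a later table refer to columns of earlier tables.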
Returning the result set is also handled in JOIN::exec(): results may be written to a temporary table, written to a file, or sent to a socket. These details are not the focus here.

2. MySQL optimization rules

The MySQL optimizer is rule-based, so flaws in the rules can cost some applications performance. Unlike in some advanced large databases, such a loss in MySQL may be permanent (for a given version): large databases collect statistics on how various optimization results perform during execution and use them to automatically improve later optimization, whereas MySQL currently has no such feature. Understanding MySQL's optimization rules is therefore a valuable guide to designing SQL statements and improving execution efficiency.
The following are some of the rules MySQL 5 applies when processing SELECT queries.

Rule 1: If the query involves only constant tables, the DISTINCT clause is removed. Otherwise, if only one table is involved, DISTINCT is converted into a GROUP BY query in the following cases:

    • GROUP BY can be resolved through an index (so no sorting is needed), and ORDER BY uses only selected columns. In this case, the optimizer can later optimize away both GROUP BY and ORDER BY.
    • The whole table must be scanned without an effective LIMIT.

This usually happens in the following two scenarios:

    • SQL_CALC_FOUND_ROWS is used in the query.
    • The ORDER BY clause cannot be optimized away.

When the SELECT statement contains a LIMIT clause (whenever LIMIT is mentioned below, assume that SQL_CALC_FOUND_ROWS is absent), the optimizer does not apply this rule. Instead, it creates a temporary table, stops once the table holds the number of records that LIMIT allows, and returns them.
Note: combining LIMIT with DISTINCT, GROUP BY, or ORDER BY is complex. In that case, LIMIT mainly reduces the time spent sending records, and you should not expect much more speedup from it, because each of those three clauses may require the same amount of computation, or even slightly more, whether or not LIMIT is present.

This rule relies on the concept of a constant table. Constant tables include the following types:

    • A table with no rows or only one row.
    • A table restricted by WHERE conditions of the form "column = constant" on all columns of its primary key, or on all columns of one of its unique keys (provided that the unique columns are also defined as NOT NULL).
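Rule 1 and the table definition it relies on can be sketched as a small decision function. The Table class, the flag names, and the way WHERE conditions are summarized (a set of columns bound to constants) are hypothetical simplifications, not MySQL's internal structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Table:
    name: str
    row_count: int
    primary_key: List[str]
    unique_not_null_keys: List[List[str]] = field(default_factory=list)


def is_constant_table(t: Table, bound_cols: Set[str]) -> bool:
    """bound_cols: columns of t fixed by WHERE conditions of the form col = constant."""
    if t.row_count <= 1:
        return True                                             # zero rows or one row
    keys = [t.primary_key] + t.unique_not_null_keys
    return any(key and set(key) <= bound_cols for key in keys)  # every key column bound


def plan_distinct(tables: List[Table], bound: Dict[str, Set[str]], *,
                  group_by_via_index: bool, order_by_selected_only: bool,
                  has_limit: bool, calc_found_rows: bool = False,
                  order_by_unoptimizable: bool = False) -> str:
    """Return what Rule 1 does with DISTINCT for the given (hypothetical) query facts."""
    if all(is_constant_table(t, bound.get(t.name, set())) for t in tables):
        return "remove DISTINCT"
    if len(tables) == 1:
        index_case = group_by_via_index and order_by_selected_only
        full_scan = not has_limit or calc_found_rows or order_by_unoptimizable
        if index_case or full_scan:
            return "convert to GROUP BY"
    return "keep DISTINCT"   # e.g. with LIMIT: temporary table holding LIMIT rows


orders = Table("orders", 100_000, ["id"])
print(plan_distinct([orders], {"orders": {"id"}}, group_by_via_index=False,
                    order_by_selected_only=False, has_limit=False))   # remove DISTINCT
print(plan_distinct([orders], {}, group_by_via_index=False,
                    order_by_selected_only=False, has_limit=False))   # convert to GROUP BY
print(plan_distinct([orders], {}, group_by_via_index=False,
                    order_by_selected_only=False, has_limit=True))    # keep DISTINCT
```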

Rule 2: The optimizer considers creating a temporary table in the following situations:

    • The SELECT statement contains DISTINCT (any DISTINCT that Rule 1 can optimize has already been optimized away).
    • ORDER BY or GROUP BY refers to columns of the second or a later table in the join order.
    • ORDER BY and GROUP BY use different column lists, or the sort is complex.
    • The user requests buffered results (SQL_BUFFER_RESULT).
    • A LIMIT clause is used.

A temporary table is considered only when not all tables have already been read as constant tables.
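The conditions of Rule 2 can be written as a checklist. QueryShape and its fields are hypothetical; the "sort is complex" condition is left out because the rule does not define it precisely.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class QueryShape:
    all_tables_constant: bool            # every table already read as a constant table
    distinct: bool                       # DISTINCT still present after Rule 1
    order_by: Optional[List[str]]        # columns as "table.column"
    group_by: Optional[List[str]]
    first_table: str                     # first table in the join order
    buffer_result: bool = False          # SQL_BUFFER_RESULT
    has_limit: bool = False


def tables_of(cols: Optional[List[str]]) -> Set[str]:
    return {c.split(".")[0] for c in cols or []}


def temp_table_reasons(q: QueryShape) -> List[str]:
    """List the Rule 2 conditions that hold; an empty list means no temporary table."""
    if q.all_tables_constant:
        return []
    reasons = []
    if q.distinct:
        reasons.append("DISTINCT")
    if (tables_of(q.order_by) | tables_of(q.group_by)) - {q.first_table}:
        reasons.append("ORDER BY/GROUP BY on a later table")
    if q.order_by and q.group_by and q.order_by != q.group_by:
        reasons.append("ORDER BY differs from GROUP BY")
    if q.buffer_result:
        reasons.append("SQL_BUFFER_RESULT")
    if q.has_limit:
        reasons.append("LIMIT")
    return reasons


print(temp_table_reasons(QueryShape(False, False, ["t2.b"], None, "t1")))
# ['ORDER BY/GROUP BY on a later table']
```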

Rule 3: Convert outer joins into inner joins wherever possible, and remove unnecessary join nesting as far as possible; correspondingly, the conditions in the ON clause are moved into the WHERE clause.
If the WHERE clause, or the ON clause of an enclosing nested join, contains a condition that rejects NULL values for the inner tables of an outer join, that outer join can be replaced by an inner join.
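To see why a NULL-rejecting condition allows the conversion, the sketch below models SQL NULL as Python None, uses a three-valued less-than, and compares LEFT JOIN + WHERE with INNER JOIN + WHERE on a tiny made-up data set. The helpers are hypothetical, and the null-rejection test only covers predicates that reference the inner table's columns alone.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]


def lt(x: Optional[int], y: Optional[int]) -> Optional[bool]:
    """SQL-style '<': a NULL operand makes the result UNKNOWN (None)."""
    return None if x is None or y is None else x < y


def rejects_nulls(pred: Callable[[Row], Optional[bool]], inner_cols: List[str]) -> bool:
    """True if pred filters out a row whose inner-table columns are all NULL."""
    null_row: Row = {c: None for c in inner_cols}
    return pred(null_row) is not True        # FALSE and UNKNOWN both drop the row


def join(outer: List[Row], inner: List[Row], on, inner_cols: List[str], left: bool) -> List[Row]:
    out: List[Row] = []
    for o in outer:
        matches = [{**o, **i} for i in inner if on({**o, **i}) is True]
        if not matches and left:
            matches = [{**o, **{c: None for c in inner_cols}}]   # NULL-complemented row
        out.extend(matches)
    return out


def where(rows: List[Row], pred) -> List[Row]:
    return [r for r in rows if pred(r) is True]


t1 = [{"t1.a": 1}, {"t1.a": 2}, {"t1.a": 3}]
t2 = [{"t2.a": 1, "t2.b": 3}, {"t2.a": 2, "t2.b": 9}]
cols = ["t2.a", "t2.b"]
on = lambda r: r["t2.a"] == r["t1.a"]
b_lt_5 = lambda r: lt(r["t2.b"], 5)          # WHERE t2.b < 5
b_is_null = lambda r: r["t2.b"] is None      # WHERE t2.b IS NULL

print(rejects_nulls(b_lt_5, cols))           # True: conversion allowed
print(rejects_nulls(b_is_null, cols))        # False: must stay a LEFT JOIN
print(where(join(t1, t2, on, cols, left=True), b_lt_5)
      == where(join(t1, t2, on, cols, left=False), b_lt_5))   # True
```

The IS NULL case shows why the check matters: that predicate keeps exactly the NULL-complemented rows, which an inner join would never produce.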

For example, in
select * from t1 left join t2 on t2.a = t1.a where t2.b < 5
the condition t2.b < 5 rejects NULLs, so the query is first converted to
select * from t1 inner join t2 on t2.a = t1.a where t2.b < 5
and then to the equivalent form
select * from t1, t2 where t2.b < 5 and t2.a = t1.a

Similarly, the query
select * from t1 left join (t2, t3) on t2.a = t1.a and t3.b = t1.b where t2.c < 5
is converted to
select * from t1, (t2, t3) where t2.c < 5 and t2.a = t1.a and t3.b = t1.b

One conversion may trigger another. The query
select * from t1 left join t2 on t2.a = t1.a
left join t3 on t3.b = t2.b
where t3.b is not null
is converted to
select * from t1 left join t2 on t2.a = t1.a, t3
where t3.b is not null and t3.b = t2.b
and then to
select * from t1, t2, t3
where t3.b is not null and t3.b = t2.b and t2.a = t1.a
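The cascade in the last example can be sketched as follows, assuming a left-deep chain of LEFT JOINs whose ON conditions are all equalities (an equality rejects NULLs for every table it references). This is a simplified illustration, not MySQL's actual join-simplification code.

```python
from typing import List, Set, Tuple

# Each LEFT JOIN in a left-deep chain: (inner table, tables referenced by its ON clause)
LeftJoin = Tuple[str, Set[str]]


def convertible_left_joins(joins: List[LeftJoin], where_rejects: Set[str]) -> List[str]:
    """Return inner tables whose LEFT JOIN can become an INNER JOIN, in the order found."""
    rejected = set(where_rejects)             # tables the WHERE clause rejects NULLs for
    converted: List[str] = []
    for inner, on_tables in reversed(joins):  # the last join first, as in the example
        if inner in rejected:
            converted.append(inner)
            # the ON equality moves into WHERE and now rejects NULLs for the other tables
            rejected |= on_tables - {inner}
    return converted


chain = [("t2", {"t1", "t2"}),   # t1 LEFT JOIN t2 ON t2.a = t1.a
         ("t3", {"t2", "t3"})]   # ... LEFT JOIN t3 ON t3.b = t2.b
print(convertible_left_joins(chain, {"t3"}))   # ['t3', 't2']
print(convertible_left_joins(chain, {"t2"}))   # ['t2']
```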

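Rule 4: Merge multiple equality conditions into multiple equalities wherever possible. For example, t1.a = t2.a and t2.a = t3.a are combined into a single multiple equality over t1.a, t2.a, and t3.a, from which t1.a = t3.a also follows.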
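Read as grouping chained equalities into multiple equalities, Rule 4 amounts to building equivalence classes. The sketch below groups equalities with a union-find structure, which is our choice for illustration rather than MySQL's implementation; a class that contains a constant tells the optimizer that every column in it equals that constant.

```python
from typing import Dict, List, Set, Tuple, Union

Term = Union[str, int]   # a column name such as "t1.a", or an integer constant


def build_equal_classes(equalities: List[Tuple[Term, Term]]) -> List[Set[Term]]:
    parent: Dict[Term, Term] = {}

    def find(x: Term) -> Term:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving keeps chains short
            x = parent[x]
        return x

    for left, right in equalities:
        parent[find(left)] = find(right)      # merge the two classes

    classes: Dict[Term, Set[Term]] = {}
    for term in list(parent):
        classes.setdefault(find(term), set()).add(term)
    return list(classes.values())


# t1.a = t2.a AND t2.a = t3.a AND t3.a = 5: one class, so every column equals 5
print(build_equal_classes([("t1.a", "t2.a"), ("t2.a", "t3.a"), ("t3.a", 5)]))
```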

Rule 5: Wherever possible, apply ORDER BY to the result set rather than to the source set.
However, if the ON clause of a join contains an equality or inequality comparison (other than "!=") in which neither side is a constant, the source set may be sorted first and then joined.
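Sorting the source set can stand in for sorting the result because a nested-loop join emits rows in the order of its outer table. The sketch below assumes the ORDER BY column belongs to the first (outer) table and uses made-up data; Python's sort is stable, so both approaches produce exactly the same sequence.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def nested_loop_join(outer: List[Row], inner: List[Row],
                     on: Callable[[Row, Row], bool]) -> List[Tuple[Row, Row]]:
    # output follows the order of the outer table
    return [(o, i) for o in outer for i in inner if on(o, i)]


t1 = [{"id": 3, "k": 1}, {"id": 1, "k": 2}, {"id": 2, "k": 1}]
t2 = [{"k": 1, "v": "x"}, {"k": 2, "v": "y"}, {"k": 1, "v": "z"}]
on = lambda o, i: o["k"] == i["k"]
by_id = lambda r: r["id"]

# ORDER BY t1.id applied to the joined result (5 rows sorted)
sort_result = sorted(nested_loop_join(t1, t2, on), key=lambda pair: by_id(pair[0]))
# ORDER BY t1.id applied to the source table first (3 rows sorted)
sort_source = nested_loop_join(sorted(t1, key=by_id), t2, on)
print(sort_result == sort_source)   # True
```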

Rule 6: If an index contains all the columns that the SELECT statement needs, that index is preferred.
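A minimal sketch of the covering-index check, assuming each index is described by its column list; the index names and the "fewest columns" tie-breaker are hypothetical choices, not MySQL's cost model.

```python
from typing import Dict, List, Optional, Set


def pick_covering_index(indexes: Dict[str, List[str]], needed: Set[str]) -> Optional[str]:
    """Return an index containing every needed column, or None if none covers the query."""
    covering = [name for name, cols in indexes.items() if needed <= set(cols)]
    # hypothetical tie-breaker: the narrowest covering index reads the least data
    return min(covering, key=lambda name: len(indexes[name]), default=None)


indexes = {"PRIMARY": ["id"], "idx_a_b": ["a", "b"], "idx_a_b_c": ["a", "b", "c"]}
print(pick_covering_index(indexes, {"a", "b"}))   # idx_a_b
print(pick_covering_index(indexes, {"a", "d"}))   # None
```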

Rule 7: Convert subqueries into joins wherever possible.
Subqueries often require large amounts of temporary table storage and run much more slowly than the equivalent joins.
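The sketch below, with hypothetical data, shows what a subquery-to-join conversion has to preserve for `WHERE t1.a IN (SELECT b FROM t2)`: IN only tests membership, so duplicate values in t2.b must not repeat rows of t1. A plain inner join would repeat them; a semi-join, built here with a Python set, does not.

```python
from typing import Dict, List

Row = Dict[str, int]


def in_subquery(t1: List[Row], t2: List[Row]) -> List[Row]:
    # naive evaluation: rerun the subquery for every outer row
    return [r for r in t1 if r["a"] in [s["b"] for s in t2]]


def plain_join(t1: List[Row], t2: List[Row]) -> List[Row]:
    # SELECT t1.* FROM t1 JOIN t2 ON t1.a = t2.b: repeats t1 rows for duplicate t2.b
    return [r for r in t1 for s in t2 if r["a"] == s["b"]]


def semi_join(t1: List[Row], t2: List[Row]) -> List[Row]:
    # join that keeps each t1 row at most once: distinct t2.b values built once
    keys = {s["b"] for s in t2}
    return [r for r in t1 if r["a"] in keys]


t1 = [{"a": 1}, {"a": 2}, {"a": 3}]
t2 = [{"b": 1}, {"b": 1}, {"b": 3}]
print(in_subquery(t1, t2))   # [{'a': 1}, {'a': 3}]
print(plain_join(t1, t2))    # [{'a': 1}, {'a': 1}, {'a': 3}]
print(semi_join(t1, t2))     # [{'a': 1}, {'a': 3}]
```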

Rule 8: Where allowed, reorder the joined tables to improve execution speed.
Tables with less data may be processed first and tables with more data later. However, if an ON clause explicitly specifies a dependency between tables, the order cannot be changed in a way that violates that dependency.
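Rule 8 can be illustrated with a greedy ordering that puts smaller tables first while respecting dependencies, such as the inner table of a LEFT JOIN having to follow its outer table. The row counts and dependency map are hypothetical, and this heuristic follows the rule as stated rather than MySQL's actual join-order search.

```python
from typing import Dict, List, Set


def order_tables(rows: Dict[str, int], depends_on: Dict[str, Set[str]]) -> List[str]:
    """Greedy order: smallest ready table first; a table is ready once its dependencies are placed."""
    order: List[str] = []
    remaining = set(rows)
    while remaining:
        ready = [t for t in remaining if depends_on.get(t, set()) <= set(order)]
        if not ready:
            raise ValueError("circular dependency between tables")
        nxt = min(ready, key=lambda t: (rows[t], t))   # fewest rows; name breaks ties
        order.append(nxt)
        remaining.remove(nxt)
    return order


sizes = {"big": 1_000_000, "mid": 5_000, "small": 10}
print(order_tables(sizes, {}))                    # ['small', 'mid', 'big']
# big LEFT JOIN small: small must follow big
print(order_tables(sizes, {"small": {"big"}}))    # ['mid', 'big', 'small']
```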

 

3. Suggestions and summary

We studied the MySQL source code hoping to find new SQL optimization ideas by understanding how MySQL processes queries. Unfortunately, reading the source shows that the join algorithm is kept as simple and core-focused as possible. Among the tens of thousands of lines of code, the core of the join algorithm is only a few lines of nested-loop-based sorting and merging; the vast majority of the code handles complex preparatory work, such as optimizing various conditions, converting outer join conditions into inner join conditions, and reordering the execution of the various clauses. All of this processing serves one purpose: to let a SELECT statement run through the core join algorithm and output results as quickly as possible. In addition, the optimization of particular kinds of SELECT statements is spread across several separate pieces of code that read like patches.
As a result, studying the MySQL source code can settle some specific questions, but it is hard to find measures beyond standard database theory that would greatly improve query performance. The design and speed of MySQL's join algorithm appear sound; even the join algorithms of large databases such as Oracle may not be faster. Oracle's hash join under its cost-based optimizer (CBO) performs well when the two tables differ greatly in size, but hash join has only narrow use in small databases and is not their main direction of development (MySQL also uses a hash join algorithm in its index join process, but this is not a focus of development).
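For comparison, here is a minimal hash join next to a nested-loop join. The hash join builds a hash table on the smaller input and probes it once per row of the larger input, which is why it pays off when table sizes differ greatly. Rows are hypothetical (id, join_key) tuples; this is a textbook sketch, not Oracle's or MySQL's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int]   # (id, join_key)


def nested_loop_join(outer: List[Row], inner: List[Row]) -> List[Tuple[Row, Row]]:
    # compares every outer row with every inner row: O(len(outer) * len(inner))
    return [(o, i) for o in outer for i in inner if o[1] == i[1]]


def hash_join(build: List[Row], probe: List[Row]) -> List[Tuple[Row, Row]]:
    # build phase: hash the smaller input on its join key
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for b in build:
        buckets[b[1]].append(b)
    # probe phase: one lookup per row of the larger input
    return [(b, p) for p in probe for b in buckets.get(p[1], [])]


small = [(1, 10), (2, 20)]
large = [(100, 10), (101, 30), (102, 10), (103, 20)]
print(sorted(hash_join(small, large)) == sorted(nested_loop_join(small, large)))   # True
```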
Nevertheless, code analysis does yield some tips for taking full advantage of MySQL's core algorithm while avoiding its weaknesses.

    • 1. Use JOIN more, and IN subqueries and UNION less.
      MySQL has put a great deal of effort into optimizing the join process, and joins are indeed fast. Most subqueries and UNION operations are easy for people to understand but hard for the optimizer to handle, so its optimizations for them are inherently limited and they execute much more slowly.
    • 2. For multi-table queries, if you are sure that processing the tables in a fixed order is more efficient, add STRAIGHT_JOIN so that the optimizer skips reordering the tables.
      On the one hand, this prevents the optimizer from choosing a suboptimal table order; on the other hand, even when the optimizer would find the optimal order, computing it can take MySQL a long time.
      In the latter case, you can choose the table order based on the output of EXPLAIN and then fix it with STRAIGHT_JOIN. This assumes that the relative data volumes of the tables stay in a stable order; if the table sizes change so that this order no longer holds, the fixed order becomes counterproductive.
      For SQL statements that are frequently called, this method works better. The more tables you operate on, the better the effect.
    • 3. Use columns of the same type for comparison.
      Although the current version of MySQL supports comparisons between columns of different types, in practice such comparisons make performance unstable, sometimes astonishingly bad. Implicit type conversion also wastes a lot of time and often prevents the optimizer from using an index, forcing it to traverse all records, which is clearly not what we want. Therefore, pay close attention to type matching for columns that are queried frequently.
    • 4. Convert constants yourself so that both sides of a comparison have the same type. Do not count on the optimizer to do this; it does not handle it well.
    • 5. In conditional expressions, avoid applying type conversions or other functions to indexed columns; otherwise, the optimizer may not be able to use the index.
    • 6. Comparing strings is much slower than comparing integers. When designing a database, do not use strings for fields that can be represented as numbers. This hardly needs explaining.
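Tips 3 to 5 can be illustrated by treating a sorted Python list as an index on a string column. A lookup by string can binary-search the index, but a lookup by number (or through any function applied to the column) cannot, because many different strings convert to the same number and they are interleaved with non-matching entries such as '0' and '10'. The to_number helper only approximates MySQL's string-to-number conversion, which accepts leading spaces and ignores trailing non-numeric characters.

```python
import bisect
import re
from typing import List


def to_number(s: str) -> float:
    """Rough imitation of MySQL's cast: numeric prefix after leading spaces, else 0."""
    m = re.match(r"\s*[-+]?\d+(\.\d*)?", s)
    return float(m.group()) if m else 0.0


# A sorted list stands in for an index on a string column.
index: List[str] = sorted(["1", " 1", "01", "00001", "0", "1.0", "10", "2", "1abc"])

# WHERE str_col = '1': binary search touches only the matching slice.
lo = bisect.bisect_left(index, "1")
hi = bisect.bisect_right(index, "1")
by_string = index[lo:hi]

# WHERE str_col = 1: every entry must be converted and checked (full scan).
by_number = [s for s in index if to_number(s) == 1]

print(index)       # [' 1', '0', '00001', '01', '1', '1.0', '10', '1abc', '2']
print(by_string)   # ['1']
print(by_number)   # [' 1', '00001', '01', '1', '1.0', '1abc']
```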

References: MySQL Internals Manual; Pro MySQL
