MySQL JOIN query details

Source: Internet
Author: User
Generally, if you want to design a small database (with less code), but want to adapt to the performance requirements of massive data and access, the most effective method is for the main application scenarios.

Generally, if you want to design a small database (with less code), but want to adapt to the performance requirements of massive data and access, the most effective method is for the main application scenarios.

Generally, if you want to design a small database (with less code), but want to meet the performance requirements of massive data and access, the most effective method is to select one or more excellent Core algorithms as the engine for main application scenarios, then try to implant some non-main application scenarios into the engine as a special case or variant of the algorithm.
This is what MySQL and PostgreSQL do. In MySQL's SELECT query, the core algorithm is the JOIN query algorithm. Other query statements move closer to JOIN: Single-table queries are treated as special cases of JOIN; subqueries are converted to JOIN queries as much as possible ......
Starting with the source code of MySQL 5.0, we will briefly analyze the processes and ideas for MySQL to process JOIN queries.

1. MySQL SELECT Process
Is the main function flow chart after a select SQL statement is passed to the MySQL server. Each rectangle in the graph represents a function. The starting point of the arrow is the caller, And the ending point is the called function. The arrow points to a large box, indicating that the caller has called a group of functions, and the order is basically from top to bottom.
Note: The SQL statements mentioned here do not contain the UNION clause. because MySQL uses a separate UNION engine to process the corresponding SQL statements, we rarely use UNION queries for general development.
Among all the above functions, the mysql_excute_command () function is the unified entry for MySQL to process various SQL statements. After simple Syntax Parsing, the SQL statement is sent here for further analysis by the function, and the corresponding handle interface is called for subsequent processing. For SELECT-related statements, handle_select () and mysql_select () are called.
Handle_select () can be used to process conditions where the SELECT statement contains the UNION clause. In a simple structure without UNION, the mysql_select () function is also called directly.
The mysql_select () function is the entry point function of the SELECT statement without the UNION clause. Normally, after each call, it will call the prepare (), optimize (), exec () of the JOIN class in sequence () the three functions are used to complete the preprocessing, optimization, execution, and result output functions of SELECT statements.
JOIN: prepare () is a pre-processing function. This module mainly checks the validity of parameters, performs syntax analysis, generates more accurate computer descriptions, opens a record table, and converts subqueries.
JOIN: optimize () is the key to the entire SELECT process. It is responsible for optimizing the preceding description structures. The optimization process is based on a large number of rules, which will be detailed later. JOIN: we will not describe a series of functions called by optimize (). We will also summarize and summarize the optimization rules in each function in the following sections.
JOIN: exec () also performs some runtime optimization, which may cause the actual execution process to be inconsistent with the display in the EXPLAIN. However, in most cases, JOIN: exec () is executed according to the previously optimized process. Therefore, the JOIN process is basically determined in optimize.
JOIN: the most relevant part of exec () to JOIN is to call the do_select () function to obtain data. Do_select () calls the sub_select () function. This call uses a recursive method to merge two adjacent tables according to the dependency relationship to obtain the final result set.
The operation returned by the result set is also executed in JOIN: exec (), or returned to a temporary table, input to a file, or sent to a socket. These are not the focus of our attention.

2. MySQL optimization rules
The MySQL optimizer is designed based on rules. If the rules have defects, some applications may suffer performance loss. Unlike some advanced large databases, the performance loss of MySQL may be permanent (for a fixed version ). Because large databases conduct statistical evaluations on the execution of various optimization results during execution to automatically improve the subsequent execution optimization status, MySQL currently does not have these features. Therefore, understanding the MySQL optimization rules provides great guidance for better designing SQL statements and improving execution efficiency.
The following lists some rules that MySQL 5 designed to process SELECT queries.

Rule 1: If the operation only involves a regular table, the DISTINCT clause is removed; otherwise, if there is only one table, the DISTINCT is converted to a group by query in the following cases:

Group by can be implemented through indexes (no sort of indexes). order by only sorts the selected record set (in this case, the optimizer performs additional Optimization on group by and order ).
LIMIT is not used, so full table scan is required.
Full table scan usually occurs in the following two scenarios:

SQL _CALC_FOUND_ROWS is used in the query.
The order by clause used cannot be optimized.
When the SELECT statement contains the LIMIT clause (when the LIMIT clause is mentioned later, the SQL _CALC_FOUND_ROWS clause does not exist by default), the optimizer will not use this optimization rule, in this case, the optimizer creates a temporary table and places the number of records restricted by LIMIT, and then returns.
Note: the coexistence of LIMIT clauses and DISTINCT, group by, and order by clauses is complex. In this case, the use of the LIMIT clause not only reduces the time consumed in the process of sending records, but generally does not expect more speed increases. Because either of the following three clauses may make the LIMIT clause do the same, or even a little more calculations, regardless of whether the LIMIT clause exists.

This section introduces the concept of a regular table. Common tables include the following types:

A table with no or only one row of records.
The expression of a table is restricted by the WHERE clause. The expression format is "column = constant ", and the column is the primary key or unique key of the table (assuming that the UNIQUE column is also defined as not null ).
Rule 2: The optimizer considers creating temporary tables in the following situations:

The SELECT statement contains the DISTINCT clause (the DISTINCT that can be optimized based on criterion 1 has been optimized ).
The order by or group by operation is applied to the second and later tables in the table list.
Different order by and group by sequences are used, or the sorting operation is complicated.
The user wants us to buffer the results.
The LIMIT clause is used.
Whether to create a temporary table is determined before all tables are read.

Rule 3: Convert outer join to inner join as much as possible and nest it as much as possible. Correspondingly, the condition expression of the on clause is also moved to the WHERE clause.
If a conditional expression in the WHERE clause or ON Clause of nested loop JOIN removes all NULL values from a certain attribute in the internal table, outer join can be replaced with inner join.

For example, in the following query:
SELECT * FROM t1 left join t2 ON t2.a = t1.a WHERE t2. B <5 condition t2. B <5 removes NULL. The query is first converted:
SELECT * FROM t1 inner join t2 ON t2.a = t1.a WHERE t2. B <5 and convert it to the equivalent form:
SELECT * FROM t1, t2 ON t2.a = t1.a WHERE t2. B <5 AND t2.a = t1.a

Similarly, the following query:
SELECT * FROM t1 left join (t2, t3) ON t2.a = t1.a t3. B = t1. B WHERE t2.c <5 is converted:
SELECT * FROM t1, (t2, t3) WHERE t2.c <5 AND t2.a = t1.a t3. B = t1. B

One conversion may trigger another
SELECT * FROM t1 left join t2 ON t2.a = t1.a
Left join t3 ON t3. B = t2. B
WHERE t3 is not null will be converted:
SELECT * FROM t1 left join t2 ON t2.a = t1.a, t3
WHERE t3 is not null and t3. B = t2. B AND then convert it:
SELECT * FROM t1, t2, t3
WHERE t3 is not null and t3. B = t2. B AND t2.a = t1.a

Rule 4: Convert multiple equations into equations as much as possible.

Rule 5: the order by operation should be applied to the result set rather than the source set as much as possible.
However, in the ON clause of the JOIN operation, there are equations or inequalities (excluding "! = ") And there is no constant on both sides of the equation, the source set may be sorted first and then joined.

Rule 6: if an index can obtain all columns required by the SELECT statement, the index is given priority.

Rule 7: Convert the subquery to JOIN as much as possible.
In most cases, subqueries may require a large amount of temporary table storage, and the query speed is much slower than JOIN.

Rule 8: if allowed, sort the JOIN tables to improve the execution speed.
Tables with a small amount of data may be processed first, and tables with a large amount of data will be processed later. However, if the ON clause explicitly specifies the dependency, the order cannot be adjusted based ON the dependency.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.