The realization principle and optimization idea of MySQL join

Last Update:2014-06-25 Source: Internet

Author: User

Tags database join mysql query

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The implementation principle of Join

In MySQL, there is only one join algorithm, the nested Loop join, and there is no hash join provided by many other databases, and there is no sort Merge join. As the name implies, the Nested loop Join is actually using the result set of the driver table as the loop base data, and then one line of data from the result set as a filter to the next table to query the data, and then merge the results. If there is a third participating join, then the join result set for the first two tables is used as the circular base data, and once again, the data is queried in the third table by looping through the criteria, and so forth.

Below we will illustrate how MySQL's nested Loop join is implemented using a three-table join statement example.

Query as follows:
Select M.subject msg_subject, C.content msg_content
From User_group g,group_message m,group_message_content C
where g.user_id = 1
and m.group_id = g.group_id
and c.group_msg_id = M.id

Index Condition:

User_group table: user_id

Group_message table: group_id

Group_message_content table: group_msg_id

then take a look at our query execution plan:
[email protected]> explain select M.subject msg_subject, C.content
msg_content
From user_group g,group_message m,group_message_content c
-- where g.user_id = 1
and m.group_id = g.group_id
and c.group_msg_id = M.id\g
*************************** 1. Row ***************************
id:1
Select_type:simple
table:g
Type:ref
Possible_keys:user_group_gid_ind,user_group_uid_ind,user_group_gid_uid_ind
Key:user_group_uid_ind
Key_len:4
Ref:const

Rows:2
Extra:
*************************** 2. Row ***************************
id:1
Select_type:simple
table:m
Type:ref
Possible_keys:primary,idx_group_message_gid_uid
Key:idx_group_message_gid_uid
Key_len:4
ref:example.g.group_id
Rows:3
Extra:
*************************** 3. Row ***************************
id:1
Select_type:simple
Table:c
Type:ref
possible_keys:idx_group_message_content_msg_id
key:idx_group_message_content_msg_id
Key_len:4
ref:example.m.id
Rows:2
Extra:

As we can see, MySQL Query Optimizer chose User_group as the driver table, first using the criteria we passed in user_id the index ref for the const condition through the indexes user_group_uid_ind above the table, The group_id field of the result set filtered in the User_group table is then used as the query condition, the group_message is queried, and then the group_message in the result set of the User_group and group_ two tables is passed. The ID of the message is queried as a condition compared to the group_message_content group_msg_id to get the final result.

Let's say we remove the index of the group_msg_id field above the Group_message_content table and see what the execution plan will look like:
[Email protected]> drop INDEX idx_group_message_content_msg_id on
Group_message_content;
Query OK, affected (0.11 sec)

[email protected]> explain
Select M.subject msg_subject, c.content msg_content
From user_group g,group_message m,group_message_content c
-- where g.user_id = 1
and m.group_id = g.group_id
and c.group_msg_id = M.id\g
*************************** 1. Row ***************************
id:1
Select_type:simple
table:g
Type:ref
Possible_keys:idx_user_group_uid
Key:idx_user_group_uid
Key_len:4
Ref:const
Rows:2
Extra:

*************************** 2. Row ***************************
id:1
Select_type:simple
table:m
Type:ref
Possible_keys:primary,idx_group_message_gid_uid
Key:idx_group_message_gid_uid
Key_len:4
ref:example.g.group_id
Rows:3
Extra:
*************************** 3. Row ***************************
id:1
Select_type:simple
Table:c
Type:all
Possible_keys:null
Key:null
Key_len:null
Ref:null
rows:96
extra:using where; Using Join Buffer

We see that not only the access to the User_group table has changed from ref to all, but also that the extra information in the last row has changed from nothing to a using where; Using the join buffer, that is, it is easy to understand from ref to all, there is no index to use index, of course, the full table scan, the Using where is also because after becoming a full table scan, we need to get the content A field can only be obtained by where filtering the data in the table, but what is the using join buffer that appears later?
In fact, the join here takes advantage of a cache parameter that we mentioned in the previous "MySQL Server Performance Optimization" chapter, which is the join buffer we set through the Join_buffer_size parameter.
In fact, the join Buffer can only be used if our join type is all (as in the example), Index,rang or index_merge, so when we remove the Group_message_content table group_msg Before the index of the _id field, because join is of type ref, we do not see a use of join Buffer in our execution plan.

join statement optimization

1. Minimize the total number of loops in the join statement for the nested loop;

why? Because the larger the drive result set means that more cycles are required, that is, the more queries you need to perform on the drive result set. For example, when two tables (table A and Table B) Join, if Table A is filtered by a Where condition with 10 records, and Table B has 20 records. If we choose Table A as the driver table, that is, the result set of the driver table is 20, then we filter the driver table (table B) by the join condition 10 times. Conversely, if we choose Table B as the driver table, we need to filter the comparison of table a 20 times. Of course, the precondition for this optimization is that the resource consumption of each access to each table is not significantly different by the join condition. If there is a big difference in access (generally because of the difference in the index), we cannot simply judge the order of the join statement by the size of the result set, but rather how to optimize the drive by comparing the number of cycles and the amount of product consumed by each cycle.

2. Prioritize the inner loop of the nested loop;

3. Ensure that the join condition field on the drive table in the JOIN statement has been indexed;
Ensure that the join condition field on the driver table has been indexed for the purposes of the above two points, only to let the driver table's Join condition field is indexed, in order to ensure that each query in the loop can consume less resources, This is the actual optimization method to optimize the inner loop.
4. If the Join condition field of the driver table is not guaranteed to be indexed and the memory resource is sufficient, do not hesitate to join
buffer settings:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More