How to optimize Mysql join queries

Last Update:2017-01-13 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

MySQL supports SQL subqueries. This technique can use the SELECT statement to create a single column query result, and then use this result as a filter condition in another query. Subqueries can be used to complete SQL operations that require multiple logical steps at a time. At the same time, transactions or tables can be prevented from being locked and can be easily written. However, in some cases, subqueries can be replaced by JOIN queries more efficiently.

How join works

Join is implemented by using the Nested Loop Join algorithm, that is, the result set of the driving table is used as the basic data of the Loop, the data in the result set is used as the filter condition to query data in the next table one by one, and then the results are merged. If multiple joins exist, the previous result set is used as the cyclic data, and the data is queried again in the next table as the cyclic condition.

Next, we use a three-table join query to describe the implementation of mysql's Nested Loop Join.

The code is as follows:	Copy code
Select m. subject msg_subject, c. content msg_content From user_group g, group_message m, group_message_content c Where g. user_id = 1 And m. group_id = g. group_id And c. group_msg_id = m. id

Use explain to see the execution plan:

The code is as follows:

Copy code

Explain select m. subject msg_subject, c. content msg_content from user_group g, group_message m,
Group_message_content c where g. user_id = 1 and m. group_id = g. group_id and c. group_msg_id = m. id \ G; the result is as follows:

* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: g
Type: ref
Possible_keys: user_group_gid_ind, user_group_uid_ind, user_group_gid_uid_ind
Key: user_group_uid_ind
Key_len: 4
Ref: const
Rows: 2
Extra:
* *************************** 2. row ***************************
Id: 1
Select_type: SIMPLE
Table: m
Type: ref
Possible_keys: PRIMARY, idx_group_message_gid_uid
Key: idx_group_message_gid_uid
Key_len: 4
Ref: g. group_id
Rows: 3
Extra:
* *************************** 3. row ***************************
Id: 1
Select_type: SIMPLE
Table: c
Type: ref
Possible_keys: idx_group_message_content_msg_id
Key: idx_group_message_content_msg_id
Key_len: 4
Ref: m. id
Rows: 2

Extra: from the results, we can see that the explain selects user_group as the driving table. First, index user_group_uid_ind to search for the index ref of the const condition, then, the group_id field of the result set filtered out in the user_group table is used as the query condition to query group_message cyclically, then, the group_message id in the filtered result set is used as the condition to perform cyclic comparison query with group_msg_id of group_message_content to obtain the final result.

This process can be represented by the following code:

The code is as follows:	Copy code
For each record g_rec in table user_group that g_rec.user_id = 1 { For each record m_rec in group_message that m_rec.group_id = g_rec.group_id { For each record c_rec in group_message_content that c_rec.group_msg_id = m_rec.id Pass the (g_rec.user_id, m_rec.subject, c_rec.content) row Combination to output; } }

If you remove the index of the group_msg_id field on the group_message_content table, the execution plan will be different.

The code is as follows:	Copy code
Drop index idx_group_message_content_msg_id on group_message_content; Explain select m. subject msg_subject, c. content msg_content from user_group g, group_message m, Group_message_content c where g. user_id = 1 and m. group_id = g. group_id and c. group_msg_id = m. id \ G;

The execution plan is as follows:

The code is as follows:

Copy code

* *************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: g
Type: ref
Possible_keys: user_group_uid_ind
Key: user_group_uid_ind
Key_len: 4
Ref: const
Rows: 2
Extra:
* *************************** 2. row ***************************
Id: 1
Select_type: SIMPLE
Table: m
Type: ref
Possible_keys: PRIMARY, idx_group_message_gid_uid
Key: idx_group_message_gid_uid
Key_len: 4
Ref: g. group_id
Rows: 3
Extra:
* *************************** 3. row ***************************
Id: 1
Select_type: SIMPLE
Table: c
Type: ALL
Possible_keys: NULL
Key: NULL
Key_len: NULL
Ref: NULL
Rows: 96

Extra: Using where; Using join buffer because the index is deleted, the access to group_message_content is changed from ref to ALL, and the information related to keys is also changed to NULL, extra information is also changed to Using Where and Using join buffer. That is to say, the content must be obtained only by performing where filter on the data of the entire table. Using join buffer refers to the Cache, which is used only when the join type is ALL, index, rang or index_merge. The following code can be used to describe the usage process:

The code is as follows:

Copy code

For each record g_rec in table user_group {
For each record m_rec in group_message that m_rec.group_id = g_rec.group_id {
Put (g_rec, m_rec) into the buffer
If (buffer is full)
Flush_buffer ();
}
}
Flush_buffer (){
For each record c_rec in group_message_content that c_rec.group_msg_id = c_rec.id {
For each record in the buffer
Pass (g_rec.user_id, m_rec.subject, c_rec.content) row combination to output;
}
Empty the buffer;
}

In the implementation process, we can see that the result sets of user_group and group_message are put in the join buffer, instead of having to associate user_group and group_message with group_message_content each time, which is unnecessary; note that only the column values in the query results are retained in the join buffer. The size of the column values does not depend on the table size. We can see in the pseudo code that when the join buffer is filled up, mysql will flush buffer.

Suppose we want to retrieve all users without order records, we can use the following query to complete:

The code is as follows:	Copy code
SELECT * FROM customerinfo WHERE CustomerID NOT in (SELECT CustomerID FROM salesinfo)

If you use JOIN to complete this query, the speed will be much faster. Especially when the salesinfo table has an index on CustomerID, the performance will be better. The query is as follows:

The code is as follows:	Copy code
SELECT * FROM customerinfo Left join salesinfoON customerinfo. CustomerID = salesinfo. CustomerID WHERE salesinfo. CustomerID IS NULL

The reason why JOIN is more efficient is that MySQL does not need to create a temporary table in the memory to perform the query in two steps.

Left join is often used when two tables intersect. When a large number of tables contain a large set of result data, it may even cause the consequences of unavailability.
The following error message is prompted for the database:

The SELECT wocould examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL _BIG_SELECTS = 1 or SET SQL _MAX_JOIN_SIZE = # if the SELECT is okay

Solution 1: Follow the prompts to execute the corresponding code. I tested it.

SET SQL _BIG_SELECTS = 1

When the preceding statements are executed before the SQL statement is executed, the query can proceed smoothly.

Solution 2: Add an index to the two associated fields, and the program runs smoothly.

Solution 3: modify SQL statements to reduce the result set.

The code is as follows:

Copy code

Before modification

SELECT w. *, r. sound_url
FROM sti_words AS w
Left join sti_word_list_ja AS r ON w. word = r. word
WHERE w. insert_date = '2017-11-16'
AND w. user_id = 54
Group by w. word
Order by w. id
LIMIT 0, 30

After modification

Select ww. *, r. sound_url
From (
SELECT w .*
FROM sti_words AS w
WHERE w. insert_date = '2017-11-16'
AND w. user_id = 54
) As ww
Left join sti_word_list_ja AS r ON ww. word = r. word
Group by ww. word
Order by ww. id
LIMIT 0, 30

Optimization summary of join statements

1. Use a small result set to drive a large result set and minimize the total number of Nested Loop cycles in the join statement;

2. Optimize the inner Loop of the Nested Loop first, because the inner Loop has the most execution times in the Loop, and the performance of each Loop can be greatly improved with a small increase;

3. Create an index on the join field of the driven table;

4. When the index cannot be created on the join field of the drive table, set enough Join Buffer Size.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

How to optimize Mysql join queries

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

How to optimize Mysql join queries

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support