MySQL supports SQL subqueries. This technique can use the SELECT statement to create a single column query result, and then use this result as a filter condition in another query. Subqueries can be used to complete SQL operations that require multiple logical steps at a time. At the same time, transactions or tables can be prevented from being locked and can be easily written. However, in some cases, subqueries can be replaced by JOIN queries more efficiently.
How join works
Join is implemented by using the Nested Loop Join algorithm, that is, the result set of the driving table is used as the basic data of the Loop, the data in the result set is used as the filter condition to query data in the next table one by one, and then the results are merged. If multiple joins exist, the previous result set is used as the cyclic data, and the data is queried again in the next table as the cyclic condition.
Next, we use a three-table join query to describe the implementation of mysql's Nested Loop Join.
The code is as follows: |
Copy code |
Select m. subject msg_subject, c. content msg_content From user_group g, group_message m, group_message_content c Where g. user_id = 1 And m. group_id = g. group_id And c. group_msg_id = m. id |
Use explain to see the execution plan:
The code is as follows: |
Copy code |
Explain select m. subject msg_subject, c. content msg_content from user_group g, group_message m, Group_message_content c where g. user_id = 1 and m. group_id = g. group_id and c. group_msg_id = m. id \ G; the result is as follows: * *************************** 1. row *************************** Id: 1 Select_type: SIMPLE Table: g Type: ref Possible_keys: user_group_gid_ind, user_group_uid_ind, user_group_gid_uid_ind Key: user_group_uid_ind Key_len: 4 Ref: const Rows: 2 Extra: * *************************** 2. row *************************** Id: 1 Select_type: SIMPLE Table: m Type: ref Possible_keys: PRIMARY, idx_group_message_gid_uid Key: idx_group_message_gid_uid Key_len: 4 Ref: g. group_id Rows: 3 Extra: * *************************** 3. row *************************** Id: 1 Select_type: SIMPLE Table: c Type: ref Possible_keys: idx_group_message_content_msg_id Key: idx_group_message_content_msg_id Key_len: 4 Ref: m. id Rows: 2 |
Extra: from the results, we can see that the explain selects user_group as the driving table. First, index user_group_uid_ind to search for the index ref of the const condition, then, the group_id field of the result set filtered out in the user_group table is used as the query condition to query group_message cyclically, then, the group_message id in the filtered result set is used as the condition to perform cyclic comparison query with group_msg_id of group_message_content to obtain the final result.
This process can be represented by the following code:
The code is as follows: |
Copy code |
For each record g_rec in table user_group that g_rec.user_id = 1 { For each record m_rec in group_message that m_rec.group_id = g_rec.group_id { For each record c_rec in group_message_content that c_rec.group_msg_id = m_rec.id Pass the (g_rec.user_id, m_rec.subject, c_rec.content) row Combination to output; } } |
If you remove the index of the group_msg_id field on the group_message_content table, the execution plan will be different.
The code is as follows: |
Copy code |
Drop index idx_group_message_content_msg_id on group_message_content; Explain select m. subject msg_subject, c. content msg_content from user_group g, group_message m, Group_message_content c where g. user_id = 1 and m. group_id = g. group_id and c. group_msg_id = m. id \ G; |
The execution plan is as follows:
The code is as follows: |
Copy code |
* *************************** 1. row *************************** Id: 1 Select_type: SIMPLE Table: g Type: ref Possible_keys: user_group_uid_ind Key: user_group_uid_ind Key_len: 4 Ref: const Rows: 2 Extra: * *************************** 2. row *************************** Id: 1 Select_type: SIMPLE Table: m Type: ref Possible_keys: PRIMARY, idx_group_message_gid_uid Key: idx_group_message_gid_uid Key_len: 4 Ref: g. group_id Rows: 3 Extra: * *************************** 3. row *************************** Id: 1 Select_type: SIMPLE Table: c Type: ALL Possible_keys: NULL Key: NULL Key_len: NULL Ref: NULL Rows: 96
|
Extra: Using where; Using join buffer because the index is deleted, the access to group_message_content is changed from ref to ALL, and the information related to keys is also changed to NULL, extra information is also changed to Using Where and Using join buffer. That is to say, the content must be obtained only by performing where filter on the data of the entire table. Using join buffer refers to the Cache, which is used only when the join type is ALL, index, rang or index_merge. The following code can be used to describe the usage process:
The code is as follows: |
Copy code |
For each record g_rec in table user_group { For each record m_rec in group_message that m_rec.group_id = g_rec.group_id { Put (g_rec, m_rec) into the buffer If (buffer is full) Flush_buffer (); } } Flush_buffer (){ For each record c_rec in group_message_content that c_rec.group_msg_id = c_rec.id { For each record in the buffer Pass (g_rec.user_id, m_rec.subject, c_rec.content) row combination to output; } Empty the buffer; } |
In the implementation process, we can see that the result sets of user_group and group_message are put in the join buffer, instead of having to associate user_group and group_message with group_message_content each time, which is unnecessary; note that only the column values in the query results are retained in the join buffer. The size of the column values does not depend on the table size. We can see in the pseudo code that when the join buffer is filled up, mysql will flush buffer.
Suppose we want to retrieve all users without order records, we can use the following query to complete:
The code is as follows: |
Copy code |
SELECT * FROM customerinfo WHERE CustomerID NOT in (SELECT CustomerID FROM salesinfo)
|
If you use JOIN to complete this query, the speed will be much faster. Especially when the salesinfo table has an index on CustomerID, the performance will be better. The query is as follows:
The code is as follows: |
Copy code |
SELECT * FROM customerinfo Left join salesinfoON customerinfo. CustomerID = salesinfo. CustomerID WHERE salesinfo. CustomerID IS NULL
|
The reason why JOIN is more efficient is that MySQL does not need to create a temporary table in the memory to perform the query in two steps.
Left join is often used when two tables intersect. When a large number of tables contain a large set of result data, it may even cause the consequences of unavailability.
The following error message is prompted for the database:
The SELECT wocould examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL _BIG_SELECTS = 1 or SET SQL _MAX_JOIN_SIZE = # if the SELECT is okay
Solution 1: Follow the prompts to execute the corresponding code. I tested it.
SET SQL _BIG_SELECTS = 1
When the preceding statements are executed before the SQL statement is executed, the query can proceed smoothly.
Solution 2: Add an index to the two associated fields, and the program runs smoothly.
Solution 3: modify SQL statements to reduce the result set.
The code is as follows: |
Copy code |
Before modification SELECT w. *, r. sound_url FROM sti_words AS w Left join sti_word_list_ja AS r ON w. word = r. word WHERE w. insert_date = '2017-11-16' AND w. user_id = 54 Group by w. word Order by w. id LIMIT 0, 30 After modification Select ww. *, r. sound_url From ( SELECT w .* FROM sti_words AS w WHERE w. insert_date = '2017-11-16' AND w. user_id = 54 ) As ww Left join sti_word_list_ja AS r ON ww. word = r. word Group by ww. word Order by ww. id LIMIT 0, 30 |
Optimization summary of join statements
1. Use a small result set to drive a large result set and minimize the total number of Nested Loop cycles in the join statement;
2. Optimize the inner Loop of the Nested Loop first, because the inner Loop has the most execution times in the Loop, and the performance of each Loop can be greatly improved with a small increase;
3. Create an index on the join field of the driven table;
4. When the index cannot be created on the join field of the drive table, set enough Join Buffer Size.