How MySQL optimizes the left join

Source: Internet
Author: User

Today encountered a left join optimization problem, engaged in the afternoon, in the middle of a lot of information, the query plan for MySQL and query optimization has a further understanding, make a simple record:

Select c.* from hotel_info_original C
Left join Hotel_info_collection H
On C.hotel_type=h.hotel_type and c.hotel_id =h.hotel_id
where h.hotel_id is null

This SQL is used to query out the C table in the H table has no records, so think of the Left join features (return all records on the right table does not meet the matching criteria of the record corresponding row return null) to meet the demand, but this query is very slow. Look at the query plan first:



Rows represents the number of rows to scan for each row of this step relative to the previous step, and you can see that the SQL needs to scan the number of rows to 35773*8134, a very large number. The number of records for the C and H tables is 40000+ and 10000+, which is almost two of the cost of the Cartesian product (SELECT * from C,h).
So I went online to check the MySQL implementation of the principle of join, originally MySQL internal use of a called nested loop join algorithm. The Nested loop Join is actually the result set of the driver table as the loop base data, and then one line of data from the result set as a filter to the next table to query the data, and then merge the results. If there is a third join, then through the first two table join result set as the circular base data, once again through the loop query condition to the third table query data, so back and forth, basically MySQL is the most easily understood algorithm to implement Join. Therefore, the choice of the driver table is very important, the data of the driver table can significantly reduce the number of rows scanned.
So why is the join more efficient than the left join in general? Many people say that they do not understand the reasons, only the same, I have a little insight this afternoon. In general, the two tables involved in the joint query will be a small, if it is a join, in the absence of other filtering conditions, MySQL will choose a small table as the driver table, but the left join is generally used as a large table to join the small table, and The nature of the join itself determines that MySQL will use a large table to do the driver table, so that the efficiency is a lot worse, if I change the SQL above to
Select c.* from hotel_info_original C
Join Hotel_info_collection H
On C.hotel_type=h.hotel_type and c.hotel_id =h.hotel_id
The query plan is as follows:



Obviously, MySQL chose a small table as the driver table, and then the index on the (Hotel_id,hotel_type) instantly lowered a lot of magnitude ....
In addition, I learned today a general rule about the left join, that is, if the Where condition contains a non-null condition of the right table (except is null), then it is equivalent to the join statement and can be rewritten directly into a join statement.
Postscript:
This issue has been further understood with the view of MySQL reference manual. MySQL will divide the join into System/const/eq_ref/ref/range/index/all and so on when the join is executed, and the efficiency of the connection before
Descending, in turn, for my first SQL, the connection type is index, so it is almost the effect of a full table scan. But I'm surprised I've declared the unique key on the (Hotel_id,hotel_type) two columns, according to the official document connection type should be eq_ref.
This problem has been bothering me for two days, in Google and StackOverflow have not found to explain the issue of the article, I have no solution to this problem? With the determination to solve this problem, I looked through it again today. MySQL Official documentation
In the section on optimizing queries, you see this: one of the problems here is that MySQL can more efficiently use indexes on columns that declare the same type and size. I felt that I had found the problem, so I changed all the encoding and collation of the original and collection tables (the rules that determine the comparison of characters) to a unified Utf8_general_ci (HOTEL_TYPE,HOTEL_ID). , and then run the first SQL query plan again, and get the following result:



The connection type has been optimized to ref by index, and if Hotel_type is declared as NOT NULL it can be optimized to eq_ref, but the effect is small, and the SQL can run within 0.01ms after optimization.

So how to optimize left join:
1, the condition as far as possible to filter some will drive table become smaller, with small table to drive large table

2, the right table of the condition column must be indexed (primary key, unique index, prefix index, etc.), it is best to make the type reach range and above (Ref,eq_ref,const,system)

3, ignore the above two points, generally do not use left join~~!

How MySQL optimizes the left join

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.