MySQL Join principle

Source: Internet
Author: User
Tags joins

Let's take a look at the two tables of the experiment:

Table comments, Total row number 28856 table Comments_for, Total row number 57,comments_id is indexed, ID is listed as primary key. The above two tables are the basis of our testing, and then look at the index, comments_for this table comments_id is indexed, the ID is the primary key. Recently by a company a development asked to join the MySQL join problem, the number of under the release of my understanding of MySQL join is not very deep, so I also looked at a lot of documents, and finally in the Insidemysql public to see two of the analysis of the join, feeling written too good, Take it out and share my actual test of join. Here's a look at the MySQL algorithm for Join, a total of three (from Insidemysql): MySQL is only support a join algorithm Nested-loop join (nested loop link), unlike other commercial databases can support Hashiki and merge connections, However, MySQL's nested-loop join (nested loop link) also has many variants that can help MySQL perform join operations more efficiently: (1) Simple Nested-loop Join (image for Insidemysql) This algorithm is relatively simple, remove R1 from the driver table to match all the columns of S table, and then R2,R3, until all the data in the R table is matched, and then merge the data, you can see this algorithm to the S table for RN access, although simple, But it's still relatively expensive (2) index Nested-loop Join, implemented in such a way as: index nested contact because there are indexes on the non-driver table, the comparison is no longer required to compare the records, but can be indexed to reduce the comparison, thereby speeding up the query. This is the usual when we do the correlation query must require that the associated field has an index of one of the main reasons. When the algorithm is linked to a query, the driver table is looked up according to the index of the associated field, when the matching value is found on the index, and then the query is returned to the table, that is, the table will not be returned until the index is matched. As for the selection of the driver table, the MySQL optimizer generally chooses to use fewer records as the driver table, but when SQL is particularly complex, it does not rule out the wrong choice. In the way of the index nested link, if the association key of the non-driver table is the primary key, so the performance will be very high, if not the primary key, if the number of rows returned, the efficiency will be particularly low, because the table operation to multiple times. The index is associated first, and then the table-back operation is performed based on the primary key ID of the level two index. In this case, the performance will be very poor. (3) Block Nested-loop Join, implemented as follows: In the case of an index, MySQLWill try to use the index Nested-loop join algorithm, in some cases, may join column is no index, then the choice of MySQL is not the first introduction of the simple Nested-loop join algorithm, but will first use block The Nested-loop join algorithm. Block Nested-loop join compared to simple nested-loop join a process of intermediate processing, that is, join buffer, using the join buffer to the Driver table query join related columns are buffered to join Buffer, and then compare the batch with the non-driver table, which is also implemented, you can combine multiple comparisons once, reducing the frequency of non-driver table access. That is, you only need to access the S table once. In this case, there will be no multiple accesses to the non-driver table, and only then will the join buffer be accessed. In MySQL, we can set the value of the join buffer with the parameter join_buffer_size and then proceed. By default, join_buffer_size=256k will cache all required columns in the join buffer, including the Select column, instead of just caching the associated columns. In a SQL that has n joins associated with it, the N-1 join buffer is allocated at execution time. The above is finished, look at the concrete examples below (1) Full table join
EXPLAIN SELECT * from comments Gcjoin comments_for GCF on gc.comments_id=gcf.comments_id;

Take a look at the output information: You can see comments_for as the driver table when the full table is scanned, because the associated field is indexed, a full index scan of index IDX_COMMENTSID to match the non-driver table comments, one line at a time. The index Nested-loop Join is used at this time, and we can see that because the Comments_for table's magnitude is much smaller than comments, MySQL first chooses the small table comments_for as the driver table. (2) Full table join+ screening conditions
SELECT * from comments Gcjoin comments_for GCF on gc.comments_id==2056

At this time using the index Nested-loop Join, the driver table comments primary key is filtered, conforms to one, the non-driver table Comments_for index idx_commentsid to seek to match, the final match result is expected to affect a bar, This is the only one access to the IDX_COMMENTSID index of the non-driver table, the efficiency is relatively high. (3) Look at the case where the associated field is not indexed:
EXPLAIN SELECT * from comments Gcjoin comments_for GCF on gc.order_id=gcf.product_id

Let's take a look at the execution plan: from the execution plan we can see that this table join is using the Block Nested-loop join for table Association, first comments_for (only 57 rows) This small table as the driver table, and then comments_for The required data is cached in the join buffer, the comments table is scanned in batches, that is, only one match, provided that the join buffer is large enough to store the comments_for cache data. And we see a clear hint of the implementation plan: Using where; Using Join buffer (Block Nested Loop) The general situation is that our SQL needs to be optimized. Note that in this case, MySQL will also choose simple Nested-loop Join this method of violence, I do not understand how this optimizer is chosen, but generally use block Nested-loop Join, because the CBO is based on the cost, The performance of Block Nested-loop joins is much better than simple Nested-loop joins. (4) Look at the left join
EXPLAIN SELECT * from comments Gcleft joins Comments_for GCF on gc.comments_id=gcf.comments_id

Take a look at the execution plan: in this case, because our association field is indexed, so the index nested-loop join, but when there is no filter conditions will choose the first table as the driver table to join, to associate the index of the non-driver table Nested-loop Join. If you add the filter condition gc.comments_id = 2056, this will filter out an index Nested-loop Join for the non-driver table, which is highly efficient. If this is the following:
EXPLAIN SELECT * from comments_for gcfleft JOIN comments gc on gc.comments_id==2056

With the GCF table filtering, the GCF table is selected by default as the driver table, because it is clear that he has been screened, matching the conditions will be very small, specifically to see the execution plan: this, the join is basically very clear, not to be continued, welcome to point out the mistake, I will seriously correct ....

MySQL Join principle

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.