Analysis and comparison of in and join in multiple table queries

Source: Internet
Author: User

In is the keyword of the subquery, join is the key word of the connection, in the project development often will use the multiple table query, and the subquery and the connection are the important ways to realize the multiple table query. That's how the two work. In and join which is better. The following is to analyze and compare.

Now there are test1 and test2 two tables, with no constraints like primary keys, foreign keys, and only one field. Two tables are irrelevant.

Now use the In keyword to implement the subquery, test2 as the subquery table (external table):

To view the execution plan:

Use the Join keyword to implement the connection, and also test2 as an external table:

To view the execution plan:

Analysis: When using an in subquery to implement a multiple-table query, it can be seen from the execution plan that the whole query is divided into 3 parts, the query with ID = 1 has two, and the query with ID = 2 has one. High level of ID, priority query. The query for id = 2 corresponds to the FTS of the test2 (subquery table). The query with id = 1 is then performed, and the query at the same level is executed sequentially from the top down. The plan shows that this query is a subquery (subquery), while querying test1, use the join buffer (Blocked Nested loop), the connection buffer (blocking nested loops). When you implement a multiple-table query using a join connection, you first query the Test2 table (the external table), almost as in (FTS), Query the Test1 table, and in the same way in, you use the join buffer (Blocked Nested Loop) What exactly does the buffer (Blocked Nested Loop) mean, I think this blog has already explained very clearly. Http://blog.itpub.net/22664653/viewspace-1692317/concludes that in unrelated (indexed), multiple-table queries, queries using in and join are first adding the query results of the external table to the connection buffer. Then the data from the internal table is fetched into the buffer for comparison (nested loops). There is little difference in the query plan. However, in the presence of a priority relationship, a subquery query is more than a join, in which case the join is more excellent.

Now add a primary key (index) to the Test1 table, add a FOREIGN KEY constraint (index) to the Test2 table, and the two tables are related.

To do the same query, return the same result:

To view an in-mode execution plan:

To view the execution plan for join mode:

Analysis: Now when querying using in mode, the subquery is no longer displayed as subquery (if the subquery has a different priority), it is a process of reference. The external table test2 is scanned with the help of the index, and the test1 is scanned by the index, which references the Test2 ID column. Using the Join method is also the same as having a reference process. In both ways, queries do not use the connection buffer and blocking nested loops mentioned above. To sum up, joint lookup is a reference process when two tables are related (the foreign key is connected), whether in or join.

Written here, it seems that in and join in table-related (logical foreign key), do not know which is better, the following to practice.

Practical Application:

The following is a sample database Sakila with MySQL (There are 599 customer information in the Customer table, and the primary key is customer_id.) There are 16044 rows of data in the rental table, where the primary key is rental_id, the foreign key column customer_id reference the primary key in the Customer table) executes in and join to implement the multiple table query separately:

In query statement: Select CONCAT (first_name,last_name) from the customer where customer_id in (select customer_id from rental where rental _id <=16000);

Result (599 Customer name information returned):

Slow query log:

# time:160717 21:17:58
# User@host:root[root] @ localhost [127.0.0.1]  Id:
# Query_time:0.000000
  lock_time:0.000000 rows_sent:599  rows_examined:1198 use
Sakila;
SET timestamp=1468761478;
Select CONCAT (first_name,last_name) from the customer where customer_id in (select customer_id from rental where rental_id ; =16000);

Join query statement: SELECT CONCAT (First_name,last_name) from the customer as a INNER JOIN rental as b on a.customer_id = b.customer_id WH ERE rental_id<=16000;

Results (returned 15995 rows of data, found that there are many duplicate names):

Slow query log:

# time:160717 21:19:17
# User@host:root[root] @ localhost [127.0.0.1]  Id:
# Query_time:0.030000
  lock_time:0.000000 rows_sent:15995  rows_examined:16643
SET timestamp=1468761557;
SELECT CONCAT (first_name,last_name) from the customer as a INNER JOIN rental as b on a.customer_id = b.customer_id WHERE Renta l_id<=16000;

Use the DISTINCT keyword to heavy join query statements: SELECT DISTINCT CONCAT (First_name,last_name) from customer as a INNER JOIN rental as B on A.custo mer_id = b.customer_id WHERE rental_id <=16000;

Slow query log:

# time:160717 21:20:31
# User@host:root[root] @ localhost [127.0.0.1]  Id:
# query_time:0.010000  lock_time:0.000000 rows_sent:599  rows_examined:1797
SET timestamp=1468761631;
SELECT DISTINCT CONCAT (first_name,last_name) from the customer as a INNER JOIN rental as b on a.customer_id = b.customer_id WH ERE rental_id <=16000;

Profiling: Because the customer_id column of the rental table acts as a foreign key column, the reference is to the primary key customer_id of the Customer table. So the two tables on the query are related tables. It has been analyzed that such in and join implementation of a multiple-table query does not exist the connection buffer and blocking nested loops. however, they are searched by reference relationships. By comparing the lookup time (SQL efficiency) with the number of rows retrieved (disk IO), in which case I would choose in to query.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.