SQL joins: merge join

Source: Internet
Author: User

1 Overview
A merge join has two requirements: the join must have an equality condition (an equijoin), and both input sets must be sorted on the join columns.

2 One-to-many and many-to-many
2.1 One-to-many
When one of the two sets involved in a merge join is unique on the equality condition, the join is one-to-many. For example, in select * from T1 inner join T2 on T1.a = T2.b, the join is one-to-many if T1 is unique on column a. The main steps are as follows: first, read one row from each of the two sets and compare them. If the join condition is met, return the joined row; otherwise, discard the row with the smaller value, read the next row from that same set, and continue the comparison.
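The steps above can be sketched in Python. This is a minimal illustration of the idea, not SQL Server's actual implementation. The function name, the key-extractor parameters, and the sample rows are hypothetical. The source does not say which side advances after a match, so the sketch assumes the non-unique side advances and the unique row is kept.

```python
def merge_join_one_to_many(unique_rows, many_rows, unique_key, many_key):
    """Join two inputs that are already sorted on their join keys.

    Assumption: unique_rows has no duplicate join keys.
    """
    result = []
    i = j = 0
    while i < len(unique_rows) and j < len(many_rows):
        u, m = unique_rows[i], many_rows[j]
        ku, km = unique_key(u), many_key(m)
        if ku == km:
            result.append((u, m))
            # Keep u: the next row on the "many" side may match it too.
            j += 1
        elif ku < km:
            # Discard the smaller row and read the next one from that input.
            i += 1
        else:
            j += 1
    return result


# Hypothetical sample data: t1 is unique on its first field, t2 is not.
t1 = [(1, "x"), (2, "y"), (4, "z")]
t2 = [(1, "p"), (1, "q"), (3, "r"), (4, "s")]
joined = merge_join_one_to_many(t1, t2, lambda r: r[0], lambda r: r[0])
```

Each input is read once from start to finish and no row is ever revisited, which is why this mode needs no temporary storage.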
2.2 Many-to-many
When neither of the two sets involved in a merge join is unique on the equality condition, a many-to-many merge join is used (for example, select * from T1 inner join T2 on T1.a = T2.b, when neither column a nor column b is unique). The main steps are as follows: suppose set A has rows a1, a2 .. an and set B has rows b1, b2 .. bn. Re-reading all of b1 .. bn for every row of A would waste performance. Instead, during processing the database stores the matching rows from B in a worktable in tempdb. If the next row in A has the same join value, the rows in the worktable are read again; otherwise, the data in the worktable is discarded.
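A rough Python sketch of this buffering idea follows. It assumes both inputs are already sorted on the join key, and it uses an in-memory list as a stand-in for the tempdb worktable. The function and variable names are hypothetical, and the real engine's bookkeeping is more involved.

```python
def merge_join_many_to_many(a_rows, b_rows, a_key, b_key):
    """Join two sorted inputs where both sides may contain duplicate keys."""
    result = []
    worktable = []        # stand-in for the tempdb worktable
    worktable_key = None  # join value the buffered B rows belong to
    j = 0
    for a in a_rows:
        ka = a_key(a)
        if worktable and worktable_key == ka:
            # Same join value as the previous A row: replay the buffered rows.
            result.extend((a, b) for b in worktable)
            continue
        # Different join value: throw the old buffer away.
        worktable = []
        worktable_key = ka
        # Skip B rows that are smaller than the current A value.
        while j < len(b_rows) and b_key(b_rows[j]) < ka:
            j += 1
        # Buffer every B row that matches the current A value.
        while j < len(b_rows) and b_key(b_rows[j]) == ka:
            worktable.append(b_rows[j])
            j += 1
        result.extend((a, b) for b in worktable)
    return result


# Hypothetical sample data with duplicates on both sides.
a = [(1, "a1"), (1, "a2"), (2, "a3")]
b = [(1, "b1"), (1, "b2"), (3, "b3")]
joined = merge_join_many_to_many(a, b, lambda r: r[0], lambda r: r[0])
```

The extra cost compared with the one-to-many case is the buffer itself: every group of matching B rows is written once and possibly read several times.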
2.3 Comparing one-to-many and many-to-many
One-to-many is clearly more efficient because it does not need a worktable. So how can we let the query optimizer know that one of our sets is unique? The first method is to create a unique clustered index on the join column (a clustered index that is not declared unique is not enough, as example 4.2 shows). The second is to use the distinct or group by operators.

3 sorting and Indexing
Sorting a large table is one of the most expensive operations in a database. Merge join is therefore a poor choice when the tables hold a large amount of data and have no index on the join columns, because both inputs must be sorted before they can be merged. When the data volume is large, add an index on the join column so that the rows can be read in sorted order.
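To see why the sort cannot be skipped, here is a small Python sketch. It reuses a few rows in the shape of the test data from section 4, but in a deliberately shuffled order, which is an assumption made for illustration. The function names are hypothetical. The merge loop silently loses matches on unsorted input, so without an index that delivers sorted rows the sort has to be paid for at query time.

```python
from operator import itemgetter


def merge_join(unique_rows, many_rows):
    """One-to-many merge on the first field; correct only for sorted inputs."""
    out, i, j = [], 0, 0
    while i < len(unique_rows) and j < len(many_rows):
        ku, km = unique_rows[i][0], many_rows[j][0]
        if ku == km:
            out.append((unique_rows[i], many_rows[j]))
            j += 1
        elif ku < km:
            i += 1
        else:
            j += 1
    return out


def sort_then_merge_join(unique_rows, many_rows):
    """Sort both inputs first, as the engine must do when no index helps."""
    key = itemgetter(0)
    return merge_join(sorted(unique_rows, key=key), sorted(many_rows, key=key))


# Rows as (join key, name), in an arbitrary unsorted order.
goodstype = [(2, "Digital"), (1, "Clothing"), (3, "Household Appliances")]
goods = [(1, "ADT shirt"), (2, "T002 TV"), (1, "Ad coat")]

without_sort = merge_join(goodstype, goods)          # misses matches
with_sort = sort_then_merge_join(goodstype, goods)   # finds all matches
```

On these rows the unsorted merge finds only one of the three matching pairs, while the sorted version finds all of them.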

4 Examples
Test Data

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Goodstype]'))
        DROP TABLE [dbo].[Goodstype]
    GO
    -- Item type table
    CREATE TABLE dbo.[Goodstype] (ID INT, Good_type_name NVARCHAR(50));
    INSERT INTO dbo.Goodstype
    SELECT 1, 'Clothing' UNION ALL
    SELECT 2, 'Digital' UNION ALL
    SELECT 3, 'Household Appliances';

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Goods]'))
        DROP TABLE [dbo].[Goods]
    GO
    -- Item table
    CREATE TABLE dbo.[Goods] (ID INT, Good_name NVARCHAR(50), Good_type INT);
    INSERT INTO dbo.Goods
    SELECT 1, 'ADT shirt', 1 UNION ALL
    SELECT 2, 'Ad coat', 1 UNION ALL
    SELECT 3, 'T002 TV', 2 UNION ALL
    SELECT 4, 'Haier washing machine', 2 UNION ALL
    SELECT 5, 'Hp222', 3;

4.1
Without creating any index, run the following SQL:

    SET STATISTICS PROFILE ON
    SELECT * FROM Goods AS G INNER JOIN Goodstype AS GT ON G.good_type = GT.ID OPTION (MERGE JOIN)

Result:

Note:
1> with no index, both sets have to be sorted before the merge;
2> although Goodstype.ID is in fact unique on the join condition, no unique clustered index has been created, so a many-to-many join is performed.

4.2
Create non-unique clustered indexes on both tables and run the SQL:

    CREATE CLUSTERED INDEX GT ON Goodstype (ID)
    CREATE CLUSTERED INDEX G ON Goods (good_type)
    SET STATISTICS PROFILE ON
    SELECT * FROM Goods AS G INNER JOIN Goodstype AS GT ON G.good_type = GT.ID OPTION (MERGE JOIN)

Result:

Note:
1> once the indexes exist, the merge join has no sorting overhead;
2> although both sets are indexed and the join key values in Goodstype do not repeat, the join is still many-to-many, because the indexes are not declared unique and the optimizer does not know the values are unique.

4.3
Create a unique clustered index on one of the sets and run the SQL:

    DROP INDEX GT ON Goodstype
    CREATE UNIQUE CLUSTERED INDEX Gut ON Goodstype (ID)
    SET STATISTICS PROFILE ON
    SELECT * FROM Goods AS G INNER JOIN Goodstype AS GT ON G.good_type = GT.ID OPTION (MERGE JOIN)

Result

Note:
1> once a unique clustered index exists on one of the sets, the join becomes one-to-many (the execution plan has no concept of a one-to-one join).

5 Summary
When a nested loops join is not suitable, consider a merge join. When using merge join, pay attention to two things. First, sorting: the sorted order should preferably come from an index, because sorting large data volumes at query time raises the cost too much. Second, the join mode: whether it is one-to-many or many-to-many. If the join key values do not repeat, create a unique clustered index so that the join can be one-to-many.
