1 Overview
Merge join has two prerequisites: the join must have an equality condition, and both sets must be sorted on the join columns.
2 One-to-many and many-to-many
2.1 One-to-many
When one of the two sets involved in a merge join is unique on the equality condition (for example, in select * from T1 inner join T2 on T1.a = T2.b, T1 is unique on column a), the join is one-to-many. The main steps are as follows: first, read one row from each of the two sets and compare them. If the join condition is met, return the joined row; otherwise, discard the row with the smaller value, read the next row from that same set, and continue the comparison.
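The steps above can be sketched in a few lines of Python. This is only an illustration of the technique, not SQL Server's implementation; the function name merge_join_one_to_many and the key-function parameters are hypothetical, and the sample rows are taken from the test data in section 4.

```python
def merge_join_one_to_many(unique_rows, many_rows, unique_key, many_key):
    """Join two lists that are already sorted on their join keys.

    Assumption: unique_rows contains no duplicate join key values.
    """
    result = []
    i = j = 0
    while i < len(unique_rows) and j < len(many_rows):
        left, right = unique_rows[i], many_rows[j]
        if unique_key(left) == many_key(right):
            result.append((left, right))
            # Keep the unique row: the next row on the "many" side may match it too.
            j += 1
        elif unique_key(left) < many_key(right):
            i += 1  # the unique side holds the smaller value, so advance it
        else:
            j += 1  # the "many" side holds the smaller value, so advance it
    return result


# Goodstype is unique on ID; Goods is sorted on good_type (third field).
goodstype = [(1, "Clothing"), (2, "Digital"), (3, "Household Appliances")]
goods = [
    (1, "ADT shirt", 1),
    (2, "Ad coat", 1),
    (3, "T002 TV", 2),
    (4, "Haier washing machine", 2),
    (5, "Hp222", 3),
]
joined = merge_join_one_to_many(goodstype, goods, lambda t: t[0], lambda g: g[2])
```

Each set is read exactly once and nothing is buffered, which is why this form is the cheaper one.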
2.2 Many-to-many
When neither of the two sets involved in a merge join is unique on the equality condition, a many-to-many join is used (for example, select * from T1 inner join T2 on T1.a = T2.b, when neither column a nor column b is unique). The main steps are as follows: let the rows of the two sets be A1, A2 .. An and B1, B2 .. Bn. Rereading B1 .. Bn from the start for every row of A would waste performance. Instead, during processing, the database stores the rows of B that match the current row of A in a worktable in tempdb. If the next row of A has the same join value, the stored rows are read back from tempdb; otherwise, the stored rows are discarded.
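A minimal Python sketch of this buffering idea follows. It is an assumption-laden illustration, not the engine's actual code: the in-memory list named buffer stands in for the tempdb worktable, and the function name and sample rows are hypothetical.

```python
def merge_join_many_to_many(a_rows, b_rows, a_key, b_key):
    """Join two sorted lists in which both sides may repeat join key values."""
    result = []
    buffer = []        # stand-in for the tempdb worktable
    buffer_key = None  # join value that the buffered B rows belong to
    j = 0
    for a in a_rows:
        k = a_key(a)
        if buffer and buffer_key == k:
            # Same join value as the previous A row: replay the stored B rows.
            result.extend((a, b) for b in buffer)
            continue
        # New join value: discard the stored rows and read forward in B.
        buffer = []
        buffer_key = k
        while j < len(b_rows) and b_key(b_rows[j]) < k:
            j += 1  # skip B rows that are smaller than the current A value
        while j < len(b_rows) and b_key(b_rows[j]) == k:
            buffer.append(b_rows[j])
            j += 1
        result.extend((a, b) for b in buffer)
    return result


# Hypothetical sample data: key 1 is repeated on both sides.
a = [(1, "a1"), (1, "a2"), (2, "a3"), (4, "a4")]
b = [(1, "b1"), (1, "b2"), (3, "b3"), (4, "b4")]
pairs = merge_join_many_to_many(a, b, lambda r: r[0], lambda r: r[0])
```

B is still read only once; the extra cost compared with the one-to-many case is writing and replaying the buffered rows.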
2.3 Comparison between one-to-many and many-to-many
Obviously, one-to-many is more efficient because it does not need a temporary table. So how can we let the query optimizer know that one of our sets is unique? The first method is to create a unique clustered index, and the second is to use the distinct or group by operators.
3 Sorting and indexing
Sorting a large table is one of the most expensive operations in a database. Merge join is therefore a poor fit when the tables hold a large amount of data and have no index on the join columns. When the data volume is large, add an index on the join columns so that the rows can be read in sorted order.
4 Examples
Test Data
If Exists (Select * From sys.objects Where object_id = Object_id(N'[dbo].[Goodstype]'))
    Drop Table [dbo].[Goodstype]
Go
-- Item type table
Create Table dbo.[Goodstype] (ID Int, Good_type_name Nvarchar(50));
Insert Into dbo.Goodstype
Select 1, 'Clothing' Union All
Select 2, 'Digital' Union All
Select 3, 'Household Appliances'

If Exists (Select * From sys.objects Where object_id = Object_id(N'[dbo].[Goods]'))
    Drop Table [dbo].[Goods]
Go
-- Item table
Create Table dbo.[Goods] (ID Int, Good_name Nvarchar(50), Good_type Int);
Insert Into dbo.Goods
Select 1, 'ADT shirt', 1 Union All
Select 2, 'Ad coat', 1 Union All
Select 3, 'T002 TV', 2 Union All
Select 4, 'Haier washing machine', 2 Union All
Select 5, 'Hp222', 3
4.1
With no index created, execute the following SQL:
Set Statistics Profile On
Select * From Goods As G Inner Join Goodstype As GT On G.good_type = GT.ID Option (Merge Join)
Result:
Note:
1> When no index has been created, both sets must be sorted first;
2> Although Goodstype.ID is in fact unique in the join condition, a many-to-many join is performed because no unique clustered index has been created.
4.2
Create non-unique clustered indexes and execute the following SQL:
Create Clustered Index GT On Goodstype (ID)
Create Clustered Index G On Goods (good_type)
Set Statistics Profile On
Select * From Goods As G Inner Join Goodstype As GT On G.good_type = GT.ID Option (Merge Join)
Result:
Note:
1> After the indexes are created, the merge join has no sorting overhead.
2> Although both sets are indexed and the join key values in Goodstype are not repeated, a many-to-many join is still performed because the optimizer does not know that they are unique.
4.3
Create a unique clustered index on one of the sets and execute the following SQL:
Drop Index GT On Goodstype
Create Unique Clustered Index Gut On Goodstype (ID)
Set Statistics Profile On
Select * From Goods As G Inner Join Goodstype As GT On G.good_type = GT.ID Option (Merge Join)
Result
Note:
1> After a unique clustered index is created on one of the sets, the join becomes one-to-many (the execution plan has no explicit one-to-many marker; the join is simply no longer marked as many-to-many).
5 Summary
When a nested loops join is not suitable, consider merge join. When using merge join, pay attention to two things. The first is sorting: it is best provided by an index, because sorting a large data volume at run time costs too much. The second is the join mode: one-to-many or many-to-many. If the join key values are not repeated, create a unique clustered index so that the one-to-many join is used.