Within SQL Server, three types of inner join operations are implemented, and most people have never heard of these connection types since they are not logical connections and are rarely used in code. So when are they going to be used? The answer depends on the situation. This means that you have to rely on recordsets and indexes. The query optimizer always chooses the optimal physical connection type intelligently. We know that the SQL optimizer creates a scheduling cost based on query overhead and chooses the best connection type based on this.
How exactly does the query optimizer choose the connection type from within?
SQL Server implemented some algorithms for the query optimizer's selection of connection types, let's take a look at some of the following examples of practice and conclude.
First I give some basic ideas about how the connection works and when it works, and how the optimizer decides which type of inner join to use.
· Depends on table size
· Depends on whether the connection column has an index
· Depends on whether the connection column is sorted
Test environment:
Memory: 4GB
database servers: SQL Server 2008 (RTM) [SQL] View plain copy create table tablea (id  (Int identity ,name varchar) declare @i int set @i=0 while (@i<100) begin insert into tablea (name) select name from master.dbo.spt_values set @i=@i+1 end --select count (*) from dbo.tableA --250600 go create table tableb (Id int identity ,name varchar) declare @i int set @i=0 while (@i<100) begin insert into tableb (name) select name from master.dbo.spt_values Set @i=@i+1 end -- select count (*) from dbo.tableb --250600 select * from dbo.tablea a join tableb b on ( A.id=b.id)
Test 1: Large table, no index
Now to create a clustered index: [SQL] View plain copy create unique clustered index Cx_tablea on TableA (ID) create unique clustered index Cx_tableb on TableB (ID)
Test 1: Large table, indexed
If any of the tables in the connection have an index, then a hash Join is used. I didn't put all the screenshots on the results, and if you're interested you can delete any of the indexes in the table to do the test.
Test 2: Middle table, no index
CREATE TABLE first: [SQL] View plain copy CREATE TABLE TableC (ID int identity,name varchar ()) insert into TableC (name) Select n Ame from Master.dbo.spt_values--Select COUNT (*) from Dbo.tablec--2506 CREATE TABLE tabled (id int identity,name var Char to tabled (name) select name from master.dbo.spt_values select * from Dbo.tablec C join tabled D On (c.id=d.id)--Select COUNT (*) from dbo.tabled--2506
Test 2: Medium table, indexed
First or foremost, create a clustered index: [SQL] View plain copy create unique clustered index Cx_tablec on TableC (ID) create unique clustered index Cx_tabled on tabled (ID)
For a medium-size table, if any of the tables in the connection have an index, the merge Join is used. Test 3: Small table, no index
[SQL] View plain copy CREATE TABLE Tablee (ID int identity,name varchar ()) insert into Tablee (name) SELECT Top N Ame from Master.dbo.spt_values--Select COUNT (*) from Dbo.tablee--10 CREATE TABLE tablef (id int identity,name Varch AR) insert into Tablef (name) select top name from Master.dbo.spt_values-select COUNT (*) from Dbo.tablef- -10
Test 3: Small table, indexed
Create a clustered index: [SQL] View plain copy create unique clustered index Cx_tablee on Tablee (ID) Create unique clustered index Cx_ta Blef on Tablef (ID)
For small tables, if there are indexes in any of the tables, then the nested Loop Join is used.
The same can be done in the other direction, such as comparison of the table in large table comparison small table. [SQL] View plain Copy select * Dbo.tablea a join TableC C on (a.id=c.id) SELECT * from Dbo.tablea a join TABL EE E on (a.id=e.id) SELECT * from Dbo.tablec C join Tablee E on (c.id=e.id)
In this case, if all or part of the table has an index, the nested Loop Join is used, and if neither is hashjoin.
Of course you can also force the optimizer to use any type of connection, but this is not a recommended practice. The query optimizer is intelligent and can dynamically choose the best one. Here I just show the call to Mergejoin, so the optimizer uses a mergejoin substitution that would have used Hashjoin (test 1 has no index). [SQL] View plain copy select * Dbo.tablea A join TableB B on (a.id=b.id) option (merge Join) SELECT * FROM dbo . TableA A Inner Merge Join TableB B on (a.id=b.id)
Table 1 Testing a unique clustered index
According to the table above:
Ø If two tables are not indexed, hash Join is selected inside the query optimizer
Ø If all two tables have indexes, the inner will select the merge join (large table)/nestedloop join (small table)
Ø if one of the tables has an index, the query optimizer will select the merge Join (the middle table)/hashjoin (large table)/nestedloop Join (Small table & large table vs small table)
Table 2 Testing the clustered index (createclustered Indexcx_tablea Ontablea (ID))
Table size
With index (Both)
Without Index (Both)
Either of table has index
Big (Both)
HASH
HASH
HASH
Medium (Both)
HASH
HASH
HASH
Small (Both)
NESTED LOOP
NESTED LOOP
HASH
Big Vs Small (medium)
HASH
HASH
HASH
According to the table above:
This test is done without using a unique clustered index, knowing that if you create an index without using the unique keyword, SQL Server will not be guaranteed to know that it is unique, so it defaults to creating a 4-byte integer as a unique identifier.
The mergejoin is not used if you create a clustered index without using the unique keyword, based on the table above.
Thanks for @dave's mail, now add a second chart.
Summarize:
Merge Join
The Merge join is for those tables indexed on the Join column, which can be either a clustered index or a nonclustered index. Merge is the best join type for this scenario, requiring two tables to be indexed, so it's sorted and easier to match and return data.
Hash Join
A Hash join is a large table that has no index or any of the indexes. For this case it is the best join type, for what? Because it works well in an environment where there are no indexes for large tables and parallel queries, and provides the best performance. Most people say it's a heavy lift for a join.
Nested Loop Join
The Nested Loop join is an indexed large table for those small tables that have an index or one of them. It joins those small tables that need to be cycled to work best from one to the other table in the case of a row comparison.
I hope you can now understand how the query optimizer chooses the optimal query type.
SOURCE Address: Merge join vs Hash join vs Nested Loop Join