Comparative analysis of Merge join, Hash join and Nested Loop join

Source: Internet
Author: User
Within SQL Server, three types of inner join operations are implemented, and most people have never heard of these connection types since they are not logical connections and are rarely used in code. So when are they going to be used? The answer depends on the situation. This means that you have to rely on recordsets and indexes. The query optimizer always chooses the optimal physical connection type intelligently. We know that the SQL optimizer creates a scheduling cost based on query overhead and chooses the best connection type based on this.

How exactly does the query optimizer choose the connection type from within?

SQL Server implemented some algorithms for the query optimizer's selection of connection types, let's take a look at some of the following examples of practice and conclude.

First I give some basic ideas about how the connection works and when it works, and how the optimizer decides which type of inner join to use.

· Depends on table size

· Depends on whether the connection column has an index

· Depends on whether the connection column is sorted

Test environment:

Memory: 4GB

        database servers: SQL Server 2008 (RTM) [SQL] View plain copy create table tablea  (id&nbsp (Int identity ,name varchar)    declare @i int   set @i=0    while  (@i<100)    begin   insert into tablea  (name)     select name from master.dbo.spt_values   set @i=@i+1   end   --select count (*)  from dbo.tableA --250600   go   create  table tableb  (Id int identity ,name varchar)    declare @i  int   set @i=0   while  (@i<100)    begin   insert  into tableb  (name)    select name from master.dbo.spt_values   Set  @i=@i+1   end  -- select count (*)  from dbo.tableb --250600   select * from dbo.tablea a join tableb b   on  ( A.id=b.id)   

Test 1: Large table, no index

Now to create a clustered index: [SQL] View plain copy create unique clustered index Cx_tablea on TableA (ID) create unique clustered index Cx_tableb on TableB (ID)

Test 1: Large table, indexed

If any of the tables in the connection have an index, then a hash Join is used. I didn't put all the screenshots on the results, and if you're interested you can delete any of the indexes in the table to do the test.

Test 2: Middle table, no index

CREATE TABLE first: [SQL] View plain copy CREATE TABLE TableC (ID int identity,name varchar ()) insert into TableC (name) Select n Ame from Master.dbo.spt_values--Select COUNT (*) from Dbo.tablec--2506 CREATE TABLE tabled (id int identity,name var   Char to tabled (name) select name from master.dbo.spt_values select * from Dbo.tablec C join tabled D On (c.id=d.id)--Select COUNT (*) from dbo.tabled--2506

Test 2: Medium table, indexed

First or foremost, create a clustered index: [SQL] View plain copy create unique clustered index Cx_tablec on TableC (ID) create unique clustered index Cx_tabled on tabled (ID)

For a medium-size table, if any of the tables in the connection have an index, the merge Join is used. Test 3: Small table, no index
[SQL] View plain copy CREATE TABLE Tablee (ID int identity,name varchar ()) insert into Tablee (name) SELECT Top N Ame from Master.dbo.spt_values--Select COUNT (*) from Dbo.tablee--10 CREATE TABLE tablef (id int identity,name Varch AR) insert into Tablef (name) select top name from Master.dbo.spt_values-select COUNT (*) from Dbo.tablef- -10

Test 3: Small table, indexed

Create a clustered index: [SQL] View plain copy create unique clustered index Cx_tablee on Tablee (ID) Create unique clustered index Cx_ta Blef on Tablef (ID)

For small tables, if there are indexes in any of the tables, then the nested Loop Join is used.

The same can be done in the other direction, such as comparison of the table in large table comparison small table. [SQL] View plain Copy select * Dbo.tablea a join TableC C on (a.id=c.id) SELECT * from Dbo.tablea a join TABL EE E on (a.id=e.id) SELECT * from Dbo.tablec C join Tablee E on (c.id=e.id)

In this case, if all or part of the table has an index, the nested Loop Join is used, and if neither is hashjoin.

Of course you can also force the optimizer to use any type of connection, but this is not a recommended practice. The query optimizer is intelligent and can dynamically choose the best one. Here I just show the call to Mergejoin, so the optimizer uses a mergejoin substitution that would have used Hashjoin (test 1 has no index). [SQL] View plain copy select * Dbo.tablea A join TableB B on (a.id=b.id) option (merge Join) SELECT * FROM dbo . TableA A Inner Merge Join TableB B on (a.id=b.id)

Table 1 Testing a unique clustered index

According to the table above:

Ø If two tables are not indexed, hash Join is selected inside the query optimizer

Ø If all two tables have indexes, the inner will select the merge join (large table)/nestedloop join (small table)

Ø if one of the tables has an index, the query optimizer will select the merge Join (the middle table)/hashjoin (large table)/nestedloop Join (Small table & large table vs small table)

Table 2 Testing the clustered index (createclustered Indexcx_tablea Ontablea (ID))

Table size

With index (Both)

Without Index (Both)

Either of table has index

Big (Both)

HASH

HASH

HASH

Medium (Both)

HASH

HASH

HASH

Small (Both)

NESTED LOOP

NESTED LOOP

HASH

Big Vs Small (medium)

HASH

HASH

HASH


According to the table above:

This test is done without using a unique clustered index, knowing that if you create an index without using the unique keyword, SQL Server will not be guaranteed to know that it is unique, so it defaults to creating a 4-byte integer as a unique identifier.

The mergejoin is not used if you create a clustered index without using the unique keyword, based on the table above.

Thanks for @dave's mail, now add a second chart.

Summarize:

Merge Join

The Merge join is for those tables indexed on the Join column, which can be either a clustered index or a nonclustered index. Merge is the best join type for this scenario, requiring two tables to be indexed, so it's sorted and easier to match and return data.

Hash Join

A Hash join is a large table that has no index or any of the indexes. For this case it is the best join type, for what? Because it works well in an environment where there are no indexes for large tables and parallel queries, and provides the best performance. Most people say it's a heavy lift for a join.

Nested Loop Join

The Nested Loop join is an indexed large table for those small tables that have an index or one of them. It joins those small tables that need to be cycled to work best from one to the other table in the case of a row comparison.

I hope you can now understand how the query optimizer chooses the optimal query type.

SOURCE Address: Merge join vs Hash join vs Nested Loop Join

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.