Nested loop join
Nested loop join is the most basic join method. As its name suggests, it nests one loop inside another. Of the three join methods, nested loop join is the only one that supports inequality join conditions. The process can be described simply as follows:
Figure 1. Step 1 of loop nested join
Figure 2. Step 2 of loop nested join
From the two figures above, it is not difficult to see that the number of times the nested loop join scans the inner input equals the number of rows in the outer input; when there are no more rows in the outer input, the nested loop ends. We can also see that this join method wants the inner table to be ordered (that is, indexed) and the outer input to have fewer rows than the inner input; otherwise, the query analyzer is more inclined to choose a hash join (discussed later in this article).
From the nested loop join we can see that as the data volume grows, the performance cost grows rapidly. Therefore, once the data volume reaches a certain size, the query analyzer tends not to choose this method.
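The steps above can be sketched in a few lines of Python. This is a minimal illustration, not SQL Server's actual implementation; the table contents and the inequality predicate are made up for the example:

```python
def nested_loop_join(outer, inner, predicate):
    """Nested loop join: for every row of the outer input, scan the
    entire inner input and emit pairs that satisfy the predicate.
    Cost is O(len(outer) * len(inner)), which is why the optimizer
    prefers a small outer input."""
    result = []
    for o in outer:            # one pass over the outer input
        for i in inner:        # full scan of the inner input per outer row
            if predicate(o, i):
                result.append((o, i))
    return result

# An inequality join -- the only one of the three methods that supports it:
orders = [("o1", 100), ("o2", 250)]
limits = [("small", 200), ("large", 500)]
pairs = nested_loop_join(orders, limits, lambda o, l: o[1] < l[1])
```

Note that the inner input is scanned once per outer row, which is exactly why an index on the inner table (turning each scan into a seek) matters so much in practice.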
Merge join
A merge join connects two ordered inputs; both sides must be sorted on the join columns, so there is no need to repeatedly scan a table the way a nested loop join does. In addition, merge join requires at least one equality predicate in the join condition for the query analyzer to choose it.
The merge join process can be described in the following figure:
Figure 8. Merge join Step 1
Merge join first retrieves the first row from each of the two inputs. If the rows match, the matching row is returned. If they do not match, the input with the smaller value advances by one row.
Therefore, if both inputs of a merge join are already ordered, the merge join is very efficient. However, if an explicit sort is needed just to enable the merge join, a hash join is usually the more efficient choice. There is one exception: the query analyzer may have to perform an explicit sort anyway because of ORDER BY, GROUP BY, DISTINCT, and so on. Since the sorted result already exists in that case, why not reuse it to perform a merge join at lower cost? In such cases, merge join is the better choice.
In addition, we can see from the merge join principle that when the join condition is an inequality (but not !=), such as >, <, or >=, merge join can still be quite efficient.
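The match-or-advance process described above can be sketched as follows. This is a simplified illustration that assumes the join keys are unique on each side (a real merge join must handle duplicate keys with a many-to-many rewind), and both inputs are assumed pre-sorted:

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Merge join over two inputs already sorted on the join key.
    Each input is scanned exactly once: advance whichever side has
    the smaller key, and emit a pair when the keys are equal."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk == rk:
            result.append((left[i], right[j]))
            i += 1          # simplification: unique keys on each side
            j += 1
        elif lk < rk:
            i += 1          # advance the input with the smaller value
        else:
            j += 1
    return result
```

Because each input is read only once, the cost is linear in the total number of rows, which is why merge join is so cheap when the sort order comes for free (for example, from an index).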
Hash join
Hash join is more complex than the previous two methods, but it outperforms both merge join and nested loop join when the data volume is large and the inputs are unsorted. If the join columns are not sorted (that is, there is no index), the query analyzer tends to use a hash join.
Hash join is divided into two phases: build and probe. The build phase comes first; its process is shown in Figure 12.
Figure 12. The build phase of a hash join
In Figure 12, each row of the input is placed, via the hash function, into one of the hash buckets. The choice of hash function and the number of hash buckets are black boxes; Microsoft has not published the specific algorithms, but presumably they are well tuned. Note that rows within a hash bucket are unordered. The query optimizer generally uses the smaller of the two join inputs as the build-phase input.
Next comes the probe phase. For each row of the other input, the same hash function determines which hash bucket the row belongs to; the row is then compared with each row in that bucket, and matching rows are returned.
From this principle, it is not difficult to see that a hash join must compute hash values, so its CPU consumption is high. In addition, because the rows in each hash bucket are unordered, the output is unordered as well.
The discussion above assumes the build phase fits in memory. When memory is tight, grace hash join and recursive hash join come into play, which may spill to tempdb and incur a large amount of I/O. I will not elaborate here; interested readers can see: http://msdn.microsoft.com/zh-cn/library/aa178403(v=sql.80).aspx.
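The build-then-probe process can be sketched as below. This is an in-memory illustration only: a Python dict stands in for the engine's hash table, so the real hash function and bucket count (which, as noted, are internal to SQL Server) are not modeled:

```python
def hash_join(build_input, probe_input, key=lambda row: row[0]):
    """In-memory hash join. Build phase: hash the (smaller) build
    input into buckets keyed by the join column. Probe phase: hash
    each row of the other input and compare it only against rows in
    the matching bucket. Output order is not guaranteed, since rows
    within a bucket are unordered."""
    buckets = {}
    for row in build_input:                 # build phase
        buckets.setdefault(key(row), []).append(row)
    result = []
    for row in probe_input:                 # probe phase
        for match in buckets.get(key(row), []):
            result.append((match, row))
    return result
```

Each input is scanned once, so the cost is roughly linear in the total row count plus the hashing work, which is why hash join wins on large, unindexed inputs.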
Summary
The following table briefly summarizes the consumption and use cases of these connection methods:
| | Nested loop join | Merge join | Hash join |
|---|---|---|---|
| Applicable scenario | Small outer input; inner input ordered (indexed) on the join column | Both inputs sorted on the join column | Large data volume, no index |
| CPU | Low | Low (if no explicit sort is needed) | High |
| Memory | Low | Low (if no explicit sort is needed) | High |
| I/O | High or low | Low | High or low |
References:
http://msdn.microsoft.com/zh-cn/library/aa178403(v=sql.80).aspx
http://www.dbsophic.com/SQL-Server-Articles/physical-join-operators-merge-operator.html
The following blog post ran timing experiments on the three join types; its analysis is worth a read:
http://www.jasongj.com/2015/03/07/Join1/