Intermediate join --------- hash join & merge join & nested loop join

Source: Internet
Author: User
Nested loop join)

Loop nested join is the most basic link. As shown in its name, loop nesting is required. nested loop is the only method that supports inequality connection among the three methods, the process of this connection method can be simply described as follows:

Figure 1. Step 1 of loop nested join

Figure 2. Step 2 of loop nested join

From the above two figures, it is not difficult to see that the number of times the nested loop table searches for internal loops is equal to the number of rows in the External Loop. When there are no more rows in the external loop, the nested loop ends. In addition, we can see that this connection method requires an internal cycle of the table Order (that is, there is an index), and the number of rows in the external cycle table is smaller than the number of rows in the internal cycle, otherwise, the query analyzer is more inclined to hash join (which will be discussed later in this article ).

Through nested loop connections, we can see that with the increase of data volume, the consumption of performance will increase exponentially. Therefore, when the data volume reaches a certain level, this method is often used by the query analyzer.

 

 

Merge join)

In merge join, two ordered queues are connected, and both ends must be in order. Therefore, you do not need to constantly search the tables in the loop like loop join. Second, merge join requires at least one equal sign query analyzer in the table join condition to select merge join.

The merge join process can be described in the following figure:

Figure 8. Merge join Step 1

Merge join first retrieves the first row from the two input sets. If the row matches, the matching row is returned. If two rows do not match, the input set with a smaller value + is shown.

. The input set with smaller values goes down to 1.

 

Therefore, if the two ends of merge join are ordered, the merge join efficiency is very high. However, if you need to use the explicit sort to ensure the orderly implementation of merge join, hash join is a more efficient choice. However, there is also an exception, that is, the query analyzer may have to perform explicit sorting due to the existence of order by, group by, distinct, and so on. For the query analyzer, in all cases, the explicit sort has been implemented. Why not use the result of the sort directly to perform merge join with a lower cost? In this case, merge join is a better choice.

In addition, we can see from the merge join principle that when the join condition is an inequality (but not included! =), For example,> <> =, and so on, merge join has a better efficiency.

 

Hash join)

Hash matching is more complex than the previous two methods, but hash matching is better than merge join and loop join in case of a large amount of data and disorder. If the join columns are not sorted (that is, there is no index), the query analyzer tends to use Hash join.

Hash matching is divided into two phases: Generation and detection. The first is the generation phase. The specific process of the first phase is 12.

Figure 12. The first phase of hash matching

In Figure 12, each entry in the input source is computed by the hash function in different hash buckets. The selection of hash function and the number of hash buckets are both black boxes, microsoft has not published specific algorithms, but I believe they are already very good algorithms. In addition, entries in the hash bucket are unordered. Generally, the query optimizer uses a small input set at both ends of the connection as the first-stage Input Source.

Next is the test phase. For another input set, the hash function is also used for each row to determine the hash bucket in which the row should be matched with each row in the corresponding hash bucket, if yes, the corresponding row is returned.

By understanding the principle of hash matching, it is not difficult to see that hash matching involves hash functions, so the CPU consumption is very high. In addition, the rows in the hash bucket are unordered, therefore, the output results are unordered.

 

In the above cases, the memory can accommodate the memory required for the next generation phase. If the memory is tight, it will also involve Grace hash matching and recursive hash matching, this may use tempdb to consume a large amount of Io. Here I will not elaborate, interested students can move: http://msdn.microsoft.com/zh-cn/library/aa178403 (V = SQL .80). aspx.

Summary

The following table briefly summarizes the consumption and use cases of these connection methods:

 

Nested loop connection

Merge connections

Hash connection

Applicable scenarios

Small outer loop, ordered memory loop condition Columns

Both ends of the input are ordered.

Large data volume without Indexing

CPU

Low

Low (if no explicit sorting is available)

High

Memory

Low

Low (if no explicit sorting is available)

High

Io

High or low

Low

High or low

 

References:

Http://msdn.microsoft.com/zh-cn/library/aa178403 (V = SQL .80). aspx

Http://www.dbsophic.com/SQL-Server-Articles/physical-join-operators-merge-operator.html

The following blog has done some time-consuming experiments on three types of Join Operations. You can refer to the analysis:

Http://www.jasongj.com/2015/03/07/Join1/

Intermediate join --------- hash join & merge join & nested loop join

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.