Sharp SQL2014: Join algorithms

Source: Internet
Author: User
Tags sql server query management studio

When you run a query in Microsoft SQL Server Management Studio, selecting the execution plan button in the toolbar (for example, Include Actual Execution Plan) displays the execution plan generated for the query. The execution plan shows graphically the data retrieval methods chosen by the SQL Server query optimizer, such as table scans, sorts, and hash matches. For join queries, SQL Server chooses a nested loops join, merge join, or hash join based on the data and indexes of the joined tables.

7.7.1 Nested loops join

A nested loops join is also called "nested iteration". It uses one join input as the outer input table (shown as the top input in the graphical execution plan) and the other as the inner (bottom) input table. The outer loop processes the outer input table row by row. The inner loop runs once for each outer row and searches the inner input table for matching rows. Put simply, it scans one of the joined tables and, for each of its rows, searches the other table for matching rows.

A nested loops join is particularly effective when the outer input is small (for example, fewer than 10 rows) and the inner input is large and already indexed on the join column. In many small transactions, such as those that affect only a small set of rows, index nested loops joins outperform both merge joins and hash joins. In large queries, however, a nested loops join is usually not the best choice.
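The outer and inner loops described above can be sketched in a few lines of Python. This is only an illustration, not SQL Server's implementation: the function names, the sample rows, and the use of a sorted list with binary search as a stand-in for an existing index are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

def build_index(rows, key):
    """Sort rows by `key`; a stand-in for an index that already exists."""
    ordered = sorted(rows, key=lambda r: r[key])
    keys = [r[key] for r in ordered]
    return keys, ordered

def index_nested_loops_join(outer_rows, inner_index, key):
    """Inner join: for each outer row, seek matching rows in the inner index."""
    keys, ordered = inner_index
    result = []
    for outer in outer_rows:                      # outer loop: row by row
        lo = bisect_left(keys, outer[key])        # index seek instead of a scan
        hi = bisect_right(keys, outer[key])
        for inner in ordered[lo:hi]:              # inner loop: matching rows only
            result.append({**outer, **inner})
    return result

# Hypothetical sample data: one outer row, several inner rows.
customers = [{"CustomerID": 1, "Name": "A"}]
orders = [
    {"CustomerID": 1, "SalesOrderID": 100},
    {"CustomerID": 2, "SalesOrderID": 101},
    {"CustomerID": 1, "SalesOrderID": 102},
]
joined = index_nested_loops_join(customers, build_index(orders, "CustomerID"), "CustomerID")
print([row["SalesOrderID"] for row in joined])  # [100, 102]
```

With a single outer row, the work is one seek into the inner index, which is why a small outer input combined with an indexed inner input suits this algorithm.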

For example, the following query uses a nested loops join because only one row of Sales.Customer qualifies, while Sales.SalesOrderHeader holds a large volume of data. Figure 7-11 shows the generated execution plan.

USE AdventureWorks2014;
GO
SELECT *
FROM Sales.Customer
INNER JOIN Sales.SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID
WHERE Customer.CustomerID = 1;

Figure 7-11 Execution plan using nested loops joins

The plan contains two Nested Loops operators. Only the one on the left joins Sales.Customer and Sales.SalesOrderHeader; the one on the right connects the index seek on Sales.SalesOrderHeader with the lookup of the physical row (a key lookup). The Sales.Customer table in the upper-right corner of the execution plan is the outer input, and the customer is located through a clustered index seek. For each customer, the nested loops operation seeks the IX_SalesOrderHeader_CustomerID index on the SalesOrderHeader.CustomerID column and then uses the key to locate the data row to be accessed.

7.7.2 Merge join

A merge join requires both inputs to be sorted on the merge columns, which are defined by the equality (ON) clauses of the join predicate. Because each input is sorted, the join operation reads one row from each input and compares them. For an inner join, for example, the rows are returned if they are equal. If they are not equal, the row with the smaller value is discarded and the next row is read from that input. This process repeats until all rows have been processed.

A merge join can be a regular or a many-to-many operation. A many-to-many merge join uses a temporary table to store rows. If both inputs contain duplicate values, then each time a duplicate in one input is processed, the other input must be rewound to the start of its run of duplicates.
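The compare-and-advance procedure, including the rewind needed for duplicate values, can be sketched as follows. This is a simplified illustration over in-memory lists, not SQL Server's implementation; the function name and sample rows are assumptions made for the example, and both inputs are assumed to be sorted on the join key already.

```python
def merge_join(left, right, key):
    """Inner merge join of two lists of dict rows already sorted on `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1                      # discard the smaller row, read the next
        elif lk > rk:
            j += 1
        else:
            run_start = j               # start of the duplicates on the right
            while i < len(left) and left[i][key] == lk:
                j = run_start           # rewind the right input for each left duplicate
                while j < len(right) and right[j][key] == lk:
                    result.append({**left[i], **right[j]})
                    j += 1
                i += 1
    return result

# Hypothetical sample data, both sorted on SalesOrderID.
headers = [
    {"SalesOrderID": 1, "CustomerID": 10},
    {"SalesOrderID": 2, "CustomerID": 11},
    {"SalesOrderID": 4, "CustomerID": 12},
]
details = [
    {"SalesOrderID": 1, "ProductID": "A"},
    {"SalesOrderID": 1, "ProductID": "B"},
    {"SalesOrderID": 3, "ProductID": "C"},
    {"SalesOrderID": 4, "ProductID": "D"},
]
merged = merge_join(headers, details, "SalesOrderID")
print([row["ProductID"] for row in merged])  # ['A', 'B', 'D']
```

Each input is read once from front to back, apart from the rewinds over runs of duplicates, which is why the algorithm is fast when the sorted order comes for free from an index.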

A merge join is itself fast, but if the merge columns are not indexed, choosing it can be expensive because the inputs must first be sorted. However, when the data volume is large and presorted data can be read from an index, merge join is usually the fastest available join algorithm.

For example, the following query retrieves order details. Because both SalesOrderHeader and SalesOrderDetail have a clustered index on the merge column SalesOrderID, the inputs are already sorted, so the query optimizer chooses a merge join, as shown in Figure 7-12.

USE AdventureWorks2014;
GO
SELECT *
FROM Sales.SalesOrderHeader
INNER JOIN Sales.SalesOrderDetail
ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID;

Figure 7-12 Execution plan using a merge join

7.7.3 Hash join

A hash join can efficiently process large, unsorted, nonindexed inputs, which makes it useful for the intermediate results of complex queries. Such intermediate results are not indexed and are not sorted suitably for the next operation in the query plan. In addition, the query optimizer only estimates the size of intermediate results, and for complex queries the estimate can be far off. The algorithm that processes intermediate results must therefore not only be efficient but also degrade gracefully when an intermediate result turns out to be much larger than expected. Requiring strictly sorted columns, as a merge join does, is unrealistic for intermediate results, because the cost of sorting may far exceed the cost of retrieving the data directly.

A hash join is chosen in two cases: when no suitable index exists for the join, and when the intermediate result is relatively large.

A hash join has two inputs: the build input and the probe input. The query optimizer assigns the smaller of the two as the build input, applies a hash function to the join column values, and distributes the rows of the build input into hash buckets. Each row of the probe input is then hashed with the same function and compared only with the rows in the matching bucket. A hash bucket is a structure that stores the locations of the data being accessed; it makes it possible to avoid unnecessary table scans during data retrieval.
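The build and probe phases can be sketched as follows. This is a minimal in-memory illustration, not SQL Server's implementation: the function name, the sample rows, and the use of Python's built-in hash() as the hash function are assumptions made for the example.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Inner hash join of two unsorted, unindexed lists of dict rows."""
    # The smaller input becomes the build input, the other the probe input.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)

    # Build phase: distribute the build rows into hash buckets.
    buckets = defaultdict(list)
    for row in build:
        buckets[hash(row[key])].append(row)

    # Probe phase: hash each probe row and compare only within its bucket.
    result = []
    for row in probe:
        for candidate in buckets.get(hash(row[key]), []):
            if candidate[key] == row[key]:        # guard against hash collisions
                result.append({**candidate, **row})
    return result

# Hypothetical sample data in no particular order.
my_customers = [{"CustomerID": 1, "Name": "A"}, {"CustomerID": 2, "Name": "B"}]
my_orders = [
    {"CustomerID": 2, "SalesOrderID": 200},
    {"CustomerID": 1, "SalesOrderID": 201},
    {"CustomerID": 3, "SalesOrderID": 202},
    {"CustomerID": 2, "SalesOrderID": 203},
]
hashed = hash_join(my_customers, my_orders, "CustomerID")
print([(row["Name"], row["SalesOrderID"]) for row in hashed])
# [('B', 200), ('A', 201), ('B', 203)]
```

Neither input needs to be sorted or indexed: each is read once, and the cost of the lookup per probe row does not depend on the order of the data.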

To verify that a hash join is used when no indexes exist, use the following statements to create copies of the Sales.Customer and Sales.SalesOrderHeader tables.

USE AdventureWorks2014;
GO
SELECT TOP 10 *
INTO MyCustomer
FROM Sales.Customer
ORDER BY CustomerID;

SELECT TOP 100 *
INTO MySalesOrderHeader
FROM Sales.SalesOrderHeader
ORDER BY CustomerID;

Execute the following query. Figure 7-13 shows the resulting execution plan.

SELECT *
FROM MyCustomer
INNER JOIN MySalesOrderHeader
ON MyCustomer.CustomerID = MySalesOrderHeader.CustomerID;

Figure 7-13 Execution plan using a hash join

Next, consider an interesting example. The following query selects only the rows of Sales.Customer with CustomerID = 1 and joins them to Sales.SalesOrderHeader. Because few rows are joined, the intermediate result is also small, so the query optimizer uses a nested loops join for the statement, as shown in Figure 7-14.

USE AdventureWorks2014;
GO
SELECT *
FROM Sales.Customer
INNER JOIN Sales.SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID
WHERE Customer.CustomerID = 1;

Figure 7-14 Nested loops join when the data volume is small

Conversely, if the WHERE filter is removed from the preceding query, the data volume increases significantly. When this statement is executed, the query optimizer uses a hash join, as shown in Figure 7-15.

SELECT *
FROM Sales.Customer
INNER JOIN Sales.SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID;

Figure 7-15 Hash join when the data volume is large
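The switch from a nested loops join to a hash join as the data volume grows can be illustrated with a deliberately crude cost comparison. Everything in this sketch is an assumption made for illustration: the two cost formulas, the function names, and the row counts are invented, and SQL Server's actual cost model is far more detailed.

```python
import math

def nested_loops_cost(outer_rows, inner_rows):
    """Assumed cost: one index seek (about log2 of the inner size) per outer row."""
    return outer_rows * max(1.0, math.log2(inner_rows))

def hash_join_cost(outer_rows, inner_rows):
    """Assumed cost: one pass over one input to build, one over the other to probe."""
    return float(outer_rows + inner_rows)

def cheaper_join(outer_rows, inner_rows):
    """Return the name of the cheaper strategy under the toy formulas above."""
    nl = nested_loops_cost(outer_rows, inner_rows)
    hj = hash_join_cost(outer_rows, inner_rows)
    return "nested loops" if nl <= hj else "hash"

# Illustrative row counts, not measured from AdventureWorks2014.
print(cheaper_join(1, 30_000))        # nested loops
print(cheaper_join(20_000, 30_000))   # hash
```

Under these toy formulas, one outer row costs a handful of seek steps, while tens of thousands of outer rows cost more in repeated seeks than a single pass over both inputs.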

7.7.4 Using join hints to force a join strategy

A join hint forces the query optimizer to use a particular join strategy between two tables. The hints are LOOP JOIN, MERGE JOIN, and HASH JOIN, which select nested loops, merge, and hash joins respectively. If multiple join hints are specified, the optimizer selects the strategy with the lowest cost from among the allowed ones. You can also specify the join strategy in the OPTION clause, but that method affects every join in the query. It is typically used with the old join syntax.

1. Specify a separate join strategy for each join

In the FROM clause, you can use the LOOP JOIN, MERGE JOIN, and HASH JOIN hints to specify a separate join strategy for each join. For example, the following query specifies a nested loops join.

USE AdventureWorks2014;
GO
SELECT *
FROM Sales.Customer
INNER LOOP JOIN Sales.SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID;

The following query specifies a merge join.

USE AdventureWorks2014;
GO
SELECT *
FROM Sales.Customer
INNER MERGE JOIN Sales.SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID;

When a join hint is used in a multi-table join, it affects the order in which the joins are executed. As described earlier, the query optimizer decides which join to execute first on the basis of efficiency, without affecting the correctness of the returned results. For example, the execution plan of the following statement, shown in Figure 7-16, reveals that the join between Sales.SalesOrderHeader and Sales.SalesOrderDetail is executed first, and its result is then joined with Sales.Customer.

USE AdventureWorks2014;
GO
SELECT *
FROM Sales.Customer
INNER JOIN Sales.SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID
INNER JOIN Sales.SalesOrderDetail
ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID;

Figure 7-16 Execution plan with no join hint

The following statement specifies a MERGE JOIN hint for Sales.Customer and Sales.SalesOrderHeader. The hint takes effect only for these two tables; the join strategy for Sales.SalesOrderDetail is still determined by the query optimizer. Because the join of Sales.Customer and Sales.SalesOrderHeader is specified explicitly, the optimizer executes that join first, instead of the join between Sales.SalesOrderHeader and Sales.SalesOrderDetail. Otherwise, the merge join would be applied between Sales.Customer and the join result of Sales.SalesOrderHeader and Sales.SalesOrderDetail. Figure 7-17 shows the execution plan of the statement.

SELECT *
FROM Sales.Customer
INNER MERGE JOIN Sales.SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID
INNER JOIN Sales.SalesOrderDetail
ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID;

Figure 7-17 Execution plan after the join hint is used

If you want to merge-join Sales.Customer with the join result of Sales.SalesOrderHeader and Sales.SalesOrderDetail, use a nested join, as in the following statement:

SELECT *
FROM Sales.Customer
INNER MERGE JOIN (Sales.SalesOrderHeader
    INNER JOIN Sales.SalesOrderDetail
    ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID)
ON Customer.CustomerID = SalesOrderHeader.CustomerID;

2. Specify a single join strategy for all joins

With the old join syntax, the join strategy must be specified in the OPTION clause. This strategy applies to every join in the statement; different strategies cannot be specified for individual joins. For example:

SELECT *
FROM Sales.Customer, Sales.SalesOrderHeader, Sales.SalesOrderDetail
WHERE Customer.CustomerID = SalesOrderHeader.CustomerID
AND SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID
OPTION (MERGE JOIN);

Figure 7-18 shows the execution plan of this statement. All three tables are joined with the merge join strategy.

Figure 7-18 Execution plan with a single join strategy for all joins

With the ANSI SQL:1992 join syntax, you can also use the OPTION clause, which likewise affects all join operations in the statement. For example:

SELECT *
FROM Sales.Customer
INNER JOIN Sales.SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID
INNER JOIN Sales.SalesOrderDetail
ON SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID
OPTION (MERGE JOIN);

 
