Sharp SQL2014: The join algorithm

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

When you execute a query in Microsoft SQL Server Management Studio , if you select a button in the toolbar , you see the execution plan that is generated for the query. The execution plan graphically shows the data retrieval methods selected by the SQL Server query optimizer, such as table scans, sorting, hash matching, and so on. For join queries,SQL Server chooses to use nested loop joins, merge joins, or hash joins, depending on the data, indexes, and so on, between the joined tables.

7.7.1Nested loops Join

Nested loops joins are also known as nested iterations, which use one join input as an external input table (shown as the top input in a graphical execution plan) and another join input as an internal (bottom) input table. The outer loop processes the external input table row by line. The inner loop executes for each outer row, searching for matching rows in the internal input table. To put it simply, scan one of the join tables and search for matching rows in another join table for each row in the table.

Nested loops joins are especially effective if the external input is small (less than ten rows) and the internal input is large and the index is pre-created. In many small transactions, such as those that affect only a smaller set of rows, an indexed nested loop join is better than a merge join and a hash join. In large queries, however, nested loops joins are often not the best choice.

For example, the following query will use nested loops joins, because the sales.customer table row count is only 1 rows, and the sales.salesorderheader data volume is large, resulting execution plan As shown in 7-11 .

Use AdventureWorks2014;

SELECT *

From Sales.Customer

INNER Joinsales.salesorderheader

Oncustomer.customerid = Salesorderheader.customerid

WHERE Customer.customerid = 1;

Figure 7-11 execution plan using nested loops

There are two nested loops in the plan, where only the left nested loop character is used to Sales.Customer the join to Sales.SalesOrderHeader , and the nested loop on the right is used to Sales.SalesOrderHeader The join between the index lookup and the physical row locator (key lookup). The sales.customer table in the upper-right corner of the execution plan is used as an external input to find the customer in the clustered index. For each customer, the nested loop operation performs a lookup on the Ix_salesorderheader_customerid index on the salesorderheader.customerid column, Then a key lookup is followed to locate the row of data to be accessed.

7.7.2Merge Joins

A merge join requires that both inputs be sorted on the merged column, and the merged columns are definedby the equivalent (ON) clause of the JOIN predicate. Because each input is sorted, the merge join gets a row from each input and compares it. For example, for inner join operations, returns if the rows are equal. If the rows are not equal, the rows with smaller values are discarded and another row is obtained from the input. This process will be repeated until all the actions have been processed.

A merge Join operation can be either a normal operation or a many-to-many operation. Many-to-many merge joins use temporary table storage rows. If there are duplicate values in each input, the other input must be rewound to the beginning of the duplicates when each of the duplicates in one of the inputs is processed.

Merging the joins itself is fast, but if the merge column is not indexed, choosing a merge Join can be time-consuming because it first sorts the columns. However, if the amount of data is large and you can get pre-ordered data from the index, the merge join is usually the fastest available join algorithm.

For example, the following query statement gets the details of the order, because SalesOrderHeader and SalesOrderDetail have a clustered index on the merged column SalesOrderID . The columns have been sorted, so the query optimizer chooses to merge joins. As shown in 7-12 .

Use AdventureWorks2014;

SELECT *

From Sales.SalesOrderHeader

INNER Joinsales.salesorderdetail

Onsalesorderheader.salesorderid = Salesorderdetail.salesorderid;

Figure 7-12 execution plan using a merge join

7.7.3Hash Join

Hash joins can effectively handle unsorted large non-indexed inputs. Therefore, it is useful for dealing with intermediate results of complex queries. The intermediate results of the query are not indexed, and are typically not sorted appropriately for the next operation in the query plan. Also, the query optimizer only estimates the size of the intermediate results. For complex queries, estimates can have significant errors, so if the intermediate results are much larger than expected, the algorithm that processes the intermediate results must not only be effective but also moderately weakened. It is not realistic for intermediate results to be as strict as merging joins, and the cost of sorting may be much larger than the direct retrieval cost of data.

There are two cases of choosing a hash join: one is not creating an appropriate index for the join, and the other is a large intermediate result.

There are two types of input to a hash join: Build input and probe input. The query optimizer chooses the smaller of the two as the build input, applies a hash function to the join column values, and assigns the rows in the generated input to the hash bucket. A hash bucket is a structure that holds the location of the data being accessed, and with it, unnecessary table scans can be avoided when data is retrieved.

To verify the use of hash joins without indexes, first create a copy of the Sales.Customer and sales.salesorderheader tables using the following statement.

Use AdventureWorks2014;

SELECT TOP 10 *

Into MyCustomer

From Sales.Customer

ORDER by CustomerID;

SELECT TOP 100 *

Into Mysalesorderheader

From Sales.SalesOrderHeader

ORDER by CustomerID;

Execute the following query to see the execution plan shown in 7-13 .

SELECT *

From MyCustomer

INNER Joinmysalesorderheader

Onmycustomer.customerid = Mysalesorderheader.customerid;

Figure 7-13 execution plan using a hash join

Let's look at a more interesting example below. Only rows with CustomerID = 1 in sales.customer are selected in the following query statement to join with Sales.SalesOrderHeader , because the number of joined rows is small, The amount of data produced by intermediate results is also relatively small, so you can see that the query optimizer uses nested loops joins for the statement. As shown in 7-14 .

Use AdventureWorks2014;

SELECT *

From Sales.Customer

INNER Joinsales.salesorderheader

Oncustomer.customerid = Salesorderheader.customerid

WHERE Customer.customerid = 1;

Figure 7-14 using nested loops joins when the amount of data is small

The same is the above join, to remove the where filter condition after the amount of data increased significantly, the execution of this statement will find that the query optimizer uses a hash join method. As shown in 7-15 .

SELECT *

From Sales.Customer

INNER Joinsales.salesorderheader

Oncustomer.customerid = Salesorderheader.customerid;

Figure 7-15 using a hash join when the amount of data is large

7.7.4enforcing join policies with join hints

Join hints are used to specify that the query optimizer enforces a join strategy between two tables, including loop join,Merge Join , and hash join, which are used for nested loops, hashes, and merge joins, respectively. If more than one join hint is specified, the optimizer chooses the least expensive join policy from the allowed join policy. Alternatively, you can use the OPTION clause to specify a join policy. However, this approach affects all joins in the query and is typically used for legacy join syntax.

1． Specify a separate join policy for each join

You can use the LOOP join,MERGE join , and HASH join prompts in the FROM clause to specify the join policy separately for each join. For example, the following query statement specifies that a nested loop join is used.

Use AdventureWorks2014;

SELECT *

From Sales.Customer

INNER Loopjoin Sales.SalesOrderHeader

Oncustomer.customerid = Salesorderheader.customerid;

As another example, the following query statement specifies that a merge join is used.

Use AdventureWorks2014;

SELECT *

From Sales.Customer

Innermerge JOIN Sales.SalesOrderHeader

Oncustomer.customerid = Salesorderheader.customerid;

When you use join hints in a multi-table join, it affects the order in which the joins are executed. As described earlier, the query optimizer chooses the join to be performed first, without affecting the return result being correct, based on the principle of efficiency precedence. For example, as shown in execution plan 7-16 of the following statement, you can see that the first execution is a join of Sales.SalesOrderHeader and sales.salesorderdetail . The join results are then joined to the Sales.Customer .

Use AdventureWorks2014;

SELECT *

From Sales.Customer

INNER Joinsales.salesorderheader

Oncustomer.customerid = Salesorderheader.customerid

INNER Joinsales.salesorderdetail

On Salesorderheader.salesorderid =salesorderdetail.salesorderid;

Figure 7-16 execution plan with no join hints

The following statement specifies a merge join hint for Sales.Customer and Sales.SalesOrderHeader , and this hint only works for both tables, and Sales.SalesOrderDetail 's join strategy is still determined by the query optimizer. Because a merge join is explicitly specified for sales.customer and Sales.SalesOrderHeader , the optimizer executes the join first instead of executing the Sales.SalesOrderHeader the connection to the sales.salesorderdetail . Otherwise, the join result of Sales.Customer and sales.salesorderheader and sales.salesorderdetail is then performed. Figure 7-17 is the execution plan for the statement.

SELECT *

From Sales.Customer

Innermerge JOIN Sales.SalesOrderHeader

Oncustomer.customerid = Salesorderheader.customerid

INNER Joinsales.salesorderdetail

Onsalesorderheader.salesorderid = Salesorderdetail.salesorderid;

Figure 7-17 execution plan after using the join hint

If you want Sales.Customer to perform a merge join with the join results of Sales.SalesOrderHeader and sales.salesorderdetail , You should implement it using nested joins, referring to the following statement:

SELECT *

From Sales.Customer

Innermerge JOIN (Sales.SalesOrderHeader

INNER JOIN Sales.SalesOrderDetail

Onsalesorderheader.salesorderid = Salesorderdetail.salesorderid)

Oncustomer.customerid = Salesorderheader.customerid;

2． Specify a unified join strategy for all joins

When using the legacy join syntax, you should use the OPTION clause to specify a join policy, but this policy affects all joins in the statement and cannot specify a different join strategy for each join, such as:

SELECT *

From Sales.Customer, Sales.salesorderheader,sales.salesorderdetail

WHERE Customer.customerid =salesorderheader.customerid

and salesorderheader.salesorderid= Salesorderdetail.salesorderid

OPTION (MERGE JOIN);

As shown in execution plan 7-18 of the statement, you can see that the merge join strategy is used across all three tables.

Figure 7-18 execution plan using the unified join strategy for all joins

In the ANSI sql:1992 specification, you can also use the OPTION clause, which also affects all joins in the statement, such as:

SELECT *

From Sales.Customer

INNER Joinsales.salesorderheader

Oncustomer.customerid = Salesorderheader.customerid

INNER Joinsales.salesorderdetail

Onsalesorderheader.salesorderid = Salesorderdetail.salesorderid

OPTION (MERGE JOIN);

Sharp SQL2014: The join algorithm

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Sharp SQL2014: The join algorithm

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Sharp SQL2014: The join algorithm

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support