SQL Server nested loop join efficiency test

Last Update:2018-05-15 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

As seen from many web pages, SQL Server has three join algorithms, nested loop join, merge join, and hash join. The most common of these is the nested loop join.

In many articles that introduce nested loop join, it is mentioned that if two tables do nested loop join, the table with the smaller rows is used as the outer loop table, the table with more rows is used as the inner loop table, and the join is more efficient.

One of the reasons is that if the Inner loop table does join the column on the appropriate index, then the outer loop of each input data can be indexed to seek, so that the entire inner loop will not read the table, especially the inner loop table is larger, saving more expensive. But if the internal and external loop tables do not have the proper index, then join, why the efficiency is also relatively high?

For example, the outer loop table has 10 rows of data, the Inner loop table has 1000 rows of data, according to the algorithm of the nested loop join, the outer loop table takes one, and all the data inside the loop table is matched once, output matching data rows. This is to carry out the matching of 10*1000=10000 times; If, in turn, the outer loop is 1000 lines, the inner loop is 10 rows, then the outer loop table takes one piece of data, and the Inner loop table traverses 10 rows of data, and the total is also 1000*10=10000 times. The rough looks the same. Why do you say that the outer circulation table is small, high efficiency? Let's do a test.

Use tempdb
Go
--Create two tables to test the efficiency of the nested loop join
CREATE TABLE Tempa (String VARCHAR (1000))
Go
CREATE TABLE TEMPB (String VARCHAR (1000))
Go
--Insert the data so that the data in the table Tempa exists exactly 1 pages
INSERT into Tempa SELECT REPLICATE (' a ', 1000)
INSERT into Tempa SELECT REPLICATE (' B ', 1000)
INSERT into Tempa SELECT REPLICATE (' C ', 1000)
INSERT into Tempa SELECT REPLICATE (' d ', 1000)
INSERT into Tempa SELECT REPLICATE (' E ', 1000)
INSERT into Tempa SELECT REPLICATE (' F ', 1000)
INSERT into Tempa SELECT REPLICATE (' G ', 1000)

--insert data into TEMPB to make the TEMPB data 100 times times the Tempa
INSERT INTO TEMPB select * from Tempa
Go 100
--Check the size of the table Tempa and TEMPB
SET STATISTICS IO on
SELECT * FROM Tempa
SELECT * FROM TEMPB
--the result of the return is as follows:
/*
Table ' Tempa '. Scan count 1, logical reads 1
Table ' TEMPB '. Scan count 1, logical reads 100
*/
--it can be seen that the table Tempa has 7 rows, stored in 1 pages; The TEMPB has 700 rows and is stored in 100 pages.

--Execute the following query, use Tempa as an outer loop table, TEMPB as the inner loop table to see how the cost of execution
SELECT *from Tempa a INNER LOOP JOIN TEMPB b
ona.string = b.string OPTION (Force order)
/*
Table ' TEMPB '. Scan count 1, logical reads 700
Table ' Tempa '. Scan count 1, logical reads 1
*/

From the results can be seen from the Tempa read 1 pages, from TEMPB read 700 pages, a total of 701 logical read, that is, the outer loop of the table, do a full table read, how many pages there are many logical read; The Inner loop table, which corresponds to every 1 records of the outer loop table, reads the full table once, that is, 7 times 100,700 logical reads.

If according to this law, change the position of the inside and outside circulating table, get the logical reading should be TEMPB of a full table read, 100 logical reads plus 700 rows multiplied by tempa full table read (1 pages), is 700 logical reads, total is 800 logical reads.

--Swap the order of joins and look at the execution cost:
SELECT *from TEMPB b INNER LOOP JOIN Tempa A
ona.string = b.string OPTION (Force ORDER)
/*
Table ' Tempa '. Scan count 1, logical reads 700
Table ' TEMPB '. Scan count 1, logical reads 100
*/

Sure enough, as expected.

So in this hypothetical case, if the outer loop table is smaller, the join cost is lower.

The table structure of the experiment is very special, if the general situation to deduce, you can make such assumptions:

Suppose that table x has a page, with an average of B rows per page, and a C page on the table Y, with an average of D rows per page.
The cost of the nested loop join is a + (A*B*C), and table y is an outer loop, and table x is an inner loop, then the costs of the nested loop join are c+ (c*d*a)
, which is the outer loop of table X.

SQL Server nested loop join efficiency test

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

SQL Server nested loop join efficiency test

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

SQL Server nested loop join efficiency test

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support