SQL Server nested loop join efficiency test

Source: Internet
Author: User

As seen from many web pages, SQL Server has three join algorithms, nested loop join, merge join, and hash join. The most common of these is the nested loop join.

In many articles that introduce nested loop join, it is mentioned that if two tables do nested loop join, the table with the smaller rows is used as the outer loop table, the table with more rows is used as the inner loop table, and the join is more efficient.

One of the reasons is that if the Inner loop table does join the column on the appropriate index, then the outer loop of each input data can be indexed to seek, so that the entire inner loop will not read the table, especially the inner loop table is larger, saving more expensive. But if the internal and external loop tables do not have the proper index, then join, why the efficiency is also relatively high?

For example, the outer loop table has 10 rows of data, the Inner loop table has 1000 rows of data, according to the algorithm of the nested loop join, the outer loop table takes one, and all the data inside the loop table is matched once, output matching data rows. This is to carry out the matching of 10*1000=10000 times; If, in turn, the outer loop is 1000 lines, the inner loop is 10 rows, then the outer loop table takes one piece of data, and the Inner loop table traverses 10 rows of data, and the total is also 1000*10=10000 times. The rough looks the same.  Why do you say that the outer circulation table is small, high efficiency? Let's do a test.

Use tempdb
Go
--Create two tables to test the efficiency of the nested loop join
CREATE TABLE Tempa (String VARCHAR (1000))
Go
CREATE TABLE TEMPB (String VARCHAR (1000))
Go
--Insert the data so that the data in the table Tempa exists exactly 1 pages
INSERT into Tempa SELECT REPLICATE (' a ', 1000)
INSERT into Tempa SELECT REPLICATE (' B ', 1000)
INSERT into Tempa SELECT REPLICATE (' C ', 1000)
INSERT into Tempa SELECT REPLICATE (' d ', 1000)
INSERT into Tempa SELECT REPLICATE (' E ', 1000)
INSERT into Tempa SELECT REPLICATE (' F ', 1000)
INSERT into Tempa SELECT REPLICATE (' G ', 1000)

--insert data into TEMPB to make the TEMPB data 100 times times the Tempa
INSERT INTO TEMPB select * from Tempa
Go 100
--Check the size of the table Tempa and TEMPB
SET STATISTICS IO on
SELECT * FROM Tempa
SELECT * FROM TEMPB
--the result of the return is as follows:
/*
Table ' Tempa '. Scan count 1, logical reads 1
Table ' TEMPB '. Scan count 1, logical reads 100
*/
--it can be seen that the table Tempa has 7 rows, stored in 1 pages; The TEMPB has 700 rows and is stored in 100 pages.

--Execute the following query, use Tempa as an outer loop table, TEMPB as the inner loop table to see how the cost of execution
SELECT *from Tempa a INNER LOOP JOIN TEMPB b
ona.string = b.string OPTION (Force order)
/*
Table ' TEMPB '. Scan count 1, logical reads 700
Table ' Tempa '. Scan count 1, logical reads 1
*/

From the results can be seen from the Tempa read 1 pages, from TEMPB read 700 pages, a total of 701 logical read, that is, the outer loop of the table, do a full table read, how many pages there are many logical read; The Inner loop table, which corresponds to every 1 records of the outer loop table, reads the full table once, that is, 7 times 100,700 logical reads.

If according to this law, change the position of the inside and outside circulating table, get the logical reading should be TEMPB of a full table read, 100 logical reads plus 700 rows multiplied by tempa full table read (1 pages), is 700 logical reads, total is 800 logical reads.

--Swap the order of joins and look at the execution cost:
SELECT *from TEMPB b INNER LOOP JOIN Tempa A
ona.string = b.string OPTION (Force ORDER)
/*
Table ' Tempa '. Scan count 1, logical reads 700
Table ' TEMPB '. Scan count 1, logical reads 100
*/

Sure enough, as expected.

So in this hypothetical case, if the outer loop table is smaller, the join cost is lower.

The table structure of the experiment is very special, if the general situation to deduce, you can make such assumptions:

Suppose that table x has a page, with an average of B rows per page, and a C page on the table Y, with an average of D rows per page.
The cost of the nested loop join is a + (A*B*C), and table y is an outer loop, and table x is an inner loop, then the costs of the nested loop join are c+ (c*d*a)
, which is the outer loop of table X.

SQL Server nested loop join efficiency test

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.