Performance impact of driving order in multi-table joins in SQL Server

Source: Internet
Author: User
Tags: joins

Original: Effect of driving order on performance in multiple table joins in SQL Server

The source of this article: http://www.cnblogs.com/wy123/p/7106861.html
(Retaining the source is not a claim of original copyright; my own level is still far from that. It is only to link back to the original text, since what follows may correct or supplement some of its errors; nothing more.)

Recently in SQL Server, developers have several times submitted SQL performance problems that, on the surface, were caused by the driving order between the joined tables.
The symptom: with other factors ruled out, the stored procedure's execution time occasionally far exceeded expectations; even when debugging with a specific parameter substituted directly into the stored procedure's SQL statement, the expected response time still could not be reached.
For such problems, after excluding common causes such as server resources, indexes, locking, and parameter sniffing, the driving order between the tables was identified as the cause, because performance improved dramatically once OPTION (FORCE ORDER) was appended to the SQL statement.
In general, a relatively efficient way to join tables is to have the "small table drive the big table": in a loop join, loop over the small table and, from each iteration of the loop, probe the large table, producing the query result set.
On the surface, then, the problem is caused by the driving order between the tables, and performance improves very significantly once a driving order is forced.
But thinking a step further: why does SQL Server not choose a reasonable driving order by default?
This article therefore briefly elaborates on these two questions:
1) Why does the driving order between tables affect performance?
2) Why does SQL Server sometimes fail to select the correct driving order?

Why does the driving order between tables affect performance?

First, look at the impact of the driving order on performance when tables are joined. Insert 1W rows of test data into test_smalltable and 10W rows into test_bigtable ("W" here is the Chinese unit 万, i.e. 10,000, so 1W = 10,000 rows and 10W = 100,000 rows), representing the small table and the large table respectively.

CREATE TABLE test_smalltable
(
    id int IDENTITY(1,1) PRIMARY KEY,
    othercolumns char(36)  -- column width was garbled in the source; char(36) is assumed, wide enough to hold a NEWID() value
);

CREATE TABLE test_bigtable
(
    id int IDENTITY(1,1) PRIMARY KEY,
    othercolumns char(36)
);

DECLARE @i int = 0;
WHILE @i < 100000
BEGIN
    IF @i < 10000
    BEGIN
        INSERT INTO test_smalltable VALUES (NEWID());
    END
    INSERT INTO test_bigtable VALUES (NEWID());
    SET @i = @i + 1;
END

After the test data has been written, use two SQL statements to observe the IO, forcing the loop join's driving order in each:

SET STATISTICS IO ON;  -- makes the logical reads of each statement visible
GO
SELECT * FROM test_smalltable a
INNER LOOP JOIN test_bigtable b ON a.id = b.id
OPTION (FORCE ORDER);
GO
SELECT * FROM test_bigtable a
INNER LOOP JOIN test_smalltable b ON a.id = b.id
OPTION (FORCE ORDER);
GO

The IO information produced by executing the two statements (screenshot not reproduced in this copy) shows that, because the driving order of the two tables differs, the logical IO differs by nearly an order of magnitude.

The cause of this may be somewhat hard to grasp: in a doubly nested loop, is there really a difference between which table is on the outside and which is on the inside? Don't the two look the same on the surface? Actually, no.
A loop join executes in a manner similar to the following nested loops; the query completes when the outer table's loop ends:
foreach (outer_row in outer_table)
{
    foreach (inner_row in inner_table)
    {
        if (outer_row.key = inner_row.key)
        {
            -- output the matched row
        }
    }
}

Take the test above and make a rough cost comparison.
If the outer table is the small table (1W rows), the outer loop runs 1W times, each iteration using the ID to seek into the inner large table (10W rows), after which the query ends; the overall cost can roughly be reckoned as 1W (looping) + 10W (the seek work into the 10W-row table) = 11W, ignoring what the cost unit actually is.
If the outer table is the large table (10W rows), the outer loop runs 10W times, each iteration using the ID to seek into the inner small table (1W rows); by the same rough reckoning the overall cost is 10W + 10W = 20W, again ignoring the unit.
Now it is clear: the former (small table driving the large table) costs about 11W, the latter (large table driving the small table) costs about 20W; so, generally speaking, having the small table drive the large table is the relatively efficient way.
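One way to compress this rough comparison into a formula (a sketch of the model being gestured at, not notation from the original): write N_outer for the row count of the driving table and c_probe(inner) for the cost of one keyed seek into the inner table. A loop join then costs roughly

    C(\text{outer}, \text{inner}) \approx N_{\text{outer}} + N_{\text{outer}} \cdot c_{\text{probe}}(\text{inner})

Since an index seek grows only slowly with the inner table's size, c_probe differs little between the two tables here, so the N_outer factor dominates and the smaller driving table comes out cheaper, agreeing in direction with the 11W versus 20W comparison above.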

But note that "big table" and "small table" here are not merely table-level concepts, because real SQL is rarely that simple: what matters is the result set (rows, or size) left after the filter conditions are applied, and that is the key point of the second question.

Why SQL Server sometimes does not select the correct driving order

In the tests above, if no query hint is added, the generated execution plan is unrelated to the order in which the tables are written: even with different writing orders, the execution plan stays the same.
That is, when writing a SQL statement, whether the large table comes first or last normally does not affect the execution plan that is generated.
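For example, the following pair of statements (a minimal sketch against the test tables above) differ only in the written order of the tables; comparing their actual execution plans should show the same plan:

-- Without hints, the optimizer chooses the join order itself, so these two
-- statements, differing only in written table order, produce matching plans.
SELECT * FROM test_smalltable a INNER JOIN test_bigtable b ON a.id = b.id;
GO
SELECT * FROM test_bigtable a INNER JOIN test_smalltable b ON a.id = b.id;
GO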


So why, returning to the question at the beginning, does SQL Server sometimes fail to select the correct driving order?
In reality SQL is rarely written this simply; more often than not, a variety of WHERE conditions follow the table joins.
As said above, "large table" and "small table" are not just table-level concepts, but describe the result (row count, or size) left after the filter criteria are applied.
For example, in the following SQL, the WHERE clause may carry all kinds of filter conditions: perhaps on a type column, perhaps a time range, and perhaps with filters on both tables.

SELECT * FROM test_smalltable a
INNER JOIN test_bigtable b ON a.id = b.id
WHERE a.othercolumns = '...'  -- placeholder values; the concrete filter values were elided in the source
  AND b.othercolumns = '...'
  -- ... and other filter conditions

So, facing a complex query like this, how does SQL Server evaluate the size of each table's result set after the various filter criteria are applied?
In other words: given the filter conditions after WHERE (or the extra filters appended to ON), what is the basis for estimating the approximate size of the filtered result set?
Yes, it is statistics!
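As a concrete illustration, here is a sketch of how to inspect the statistics those estimates come from; the statistics object st_bigtable_othercolumns is created here purely for illustration and is not part of the original test:

-- Create a statistics object on the filtered column (illustration only)
CREATE STATISTICS st_bigtable_othercolumns ON test_bigtable (othercolumns);
GO
-- Show the header, density, and histogram that row estimates are derived from
DBCC SHOW_STATISTICS ('test_bigtable', 'st_bigtable_othercolumns');
-- sys.stats lists every statistics object (auto- or user-created) on a table
SELECT name, auto_created, user_created
FROM sys.stats
WHERE object_id = OBJECT_ID('test_bigtable');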


Now the problem is clear: after filtering through the various (possibly quite complex) filter conditions, SQL Server obtains a "result set of its own estimated size" based on statistics, and then determines the driving order according to that result set.
SQL Server still applies "small table drives large table" on the basis of this self-estimated result set size (speaking only of the loop join here, leaving other join methods aside).
Once the estimated result set size carries an error, even a small one, but large enough to flip which order is truly "small table driving large table", the second problem arises.
Therefore, in general, a wrong driving order is in essence a problem of estimating from statistics: either the statistics are not accurate enough, or the estimation itself goes wrong
(reference: http://www.cnblogs.com/wy123/tag/%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF%20Statistics/),
which causes SQL Server to mistakenly run the operation in a large-table-driven manner, and a problem like the one described above appears.
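When stale or sampled statistics are suspected, a common first step (a sketch of the usual remedy, not a fix the article prescribes) is to rebuild them from a full scan and then compare estimated against actual row counts in the plan:

-- Rebuild statistics on both test tables from a full scan of the data
UPDATE STATISTICS test_smalltable WITH FULLSCAN;
UPDATE STATISTICS test_bigtable WITH FULLSCAN;
-- Then re-run the problem query with the actual execution plan enabled and
-- check whether "Estimated Number of Rows" now matches the actual rows.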


Given the particular nature of this problem, it is difficult to construct a demo case, so none is built here; instead, two screenshots of actual results encountered in practice were compared (images not reproduced in this copy). In practice, the impact of the driving order on performance can range from 0.5 seconds versus 10 seconds, up to 1 minute versus 10 minutes.


Of course, when adding OPTION (FORCE ORDER), pay attention to whether the statement as written actually puts the small table first.
In complex cases, even though the problem is caused by the driving order, OPTION (FORCE ORDER) does not necessarily help, because in a multi-table join the order in which the tables happen to be written is not necessarily a reasonable driving order.
It can even cause more serious problems; see http://www.cnblogs.com/wy123/p/6238844.html. So OPTION (FORCE ORDER) should not be applied carelessly.
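As a reminder of what the hint actually does: FORCE ORDER fixes the join order to exactly the order the tables are written in, so the intended driving table must come first. A minimal sketch against the test tables (the filter value is a placeholder):

SELECT *
FROM test_smalltable a                       -- intended driving (outer) table, written first
INNER JOIN test_bigtable b ON a.id = b.id
WHERE a.othercolumns = '...'                 -- placeholder filter
OPTION (FORCE ORDER);                        -- join order is now fixed to the written order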

Summary:

In the face of complicated queries and filter conditions, especially with large volumes of data in the tables, the row estimates generated from statistics, and the driving order between tables predicted from them, can have a great impact on performance.
When facing similar problems, what the direct cause is, what the root cause is, how to identify the problem quickly, and how to solve and avoid it are all worth thinking through, and are among the issues to consider when doing performance optimization.

