Standard Source: http://www.cnblogs.com/wy123/p/6238844.html
Recently encountered a stored procedure in some special cases, the efficiency is extremely inefficient,
As for the bottom to what extent I now do not have an exact data, because it is expected to be able to query the results of SQL, actually half an hour can not come out, there will be
When observing the execution plan, one step in the middle shows an unusual connection alert like the following, such as
No join predicate means that there is no join predicate, and no connection predicate is specified when join between tables can result in no join predicate,
But the reverse is certainly set up, obviously wrote the connection conditions, still hint no join predicate.
Let's start with the No join predicate and explain when no join predicate will appear, and why
1, no join predicate due to connection condition not specified
Two tables in the absence of a specified connection condition, the result of the operation is the calculator Cartesian product, of course, there is no connection predicate, suggesting that no join predicate is also easy to understand
A simple code to illustrate the following, create two tables, #t1, #t2, as for the test data why this is the case, I will continue to explain
create table #t1(id int,name varchar(100))
create table #t2(id int,name varchar(100))
insert into #t1 values (1,newid())
insert into #t1 values (1,newid())
insert into #t2 values (1,newid())
insert into #t2 values (1,newid())
First look at the time of the calculation of the Cartesian product execution plan, Nested Loops in the Red Fork, it is indicated that there is no connection predicate, of course, this query SQL does not have a connection predicate, this case is also easy to understand.
2, specifies the NO join predicate under the connection condition
Even if you specify a connection condition, there is still a hint that there is no connection predicate.
At this point we need to look at the data characteristics of the table, from the test data created above can be seen, #t1表id = 1 is two lines, #t2 table of the same, id = 1 of the data is also two lines
At this point, the join of the two tables is a many-to-many relationship, many-to-many cases is to calculate the Cartesian product, this is the reason why there is no connection predicate in this case.
For more information, please refer to: http://www.cnblogs.com/liwei225/p/5056460.html, the Great God has long been detailed analysis, thanks to liwei225 great God's sharing
However, I have a question here, or the above two tables, specify the connection condition, but do not specify the query condition, that is, there is no where a.id = 1, there is no prompt no join predicate
This reason I did not understand, and then think about why, hoping to pass by the great God to help explain, thank you.
3, if a connection condition is specified, no join is present under certain query conditions predicate
This is an actual business of SQL, from the stored procedure deducted from the code, because there are more query conditions, the final assembly of dynamic SQL is not exactly the same, in most cases is no problem,
However, when a query condition is added to a where condition, the efficiency begins to fall seriously, and as to what extent it is dropped, it is canceled after 35 minutes of operation.
During this SQL run, the server CPU soared to 100%, and was persistent
A comparison test, simply add an option (force order) query hint in the SQL above, forcing it to be driven in the order of the written table, resulting in 2 seconds.
Execution plan is not the same as above, but also does not show no join predicate, can not say add a mandatory prompt has a connection predicate, do not add force hint there is no connection predicate it?
From the comparison, it can be explained that there is no very serious external factors interference, such as missing index, statistical information problems, etc.
If this is the case, the SQL with the option Force order query hint is unlikely to be the same as that of the non-option (force order) query hint, and it must be the choice of the execution plan .
So long to continue to analyze the implementation plan.
Usually, we will first analyze the execution plan, what index is used (suppressed), index fragmentation, parameter sniffing, statistics expired (sampling is not enough), all analyzed,
These additional factors will only slow down the efficiency of SQL to some extent, rather than slowing it down to a number of orders of magnitude.
So to analyze, why is it so slow to have no option (Force ORDER)?
In fact, this SQL execution plan can only be seen from the estimated execution plan, because it is not possible to wait until the SQL run is complete and see the actual execution plan
Title, estimated execution plan explicit, there is one step in the middle there is a no connection predicate warning as described above
Let's take a look at the details of this nested loops, and do suggest that there are no connection predicates, and that the estimated number of rows is 126469000 rows, more than 100 million rows,
According to the specific data distribution and query conditions analysis, if you do not do Cartesian product, this intermediate result is not up to billions of levels, this is the Cartesian product
If you really want to calculate more than 100 million rows so large a result set, the cost can be imagined.
So how did the Cartesian product come about here? Specific data I am not convenient to analyze, here to do a simple demolition
such as this one sql:
SELECT * FROM TableA a
INNER JOIN TableB b on a.identifier1 = B.identifier1
INNER JOIN TableC c on b.identifier2 = C.identifier2
where a.column_x = * * *
and b.column_y = * * *
and other Filter Condition
Join conditions are there, we temporarily simplify the problem, ignoring the query conditions, logically analyzed
The normal logic is a table result driven b table (A.identifier1 = B.identifier1),
With the result of the A and B joins, the C table is driven by the Identifier2 of table B (B.identifier2 = c.identifier2), where table A and C are not directly related,
If A and C tables are combined, and finally drive the B-table, it can be imagined that because there is no direct relationship between a and C tables, the result of a and C tables must be Cartesian product
This Cartesian product is similar to the result set of more than 100 million rows as estimated in nested loops above.
Why does SQL Server change the way the table was connected before, causing the Cartesian product?
The choice of execution plan is a complex computational process. The generation of the execution plan is calculated with various factors, such as index, statistic information, data distribution in the table, system resources, etc.
SQL Server may have chosen an "efficient" single-table query based on query criteria, ignoring the driver-driven sequence (personal guessing) between tables.
Therefore, the Cartesian product, which is analogous to the "no direct relationship between Table A and C" and "forced Connection", is created as above reasoning,
Based on an analysis of the correlation between the estimated execution plan and the actual table, this execution plan is formally applied to the processing of the association between tables.
At the same time, after the force-driven sequence, the results are quickly queried, and it can be shown that driving C with the results similar to A driven b,a+b is much more efficient than the A+c computing Cartesian product drive B.
Sometimes SQL Server can remove a join predicate from the original query.
So, what if we avoid this situation?
It is known that the above SQL in the execution of the prompt does not have a connection predicate, does not really have no write connection predicate,
Instead, SQL Server changed the drive order before the table, causing tables with no direct relationship to be placed together to produce a Cartesian product demerit.
Programme one:
OPTION (Force order) is also validated by forcing the query engine to be implemented in order by the forced-drive sequence.
Scenario Two:
Or the above example to illustrate:
For example, the original SQL looks like this:
SELECT * FROM TableA a
INNER JOIN TableB b on a.identifier1 = B.identifier1
INNER JOIN TableC c on b.identifier2 = C.identifier2
where a.column_x = * * *
and b.column_y = * * *
and other Filter Condition
Rewrite this SQL
SELECT * FROM TableA a
INNER JOIN TableB b on a.identifier1 = B.identifier1
Cross APPLY (SELECT * TableC c where b.identifier2 = C.identifier2)
where a.column_x = * * *
and b.column_y = * * *
and other Filter Condition
Using cross apply, similar to forcing the B table to drive the C-table, there will be no case of a-and C-table binding resulting in a Cartesian product.
It is also proven that this approach is also practical in the process of rewriting the actual SQL, which is equivalent to option (Force ORDER).
Scenario Three: Also rewrite SQL, in fact, the SQL is not too complex, but also not a very simple logical association, you can catch a temporary table, to split an intermediate result set
Use the intermediate result set to drive additional tables, simplifying the connection logic for each step, and avoiding the intermediate production of Cartesian product
It turns out that this approach is also feasible, the effect slightly renmin the first two ways,
On the use of temporary tables to do logical splitting, but also need a certain skill, here are the case, http://www.cnblogs.com/wy123/p/5712001.html
Summary: The above through a practical case, analyzed under what circumstances will cause no join predicate,
And even if the connection conditions are written, there will still be no join predicate reasons, when faced with this situation, and can be solved by what way.
When starting from the novice, do not dare to write a select in the SSMs Query window (afraid of more than three tables on the write bad, be scolded by the master), afraid to write the update DELETE statement (afraid of Misoperation),
To write a SQL that has a, slowly mastered some basic knowledge and skills, and then to understand the index, execution schedule, statistics, server resource usage and other information
When I started performance analysis and performance optimization, I felt I could do everything I could.
The reality tells me repeatedly, you still have many many unknown question, once again felt oneself such weak force.
I promise, I will never dare to blow up again.
Reference: http://www.cnblogs.com/liwei225/p/5056460.html
http://www.scarydba.com/2009/09/15/no-join-predicate/
Http://dba.stackexchange.com/questions/35082/what-exactly-does-no-join-predicate-mean-in-sql-server
http://www.scarydba.com/2009/09/15/no-join-predicate/
2017,sql server also has a lot of unknown knowledge waiting to learn and challenge.
Show no join predicate cause analysis and solutions for join