Objective
In this section we analyze LEFT JOIN ... IS NULL and NOT EXISTS. Short content, in-depth understanding; it always pays to review the basics.
LEFT JOIN ... IS NULL and NOT EXISTS analysis
We saw earlier that query processing treats NULL with three-valued logic: with NOT IN, a single NULL in the subquery means that no rows are returned. LEFT JOIN ... IS NULL and NOT EXISTS, by contrast, are processed the same way whether or not the subquery contains NULLs; the point to remember is that the LEFT JOIN form relies on the IS NULL check on the joined column. Since both return the same result set, let's test them directly against the tables created in the previous section. To start with, no indexes are created on BigTable and SmallerTable.
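USE TSQL2012
GO
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT BigTable.ID, SomeColumn FROM BigTable LEFT OUTER JOIN SmallerTable ON BigTable.SomeColumn = SmallerTable.LookupColumn WHERE SmallerTable.LookupColumn IS NULL
SELECT BigTable.ID, SomeColumn FROM BigTable WHERE NOT EXISTS (SELECT 1 FROM SmallerTable WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)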
The CPU time and elapsed time for the two queries are as follows
Without indexes, the two query plans above are nearly identical in cost, but they differ in shape. LEFT JOIN ... IS NULL first performs a Hash Match (Right Outer Join) and then applies a Filter; in other words, it completes the join and only afterwards filters out the rows that found a match. NOT EXISTS goes straight to a Hash Match (Right Anti Semi Join). As we said earlier about semi joins, each outer row is considered only once, however many matching rows exist on the other side. So LEFT JOIN ... IS NULL needs two operations, a complete join followed by a filter, while NOT EXISTS discards matched rows inside the join itself. That is the biggest difference: with LEFT JOIN ... IS NULL, SQL Server is not smart enough here to treat the query as a single existence check, so it has to run the full join and then filter, whereas NOT EXISTS filters during the join.
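The difference between "join, then filter" and "filter during the join" can be sketched outside SQL Server. The Python below is only an illustration under stated assumptions: the function names, the in-memory lists, and the sample rows are hypothetical, and real Hash Match operators are far more elaborate. It shows that the outer-join approach materializes every match before a separate filter throws the matches away, while the anti semi join decides per row during the probe.

```python
def left_join_is_null(big, small):
    """Outer join first, then filter (hypothetical sketch, not SQL Server's code)."""
    lookup = {}
    for value in small:
        if value is not None:              # NULL never equals anything, so it cannot match
            lookup.setdefault(value, []).append(value)
    joined = []                            # step 1: the complete outer-join result
    for row_id, some in big:
        matches = lookup.get(some, [])
        if matches:
            joined.extend((row_id, some, m) for m in matches)
        else:
            joined.append((row_id, some, None))
    # step 2: a separate filter keeps only the rows that found no partner
    result = [(row_id, some) for row_id, some, m in joined if m is None]
    return result, len(joined)


def not_exists(big, small):
    """Anti semi join: each row is kept or dropped during the probe itself."""
    seen = {value for value in small if value is not None}
    return [(row_id, some) for row_id, some in big if some not in seen]


# Hypothetical sample data: (ID, SomeColumn) rows and LookupColumn values.
big_table = [(1, "A"), (2, "B"), (3, "C"), (4, None)]
smaller_table = ["A", "A", "B", None]

rows, intermediate = left_join_is_null(big_table, smaller_table)
print(rows, "intermediate join rows:", intermediate)
print(not_exists(big_table, smaller_table))
```

Both functions return the same rows; the first one simply builds a larger intermediate result before filtering, which is the extra work the plan comparison points at.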
There is little difference in CPU time or elapsed time. Next, we create indexes and look again.
CREATE INDEX idx_BigTable_SomeColumn ON BigTable (SomeColumn)
CREATE INDEX idx_SmallerTable_LookupColumn ON SmallerTable (LookupColumn)
Look at the query execution plans for both queries
Looking at the plans above, we can clearly see that LEFT JOIN ... IS NULL still performs the full join and filters afterwards; the indexes only improve its performance a little. For NOT EXISTS, however, the plan differs from the one without indexes: a Stream Aggregate now runs first, and the Hash Match (Right Anti Semi Join) has become a Merge Join (Right Anti Semi Join). So what exactly is this Stream Aggregate? I did not understand it, and I will not pretend that I did, so let's talk about it. As for why I dig into every new term that shows up in a query plan: readers who have followed this SQL Server series know that each section is short enough to read without fatigue, and that while explaining I am relearning SQL Server myself, covering performance tuning and the basics behind it so that later tuning work is not a shot in the dark. Back to the topic: let's look at Stream Aggregate.
Stream Aggregate
MSDN describes it as follows: the Stream Aggregate operator groups rows by one or more columns and then computes one or more aggregate expressions returned by the query. The output of this operator can be referenced by later operators in the query, returned to the client, or both. The Stream Aggregate operator requires its input to be ordered by the grouping columns. If the data is not already sorted by a preceding Sort operator or by an ordered index seek or scan, the optimizer places a Sort operator before this operator. In the SHOWPLAN_ALL output or in the graphical execution plan in SQL Server Management Studio, the columns in the GROUP BY predicate are listed in the Argument column, and the aggregate expressions are listed in the Defined Values column.
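The definition above can be made concrete with a small sketch. The Python below is not SQL Server's implementation; the function name, the `aggregates` parameter, and the sample orders are assumptions made for illustration. It shows the one property the definition insists on: because the input is already ordered by the grouping columns, a single pass with one running state per aggregate is enough, and a group is emitted as soon as the key changes.

```python
def stream_aggregate(sorted_rows, key, aggregates):
    """Yield (group_key, state) per group; the input MUST already be sorted by key.

    aggregates maps an output name to (initial_value, update_function).
    Hypothetical sketch for illustration only.
    """
    current_key = None
    state = None
    started = False
    for row in sorted_rows:
        k = key(row)
        if not started or k != current_key:
            if started:
                yield current_key, state      # key changed: the previous group is complete
            current_key = k
            state = {name: init for name, (init, _) in aggregates.items()}
            started = True
        for name, (_, update) in aggregates.items():
            state[name] = update(state[name], row)
    if started:
        yield current_key, state              # emit the last group


# Hypothetical rows standing in for an orders table.
orders = [
    {"custid": 2, "empid": 5},
    {"custid": 1, "empid": 3},
    {"custid": 2, "empid": 4},
    {"custid": 1, "empid": 7},
]
# Plays the role of the Sort operator (or an ordered index scan).
sorted_orders = sorted(orders, key=lambda r: r["custid"])

# Group, then count rows per custid.
counts = list(stream_aggregate(
    sorted_orders, lambda r: r["custid"],
    {"countcustid": (0, lambda s, r: s + 1)}))

# DISTINCT: grouping only, with no aggregate expressions at all.
distinct = [k for k, _ in stream_aggregate(sorted_orders, lambda r: r["custid"], {})]

# Scalar aggregate: every row falls into one single group.
total = list(stream_aggregate(
    sorted_orders, lambda r: None,
    {"empid": (0, lambda s, r: s + r["empid"])}))

print(counts, distinct, total)
```

Passing an empty `aggregates` mapping gives the DISTINCT case: the grouping logic still runs, but there is nothing to accumulate.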
Knowing only that Stream Aggregate groups rows by columns and then aggregates them, it is still unclear when it shows up in a query plan and when it actually improves query performance, so let's explore. The definition stresses grouping first and then computing the aggregate; with that in mind, here are three scenarios that use Stream Aggregate.
(1) Aggregate over the whole table
SELECT COUNT(*), SUM(empid) AS empid FROM Sales.Orders
(2) Group first, then aggregate
SELECT COUNT(custid) AS countcustid FROM Sales.Orders GROUP BY custid
(3) DISTINCT
SELECT DISTINCT custid FROM Sales.Orders
The query in (3) uses DISTINCT, which is really a grouping on custid. Those are the scenarios in which Stream Aggregate is used; there is also another kind of aggregation, the hash aggregate, which will be covered later. To summarize the definition: Stream Aggregate takes sorted input, then groups, then computes the aggregates. Queries (2) and (3) group but do not ask for any sort; the ordering is supplied internally. Here is the custid data from the table used in (3):
After DISTINCT:
But (3) contains no aggregate, so why is a Stream Aggregate used? Stream Aggregate keeps state variables, and the number of state variables depends on the number of aggregates. A state variable accumulates the result for the current group: it starts at 0 when a new group begins and is incremented each time a matching row arrives. Query (3) can therefore be described as an implicit aggregation in which the state for every group simply stays at 0, which explains why we see sorting and grouping but no aggregate calculation. A familiar analogy: when we read data with SqlDataReader we are reading a stream of records, and if we need to summarize the result set, a state variable is updated on each read and the final summary is returned to the client. That is all we will say about Stream Aggregate for now; hash aggregate will be discussed later.

Let's return to LEFT JOIN ... IS NULL and NOT EXISTS. Once the indexes are created, the execution time of LEFT JOIN ... IS NULL is more than twice that of NOT EXISTS. That completes the comparison, and we can now draw a basic conclusion.
LEFT JOIN ... IS NULL and NOT EXISTS performance analysis conclusion: when we need to find rows that have no match in the subquery and the column is nullable, use NOT EXISTS; when we need to find rows that have no match in the subquery and the column is not nullable, we can use either NOT EXISTS or NOT IN.
LEFT JOIN ... IS NULL cannot discard matched rows immediately and has to filter after a full join, so it may hurt query performance more, especially when there are multiple join conditions.
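Why nullability matters in the conclusion above can be checked with a tiny experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption: it only illustrates the result semantics of three-valued logic, not SQL Server's execution plans or timings, and the sample rows are invented. The table and column names follow the ones used in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BigTable (ID INTEGER PRIMARY KEY, SomeColumn TEXT);
    CREATE TABLE SmallerTable (LookupColumn TEXT);  -- nullable on purpose
    INSERT INTO BigTable VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO SmallerTable VALUES ('A'), ('A'), (NULL);
""")

QUERIES = {
    "LEFT JOIN ... IS NULL": """
        SELECT BigTable.ID
        FROM BigTable
        LEFT OUTER JOIN SmallerTable
            ON BigTable.SomeColumn = SmallerTable.LookupColumn
        WHERE SmallerTable.LookupColumn IS NULL
        ORDER BY BigTable.ID""",
    "NOT EXISTS": """
        SELECT ID
        FROM BigTable
        WHERE NOT EXISTS (SELECT 1 FROM SmallerTable
                          WHERE SmallerTable.LookupColumn = BigTable.SomeColumn)
        ORDER BY ID""",
    "NOT IN": """
        SELECT ID
        FROM BigTable
        WHERE SomeColumn NOT IN (SELECT LookupColumn FROM SmallerTable)
        ORDER BY ID""",
}

# Run each query and collect the returned IDs.
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in QUERIES.items()}

for name, ids in results.items():
    print(name, "->", ids)
```

With a NULL in LookupColumn, the first two forms agree with each other, while NOT IN returns no rows at all, which is why NOT IN is only offered as an option when the column is not nullable.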
Summary
In this section we analyzed the performance of LEFT JOIN ... IS NULL and NOT EXISTS. The next section wraps up these sections with a comprehensive comparison: NOT IN vs NOT EXISTS vs LEFT JOIN ... IS NULL, the final chapter. Short content, in-depth understanding; see you next time.
SQL Server - Focus: LEFT JOIN ... IS NULL and NOT EXISTS performance analysis (17)