Objective
In this section we compare the performance of NOT IN, NOT EXISTS and LEFT JOIN ... IS NULL. Short content, in-depth understanding, and always reviewing the basics.
NOT IN vs NOT EXISTS vs LEFT JOIN ... IS NULL performance analysis
We first create the two test tables:
USE TSQL2012
GO

CREATE SCHEMA [Compare]
GO

CREATE TABLE [Compare].t_left
(
    ID INT NOT NULL PRIMARY KEY,
    value INT NOT NULL,
    stuffing VARCHAR($) NOT NULL
)

CREATE TABLE [Compare].t_right
(
    ID INT NOT NULL PRIMARY KEY,
    value INT NOT NULL,
    stuffing VARCHAR($) NOT NULL
)
GO
We then create an index on the value column of both tables:
USE TSQL2012
GO

CREATE INDEX idx_left_value ON [Compare].t_left (value)
CREATE INDEX idx_right_value ON [Compare].t_right (value)
We insert the following test data into the t_left and t_right tables:
USE TSQL2012
GO

BEGIN TRANSACTION

DECLARE @cnt INT
SET @cnt = 1

WHILE @cnt <= 100000
BEGIN
    INSERT INTO [Compare].t_left
    VALUES (@cnt, @cnt % 10000, LEFT('Left ' + CAST(@cnt AS VARCHAR) + ' ' + REPLICATE('*', $), $))
    SET @cnt = @cnt + 1
END;

WITH rows AS
(
    SELECT 1 AS row
    UNION ALL
    SELECT row + 1
    FROM rows
    WHERE row < 10
)
INSERT INTO [Compare].t_right
SELECT (ID - 1) * 10 + row + 1, value + 1, LEFT('Right ' + CAST(ID AS VARCHAR) + ' ' + REPLICATE('*', $), $)
FROM [Compare].t_left
CROSS JOIN rows

COMMIT
Let's briefly explain the test data inserted above:
(1) The t_left table receives 100,000 rows containing 10,000 distinct values, so each value is repeated.
(2) The t_right table receives 1 million rows, also containing 10,000 distinct values.
(3) The t_left table contains 10 rows whose value does not appear in the t_right table.
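The three figures above can be checked directly against the tables. The following is a minimal sketch that assumes the schema and data were created exactly as shown earlier; the column aliases (total_rows, distinct_values, missing_rows) are only illustrative.

```sql
USE TSQL2012
GO

-- Row count and number of distinct values in each table.
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT value) AS distinct_values
FROM [Compare].t_left;    -- 100000 rows, 10000 distinct values (0 to 9999)

SELECT COUNT(*) AS total_rows, COUNT(DISTINCT value) AS distinct_values
FROM [Compare].t_right;   -- 1000000 rows, 10000 distinct values (1 to 10000)

-- Rows of t_left whose value has no counterpart in t_right.
SELECT COUNT(*) AS missing_rows
FROM [Compare].t_left l
WHERE NOT EXISTS
(
    SELECT NULL
    FROM [Compare].t_right r
    WHERE r.value = l.value
);                        -- 10 rows, all with value = 0
```

The 10 missing rows exist because t_left stores @cnt % 10000 (which includes 0), while t_right stores value + 1 (which starts at 1).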
Next, let's look at the query execution plan of each approach.
NOT IN performance analysis
USE TSQL2012
GO

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT l.ID, l.value
FROM [Compare].t_left l
WHERE l.value NOT IN
(
    SELECT value
    FROM [Compare].t_right r
)
Two operators in the execution plan deserve attention. The result set is finally returned by a Merge Join (Right Anti Semi Join), that is, a merge join performing a right anti semi join, which is a very efficient approach. Because both tables have an index on value, the two inputs arrive already sorted. The database walks both inputs from the smallest value to the largest, keeping a pointer to the current value in each input and advancing it to the next value.

What does the anti semi join do? As we said earlier, it is a semi join: the engine only needs to know whether a value from t_left has a match in t_right, and as soon as a match is found it skips that value in both inputs. Only the t_left rows without a match are returned.

The Stream Aggregate plays the decisive role here. (I covered Stream Aggregate briefly in an earlier article, but I feel I only understand it roughly; I will write specifically about Stream Aggregate and Hash Aggregate later.) A Stream Aggregate needs sorted input, which it then groups and aggregates. The index already provides the sort order, so the Stream Aggregate only has to group t_right by value, as its operator details show. Because the values of t_right are grouped, the right anti semi join sees only one row per group and the remaining duplicates are skipped automatically. Sorting through the index, grouping with the Stream Aggregate and finally executing the Merge Join (Right Anti Semi Join) is why this plan is so efficient: the query took only 0.315 seconds.
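What the grouping step contributes can be written out by hand. The sketch below is only a logical equivalent of the plan described above, not a way to force it: it first reduces t_right to one row per value (the derived table alias r is illustrative) and then keeps the t_left rows without a match. The optimizer is free to choose a different plan for this query.

```sql
SELECT l.ID, l.value
FROM [Compare].t_left l
WHERE NOT EXISTS
(
    SELECT NULL
    FROM
    (
        -- One row per distinct value, like the grouped output of the Stream Aggregate.
        SELECT value
        FROM [Compare].t_right
        GROUP BY value
    ) r
    WHERE r.value = l.value
)
```

With the test data this reduces the right-hand input from 1 million rows to 10,000 before any comparison with t_left takes place.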
Performance analysis of NOT EXISTS
We run the following query:
USE TSQL2012
GO

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT l.ID, l.value
FROM [Compare].t_left l
WHERE NOT EXISTS
(
    SELECT NULL
    FROM [Compare].t_right r
    WHERE r.value = l.value
)
The query time is not repeated here: NOT EXISTS and NOT IN produce the same query plan and the same query time. As we said when discussing NOT EXISTS and NOT IN separately, the two have the same query cost as long as the compared columns are NOT NULL. When the columns are nullable, NOT EXISTS performs much better than NOT IN. We won't go into more detail here; readers who are unsure can refer to the earlier article comparing the two.
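Nullability changes more than the cost: the two predicates can return different rows once a NULL is involved, which is one reason they cannot be treated as interchangeable on nullable columns. The following is a minimal sketch using two hypothetical table variables, @l and @r, which are not part of the test schema.

```sql
DECLARE @l TABLE (value INT NULL);
DECLARE @r TABLE (value INT NULL);

INSERT INTO @l VALUES (1), (2);
INSERT INTO @r VALUES (1), (NULL);

-- NOT IN returns no rows: for value 2 the comparison 2 <> NULL is UNKNOWN,
-- so the whole predicate is UNKNOWN and the row is filtered out.
SELECT l.value
FROM @l l
WHERE l.value NOT IN (SELECT r.value FROM @r r);

-- NOT EXISTS returns the row with value 2: no row of @r satisfies r.value = 2.
SELECT l.value
FROM @l l
WHERE NOT EXISTS (SELECT NULL FROM @r r WHERE r.value = l.value);
```

Because both value columns in the test tables are declared NOT NULL, this situation cannot arise there, and the two forms are equivalent.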
LEFT JOIN ... IS NULL performance analysis
USE TSQL2012
GO

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT l.ID, l.value
FROM [Compare].t_left l
LEFT JOIN [Compare].t_right r
    ON r.value = l.value
WHERE r.value IS NULL
The result set is of course the same, but the query plan differs considerably from the one for NOT EXISTS. LEFT JOIN ... IS NULL first uses the left join to return all rows, including duplicates, and then filters them. Why join first and filter afterwards? Because SQL Server does not recognize the IS NULL that follows the LEFT JOIN as an anti join, so it completes the work in two steps. The filter now has to process 1 million rows, which is very time-consuming. Even though the plan uses an efficient Hash Match and runs in parallel, filtering the values still takes a long time. The whole query takes 0.989 seconds, about 3 times as long as NOT EXISTS or NOT IN. From this we can draw the following conclusion about the three approaches.
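That the result sets really are the same can be confirmed with EXCEPT. This is a small sketch against the test tables, assuming they were loaded as shown earlier; each statement should return no rows.

```sql
-- Rows returned by NOT EXISTS but not by LEFT JOIN ... IS NULL.
SELECT l.ID, l.value
FROM [Compare].t_left l
WHERE NOT EXISTS (SELECT NULL FROM [Compare].t_right r WHERE r.value = l.value)
EXCEPT
SELECT l.ID, l.value
FROM [Compare].t_left l
LEFT JOIN [Compare].t_right r ON r.value = l.value
WHERE r.value IS NULL;

-- The reverse direction: rows returned by LEFT JOIN ... IS NULL but not by NOT EXISTS.
SELECT l.ID, l.value
FROM [Compare].t_left l
LEFT JOIN [Compare].t_right r ON r.value = l.value
WHERE r.value IS NULL
EXCEPT
SELECT l.ID, l.value
FROM [Compare].t_left l
WHERE NOT EXISTS (SELECT NULL FROM [Compare].t_right r WHERE r.value = l.value);
```

EXCEPT removes duplicates, which is harmless here because ID is the primary key of t_left and every returned row is already unique.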
Conclusion on NOT IN vs NOT EXISTS vs LEFT JOIN ... IS NULL: when searching for missing values, NOT EXISTS and NOT IN are the best choices, provided that both compared columns are NOT NULL; if either column can be NULL, use NOT EXISTS. LEFT JOIN ... IS NULL never skips values that have already been matched. It returns the full joined result first and filters afterwards, so its inefficiency is easy to imagine.
Summary
In this section we compared the performance of NOT EXISTS, NOT IN and LEFT JOIN ... IS NULL and reached a conclusion about the three. The next section, which will be the last article in this comparison series, compares the performance of EXISTS vs IN vs JOIN. Short content, in-depth understanding; see you in the next section.
SQL Server: Focus on NOT IN vs NOT EXISTS vs LEFT JOIN ... IS NULL performance analysis (18)