Objective
Before we learned the comparison of not exists and not in, of course, exists and in comparisons, so this section we will learn exists and in comparison, short content, in-depth understanding, always to review the basics.
Preliminary discussion on exists and in
We create the table Table1 and remove the six data from the previously created BigTable table and insert it, along with a duplicate of the data as follows:
CREATE TABLE Table1 (intcol uniqueidentifier) Insert into Table1 (intcol) Values ('b927ded5-c78b-4f53-80bf-f65a6ce86d87') Insert into Table1 (intcol) Values ('1be326ec-4b62-4feb-8421-d9edf2df28c8') Insert into Table1 (intcol) Values ('91c92337-24ba-4ebf-b2a3-14b987179ca6') Insert into Table1 (intcol) Values ('c03168f8-c1c7-4903-a8ee-9b4d9c0b6b1f') Insert into Table1 (intcol) Values ('C15ac08c-8d3d-4381-9c64-54854ddf15b7') Insert into Table1 (intcol) Values ('C15ac08c-8d3d-4381-9c64-54854ddf15b7')
Now we're going to do in query
Use Tsql2012goselect somecolumnfrom bigtablewhere somecolumn in (SELECT intcol from dbo. TABLE1)
We talked about this before. If an inner join returns six data at this point, because the inner join emphasizes the table after the join, if the right table has more than one data match, then more than one data is returned, but in the in query, only five data is returned, why is this?
When querying with in, even if there is duplicate data in the subquery will not worry about the problem, it will automatically filter processing, because in the semi join semi-join in the use of the right or left semi-join, that is, only return one of the duplicate data. So what about the situation in exists?
SELECT somecolumnfrom dbo. BigTable WHERE EXISTS (SELECT intcol from dbo. TABLE1)
At this point, because there is no where condition, all data in the external query table is returned, and in order to achieve the equivalent result with the in query, we need to add the Where condition
= T.intcol)
and exists compared to in, when the need to compare two or more than two conditions, exists can be better implemented and in is not so easy, such as the following
= T.intcol and BT. Othercol = T.othercol)
Okay, here we go, we're talking about the performance issues.
Further discussion of exists and in
We use the previous table directly to query
= Bigtable.somecolumn)
Both use the right-half join in the default clustered index scan and hash match, and the cost is the same. Let's do it again. Create an index on both query columns
CREATE INDEX Idx_bigtable_somecolumnon BigTable (somecolumn) CREATE index Idx_smallertable_lookupcolumnon smallertable (Lookupcolumn)
At this point, the query efficiency is improved only after the index is created, and the query plan replaces the left half of the hash match with the inner join in the merge join, and the flow aggregation is increased. The two are still on the same cost. In the other tutorials I see the impression has always been that the use of exists instead of in, its exists query performance is higher than in, and the fact is the cost is consistent, is it 1 million data is too small, or the scene is not enough, or the statement is not complex enough. Are talking about the use of the scene, then exactly in what scenario exists better than in performance, this has a better understanding of you, hoping to get the most practical answer in the comments. And I think that with exists words, only exists is more flexible than in, and there will be no unexpected results. Let's keep looking down.
In-depth discussion of exists and in
Let's take a look at what happens when in, we first create the test table and insert the data as follows:
Use tsql2012gocreate table table1 (id INT, title VARCHAR (), Someintcol INT) gocreate table Table12 (id int, t1id int, somedata VARCHAR) GO
Inserting test data
INSERT into Table1select1,'Title 1',5UNION Allselect2,'Title 2',5UNION Allselect3,'Title 3',5UNION Allselect4,'Title 4',5UNION AllselectNULL,'Title 5',5UNION AllselectNULL,'Title 6',5INSERT into Table12select1,1,'Data 1'UNION Allselect2,1,'Data 2'UNION Allselect3,2,'Data 3'UNION Allselect4,3,'Data 4'UNION Allselect5,3,'Data 5'UNION Allselect6,3,'Data 6'UNION Allselect7,4,'Data 7'UNION Allselect8,NULL,'Data 8'UNION Allselect9,6,'Data 9'UNION AllselectTen,6,'Data Ten'UNION Allselect One,8,'Data One'
The data in Table1 and table2 are as follows:
Explore one (in query leading to incorrect results)
Let's compare the exists and in queries as follows:
Use tsql2012goselect t1. * from dbo.table1 as T1where t1.id in (select T1id from dbo.table12) SELECT t1. * from dbo.table1 as T1 where * from Dbo.table12 as t2 where t1.id = t2.t1id)
At this point both returns are correct, and then we'll look at other things and we need to get all the rows in the Table1 that have no data in table2.
Use tsql2012goselect t1. * from dbo.table1 as T1 where * from Dbo.table12 as t2 where t1.id = t2.t1id) SELECT t1.* from as T1 WHERE as T2)
At this time using exists to get the correct results, and through in query did not achieve the purpose of our query, the reason is also said in is based on the three-value logic, at this time to encounter null will be treated as unknown, so the resulting result set is wrong. We continue to explore.
Explore 2 (handwriting errors cause unexpected results)
We re-create the test table and insert the test data as follows:
intint) Insert TestTable1 VALUES (1), (2), (3) insert TestTable2 VALUES (1), (2)
Let's start with the following query:
* FROMtesttable1where id1 inch (SELECT id2 from TestTable2)
At this point the result is correct, if in the subquery we will column ID2 written id1, then what will be the situation?
SELECT * fromtesttable1where id1 inch (select Id1 from TestTable2)
I don't know if you notice anything. No, the surface is fine, we then run the sub-query below
SELECT Id1 from TestTable2
Run the query alone, the result is actually wrong, and here we look at the table created by the column, ID1 is in the Table1 rather than in the Table2, so caused this unexpected error, if the handwriting error, the result data also has, generally will not be aware of, by using in query caused by unexpected appearance. Instead of getting the wrong result set, you will be given a direct error when using exists as follows
SELECT ** from TestTable2 t2 WHERE t2.id2 = t1.id1)
Of course, some people will say that they will not make such a low-level error, but who can guarantee that SQL has smart tips more prone to make such a mistake, because directly in the subquery will have such a column appears, but the column in the subquery table does not exist at all. Therefore, based on the discussion of two points, the use of exists more insurance. In this regard, the introduction of exists and in is considered to be the end of this conclusion.
exists and in performance Analysis conclusion: We recommend using exists instead of in, because the performance cost is the same as the exists performance is better than in, and the exists is more flexible than in, more secure, and no unexpected results are possible with insurance.
Summarize
In this section we explain the exists and in, on the performance of the two are still a bit puzzled, after all, the scene is not enough, of course, it is recommended to use exists, but not because of performance. Our next section explains the left join and not EXISTS, short content, in-depth understanding, we'll see you next.
SQL server-focus exists and in performance analysis (16)