SQL server-focus exists and in performance analysis (16)

Last Update:2016-12-11 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Objective

Before we learned the comparison of not exists and not in, of course, exists and in comparisons, so this section we will learn exists and in comparison, short content, in-depth understanding, always to review the basics.

Preliminary discussion on exists and in

We create the table Table1 and remove the six data from the previously created BigTable table and insert it, along with a duplicate of the data as follows:

CREATE TABLE Table1 (intcol uniqueidentifier) Insert into Table1 (intcol) Values ('b927ded5-c78b-4f53-80bf-f65a6ce86d87') Insert into Table1 (intcol) Values ('1be326ec-4b62-4feb-8421-d9edf2df28c8') Insert into Table1 (intcol) Values ('91c92337-24ba-4ebf-b2a3-14b987179ca6') Insert into Table1 (intcol) Values ('c03168f8-c1c7-4903-a8ee-9b4d9c0b6b1f') Insert into Table1 (intcol) Values ('C15ac08c-8d3d-4381-9c64-54854ddf15b7') Insert into Table1 (intcol) Values ('C15ac08c-8d3d-4381-9c64-54854ddf15b7')

Now we're going to do in query

Use Tsql2012goselect somecolumnfrom bigtablewhere somecolumn in (SELECT intcol from dbo. TABLE1)

We talked about this before. If an inner join returns six data at this point, because the inner join emphasizes the table after the join, if the right table has more than one data match, then more than one data is returned, but in the in query, only five data is returned, why is this?

When querying with in, even if there is duplicate data in the subquery will not worry about the problem, it will automatically filter processing, because in the semi join semi-join in the use of the right or left semi-join, that is, only return one of the duplicate data. So what about the situation in exists?

SELECT somecolumnfrom dbo. BigTable WHERE EXISTS (SELECT intcol from dbo. TABLE1)

At this point, because there is no where condition, all data in the external query table is returned, and in order to achieve the equivalent result with the in query, we need to add the Where condition

= T.intcol)

and exists compared to in, when the need to compare two or more than two conditions, exists can be better implemented and in is not so easy, such as the following

= T.intcol and BT. Othercol = T.othercol)

Okay, here we go, we're talking about the performance issues.

Further discussion of exists and in

We use the previous table directly to query

= Bigtable.somecolumn)

Both use the right-half join in the default clustered index scan and hash match, and the cost is the same. Let's do it again. Create an index on both query columns

CREATE INDEX Idx_bigtable_somecolumnon BigTable (somecolumn) CREATE index Idx_smallertable_lookupcolumnon smallertable (Lookupcolumn)

At this point, the query efficiency is improved only after the index is created, and the query plan replaces the left half of the hash match with the inner join in the merge join, and the flow aggregation is increased. The two are still on the same cost. In the other tutorials I see the impression has always been that the use of exists instead of in, its exists query performance is higher than in, and the fact is the cost is consistent, is it 1 million data is too small, or the scene is not enough, or the statement is not complex enough. Are talking about the use of the scene, then exactly in what scenario exists better than in performance, this has a better understanding of you, hoping to get the most practical answer in the comments. And I think that with exists words, only exists is more flexible than in, and there will be no unexpected results. Let's keep looking down.

In-depth discussion of exists and in

Let's take a look at what happens when in, we first create the test table and insert the data as follows:

Use tsql2012gocreate table table1 (id INT, title VARCHAR (), Someintcol INT) gocreate table Table12 (id int, t1id int, somedata VARCHAR) GO

Inserting test data

INSERT into Table1select1,'Title 1',5UNION Allselect2,'Title 2',5UNION Allselect3,'Title 3',5UNION Allselect4,'Title 4',5UNION AllselectNULL,'Title 5',5UNION AllselectNULL,'Title 6',5INSERT into Table12select1,1,'Data 1'UNION Allselect2,1,'Data 2'UNION Allselect3,2,'Data 3'UNION Allselect4,3,'Data 4'UNION Allselect5,3,'Data 5'UNION Allselect6,3,'Data 6'UNION Allselect7,4,'Data 7'UNION Allselect8,NULL,'Data 8'UNION Allselect9,6,'Data 9'UNION AllselectTen,6,'Data Ten'UNION Allselect One,8,'Data One'

The data in Table1 and table2 are as follows:

Explore one (in query leading to incorrect results)

Let's compare the exists and in queries as follows:

Use tsql2012goselect    t1. * from    dbo.table1 as T1where    t1.id in (select T1id from dbo.table12) SELECT    t1. * from    dbo.table1 as T1 where    * from Dbo.table12 as t2 where t1.id = t2.t1id)

At this point both returns are correct, and then we'll look at other things and we need to get all the rows in the Table1 that have no data in table2.

Use tsql2012goselect    t1. * from   dbo.table1 as T1 where    * from Dbo.table12 as t2 where t1.id = t2.t1id) SELECT    t1.* from as  T1  WHERE    as T2)

At this time using exists to get the correct results, and through in query did not achieve the purpose of our query, the reason is also said in is based on the three-value logic, at this time to encounter null will be treated as unknown, so the resulting result set is wrong. We continue to explore.

Explore 2 (handwriting errors cause unexpected results)

We re-create the test table and insert the test data as follows:

intint) Insert TestTable1 VALUES (1), (2), (3) insert TestTable2 VALUES (1), (2)

Let's start with the following query:

* FROMtesttable1where id1 inch (SELECT id2 from TestTable2)

At this point the result is correct, if in the subquery we will column ID2 written id1, then what will be the situation?

SELECT * fromtesttable1where id1 inch (select Id1 from TestTable2)

I don't know if you notice anything. No, the surface is fine, we then run the sub-query below

SELECT Id1 from TestTable2

Run the query alone, the result is actually wrong, and here we look at the table created by the column, ID1 is in the Table1 rather than in the Table2, so caused this unexpected error, if the handwriting error, the result data also has, generally will not be aware of, by using in query caused by unexpected appearance. Instead of getting the wrong result set, you will be given a direct error when using exists as follows

SELECT ** from TestTable2 t2 WHERE t2.id2 = t1.id1)

Of course, some people will say that they will not make such a low-level error, but who can guarantee that SQL has smart tips more prone to make such a mistake, because directly in the subquery will have such a column appears, but the column in the subquery table does not exist at all. Therefore, based on the discussion of two points, the use of exists more insurance. In this regard, the introduction of exists and in is considered to be the end of this conclusion.

exists and in performance Analysis conclusion: We recommend using exists instead of in, because the performance cost is the same as the exists performance is better than in, and the exists is more flexible than in, more secure, and no unexpected results are possible with insurance.

Summarize

In this section we explain the exists and in, on the performance of the two are still a bit puzzled, after all, the scene is not enough, of course, it is recommended to use exists, but not because of performance. Our next section explains the left join and not EXISTS, short content, in-depth understanding, we'll see you next.

SQL server-focus exists and in performance analysis (16)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

SQL server-focus exists and in performance analysis (16)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

SQL server-focus exists and in performance analysis (16)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support