The difference between in and exists and the efficiency of SQL execution

Last Update:2015-03-03 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The difference between in and exists and the efficiency of SQL execution
Many forums have recently started to discuss the difference between in and exists and the efficiency of SQL execution,
This paper deals with some differences between in and exists and the analysis of SQL execution efficiency.

In SQL, in can be divided into three categories:

1, Shape like select * from T1 where F1 in (' A ', ' B '), should be compared with the following two kinds of efficiency

SELECT * from t1 where f1= ' a ' or f1= ' B '

or select * from t1 where f1 = ' A ' union ALL SELECT * from T1 f1= ' B '

You may not be referring to this category, there is no discussion here.

2. Shape like select * from T1 where F1 in (select F1 from T2 where t2.fx= ' x '),

Where the subquery in the condition is not affected by the outer query, such queries in general, the automatic optimization will be turned into exist statement, that is, efficiency and exist.

3. Shape like select * from T1 where F1 in (select F1 from T2 where t2.fx=t1.fx),

Where the sub-query conditions are affected by the outer query, the efficiency of such queries depends on the relevant conditions related to the index of the field and the amount of data, it is generally considered less efficient than exists.

Except that the first type in statements are SQL that can be converted into exists statements, the general programming habit should be to use exists instead of in, and seldom consider the execution efficiency of in and exists.

SQL execution efficiency analysis in and exists

A, b two tables,

(1) When displaying only one table of data, such as a, when the relationship condition is only one such as ID, use in faster:

SELECT * from A where ID in (select ID from B)

(2) When displaying only one table of data, such as a, when the relationship condition is more than one such as Id,col1, using in is inconvenient, you can use exists:

SELECT * FROM A

where exists (select 1 from B where id = a.id and col1 = a.col1)

(3) When only two tables of data are displayed, using in,exists is not appropriate to use a connection:

SELECT * from A LEFT join B on id = a.id

So the way you use it depends on your requirements.

This is usually done in the test:

This is the result of my test:

SET STATISTICS IO on
SELECT * from sysobjects where exists (select 1 from syscolumns where id=syscolumns.id)
SELECT * from sysobjects where ID in (select ID from syscolumns)
SET STATISTICS IO off

(47 rows affected)

Table ' Syscolpars '. Scan count 1, logical read 3 times, physical read 0 times, read 2 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

Table ' Sysschobjs '. Scan count 1, logical read 3 times, physical read 0 times, read 0 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

(1 rows affected)

(44 rows affected)

Table ' Syscolpars '. Scan Count 47, logical read 97 times, physical read 0 times, read 0 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

Table ' Sysschobjs '. Scan count 1, logical read 3 times, physical read 0 times, read 0 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

(1 rows affected)

SET STATISTICS IO on
SELECT * from syscolumns where exists (select 1 from sysobjects where id=syscolumns.id)
SELECT * from syscolumns where ID in (select ID from sysobjects)
SET STATISTICS IO off

(419 rows affected)

Table ' Syscolpars '. Scan count 1, logical read 10 times, physical read 0 times, read 15 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

Table ' Sysschobjs '. Scan count 1, logical read 3 times, physical read 0 times, read 0 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

(1 rows affected)

(419 rows affected)

Table ' Syscolpars '. Scan count 1, logical read 10 times, physical read 0 times, read 0 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

Table ' Sysschobjs '. Scan count 1, logical read 3 times, physical read 0 times, read 0 times, LOB logic read 0 times, lob physical read 0 times, lob read 0 times.

(1 rows affected)

Test results (overall exists efficiency is higher than in):

Efficiency: The index of conditional factors is very critical

Take syscolumns as a condition: syscolumns data is greater than sysobjects

In

Scan Count 47, logic read 97 times,

With exists

Scan count 1, logic read 3 times

sysobjects as a condition: sysobjects data is less than syscolumns

exists more than 15 reads

I remember doing the following tests:

Table

Test

Structure

ID int identity (--ID), primary key auto-increment

sort int,--category, every 1000 data is a category

Sid INT--Category ID

Inserting 600w Data

If you want to query the maximum SID for each category,

SELECT * FROM Test a
Where NOT EXISTS (select 1 from test where sort = a.sort and Sid > A.sid)

Than

SELECT * FROM Test a
Where SID in (the Select Max (SID) from test where sort = a.sort)

is more than three times times more efficient to execute. The specific execution time has been forgotten. But the result I remember very clearly. Before that I had been advocating the second way of writing, and then changed the first.

In and exists's SQL execution efficiency analysis, and then simply give an example:

Declare @t table (ID int identity (), v varchar (10))
Insert @t Select ' A '
UNION ALL select ' B '
UNION ALL SELECT ' C '
UNION ALL select ' d '
UNION ALL SELECT ' E '
UNION ALL select ' B '
UNION ALL SELECT ' C '
SQL notation for--A statement in
SELECT * from @t where V on (select V from @t Group by V has Count (*) >1)
SQL notation for--b statement exists
SELECT * from @t a where exists (select 1 from @t where Id!=a.id and V=A.V)

The two statement function is to find the table variable @t, V contains a record of duplicate values.

The first SQL statement uses in, but the subquery has no external connection.

The second SQL statement uses exists, but the subquery has a connection to the outside.

Everyone read the SQL query plan, very clear.

Selec v from @t Group by V have Count (*) > 1

This SQL statement, its execution does not depend on the main query main sentence (I do not know how to describe in the outside and inside, so call it, people understand it)

Then SQL is optimized at query time, and its result set is cached.

That's the cache.

V

---

B

C

Subsequent operations, the main query at each step, the equivalent in the processing where V in (' B ', ' C ') of course, the statement will not be so converted, just to illustrate the meaning, that is, the main query each processing a row (when recorded as CurrentRow, the subquery will no longer scan the table, only matches the cached results)

and

Select 1 from @t where Id!=a.id and V=A.V

This sentence, its execution results depend on each row in the main query.

When the first row of the main query is CurrentRow (id=1), the subquery is executed again in select 1 from @t where id!=1 and v= ' A ' scan the entire table, starting with the first row of Currentsubrow (id=1) to scan, with the same ID, over Filter, subquery row down, Currentsubrow (id=2) continue, ID is different, but V value does not match, subquery Row continues to move down ... Until Currentsubrow (id=7) does not find a match, the subquery processing ends, the first line CurrentRow (id=1) is filtered, and the main query record line moves down

When processing the second row, CurrentRow (id=2), subquery Select 1 from @t where id!=2 and v= ' B ', first row currentsubrow (id=1) v value mismatch, sub-query down, second row, ID same filter, third row, ... To line sixth, the IDs are different, the V Values match, and the matching results are found, that is, return, no longer processing the record down. The main query moves down.

Handle third row, etc. ...

SQL optimization, using in and exist? Mainly depends on whether your filter is on the main query or on a subquery.

Through analysis, we believe that we have a clearer understanding of the difference between in and exists, and the efficiency of SQL execution in and exists.

This article from Csdn Blog, reproduced please indicate the source: http://blog.csdn.net/qzww5324/archive/2009/04/23/4103115.aspx

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

The difference between in and exists and the efficiency of SQL execution

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

The difference between in and exists and the efficiency of SQL execution

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support