Difference between SQL query in and exists

Source: Internet
Author: User

In is a hash connection between the external table and the internal table, while exists is a loop on the external table. Each loop then queries the internal table.

 
I. in syntax

Test_expression [NOT] IN
(
Subquery
| Expression [,... n]
)


Parameters

Test_expression

Is any valid Microsoft® SQL Server™ expression.

Subquery

Is a subquery that contains a column result set. This column must have the same data type as test_expression.

Expression [,... n]

A list of expressions used to test whether a match exists. All expressions must be of the same type as test_expression.
 

Result type

Boolean
 

Result value

If test_expression is equal to any value returned by subquery, or any expression in the list separated by commas (,), the result value is TRUE. Otherwise, the result value is FALSE.

 
II. EXISTS
 

EXISTS is used to check whether a subquery returns at least one row of data. This subquery does not actually return any data, but returns True or False.

EXISTS specifies a subquery to check the existence of rows.

Syntax: EXISTS subquery


Parameter: subquery is a restricted SELECT statement (COMPUTE clauses and INTO keywords are not allowed ).


Result type: Boolean. If the subquery contains rows, TRUE is returned. Otherwise, FLASE is returned.
 

III. Differences between IN and EXISTS

IN

Determines whether the given value matches the value in the subquery or list.

EXISTS

Specifies a subquery to detect the existence of a row.


What are the applicable scenarios?

Take the subquery (select y from T2) as the direction of consideration
 

If the result set of a subquery is large, it takes a lot of time, but T1 is relatively small to execute (select null from t2 where y = x. x) very fast, then exists is more suitable for this


Use in when the result set of the corresponding subquery is small.
 

Performance Comparison
 

If the two tables to be queried are of the same size, there is little difference between in and exists.

If one of the two tables is small and the other is a large table, exists is used for the large subquery table and in is used for the small subquery table:

 

For example
Select * from A where id in (select id from B );

Select * from A where exists (select 1 from B where A. id = B. id );

In the above two cases, the in is compared in the memory traversal, and exists needs to query the database, so when the data volume of Table B is large, the exists efficiency is better than in.

1. select * from A where id in (select id from B );

In () is executed only once. It finds all id fields in Table B and caches them. Then, check whether the IDs of Table A are equal to those of Table B. If they are equal, add the records of Table A to the result set until all the records of Table A are traversed.
Its query process is similar to the following process:

The code is as follows:
List resultSet = {};
Array A = (select * from );
Array B = (select id from B );

For (int I = 0; I <A. length; I ++ ){
For (int j = 0; j <B. length; j ++ ){
If (A [I]. id = B [j]. id ){
ResultSet. add (A [I]);
Break;
      }
   }
}
Return resultSet;


It can be seen that in () is not suitable for large data volumes in Table B because it traverses all data in Table B once.
For example, if Table A has 10000 records and Table B has 1000000 records, it is possible to traverse A maximum of 10000*1000000 times, resulting in poor efficiency.
For example, if Table A has 10000 records and Table B has 100 records, it is possible to traverse A maximum of 10000*100 times, greatly reducing the number of traversal times and greatly improving the efficiency.

Conclusion: in () is suitable for situations where Table B has less data than table A. www.111cn.net

2. select * from A where exists (select 1 from B where A. id = B. id );

Exists () will execute. length, it does not cache exists () result set, because the content of exists () result set is not important, it is important that the result set of the query statement in it is null or not empty, if it is null, false is returned. If it is not null, true is returned.
Its query process is similar to the following process:

The code is as follows:
List resultSet = {};
Array A = (select * from );

For (int I = 0; I <A. length; I ++ ){
If (exists (A [I]. id) {// execute select 1 from B where B. id = A. id to check whether A record is returned
ResultSet. add (A [I]);
   }
}
Return resultSet;


When Table B is larger than table A, exists () is suitable because it does not have so many traversal operations. You only need to execute another query.
For example, if Table A has 10000 records and Table B has 1000000 records, exists () will execute 10000 times to determine whether the IDs in Table A are equal to those in Table B.
For example, if Table A has 10000 records and Table B has 100000000 records, exists () still executes 10000 times because it only executes. length times. The more data in Table B, the more suitable exists.
For another example, if Table A has 10000 records and Table B has 100 records, it is better to use in () to traverse 10000*10000 times if exists () is executed 100 times, because in () is compared in memory traversal, and exists () needs to query the database, we all know that the query database consumes a higher performance, and the memory is relatively fast.

Conclusion: exists () is applicable when Table B is larger than table.

When the data in Table A is as big as that in Table B, the in and exists efficiency is similar. You can choose one of them.
Before inserting a record, you must check whether the record EXISTS. The insert operation is performed only when the record does not exist. You can use the EXISTS condition to prevent repeated record insertion.
Insert into A (name, age) select name, age from B where not exists (select 1 from A where A. id = B. id );

The efficiency of using EXISTS and IN is usually higher than that of in because IN does not take indexes. But it depends on the actual use: IN is suitable for the case where the external table is large but the internal table is small; EXISTS is suitable for the case where the external table is small but the internal table is large.

Not in and not exists if the query statement uses not in, the internal and external tables are scanned for the whole table, and no index is used. However, the not extsts subquery can still use the table index. Therefore, whether the table is large, not exists is faster than not in.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.