Difference between in and exists in SQL queries

Source: Internet
Author: User

Select * from A where id in (select id from B );

Select * from A where exists (select 1 from B where A. id = B. id );

In the above two cases, the in is compared in the memory traversal, and exists needs to query the database, so when the data volume of Table B is large, the exists efficiency is better than in.

1. select * from A where id in (select id from B );

In () is executed only once. It finds all id fields in Table B and caches them. Then, check whether the IDs of Table A are equal to those of Table B. If they are equal, add the records of Table A to the result set until all the records of Table A are traversed.
Its query process is similar to the following process:

Copy codeThe Code is as follows:
List resultSet = {};
Array A = (select * from );
Array B = (select id from B );

For (int I = 0; I <A. length; I ++ ){
For (int j = 0; j <B. length; j ++ ){
If (A [I]. id = B [j]. id ){
ResultSet. add (A [I]);
Break;
}
}
}
Return resultSet;

It can be seen that in () is not suitable for large data volumes in Table B because it traverses all data in Table B once.
For example, if Table A has 10000 records and table B has 1000000 records, it is possible to traverse A maximum of 10000*1000000 times, resulting in poor efficiency.
For example, if Table A has 10000 records and table B has 100 records, it is possible to traverse A maximum of 10000*100 times, greatly reducing the number of traversal times and greatly improving the efficiency.

Conclusion: in () is suitable for the case where Table B has less data than table.

2. select * from A where exists (select 1 from B where A. id = B. id );

Exists () will execute. length, it does not cache exists () result set, because the content of exists () result set is not important, it is important that the result set of the query statement in it is null or not empty, if it is null, false is returned. If it is not null, true is returned.
Its query process is similar to the following process:

Copy codeThe Code is as follows:
List resultSet = {};
Array A = (select * from );

For (int I = 0; I <A. length; I ++ ){
If (exists (A [I]. id) {// execute select 1 from B where B. id = A. id to check whether A record is returned
ResultSet. add (A [I]);
}
}
Return resultSet;

When table B is larger than table A, exists () is suitable because it does not have so many traversal operations. You only need to execute another query.
For example, if Table A has 10000 records and table B has 1000000 records, exists () will execute 10000 times to determine whether the IDs in Table A are equal to those in table B.
For example, if Table A has 10000 records and table B has 100000000 records, exists () still executes 10000 times because it only executes. length times. The more data in Table B, the more suitable exists.
For another example, if Table A has 10000 records and table B has 100 records, it is better to use in () to traverse 10000*10000 times if exists () is executed 100 times, because in () is compared in memory traversal, and exists () needs to query the database, we all know that the query database consumes a higher performance, and the memory is relatively fast.

Conclusion: exists () is applicable when table B is larger than table.

When the data in Table A is as big as that in Table B, the in and exists efficiency is similar. You can choose one of them.
Before inserting a record, you must check whether the record EXISTS. The insert operation is performed only when the record does not exist. You can use the EXISTS condition to prevent repeated record insertion.
Insert into A (name, age) select name, age from B where not exists (select 1 from A where A. id = B. id );

The efficiency of using EXISTS and IN is usually higher than that of in because IN does not take indexes. But it depends on the actual use: IN is suitable for the case where the external table is large but the internal table is small; EXISTS is suitable for the case where the external table is small but the internal table is large.

About exists:

EXISTS is used to check whether a subquery returns at least one row of data. In fact, this subquery does not return any data, but returns True or False.
EXISTS specifies a subquery to check the existence of rows.
Syntax: EXISTS subquery
Parameter: subquery is a restricted SELECT statement (COMPUTE clauses and INTO keywords are not allowed ).
Result type: Boolean. If the subquery contains rows, TRUE is returned. Otherwise, FLASE is returned.
Conclusion: select * from A where exists (select 1 from B where A. id = B. id)
The Return Value of the EXISTS (including not exists) clause is a boolean value. EXISTS has a subquery Statement (SELECT... FROM...), which is called the EXIST internal query statement. The query statement returns a result set. The EXISTS clause returns a Boolean value based on whether the query result set is null or not.
It can be understood that each row of the external query table is substituted into the internal query as a test. If the results returned by the internal query are not null, The EXISTS clause returns TRUE, this row can be used as the result row of the external query, otherwise it cannot be used as the result.
The analyzer first looks at the first word of the statement. When it finds that the first word is the SELECT keyword, it jumps to the FROM keyword, finds the table name through the FROM keyword, and loads the table into the memory. Next, find the WHERE keyword. If the WHERE keyword is not found, return to SELECT to find the field for parsing. If the WHERE keyword is found, analyze the conditions in it, and then return to SELECT to analyze the field. Finally, a virtual table is created.
The WHERE keyword is followed by a conditional expression. After the conditional expression is calculated, a return value is displayed, that is, if the value is not 0 or 0, the return value is true (true), and if the value is not 0, the return value is false (false ). Similarly, the condition after WHERE also has a return value, true or false, to determine whether to execute the SELECT statement.
Analyzer first finds the keyword SELECT, then jumps to the FROM keyword to import the STUDENT table to the memory, finds the first record through the pointer, and then finds the WHERE keyword to calculate its conditional expression, if it is true, this record is loaded into a virtual table, and the Pointer Points to the next record. If it is false, the pointer points directly to the next record without any other operations. Retrieve the complete table and return the retrieved virtual table to the user. EXISTS is part of a conditional expression, and it also has a return value (true or false ).

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.