Usage and difference of in, exists, not in, and not exists in SQL queries
1. in and exists
In connects the External table to the internal table as a hash (Dictionary set), while exists loops the External table and queries the internal table after each loop. The statement that exists is more efficient than in is always inaccurate. If the two tables to be queried are of the same size, there is little difference between exists and in; if one of the two tables is smaller and the other is larger, exists is used for the large subquery table, and in is used for the small subquery table.
Example: Table A (small table) and Table B (large table)
Method 1: Index usage
1) select * from A where id in (select id from B) --> low efficiency. The index of id column in table A is used.
2) select * from A where exists (select id from B where id = A. id) --> high efficiency, using the index of the id column in table B
3) select * from B where id in (select id from A) --> high efficiency, using the index of the id column in table B
4) select * from B where exists (select id from A where id = B. id) --> low efficiency, using the index of id column in table
Method 2: traverse and use
1) in () is executed only once. It finds all id fields in Table B and caches them. Check whether the IDs of Table A are equal to those of Table B. If the IDs are equal, add the records of Table A to the result set until all the records of Table A are traversed.
The query process is similar to the execution process of the following code:
List resultSet = {};
Array A = (select * from );
Array B = (select id from B );
For (int I = 0; I <A. length; I ++ ){
For (int j = 0; j <B. length; j ++ ){
If (A [I]. id = B [j]. id ){
ResultSet. add (A [I]);
Break;
}
}
}
Return resultSet;
We can see that in () is not suitable for large data volumes in Table B because it traverses all data in table B.
For example, if Table A has 10000 records and table B has 1000000 records, it is possible to traverse A maximum of 10000*1000000 times, resulting in poor efficiency.
For example, if Table A has 10000 records and table B has 100 records, it is possible to traverse A maximum of 10000*100 times, greatly reducing the number of traversal times and greatly improving the efficiency.
Conclusion: in () is suitable for the case where Table B has less data than table.
2) exists () will execute. length, it does not cache exists () result set, because the content of exists () result set is not important, it is important that the result set of the query statement in it is null or not empty, if it is null, false is returned. If it is not null, true is returned.
The query process is similar to the execution process of the following code:
List resultSet = {};
Array A = (select * from );
For (int I = 0; I <A. length; I ++ ){
If (exists (A [I]. id) {// execute select id from B where B. id = A. id to check whether A record is returned
ResultSet. add (A [I]);
}
}
Return resultSet;
When table B is larger than table A, exists () is suitable because it does not have so many traversal operations. You only need to execute another query.
For example, if Table A has 10000 records and table B has 1000000 records, exists () will execute 10000 times to determine whether the IDs in Table A are equal to those in table B.
For example, if Table A has 10000 records and table B has 100000000 records, exists () still executes 10000 times because it only executes. length times. The more data in Table B, the more suitable exists.
For another example, if Table A has 10000 records and table B has 100 records, it is better to use in () to traverse 10000*10000 times if exists () is executed 100 times, because in () is compared in memory traversal, and exists () needs to query the database, we all know that the query database consumes a higher performance, and the memory is relatively fast.
Conclusion: exists () is suitable for the case where Table B is larger than table.
When the data in Table A is as big as that in Table B, the in and exists efficiency is similar. You can choose one of them.
2. not in and not exists
Not in is not logically equivalent to not exists. If you misuse not in, be careful that your program has a fatal bug.
See the following example:
Create table A1 (c1 int, c2 int );
Create table A2 (c1 int, c2 int );
Insert into A1 values (1, 2 );
Insert into A1 values (1, 3 );
Insert into A2 values (1, 2 );
Insert into A2 values (1, null );
Select * from A1 where c2 not in (select c2 from A2); --> execution result: none (null)
Select * from A1 where not exists (select c2 from A2 where A2.c2 = A1.c2); --> execution result: 1 3
As we can see, not in has an unexpected result set and there is a logical error. Use not in (it will call the subquery) and not exists (it will call the associated subquery ). If any record returned in the subquery contains a null value, no record is returned in the query. If the subquery field has a non-empty limit, you can use not in.
If the query statement uses not in, all internal and external tables are scanned, and no index is used. However, subqueries of not exists can still use table indexes. Therefore, no matter which table is large, not exists is faster than not in.
3. in And =
Select name from employee where name in ('zhang san', 'Li si', 'wang wu ');
And
Select name from employee where name = 'zhang san' or name = 'Li si' or name = 'wang wu ';
The results are the same.
4. exists prevents repeated records from being inserted
Sometimes non-repeated records need to be inserted. in Mysql, you can use the ignore keyword to ignore existing records, but they can only be ignored through the primary key and cannot be ignored according to custom conditions.
Syntax: insert ignore into tableName (column1, column2 ,......) Values (value1, value2 ,......);
However, other databases do not necessarily provide keywords similar to ignore. Therefore, you can use the exists condition to prevent duplicate records from being inserted.
Insert into A (name, age) select name, age from B where not exists (select 1 from A where A. id = B. id );
5. More about exists
Exists is used to check whether the result set returned by the subquery is null. The subquery does not actually return any data, but returns true or false.
Syntax: exists subQuery
Parameter: subQuery is a restricted select statement (compute clauses and into keywords are not allowed ).
Result type: boolean. If the subquery contains rows, true is returned. Otherwise, false is returned.
Conclusion: select * from A where exists (select 1 from B where A. id = B. id );
It can be understood that each row of the external query table is substituted into the internal query as a test. If the result set returned by the internal query is not empty, the exists clause returns true, this row can be used as the result row of the external query. Otherwise, it cannot be used as the result.
-------- The preceding SQL content is applicable to Mysql, SQL Server, and Oracle based on the results obtained from the online materials.
This article permanently updates the link address: