The difference between IN and exists IN SQL statements and the application of inexists IN SQL statements
Table display
First, the query involves two tables, one user and one order table. The specific table content is as follows:
User table:
Order table:
In
Determines whether the given value matches the value in the subquery or list. When querying, in first queries the sub-query table, then makes a Cartesian product for the internal table and External table, and then filters according to the conditions. Therefore, when the internal table is relatively small, the in speed is faster.
The specific SQL statement is as follows:
1 SELECT 2 * 3 FROM 4 `user` 5 WHERE 6 `user`.id IN ( 7 SELECT 8 `order`.user_id 9 FROM10 `order`11 )
This statement is very simple. Use the user_id data found in the subquery to match the id in the user table and get the result. The statement execution result is as follows:
What does the execution process look like? Let's take a look.
First, query the subquery in the database and execute the following code:
SELECT `order`.user_id FROM `order`
The result is as follows:
In this case, make a Cartesian Product Between the query result and the original user table. The result is as follows:
Then, filter the results based on the conditions of user. id IN order. user_id (both the values of the id column and the user_id column are equal and will be deleted unevenly ). Finally, two matching data items are obtained.
Exists
Specifies a subquery to detect the existence of a row. Traverse the circular External table and check whether the records in the External table are the same as the data in the internal table. After matching, the results are placed in the result set.
The specific SQL statement is as follows:
1 SELECT 2 `user`.* 3 FROM 4 `user` 5 WHERE 6 EXISTS ( 7 SELECT 8 `order`.user_id 9 FROM10 `order`11 WHERE12 `user`.id = `order`.user_id13 )
The execution result of this SQL statement is the same as that of the preceding in statement.
However, the difference is that their execution processes are completely different:
When querying with the exists keyword, we first query not the content of the subquery, but the table of our primary query. That is to say, the SQL statement we first run is:
SELECT 'user'. * FROM 'user'
The result is as follows:
Then, based on each record in the table, execute the following statements to determine whether the condition after where is true:
EXISTS ( SELECT `order`.user_id FROM `order` WHERE `user`.id = `order`.user_id )
If true is true, false is returned. If true is returned, the result of the row is retained. If false is returned, the row is deleted and the result is returned.
Differences and application scenarios
The difference between in and exists: If the subquery returns a small number of results set records, the table in the primary query is large and has an index, in should be used, if there are few primary query records in the outer layer, the table in the subquery is large, and there is an index, exists is used. In fact, we distinguish between in and exists mainly because of changes in the driving sequence (this is the key to performance changes). If exists is used, the External table is the driving table and is first accessed, if it is IN, execute the subquery first, so we will take the fast return of the driving table as the goal, then the relationship between the index and the result set will be taken into account, IN addition, NULL is not processed for IN.
In is a hash connection between the external table and the internal table, while exists is a loop on the External table. Each loop then queries the internal table. The argument that exists is more efficient than in is always inaccurate.
Not in and not exists
If the query statement uses not in, all the internal and external tables are scanned and no indexes are used. However, the not extsts subquery can still use table indexes. Therefore, whether the table is large, not exists is faster than not in.