Reposted from https://www.cnblogs.com/liyasong/p/sql_in_exists.html and http://blog.csdn.net/lick4050312/article/details/4476333
Tables
The queries involve two tables, a user table and an order table, shown below:
User table:
Order table:
IN
First, IN determines whether a given value matches a value in a subquery or a list. When IN is evaluated, the subquery table is queried first, then a Cartesian product of the inner table and the outer table is formed, and finally the product is filtered by the condition. IN is therefore faster when the inner table is relatively small.
The specific SQL statements are as follows:
SELECT
    *
FROM
    `user`
WHERE
    `user`.id IN (
        SELECT
            `order`.user_id
        FROM
            `order`
    )
The statement is simple: it matches the id column of the user table against the user_id values returned by the subquery. The result of executing the statement is as follows:
What does the execution process look like? Let's take a look.
First, the database runs the subquery, executing the following statement:
SELECT `order`.user_id FROM `order`
After execution, the results are as follows:
Next, a Cartesian product is formed between this result and the original user table. The result is as follows:
Finally, the product is filtered by the condition user.id IN order.user_id: the values of the id column and the user_id column are compared, and rows where they are not equal are discarded. Two rows satisfy the condition.
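The three conceptual steps above (run the subquery, form the Cartesian product, filter) can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and three hypothetical users and orders, since the original table contents are not shown here. SQLite quotes identifiers with double quotes rather than backticks, and a real optimizer is free to pick a different plan than this step-by-step model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user"  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "order" (id INTEGER PRIMARY KEY, user_id INTEGER);
    -- Hypothetical sample rows (assumption, not the article's data).
    INSERT INTO "user"  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO "order" VALUES (10, 1), (11, 2), (12, 2);
""")

# Step 1: the subquery on its own.
sub = conn.execute('SELECT "order".user_id FROM "order"').fetchall()

# Step 2: Cartesian product of the outer table and the subquery result.
product = conn.execute(
    'SELECT "user".id, o.user_id FROM "user" CROSS JOIN "order" AS o'
).fetchall()

# Step 3: keep only the pairs where id equals user_id.
matched_ids = sorted({uid for uid, order_uid in product if uid == order_uid})

# The IN query itself, for comparison with the manual steps.
in_rows = conn.execute("""
    SELECT * FROM "user"
    WHERE "user".id IN (SELECT "order".user_id FROM "order")
    ORDER BY id
""").fetchall()

print(sub)
print(len(product), matched_ids)
print(in_rows)
```

With this sample data, two users have orders, which mirrors the two qualifying rows described above.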
Second, select * from A where id in (select id from B)
The query above uses IN. in() is executed only once: it looks up all the id values in table B and caches them. It then checks whether the id of each record in table A equals an id in table B; if so, the record from table A is added to the result set. This continues until all records of table A have been traversed. The query process is similar to the following:
List resultSet = [];
Array a = (select * from A);
Array b = (select id from B);

for (int i = 0; i < a.length; i++) {
    for (int j = 0; j < b.length; j++) {
        if (a[i].id == b[j].id) {
            resultSet.add(a[i]);
            break;
        }
    }
}
return resultSet;
As you can see, in() is not a good fit when table B is large, because the whole of table B is traversed for each record of table A. For example, if table A has 10,000 records and table B has 1,000,000 records, up to 10,000 * 1,000,000 comparisons may be needed, which is very inefficient. If instead table A has 10,000 records and table B has 100 records, at most 10,000 * 100 comparisons are needed; the number of comparisons drops sharply and efficiency improves greatly.
Conclusion: in() is suitable when table B holds less data than table A.
EXISTS
First, EXISTS specifies a subquery that tests for the existence of rows. The outer table is traversed and, for each of its records, the inner table is checked for matching data. Records that match are placed in the result set.
The specific SQL statements are as follows:
SELECT
    `user`.*
FROM
    `user`
WHERE
    EXISTS (
        SELECT
            `order`.user_id
        FROM
            `order`
        WHERE
            `user`.id = `order`.user_id
    )
The result of this SQL statement is the same as the result of the IN query above.
However, the execution process is completely different.
When the EXISTS keyword is used, the subquery is not evaluated first. Instead, the table of the main query is read first; that is, the first statement executed is:
SELECT `user`.* FROM `user`
The results are as follows:
Then, for each record of that table in turn, the following condition is evaluated to decide whether the WHERE clause holds:
EXISTS (SELECT `order`.user_id FROM `order` WHERE `user`.id = `order`.user_id)
EXISTS returns TRUE if the subquery finds a row and FALSE if it does not. If TRUE is returned, the row is kept; if FALSE is returned, the row is discarded. The rows that remain are returned as the result.
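The per-row evaluation described above can be sketched as follows. It assumes Python's built-in sqlite3 module and the same hypothetical sample rows as a stand-in for the article's tables. The explicit loop is only a model of the process; the database decides how the EXISTS query is actually executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user"  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "order" (id INTEGER PRIMARY KEY, user_id INTEGER);
    -- Hypothetical sample rows (assumption, not the article's data).
    INSERT INTO "user"  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO "order" VALUES (10, 1), (11, 2), (12, 2);
""")

# Outer query first: every row of "user".
outer_rows = conn.execute('SELECT id, name FROM "user" ORDER BY id').fetchall()

# Then one EXISTS probe per outer row; SQLite reports the result as 1 or 0.
kept = []
for user_id, name in outer_rows:
    (found,) = conn.execute(
        'SELECT EXISTS (SELECT 1 FROM "order" WHERE "order".user_id = ?)',
        (user_id,),
    ).fetchone()
    if found:
        kept.append((user_id, name))

# The single-statement EXISTS query, for comparison with the loop.
exists_rows = conn.execute("""
    SELECT "user".* FROM "user"
    WHERE EXISTS (
        SELECT "order".user_id FROM "order"
        WHERE "user".id = "order".user_id
    )
    ORDER BY "user".id
""").fetchall()

print(kept)
print(exists_rows)
```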
Second, select a.* from A a where exists (select 1 from B b where a.id = b.id)
The query above uses EXISTS. exists() is executed a.length times, and its result set is not cached, because the contents of the result set do not matter. What matters is whether the result set contains any record: if it does, TRUE is returned; otherwise FALSE is returned. The query process is similar to the following:
List resultSet = [];
Array a = (select * from A);

for (int i = 0; i < a.length; i++) {
    // exists() runs: select 1 from B b where b.id = a[i].id
    // and reports whether any record is returned
    if (exists(a[i].id)) {
        resultSet.add(a[i]);
    }
}
return resultSet;
When table B holds more data than table A, exists() is a good fit, because there is no traversal of table B; only one more query is executed per record of table A. For example, if table A has 10,000 records and table B has 1,000,000 records, exists() is executed 10,000 times to determine whether an id in table A equals an id in table B. If table A has 10,000 records and table B has 100,000,000 records, exists() is still executed 10,000 times, because it only runs a.length times; the more data table B holds, the more exists() pays off. On the other hand, if table A has 10,000 records and table B has 100 records, exists() is still executed 10,000 times, and it is better to use in() and make 10,000 * 100 comparisons: in() compares values in memory, whereas exists() has to query the database, and querying the database costs more than comparing in memory.
Conclusion: exists() is suitable when table B holds more data than table A.
When table A and table B hold about the same amount of data, in() and exists() perform similarly and either can be used.
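To make the counting argument concrete, here is a minimal Python sketch of the two loop models described above. The function names and the small id ranges are made up for illustration, and a Python set stands in for an indexed lookup on table B. It only models the comparison counts in the text, not how any particular database executes these queries.

```python
def in_style(a_ids, b_ids):
    """Nested-loop model of in(): compare each A id with the cached B ids."""
    result, comparisons = [], 0
    for a in a_ids:
        for b in b_ids:
            comparisons += 1
            if a == b:
                result.append(a)
                break  # stop scanning B once a match is found
    return result, comparisons


def exists_style(a_ids, b_index):
    """Model of exists(): exactly one probe per A record."""
    result, probes = [], 0
    for a in a_ids:
        probes += 1
        if a in b_index:  # stands in for an indexed query on table B
            result.append(a)
    return result, probes


# Hypothetical data: A has 100 ids, B has 1,000 ids, 50 of them shared.
a_ids = list(range(100))
b_ids = list(range(50, 1050))

in_result, comparisons = in_style(a_ids, b_ids)
exists_result, probes = exists_style(a_ids, set(b_ids))

print(comparisons, probes)
```

Both models return the same ids; what differs is that the number of probes depends only on the size of A, while the number of comparisons grows with the size of B.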
Differences and application scenarios
Differences between IN and EXISTS: if the subquery returns a small result set and the table in the main query is large and indexed, use IN; conversely, if the outer main query has few records and the table in the subquery is large and indexed, use EXISTS. In fact, the distinction between IN and EXISTS mainly comes down to a change in the driving order, which is the key to the performance difference. With EXISTS, the outer table is the driving table and is accessed first; with IN, the subquery is executed first. The goal is therefore to have the driving table return quickly, which means taking the relationship between indexes and result-set sizes into account. Also note that IN does not handle NULL values.
IN performs a hash join between the outer table and the inner table, whereas EXISTS loops over the outer table and queries the inner table on each iteration. The common assertion that EXISTS is always more efficient than IN is inaccurate.
NOT IN and NOT EXISTS
If the query uses NOT IN, both the outer table and the inner table are scanned in full and no index is used, whereas the subquery of NOT EXISTS can still use the index on its table. So regardless of which table is larger, NOT EXISTS tends to be faster than NOT IN.
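Apart from performance, the two forms are not always interchangeable when the subquery can return NULL, which relates to the note above that IN does not handle NULL. The sketch below assumes Python's built-in sqlite3 module and two hypothetical single-column tables. It shows the standard SQL behavior: a NULL in the NOT IN list makes the comparison unknown for every non-matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    -- Hypothetical data; note the NULL in table b.
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (NULL);
""")

# NOT IN: "2 NOT IN (1, NULL)" is unknown rather than true, so no row qualifies.
not_in = conn.execute(
    "SELECT id FROM a WHERE id NOT IN (SELECT id FROM b) ORDER BY id"
).fetchall()

# NOT EXISTS: the NULL row never matches b.id = a.id, so 2 and 3 qualify.
not_exists = conn.execute(
    "SELECT id FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id) "
    "ORDER BY id"
).fetchall()

print(not_in)
print(not_exists)
```

If the subquery column can be NULL, NOT EXISTS (or an explicit IS NOT NULL filter in the subquery) is usually the form that gives the intended result.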
For example, the Northwind database has the following query:
select c.CustomerID, CompanyName from Customers c where exists (select OrderID from Orders o where o.CustomerID = c.CustomerID)
How does EXISTS work here? The subquery returns the OrderID column, but the outer query asks for the CustomerID and CompanyName columns, and those two columns are certainly not in OrderID. How are they matched?
EXISTS checks whether a subquery returns at least one row. The subquery does not actually return any data to the outer query; it yields a value of TRUE or FALSE. In other words, EXISTS specifies a subquery that tests for the existence of rows.
Syntax: EXISTS subquery
Parameter: subquery is a restricted SELECT statement (the COMPUTE clause and the INTO keyword are not allowed).
Result type: Boolean
Result value: TRUE if the subquery contains any rows; otherwise FALSE.
Example table A: TableIn
Example table B: TableEx
(i). Using NULL in the subquery still returns a result set.
select * from TableIn where exists (select NULL)
is equivalent to:
select * from TableIn
(ii). Compare queries that use EXISTS and IN. Note that both queries return the same result.
select * from TableIn where exists (select BID from TableEx where BName = TableIn.AName)
select * from TableIn where AName in (select BName from TableEx)
(iii). Compare queries that use EXISTS and = ANY. Note that both queries return the same result.
select * from TableIn where exists (select BID from TableEx where BName = TableIn.AName)
select * from TableIn where AName = ANY (select BName from TableEx)
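The first two comparisons can be checked with a short sketch. It assumes Python's built-in sqlite3 module, and the column layout and rows of TableIn and TableEx are hypothetical because the example tables are not reproduced here. The = ANY form is left out because SQLite, which the sketch uses, does not accept that syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical columns and rows (assumption, not the article's data).
    CREATE TABLE TableIn (AID INTEGER PRIMARY KEY, AName TEXT, ASex TEXT);
    CREATE TABLE TableEx (BID INTEGER PRIMARY KEY, BName TEXT, BSex TEXT);
    INSERT INTO TableIn VALUES
        (1, 'Zhang San', 'Male'), (2, 'Li Si', 'Female'), (3, 'Wang Wu', 'Male');
    INSERT INTO TableEx VALUES
        (1, 'Li Si', 'Female'), (2, 'Zhao Liu', 'Male');
""")


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# (i) EXISTS (SELECT NULL) is true for every row: the subquery returns one row.
all_rows = names("SELECT AName FROM TableIn ORDER BY AID")
with_null = names(
    "SELECT AName FROM TableIn WHERE EXISTS (SELECT NULL) ORDER BY AID"
)

# (ii) EXISTS with a correlated subquery versus IN.
via_exists = names("""
    SELECT AName FROM TableIn
    WHERE EXISTS (SELECT BID FROM TableEx WHERE BName = TableIn.AName)
    ORDER BY AID
""")
via_in = names("""
    SELECT AName FROM TableIn
    WHERE AName IN (SELECT BName FROM TableEx)
    ORDER BY AID
""")

print(with_null == all_rows, via_exists, via_in)
```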
NOT EXISTS does the opposite of EXISTS. If the subquery returns no rows, the NOT EXISTS condition in the WHERE clause is satisfied.
Conclusion: the return value of an EXISTS (or NOT EXISTS) clause is a Boolean. Inside EXISTS there is a subquery (SELECT ... FROM ...), which I call the inner query of EXISTS. The inner query returns a result set, and the EXISTS clause returns a Boolean value depending on whether that result set is empty or non-empty.
Put simply: each row of the outer query's table is substituted into the inner query as a test. If the inner query returns a non-empty result, the EXISTS clause returns TRUE and that row can be a result row of the outer query; otherwise it cannot.
The parser first looks at the first word of the statement. When it finds that the first word is the SELECT keyword, it jumps to the FROM keyword, finds the table name after FROM, and loads the table into memory. It then looks for the WHERE keyword. If there is none, it returns to SELECT and resolves the field list; if there is a WHERE, it analyzes the conditions and then returns to SELECT to resolve the fields. The end result is the virtual table we want.
The WHERE keyword is followed by a conditional expression. Evaluating the expression produces a return value that is either zero or non-zero: non-zero means true and zero means false. In the same way, the condition after WHERE has a return value, true or false, which determines whether the SELECT goes ahead for the current record.
For example, the parser finds the SELECT keyword, jumps to the FROM keyword, loads the student table into memory, and uses a pointer to find the first record. It then finds the WHERE keyword and evaluates its conditional expression. If the result is true, the record is placed in a virtual table and the pointer moves to the next record; if it is false, the pointer moves directly to the next record without doing anything else. Once the whole table has been scanned, the virtual table is returned to the user. EXISTS is part of the conditional expression, and it too has a return value (true or false).
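The record-by-record model described above can be written as a short Python sketch. The student and enrollment data, the select helper, and the exists helper are all hypothetical names chosen for illustration; this is a simplified mental model, not how a real database engine is implemented.

```python
# Hypothetical "student" table held in memory as a list of records.
students = [
    {"id": 1, "name": "Ann", "age": 19},
    {"id": 2, "name": "Bob", "age": 22},
    {"id": 3, "name": "Cid", "age": 21},
]
# Hypothetical second table, used only by the EXISTS-style condition.
enrollments = [{"student_id": 1}, {"student_id": 3}]


def exists(rows):
    """Like EXISTS: true if the inner query yields at least one row."""
    return any(True for _ in rows)


def select(columns, table, where=None):
    """Scan the table record by record and build the virtual table."""
    virtual_table = []
    for record in table:                    # the pointer visits each record
        if where is None or where(record):  # WHERE evaluates to true or false
            virtual_table.append(tuple(record[c] for c in columns))
    return virtual_table


# A plain condition after WHERE.
older = select(["name"], students, where=lambda r: r["age"] > 20)

# EXISTS used as the condition: one inner query per outer record.
enrolled = select(
    ["name"],
    students,
    where=lambda r: exists(e for e in enrollments if e["student_id"] == r["id"]),
)

print(older)
print(enrolled)
```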
If you need to check whether a record already exists before inserting it, and insert only when it does not, you can use an EXISTS condition to prevent duplicate records from being inserted.
INSERT INTO TableIn (AName, ASex) SELECT TOP 1 'Zhang San', 'Male' FROM TableIn WHERE NOT EXISTS (SELECT * FROM TableIn WHERE TableIn.AID = 7)
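A runnable variant of this insert-if-missing pattern is sketched below, assuming Python's built-in sqlite3 module and a hypothetical TableIn layout. SQLite has no TOP keyword and allows a SELECT without FROM, so the statement differs slightly from the one above. The AID is also inserted explicitly here so that the existence check and the inserted row refer to the same key. The insert_if_missing helper is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableIn (AID INTEGER PRIMARY KEY, AName TEXT, ASex TEXT)"
)

# Insert the row only when no row with the same AID exists yet.
INSERT_IF_MISSING = """
    INSERT INTO TableIn (AID, AName, ASex)
    SELECT ?, ?, ?
    WHERE NOT EXISTS (SELECT 1 FROM TableIn WHERE AID = ?)
"""


def insert_if_missing(aid, name, sex):
    """Run the guarded insert and return the resulting row count of TableIn."""
    conn.execute(INSERT_IF_MISSING, (aid, name, sex, aid))
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM TableIn").fetchone()[0]


count_after_first = insert_if_missing(7, "Zhang San", "Male")
count_after_second = insert_if_missing(7, "Zhang San", "Male")  # no duplicate

print(count_after_first, count_after_second)
```

Under concurrent writers a check like this is not a substitute for a unique constraint, which is why the sketch also declares AID as the primary key.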
On the efficiency of EXISTS versus IN: EXISTS is often said to be more efficient than IN, on the grounds that IN does not use the index, but it depends on the actual situation. IN suits a large outer table with a small inner table; EXISTS suits a small outer table with a large inner table.
Differences between exists and in in SQL statements