IN joins the outer table and the inner table with a hash join, whereas EXISTS loops over the outer table and queries the inner table once per iteration.
NOT EXISTS: executed as a nested loop (NL). The subquery runs first and produces a virtual table with a definite value, so even if the subquery contains NULLs, a result is still returned.
NOT IN: executed as a hash join. An in-memory array is built on the subquery table and matched against the outer table. If the subquery contains a NULL, the match against the outer table returns no rows at all.
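The NULL behaviour of NOT IN and NOT EXISTS can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption made only so the example is runnable; the three-valued logic is standard SQL, but Oracle's execution plans are not modelled). The table names outer_t and inner_t are hypothetical.

```python
import sqlite3

# In-memory database; outer_t and inner_t are hypothetical demo tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (cc INTEGER);
    CREATE TABLE inner_t (cc INTEGER);
    INSERT INTO outer_t VALUES (1), (2), (3);
    INSERT INTO inner_t VALUES (1), (NULL);   -- the subquery side contains a NULL
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, so the row is filtered out.
not_in_rows = conn.execute(
    "SELECT cc FROM outer_t WHERE cc NOT IN (SELECT cc FROM inner_t) ORDER BY cc"
).fetchall()

# NOT EXISTS: the correlated test is simply true or false for each outer row.
not_exists_rows = conn.execute(
    """
    SELECT cc FROM outer_t o
    WHERE NOT EXISTS (SELECT 1 FROM inner_t i WHERE i.cc = o.cc)
    ORDER BY cc
    """
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

A single NULL on the subquery side is enough to empty the NOT IN result, while NOT EXISTS still returns the unmatched rows.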
The claim that EXISTS is always more efficient than IN is inaccurate.
If the two tables in the query are about the same size, IN and EXISTS perform about the same.
If one table is small and the other is large, use EXISTS when the subquery table is the large one and IN when the subquery table is the small one:
Example: table A (small table), table B (large table). Case 1:
select * from A where cc in (select cc from B) is inefficient and uses the index on column cc of table A;
select * from A where exists (select cc from B where cc = A.cc) is efficient and uses the index on column cc of table B.
Case 2, the opposite:
select * from B where cc in (select cc from A) is efficient and uses the index on column cc of table B;
select * from B where exists (select cc from A where cc = B.cc) is inefficient and uses the index on column cc of table A.
NOT IN and NOT EXISTS
If a query uses NOT IN, both the inner and the outer table are scanned in full and no index is used, whereas the subquery of NOT EXISTS can still use the index on its table. So regardless of which table is large, NOT EXISTS is faster than NOT IN.
I had always heard that you should use EXISTS rather than IN, because EXISTS only checks for existence while IN has to compare values, so EXISTS is supposedly faster. After reading some material online, I found that this is not the case at all. The following passage is quoted.
select * from T1 where x in (select y from T2) is executed as if it were:
select * from T1, (select distinct y from T2) t2 where t1.x = t2.y;
select * from T1 where exists (select null from T2 where y = x) is executed as if it were:
for x in (select * from T1) loop
    if (exists (select null from T2 where y = x.x)) then
        OUTPUT THE RECORD
    end if
end loop
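The two equivalences quoted above can be tried out with a minimal sketch. It uses Python's sqlite3 module as a stand-in engine (an assumption; it shows only that the result sets agree, not how Oracle actually plans the queries), and the sample data in t1 and t2 is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (y INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    INSERT INTO t2 VALUES (2), (2), (4), (5);   -- note the duplicate 2
""")

# Form 1: the IN subquery.
in_rows = conn.execute(
    "SELECT x FROM t1 WHERE x IN (SELECT y FROM t2) ORDER BY x"
).fetchall()

# Form 2: a join against the de-duplicated subquery result.
join_rows = conn.execute(
    """
    SELECT t1.x FROM t1, (SELECT DISTINCT y FROM t2) d
    WHERE t1.x = d.y
    ORDER BY t1.x
    """
).fetchall()

# Form 3: EXISTS written out as an explicit loop over the outer table.
loop_rows = []
for (x,) in conn.execute("SELECT x FROM t1 ORDER BY x").fetchall():
    found = conn.execute(
        "SELECT EXISTS (SELECT NULL FROM t2 WHERE y = ?)", (x,)
    ).fetchone()[0]
    if found:
        loop_rows.append((x,))

print(in_rows)    # [(2,), (4,)]
print(join_rows)  # [(2,), (4,)]
print(loop_rows)  # [(2,), (4,)]
```

The DISTINCT in the join form matters: without it, the duplicate 2 in t2 would return the t1 row twice.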
In my view, IN is the more intuitive form and EXISTS is somewhat roundabout. IN can be used with all kinds of subqueries, whereas EXISTS seems to be useful only with correlated subqueries (it can be used with other subqueries too, but to no purpose).
Because EXISTS works as a loop, the number of iterations affects it the most: it does well when the outer table has few records, and the size of the inner table hardly matters. IN uses a hash join, so if the inner table is small the scope of the whole query stays small. If the inner table is large and the outer table is large as well, IN becomes very slow, and only then is EXISTS really faster than IN.
In other words, IN versus EXISTS needs case-by-case analysis, while NOT IN versus NOT EXISTS does not: just prefer NOT EXISTS.
There are three typical join methods:
sort-merge join (SMJ), nested loops (NL), and hash join.
The nested loop and hash join algorithms are different. In theory a hash join is faster than both sort-merge and nested loops. Reality is much more complicated than theory, but the difference is real.
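To make the difference between the two algorithms concrete, here is a toy sketch in Python. It is a simplification under stated assumptions: the nested loop has no index on the inner side, everything fits in memory, and the function names nested_loop_join and hash_join are hypothetical. A real optimizer is far more elaborate.

```python
def nested_loop_join(outer, inner):
    """Compare every outer row with every inner row (no index assumed)."""
    matches, comparisons = [], 0
    for o in outer:
        for i in inner:
            comparisons += 1
            if o == i:
                matches.append((o, i))
    return matches, comparisons


def hash_join(outer, inner):
    """Build a hash table on the inner rows, then probe it once per outer row."""
    buckets = {}
    for i in inner:
        buckets.setdefault(i, []).append(i)
    matches, probes = [], 0
    for o in outer:
        probes += 1
        for i in buckets.get(o, []):
            matches.append((o, i))
    return matches, probes


outer_rows = [1, 2, 3, 4, 5]
inner_rows = [2, 4, 6, 8]

nl_matches, nl_comparisons = nested_loop_join(outer_rows, inner_rows)
hj_matches, hj_probes = hash_join(outer_rows, inner_rows)

print(nl_matches)                 # [(2, 2), (4, 4)]
print(hj_matches)                 # [(2, 2), (4, 4)]
print(nl_comparisons, hj_probes)  # 20 5
```

The unindexed nested loop does outer × inner comparisons, while the hash join does one build pass plus one probe per outer row. With an index on the inner table the nested loop needs far fewer comparisons, which is one reason the real picture is more complicated than the theory.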
1. Correlated and non-correlated subqueries
A correlated subquery references the outer table internally; a non-correlated subquery does not. For the records processed by the parent query, a correlated subquery is evaluated once per row, whereas a non-correlated subquery executes only once, and its result set is kept in memory (if the result set is small) or in an Oracle temporary segment (if the result set is large). A "scalar" subquery is a non-correlated subquery that returns a single record. If the subquery returns only one record, the Oracle optimizer reduces the result to a constant, and the subquery executes only once.
/* select * from emp where deptno in (select deptno from dept where dept_name = 'admin'); */
2. How do I choose?
It depends on the number of records returned by the outer query and by the subquery itself. If both forms return the same results, which one is more efficient?
Overhead of a correlated subquery: the subquery executes once for each record returned by the outer query, so it is important to make sure the subquery uses an index wherever possible.
Overhead of a non-correlated subquery: the subquery executes only once, and its result set is usually sorted and stored in a temporary segment, where each record is referenced by the parent query as it returns rows. If the subquery returns a large number of records, sorting the result set and storing it in the temporary segment add to the system overhead.
Therefore: if the parent query returns few records, the cost of re-executing the subquery for each of them is not very large; if it returns many rows, the subquery executes many times. If the subquery returns few records, the overhead of keeping its result set in memory for the parent query is small; if the subquery returns many rows, the result must be placed in a temporary segment and then sorted to serve each record of the parent query.
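The trade-off described above can be illustrated by counting subquery executions in a toy model. This is only a sketch in plain Python with hypothetical function names (correlated, non_correlated); it counts executions and cached rows and ignores indexes, sorting, and temporary segments.

```python
def correlated(outer_rows, inner_rows):
    """Correlated style: the subquery is re-run for every outer row."""
    executions, result = 0, []
    for row in outer_rows:
        executions += 1                       # one subquery execution per outer row
        if any(value == row for value in inner_rows):
            result.append(row)
    return result, executions


def non_correlated(outer_rows, inner_rows):
    """Non-correlated style: the subquery runs once and its result is kept."""
    executions = 1
    cached = set(inner_rows)                  # result set held for the parent query
    result = [row for row in outer_rows if row in cached]
    return result, executions, len(cached)


outer_rows = [10, 20, 30]                     # parent query returns few rows
inner_rows = list(range(0, 1000, 10))         # subquery returns 100 rows

corr_result, corr_execs = correlated(outer_rows, inner_rows)
non_result, non_execs, cached_rows = non_correlated(outer_rows, inner_rows)

print(corr_result, corr_execs)                # [10, 20, 30] 3
print(non_result, non_execs, cached_rows)     # [10, 20, 30] 1 100
```

With three outer rows the correlated form runs the subquery only three times, while the non-correlated form runs it once but has to hold all 100 subquery rows. Swap the table sizes and the balance tips the other way.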
3. Conclusions
1) With a correlated subquery, the execution plan is usually the same whether the IN or the EXISTS clause is used.
2) The EXISTS clause is usually not appropriate for non-correlated subqueries.
3) When the outer query returns relatively few records, a correlated subquery executes faster than a non-correlated one.
4) If the subquery returns only a small number of records, a non-correlated subquery executes faster than a correlated one.
4. Subquery conversion: subqueries can be converted to standard join operations
1) Non-correlated subqueries using IN (the subquery result is unique)
Condition: 1) The columns that define the unique primary key of the lowest-level table in the whole hierarchy appear in the select list of the subquery.
2) At least one column that defines the unique primary key appears in the select list, and the other columns that define the unique primary key have equality conditions specified, either directly or indirectly.
2) Correlated subqueries using the EXISTS clause
Condition: for the correlation condition, the subquery can return only one record.
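Why the uniqueness condition matters can be seen in a short sketch. It uses Python's sqlite3 module as a stand-in engine (an assumption) and reuses the emp/dept query from the earlier comment. The dept_log table and all sample rows are hypothetical, added only to show what happens when the subquery column is not unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER);
    CREATE TABLE dept_log (deptno INTEGER, note TEXT);   -- deptno is NOT unique here
    INSERT INTO dept VALUES (10, 'admin'), (20, 'sales');
    INSERT INTO emp VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO dept_log VALUES (10, 'a'), (10, 'b');
""")

# Subquery on a unique key: the IN form and the join form return the same rows.
in_rows = conn.execute("""
    SELECT empno FROM emp
    WHERE deptno IN (SELECT deptno FROM dept WHERE dept_name = 'admin')
    ORDER BY empno
""").fetchall()
join_rows = conn.execute("""
    SELECT e.empno FROM emp e, dept d
    WHERE e.deptno = d.deptno AND d.dept_name = 'admin'
    ORDER BY e.empno
""").fetchall()

# Subquery on a non-unique column: the plain join duplicates rows.
in_log_rows = conn.execute("""
    SELECT empno FROM emp
    WHERE deptno IN (SELECT deptno FROM dept_log)
    ORDER BY empno
""").fetchall()
join_log_rows = conn.execute("""
    SELECT e.empno FROM emp e, dept_log l
    WHERE e.deptno = l.deptno
    ORDER BY e.empno
""").fetchall()

print(in_rows, join_rows)   # [(1,), (2,)] [(1,), (2,)]
print(in_log_rows)          # [(1,), (2,)]
print(join_log_rows)        # [(1,), (1,), (2,), (2,)]
```

When the subquery column is a unique key the conversion is safe; otherwise the join needs a DISTINCT or some other de-duplication step to stay equivalent.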
5. Tuning NOT IN and NOT EXISTS
1) NOT IN with a non-correlated subquery: rewrite it as a MINUS clause in the IN form.
2) NOT EXISTS with a correlated subquery: this kind of anti-join runs the inner query for each record of the outer query and filters out every record except those for which the inner table does not satisfy the WHERE condition of the subquery. It can be rewritten by specifying an outer-join condition in an equijoin and adding select distinct, e.g.: select distinct ... from a, b where a.col1 = b.col1 (+) and b.col1 is null
6. Using ALL in a subquery
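Going back to the two rewrites in section 5, the sketch below checks that they return the same rows on a small data set. It uses Python's sqlite3 module as a stand-in engine, so two substitutions are assumed: EXCEPT takes the place of Oracle's MINUS, and an ANSI LEFT JOIN takes the place of the (+) outer-join syntax. The sample rows are made up and contain no NULLs, since NULLs would make NOT IN behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col1 INTEGER);
    CREATE TABLE b (col1 INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (1), (4);
""")

# Baseline: NOT EXISTS with a correlated subquery (DISTINCT to match the rewrite).
not_exists_rows = conn.execute("""
    SELECT DISTINCT a.col1 FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.col1 = a.col1)
    ORDER BY a.col1
""").fetchall()

# Rewrite 1: outer join plus IS NULL, the ANSI form of "a.col1 = b.col1 (+)".
outer_join_rows = conn.execute("""
    SELECT DISTINCT a.col1
    FROM a LEFT JOIN b ON a.col1 = b.col1
    WHERE b.col1 IS NULL
    ORDER BY a.col1
""").fetchall()

# Rewrite 2: set difference; EXCEPT here plays the role of Oracle's MINUS.
except_rows = conn.execute("""
    SELECT col1 FROM a
    EXCEPT
    SELECT col1 FROM b
    ORDER BY col1
""").fetchall()

print(not_exists_rows)   # [(2,), (3,)]
print(outer_join_rows)   # [(2,), (3,)]
print(except_rows)       # [(2,), (3,)]
```

Both rewrites remove duplicates, so they match the original query only when duplicates in the outer table do not matter or a DISTINCT is acceptable.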
Usage of EXISTS, NOT EXISTS, and IN in Oracle