Efficiency of IN vs. EXISTS and NOT IN vs. NOT EXISTS

Source: Internet
Author: User
From the point of view of efficiency:

1) SELECT * FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T1.a = T2.a);

When T1 is small and T2 is very large (T1 << T2), query 1 is more efficient.

2) SELECT * FROM T1 WHERE T1.a IN (SELECT T2.a FROM T2);

When T1 is very large and T2 is small (T1 >> T2), query 2 is more efficient.

In short, as a general rule: use IN when the outer table is large, and use EXISTS when the inner (subquery) table is large. Here is how each is executed:

With EXISTS, Oracle first runs the main query. Then, for each of its rows, it runs the subquery only until it finds the first match, which saves time. With an IN subquery, Oracle runs the subquery first and stores the resulting list in an indexed temporary table. The main query is suspended while the subquery executes and resumes once the results are stored in the temporary table. This is the usual explanation for why EXISTS is said to be faster than IN. Put another way, IN performs a hash join between the outer and inner tables, while EXISTS loops over the outer table and queries the inner table on each iteration.

NOT EXISTS is executed as a nested loop (NL). It only tests whether a matching row exists, so even if the subquery returns NULLs, outer rows without a match are still returned.

NOT IN is executed as a hash join. The subquery result is built into an in-memory array and matched against the outer table. If the subquery returns any NULL, no outer row can satisfy the condition, and the query returns no rows.
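The NULL behaviour of NOT IN and NOT EXISTS can be checked with a small sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Oracle. The execution plans differ between the two engines, but the NULL semantics of NOT IN and NOT EXISTS are standard SQL. The tables t1 and t2 are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption);
# t1 and t2 are hypothetical example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (a INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (NULL);  -- the NULL is what matters
""")

# NOT IN: for a = 2, "2 NOT IN (1, NULL)" evaluates to UNKNOWN, not TRUE,
# so no row of t1 passes the filter.
not_in_rows = conn.execute(
    "SELECT a FROM t1 WHERE a NOT IN (SELECT a FROM t2) ORDER BY a"
).fetchall()

# NOT EXISTS: only asks whether a matching row exists; the NULL never
# matches, so the rows without a match (2 and 3) are returned.
not_exists_rows = conn.execute(
    "SELECT a FROM t1 WHERE NOT EXISTS "
    "(SELECT 1 FROM t2 WHERE t2.a = t1.a) ORDER BY a"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

The two forms are only interchangeable when the subquery column cannot be NULL, for example when it is declared NOT NULL.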

The claim that EXISTS is always more efficient than IN is not accurate.

If the two tables are of similar size, there is little difference between IN and EXISTS.

If one table is small and the other is large, use EXISTS when the subquery table is large and IN when the subquery table is small:

For example, with table A (small) and table B (large):
1)
SELECT * FROM A WHERE cc IN (SELECT cc FROM B);
Inefficient: uses the index on column cc of table A.
SELECT * FROM A WHERE EXISTS (SELECT cc FROM B WHERE cc = A.cc);
Efficient: uses the index on column cc of table B.

Conversely:
2)
SELECT * FROM B WHERE cc IN (SELECT cc FROM A);
Efficient: uses the index on column cc of table B.
SELECT * FROM B WHERE EXISTS (SELECT cc FROM A WHERE cc = B.cc);
Inefficient: uses the index on column cc of table A.

NOT IN and NOT EXISTS

If a query uses NOT IN, both the inner and outer tables are fully scanned and no index is used. The subquery of NOT EXISTS, by contrast, can still use the index on its table.
So regardless of which table is larger, NOT EXISTS is faster than NOT IN.

You often hear that you should use EXISTS rather than IN. The reasoning is that EXISTS only checks for existence while IN has to compare values, so EXISTS must be faster. However, some material online shows that this is not necessarily so.
The following passage is reproduced from another source:
SELECT * FROM T1 WHERE x IN (SELECT y FROM T2);
The process of execution is equivalent to:

SELECT * 
  from T1, (select distinct y from T2) t2
 where t1.x = t2.y;
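The equivalence above can be checked with a minimal sketch. It again assumes SQLite (via Python's sqlite3 module) as a stand-in for Oracle, with hypothetical tables t1 and t2. It also shows why the DISTINCT matters: without it, a duplicate value in T2 would duplicate rows of T1.

```python
import sqlite3

# Hypothetical tables; SQLite stands in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (y INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (1), (3);   -- duplicate y = 1
""")

# The IN form.
in_rows = conn.execute(
    "SELECT x FROM t1 WHERE x IN (SELECT y FROM t2) ORDER BY x"
).fetchall()

# The equivalent join against the DISTINCT values of the subquery.
join_distinct = conn.execute("""
    SELECT t1.x FROM t1, (SELECT DISTINCT y FROM t2) t2
    WHERE t1.x = t2.y ORDER BY t1.x
""").fetchall()

# A plain join without DISTINCT: x = 1 appears twice because y = 1 does.
join_plain = conn.execute(
    "SELECT t1.x FROM t1, t2 WHERE t1.x = t2.y ORDER BY t1.x"
).fetchall()

print(in_rows)        # [(1,), (3,)]
print(join_distinct)  # [(1,), (3,)]
print(join_plain)     # [(1,), (1,), (3,)]
```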

SELECT * from t1 where exists (select null from t2 where y = x)
The process of execution is equivalent to:

for x in (select * from t1)
loop
   if (exists (select null from t2 where y = x.x))
   then
      OUTPUT THE RECORD
   end if
end loop
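The loop above can be modelled in plain Python as a sketch. This is an illustration of the semantics only, not of Oracle's internals. Rows are represented as dicts, and the function name `exists_filter` is hypothetical. `any()` over a generator stops at the first match, which mirrors how EXISTS stops probing once one matching row is found.

```python
def exists_filter(t1, t2):
    """Model of: SELECT * FROM t1 WHERE EXISTS (SELECT NULL FROM t2 WHERE y = x)."""
    result = []
    for x in t1:                                  # for x in (select * from t1)
        # any() short-circuits at the first matching row of t2
        if any(row["y"] == x["x"] for row in t2):
            result.append(x)                      # OUTPUT THE RECORD
    return result


t1 = [{"x": 1}, {"x": 2}, {"x": 3}]
t2 = [{"y": 3}, {"y": 1}, {"y": 3}]
print(exists_filter(t1, t2))  # [{'x': 1}, {'x': 3}]
```

The outer loop runs once per row of t1, which is why the size of the outer table dominates the cost of EXISTS.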

In my view, IN is more intuitive and EXISTS is somewhat roundabout. IN can be used with any kind of subquery, whereas EXISTS really only makes sense with correlated subqueries (it can be used with other subqueries too, but there it is pointless).
Because EXISTS works as a loop, the number of iterations affects it the most. When the outer table has few rows, the size of the inner table hardly matters. IN uses a hash join, so if the inner table is small, the whole query's search space is small. If the inner table is very large and the outer table is also very large, IN becomes very slow, and in that case EXISTS really is faster.

In other words, the choice between IN and EXISTS depends on the specific situation. For NOT IN vs. NOT EXISTS no such analysis is needed: use NOT EXISTS wherever possible.

There are three typical join methods:
Sort-merge join (SMJ)
Nested loops (NL)
Hash join

Nested loops and hash joins are different algorithms. In theory a hash join is faster than sort-merge and NL joins. In practice things are much more complicated than the theory, but the two methods do behave differently.
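To make the difference between the two algorithms concrete, here is a minimal in-memory sketch of a nested loops join and a hash join in plain Python. It is illustrative only, not how Oracle implements them. Rows are dicts, the function names and the emp/dept data are hypothetical, and sort-merge join is omitted. Python's `==` also treats `None == None` as true, unlike SQL NULL.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, key_o, key_i):
    # For every outer row, scan the whole inner input:
    # roughly len(outer) * len(inner) comparisons.
    return [(o, i) for o in outer for i in inner if o[key_o] == i[key_i]]


def hash_join(outer, inner, key_o, key_i):
    # Build phase: hash the inner (ideally smaller) input on the join key.
    table = defaultdict(list)
    for i in inner:
        table[i[key_i]].append(i)
    # Probe phase: one hash lookup per outer row,
    # roughly len(outer) + len(inner) work overall.
    return [(o, i) for o in outer for i in table.get(o[key_o], [])]


emp = [{"name": "A", "deptno": 10}, {"name": "B", "deptno": 20},
       {"name": "C", "deptno": 10}]
dept = [{"deptno": 10, "dname": "admin"}, {"deptno": 30, "dname": "sales"}]

result_nl = nested_loop_join(emp, dept, "deptno", "deptno")
result_hash = hash_join(emp, dept, "deptno", "deptno")
print([(o["name"], i["dname"]) for o, i in result_hash])  # [('A', 'admin'), ('C', 'admin')]
```

Both functions return the same pairs. The difference lies in the amount of work: the hash join pays once to build the table and then does constant-time lookups, while the nested loop rescans the inner input for every outer row.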

1. Correlated and non-correlated subqueries

A correlated subquery references the outer table inside the subquery; a non-correlated subquery does not. A correlated subquery is evaluated once for each row processed by the parent query. A non-correlated subquery executes only once, and its result set is kept in memory (if the result set is small) or in an Oracle temporary segment (if it is larger). A scalar subquery is a non-correlated subquery that returns exactly one row. If the subquery returns only one row, the Oracle optimizer reduces the result to a constant and the subquery executes only once.

SELECT * FROM emp WHERE deptno IN (SELECT deptno FROM dept WHERE dept_name = 'admin');
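The difference in how often each kind of subquery runs can be modelled with a toy Python sketch. The emp/dept rows and helper names are hypothetical, and the call counter stands in for "number of subquery executions". It does not reflect Oracle's actual optimizer behaviour.

```python
# Counts how many times each kind of (modelled) subquery is evaluated.
calls = {"correlated": 0, "non_correlated": 0}

emp = [{"ename": "A", "deptno": 10}, {"ename": "B", "deptno": 20},
       {"ename": "C", "deptno": 10}, {"ename": "D", "deptno": 30}]
dept = [{"deptno": 10, "dept_name": "admin"},
        {"deptno": 20, "dept_name": "sales"},
        {"deptno": 30, "dept_name": "admin"}]


def admin_deptnos():
    # Non-correlated: does not depend on the outer row, so it can run once.
    calls["non_correlated"] += 1
    return {d["deptno"] for d in dept if d["dept_name"] == "admin"}


def dept_is_admin(e):
    # Correlated: references the outer row e, so it runs once per outer row.
    calls["correlated"] += 1
    return any(d["deptno"] == e["deptno"] and d["dept_name"] == "admin"
               for d in dept)


# IN with a non-correlated subquery: evaluate once, keep the result set, probe it.
admin_set = admin_deptnos()
in_result = [e for e in emp if e["deptno"] in admin_set]

# EXISTS with a correlated subquery: evaluated for each outer row.
exists_result = [e for e in emp if dept_is_admin(e)]

print(calls)  # {'correlated': 4, 'non_correlated': 1}
```

Both forms select the same employees. The correlated version's cost grows with the number of outer rows, while the non-correlated version's cost grows with the size of its stored result set.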

2. How to choose.

It depends on the number of rows returned by the outer query and by the subquery itself. Assuming both forms return the same results, the question is which one is more efficient.

Overhead of a correlated subquery: the subquery executes once for each row returned to the outer query. Therefore, make sure the subquery uses indexes wherever possible.

Overhead of a non-correlated subquery: the subquery executes only once. Its result set is usually sorted and stored in a temporary segment, and each row returned by the parent query is checked against it. When the subquery returns a large number of rows, sorting this result set adds to the system overhead.

So: for a correlated subquery, if the parent query returns few rows, the cost of executing the subquery is small, but if it returns many rows, the subquery executes many times. For a non-correlated subquery, if the subquery returns few rows, the overhead of keeping its result set in memory is small. If it returns many rows, the results must be placed in a temporary segment and sorted before they can serve each row of the parent query.

3. Conclusions:

1) With a correlated subquery, the execution plans for the IN and EXISTS forms are usually the same.
2) An EXISTS clause is usually not suitable for non-correlated subqueries.
3) When the outer query returns relatively few rows, a correlated subquery executes faster than a non-correlated one.
4) When the subquery returns only a few rows, a non-correlated subquery executes faster than a correlated one.

4. Subquery transformation: subqueries can be converted to standard join operations

   1) Non-correlated subqueries using IN (subquery only)

      Conditions: 1) the select list of the subquery contains the columns that define the unique primary key of the lowest-level table in the hierarchy.

            2) At least one of the columns that define the unique primary key appears in the select list, and every other column of that key has an equality condition specified on it, either directly or indirectly.

   2) Correlated subqueries using EXISTS

      Condition: given the correlation condition, the subquery can return only one row.
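Both transformations can be illustrated with a small sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for Oracle, and the emp/dept schema is hypothetical. Because deptno is the primary key of dept, each emp row matches at most one dept row. As a result, the IN form, the correlated EXISTS form and a plain join return the same rows without needing DISTINCT.

```python
import sqlite3

# Hypothetical schema; deptno is the unique primary key of dept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'admin'), (20, 'sales');
    INSERT INTO emp  VALUES (1, 10), (2, 20), (3, 10);
""")

# 1) Non-correlated IN subquery whose select list is dept's primary key.
sub = conn.execute("""
    SELECT empno FROM emp
    WHERE deptno IN (SELECT deptno FROM dept WHERE dept_name = 'admin')
    ORDER BY empno
""").fetchall()

# 2) Correlated EXISTS subquery that can return at most one row per emp row.
exists_rows = conn.execute("""
    SELECT e.empno FROM emp e
    WHERE EXISTS (SELECT 1 FROM dept d
                  WHERE d.deptno = e.deptno AND d.dept_name = 'admin')
    ORDER BY e.empno
""").fetchall()

# The standard join both forms can be converted into.
joined = conn.execute("""
    SELECT e.empno FROM emp e JOIN dept d ON e.deptno = d.deptno
    WHERE d.dept_name = 'admin'
    ORDER BY e.empno
""").fetchall()

print(sub, exists_rows, joined)  # [(1,), (3,)] [(1,), (3,)] [(1,), (3,)]
```

If the subquery column were not unique, the plain join could return duplicate rows, and the conversion would need DISTINCT (or would not be valid).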

5. Tuning NOT IN and NOT EXISTS

1) NOT IN with a non-correlated subquery: can be rewritten using a MINUS clause.

2) NOT EXISTS with a correlated subquery: this kind of anti-join runs the inner query for each row of the outer query. It filters out every outer row that has a matching inner row and keeps only the rows for which the subquery's WHERE condition finds no match.

It can be rewritten as an outer join on the equality condition, keeping only rows where the inner column is NULL, with SELECT DISTINCT added:

e.g. SELECT DISTINCT ... FROM a, b WHERE a.col1 = b.col1 (+) AND b.col1 IS NULL
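The three forms of this anti-join can be compared with a minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for Oracle and uses hypothetical tables a and b. SQLite does not support Oracle's `(+)` outer-join syntax, so the sketch uses the equivalent ANSI `LEFT JOIN`. Likewise, SQLite spells Oracle's `MINUS` as `EXCEPT`.

```python
import sqlite3

# Hypothetical tables; SQLite stands in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col1 INTEGER);
    CREATE TABLE b (col1 INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (4);
""")

# NOT EXISTS with a correlated subquery.
not_exists = conn.execute("""
    SELECT DISTINCT col1 FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.col1 = a.col1)
    ORDER BY col1
""").fetchall()

# ANSI equivalent of "a.col1 = b.col1 (+) AND b.col1 IS NULL".
outer_join = conn.execute("""
    SELECT DISTINCT a.col1 FROM a LEFT JOIN b ON a.col1 = b.col1
    WHERE b.col1 IS NULL
    ORDER BY a.col1
""").fetchall()

# Set-difference form (Oracle MINUS, SQLite EXCEPT); it also removes duplicates.
minus = conn.execute("""
    SELECT col1 FROM a EXCEPT SELECT col1 FROM b
    ORDER BY col1
""").fetchall()

print(not_exists, outer_join, minus)  # [(1,), (3,)] [(1,), (3,)] [(1,), (3,)]
```

The three results agree here because neither table contains NULLs. As shown earlier, a NOT IN version of the same query would return no rows if b.col1 contained a NULL.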

6. Using ALL in subqueries

Original address: http://muyue123.blog.sohu.com/146930118.html
