Oracle in and exists Selection

Source: Internet
Author: User

The business problem can be roughly described as follows: A parent table and a sub-table, the query result is to find the records in the sub-table that do not use the parent table id. In this case, it is estimated that many systems will be involved. Let's take an example:

Table 1: parent table parent

1. ID varchar2 (10) primary key

2. NAME VARCHAR2 (100) NAME

Table 2: child table childen

1. ID VARCHAR2 (10) primary key

2. PID VARCHAR2 (10) primary table Representation

3. NAME VARCHAR2 (100) NAME

The parent table stores the parent table, the child table stores the child table, and then uses the pid to associate with the parent table. The result of the query is to find the father who does not have the child.

Let's take a look at the syntax of the query statement:

Select * from parent where id not in (select pid from childen)

When the sub-table contains 0.5 million records, the query time exceeds 10 seconds, which is far more than one second on the original SQL server. I came up with a solution:

Select * from parent where id in

(Select id from parent minus select pid from childen)

Under normal understanding, this statement should be more time-consuming, but the fact is totally unexpected. This statement is not only fast when the sub-table has a large number of records, in the case of a small amount of data in a sub-table, the speed is also very good, basically completed within 1 second.

Select * from
Where id in (select id from B)

The preceding query uses the in statement. in () is executed only once. It finds all id fields in Table B and caches them. then, check whether the IDs of Table A are equal to those of Table B. If they are equal, add the records of Table A to the result set until all the records of Table A are traversed.
Its query process is similar to the following process:

List resultSet = [];
Array A = (select * from );
Array B = (select id from B );

For (int I = 0; I <A. length; I ++ ){
  For (int j = 0; j <B. length; j ++ ){
     If (A [I]. id = B [j]. id ){
        ResultSet. add (A [I]);
        Break;
     }
  }
}
Return resultSet;

It can be seen that in () is not suitable for large data volumes in Table B because it traverses all data in Table B once.
For example, if Table A has 10000 records and table B has 1000000 records, it is possible to traverse A maximum of 10000*1000000 times, resulting in poor efficiency.
For example, if Table A has 10000 records and table B has 100 records, it is possible to traverse A maximum of 10000*100 times, greatly reducing the number of traversal times and greatly improving the efficiency.

Conclusion: in () is suitable for the case where Table B has less data than table.

Select a. * from A
Where exists (select 1 from B B where a. id = B. id)

The above query uses the exists statement, and exists () will execute. length, it does not cache exists () result set, because exists () result set content is not important, it is important to check whether there is a record in the result set, if yes, return true, if not, false is returned.
Its query process is similar to the following process:

List resultSet = [];
Array A = (select * from)

For (int I = 0; I <A. length; I ++ ){
  If (exists (A [I]. id ){  // Execute select 1 from B B where B. id = a. id to check whether a record exists.
      ResultSet. add (A [I]);
  }
}
Return resultSet;

When table B is larger than table A, exists () is suitable because it does not perform the traversal operation. You only need to perform another query.
For example, if Table A has 10000 records and table B has 1000000 records, exists () will execute 10000 times to determine whether the IDs in Table A are equal to those in table B.
For example, if Table A has 10000 records and table B has 100000000 records, exists () still executes 10000 times because it only executes. length times. The more data in Table B, the more suitable exists.
For another example, if Table A has 10000 records and table B has 100 records, it is better to use in () to traverse 10000*10000 times if exists () is executed 100 times, because in () is compared in memory traversal, and exists () needs to query the database, we all know that the query database consumes a higher performance, and the memory is relatively fast.

Conclusion: exists () is applicable when table B is larger than table.

When the data in Table A is as big as that in Table B, the in and exists efficiency is similar. You can choose one of them.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.