Oracle in and exists Selection

Last Update:2014-06-15 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The business problem can be roughly described as follows: A parent table and a sub-table, the query result is to find the records in the sub-table that do not use the parent table id. In this case, it is estimated that many systems will be involved. Let's take an example:

Table 1: parent table parent

1. ID varchar2 (10) primary key

2. NAME VARCHAR2 (100) NAME

Table 2: child table childen

1. ID VARCHAR2 (10) primary key

2. PID VARCHAR2 (10) primary table Representation

3. NAME VARCHAR2 (100) NAME

The parent table stores the parent table, the child table stores the child table, and then uses the pid to associate with the parent table. The result of the query is to find the father who does not have the child.

Let's take a look at the syntax of the query statement:

Select * from parent where id not in (select pid from childen)

When the sub-table contains 0.5 million records, the query time exceeds 10 seconds, which is far more than one second on the original SQL server. I came up with a solution:

Select * from parent where id in

(Select id from parent minus select pid from childen)

Under normal understanding, this statement should be more time-consuming, but the fact is totally unexpected. This statement is not only fast when the sub-table has a large number of records, in the case of a small amount of data in a sub-table, the speed is also very good, basically completed within 1 second.

Select * from
Where id in (select id from B)

The preceding query uses the in statement. in () is executed only once. It finds all id fields in Table B and caches them. then, check whether the IDs of Table A are equal to those of Table B. If they are equal, add the records of Table A to the result set until all the records of Table A are traversed.
Its query process is similar to the following process:

List resultSet = [];
Array A = (select * from );
Array B = (select id from B );

For (int I = 0; I <A. length; I ++ ){
  For (int j = 0; j <B. length; j ++ ){
     If (A [I]. id = B [j]. id ){
        ResultSet. add (A [I]);
        Break;
     }
  }
}
Return resultSet;

It can be seen that in () is not suitable for large data volumes in Table B because it traverses all data in Table B once.
For example, if Table A has 10000 records and table B has 1000000 records, it is possible to traverse A maximum of 10000*1000000 times, resulting in poor efficiency.
For example, if Table A has 10000 records and table B has 100 records, it is possible to traverse A maximum of 10000*100 times, greatly reducing the number of traversal times and greatly improving the efficiency.

Conclusion: in () is suitable for the case where Table B has less data than table.

Select a. * from A
Where exists (select 1 from B B where a. id = B. id)

The above query uses the exists statement, and exists () will execute. length, it does not cache exists () result set, because exists () result set content is not important, it is important to check whether there is a record in the result set, if yes, return true, if not, false is returned.
Its query process is similar to the following process:

List resultSet = [];
Array A = (select * from)

For (int I = 0; I <A. length; I ++ ){
  If (exists (A [I]. id ){  // Execute select 1 from B B where B. id = a. id to check whether a record exists.
      ResultSet. add (A [I]);
  }
}
Return resultSet;

When table B is larger than table A, exists () is suitable because it does not perform the traversal operation. You only need to perform another query.
For example, if Table A has 10000 records and table B has 1000000 records, exists () will execute 10000 times to determine whether the IDs in Table A are equal to those in table B.
For example, if Table A has 10000 records and table B has 100000000 records, exists () still executes 10000 times because it only executes. length times. The more data in Table B, the more suitable exists.
For another example, if Table A has 10000 records and table B has 100 records, it is better to use in () to traverse 10000*10000 times if exists () is executed 100 times, because in () is compared in memory traversal, and exists () needs to query the database, we all know that the query database consumes a higher performance, and the memory is relatively fast.

Conclusion: exists () is applicable when table B is larger than table.

When the data in Table A is as big as that in Table B, the in and exists efficiency is similar. You can choose one of them.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Oracle in and exists Selection

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Oracle in and exists Selection

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support