Postgresql data exists in one table, and the other table does not meet the integrity search requirements.

Source: Internet
Author: User

Postgresql data exists in one table, and the other table does not meet the integrity search requirements.
There are two tables T1 and T2. The table structure and data are as follows:

create table t1 (  id int);create table t2 (  id int,  finished int);insert into t1 values (1);insert into t1 values (2);insert into t1 values (3);insert into t1 values (4);insert into t1 values (5);insert into t1 values (6);insert into t2 values (1,1);insert into t2 values (2,0);insert into t2 values (3,1);insert into t2 values (4,0);
SQL Fiddle can test the execution of SQL statements.
If you want to implement existence in T1, T2 does not exist or finished = 0, that is, the query result is: 2, 4, 5, 6.

1. Performance Testing: 1. not in:
explain (analyze,verbose,costs,buffers) select ID from T1 where ID not in (select ID from T2 where finished=1);
Total runtime: 0.128 MS

2. not exists:
explain (analyze,verbose,costs,buffers) select ID from T1 where not exists (select 1 from T2 where T1.ID=T2.ID and T2.finished=1);
Total runtime: 0.105 MS

3. left join:
explain (analyze,verbose,costs,buffers) select T1.ID from T1 left join T2 on T1.ID=T2.ID and T2.finished=1 where T2.ID is null;
Total runtime: 0.096 MS

4. There is also a faster method on the Internet, but this method is not correct after testing, so we will not discuss it:
Select ID from T2 where (select count (1) from T1 where T1.ID = T2.ID) = 0; the query result of this statement is blank


Therefore, on postgresql 9.3, the statement execution speed is left join> not exists> not in.
When the ID in Table T1 and table T2 is null, the not in statement may be different. Therefore, we recommend that you use not exists instead of not in.



Ii. Performance Testing of large data volumes: when there is a large amount of data, not in has a serious performance degradation problem. below is my test on i5 2.4 GHz MAC pro 13.
Department (T1) contains 59280 data records and 29 characters. dept (T2) contains 23633 data records and 29 characters.

1. explain analyze select department. id from department where department. id not in (select id from dept where finished = true );
Total runtime: 447073.065 MS

2. explain analyze select department. id from department where not exists (select 1 from dept where department. id = dept. id and finished = true );
Total runtime: 325.732 MS

3. explain analyze select department. id from department left join dept on department. id = dept. id and dept. finished = true where dept. id is null;
Total runtime: 319.869 MS


Iii. Conclusion: On Postgresql 9.3:
Not in not only has poor performance, but may also have problems with logic.
Not exists has good performance and is easy to think about.
The performance of left join is the best, but the overall comparison with not exists is not much faster.


The following is an online left join diagram, but the source cannot be found, which helps you understand the left join process:


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.