There are two tables t1,t2, the table structure and the data are as follows:
CREATE TABLE T1 ( ID int), CREATE TABLE T2 ( ID int, finished int), insert into T1 values (1); INSERT into T1 val UEs (2); INSERT into T1 values (3), insert into T1 values (4), insert into T1 values (5), insert into T1 values (6); Insert int o T2 values, insert into T2 values (2,0), insert into T2 values (3,1), insert into T2 values (4,0);
SQL fiddle can test the execution of SQL statements.
Want to implement T1 in the premise, T2 does not exist or finished=0, that is, the query results are: 2, 4, 5, 6.
First, performance testing:
1. Not in:
Explain (analyze,verbose,costs,buffers) Select ID from T1 where ID not in (select ID from T2 where finished=1);
Total runtime:0.128 ms
2. Not exists:
Explain (analyze,verbose,costs,buffers) select ID from T1 where isn't exists (select 1 from T2 where T1.id=t2.id and T2.fini shed=1);
Total runtime:0.105 ms
3. Left join:
Explain (analyze,verbose,costs,buffers) select T1.id from T1 left joins T2 on T1.id=t2.id and t2.finished=1 where T2.id is Null
Total runtime:0.096 ms
4. The web also saw a faster method, but this method is not tested, so it is not discussed:
Select ID from T2 where (select COUNT (1) from T1 where t1.id=t2.id) = 0; The query result for this statement is empty
Sostatement execution speed on PostgreSQL 9.3 left join > NOT exists > not in
When NULL is present in the T1 and T2 tables, the not in statement behaves differently, so it is recommended that the not exists be used instead of not.
Second, the big Data volume performance test: In a large amount of data, not in has a serious performance degradation problem, below is my i5 2.4GHz MAC Pro 13-inch test.
Department (T1) is 59,280 data, data length 29 characters, Dept (T2) is 23,633 data, data length is 29 characters.
1. Explain analyze select Department.id from department where department.id not in (SELECT ID from dept where finished=tru e);
Total runtime:447073.065 ms
2. Explain analyze select Department.id from department where isn't exists (select 1 from dept where Department.id=dept.id a nd finished=true);
Total runtime:325.732 ms
3. Explain analyze select Department.id from department left joins dept on Department.id=dept.id and Dept.finished=true whe Re dept.id is null;
Total runtime:319.869 ms
Iii. Summary: on PostgreSQL 9.3:
Not is not only poor performance, but logic may be problematic.
Not exists performance is good, it is easier to think.
Left JOIN performance is the best, but the overall and not exists is not much faster, thinking a little bit around.
The following is a graph of the left join on the web, but the source is not found, which helps to understand the process of the last join:
Postgresql data exists in one table, and another table does not satisfy the integrity of the lookup