標籤:postgresql 效能 sql
有兩張表T1,T2,表結構和資料如下:
create table t1 ( id int);create table t2 ( id int, finished int);insert into t1 values (1);insert into t1 values (2);insert into t1 values (3);insert into t1 values (4);insert into t1 values (5);insert into t1 values (6);insert into t2 values (1,1);insert into t2 values (2,0);insert into t2 values (3,1);insert into t2 values (4,0);
SQL Fiddle 可以測試SQL語句的執行。
想要實現T1中存在前提下,T2中不存在或者finished=0,也就是查詢結果是:2, 4, 5, 6.
一、效能測試:
1. not in:
explain (analyze,verbose,costs,buffers) select ID from T1 where ID not in (select ID from T2 where finished=1);
Total runtime: 0.128 ms
2. not exists:
explain (analyze,verbose,costs,buffers) select ID from T1 where not exists (select 1 from T2 where T1.ID=T2.ID and T2.finished=1);
Total runtime: 0.105 ms
3. left join:
explain (analyze,verbose,costs,buffers) select T1.ID from T1 left join T2 on T1.ID=T2.ID and T2.finished=1 where T2.ID is null;
Total runtime: 0.096 ms
4. 網上還看到一種更快方法,但測試下來此方法不對,所以不討論:
select ID from T2 where (select count(1) from T1 where T1.ID=T2.ID) = 0; 這條語句查詢結果為空白
因此在postgresql 9.3 上語句執行速度 left join > not exists > not in
當T1和T2表中ID出現null時,not in 語句會有不同的表現,所以推薦總是用not exists 代替 not in.
二、大資料量效能測試:在大量資料的時候,not in有嚴重性能下降的問題,下面是我在i5 2.4GHz MAC pro 13吋上的測試。
department(T1) 為59280條資料,資料長度29字元;dept(T2) 為23633條資料,資料長度29字元。
1. explain analyze select department.id from department where department.id not in (select id from dept where finished=true);
Total runtime: 447073.065 ms
2. explain analyze select department.id from department where not exists (select 1 from dept where department.id=dept.id and finished=true);
Total runtime: 325.732 ms
3. explain analyze select department.id from department left join dept on department.id=dept.id and dept.finished=true where dept.id is null;
Total runtime: 319.869 ms
三、總結:在Postgresql 9.3上:
not in 不僅效能差,而且邏輯可能有問題。
not exists 效能不錯,思考起來比較容易。
left join 效能最好,但總體跟not exists 比也快不了多少,思考稍微有點繞。
下面是一張網上的left join 的圖,但找不到出處,有助於理解 left join 的過程:
Postgresql 資料在一張表中存在,另一張表不滿足完整性的尋找