First, the problem description
When the number of records for a table in PostgreSQL recently increased from million to 1 million (design capability is 100 million), the query performance of a multi-table associated view built on top of the table was dramatically slower (from about 10ms to 100s). After analyzing the query plan, it is found that the bottleneck is that the sequencing takes a long time, and what are the incentives for sequencing? After removing a few other factors, it is found that the function is used in the view definition (non-intrinsic function, which is hereby stated). Performance is well improved when you change a custom function to an equivalent subquery or join query.
Because the actual tables, view, functions are too complex to describe and reproduce the problem of the key, the following table, view, functions, etc. are simplified.
Second, the test object
Create three Test tables:
create table t1
(
id int not null primary key,
name varchar(20)
); create table t2
(
id int not null primary key,
name varchar(20)
); create table t3
(
id int not null primary key,
name varchar(20)
);
Recreate the function, the view that contains it, the equivalent view as a comparison:
create or replace function f1(p_id in int)
returns int as $$
declare v_name varchar(40);
begin
select name3 into v_name from t3 where id=p_id;
if v_name is null then return 0; else return length(v_name); end if;
end;
$$ language plpgsql;
create view v1 as
select t1.id id, t1.name name1, t2.name name2, f1(t1.id) length
from t1 left join t2 on t2.id=t1.id;
CREATE VIEW v2 as
Select T1.id ID, t1.name1 name1, T2.name2 name2,
Coalesce (Length (T3.name3), 0) length
From T1 left joins T2 on T2.id=t1.id left joins T3 on T3.id=t1.id;
Finally, insert 8,000 records into table T1 with ID 1: The value of the 8000,name does not matter; Copy some of the data separately to table T2, T3:
insert into t2 select * from t1 where t1.id%2=0;
insert into t3 select * from t1 where t1.id%3=0;
III. Inspection of the implementation plan
After all the preparations have been completed, check the execution plan for sorting queries on two view:
Obviously, the sort time is the big head, and the V1 on the sort is almost 3 times times the V2.
Further testing, if added limit ... offset (often used with order by for paging), the gap between the two view is more pronounced:
If you remove the sort operation, the V1 query will take more time than V2, but it is almost an order of magnitude less than before. Confined to space, this place does not map.
The number of records in this test is less than million, you can see the gap between the two views, interested peers can self-verify the table record number in the 100,000 and more of the situation.
Iv. Conclusion
Based on this simple test, combined with the situation in practice (sorry not to give details), the following conclusions are drawn:
- PostgreSQL, if the Multi-table Association view in the package uses a function, as far as possible to modify as a subquery or connection query;
- For the sort operation, the performance of the view with the function is much lower than the equivalent subquery or the view of the connection query, such as the paging operation, the performance gap is greater;
- In practice, when the number of main table records reaches a certain number (the critical value is unknown), the order of view with function can not even use the index to go full table scan, which can lead to a sharp increase in time;
- In previous practice, the differences between the two view types in Oracle are not very obvious.
PostgreSQL performance issues when sorting with multi-table associated views of functions