An example of PostgreSQL dataset reduction optimization process

Source: Internet
Author: User

Mr. Huang took an SQL statement and asked me if I could optimize it. After reading it, the data volume is not large and the query is not complex. The record optimization process is as follows:

DB: PostgreSQL 9.1
OS: CentOS 6
Count (d_personal_report_view )~ 9 K entries. The field type involved in the amount is numeric (9, 2 ).

Original SQL:

select distinct c.doctor_name,c.department_name,c.hospital_id,c.hospital_name,c.part_hospital_name,t.* from d_personal_report_view c,(       select dc.part_hospital_id,dc.department_id,dc.doctor_id,     coalesce(sum(coalesce(dc.surgery_amount, '0.0')), '0.0') as surgery_amount,sum(coalesce(dc.surgery_quantity, 0)) as surgery_quantity,     coalesce(sum(coalesce(dc.repair_amount, '0.0')), '0.0') as repair_amount,sum(coalesce(dc.repair_quantity, 0)) as repair_quantity,       coalesce(sum(coalesce(dc.orthod_amount, '0.0')), '0.0') as orthod_amount,sum(coalesce(dc.orthod_quantity, 0)) as orthod_quantity,       coalesce(sum(coalesce(dc.radiation_amount, '0.0')), '0.0') as radiation_amount,sum(coalesce(dc.radiation_quantity, 0)) as radiation_quantity,       coalesce(sum(coalesce(dc.teethcln_amount, '0.0')), '0.0') as teethcln_amount,sum(coalesce(dc.teethcln_quantity, 0)) as teethcln_quantity,       coalesce(sum(coalesce(dc.crop_amount, '0.0')), '0.0') as crop_amount,sum(coalesce(dc.crop_quantity, 0)) as crop_quantity,       coalesce(sum(coalesce(dc.assay_amount, '0.0')), '0.0') as assay_amount,sum(coalesce(dc.assay_quantity, 0)) as assay_quantity,       coalesce(sum(coalesce(dc.drugs_amount, '0.0')), '0.0') as drugs_amount,sum(coalesce(dc.drugs_quantity, 0)) as drugs_quantity,       coalesce(sum(coalesce(dc.regist_amount, '0.0')), '0.0') as regist_amount,sum(coalesce(dc.regist_quantity, 0)) as regist_quantity,       coalesce(sum(coalesce(dc.others_amount, '0.0')), '0.0') as others_amount,sum(coalesce(dc.others_quantity, 0)) as others_quantity,       coalesce(sum(surgery_amount + repair_amount + orthod_amount + radiation_amount + teethcln_amount + crop_amount + assay_amount + drugs_amount + regist_amount + others_amount), '0.0') as totalRowAmount,       coalesce(sum(surgery_quantity + repair_quantity + orthod_quantity + radiation_quantity + teethcln_quantity + crop_quantity + assay_quantity + drugs_quantity + regist_quantity + others_quantity), 0) as totalRowQuantity from d_personal_report_view dcwhere 1=1 group by dc.part_hospital_id,dc.department_id,dc.doctor_id order by dc.part_hospital_id,dc.doctor_idlimit 10 offset 0) as t   where c.part_hospital_id=t.part_hospital_id and c.department_id=t.department_id and c.doctor_id=t.doctor_id order by t.part_hospital_id,t.department_id,t.doctor_id;

I. Analysis
The results show that 10 different data records are used for report presentation. The same table is used for two associations without any query conditions, and the index is not very effective. Analyze this SQL statement first.


Ii. Optimization process
1. Cancel the association between the two tables and obtain the optimized SQL statement only once:

select  dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id,       coalesce(sum(coalesce(dc.surgery_amount, '0.0')), '0.0') as surgery_amount,sum(coalesce(dc.surgery_quantity, 0)) as surgery_quantity,       coalesce(sum(coalesce(dc.repair_amount, '0.0')), '0.0') as repair_amount,sum(coalesce(dc.repair_quantity, 0)) as repair_quantity,       coalesce(sum(coalesce(dc.orthod_amount, '0.0')), '0.0') as orthod_amount,sum(coalesce(dc.orthod_quantity, 0)) as orthod_quantity,       coalesce(sum(coalesce(dc.radiation_amount, '0.0')), '0.0') as radiation_amount,sum(coalesce(dc.radiation_quantity, 0)) as radiation_quantity,       coalesce(sum(coalesce(dc.teethcln_amount, '0.0')), '0.0') as teethcln_amount,sum(coalesce(dc.teethcln_quantity, 0)) as teethcln_quantity,       coalesce(sum(coalesce(dc.crop_amount, '0.0')), '0.0') as crop_amount,sum(coalesce(dc.crop_quantity, 0)) as crop_quantity,       coalesce(sum(coalesce(dc.assay_amount, '0.0')), '0.0') as assay_amount,sum(coalesce(dc.assay_quantity, 0)) as assay_quantity,       coalesce(sum(coalesce(dc.drugs_amount, '0.0')), '0.0') as drugs_amount,sum(coalesce(dc.drugs_quantity, 0)) as drugs_quantity,       coalesce(sum(coalesce(dc.regist_amount, '0.0')), '0.0') as regist_amount,sum(coalesce(dc.regist_quantity, 0)) as regist_quantity,       coalesce(sum(coalesce(dc.others_amount, '0.0')), '0.0') as others_amount,sum(coalesce(dc.others_quantity, 0)) as others_quantity,       coalesce(sum(surgery_amount + repair_amount + orthod_amount + radiation_amount + teethcln_amount + crop_amount + assay_amount + drugs_amount + regist_amount + others_amount), '0.0') as totalRowAmount,       coalesce(sum(surgery_quantity + repair_quantity + orthod_quantity + radiation_quantity + teethcln_quantity + crop_quantity + assay_quantity + drugs_quantity + regist_quantity + others_quantity), 0) as totalRowQuantity from d_personal_report_view dcwhere 1=1 group by dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id order by dc.part_hospital_id,dc.doctor_idlimit 10 offset 0

View the execution plan:


2. the cost in the execution plan is still a little high. When sum in postgresql is optimized again, the null value is automatically replaced with 0, and the null values such as ''or'' are not saved, therefore, you can remove the coalesce conversion function.
Optimized SQL:

select  dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id,sum(dc.surgery_amount) as surgery_amount,sum(dc.surgery_quantity) as surgery_quantity,sum(dc.repair_amount) as repair_amount,sum(dc.repair_quantity) as repair_quantity,sum(dc.orthod_amount) as orthod_amount,sum(dc.orthod_quantity) as orthod_quantity,sum(dc.radiation_amount) as radiation_amount,sum(dc.radiation_quantity) as radiation_quantity,sum(dc.teethcln_amount) as teethcln_amount,sum(dc.teethcln_quantity) as teethcln_quantity,sum(dc.crop_amount) as crop_amount,sum(dc.crop_quantity) as crop_quantity,sum(dc.assay_amount) as assay_amount,sum(dc.assay_quantity) as assay_quantity,sum(dc.drugs_amount) as drugs_amount,sum(dc.drugs_quantity) as drugs_quantity,sum(dc.regist_amount) as regist_amount,sum(dc.regist_quantity) as regist_quantity,sum(dc.others_amount) as others_amount,sum(dc.others_quantity) as others_quantity,sum(surgery_amount + repair_amount + orthod_amount + radiation_amount + teethcln_amount + crop_amount + assay_amount + drugs_amount + regist_amount + others_amount) as totalRowAmount,sum(surgery_quantity + repair_quantity + orthod_quantity + radiation_quantity + teethcln_quantity + crop_quantity + assay_quantity + drugs_quantity + regist_quantity + others_quantity) as totalRowQuantity from d_personal_report_view dcwhere 1=1 group by dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id order by dc.part_hospital_id,dc.doctor_idlimit 10 offset 0

View the execution plan:

The results are the same. It also shows that PostgreSQL's Internal Conversion Function consumes a lot.

3. to narrow down the query base and improve the query performance, all the query statements before the full table scan the basic table and sort and group by rules, and finally obtain 10 results, in fact, a certain amount of data can be obtained from another perspective, without the need for a full table, but it can generally meet the different data results, and then take 10 more.

Optimized SQL

select dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id,sum(dc.surgery_amount) as surgery_amount,sum(dc.surgery_quantity) as surgery_quantity,sum(dc.repair_amount) as repair_amount,sum(dc.repair_quantity) as repair_quantity,sum(dc.orthod_amount) as orthod_amount,sum(dc.orthod_quantity) as orthod_quantity,sum(dc.radiation_amount) as radiation_amount,sum(dc.radiation_quantity) as radiation_quantity,sum(dc.teethcln_amount) as teethcln_amount,sum(dc.teethcln_quantity) as teethcln_quantity,sum(dc.crop_amount) as crop_amount,sum(dc.crop_quantity) as crop_quantity,sum(dc.assay_amount) as assay_amount,sum(dc.assay_quantity) as assay_quantity,sum(dc.drugs_amount) as drugs_amount,sum(dc.drugs_quantity) as drugs_quantity,sum(dc.regist_amount) as regist_amount,sum(dc.regist_quantity) as regist_quantity,sum(dc.others_amount) as others_amount,sum(dc.others_quantity) as others_quantity,sum(surgery_amount + repair_amount + orthod_amount + radiation_amount + teethcln_amount + crop_amount + assay_amount + drugs_amount + regist_amount + others_amount) as totalRowAmount,sum(surgery_quantity + repair_quantity + orthod_quantity + radiation_quantity + teethcln_quantity + crop_quantity + assay_quantity + drugs_quantity + regist_quantity + others_quantity) as totalRowQuantity from (select * from d_personal_report_view limit 1000) as dcgroup by dc.doctor_name,dc.department_name,dc.hospital_name,dc.part_hospital_name,dc.part_hospital_id,dc.department_id,dc.doctor_id order by dc.part_hospital_id,dc.doctor_idlimit 10 offset 0

Now let's take a look at the execution plan: the effect has been significantly improved, especially when the basic table expands rapidly, the response time is much faster.

This method is also applicable to other databases, but must match the business results.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.