MySQL skillfully uses sum,case...when ... Optimizing Statistical queries

Source: Internet
Author: User

Recently in the company to do projects, involving the development of statistical reports related to the task, due to the relatively large amount of data, the query before writing query 500,000 data about 10 seconds to look like, later through the boss of the guidance of the use of sum,case...when ... Rewriting SQL performance suddenly improved to a second to solve. In order to explain the problem and solve it concisely, I simplify the requirement model.

Now the database has an order table (a simplified intermediate table) with the following table structure:

CREATE TABLE `statistic_order` (  `oid` bigint(20) NOT NULL,  `o_source` varchar(25) DEFAULT NULL COMMENT ‘来源编号‘,  `o_actno` varchar(30) DEFAULT NULL COMMENT ‘活动编号‘,  `o_actname` varchar(100) DEFAULT NULL COMMENT ‘参与活动名称‘,  `o_n_channel` int(2) DEFAULT NULL COMMENT ‘商城平台‘,  `o_clue` varchar(25) DEFAULT NULL COMMENT ‘线索分类‘,  `o_star_level` varchar(25) DEFAULT NULL COMMENT ‘订单星级‘,  `o_saledep` varchar(30) DEFAULT NULL COMMENT ‘营销部‘,  `o_style` varchar(30) DEFAULT NULL COMMENT ‘车型‘,  `o_status` int(2) DEFAULT NULL COMMENT ‘订单状态‘,  `syctime_day` varchar(15) DEFAULT NULL COMMENT ‘按天格式化日期‘,  PRIMARY KEY (`oid`)) ENGINE=InnoDB DEFAULT CHARSET=utf8
The project needs are this:

Counts the number of source numbers per day within a time range, where the source number corresponds to the O_source field in the datasheet and the field value may be cde,sde,pde,cse,sse.

Source classification flows over time

In the beginning, I wrote this sql:

select S.syctime_day,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘CDE‘,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘SDE‘,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘PDE‘,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘CSE‘,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘SSE‘ from statistic_order S where S.syctime_day > ‘2016-05-01‘ and S.syctime_day < ‘2016-08-01‘  GROUP BY S.syctime_day order by S.syctime_day asc;

This method uses a subquery, in the case of no index, 550,000 data execution of this SQL, in the workbench for nearly 10 minutes, and finally reported a connection interruption, through the explain interpreter can see the SQL execution plan is as follows:

Each query has a full table scan, five sub-query dependent subquery Description depends on the external query, this query mechanism is to make an external query, query out the group by the date results, and then the subquery respectively query the corresponding date of the number of CDE,SDE, Its efficiency is conceivable.

After adding indexes on O_source and syctime_day, the efficiency is improved a lot, and the results are queried in about five seconds :


View execution Plan Discovery the number of rows scanned has decreased a lot and no full table scan has been performed:

This is certainly not fast enough, if the amount of data reached millions, the query speed is certainly not tolerated. have been thinking whether there is a way, can directly traverse the query all the results, similar to traversing the list in Java, a condition is counted once, so that a full table scan can query the result set, the result index, the efficiency should be very high. Under the guidance of the boss, using the sum aggregation function, plus case...when...then ... This "strange" use, effectively solves the problem.
The specific SQL is as follows:

 select S.syctime_day,   sum(case when S.o_source = ‘CDE‘ then 1 else 0 end) as ‘CDE‘,   sum(case when S.o_source = ‘SDE‘ then 1 else 0 end) as ‘SDE‘,   sum(case when S.o_source = ‘PDE‘ then 1 else 0 end) as ‘PDE‘,   sum(case when S.o_source = ‘CSE‘ then 1 else 0 end) as ‘CSE‘,   sum(case when S.o_source = ‘SSE‘ then 1 else 0 end) as ‘SSE‘ from statistic_order S where S.syctime_day > ‘2015-05-01‘ and S.syctime_day < ‘2016-08-01‘  GROUP BY S.syctime_day order by S.syctime_day asc;

The use of Case...when...then in MySQL does not explain too much, this SQL is very easy to understand, first a record to traverse, group by on the date classification, sum aggregate function for a date value summed, the focus is on case ... When...then the sum of sums cleverly added the condition, when o_source = ' CDE ', the count is 1, otherwise 0; when o_source= ' SDE ' ...
The execution of this statement takes more than one second, and it is ideal for statistics of such a dimension for more than 500,000 of data.

The execution plan found that, although the number of rows scanned is more, but only one full table scan, and simple query, so the efficiency is naturally higher:

MySQL skillfully uses sum,case...when ... Optimizing Statistical Queries

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.