MySQL skillfully uses sum,case...when ... Optimizing Statistical queries

Last Update:2017-12-11 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Recently in the company to do projects, involving the development of statistical reports related to the task, due to the relatively large amount of data, the query before writing query 500,000 data about 10 seconds to look like, later through the boss of the guidance of the use of sum,case...when ... Rewriting SQL performance suddenly improved to a second to solve. In order to explain the problem and solve it concisely, I simplify the requirement model.

Now the database has an order table (a simplified intermediate table) with the following table structure:

CREATE TABLE `statistic_order` (  `oid` bigint(20) NOT NULL,  `o_source` varchar(25) DEFAULT NULL COMMENT ‘来源编号‘,  `o_actno` varchar(30) DEFAULT NULL COMMENT ‘活动编号‘,  `o_actname` varchar(100) DEFAULT NULL COMMENT ‘参与活动名称‘,  `o_n_channel` int(2) DEFAULT NULL COMMENT ‘商城平台‘,  `o_clue` varchar(25) DEFAULT NULL COMMENT ‘线索分类‘,  `o_star_level` varchar(25) DEFAULT NULL COMMENT ‘订单星级‘,  `o_saledep` varchar(30) DEFAULT NULL COMMENT ‘营销部‘,  `o_style` varchar(30) DEFAULT NULL COMMENT ‘车型‘,  `o_status` int(2) DEFAULT NULL COMMENT ‘订单状态‘,  `syctime_day` varchar(15) DEFAULT NULL COMMENT ‘按天格式化日期‘,  PRIMARY KEY (`oid`)) ENGINE=InnoDB DEFAULT CHARSET=utf8

The project needs are this:

Counts the number of source numbers per day within a time range, where the source number corresponds to the O_source field in the datasheet and the field value may be cde,sde,pde,cse,sse.

Source classification flows over time

In the beginning, I wrote this sql:

select S.syctime_day,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘CDE‘,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘SDE‘,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘PDE‘,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘CSE‘,  (select count(*) from statistic_order SS where SS.syctime_day = S.syctime_day and SS.o_source = ‘CDE‘) as ‘SSE‘ from statistic_order S where S.syctime_day > ‘2016-05-01‘ and S.syctime_day < ‘2016-08-01‘  GROUP BY S.syctime_day order by S.syctime_day asc;

This method uses a subquery, in the case of no index, 550,000 data execution of this SQL, in the workbench for nearly 10 minutes, and finally reported a connection interruption, through the explain interpreter can see the SQL execution plan is as follows:

Each query has a full table scan, five sub-query dependent subquery Description depends on the external query, this query mechanism is to make an external query, query out the group by the date results, and then the subquery respectively query the corresponding date of the number of CDE,SDE, Its efficiency is conceivable.

After adding indexes on O_source and syctime_day, the efficiency is improved a lot, and the results are queried in about five seconds :

View execution Plan Discovery the number of rows scanned has decreased a lot and no full table scan has been performed:

This is certainly not fast enough, if the amount of data reached millions, the query speed is certainly not tolerated. have been thinking whether there is a way, can directly traverse the query all the results, similar to traversing the list in Java, a condition is counted once, so that a full table scan can query the result set, the result index, the efficiency should be very high. Under the guidance of the boss, using the sum aggregation function, plus case...when...then ... This "strange" use, effectively solves the problem.
The specific SQL is as follows:

 select S.syctime_day,   sum(case when S.o_source = ‘CDE‘ then 1 else 0 end) as ‘CDE‘,   sum(case when S.o_source = ‘SDE‘ then 1 else 0 end) as ‘SDE‘,   sum(case when S.o_source = ‘PDE‘ then 1 else 0 end) as ‘PDE‘,   sum(case when S.o_source = ‘CSE‘ then 1 else 0 end) as ‘CSE‘,   sum(case when S.o_source = ‘SSE‘ then 1 else 0 end) as ‘SSE‘ from statistic_order S where S.syctime_day > ‘2015-05-01‘ and S.syctime_day < ‘2016-08-01‘  GROUP BY S.syctime_day order by S.syctime_day asc;

The use of Case...when...then in MySQL does not explain too much, this SQL is very easy to understand, first a record to traverse, group by on the date classification, sum aggregate function for a date value summed, the focus is on case ... When...then the sum of sums cleverly added the condition, when o_source = ' CDE ', the count is 1, otherwise 0; when o_source= ' SDE ' ...
The execution of this statement takes more than one second, and it is ideal for statistics of such a dimension for more than 500,000 of data.

The execution plan found that, although the number of rows scanned is more, but only one full table scan, and simple query, so the efficiency is naturally higher:

MySQL skillfully uses sum,case...when ... Optimizing Statistical Queries

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

MySQL skillfully uses sum,case...when ... Optimizing Statistical queries

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

MySQL skillfully uses sum,case...when ... Optimizing Statistical queries

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support