Problem solving: Slow double count(distinct) in Hive

Source: Internet
Author: User

The double count(distinct) here refers to a statement like the following:


select day, count(distinct session_id), count(distinct user_id) from log a group by day;

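To execute such a statement, you must first set this parameter: set hive.groupby.skewindata=true;
(Some Hive versions refuse DISTINCT on different columns in one query while this parameter is true; if the statement is rejected for that reason, set it to false.)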


We can solve the problem by trading space for time:

select day,
	count(case when type = 'session' then 1 else null end) as session_cnt,
	count(case when type = 'user' then 1 else null end) as user_cnt
from (
	-- deduplicate each (day, id, type) combination
	select day, session_id, type from (
		-- tag every id with the kind of value it holds
		select day, session_id, 'session' as type from log
		union all
		-- aliased so both branches return the same column names
		select day, user_id as session_id, 'user' as type from log
	) t0
	group by day, session_id, type
) t1
group by day;
 

The type field here is entirely user-defined. Its purpose is to spend extra space so that the "look up the value", "deduplicate", and "add 1" operations are spread across different MapReduce tasks, which speeds up the query.
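As a quick sanity check that the rewrite returns the same counts as the direct statement, the sketch below runs both against an in-memory SQLite database from Python. SQLite stands in for Hive purely for illustration: the sample rows and the t0 alias are assumptions, and the check says nothing about MapReduce performance.

```python
import sqlite3

# Hypothetical sample rows for a table named "log" (day, session_id, user_id).
ROWS = [
    ("2024-01-01", "s1", "u1"),
    ("2024-01-01", "s2", "u1"),
    ("2024-01-01", "s2", "u2"),
    ("2024-01-02", "s3", "u1"),
    ("2024-01-02", "s3", "u1"),
]

# The direct statement with two count(distinct) expressions.
DIRECT = """
select day, count(distinct session_id), count(distinct user_id)
from log group by day order by day
"""

# The rewrite: tag each id with a type, deduplicate, then count per type.
REWRITTEN = """
select day,
       count(case when type = 'session' then 1 else null end) as session_cnt,
       count(case when type = 'user' then 1 else null end) as user_cnt
from (
    select day, session_id, type from (
        select day, session_id, 'session' as type from log
        union all
        select day, user_id as session_id, 'user' as type from log
    ) t0
    group by day, session_id, type
) t1
group by day order by day
"""


def run_both():
    """Load the sample rows and return the results of both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table log (day text, session_id text, user_id text)")
    conn.executemany("insert into log values (?, ?, ?)", ROWS)
    direct = conn.execute(DIRECT).fetchall()
    rewritten = conn.execute(REWRITTEN).fetchall()
    conn.close()
    return direct, rewritten


direct, rewritten = run_both()
print(direct)
print(rewritten)
```

One difference to keep in mind: count(distinct) ignores NULL values, while the rewrite counts a NULL id as one group, so rows with NULL ids would need to be filtered out in the inner selects for the two forms to agree.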

Note that the number of distinct values of type must match the number of count(distinct) expressions in the original statement. Here there are two, one for session_id and one for user_id.
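To illustrate how the type values line up with the count(distinct) expressions, here is a minimal sketch of a hypothetical Python helper, build_rewrite, that generates the rewritten statement for any list of columns. The helper name, the val and t0 aliases, and the choice to reuse each column name as its type tag are all assumptions for illustration. It also assumes the columns have compatible types, since they share one union all column.

```python
def build_rewrite(table, group_col, distinct_cols):
    """Build the union-all rewrite for any number of count(distinct) columns."""
    # One union-all branch per column; the column name doubles as its type tag.
    branches = "\n        union all\n".join(
        f"        select {group_col}, {col} as val, '{col}' as type from {table}"
        for col in distinct_cols
    )
    # One count(case ...) per type value, matching the branches above.
    counts = ",\n".join(
        f"       count(case when type = '{col}' then 1 else null end) as {col}_cnt"
        for col in distinct_cols
    )
    return (
        f"select {group_col},\n{counts}\n"
        f"from (\n"
        f"    select {group_col}, val, type from (\n{branches}\n    ) t0\n"
        f"    group by {group_col}, val, type\n"
        f") t1\n"
        f"group by {group_col}"
    )


# Two columns give two branches and two counts, as in the statement above.
print(build_rewrite("log", "day", ["session_id", "user_id"]))
```

Adding a third column to the list adds exactly one more union all branch and one more count(case ...) expression.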
