Compute the cumulative count (PV) and the deduplicated count (UV) from a table using group by:
Set parallelism for efficiency: set hive.exec.parallel=true (optional: set hive.exec.parallel.thread.number=16), set hive.groupby.skewindata=true, set hive.map.aggr=true
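Written out as session statements, the settings listed above might look like the sketch below. The comments are a brief summary of what each property is generally understood to do and are not from the original post; the values are the ones given in the text.

```sql
-- Run independent stages of a query in parallel.
set hive.exec.parallel=true;
-- Optional: upper limit on the number of stages run in parallel.
set hive.exec.parallel.thread.number=16;
-- Split a skewed group by into two jobs to spread hot keys across reducers.
set hive.groupby.skewindata=true;
-- Enable map-side partial aggregation (the setting this note later identifies as the problem).
set hive.map.aggr=true;
```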
select Plat, PageType, count(*) PV, count(distinct UserKey) UV from Client_pv_form where Dt = '2015-08-19' group by Plat, PageType
union all
select Plat, 'All' PageType, count(*) PV, count(distinct UserKey) UV from Client_pv_form where Dt = '2015-08-19' group by Plat
union all
select 'All' Plat, PageType, count(*) PV, count(distinct UserKey) UV from Client_pv_form where Dt = '2015-08-19' group by PageType
union all
select 'All' Plat, 'All' PageType, count(*) PV, count(distinct UserKey) UV from Client_pv_form where Dt = '2015-08-19'
The problem lies in set hive.map.aggr=true, the map-side aggregation setting:
the PV counts it produces do not match the true values.
Changing the query as follows makes it run correctly:
select Plat, PageType, sum(1) PV, count(distinct UserKey) UV from Client_pv_form where Dt = '2015-08-19' group by Plat, PageType
union all
select Plat, 'All' PageType, sum(1) PV, count(distinct UserKey) UV from Client_pv_form where Dt = '2015-08-19' group by Plat
union all
select 'All' Plat, PageType, sum(1) PV, count(distinct UserKey) UV from Client_pv_form where Dt = '2015-08-19' group by PageType
union all
select 'All' Plat, 'All' PageType, sum(1) PV, count(distinct UserKey) UV from Client_pv_form where Dt = '2015-08-19'
PV statistics are wrong when hive.map.aggr=true is set.
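One way to cross-check the PV figures is to compute a baseline with map-side aggregation turned off and compare it with the output of the queries above. The following is only a sketch and is not from the original post: the aliases pv_count and pv_sum are hypothetical, and it assumes the same table and partition date. Whether the discrepancy reproduces in this simpler single-select form is not something the original confirms.

```sql
-- Sanity-check sketch (assumption: not part of the original post).
-- Disable map-side aggregation for this session to obtain a baseline.
set hive.map.aggr=false;

-- count(*) and sum(1) should agree for every (Plat, PageType) group;
-- pv_count and pv_sum are illustrative aliases.
select Plat,
       PageType,
       count(*)                as pv_count,
       sum(1)                  as pv_sum,
       count(distinct UserKey) as uv
from Client_pv_form
where Dt = '2015-08-19'
group by Plat, PageType;
```

If pv_count and pv_sum match here but the PV column of the union all query differs once hive.map.aggr=true is restored, that points at the map-side aggregation setting rather than at the data.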