What is the equivalent of 1.GROUPING sets in other ways?
2. Which keyword completes the aggregation based on all the combinations of the dimensions of the group by?
What is the relationship between 3.ROLLUP and rollup?
GROUPING Sets,grouping__id,cube,rollup These analytic functions are commonly used in OLAP, not additive, and need to be based on the different dimensions of drilling and drill down metrics statistics, such as the number of hours, days, months of UV.
hive version is apache-hive-0.13.1Data preparation:
2015-03,2015-03-10,cookie1 2015-03,2015-03-10,cookie5 2015-03,2015-03-12,cookie7 2015-04,2015-04-12,cookie3 2015-04,2015-04-13,cookie2 2015-04,2015-04-13,cookie4 2015-04,2015-04-16,cookie4 2015-03,2015-03-10,cookie2 2015-03,2015-03-10,cookie3 2015-04,2015-04-12,cookie5 2015-04,2015-04-13,cookie6 2015-04,2015-04-15,cookie3 20 15-04,2015-04-15,cookie2 2015-04,2015-04-16,cookie1 CREATE EXTERNAL TABLE lxw1234 (Month string, day string, Cookieid STRING) ROW FORMAT delimited fields TERMINATED by ', ' stored as textfile location '/tmp/lxw11/'; Hive> select * from lxw1234; OK 2015-03 2015-03-10 cookie1 2015-03 2015-03-10 cookie5 2015-03 2015-03-12 cookie7 2015-04 201 5-04-12 cookie3 2015-04 2015-04-13 cookie2 2015-04 2015-04-13 cookie4 2015-04 2015-04-16 Cook IE4 2015-03 2015-03-10 cookie2 2015-03 2015-03-10 cookie3 2015-04 2015-04-12 cookie5 2015-04 2015-04-13 cookie6 2015-04 2015-04-15 cookie3 2015-04 2015-04-15 cookie2 2015-04 2015-04- Cookie1
GROUPING Sets
In a group by query, aggregations are based on different dimension combinations, which is equivalent to combining the group by result set of different dimensions with the union ALL
SELECT Month, day , COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234 GROUP by month, Day GROUPING Sets (Month,day) ORDER by grouping__id; Month Day UV grouping__id ------------------------------------------------ 2015-03 NULL 5 1 2015-04 null 6 1 null 2015-03-10 4 2 Null 2015-03-12 1 2 null 2015-04-12 2 2 null 2015-04-13 3 2 Null 2015-04-15 2 2 null 2015-04-16 2 2 is equivalent to SELECT month,null, COUNT (DISTINCT Cookieid) as uv,1 as grouping__id from lxw1234 GROUP by month UNION all SELECT null,day,count (DIS Tinct Cookieid) as uv,2 as grouping__id from lxw1234 GROUP by day
Again such as:
SELECT month, day, COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234 GROUP by Month,day G Rouping Sets (Month,day, (Month,day)) ORDER by grouping__id; Month Day UV grouping__id------------------------------------------------2015-03 NU LL 5 1 2015-04 null 6 1 NULL 2015-03-10 4 2 NULL 2015-03-12 1 2 null 2015-04-12 2 2 null 2015-04-13 3 2 NU LL 2015-04-15 2 2 NULL 2015-04-16 2 2 2015-03 2015-03-10 4 3 2015-03 2015-03-12 1 3 2015-04 2015-04-12 2 3 2015-04 2015-04-13 3 3 2015-04 2015-04-15 2 3 2015-04 2015-04-16 2 3 equivalent to SELECT month,null , COUNT (DISTINCT Cookieid) as uv,1 as grouping__id from lxw1234 GROUP by month UNION ALL SELECT Null,day,count (DISTINCT Cookieid) as uv,2 as grouping__id from lxw1234 GROUP by day UNION all S Elect Month,day,count (DISTINCT Cookieid) as uv,3 as grouping__id from lxw1234 GROUP by Month,day
One of the grouping__id, which indicates which grouping collection the result belongs to.
CUBE
Aggregations are based on all the combinations of the dimensions of the group by.
SELECT month, day, COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234 GROUP by Month,day W ITH CUBE ORDER by grouping__id; Month Day UV grouping__id-------------------------------------------- Null NULL 7 0 2015-03 NULL 5 1 2015-04 NULL 6 1 NULL 2015-04-12 2 2 null 2015-04-13 3 2 NULL 201 5-04-15 2 2 null 2015-04-16 2 2 null 2015-03-10 4 2 NULL 2015-03-12 1 2 2015-03 2015-03-10 4 3 2015-03 2015-03-12 1 3 2015-04 2015-04-16 2 3 2015-04 2015-04-12 2 3 2015-04 2015-04-1 3 3 3 2015-04 2015-04-15 2 3 equivalent to SELECT null,null,count (DISTINCT CookIeid) as uv,0 as grouping__id from lxw1234 UNION all SELECT month,null,count (DISTINCT Cookieid) as uv,1 as Grouping_ _id from lxw1234 GROUP by month UNION all SELECT null,day,count (DISTINCT Cookieid) as uv,2 as grouping__id from LXW1 234 GROUP BY day UNION all SELECT month,day,count (DISTINCT Cookieid) as uv,3 as grouping__id from lxw1234 GROUP by M Onth,day
ROLLUP
is a subset of the cube, dominated by the leftmost dimension, and aggregated hierarchically from that dimension.
For example, hierarchical aggregation is done in the month dimension: SELECT month, day, COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234 GROUP by Month,day with ROLLUP ORDER by grouping__id; Month Day UV grouping__id------------------------------------------------ ---null NULL 7 0 2015-03 NULL 5 1 2015-04 NULL 6 1 2015-03 2015-03-10 4 3 2015-03 2015-03-12 1 3 2015-04 2015-04-12 2 3 2015-04 2015-04-13 3 3 2015-04 2015-04-15 2 3 2015-04 2015-04-16 2 3 This can be achieved through the drilling process: The month of the uv-> month of the uv-> total UV Copy code-the exchange of month and day in the order, the DA The Y dimension is aggregated hierarchically: SELECT Day, month, COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234 GROUP by da Y,month with ROLLUP ORDER by grouping__id; Day Month UV grouping__id-------------------------------------------------------NULL NULL 7 0 2015-04-13 NULL 3 1 2015-03-12 NULL 1 1 2015-04-15 null 2 1 2015-03-10 NULL 4 1 2015-04-16 NULL 2 1 2015-04-12 NULL 2 1 2015-04-12 2015-04 2 3 2015-03-10 2015-03 4 3 2015-03-12 2015-03 1 3 2015-04-13 2015-04 3 3 2015-04-15 2015-04 2 3 2015-04-16 2015-04 2 3 can To achieve such a drilling process: the days of the uv-> days of the uv-> total UV (here, according to day and month aggregation, and according to day aggregation results, because there is a parent-child relationship, if it is a combination of other dimensions, it will be different)
Hive Analysis window function (v) GROUPING Sets,grouping__id,cube,rollup