Hive Analysis window function (v) GROUPING Sets,grouping__id,cube,rollup

Source: Internet
Author: User
Tags null null

What is the equivalent of 1.GROUPING sets in other ways?
2. Which keyword completes the aggregation based on all the combinations of the dimensions of the group by?

What is the relationship between 3.ROLLUP and rollup?


GROUPING Sets,grouping__id,cube,rollup These analytic functions are commonly used in OLAP, not additive, and need to be based on the different dimensions of drilling and drill down metrics statistics, such as the number of hours, days, months of UV. hive version is apache-hive-0.13.1Data preparation:
    2015-03,2015-03-10,cookie1 2015-03,2015-03-10,cookie5 2015-03,2015-03-12,cookie7 2015-04,2015-04-12,cookie3    2015-04,2015-04-13,cookie2 2015-04,2015-04-13,cookie4 2015-04,2015-04-16,cookie4 2015-03,2015-03-10,cookie2 2015-03,2015-03-10,cookie3 2015-04,2015-04-12,cookie5 2015-04,2015-04-13,cookie6 2015-04,2015-04-15,cookie3 20    15-04,2015-04-15,cookie2 2015-04,2015-04-16,cookie1 CREATE EXTERNAL TABLE lxw1234 (Month string, day string,    Cookieid STRING) ROW FORMAT delimited fields TERMINATED by ', ' stored as textfile location '/tmp/lxw11/';    Hive> select * from lxw1234; OK 2015-03 2015-03-10 cookie1 2015-03 2015-03-10 cookie5 2015-03 2015-03-12 cookie7 2015-04 201 5-04-12 cookie3 2015-04 2015-04-13 cookie2 2015-04 2015-04-13 cookie4 2015-04 2015-04-16 Cook   IE4 2015-03 2015-03-10 cookie2 2015-03 2015-03-10 cookie3 2015-04 2015-04-12 cookie5 2015-04 2015-04-13 cookie6 2015-04 2015-04-15 cookie3 2015-04 2015-04-15 cookie2 2015-04 2015-04- Cookie1

GROUPING Sets
In a group by query, aggregations are based on different dimension combinations, which is equivalent to combining the group by result set of different dimensions with the union ALL
    SELECT    Month, day    ,    COUNT (DISTINCT Cookieid) as Uvs,    grouping__id from    lxw1234    GROUP by month, Day    GROUPING Sets (Month,day)    ORDER by grouping__id;    Month      Day            UV      grouping__id    ------------------------------------------------    2015-03    NULL            5       1    2015-04    null            6       1    null       2015-03-10      4       2    Null       2015-03-12      1       2    null       2015-04-12      2       2    null       2015-04-13      3       2    Null       2015-04-15      2       2    null       2015-04-16      2       2    is equivalent to    SELECT month,null, COUNT (DISTINCT Cookieid) as uv,1 as grouping__id from lxw1234 GROUP by month    UNION all    SELECT null,day,count (DIS Tinct Cookieid) as uv,2 as grouping__id from lxw1234 GROUP by day

Again such as:
    SELECT month, day, COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234 GROUP by Month,day G    Rouping Sets (Month,day, (Month,day)) ORDER by grouping__id; Month Day UV grouping__id------------------------------------------------2015-03 NU          LL 5 1 2015-04 null 6 1 NULL 2015-03-10 4 2 NULL 2015-03-12 1 2 null 2015-04-12 2 2 null 2015-04-13 3 2 NU       LL 2015-04-15 2 2 NULL 2015-04-16 2 2 2015-03 2015-03-10 4        3 2015-03 2015-03-12 1 3 2015-04 2015-04-12 2 3 2015-04 2015-04-13 3 3 2015-04 2015-04-15 2 3 2015-04 2015-04-16 2 3 equivalent to SELECT month,null   , COUNT (DISTINCT Cookieid) as uv,1 as grouping__id from lxw1234 GROUP by month UNION ALL SELECT Null,day,count (DISTINCT Cookieid) as uv,2 as grouping__id from lxw1234 GROUP by day UNION all S Elect Month,day,count (DISTINCT Cookieid) as uv,3 as grouping__id from lxw1234 GROUP by Month,day

One of the grouping__id, which indicates which grouping collection the result belongs to.

CUBE
Aggregations are based on all the combinations of the dimensions of the group by.
    SELECT month, day, COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234 GROUP by Month,day W    ITH CUBE ORDER by grouping__id;    Month Day UV grouping__id--------------------------------------------            Null NULL 7 0 2015-03 NULL 5 1 2015-04 NULL 6 1 NULL 2015-04-12 2 2 null 2015-04-13 3 2 NULL 201            5-04-15 2 2 null 2015-04-16 2 2 null 2015-03-10 4 2 NULL       2015-03-12 1 2 2015-03 2015-03-10 4 3 2015-03 2015-03-12 1 3 2015-04 2015-04-16 2 3 2015-04 2015-04-12 2 3 2015-04 2015-04-1 3 3 3 2015-04 2015-04-15 2 3 equivalent to SELECT null,null,count (DISTINCT CookIeid) as uv,0 as grouping__id from lxw1234 UNION all SELECT month,null,count (DISTINCT Cookieid) as uv,1 as Grouping_ _id from lxw1234 GROUP by month UNION all SELECT null,day,count (DISTINCT Cookieid) as uv,2 as grouping__id from LXW1 234 GROUP BY day UNION all SELECT month,day,count (DISTINCT Cookieid) as uv,3 as grouping__id from lxw1234 GROUP by M Onth,day

ROLLUP
is a subset of the cube, dominated by the leftmost dimension, and aggregated hierarchically from that dimension.
    For example, hierarchical aggregation is done in the month dimension: SELECT month, day, COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234    GROUP by Month,day with ROLLUP ORDER by grouping__id; Month Day UV grouping__id------------------------------------------------            ---null NULL 7 0 2015-03 NULL 5 1 2015-04 NULL           6 1 2015-03 2015-03-10 4 3 2015-03 2015-03-12 1 3 2015-04       2015-04-12 2 3 2015-04 2015-04-13 3 3 2015-04 2015-04-15 2 3 2015-04 2015-04-16 2 3 This can be achieved through the drilling process: The month of the uv-> month of the uv-> total UV Copy code-the exchange of month and day in the order, the DA The Y dimension is aggregated hierarchically: SELECT Day, month, COUNT (DISTINCT Cookieid) as Uvs, grouping__id from lxw1234 GROUP by da    Y,month with ROLLUP ORDER by grouping__id;                           Day     Month UV grouping__id-------------------------------------------------------NULL       NULL 7 0 2015-04-13 NULL 3 1 2015-03-12 NULL 1                1 2015-04-15 null 2 1 2015-03-10 NULL 4 1 2015-04-16 NULL    2 1 2015-04-12 NULL 2 1 2015-04-12 2015-04 2 3            2015-03-10 2015-03 4 3 2015-03-12 2015-03 1 3 2015-04-13 2015-04 3 3 2015-04-15 2015-04 2 3 2015-04-16 2015-04 2 3 can To achieve such a drilling process: the days of the uv-> days of the uv-> total UV (here, according to day and month aggregation, and according to day aggregation results, because there is a parent-child relationship, if it is a combination of other dimensions, it will be different)


Hive Analysis window function (v) GROUPING Sets,grouping__id,cube,rollup

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.