Hive Analysis window function (ii) Ntile,row_number,rank,dense_rank

Source: Internet
Author: User


Questions Guide:
What is the role of 1.NTILE?
2. In descending order of PV, which window function can be used to generate a PV position within a group per day?
What is the role of 3.RANK and Dense_rank?

Next: Hive Analysis window function (i) Sum,avg,min,max


This article describes the previous sequence functions, Ntile,row_number,rank,dense_rank, which explain each of their uses.
Hive version is apache-hive-0.13.1

Note: The sequence function does not support the window clause.
(What is a window clause, Hive parse window function (i) Sum,avg,min,max)

Data preparation:


cookie1,2015-04-10,1 cookie1,2015-04-11,5 cookie1,2015-04-12,7 cookie1,2015-04-13,3 cookie1,2015-04-14,2 Co okie1,2015-04-15,4 cookie1,2015-04-16,4 cookie2,2015-04-10,2 cookie2,2015-04-11,3 cookie2,2015-04-12,5 Cook ie2,2015-04-13,6 cookie2,2015-04-14,3 cookie2,2015-04-15,9 cookie2,2015-04-16,7 CREATE EXTERNAL TABLE LxW    1234 (Cookieid string, createtime string,--day PV INT) ROW FORMAT delimited fields TERMINATED by ', '         Stored as textfile location '/tmp/lxw11/';    DESC lxw1234;    Cookieid string Createtime string PV INT hive> select * from lxw1234;     OK cookie1 2015-04-10 1 cookie1 2015-04-11 5 cookie1 2015-04-12 7 cookie1 2015-04-13 3 cookie1 2015-04-14 2    Cookie1 2015-04-15 4 cookie1 2015-04-16 4 cookie2 2015-04-10 2 cookie2 2015-04-11 3 cookie2 2015-04-12 5 Cookie2 2015-04-13 6 cookie2 2015-04-14 3 cookie2 2015-04-15 9 cookie2 2015-04-16 7


NTILE
NTILE (n), used to slice grouped data into n slices sequentially, returning the current slice value
NTILE does not support rows between, such as NTILE (2) over (PARTITION by Cookieid ORDER by Createtime ROWS between 3 preceding and current ROW)
If the slices are uneven, the distribution of the first slice is increased by default


SELECT    Cookieid,    createtime,    PV,    NTILE (2) over (PARTITION by Cookieid ORDER by Createtime) as RN1,        --grouping data into 2 slices    NTILE (3) over (PARTITION by Cookieid ORDER by Createtime) as RN2,--grouping data into 3 slices    NTILE (4) over (ORDER by Createtime) as Rn3--divides all data into 4 pieces from the    lxw1234    ORDER by Cookieid,createtime;         Cookieid Day PV rn1 rn2 rn3    -------------------------------------------------    cookie1 2015-04-10 1 1 1 1    Cookie1 2015-04-11 5 1 1 1    cookie1 2015-04-12 7 1 1 2    cookie1 2015-04-13 3 1 2 2    cookie1 2015-04-14 2 2 2 3    cookie1 2015-04-15 4 2 3 3    cookie1 2015-04-16 4 2 3 4    cookie2 2015-04-10 2 1 1 1    cookie2 2015-04-11 3 1 1 1    cookie2 2015-04-12 5 1 1 2    cookie2 2015-04-13 6 1 2 2    cookie2 2015-04-14 3 2 2 3    cookie2 2015-0 4-15 9 2 3 4    cookie2 2015-04-16 7 2 3 4


For example, counting the first 1/3 days of a COOKIE,PV number

SELECT    Cookieid,    createtime,    PV,    NTILE (3) over (PARTITION by Cookieid ORDER by PV DESC) as RN    from lxw1234;         --rn = 1 of the record is the result we want         Cookieid day PV rn    ----------------------------------    cookie1 2015-04-12 7 1    Cookie1 2015-04-11 5 1    cookie1 2015-04-15 4 1    cookie1 2015-04-16 4 2    cookie1 2015-04-13 3 2    cookie1 20  15-04-14 2 3    cookie1 2015-04-10 1 3    cookie2 2015-04-15 9 1    cookie2 2015-04-16 7 1    cookie2 2015-04-13 6 1    cookie2 2015-04-12 5 2    cookie2 2015-04-14 3 2    cookie2 2015-04-11 3 3    cookie2 2015-04-10 2 3


Row_number () – Generates sequences of records within a group, starting with 1, in order
– for example, in descending order of PV, generate a PV position within a group per day
Row_number () has a lot of application scenarios, for example, to get the first record in a group, and to get the first refer in a session.

SELECT    Cookieid,    createtime,    PV,    row_number () over (PARTITION by Cookieid ORDER by PV desc) as RN    from lxw1234;         Cookieid Day PV rn    -------------------------------------------    cookie1 2015-04-12 7 1    cookie1 2015-04-11 5 2    cookie1 2015-04-15 4 3    cookie1 2015-04-16 4 4    cookie1 2015-04-13 3 5    cookie1 2015-04-14 2 6    Cook Ie1 2015-04-10 1 7    cookie2 2015-04-15 9 1    cookie2 2015-04-16 7 2    cookie2 2015-04-13 6 3    cookie2 2015-0 4-12 5 4    cookie2 2015-04-14 3 5    cookie2 2015-04-11 3 6    cookie2 2015-04-10 2 7

RANK and Dense_rank
-rank () The rank of the data item in the grouping, the rank equal will leave the vacancy in the position
-dense_rank () The rank of the data item in the grouping, the rank equal will not leave the vacancy in the rank

  SELECT    Cookieid,    createtime,    PV,    RANK () over (PARTITION by Cookieid ORDER by PV desc) as RN1,    dense_ RANK () over (PARTITION by Cookieid ORDER BY PV Desc) as rn2,    row_number ()-Over (PARTITION by Cookieid ORDER BY PV Desc) As Rn3 from    lxw1234    WHERE cookieid = ' cookie1 ';         Cookieid Day PV rn1 rn2 rn3    --------------------------------------------------    cookie1 2015-04-12 7 1 1 1    cookie1 2015-04-11 5 2 2 2    cookie1 2015-04-15 4 3 3 3    cookie1 2015-04-16 4 3 3 4    cookie1 2015-04-13 3 5 4 5    cookie1 2015-04-14 2 6 5 6    cookie1 2015-04-10 1 7 6 7         rn1:15 and 16th tie 3rd, 13th row 5th    rn2:15 and 16th 3rd , number 13th, 4th    rn3: If they are equal, sort by record value, generate a unique order, and if all record values are equal, they may be randomly sorted.




Hive Analysis window function (ii) Ntile,row_number,rank,dense_rank

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.