Questions Guide:
What is the role of 1.NTILE?
2. In descending order of PV, which window function can be used to generate a PV position within a group per day?
What is the role of 3.RANK and Dense_rank?
Next: Hive Analysis window function (i) Sum,avg,min,max
This article describes the previous sequence functions, Ntile,row_number,rank,dense_rank, which explain each of their uses.
Hive version is apache-hive-0.13.1
Note: The sequence function does not support the window clause. (What is a window clause, Hive parse window function (i) Sum,avg,min,max)
Data preparation:
cookie1,2015-04-10,1 cookie1,2015-04-11,5 cookie1,2015-04-12,7 cookie1,2015-04-13,3 cookie1,2015-04-14,2 Co okie1,2015-04-15,4 cookie1,2015-04-16,4 cookie2,2015-04-10,2 cookie2,2015-04-11,3 cookie2,2015-04-12,5 Cook ie2,2015-04-13,6 cookie2,2015-04-14,3 cookie2,2015-04-15,9 cookie2,2015-04-16,7 CREATE EXTERNAL TABLE LxW 1234 (Cookieid string, createtime string,--day PV INT) ROW FORMAT delimited fields TERMINATED by ', ' Stored as textfile location '/tmp/lxw11/'; DESC lxw1234; Cookieid string Createtime string PV INT hive> select * from lxw1234; OK cookie1 2015-04-10 1 cookie1 2015-04-11 5 cookie1 2015-04-12 7 cookie1 2015-04-13 3 cookie1 2015-04-14 2 Cookie1 2015-04-15 4 cookie1 2015-04-16 4 cookie2 2015-04-10 2 cookie2 2015-04-11 3 cookie2 2015-04-12 5 Cookie2 2015-04-13 6 cookie2 2015-04-14 3 cookie2 2015-04-15 9 cookie2 2015-04-16 7
NTILE
NTILE (n), used to slice grouped data into n slices sequentially, returning the current slice value
NTILE does not support rows between, such as NTILE (2) over (PARTITION by Cookieid ORDER by Createtime ROWS between 3 preceding and current ROW)
If the slices are uneven, the distribution of the first slice is increased by default
SELECT Cookieid, createtime, PV, NTILE (2) over (PARTITION by Cookieid ORDER by Createtime) as RN1, --grouping data into 2 slices NTILE (3) over (PARTITION by Cookieid ORDER by Createtime) as RN2,--grouping data into 3 slices NTILE (4) over (ORDER by Createtime) as Rn3--divides all data into 4 pieces from the lxw1234 ORDER by Cookieid,createtime; Cookieid Day PV rn1 rn2 rn3 ------------------------------------------------- cookie1 2015-04-10 1 1 1 1 Cookie1 2015-04-11 5 1 1 1 cookie1 2015-04-12 7 1 1 2 cookie1 2015-04-13 3 1 2 2 cookie1 2015-04-14 2 2 2 3 cookie1 2015-04-15 4 2 3 3 cookie1 2015-04-16 4 2 3 4 cookie2 2015-04-10 2 1 1 1 cookie2 2015-04-11 3 1 1 1 cookie2 2015-04-12 5 1 1 2 cookie2 2015-04-13 6 1 2 2 cookie2 2015-04-14 3 2 2 3 cookie2 2015-0 4-15 9 2 3 4 cookie2 2015-04-16 7 2 3 4
For example, counting the first 1/3 days of a COOKIE,PV number
SELECT Cookieid, createtime, PV, NTILE (3) over (PARTITION by Cookieid ORDER by PV DESC) as RN from lxw1234; --rn = 1 of the record is the result we want Cookieid day PV rn ---------------------------------- cookie1 2015-04-12 7 1 Cookie1 2015-04-11 5 1 cookie1 2015-04-15 4 1 cookie1 2015-04-16 4 2 cookie1 2015-04-13 3 2 cookie1 20 15-04-14 2 3 cookie1 2015-04-10 1 3 cookie2 2015-04-15 9 1 cookie2 2015-04-16 7 1 cookie2 2015-04-13 6 1 cookie2 2015-04-12 5 2 cookie2 2015-04-14 3 2 cookie2 2015-04-11 3 3 cookie2 2015-04-10 2 3
Row_number () – Generates sequences of records within a group, starting with 1, in order
– for example, in descending order of PV, generate a PV position within a group per day
Row_number () has a lot of application scenarios, for example, to get the first record in a group, and to get the first refer in a session.
SELECT Cookieid, createtime, PV, row_number () over (PARTITION by Cookieid ORDER by PV desc) as RN from lxw1234; Cookieid Day PV rn ------------------------------------------- cookie1 2015-04-12 7 1 cookie1 2015-04-11 5 2 cookie1 2015-04-15 4 3 cookie1 2015-04-16 4 4 cookie1 2015-04-13 3 5 cookie1 2015-04-14 2 6 Cook Ie1 2015-04-10 1 7 cookie2 2015-04-15 9 1 cookie2 2015-04-16 7 2 cookie2 2015-04-13 6 3 cookie2 2015-0 4-12 5 4 cookie2 2015-04-14 3 5 cookie2 2015-04-11 3 6 cookie2 2015-04-10 2 7
RANK and Dense_rank
-rank () The rank of the data item in the grouping, the rank equal will leave the vacancy in the position
-dense_rank () The rank of the data item in the grouping, the rank equal will not leave the vacancy in the rank
SELECT Cookieid, createtime, PV, RANK () over (PARTITION by Cookieid ORDER by PV desc) as RN1, dense_ RANK () over (PARTITION by Cookieid ORDER BY PV Desc) as rn2, row_number ()-Over (PARTITION by Cookieid ORDER BY PV Desc) As Rn3 from lxw1234 WHERE cookieid = ' cookie1 '; Cookieid Day PV rn1 rn2 rn3 -------------------------------------------------- cookie1 2015-04-12 7 1 1 1 cookie1 2015-04-11 5 2 2 2 cookie1 2015-04-15 4 3 3 3 cookie1 2015-04-16 4 3 3 4 cookie1 2015-04-13 3 5 4 5 cookie1 2015-04-14 2 6 5 6 cookie1 2015-04-10 1 7 6 7 rn1:15 and 16th tie 3rd, 13th row 5th rn2:15 and 16th 3rd , number 13th, 4th rn3: If they are equal, sort by record value, generate a unique order, and if all record values are equal, they may be randomly sorted.
Hive Analysis window function (ii) Ntile,row_number,rank,dense_rank