Hive provides more and more analysis functions for statistical analysis. All the analysis window functions will be processed and released one after another. Today, let's take a look at several basic values: SUM, AVG, MIN, and MAX. P is used to achieve statistics of all and continuous accumulation in the group. PCREATEEXTERNALTABLEyeshuai_test (cookieidstring, crea
Hive provides more and more analysis functions for statistical analysis. All the analysis window functions will be processed and released one after another. Today, let's take a look at several basic values: SUM, AVG, MIN, and MAX. P is used to achieve statistics of all and continuous accumulation in the group. /P create external table yeshuai_test (cookieid string, crea
Hive provides more and more analysis functions for statistical analysis. All the analysis window functions will be processed and released one after another. Today, let's take a look at several basic values: SUM, AVG, MIN, and MAX.Used for statistics on all and continuous accumulation in a group.
CREATE EXTERNAL TABLE yeshuai_test( cookieid string, createtime string, --day pv INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
For testing convenience, you do not need to upload files to hadoop for testing.
Directly go to a local folder to edit data import. Here, my folder path is/home/work/yeshuai/data1.txt.
Use vim to edit and copy the data.
Cookie1 2015-04-10 1
Cookie1 2015-04-11 5
Cookie1 2015-04-12 7
Cookie1 2015-04-13 3
Cookie1 2015-04-14 2
Cookie1 2015-04-15 4
Cookie1 2015-04-16 4
Then hive> select * from yeshuai_test is used for testing. If there is data, the test environment is ready.
SUM-Note: The result is related to order by. The default value is ascending.
SELECT cookieid, createtime, pv, SUM (pv) OVER (partition by cookieid order by createtime) AS pv1, -- the default value is SUM (pv) from the start point to the current row) OVER (partition by cookieid order by createtime rows between unbounded preceding and current row) AS pv2, -- from the starting point to the current row, the result is the same AS pv1 SUM (pv) OVER (partition by cookieid) AS pv3, -- SUM (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and current row) AS pv4, -- current row + SUM (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and 1 FOLLOWING) AS pv5, -- current row + forward 3 ROWS + next 1 row sum (pv) OVER (partition by cookieid order by createtime rows between current row and unbounded following) AS pv6 --- current row + all subsequent rows FROM yeshuai_test; cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 unzip cookie1 2015-04-10 1 1 1 26 1 6 26 cookie1 2015-04-11 5 5 6 6 6 6 13 25 cookie1 2015-04-12 7 13 13 13 16 16 20 cookie1 2015-04-13 3 3 16 16 26 16 18 13 cookie1 2015-04-14 2 18 26 17 21 10 cookie1 2015-04-15 4 22 22 26 16 20 8 cookie1 2015-04-16 4 26 26 26 13 13 4
Pv1: accumulation of PVs from the start point to the current row in the group, for example, pv1 on the 11 th = pv on the 10 th + pv on the 11 th, 12 = 10 + 11 + 12
Pv2: Same as pv1
Pv3: All PVS in the group (cookie1) are accumulated.
Pv4: The current row in the group + the first three rows, for example, 11 = 10 + 11, 12 = 10 + 11 + 12, 13 = 10 + 11 + 12 + 13, 14 = 11 + 12 + 13 + 14
Pv5: The current row in the group + 3 rows forward + 1 row backward, for example, 14 = 11 + 12 + 13 + 14 + 15 = 5 + 7 + 3 + 2 + 4 = 21
Pv6: The current row in the group + all subsequent rows, such, 13 = 13 + 14 + 15 + 16 = 3 + 2 + 4 + 4 = 13, 14 = 14 + 15 + 16 = 2 + 4 + 4 = 10
If rows between is not specified, the default value is from the start point to the current row;
If order by is not specified, all values in the group are accumulated;
The key is to understand the meaning of rows between, also called the WINDOW clause:
PRECEDING: Forward
FOLLOWING: Back
Current row: CURRENT ROW
UNBOUNDED: Start Point. unbounded preceding indicates starting point from the front. unbounded following indicates ending point to the back end point.
-Other AVG, MIN, MAX, and SUM are used in the same way.
- -- AVG
- SELECT cookieid,
- Createtime,
- Pv,
- AVG (pv) OVER (partition by cookieid order by createtime) AS pv1, -- From the start point to the current row BY default
- AVG (pv) OVER (partition by cookieid order by createtime rows between unbounded preceding and current row) AS pv2, -- From the start point to the current row, the result is the same AS pv1
- AVG (pv) OVER (partition by cookieid) AS pv3, -- all rows in the group
- AVG (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and current row) AS pv4, -- current row + forward 3
- AVG (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and 1 FOLLOWING) AS pv5, -- current row + forward 3 ROWS + next 1 row
- AVG (pv) OVER (partition by cookieid order by createtime rows between current row and unbounded following) AS pv6 --- current row + all subsequent ROWS
- FROM lxw1234;
- Cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
- -----------------------------------------------------------------------------
- Cookie1 2015-04-10 1 1.0 1.0 3.7142857142857144 1.0 3.0 3.7142857142857144
- Cookie1 2015-04-11 5 3.0 3.0 3.7142857142857144 3.0 4.333333333333333 4.166666666666667
- Cookie1 2015-04-12 7 4.333333333333333 4.333333333333333 3.7142857142857144 4.333333333333333 4.0 4.0
- Cookie1 3 4.0 4.0 3.7142857142857144 4.0 3.6 3.25
- Cookie1 2015-04-14 2 3.6 3.6 3.7142857142857144 4.25 4.2 3.3333333333333335
- Cookie1 2015-04-15 4 3.6666666666666665 3.6666666666666665 3.7142857142857144 4.0 4.0 4.0
- Cookie1 2015-04-16 4 3.7142857142857144 3.7142857142857144 3.7142857142857144 3.25 3.25 4.0
- -- MIN
- SELECT cookieid,
- Createtime,
- Pv,
- MIN (pv) OVER (partition by cookieid order by createtime) AS pv1, -- From the start point to the current row BY default
- MIN (pv) OVER (partition by cookieid order by createtime rows between unbounded preceding and current row) AS pv2, -- From the start point to the current row, the result is the same AS pv1
- MIN (pv) OVER (partition by cookieid) AS pv3, -- all rows in the group
- MIN (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and current row) AS pv4, -- current row + forward 3
- MIN (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and 1 FOLLOWING) AS pv5, -- current row + forward 3 ROWS + next 1 row
- MIN (pv) OVER (partition by cookieid order by createtime rows between current row and unbounded following) AS pv6 --- current row + all subsequent ROWS
- FROM lxw1234;
- Cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
- -----------------------------------------------------------------------------
- Cookie1 2015-04-10 1 1 1 1 1 1 1
- Cookie1 2015-04-11 5 1 1 1 1 1 2
- Cookie1 2015-04-12 7 1 1 1 1 1 2
- Cookie1 2015-04-13 3 1 1 1 1 1 2
- Cookie1 2015-04-14 2 1 1 1 2 2 2
- Cookie1 2015-04-15 4 1 1 1 2 2 4
- Cookie1 2015-04-16 4 1 1 1 2 2 4
- ---- MAX
- SELECT cookieid,
- Createtime,
- Pv,
- MAX (pv) OVER (partition by cookieid order by createtime) AS pv1, -- From the start point to the current row BY default
- MAX (pv) OVER (partition by cookieid order by createtime rows between unbounded preceding and current row) AS pv2, -- From the start point to the current row, the result is the same AS pv1
- MAX (pv) OVER (partition by cookieid) AS pv3, -- all rows in the group
- MAX (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and current row) AS pv4, -- current row + forward 3
- MAX (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and 1 FOLLOWING) AS pv5, -- current row + forward 3 ROWS + next 1 row
- MAX (pv) OVER (partition by cookieid order by createtime rows between current row and unbounded following) AS pv6 --- current row + all subsequent ROWS
- FROM lxw1234;
- Cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
- -----------------------------------------------------------------------------
- Cookie1 2015-04-10 1 1 1 7 1 5 7
- Cookie1 2015-04-11 5 5 5 7 5 7 7
- Cookie1 2015-04-12 7 7 7 7 7 7 7 7 7
- Cookie1 2015-04-13 3 7 7 7 7 7 7 4
- Cookie1 2015-04-14 2 7 7 7 7 7 7 4
- Cookie1 2015-04-15 4 7 7 7 7 7 7 4
- Cookie1 2015-04-16 4 7 7 7 4 4 4 4