Hive analysis window function (1) SUM, AVG, MIN, MAX

Source: Internet
Author: User
Hive provides more and more analysis functions for statistical analysis. All the analysis window functions will be processed and released one after another. Today, let's take a look at several basic values: SUM, AVG, MIN, and MAX. P is used to achieve statistics of all and continuous accumulation in the group. PCREATEEXTERNALTABLEyeshuai_test (cookieidstring, crea

Hive provides more and more analysis functions for statistical analysis. All the analysis window functions will be processed and released one after another. Today, let's take a look at several basic values: SUM, AVG, MIN, and MAX. P is used to achieve statistics of all and continuous accumulation in the group. /P create external table yeshuai_test (cookieid string, crea

Hive provides more and more analysis functions for statistical analysis. All the analysis window functions will be processed and released one after another. Today, let's take a look at several basic values: SUM, AVG, MIN, and MAX.

Used for statistics on all and continuous accumulation in a group.

 
CREATE EXTERNAL TABLE yeshuai_test(    cookieid string,    createtime string, --day    pv INT    ) ROW FORMAT DELIMITED    FIELDS TERMINATED BY ','        

For testing convenience, you do not need to upload files to hadoop for testing.

Directly go to a local folder to edit data import. Here, my folder path is/home/work/yeshuai/data1.txt.

Use vim to edit and copy the data.


  • Cookie1 2015-04-10 1
  • Cookie1 2015-04-11 5
  • Cookie1 2015-04-12 7
  • Cookie1 2015-04-13 3
  • Cookie1 2015-04-14 2
  • Cookie1 2015-04-15 4
  • Cookie1 2015-04-16 4


  • Then hive> select * from yeshuai_test is used for testing. If there is data, the test environment is ready.


    SUM-Note: The result is related to order by. The default value is ascending.

    SELECT cookieid, createtime, pv, SUM (pv) OVER (partition by cookieid order by createtime) AS pv1, -- the default value is SUM (pv) from the start point to the current row) OVER (partition by cookieid order by createtime rows between unbounded preceding and current row) AS pv2, -- from the starting point to the current row, the result is the same AS pv1 SUM (pv) OVER (partition by cookieid) AS pv3, -- SUM (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and current row) AS pv4, -- current row + SUM (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and 1 FOLLOWING) AS pv5, -- current row + forward 3 ROWS + next 1 row sum (pv) OVER (partition by cookieid order by createtime rows between current row and unbounded following) AS pv6 --- current row + all subsequent rows FROM yeshuai_test; cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6 unzip cookie1 2015-04-10 1 1 1 26 1 6 26 cookie1 2015-04-11 5 5 6 6 6 6 13 25 cookie1 2015-04-12 7 13 13 13 16 16 20 cookie1 2015-04-13 3 3 16 16 26 16 18 13 cookie1 2015-04-14 2 18 26 17 21 10 cookie1 2015-04-15 4 22 22 26 16 20 8 cookie1 2015-04-16 4 26 26 26 13 13 4


    Pv1: accumulation of PVs from the start point to the current row in the group, for example, pv1 on the 11 th = pv on the 10 th + pv on the 11 th, 12 = 10 + 11 + 12
    Pv2: Same as pv1
    Pv3: All PVS in the group (cookie1) are accumulated.
    Pv4: The current row in the group + the first three rows, for example, 11 = 10 + 11, 12 = 10 + 11 + 12, 13 = 10 + 11 + 12 + 13, 14 = 11 + 12 + 13 + 14
    Pv5: The current row in the group + 3 rows forward + 1 row backward, for example, 14 = 11 + 12 + 13 + 14 + 15 = 5 + 7 + 3 + 2 + 4 = 21
    Pv6: The current row in the group + all subsequent rows, such, 13 = 13 + 14 + 15 + 16 = 3 + 2 + 4 + 4 = 13, 14 = 14 + 15 + 16 = 2 + 4 + 4 = 10

    If rows between is not specified, the default value is from the start point to the current row;
    If order by is not specified, all values in the group are accumulated;
    The key is to understand the meaning of rows between, also called the WINDOW clause:
    PRECEDING: Forward
    FOLLOWING: Back
    Current row: CURRENT ROW
    UNBOUNDED: Start Point. unbounded preceding indicates starting point from the front. unbounded following indicates ending point to the back end point.

    -Other AVG, MIN, MAX, and SUM are used in the same way.

    1. -- AVG
    2. SELECT cookieid,
    3. Createtime,
    4. Pv,
    5. AVG (pv) OVER (partition by cookieid order by createtime) AS pv1, -- From the start point to the current row BY default
    6. AVG (pv) OVER (partition by cookieid order by createtime rows between unbounded preceding and current row) AS pv2, -- From the start point to the current row, the result is the same AS pv1
    7. AVG (pv) OVER (partition by cookieid) AS pv3, -- all rows in the group
    8. AVG (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and current row) AS pv4, -- current row + forward 3
    9. AVG (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and 1 FOLLOWING) AS pv5, -- current row + forward 3 ROWS + next 1 row
    10. AVG (pv) OVER (partition by cookieid order by createtime rows between current row and unbounded following) AS pv6 --- current row + all subsequent ROWS
    11. FROM lxw1234;
    12. Cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
    13. -----------------------------------------------------------------------------
    14. Cookie1 2015-04-10 1 1.0 1.0 3.7142857142857144 1.0 3.0 3.7142857142857144
    15. Cookie1 2015-04-11 5 3.0 3.0 3.7142857142857144 3.0 4.333333333333333 4.166666666666667
    16. Cookie1 2015-04-12 7 4.333333333333333 4.333333333333333 3.7142857142857144 4.333333333333333 4.0 4.0
    17. Cookie1 3 4.0 4.0 3.7142857142857144 4.0 3.6 3.25
    18. Cookie1 2015-04-14 2 3.6 3.6 3.7142857142857144 4.25 4.2 3.3333333333333335
    19. Cookie1 2015-04-15 4 3.6666666666666665 3.6666666666666665 3.7142857142857144 4.0 4.0 4.0
    20. Cookie1 2015-04-16 4 3.7142857142857144 3.7142857142857144 3.7142857142857144 3.25 3.25 4.0

    1. -- MIN
    2. SELECT cookieid,
    3. Createtime,
    4. Pv,
    5. MIN (pv) OVER (partition by cookieid order by createtime) AS pv1, -- From the start point to the current row BY default
    6. MIN (pv) OVER (partition by cookieid order by createtime rows between unbounded preceding and current row) AS pv2, -- From the start point to the current row, the result is the same AS pv1
    7. MIN (pv) OVER (partition by cookieid) AS pv3, -- all rows in the group
    8. MIN (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and current row) AS pv4, -- current row + forward 3
    9. MIN (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and 1 FOLLOWING) AS pv5, -- current row + forward 3 ROWS + next 1 row
    10. MIN (pv) OVER (partition by cookieid order by createtime rows between current row and unbounded following) AS pv6 --- current row + all subsequent ROWS
    11. FROM lxw1234;
    12. Cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
    13. -----------------------------------------------------------------------------
    14. Cookie1 2015-04-10 1 1 1 1 1 1 1
    15. Cookie1 2015-04-11 5 1 1 1 1 1 2
    16. Cookie1 2015-04-12 7 1 1 1 1 1 2
    17. Cookie1 2015-04-13 3 1 1 1 1 1 2
    18. Cookie1 2015-04-14 2 1 1 1 2 2 2
    19. Cookie1 2015-04-15 4 1 1 1 2 2 4
    20. Cookie1 2015-04-16 4 1 1 1 2 2 4

    1. ---- MAX
    2. SELECT cookieid,
    3. Createtime,
    4. Pv,
    5. MAX (pv) OVER (partition by cookieid order by createtime) AS pv1, -- From the start point to the current row BY default
    6. MAX (pv) OVER (partition by cookieid order by createtime rows between unbounded preceding and current row) AS pv2, -- From the start point to the current row, the result is the same AS pv1
    7. MAX (pv) OVER (partition by cookieid) AS pv3, -- all rows in the group
    8. MAX (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and current row) AS pv4, -- current row + forward 3
    9. MAX (pv) OVER (partition by cookieid order by createtime rows between 3 preceding and 1 FOLLOWING) AS pv5, -- current row + forward 3 ROWS + next 1 row
    10. MAX (pv) OVER (partition by cookieid order by createtime rows between current row and unbounded following) AS pv6 --- current row + all subsequent ROWS
    11. FROM lxw1234;
    12. Cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
    13. -----------------------------------------------------------------------------
    14. Cookie1 2015-04-10 1 1 1 7 1 5 7
    15. Cookie1 2015-04-11 5 5 5 7 5 7 7
    16. Cookie1 2015-04-12 7 7 7 7 7 7 7 7 7
    17. Cookie1 2015-04-13 3 7 7 7 7 7 7 4
    18. Cookie1 2015-04-14 2 7 7 7 7 7 7 4
    19. Cookie1 2015-04-15 4 7 7 7 7 7 7 4
    20. Cookie1 2015-04-16 4 7 7 7 4 4 4 4



  • Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.