Hive Analytic Window Functions (4): LAG, LEAD, FIRST_VALUE, LAST_VALUE

Source: Internet
Author: User
1. What is the LAG function?
2. What is the difference between LEAD and LAG?

3. What functions does FIRST_VALUE and LAST_VALUE provide?


Let's continue with these four analytic functions. Note: these functions do not support the WINDOW clause. Hive version: apache-hive-0.13.1.

Data preparation:

    cookie1,2015-04-10 10:00:02,url2
    cookie1,2015-04-10 10:00:00,url1
    cookie1,2015-04-10 10:03:04,1url3
    cookie1,2015-04-10 10:50:05,url6
    cookie1,2015-04-10 11:00:00,url7
    cookie1,2015-04-10 10:10:00,url4
    cookie1,2015-04-10 10:50:01,url5
    cookie2,2015-04-10 10:00:02,url22
    cookie2,2015-04-10 10:00:00,url11
    cookie2,2015-04-10 10:03:04,1url33
    cookie2,2015-04-10 10:50:05,url66
    cookie2,2015-04-10 11:00:00,url77
    cookie2,2015-04-10 10:10:00,url44
    cookie2,2015-04-10 10:50:01,url55

    CREATE EXTERNAL TABLE lxw1234 (
      cookieid STRING,
      createtime STRING COMMENT 'page access time',
      url STRING COMMENT 'accessed page'
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/tmp/lxw11/';

    hive> select * from lxw1234;
    OK
    cookie1   2015-04-10 10:00:02   url2
    cookie1   2015-04-10 10:00:00   url1
    cookie1   2015-04-10 10:03:04   1url3
    cookie1   2015-04-10 10:50:05   url6
    cookie1   2015-04-10 11:00:00   url7
    cookie1   2015-04-10 10:10:00   url4
    cookie1   2015-04-10 10:50:01   url5
    cookie2   2015-04-10 10:00:02   url22
    cookie2   2015-04-10 10:00:00   url11
    cookie2   2015-04-10 10:03:04   1url33
    cookie2   2015-04-10 10:50:05   url66
    cookie2   2015-04-10 11:00:00   url77
    cookie2   2015-04-10 10:10:00   url44
    cookie2   2015-04-10 10:50:01   url55
LAG

LAG(col, n, DEFAULT) returns the value of col from the n-th row above the current row within the window.
The first parameter is the column name; the second is the offset n (optional, default 1); the third is the default value, returned when there is no n-th row above (if it is omitted, NULL is returned).
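To check the semantics outside Hive, here is a minimal Python sketch that emulates LAG on a single partition. Python is used only so the logic can be run; the helper `lag` is hypothetical, assumes the list is one partition already sorted by the ORDER BY key, and returns the default only when the offset row lies outside the partition (the case exercised by this article's data).

```python
from typing import Any, List, Optional, Sequence


def lag(values: Sequence[Any], n: int = 1, default: Optional[Any] = None) -> List[Any]:
    """Hypothetical emulation of LAG(col, n, default) over one sorted partition.

    Row i receives values[i - n]; rows with fewer than n rows above them
    receive `default` (None plays the role of SQL NULL).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return [values[i - n] if i >= n else default for i in range(len(values))]


times = ["10:00:00", "10:00:02", "10:03:04", "10:10:00"]
print(lag(times, 1, "1970-01-01 00:00:00"))
# ['1970-01-01 00:00:00', '10:00:00', '10:00:02', '10:03:04']
print(lag(times, 2))
# [None, None, '10:00:00', '10:00:02']
```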

    SELECT cookieid,
      createtime,
      url,
      ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
      LAG(createtime, 1, '1970-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime) AS last_1_time,
      LAG(createtime, 2) OVER(PARTITION BY cookieid ORDER BY createtime) AS last_2_time
    FROM lxw1234;

    cookieid  createtime            url     rn  last_1_time           last_2_time
    -----------------------------------------------------------------------------
    cookie1   2015-04-10 10:00:00   url1    1   1970-01-01 00:00:00   NULL
    cookie1   2015-04-10 10:00:02   url2    2   2015-04-10 10:00:00   NULL
    cookie1   2015-04-10 10:03:04   1url3   3   2015-04-10 10:00:02   2015-04-10 10:00:00
    cookie1   2015-04-10 10:10:00   url4    4   2015-04-10 10:03:04   2015-04-10 10:00:02
    cookie1   2015-04-10 10:50:01   url5    5   2015-04-10 10:10:00   2015-04-10 10:03:04
    cookie1   2015-04-10 10:50:05   url6    6   2015-04-10 10:50:01   2015-04-10 10:10:00
    cookie1   2015-04-10 11:00:00   url7    7   2015-04-10 10:50:05   2015-04-10 10:50:01
    cookie2   2015-04-10 10:00:00   url11   1   1970-01-01 00:00:00   NULL
    cookie2   2015-04-10 10:00:02   url22   2   2015-04-10 10:00:00   NULL
    cookie2   2015-04-10 10:03:04   1url33  3   2015-04-10 10:00:02   2015-04-10 10:00:00
    cookie2   2015-04-10 10:10:00   url44   4   2015-04-10 10:03:04   2015-04-10 10:00:02
    cookie2   2015-04-10 10:50:01   url55   5   2015-04-10 10:10:00   2015-04-10 10:03:04
    cookie2   2015-04-10 10:50:05   url66   6   2015-04-10 10:50:01   2015-04-10 10:10:00
    cookie2   2015-04-10 11:00:00   url77   7   2015-04-10 10:50:05   2015-04-10 10:50:01

last_1_time: the value 1 row up, with a default of '1970-01-01 00:00:00'. The first row of cookie1 has no row above it, so the default 1970-01-01 00:00:00 is returned. For the third row of cookie1, one row up is the second row, so the value is 2015-04-10 10:00:02. For the sixth row of cookie1, one row up is the fifth row, so the value is 2015-04-10 10:50:01.

last_2_time: the value 2 rows up, with no default specified. The first and second rows of cookie1 have no row two positions above them, so they get NULL. For the fourth row of cookie1, two rows up is the second row, 2015-04-10 10:00:02. For the seventh row of cookie1, two rows up is the fifth row, 2015-04-10 10:50:01.
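The query above can be reproduced in plain Python to confirm each value in the output. This is a sketch, not how Hive executes the query: `ROWS` and `lag_query` are illustrative names, the rows are the lxw1234 data in file order, and sorting the `createtime` strings stands in for ORDER BY createtime (this works because the 'YYYY-MM-DD HH:MM:SS' format sorts chronologically as text).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# The lxw1234 rows in file order: (cookieid, createtime, url).
ROWS: List[Tuple[str, str, str]] = [
    ("cookie1", "2015-04-10 10:00:02", "url2"),
    ("cookie1", "2015-04-10 10:00:00", "url1"),
    ("cookie1", "2015-04-10 10:03:04", "1url3"),
    ("cookie1", "2015-04-10 10:50:05", "url6"),
    ("cookie1", "2015-04-10 11:00:00", "url7"),
    ("cookie1", "2015-04-10 10:10:00", "url4"),
    ("cookie1", "2015-04-10 10:50:01", "url5"),
    ("cookie2", "2015-04-10 10:00:02", "url22"),
    ("cookie2", "2015-04-10 10:00:00", "url11"),
    ("cookie2", "2015-04-10 10:03:04", "1url33"),
    ("cookie2", "2015-04-10 10:50:05", "url66"),
    ("cookie2", "2015-04-10 11:00:00", "url77"),
    ("cookie2", "2015-04-10 10:10:00", "url44"),
    ("cookie2", "2015-04-10 10:50:01", "url55"),
]

# (cookieid, createtime, url, rn, last_1_time, last_2_time)
Row = Tuple[str, str, str, int, str, Optional[str]]


def lag_query(rows: List[Tuple[str, str, str]]) -> List[Row]:
    """Emulate the LAG query: PARTITION BY cookieid ORDER BY createtime."""
    partitions: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    result: List[Row] = []
    for cookieid in sorted(partitions):
        part = sorted(partitions[cookieid], key=lambda r: r[1])
        for i, (cid, ts, url) in enumerate(part):
            # LAG(createtime, 1, '1970-01-01 00:00:00'): default when no row above.
            last_1 = part[i - 1][1] if i >= 1 else "1970-01-01 00:00:00"
            # LAG(createtime, 2): no default, so None (NULL) for the first two rows.
            last_2 = part[i - 2][1] if i >= 2 else None
            result.append((cid, ts, url, i + 1, last_1, last_2))
    return result


for r in lag_query(ROWS):
    print(*r, sep="  ")
```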

LEAD

The opposite of LAG.
LEAD(col, n, DEFAULT) returns the value of col from the n-th row below the current row within the window.
The first parameter is the column name; the second is the offset n (optional, default 1); the third is the default value, returned when there is no n-th row below (if it is omitted, NULL is returned).
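A minimal Python sketch of LEAD on a single partition. The helper name `lead` is hypothetical; it assumes the list is already sorted by the ORDER BY key and applies the default only when the offset row falls past the end of the partition.

```python
from typing import Any, List, Optional, Sequence


def lead(values: Sequence[Any], n: int = 1, default: Optional[Any] = None) -> List[Any]:
    """Hypothetical emulation of LEAD(col, n, default) over one sorted partition.

    Row i receives values[i + n]; rows with fewer than n rows below them
    receive `default` (None plays the role of SQL NULL).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    size = len(values)
    return [values[i + n] if i + n < size else default for i in range(size)]


times = ["10:50:01", "10:50:05", "11:00:00"]
print(lead(times, 1, "1970-01-01 00:00:00"))
# ['10:50:05', '11:00:00', '1970-01-01 00:00:00']
print(lead(times, 2))
# ['11:00:00', None, None]
```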

    SELECT cookieid,
      createtime,
      url,
      ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
      LEAD(createtime, 1, '1970-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime) AS next_1_time,
      LEAD(createtime, 2) OVER(PARTITION BY cookieid ORDER BY createtime) AS next_2_time
    FROM lxw1234;

    cookieid  createtime            url     rn  next_1_time           next_2_time
    -----------------------------------------------------------------------------
    cookie1   2015-04-10 10:00:00   url1    1   2015-04-10 10:00:02   2015-04-10 10:03:04
    cookie1   2015-04-10 10:00:02   url2    2   2015-04-10 10:03:04   2015-04-10 10:10:00
    cookie1   2015-04-10 10:03:04   1url3   3   2015-04-10 10:10:00   2015-04-10 10:50:01
    cookie1   2015-04-10 10:10:00   url4    4   2015-04-10 10:50:01   2015-04-10 10:50:05
    cookie1   2015-04-10 10:50:01   url5    5   2015-04-10 10:50:05   2015-04-10 11:00:00
    cookie1   2015-04-10 10:50:05   url6    6   2015-04-10 11:00:00   NULL
    cookie1   2015-04-10 11:00:00   url7    7   1970-01-01 00:00:00   NULL
    cookie2   2015-04-10 10:00:00   url11   1   2015-04-10 10:00:02   2015-04-10 10:03:04
    cookie2   2015-04-10 10:00:02   url22   2   2015-04-10 10:03:04   2015-04-10 10:10:00
    cookie2   2015-04-10 10:03:04   1url33  3   2015-04-10 10:10:00   2015-04-10 10:50:01
    cookie2   2015-04-10 10:10:00   url44   4   2015-04-10 10:50:01   2015-04-10 10:50:05
    cookie2   2015-04-10 10:50:01   url55   5   2015-04-10 10:50:05   2015-04-10 11:00:00
    cookie2   2015-04-10 10:50:05   url66   6   2015-04-10 11:00:00   NULL
    cookie2   2015-04-10 11:00:00   url77   7   1970-01-01 00:00:00   NULL

The logic is the same as LAG, except that LAG looks up (toward earlier rows) and LEAD looks down (toward later rows).
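Because LEAD looks down and LAG looks up, LEAD over ascending order should give the same result as LAG over descending order, provided the ordering key has no ties. The Python sketch below checks this on cookie2's createtime values; the helpers `lag` and `lead` are hypothetical emulations over a sorted list, not Hive APIs.

```python
from typing import Any, List, Optional, Sequence


def lag(values: Sequence[Any], n: int = 1, default: Optional[Any] = None) -> List[Any]:
    """Value n rows above each row, or `default` if that row does not exist."""
    return [values[i - n] if i >= n else default for i in range(len(values))]


def lead(values: Sequence[Any], n: int = 1, default: Optional[Any] = None) -> List[Any]:
    """Value n rows below each row, or `default` if that row does not exist."""
    return [values[i + n] if i + n < len(values) else default for i in range(len(values))]


# cookie2's createtime values, sorted ascending (ORDER BY createtime).
times_asc = [
    "2015-04-10 10:00:00", "2015-04-10 10:00:02", "2015-04-10 10:03:04",
    "2015-04-10 10:10:00", "2015-04-10 10:50:01", "2015-04-10 10:50:05",
    "2015-04-10 11:00:00",
]
DEFAULT = "1970-01-01 00:00:00"

# LEAD over ascending order ...
next_1 = lead(times_asc, 1, DEFAULT)

# ... versus LAG over descending order (ORDER BY createtime DESC), mapped back
# to ascending order by createtime, which is unique within this partition.
times_desc = times_asc[::-1]
prev_in_desc = dict(zip(times_desc, lag(times_desc, 1, DEFAULT)))
via_lag = [prev_in_desc[t] for t in times_asc]

print(next_1 == via_lag)  # True
```

With duplicate createtime values the mapping back by key would be ambiguous, which is why the sketch relies on the keys being unique.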

FIRST_VALUE

Returns the first value in the group after sorting, considering rows from the start of the group up to the current row.

    SELECT cookieid,
      createtime,
      url,
      ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
      FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS first1
    FROM lxw1234;

    cookieid  createtime            url     rn  first1
    --------------------------------------------------
    cookie1   2015-04-10 10:00:00   url1    1   url1
    cookie1   2015-04-10 10:00:02   url2    2   url1
    cookie1   2015-04-10 10:03:04   1url3   3   url1
    cookie1   2015-04-10 10:10:00   url4    4   url1
    cookie1   2015-04-10 10:50:01   url5    5   url1
    cookie1   2015-04-10 10:50:05   url6    6   url1
    cookie1   2015-04-10 11:00:00   url7    7   url1
    cookie2   2015-04-10 10:00:00   url11   1   url11
    cookie2   2015-04-10 10:00:02   url22   2   url11
    cookie2   2015-04-10 10:03:04   1url33  3   url11
    cookie2   2015-04-10 10:10:00   url44   4   url11
    cookie2   2015-04-10 10:50:01   url55   5   url11
    cookie2   2015-04-10 10:50:05   url66   6   url11
    cookie2   2015-04-10 11:00:00   url77   7   url11
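A minimal Python sketch of the FIRST_VALUE result above for one partition (cookie2, in file order). It assumes, as the Hive windowing documentation describes for an ORDER BY without a window clause, that the frame runs from the start of the partition to the current row; `first_value_running` is a hypothetical name.

```python
from typing import List, Tuple

# cookie2's (createtime, url) rows in file order.
COOKIE2: List[Tuple[str, str]] = [
    ("2015-04-10 10:00:02", "url22"),
    ("2015-04-10 10:00:00", "url11"),
    ("2015-04-10 10:03:04", "1url33"),
    ("2015-04-10 10:50:05", "url66"),
    ("2015-04-10 11:00:00", "url77"),
    ("2015-04-10 10:10:00", "url44"),
    ("2015-04-10 10:50:01", "url55"),
]


def first_value_running(rows: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Emulate FIRST_VALUE(url) OVER (ORDER BY createtime) for one partition.

    Every frame starts at the first row of the sorted partition, so each
    row's result is that first row's url.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    return [(ts, url, ordered[0][1]) for ts, url in ordered]


for row in first_value_running(COOKIE2):
    print(*row)
```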

LAST_VALUE

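Returns the last value in the group after sorting, considering rows from the start of the group up to the current row. Since the window ends at the current row, the result is the current row's own url.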


    SELECT cookieid,
      createtime,
      url,
      ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
      LAST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS last1
    FROM lxw1234;

    cookieid  createtime            url     rn  last1
    --------------------------------------------------
    cookie1   2015-04-10 10:00:00   url1    1   url1
    cookie1   2015-04-10 10:00:02   url2    2   url2
    cookie1   2015-04-10 10:03:04   1url3   3   1url3
    cookie1   2015-04-10 10:10:00   url4    4   url4
    cookie1   2015-04-10 10:50:01   url5    5   url5
    cookie1   2015-04-10 10:50:05   url6    6   url6
    cookie1   2015-04-10 11:00:00   url7    7   url7
    cookie2   2015-04-10 10:00:00   url11   1   url11
    cookie2   2015-04-10 10:00:02   url22   2   url22
    cookie2   2015-04-10 10:03:04   1url33  3   1url33
    cookie2   2015-04-10 10:10:00   url44   4   url44
    cookie2   2015-04-10 10:50:01   url55   5   url55
    cookie2   2015-04-10 10:50:05   url66   6   url66
    cookie2   2015-04-10 11:00:00   url77   7   url77
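The same idea for LAST_VALUE, again as a hypothetical Python sketch (`last_value_running` is an illustrative name). It assumes the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on createtime (peers) are included in the frame. This article's data has no ties, which is why last1 always equals url; the tie case in the usage is there only to show how the frame end moves. Hive does not promise any particular order among tied rows, whereas Python's stable sort keeps input order.

```python
from typing import List, Tuple


def last_value_running(rows: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Emulate LAST_VALUE(url) OVER (ORDER BY createtime) for one partition.

    Assumes the frame ends at the current row plus any later rows with the
    same createtime (its peers); the result is the url of the frame's last row.
    """
    ordered = sorted(rows, key=lambda r: r[0])  # stable: ties keep input order
    out = []
    for i, (ts, url) in enumerate(ordered):
        end = i
        # Extend the frame over peers that share this createtime.
        while end + 1 < len(ordered) and ordered[end + 1][0] == ts:
            end += 1
        out.append((ts, url, ordered[end][1]))
    return out


# cookie1's (createtime, url) rows in file order.
cookie1 = [
    ("2015-04-10 10:00:02", "url2"),
    ("2015-04-10 10:00:00", "url1"),
    ("2015-04-10 10:03:04", "1url3"),
    ("2015-04-10 10:50:05", "url6"),
    ("2015-04-10 11:00:00", "url7"),
    ("2015-04-10 10:10:00", "url4"),
    ("2015-04-10 10:50:01", "url5"),
]

# With unique createtime values, last1 always equals the row's own url.
for ts, url, last1 in last_value_running(cookie1):
    print(ts, url, last1)
```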

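If ORDER BY is not specified, rows are ordered by their offset in the file by default, which gives wrong results. In the output below, FIRST_VALUE returns url2 and LAST_VALUE returns url5 for cookie1: these are simply the first and last cookie1 records in the file, not the earliest and latest visits.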

    SELECT cookieid,
      createtime,
      url,
      FIRST_VALUE(url) OVER(PARTITION BY cookieid) AS first2
    FROM lxw1234;

    cookieid  createtime            url     first2
    ----------------------------------------------
    cookie1   2015-04-10 10:00:02   url2    url2
    cookie1   2015-04-10 10:00:00   url1    url2
    cookie1   2015-04-10 10:03:04   1url3   url2
    cookie1   2015-04-10 10:50:05   url6    url2
    cookie1   2015-04-10 11:00:00   url7    url2
    cookie1   2015-04-10 10:10:00   url4    url2
    cookie1   2015-04-10 10:50:01   url5    url2
    cookie2   2015-04-10 10:00:02   url22   url22
    cookie2   2015-04-10 10:00:00   url11   url22
    cookie2   2015-04-10 10:03:04   1url33  url22
    cookie2   2015-04-10 10:50:05   url66   url22
    cookie2   2015-04-10 11:00:00   url77   url22
    cookie2   2015-04-10 10:10:00   url44   url22
    cookie2   2015-04-10 10:50:01   url55   url22

    SELECT cookieid,
      createtime,
      url,
      LAST_VALUE(url) OVER(PARTITION BY cookieid) AS last2
    FROM lxw1234;

    cookieid  createtime            url     last2
    ----------------------------------------------
    cookie1   2015-04-10 10:00:02   url2    url5
    cookie1   2015-04-10 10:00:00   url1    url5
    cookie1   2015-04-10 10:03:04   1url3   url5
    cookie1   2015-04-10 10:50:05   url6    url5
    cookie1   2015-04-10 11:00:00   url7    url5
    cookie1   2015-04-10 10:10:00   url4    url5
    cookie1   2015-04-10 10:50:01   url5    url5
    cookie2   2015-04-10 10:00:02   url22   url55
    cookie2   2015-04-10 10:00:00   url11   url55
    cookie2   2015-04-10 10:03:04   1url33  url55
    cookie2   2015-04-10 10:50:05   url66   url55
    cookie2   2015-04-10 11:00:00   url77   url55
    cookie2   2015-04-10 10:10:00   url44   url55
    cookie2   2015-04-10 10:50:01   url55   url55
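These results follow from two assumptions, sketched in Python below: rows reach the window in file order (as observed here), and without ORDER BY the frame covers the whole partition (the default the Hive windowing documentation gives when both ORDER BY and the window clause are omitted). `first_last_without_order_by` is a hypothetical helper, not a Hive function.

```python
from typing import Dict, List, Tuple

# The lxw1234 rows in file order: (cookieid, createtime, url).
ROWS: List[Tuple[str, str, str]] = [
    ("cookie1", "2015-04-10 10:00:02", "url2"),
    ("cookie1", "2015-04-10 10:00:00", "url1"),
    ("cookie1", "2015-04-10 10:03:04", "1url3"),
    ("cookie1", "2015-04-10 10:50:05", "url6"),
    ("cookie1", "2015-04-10 11:00:00", "url7"),
    ("cookie1", "2015-04-10 10:10:00", "url4"),
    ("cookie1", "2015-04-10 10:50:01", "url5"),
    ("cookie2", "2015-04-10 10:00:02", "url22"),
    ("cookie2", "2015-04-10 10:00:00", "url11"),
    ("cookie2", "2015-04-10 10:03:04", "1url33"),
    ("cookie2", "2015-04-10 10:50:05", "url66"),
    ("cookie2", "2015-04-10 11:00:00", "url77"),
    ("cookie2", "2015-04-10 10:10:00", "url44"),
    ("cookie2", "2015-04-10 10:50:01", "url55"),
]


def first_last_without_order_by(rows: List[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Emulate FIRST_VALUE(url) and LAST_VALUE(url) OVER (PARTITION BY cookieid).

    With no ordering, each partition is taken in arrival (file) order and the
    frame is the whole partition, so the answers are the first and last url
    seen for each cookieid; createtime plays no part.
    """
    urls_by_cookie: Dict[str, List[str]] = {}
    for cookieid, _createtime, url in rows:
        urls_by_cookie.setdefault(cookieid, []).append(url)  # keeps file order
    return {cid: (urls[0], urls[-1]) for cid, urls in urls_by_cookie.items()}


print(first_last_without_order_by(ROWS))
# {'cookie1': ('url2', 'url5'), 'cookie2': ('url22', 'url55')}
```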

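To get the last value of the whole sorted group, reverse the sort order and take FIRST_VALUE instead (last2 below); LAST_VALUE with an ascending ORDER BY (last1) only sees the rows up to the current one: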

    SELECT cookieid,
      createtime,
      url,
      ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
      LAST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS last1,
      FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime DESC) AS last2
    FROM lxw1234
    ORDER BY cookieid, createtime;

    cookieid  createtime            url     rn  last1   last2
    ---------------------------------------------------------
    cookie1   2015-04-10 10:00:00   url1    1   url1    url7
    cookie1   2015-04-10 10:00:02   url2    2   url2    url7
    cookie1   2015-04-10 10:03:04   1url3   3   1url3   url7
    cookie1   2015-04-10 10:10:00   url4    4   url4    url7
    cookie1   2015-04-10 10:50:01   url5    5   url5    url7
    cookie1   2015-04-10 10:50:05   url6    6   url6    url7
    cookie1   2015-04-10 11:00:00   url7    7   url7    url7
    cookie2   2015-04-10 10:00:00   url11   1   url11   url77
    cookie2   2015-04-10 10:00:02   url22   2   url22   url77
    cookie2   2015-04-10 10:03:04   1url33  3   1url33  url77
    cookie2   2015-04-10 10:10:00   url44   4   url44   url77
    cookie2   2015-04-10 10:50:01   url55   5   url55   url77
    cookie2   2015-04-10 10:50:05   url66   6   url66   url77
    cookie2   2015-04-10 11:00:00   url77   7   url77   url77
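A Python sketch of why the DESC approach works: with the order reversed, the first row of every frame is the latest visit, so FIRST_VALUE returns the group's last url on every row. `last_url_per_cookie` is a hypothetical name, and the rows are the lxw1234 data in file order.

```python
from typing import Dict, List, Tuple

# The lxw1234 rows in file order: (cookieid, createtime, url).
ROWS: List[Tuple[str, str, str]] = [
    ("cookie1", "2015-04-10 10:00:02", "url2"),
    ("cookie1", "2015-04-10 10:00:00", "url1"),
    ("cookie1", "2015-04-10 10:03:04", "1url3"),
    ("cookie1", "2015-04-10 10:50:05", "url6"),
    ("cookie1", "2015-04-10 11:00:00", "url7"),
    ("cookie1", "2015-04-10 10:10:00", "url4"),
    ("cookie1", "2015-04-10 10:50:01", "url5"),
    ("cookie2", "2015-04-10 10:00:02", "url22"),
    ("cookie2", "2015-04-10 10:00:00", "url11"),
    ("cookie2", "2015-04-10 10:03:04", "1url33"),
    ("cookie2", "2015-04-10 10:50:05", "url66"),
    ("cookie2", "2015-04-10 11:00:00", "url77"),
    ("cookie2", "2015-04-10 10:10:00", "url44"),
    ("cookie2", "2015-04-10 10:50:01", "url55"),
]


def last_url_per_cookie(rows: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Emulate FIRST_VALUE(url) OVER (PARTITION BY cookieid ORDER BY createtime DESC)."""
    groups: Dict[str, List[Tuple[str, str]]] = {}
    for cookieid, createtime, url in rows:
        groups.setdefault(cookieid, []).append((createtime, url))
    result: Dict[str, str] = {}
    for cid, part in groups.items():
        newest_first = sorted(part, key=lambda r: r[0], reverse=True)
        # Every frame starts at the first row of the DESC-sorted partition,
        # i.e. the latest createtime, whichever row is current.
        result[cid] = newest_first[0][1]
    return result


print(last_url_per_cookie(ROWS))
# {'cookie1': 'url7', 'cookie2': 'url77'}
```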


