Hive built-in functions (Analysis functions, window functions)

Source: Internet
Author: User


650) this.width=650; "src=" http://s3.51cto.com/wyfs02/M00/75/97/wKioL1Y9oZiAkspGAAMsRcKYMDs505.jpg "title=" capture. PNG "alt=" Wkiol1y9oziakspgaamsrckymds505.jpg "/>



CLI command

Show functions;

desc function concat;

desc function extended Concat; see example of how a function is used

NVL function
COALESCE (V1,v2,...) Returns the first non-null value in the parameter, or null if all values are null;


Set.cli.print.header=true;

Winfunc

Employee Salary Identification

ID Money Type

Relational operator precedence high to Low: not and OR
And or priority

Select ID, money from winfunc where id= ' 1001 ' or id= ' 1002 '
and money = ' + ';
    
Results

1001
1001
1001
1001
1002

The correct SQL should be


select ID, money from Winfunc where (id= ' 1001 ' or id= ' 1002 ') and money = ' + ';

Result

     1001  100
Span style= "color: #008000;" >     1002  100


if (con,v1,v2)

    select if (2>1, ' v1 ', ' v2 ') from dual;
    v1

Case when

select case when id= ' 1001 ' Then ' v1 ' when id= ' 1002 ' Then ' v2 ' Else ' v3 ' End from WINFUNC;

Get_json_object

Select Get_json_object (' {' name ': ' Jack ', ' Age ': ' "} ', ' $.name ') from dual;
Jack

Parse_url

Select Parse_url (' http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1 ', ' HOST ') from
lxw_dual;
Facebook.com


Select Parse_url (' http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1 ', ' QUERY ', ' K1 ')
From lxw_dual;
v1



Concat_ws is more than concat. A delimiter between strings

Concat_ws (string sep,array<string>) value processing in an array


Collect_set (ID) go back to the array

    
select Collect_set (ID) from Winfunc;
["1001", "1002", "1003", "1004"]

Collect_list (ID) does not go back to the array

select Collect_list (ID) from Winfunc;



The partition by keyword is part of the analytic function in Oracle, which differs from the aggregation function in that it can return multiple records in a group, whereas aggregate functions generally have only one record that reflects the statistical value

SUM () over (PARTITION by ...) is an analytic function. The effect he performs with the ordinary sum ... GROUP by ... Instead, it calculates the accumulation of expressions in the group and not the simple and.


Group BY and have, where, order by these keywords are executed in the following order: where, Group by, have, order by.

In these four keywords, the column name of the final view can be used only in the order by statement, such as:

SELECT fruitname, Productplace, Price, ID as IDE, Discount
From T_test_fruitinfo
WHERE (productplace = N ' China ')
ORDER by IDE
You can use the IDE only in an order BY statement, and you cannot use the IDE if you need to reference column names in other conditional statements.

The columns in the ORDER by clause must be included in an aggregate function or a GROUP by clause.

When group by and order by is used together, order by is followed by the group by.




First, the window function

First_value (first value for group)
    
Select Id,money,
First_value (Money)-Over (partition by-ID ORDER BY- money
rows between 1 preceding and 1 following)
From Winfunc

Each row corresponds to the Data window from the first row to the last row
rows between unbounded preceding and unbounded following

Lead (money,2) takes a record value with a trailing distance of 2, and no null is taken

Select Id,money,lead (money,2) over (order by money) from Winfunc


Lag (money,2) in the lead opposite


Rank () Sort function with Row_number ()

Select Id,money, Rank () over (partition by ID order by money) from Winfunc
Results

1001 1
1001 2
1001 2
1001 4

Dense_rank ()

Select Id,money, Dense_rank () over (partition by ID order by money) from Winfunc

Results

1001 1
1001 2
1001 2
1001 3

Cume_dist ()

Calculation formula: The same value maximum line number/total number of rows

select Id,money, Cume_dist () over (partition by ID Order by Money) from Winfunc

Result

&NBSP;&NBSP;&NBSP;&NBSP;1001 100 0.25
    1001 0.75
    1001 0.75
    1001 1

Percent_rank (), the first one always starts from zero

Calculation formula: (the same value the smallest line number-1)/(total number-1)

result

    1001 100 0
    1001 0.25
    1001 0.25
    1001 1

Ntile (2) Shard


ASC, Nulls last is the default
desc, nulls First is the default

Select Id,money, Ntile (2) over (order by money Desc nulls last) from Winfunc;


Mixed functions (using the method in Java)

Java_method and reflect are the same.


Select Java_method ("Java.lang.Math", "sqrt", cast (id as Double)) from Winfunc;


UDTF table function explode () Mate lateral view keyword

Select ID, adid from winfunc lateral view explode (split (type, ' B ')) TT as Adid


1001 ABC

Row career

1001 A

1001 C


Regular-expression functions

The like character "_" denotes any single character, while the character "%" denotes any number of characters

Rlike followed by regular expressions

Select 1 from dual where ' footbar ' rlike ' ^f.*r$ ';

Regular expression substitution function

Regexp_replace (String a,string b,string C)
Replace the section in string A with Java regular expression B with C

Select Regexp_replace (' Foobar ', ' oo|ar ', ') from dual;

Back to FB


Regexp_extract (String subject,string pattern,int index)


Select Regexp_extract (' Foothebar ', ' foo (. *?) (bar) ', 1) from dual;

Returns the, () represents a group in a regular expression, and 1 indicates the index of the first group


1. Greedy match (. *), | always matches to the last |


Select Regexp_extract (' 979|7.10.80|8684 ', '. *\\| (. *) ', 1) from dual;

Returns 8684


2. Non-greedy matching (. *?) Add a question mark to tell the regular engine to repeat the last character as little as possible


Select Regexp_extract (' 979|7.10.80|8684 ', ' (. *?) \\| (. *) ', 1) from dual;















This article is from "not what Daniel qq:934033381" blog, please make sure to keep this source http://tianxingzhe.blog.51cto.com/3390077/1710582

Hive built-in functions (Analysis functions, window functions)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.