Hive common built-in functions and their use

Source: Internet
Author: User

function Classification

HIVE CLI Command

Shows how many functions are available for the current session
SHOW FUNCTIONS;

Displaying description information for a function
DESC FUNCTION concat;

Display extended description information for a function
DESC FUNCTION EXTENDED concat;

Simple Functions

The calculation granularity of the function is a single record.
Relational operations
Mathematical operations
Logical operations
Numerical calculation
Type conversions
Date function
Conditional functions
String functions
Statistical functions

Aggregation Functions

The granularity of the data processed by the function is more than one record.
SUM ()-sum
Count ()-Calculate the amount of data
AVG ()-Averaging straight
distinct-to find the number of different values
min-to find the minimum value
max-to find the most human value

aggregate Functions

Composite type Building
Complex Type Access
Complex Type length

Special Functions window Functions

Application Scenarios
for partition sorting
Dynamic GROUP BY
Top N
Cumulative calculation
Hierarchical query

Windowing functions

leadlagFIRST_VALUELAST_VALUE
Analytic Functions

Analytics functions

RANKROW_NUMBERDENSE_RANKCUME_DISTPERCENT_RANKNTILE
Mixed Functions
java_method(class,method [,arg1 [,arg2])reflect(class,method [,arg1 [,arg2..]])hash(a1 [,a2...])
UDTF
lateralViewLATERAL VIEW udtf(expression) tableAlias AS columnAlias (‘,‘ columnAlias)*  fromClauseFROM baseTable (lateralView)*  

Ateral view is used in conjunction with split, explode and other UDTF, which can split a row of data into multiple rows of data, based on which the data is aggregated. Lateral view first calls for each row of the original table UDTF,UTDF splits a row into one or more rows, lateral view then combines the results to produce a virtual table that supports the alias table.

Common functions Demo:
 Create table employee (ID string,money double, type string)row format delimited Fields terminated  by ' \ t ' lines  terminated by ' \ n ' stored as textfile;   load data local inpath '/liguodong/hive/data '  into table Employee; Select *  from employee;Priority is not and OR Select Id,money  from employee where (id=' 1001 ' or id=' 1002 ' )  and money=' + ';

Cast Type conversions

select cast(1.5 as int);

If judgment

if  (Con, , " ); Hive () > select  if  (2  >1 ,  ' YES ' ,  ' NO ' ); YES  
case   When  con then   " when  con then   "  else   " end  ( the same type) Select case  when  id= ' 1001 '  then   ' v1001 '  when  id= ' 1002 '  then   ' v1002 '  else   ' v1003 '  end  from employee;  

get_json_object

get_json_object(json 解析函数,用来处理json,必须是json格式)select get_json_object(‘{"name":"jack","age":"20"}‘,‘$.name‘);

URL parsing functions

parse_url(stringstringstring keyToExtract])select parse_url(‘http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1‘‘HOST‘from1;

string Join function: Concat
Syntax: concat (String A, String B ...)
return value: String
Description: Returns the result of an input string connection, supporting any input string
Example:

select concat(‘abc‘,‘def’,‘gh‘) from lxw_dual;abcdefgh

delimited string join function: Concat_ws
Syntax: Concat_ws (String SEP, String A, String B ...)
return value: String
Description: Returns the result of the input string concatenated, and SEP represents the delimiter between each string

concat_ws(string SEP, array<stringselect concat_ws(‘,‘,‘abc‘,‘def‘,‘gh‘from lxw_dual;abc,def,gh

列出该字段所有不重复的值,相当于去重collect_set(id)  //返回的是数组列出该字段所有的值,列出来不去重 collect_list(id)   //返回的是数组select collect_set(id) from taborder;

求和sum(money)统计列数count(*)select sum(num),count(*)from taborder;

Window functions

byorderby money)selectbyorderbyfrom taborder;

rows1and1 following (当前行以及当前行的前一行与后一行)hive (liguodong)>selectbyorderbyROWS2AND CURRENT ROW) from taborder;

selectbyfrom taborder;

 lead 去当前行后面的第二行的值 lead(money,2) over (orderby money) lag  去当前行前面的第二行的值 lag(money,2) over (orderby``` ```select ch, num, lead(num,2) over (orderby num) from taborder;

select ch, num, lag(num,2) over (order by num) from taborder;

Rank Rank

overbyidby money)
select ch, num, rank() over(partition by ch order by num) as rank from taborder;

select ch, num, dense_rank() over(partition by ch order by num) as dense_rank from taborder;

Cume_dist

Cume_dist (maximum line number/line number of the same value) Cume_dist () Over(Partition by IDOrder byMoney) Percent_rank (the smallest line number of the same value-1)/(Number of rows-1) The first one always0Start Percent_rank () Over(Partition by IDOrder byMoney) Select Ch,num,cume_dist () Over(Partition byCH Order byNum asCume_dist, Percent_rank () Over(Partition byCH Order byNum asPercent_rank fromTabOrder;

 ntile分片  ntile(2) over (orderbydesc)  分两份  select ch,num,ntile(2) over (orderbydescfrom taborder;

Mixed functions

select id,java_method("java.lang,Math","sqrt",cast(id as double)) as sqrt from hiveTest;

Udtf

select id,adid  from employee  lateral view explode(split(type,‘B‘as adid; explode 把一列转成多行hive (liguodong)>  select id,adid                >  from hiveDemo                >  lateral view explode(split(str,‘,‘as adid;

Regular Expressions
Functions that use regular expressions
regexp_replace(string subject A,string B,string C)
regexp_extract(string subject,string pattern,int index)

select regexp_replace(‘foobar‘‘oo|ar‘‘‘fromselect regexp_replace(‘979|7.10.80|8684‘‘.*\\|(.*)‘,1from1;

select regexp_replace(‘979|7.10.80|8684‘‘(.*?)\\|(.*)‘,1from1;

Copyright NOTICE: This article for Bo Master original article, without Bo Master permission not reproduced.

Hive common built-in functions and their use

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.