Hive UDF operation
Procedure for using a UDF:
In the Hive session, add the jar file that contains the custom function, then create the function, and then use it.
The example below works through the following topic:
Topic: compute PV (page views) and UV (unique visitors) for each activity
First, in Java, use a regular expression to extract the activity name.
Take the link below as an example; the string to extract is the path segment after /sale/, vtxqCLCzfto.
http://cms.yhd.com/sale/vtxqCLCzfto?tc=ad.0.0.17280-32881642.1&tp=1.1.36.9.1.LEffwdz-10-35RcM&ti=zx8h
The core code is as follows:
// The package matches the class name used in the create function step.
package com.litong.hive.udf;

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.hive.ql.exec.UDF;

public class GetCommentNameOrId extends UDF {
    // Returns the lowercased path segment that follows "<flag>/", or null if there is no match.
    public String evaluate(String url, String flag) {
        String str = null;
        Pattern p = Pattern.compile(flag + "/[a-zA-Z0-9]+");
        Matcher m = p.matcher(url);
        if (m.find()) {
            // group(0) looks like "sale/vtxqCLCzfto"; keep the part after the slash.
            str = m.group(0).toLowerCase().split("/")[1];
        }
        return str;
    }

    // Local test entry point.
    public static void main(String[] args) {
        String url = "http://cms.yhd.com/sale/vtxqCLCzfto?tc=ad.0.0.17280-32881642.1&tp=1.1.36.9.1.LEffwdz-10-35RcM&ti=zx8h";
        GetCommentNameOrId gs = new GetCommentNameOrId();
        System.out.println(gs.evaluate(url, "sale"));
    }
}
Parameters passed:
url: http://cms.yhd.com/sale/vtxqCLCzfto?tc=ad.0.0.17280-32881642.1&tp=1.1.36.9.1.LEffwdz-10-35RcM&ti=zx8h
flag: sale
The result is: vtxqclczfto (lowercased by toLowerCase())
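The extraction logic can be tried without a Hive installation. The sketch below is only an illustration: the class name ActivityNameSketch and the method name extract are hypothetical, and the null check is an added assumption (the evaluate method shown earlier has none, so a NULL url would most likely raise a NullPointerException). It also shows that the flag is matched case-sensitively.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone class: it copies the evaluate logic without the Hive UDF base class.
class ActivityNameSketch {

    // Returns the lowercased path segment that follows "<flag>/",
    // or null when an argument is null or the URL has no such segment.
    static String extract(String url, String flag) {
        if (url == null || flag == null) {
            return null; // assumption: treat missing input as "no activity"
        }
        Matcher m = Pattern.compile(flag + "/[a-zA-Z0-9]+").matcher(url);
        if (!m.find()) {
            return null;
        }
        // m.group(0) is e.g. "sale/vtxqCLCzfto"; keep the part after the slash.
        return m.group(0).toLowerCase().split("/")[1];
    }

    public static void main(String[] args) {
        String url = "http://cms.yhd.com/sale/vtxqCLCzfto?tc=ad.0.0.17280-32881642.1";
        System.out.println(extract(url, "sale"));  // flag matches
        System.out.println(extract(url, "Sale"));  // flag is case-sensitive, so no match
        System.out.println(extract(null, "sale")); // null input
    }
}
```

Because the flag is concatenated into the pattern, any regular-expression metacharacters in it are interpreted as such; a plain word like sale is unaffected.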
Second, use the UDF in Hive
1. Create a table in the rptest database
create table rptest.rpt_sale_daily (huodong string, pv bigint, uv bigint) partitioned by (ds string, hour string);
2. Build the jar package and upload it to the specified path
add jar /opt/litong/lib/hiveudf.jar;
3. Specify the implementing class and create the function
create temporary function getcommentnameorid as 'com.litong.hive.udf.GetCommentNameOrId';
4. Insert data into table rpt_sale_daily
insert overwrite table rptest.rpt_sale_daily partition (ds='2015-08-28', hour=' -')
select getcommentnameorid(url, "sale") huodong, count(url) pv, count(distinct guid) uv
from default.track_log a
where ds='2015-08-28' and hour=' -'
group by ds, getcommentnameorid(url, "sale");

insert overwrite table rptest.rpt_sale_daily partition (ds='2015-08-28', hour=' +')
select getcommentnameorid(url, "sale") huodong, count(url) pv, count(distinct guid) uv
from default.track_log a
where ds='2015-08-28' and hour=' +'
group by ds, getcommentnameorid(u
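To make the aggregation concrete, here is a rough in-memory sketch of what the query computes for each activity: PV as the number of rows and UV as the number of distinct guid values. The class PvUvSketch, its Stats helper, and the sample rows are all made up for illustration, and the sketch assumes no null values (Hive's count ignores nulls, which is not modeled here).

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory illustration of the PV/UV aggregation; not part of the Hive job.
class PvUvSketch {

    // Per-activity counters: pv counts rows, uv counts distinct guids.
    static final class Stats {
        long pv = 0;
        final Set<String> guids = new HashSet<>();

        long uv() {
            return guids.size();
        }
    }

    // Each row is {activity, guid}; the activity stands in for
    // the value returned by getcommentnameorid(url, "sale").
    static Map<String, Stats> aggregate(String[][] rows) {
        Map<String, Stats> byActivity = new LinkedHashMap<>();
        for (String[] row : rows) {
            Stats s = byActivity.computeIfAbsent(row[0], k -> new Stats());
            s.pv++;              // corresponds to count(url)
            s.guids.add(row[1]); // corresponds to count(distinct guid)
        }
        return byActivity;
    }

    public static void main(String[] args) {
        // Made-up sample rows.
        String[][] rows = {
            {"vtxqclczfto", "guid-1"},
            {"vtxqclczfto", "guid-1"},
            {"vtxqclczfto", "guid-2"},
            {"otheractivity", "guid-3"},
        };
        aggregate(rows).forEach((activity, s) ->
            System.out.println(activity + " pv=" + s.pv + " uv=" + s.uv()));
    }
}
```

In the sample rows, the same guid visiting an activity twice adds two to PV but only one to UV, which is the distinction the count(distinct guid) column captures.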
5. Check if data is inserted successfully
OK, data added successfully.