Hive Learning, Part 5: "Hive Advanced: UDF Operation Case" Explained

Source: Internet
Author: User

Hive UDF operation

UDF operation procedure:

Add the JAR file that contains the custom function to the Hive session, create the function, and then use it.

The following example illustrates the procedure:

Topic: PV (page view) and UV (unique visitor) statistics for each activity

First, use a regular expression in Java to extract the activity name.
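To make the two metrics concrete, here is a minimal plain-Java sketch of the aggregation, assuming PV is the number of log rows per activity and UV is the number of distinct GUIDs per activity (this matches the count(url) and count(distinct guid) expressions used in the insert statement in step 4). The class PvUvSketch, its Stats helper, and the sample rows are hypothetical and only stand in for the log table; in the article the real work is done by Hive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PvUvSketch {

    /** Running totals for one activity (hypothetical helper type). */
    static final class Stats {
        long pv;                                    // page views: one per log row
        final Set<String> guids = new HashSet<>();  // distinct visitor ids seen so far

        long uv() {
            return guids.size();
        }
    }

    /** Each row is {activity, guid}; an in-memory stand-in for the log table. */
    static Map<String, Stats> aggregate(List<String[]> rows) {
        Map<String, Stats> byActivity = new LinkedHashMap<>();
        for (String[] row : rows) {
            Stats s = byActivity.computeIfAbsent(row[0], k -> new Stats());
            s.pv++;               // every row counts as one page view
            s.guids.add(row[1]);  // a repeated guid does not raise UV
        }
        return byActivity;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String[]> rows = Arrays.asList(
                new String[] {"vtxqclczfto", "guid-1"},
                new String[] {"vtxqclczfto", "guid-1"},
                new String[] {"vtxqclczfto", "guid-2"},
                new String[] {"otheractivity", "guid-1"});
        aggregate(rows).forEach((activity, s) ->
                System.out.println(activity + " pv=" + s.pv + " uv=" + s.uv()));
    }
}
```

With the sample rows above, the first activity has three page views from two distinct visitors, which is why PV and UV are reported separately.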

Take the following link as an example, where the string to extract is the path segment that follows sale/ (vtxqCLCzfto):

http://cms.yhd.com/sale/vtxqCLCzfto?tc=ad.0.0.17280-32881642.1&tp=1.1.36.9.1.LEffwdz-10-35RcM&ti=zx8h

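The core code is as follows:

// Package matches the class name registered with "create temporary function" in step 3.
package com.litong.hive.udf;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.hive.ql.exec.UDF;

public class GetCommentNameOrId extends UDF {

    // Returns the lower-cased path segment that follows "<flag>/" in the URL,
    // or null when the URL contains no such segment.
    public String evaluate(String url, String flag) {
        // Hive passes NULL column values as null; return null instead of failing.
        if (url == null || flag == null) {
            return null;
        }
        String str = null;
        Pattern p = Pattern.compile(flag + "/[a-zA-Z0-9]+");
        Matcher m = p.matcher(url);
        if (m.find()) {
            // group(0) is e.g. "sale/vtxqCLCzfto"; keep the part after the slash.
            str = m.group(0).toLowerCase().split("/")[1];
        }
        return str;
    }

    // Quick local check, run outside Hive.
    public static void main(String[] args) {
        String url = "http://cms.yhd.com/sale/vtxqCLCzfto?tc=ad.0.0.17280-32881642.1&tp=1.1.36.9.1.LEffwdz-10-35RcM&ti=zx8h";
        GetCommentNameOrId gs = new GetCommentNameOrId();
        System.out.println(gs.evaluate(url, "sale"));
    }
}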

Parameters passed:

url: http://cms.yhd.com/sale/vtxqCLCzfto?tc=ad.0.0.17280-32881642.1&tp=1.1.36.9.1.LEffwdz-10-35RcM&ti=zx8h

flag: sale

The result is: vtxqclczfto
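The extraction can also be tried without a Hive installation. The following is a minimal sketch that mirrors the regular-expression logic of the evaluate method in a plain class; the class name ExtractSketch and the null guard are assumptions added here for illustration and are not part of the original UDF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractSketch {

    // Mirrors the regex logic of the UDF's evaluate method, without the Hive dependency.
    static String extract(String url, String flag) {
        if (url == null || flag == null) {
            return null;                      // added guard, not in the original code
        }
        Matcher m = Pattern.compile(flag + "/[a-zA-Z0-9]+").matcher(url);
        if (!m.find()) {
            return null;                      // the URL has no "<flag>/<name>" part
        }
        String matched = m.group(0);          // for the sample URL: "sale/vtxqCLCzfto"
        return matched.toLowerCase().split("/")[1];
    }

    public static void main(String[] args) {
        String url = "http://cms.yhd.com/sale/vtxqCLCzfto"
                + "?tc=ad.0.0.17280-32881642.1&tp=1.1.36.9.1.LEffwdz-10-35RcM&ti=zx8h";
        System.out.println(extract(url, "sale"));                   // vtxqclczfto
        System.out.println(extract(url, "Sale"));                   // null
        System.out.println(extract("http://cms.yhd.com/", "sale")); // null
    }
}
```

Two details are worth noting. The match stops at the ? because the character class only allows letters and digits, so the query string is never included. The flag is matched case-sensitively, so passing "Sale" for this URL yields null, while the extracted name itself is always lower-cased.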

Second, UDF operation

1. Create a table in the rptest database

create table rptest.rpt_sale_daily (huodong string, pv bigint, uv bigint) partitioned by (ds string, hour string);

2. Build the JAR package and upload it to the designated path

add jar /opt/litong/lib/hiveudf.jar;

3. Specify the implementing class and create the function

create temporary function getcommentnameorid as 'com.litong.hive.udf.GetCommentNameOrId';

4. Insert data into table rpt_sale_daily

insert overwrite table rptest.rpt_sale_daily partition (ds='2015-08-28', hour=' -')
select getcommentnameorid(url, "sale") huodong, count(url) pv, count(distinct guid) uv
from default.track_log a
where ds='2015-08-28' and hour=' -'
group by ds, getcommentnameorid(url, "sale");

insert overwrite table rptest.rpt_sale_daily partition (ds='2015-08-28', hour=' +')
select getcommentnameorid(url, "sale") huodong, count(url) pv, count(distinct guid) uv
from default.track_log a
where ds='2015-08-28' and hour=' +'
group by ds, getcommentnameorid(u

5. Check whether the data was inserted successfully

OK, the data was added successfully.

  

  
