Writing and using UDTFs in Hive

Source: Internet
Author: User
1. UDTF Introduction

A UDTF (User-Defined Table-Generating Function) addresses the need to take one input row and produce multiple output rows (a one-to-many mapping).

 

2. Writing the UDTF you need

Extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF and implement the initialize, process, and close methods.

Hive first calls initialize, which returns a description of the rows the UDTF will produce (the number of columns and their types).

After initialization, Hive calls process for each input row. Inside process, every call to forward() emits one output row. To emit a row with multiple columns, put the column values in an array and pass that array to forward().

Finally, Hive calls close() so the function can release anything that needs to be cleaned up.
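The call order described above can be sketched in plain Java without any Hive dependency. This is only an illustration: the class UdtfLifecycleSketch, its rows() helper, and the in-memory list that stands in for Hive's downstream operator are all hypothetical names, not part of Hive's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the UDTF lifecycle; none of these names are Hive's API.
public class UdtfLifecycleSketch {

    // Rows emitted so far. In Hive, forward() hands rows to the next operator instead.
    private final List<String[]> emitted = new ArrayList<>();

    // Step 1: describe the output row (here, just the column names).
    public String[] initialize() {
        return new String[] {"col1", "col2"};
    }

    // Step 2: called once per input row; each forward() call emits one output row.
    public void process(String input) {
        for (String pair : input.split(";")) {
            String[] parts = pair.split(":");
            if (parts.length == 2) {
                forward(parts); // one array = one multi-column row
            }
        }
    }

    // Step 3: called once after the last input row.
    public void close() {
        // Nothing to release in this sketch.
    }

    private void forward(String[] row) {
        emitted.add(row);
    }

    public List<String[]> rows() {
        return emitted;
    }

    public static void main(String[] args) {
        UdtfLifecycleSketch udtf = new UdtfLifecycleSketch();
        String[] columns = udtf.initialize();
        udtf.process("k1:v1;k2:v2"); // first input row, emits two output rows
        udtf.process("k3:v3");       // second input row, emits one output row
        udtf.close();
        System.out.println(String.join(",", columns));
        for (String[] row : udtf.rows()) {
            System.out.println(row[0] + "," + row[1]);
        }
    }
}
```

Two input rows yield three output rows here, which is the one-to-many behaviour a UDTF exists to provide.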

The following example splits a string made of "key:value;" pairs and returns each key and value as a two-column row. For reference:

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class ExplodeMap extends GenericUDTF {

    @Override
    public void close() throws HiveException {
        // Nothing to clean up.
    }

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args)
            throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("ExplodeMap takes only one argument");
        }
        if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("ExplodeMap takes string as a parameter");
        }

        // Describe the output row: two string columns, col1 and col2.
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("col1");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("col2");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        String input = args[0].toString();
        String[] test = input.split(";");
        for (int i = 0; i < test.length; i++) {
            try {
                // Each "key:value" segment becomes one output row.
                String[] result = test[i].split(":");
                forward(result);
            } catch (Exception e) {
                // Skip segments that cannot be forwarded.
                continue;
            }
        }
    }
}
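The splitting step inside process can be tried on its own, outside Hive. The sketch below is a stricter variant, not the author's code: the class KeyValueParseSketch and its parse method are hypothetical, and the explicit length check together with the split limit of 2 are assumptions added here so that malformed segments are skipped deliberately instead of relying on a try/catch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that isolates the "key:value;" parsing; not part of Hive.
public class KeyValueParseSketch {

    // Splits "k1:v1;k2:v2;" into [key, value] pairs, skipping malformed segments.
    public static List<String[]> parse(String input) {
        List<String[]> rows = new ArrayList<>();
        for (String segment : input.split(";")) {
            // A limit of 2 keeps any later ':' inside the value.
            String[] parts = segment.split(":", 2);
            if (parts.length == 2 && !parts[0].isEmpty()) {
                rows.add(parts);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // "b" has no value and is skipped; "c:3:4" keeps "3:4" as its value.
        for (String[] row : parse("a:1;b;c:3:4;")) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

Whether to drop a segment such as "b" or emit it with a null value is a design choice; this sketch drops it so that every emitted row has exactly the two columns declared in initialize.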

 

3. Usage

A UDTF can be used in two ways: directly in the select list, or together with lateral view.

 

1: Used directly in select

 
select explode_map(properties) as (col1, col2) from src;

No other fields can be added to the select list:

 
select a, explode_map(properties) as (col1, col2) from src

Nested calls are not allowed:

 
select explode_map(explode_map(properties)) from src

It cannot be used together with group by / cluster by / distribute by / sort by:

select explode_map(properties) as (col1, col2) from src group by col1, col2

 

2: Used with lateral view

 
select src.id, mytable.col1, mytable.col2 from src lateral view explode_map(properties) mytable as col1, col2;

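This form is more convenient for everyday use, because other columns of src can be selected alongside the generated ones. Conceptually, Hive applies the UDTF to each row of src and joins the generated rows back to the source row that produced them, forming a virtual table (mytable here).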
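As a rough illustration of how a lateral view pairs each source row with the rows generated from it, the plain-Java sketch below emulates the query above on an in-memory map. The class LateralViewSketch, the lateralView method, and the sample data are hypothetical; this is not how Hive implements it internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical emulation of a lateral view over an in-memory "src" table.
public class LateralViewSketch {

    // Emulates: select src.id, mytable.col1, mytable.col2
    //           from src lateral view explode_map(properties) mytable as col1, col2
    // The map plays the role of src: id -> properties string.
    public static List<String> lateralView(Map<Integer, String> src) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> row : src.entrySet()) {
            for (String pair : row.getValue().split(";")) {
                String[] kv = pair.split(":");
                if (kv.length == 2) {
                    // Each generated row is joined back to the source row that produced it.
                    out.add(row.getKey() + "\t" + kv[0] + "\t" + kv[1]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> src = new LinkedHashMap<>();
        src.put(1, "color:red;size:L");
        src.put(2, "color:blue");
        lateralView(src).forEach(System.out::println);
    }
}
```

The source row with id 1 contributes two output rows and the row with id 2 contributes one, each carrying its own id next to the generated columns.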

 

References

http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF
http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF
http://www.slideshare.net/pauly1/userdefined-table-generating-functions

 

Reposted from: http://blog.csdn.net/tylgoodluck/article/details/7003083
