1. UDTF Introduction
UDTF (User-Defined Table-Generating Function) addresses the need to take one input row and produce multiple output rows (a one-to-many mapping).
2. Writing a UDTF
Extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF and implement the initialize, process, and close methods.
Hive first calls initialize, which returns a description of the rows the UDTF will produce (the number and types of the columns).
After initialization, process is called for each input row. Inside process, every call to forward() produces one output row. If a row has multiple columns, put the column values in an array and pass the array to forward().
Finally, close() is called so the UDTF can release any resources it holds.
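To make the call order concrete, here is a minimal plain-Java model of it that runs without Hive on the classpath. MiniUDTF, CharsUDTF and the run() driver are hypothetical names used only for this sketch; they mimic the roles of initialize, process, forward and close, but they are not Hive's API, and in Hive the driver is Hive's own operator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the UDTF call order. MiniUDTF is a hypothetical
// stand-in for Hive's GenericUDTF; it is not the Hive API.
class UdtfLifecycleSketch {

    abstract static class MiniUDTF {
        final List<String> calls = new ArrayList<>();    // records the call order
        final List<Object[]> output = new ArrayList<>(); // rows passed to forward()

        abstract List<String> initialize();  // describes the output columns
        abstract void process(Object[] args); // called once per input row
        abstract void close();                // called once, after the last row

        // Each forward() call emits exactly one output row.
        protected final void forward(Object[] row) {
            output.add(row);
        }

        // Hypothetical driver: initialize once, process each row, close once.
        final List<Object[]> run(List<Object[]> inputRows) {
            calls.add("initialize");
            initialize();
            for (Object[] args : inputRows) {
                calls.add("process");
                process(args);
            }
            calls.add("close");
            close();
            return output;
        }
    }

    // One input string -> one output row per character (one-to-many).
    static class CharsUDTF extends MiniUDTF {
        @Override
        List<String> initialize() {
            return Arrays.asList("ch");
        }

        @Override
        void process(Object[] args) {
            for (char c : args[0].toString().toCharArray()) {
                forward(new Object[] {String.valueOf(c)});
            }
        }

        @Override
        void close() {
            // Nothing to release in this sketch.
        }
    }

    public static void main(String[] argv) {
        CharsUDTF udtf = new CharsUDTF();
        List<Object[]> rows = udtf.run(Arrays.asList(new Object[] {"ab"}, new Object[] {"c"}));
        System.out.println(udtf.calls);              // [initialize, process, process, close]
        rows.forEach(r -> System.out.println(r[0])); // a, b, c on separate lines
    }
}
```

Two input rows become three output rows, and initialize and close each run exactly once regardless of how many rows are processed.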
The following example splits a string of the form "key1:value1;key2:value2;" and returns each key and its value as one row, for reference:
```java
import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class ExplodeMap extends GenericUDTF {

    @Override
    public void close() throws HiveException {
        // Nothing to clean up.
    }

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("ExplodeMap takes only one argument");
        }
        if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("ExplodeMap takes string as a parameter");
        }

        // Output schema: two string columns, col1 (key) and col2 (value).
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("col1");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("col2");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        if (args[0] == null) {
            return; // a NULL input produces no rows
        }
        String input = args[0].toString();
        for (String pair : input.split(";")) {
            // Split on the first ':' only, and skip entries that are not "key:value".
            String[] result = pair.split(":", 2);
            if (result.length == 2) {
                forward(result); // one output row per key/value pair
            }
        }
    }
}
```
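Because the class above needs Hive's jars to compile, the string-splitting part of process() can be checked on its own with a small plain-Java harness. ExplodeMapLogic and its explode() method are hypothetical names; the harness collects rows in a list where the UDTF would call forward(). It makes two choices of its own, which are assumptions rather than something the original states: each entry is split on the first ':' only (so values may contain ':'), and entries that are not key:value pairs are skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java harness for the key:value splitting done in process().
// Runs without Hive: rows are collected in a list instead of forward().
class ExplodeMapLogic {

    // Returns one {key, value} row per "key:value" entry in the input.
    static List<String[]> explode(String input) {
        List<String[]> rows = new ArrayList<>();
        if (input == null) {
            return rows; // a NULL column produces no rows
        }
        for (String entry : input.split(";")) {
            // Limit 2: split on the first ':' only, so values may contain ':'.
            String[] kv = entry.split(":", 2);
            if (kv.length == 2) {
                rows.add(kv); // malformed entries such as "broken" are skipped
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : explode("name:tom;age:20;url:http://x;broken;")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

For the sample input it prints three tab-separated rows (name/tom, age/20, url/http://x); the malformed entry and the trailing ';' produce nothing.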
3. Usage
A UDTF can be used either directly in the SELECT list or together with LATERAL VIEW.
1: Used directly in SELECT
SELECT explode_map(properties) AS (col1, col2) FROM src;
In this form no other columns can be selected alongside the UDTF, so the following is not allowed:
SELECT a, explode_map(properties) AS (col1, col2) FROM src
UDTF calls cannot be nested, so the following is not allowed:
SELECT explode_map(explode_map(properties)) FROM src
The UDTF cannot be combined with GROUP BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY, so the following is not allowed:
SELECT explode_map(properties) AS (col1, col2) FROM src GROUP BY col1, col2
2: Used with LATERAL VIEW
SELECT src.id, mytable.col1, mytable.col2 FROM src LATERAL VIEW explode_map(properties) mytable AS col1, col2;
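This form is more convenient for everyday use. Internally, each input row is sent down two paths, one that keeps the original columns of src and one that runs the UDTF, and every row generated by the UDTF is then joined back to the input row it came from, forming a virtual table with the alias mytable.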
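The row-by-row join that LATERAL VIEW performs can be modeled in a few lines of plain Java. This is a conceptual sketch, not Hive's implementation: SrcRow, explodeMap() and lateralView() are hypothetical names, src is an in-memory list, and explodeMap() assumes the same key:value splitting rules as a plain-Java version of the UDTF (first ':' only, malformed entries skipped).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Models: SELECT src.id, mytable.col1, mytable.col2
//         FROM src LATERAL VIEW explode_map(properties) mytable AS col1, col2;
// All names here are hypothetical; this is not Hive code.
class LateralViewSketch {

    static final class SrcRow {
        final int id;
        final String properties;

        SrcRow(int id, String properties) {
            this.id = id;
            this.properties = properties;
        }
    }

    // Stand-in for explode_map: one {key, value} row per "key:value" entry.
    static List<String[]> explodeMap(String input) {
        List<String[]> rows = new ArrayList<>();
        if (input == null) {
            return rows;
        }
        for (String entry : input.split(";")) {
            String[] kv = entry.split(":", 2);
            if (kv.length == 2) {
                rows.add(kv);
            }
        }
        return rows;
    }

    // For each base row, run the UDTF and join every generated row back to it.
    static List<String> lateralView(List<SrcRow> src) {
        List<String> out = new ArrayList<>();
        for (SrcRow row : src) {
            for (String[] gen : explodeMap(row.properties)) {
                out.add(row.id + "," + gen[0] + "," + gen[1]);
            }
            // A base row whose UDTF output is empty contributes no rows.
        }
        return out;
    }

    public static void main(String[] args) {
        List<SrcRow> src = Arrays.asList(
                new SrcRow(1, "a:1;b:2"),
                new SrcRow(2, "c:3"),
                new SrcRow(3, ""));
        lateralView(src).forEach(System.out::println);
    }
}
```

Running main prints 1,a,1 then 1,b,2 then 2,c,3. Row 3 disappears because its properties string yields no pairs, which matches plain LATERAL VIEW dropping base rows for which the UDTF produces no output (LATERAL VIEW OUTER keeps them).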
Reposted from: http://blog.csdn.net/tylgoodluck/article/details/7003083