Hive Custom Function UDAF development


Hive supports custom functions. A UDAF (user-defined aggregate function) accepts multiple rows and outputs a single row, so it is usually used together with GROUP BY.
In fact, the best learning materials are the official examples. I'm using Hive 0.10 here, so the relevant examples are at https://github.com/apache/hive/tree/branch-0.10/contrib/src/java/org/apache/hadoop/hive/contrib/udaf/example

The function I need here is: ActionCount(act_code, act_times, '1')
If act_code == '1', the act_times values within a group are added together.
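To make the requirement concrete, here is a minimal plain-Java sketch of the intended result, with no Hive dependency. The class names, the groupKey field, and the sample rows are hypothetical and only serve the illustration; in Hive itself the grouping would come from the GROUP BY clause.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ActionCountSemantics {

    // Hypothetical stand-in for one input row.
    static final class ActionRow {
        final String groupKey; // assumed GROUP BY column, not named in the article
        final String actCode;
        final long actTimes;

        ActionRow(String groupKey, String actCode, long actTimes) {
            this.groupKey = groupKey;
            this.actCode = actCode;
            this.actTimes = actTimes;
        }
    }

    // For each group, sum actTimes over the rows whose actCode equals wanted.
    // Groups without a matching row end up with 0.
    static Map<String, Long> actionCount(List<ActionRow> rows, String wanted) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (ActionRow row : rows) {
            long add = wanted.equals(row.actCode) ? row.actTimes : 0L;
            totals.merge(row.groupKey, add, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<ActionRow> rows = Arrays.asList(
            new ActionRow("u1", "1", 3),
            new ActionRow("u1", "2", 5),
            new ActionRow("u1", "1", 4),
            new ActionRow("u2", "2", 9));
        System.out.println(actionCount(rows, "1")); // prints {u1=7, u2=0}
    }
}
```

Only the rows with act_code '1' contribute, so group u1 yields 3 + 4 = 7 and group u2 yields 0.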
package hive.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

/**
 * It should be very easy to follow and can be used as an example for writing
 * new UDAFs.
 *
 * Note that Hive internally uses a different mechanism (called GenericUDAF) to
 * implement built-in aggregation functions, which is harder to program but
 * more efficient.
 */
public final class ActionCount extends UDAF {

  /**
   * The internal state of the aggregation.
   *
   * Note that this is only needed if the internal state cannot be represented
   * by a primitive.
   *
   * The internal state can also contain fields with types like
   * ArrayList<String> and HashMap<String,Double> if needed.
   */
  public static class UDAFState {
    private long mCount;
    private long mSum;
  }

  /**
   * The actual class for doing the aggregation. Hive will automatically look
   * for all internal classes of the UDAF that implement UDAFEvaluator.
   */
  public static class UDAFExampleAvgEvaluator implements UDAFEvaluator {

    UDAFState state;

    public UDAFExampleAvgEvaluator() {
      super();
      state = new UDAFState();
      init();
    }

    /**
     * Reset the state of the aggregation.
     */
    public void init() {
      state.mSum = 0;
      state.mCount = 0;
    }

    /**
     * Iterate through one row of original data.
     *
     * The number and type of arguments need to be the same as when we call
     * this UDAF from the Hive command line.
     *
     * This function should always return true.
     */
    public boolean iterate(String act_code, long act_times, String act_type) { // one row
      if (act_code.equals(act_type)) {
        state.mSum += act_times;
        state.mCount++;
      }
      return true;
    }

    /**
     * Terminate a partial aggregation and return the state. If the state is a
     * primitive, just return primitive Java classes like Integer or String.
     */
    public UDAFState terminatePartial() { // pass the state on
      // No matching rows seen: hand over null instead of an empty state.
      return state.mCount == 0 ? null : state;
    }

    /**
     * Merge with a partial aggregation.
     *
     * This function should always have a single argument which has the same
     * type as the return value of terminatePartial().
     */
    public boolean merge(UDAFState o) { // merge the results of sub-tasks
      if (o != null) {
        state.mSum += o.mSum;
        state.mCount += o.mCount;
      }
      return true;
    }

    /**
     * Terminate the aggregation and return the final result.
     */
    public long terminate() { // return the final result
      // Unlike the official average example, return 0 rather than null
      // when no row matched.
      return state.mCount == 0 ? 0 : state.mSum;
    }
  }

  private ActionCount() {
    // prevent instantiation
  }
}

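The key is a solid understanding of the MapReduce execution model: iterate() and terminatePartial() run on the partial (map) side, while merge() and terminate() combine the partial states into the final result. With that model in mind, Hive is much easier to make good use of.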
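The way the evaluator methods line up with the map and reduce phases can be sketched without Hive at all. The following is a simplified simulation under stated assumptions: the class UdafLifecycleSketch, its Row type, and the helpers mapSide and reduceSide are hypothetical names, and Hive's real scheduling of these calls may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

final class UdafLifecycleSketch {

    // Hypothetical input row: just the two columns the function reads.
    static final class Row {
        final String actCode;
        final long actTimes;

        Row(String actCode, long actTimes) {
            this.actCode = actCode;
            this.actTimes = actTimes;
        }
    }

    // Partial state handed from the map side to the reduce side.
    static final class State {
        long count;
        long sum;
    }

    // Mirrors the evaluator's method names but has no Hive dependency.
    static final class Evaluator {
        private State state = new State();

        void init() {
            state = new State();
        }

        boolean iterate(String actCode, long actTimes, String actType) {
            // Null check added for the sketch; it is not in the article's code.
            if (actCode != null && actCode.equals(actType)) {
                state.sum += actTimes;
                state.count++;
            }
            return true;
        }

        State terminatePartial() {
            return state.count == 0 ? null : state;
        }

        boolean merge(State other) {
            if (other != null) {
                state.sum += other.sum;
                state.count += other.count;
            }
            return true;
        }

        long terminate() {
            return state.count == 0 ? 0 : state.sum;
        }
    }

    // Simulated map task: iterate over one split, then emit a partial state.
    static State mapSide(List<Row> split, String actType) {
        Evaluator e = new Evaluator();
        e.init();
        for (Row r : split) {
            e.iterate(r.actCode, r.actTimes, actType);
        }
        return e.terminatePartial();
    }

    // Simulated reduce task: merge all partial states, then produce the result.
    static long reduceSide(List<State> partials) {
        Evaluator e = new Evaluator();
        e.init();
        for (State p : partials) {
            e.merge(p);
        }
        return e.terminate();
    }

    public static void main(String[] args) {
        State p1 = mapSide(Arrays.asList(new Row("1", 3), new Row("2", 5)), "1");
        State p2 = mapSide(Arrays.asList(new Row("1", 4)), "1");
        State p3 = mapSide(Arrays.asList(new Row("2", 9)), "1"); // no match, so null
        System.out.println(reduceSide(Arrays.asList(p1, p2, p3))); // prints 7
    }
}
```

Because a split without matching rows yields a null partial state, merge() has to tolerate null, which is why the null check in merge() matters.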
The author of this article: Linger
This article link: http://blog.csdn.net/lingerlanlan/article/details/41920151


