Hive Custom Function UDAF development
Hive supports custom functions. A UDAF (user-defined aggregate function) accepts multiple rows and outputs a single row, so it is usually used together with GROUP BY.
The best learning materials are the official examples. I'm using Hive 0.10 here, so the relevant examples are in https://github.com/apache/hive/tree/branch-0.10/contrib/src/java/org/apache/hadoop/hive/contrib/udaf/example
The functional requirement I have here is: ActionCount(act_code, act_times, '1')
If act_code equals the third argument ('1' here), the act_times values of the rows in a group are added together.
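To make the requirement concrete, here is a minimal sketch in plain Java of what the function is expected to compute per group. It does not use Hive at all: the Row class, the userId grouping column, and the sample values are all hypothetical and only serve as an illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ActionCountSemantics {
    // Hypothetical input row: (group key, act_code, act_times).
    static class Row {
        final String userId;
        final String actCode;
        final long actTimes;

        Row(String userId, String actCode, long actTimes) {
            this.userId = userId;
            this.actCode = actCode;
            this.actTimes = actTimes;
        }
    }

    // For each group, sum act_times over the rows whose act_code equals actType.
    // Groups with no matching row end up with 0.
    static Map<String, Long> actionCount(List<Row> rows, String actType) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Row r : rows) {
            long add = actType.equals(r.actCode) ? r.actTimes : 0L;
            totals.merge(r.userId, add, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("u1", "1", 3),
            new Row("u1", "2", 5),
            new Row("u1", "1", 4),
            new Row("u2", "2", 7));
        System.out.println(actionCount(rows, "1")); // prints {u1=7, u2=0}
    }
}
```

In Hive the grouping is done by GROUP BY, and the UDAF only sees the rows of one group at a time.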
package hive.udaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

/**
 * It should be very easy to follow and can be used as an example for writing
 * new UDAFs.
 *
 * Note that Hive internally uses a different mechanism (called GenericUDAF) to
 * implement built-in aggregation functions, which is harder to program but
 * more efficient.
 */
public final class ActionCount extends UDAF {

  /**
   * The internal state of the aggregation: the running sum of act_times and
   * the number of matching rows.
   *
   * Note that this is only needed if the internal state cannot be represented
   * by a primitive.
   *
   * The internal state can also contain fields with types like
   * ArrayList<String> and HashMap<String,Double> if needed.
   */
  public static class UDAFState {
    private long mCount;
    private long mSum;
  }

  /**
   * The actual class for doing the aggregation. Hive will automatically look
   * for all internal classes of the UDAF that implement UDAFEvaluator.
   */
  public static class UDAFExampleAvgEvaluator implements UDAFEvaluator {

    UDAFState state;

    public UDAFExampleAvgEvaluator() {
      super();
      state = new UDAFState();
      init();
    }

    /**
     * Reset the state of the aggregation.
     */
    public void init() {
      state.mSum = 0;
      state.mCount = 0;
    }

    /**
     * Iterate through one row of original data.
     *
     * The number and type of arguments need to be the same as when we call
     * this UDAF from the Hive command line.
     *
     * This function should always return true.
     */
    public boolean iterate(String act_code, long act_times, String act_type) { // one input row
      // Guard against a NULL act_code, which would otherwise throw a NullPointerException.
      if (act_code != null && act_code.equals(act_type)) {
        state.mSum += act_times;
        state.mCount++;
      }
      return true;
    }

    /**
     * Terminate a partial aggregation and return the state. If the state is a
     * primitive, just return primitive Java classes like Integer or String.
     */
    public UDAFState terminatePartial() { // pass the partial state on
      // Return null when no row matched, so that merge() can skip this partial.
      return state.mCount == 0 ? null : state;
    }

    /**
     * Merge with a partial aggregation.
     *
     * This function should always have a single argument which has the same
     * type as the return value of terminatePartial().
     */
    public boolean merge(UDAFState o) { // merge the results of the sub-tasks
      if (o != null) {
        state.mSum += o.mSum;
        state.mCount += o.mCount;
      }
      return true;
    }

    /**
     * Terminate the aggregation and return the final result.
     */
    public long terminate() { // return the final result
      // Unlike the average example, a group with no matching rows yields 0, not null.
      return state.mCount == 0 ? 0 : state.mSum;
    }
  }

  private ActionCount() {
    // prevent instantiation
  }
}
The key is to understand the MapReduce execution model thoroughly; that is what lets you make good use of Hive.
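As a rough illustration of how the evaluator methods line up with the MapReduce phases, the sketch below replays the lifecycle in plain Java without any Hive classes. The State and Evaluator classes are simplified stand-ins for the UDAF above, and the split of rows across two map tasks is an assumption made for the example; the real scheduling is decided by Hive.

```java
class UdafLifecycleSketch {
    // Stand-in for the UDAF's partial state (not a Hive class).
    static class State {
        long sum;
        long count;
    }

    // Stand-in evaluator with the same five methods as the UDAF above.
    static class Evaluator {
        State state = new State();

        void init() {
            state.sum = 0;
            state.count = 0;
        }

        boolean iterate(String actCode, long actTimes, String actType) {
            if (actCode != null && actCode.equals(actType)) {
                state.sum += actTimes;
                state.count++;
            }
            return true;
        }

        State terminatePartial() {
            return state.count == 0 ? null : state;
        }

        boolean merge(State o) {
            if (o != null) {
                state.sum += o.sum;
                state.count += o.count;
            }
            return true;
        }

        long terminate() {
            return state.count == 0 ? 0 : state.sum;
        }
    }

    static long run() {
        // Map side: each task calls iterate() on the rows it happens to read.
        Evaluator mapA = new Evaluator();
        mapA.init();
        mapA.iterate("1", 3, "1");
        mapA.iterate("2", 5, "1"); // act_code does not match, ignored

        Evaluator mapB = new Evaluator();
        mapB.init();
        mapB.iterate("1", 4, "1");

        // Reduce side: partial states for the same group are merged,
        // then terminate() produces the final value.
        Evaluator reducer = new Evaluator();
        reducer.init();
        reducer.merge(mapA.terminatePartial());
        reducer.merge(mapB.terminatePartial());
        return reducer.terminate();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 7
    }
}
```

The point of the sketch is that iterate() never sees all rows of a group in one place, which is why the partial state has to carry everything merge() needs.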
The author of this article: Linger
This article link: http://blog.csdn.net/lingerlanlan/article/details/41920151