Detailed process of implementing a UDF in Hive

Hive UDFs are similar to the user-defined functions found in databases such as MySQL.

However, they must be written in Java rather than in traditional SQL.

To implement a UDF, follow these steps:

  1. Implement a Java class that extends UDF and defines an evaluate method
  2. Package the compiled class into a JAR and add it to the Hive classpath
  3. Create a temporary function from the class and call it in a SELECT statement
  4. Drop the temporary function when it is no longer needed

The UDF below is a function I wrote for working with Hive arrays.

It determines whether an array contains a given value. This function is not among the standard functions of the Hive version used here (later Hive releases include a built-in array_contains).

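// Package name inferred from the class name used in the CREATE TEMPORARY FUNCTION statement.
package com.sohu.hadoop.hive.udf;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.Text;

// Returns true if the array contains the given element, false otherwise.
public final class ArrayContains extends UDF {
    public BooleanWritable evaluate(ArrayList<String> arr, Text ele) {
        // A missing or empty array, or a missing element, never matches.
        if (arr == null || arr.isEmpty() || ele == null) {
            return new BooleanWritable(false);
        }
        String cstr = ele.toString();
        for (String str : arr) {
            // Calling equals on cstr keeps the comparison safe for null entries.
            if (cstr.equals(str)) {
                return new BooleanWritable(true);
            }
        }
        return new BooleanWritable(false);
    }
}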
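The matching logic can be tried out without a Hive or Hadoop installation. The following is only a minimal sketch: ArrayContainsSketch and its contains method are hypothetical names, and plain String and boolean stand in for Text and BooleanWritable. It assumes that a null array, a null element, or a null entry should simply count as "no match".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone helper (not part of the original UDF): it repeats
// the membership check with plain Java types so it runs without Hive or Hadoop.
public final class ArrayContainsSketch {

    // Returns true only if arr is non-empty and holds a string equal to ele.
    public static boolean contains(List<String> arr, String ele) {
        if (arr == null || arr.isEmpty() || ele == null) {
            return false; // assumption: missing input means "no match"
        }
        for (String str : arr) {
            if (ele.equals(str)) { // null-safe, because ele is known to be non-null
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> channelIds = Arrays.asList("1", "2", "3");
        System.out.println(contains(channelIds, "2"));  // true
        System.out.println(contains(channelIds, "2 ")); // false: the match is exact
        System.out.println(contains(null, "2"));        // false
    }
}
```

The comparison is an exact string match, so a stray space in the searched value (for example '2 ' instead of '2') means no rows are matched.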

Then compile and package it:

javac -classpath /opt/hadoop_client/hadoop/hadoop-0.20.2+228-core.jar:/opt/hadoop_client/hive/lib/hive-exec-0.5.0.jar src/com/sohu/hadoop/hive/udf/ArrayContains.java -d build
jar -cvf hadooop-mc-udf.jar -C build .

Finally, run the HiveQL query:

hive -e "add jar /opt/fe-/UDF/hadooop-mc-udf.jar; drop temporary function array_contains; create temporary function array_contains as 'com.sohu.hadoop.hive.udf.ArrayContains'; select SUV, channelid from pvlog_pre where array_contains(channelid, '2')"

