Hive UDFs are similar to the user-defined functions found in databases such as MySQL.
However, they have to be written in Java rather than in SQL.
To implement a UDF, follow these steps:
- Implement a Java class that extends UDF.
- Package the class into a JAR and add it to the Hive classpath.
- Create a temporary function from the class and call it in a SELECT.
- Drop the temporary function that was just created.
The UDF below is a function I added for working with Hive arrays.
It determines whether an array contains a given value. This function is not among the built-in functions of the Hive version used here (0.5.0).
package com.sohu.hadoop.hive.udf;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.Text;

// Hive UDF: returns true if the array contains the given value.
public final class ArrayContains extends UDF {

    public BooleanWritable evaluate(ArrayList<String> arr, Text ele) {
        // A null or empty array, or a null search value, never matches.
        if (arr == null || arr.isEmpty() || ele == null) {
            return new BooleanWritable(false);
        }
        String cstr = ele.toString();
        for (String str : arr) {
            // cstr.equals(str) tolerates null elements in the array.
            if (cstr.equals(str)) {
                return new BooleanWritable(true);
            }
        }
        return new BooleanWritable(false);
    }
}
Then compile the class and package it into a JAR:
javac -classpath /opt/hadoop_client/hadoop/hadoop-0.20.2+228-core.jar:/opt/hadoop_client/hive/lib/hive-exec-0.5.0.jar src/com/sohu/hadoop/hive/udf/ArrayContains.java -d build
jar -cvf hadooop-mc-udf.jar -C build .
Finally, run the HiveQL query:
hive -e "add jar /opt/fe-/udf/hadooop-mc-udf.jar; drop temporary function array_contains; create temporary function array_contains as 'com.sohu.hadoop.hive.udf.ArrayContains'; select suv, channelid from pvlog_pre where array_contains(channelid, '2')"
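Before deploying the JAR, the matching logic can be sanity-checked without Hive or Hadoop on the classpath. The following is only a minimal sketch: the class ArrayContainsCheck and its contains method are hypothetical names that are not part of the UDF above, and it assumes that a null array, an empty array, or a null search value should all yield false.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the UDF jar: mirrors the matching logic
// of ArrayContains.evaluate using plain Java types, so no Hive or Hadoop
// classes are needed to run it.
final class ArrayContainsCheck {

    // Returns true only if arr holds an element equal to ele.
    // A null or empty array, or a null search value, yields false (assumption).
    static boolean contains(List<String> arr, String ele) {
        if (arr == null || arr.isEmpty() || ele == null) {
            return false;
        }
        for (String str : arr) {
            // ele.equals(str) is safe even when the array holds a null element.
            if (ele.equals(str)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample data standing in for one row's channelid column.
        List<String> channelid = Arrays.asList("1", "2", null);
        System.out.println(contains(channelid, "2"));   // true
        System.out.println(contains(channelid, "3"));   // false
        System.out.println(contains(null, "2"));        // false
        System.out.println(contains(channelid, null));  // false
    }
}
```

Comparing with ele.equals(str) rather than str.equals(ele) means a null element inside the array is skipped instead of raising a NullPointerException partway through the loop.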