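A Hive UDF is much like a user-defined function in MySQL and similar databases.
The difference is that it is written in Java rather than in plain SQL.
Implementing a UDF takes the following steps:
- Implement a Java class that extends UDF
- Package it into a jar and add the jar to Hive's classpath
- Create the custom function and use it in a select
- Drop the temporary function you just created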
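The steps above map onto a short sequence of Hive QL statements. Here is a rough sketch of that sequence; it assumes the jar path, function name, class name and query from the example later in this post. `UdfSession` and its methods are hypothetical names made up for illustration and are not part of Hive. The helper only assembles statement text and never connects to Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: it only builds statement text, it does not talk to Hive.
final class UdfSession {

    // Returns the Hive QL statements for one UDF session, in execution order.
    static List<String> statements(String jarPath, String functionName,
                                   String className, String query) {
        List<String> stmts = new ArrayList<String>();
        // Put the jar on Hive's classpath.
        stmts.add("add jar " + jarPath);
        // Register the Java class under a function name.
        stmts.add("create temporary function " + functionName
                + " as '" + className + "'");
        // Run the query that uses the function.
        stmts.add(query);
        // Drop the temporary function again.
        stmts.add("drop temporary function " + functionName);
        return stmts;
    }

    // Joins the statements with ';' so they can be passed as a single script.
    static String script(List<String> stmts) {
        return String.join(";", stmts);
    }

    public static void main(String[] args) {
        List<String> stmts = statements(
                "/opt/ysz/udf/hadooop-mc-udf.jar",
                "array_contains",
                "com.sohu.hadoop.hive.udf.ArrayContains",
                "select suv,channelid from pvlog_pre where array_contains(channelid,'2')");
        System.out.println(script(stmts));
    }
}
```

The printed string is the kind of script that can be handed to `hive -e`. Keeping the statements in a list makes the order of the four steps explicit.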
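The UDF below is a function I added for Hive arrays.
It checks whether an array contains a given value; the Hive version used here (0.5.0) has no standard function for this.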
package com.sohu.hadoop.hive.udf;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.Text;

public final class ArrayContains extends UDF {

    // Returns true if arr contains an element equal to ele, false otherwise.
    public BooleanWritable evaluate(ArrayList<String> arr, Text ele) {
        // A null or empty array, or a null search value, never matches.
        if (arr == null || arr.isEmpty() || ele == null) {
            return new BooleanWritable(false);
        }
        String cstr = ele.toString();
        for (String str : arr) {
            // cstr is non-null here, so null elements are skipped safely.
            if (cstr.equals(str)) {
                return new BooleanWritable(true);
            }
        }
        return new BooleanWritable(false);
    }
}
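The evaluate method depends on Hadoop's Text and BooleanWritable, so it cannot be run without the Hadoop and Hive jars. For a quick look at the matching logic on its own, here is a minimal sketch that mirrors it with plain Java types. `ContainsCheck` is a hypothetical stand-in, not the UDF itself. Treating a null search value as "no match" and skipping null elements are assumptions about the intended behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the UDF's matching logic, using plain Java types
// so it runs without the Hadoop or Hive jars.
final class ContainsCheck {

    // True if arr holds an element equal to ele; null or empty input never matches.
    static boolean contains(List<String> arr, String ele) {
        if (arr == null || arr.isEmpty() || ele == null) {
            return false;
        }
        for (String str : arr) {
            // ele is non-null here, so a null element simply fails the comparison.
            if (ele.equals(str)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> channels = Arrays.asList("1", "2", "3");
        System.out.println(contains(channels, "2"));                         // true
        System.out.println(contains(channels, "9"));                         // false
        System.out.println(contains(Collections.<String>emptyList(), "2"));  // false
        System.out.println(contains(null, "2"));                             // false
    }
}
```

The comparison is an exact string match, so '2' matches only the element "2" and not, say, "12" or "20".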
Then compile and package it:
javac -classpath /opt/hadoop_client/hadoop/hadoop-0.20.2+228-core.jar:/opt/hadoop_client/hive/lib/hive-exec-0.5.0.jar src/com/sohu/hadoop/hive/udf/ArrayContains.java -d build
jar -cvf hadooop-mc-udf.jar -C build .
Finally, run the Hive QL query:
hive -e "add jar /opt/ysz/udf/hadooop-mc-udf.jar;drop temporary function array_contains;create temporary function array_contains as 'com.sohu.hadoop.hive.udf.ArrayContains';select suv,channelid from pvlog_pre where array_contains(channelid,'2')"