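While doing log analysis I used Hive, part of the Hadoop stack. Hive's built-in functions could not handle some Tlog records well, so a UDF (user-defined function) was needed to extend them.
1 In Eclipse, create a Java project named hiveudf, then create a class with package (com.afan) and name (UDFLower)
2 Add two JAR libraries to the project: hadoop-0.20.2-core.jar and hive-exec-0.7.0-cdh3u0.jar
3 Write the code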
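package com.afan;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that lowercases a string column.
public class UDFLower extends UDF {
    // Hive calls evaluate() once per row; a NULL input yields a NULL output.
    public Text evaluate(final Text s) {
        if (null == s) {
            return null;
        }
        return new Text(s.toString().toLowerCase());
    }
}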
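The null handling and lowercasing in evaluate can be checked locally before packaging the jar. The following is only a minimal sketch, not the UDF itself: the class LowerSketch and the method lower are hypothetical names, and String stands in for org.apache.hadoop.io.Text so that the example runs without the Hadoop and Hive jars. It also assumes Locale.ROOT is wanted, which differs from the no-argument toLowerCase() in the code above; that version uses the JVM default locale.

```java
import java.util.Locale;

// Hypothetical stand-in for UDFLower: String replaces Text so no Hadoop/Hive jars are needed.
class LowerSketch {
    // Same null-safe shape as UDFLower.evaluate.
    static String lower(String s) {
        if (s == null) {
            return null;
        }
        // Assumption: Locale.ROOT keeps the result independent of the JVM default locale.
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The four rows from data.txt, plus a null to exercise the null branch.
        String[] rows = {"WHO", "AM", "I", "HELLO", null};
        for (String row : rows) {
            System.out.println(lower(row));
        }
    }
}
```

Running main prints the lowercased rows followed by the string null for the last entry. The explicit locale matters mainly on JVMs whose default locale has special casing rules (Turkish, for example, lowercases "I" to a dotless "ı").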
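4 Compile and export the package as udf_hive.jar
5 Copy udf_hive.jar to a directory on the configured Linux system; the path used here is /home/udf/udf_hive.jar
6 Open the Hive command line and test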
hive> add jar /home/udf/udf_hive.jar;
Added udf_hive.jar to class path
Added resource: udf_hive.jar
Create the UDF function
hive> create temporary function my_lower as 'com.afan.UDFLower';
Create the test data
hive> create table dual (info string);
Load the data file data.txt
The contents of data.txt are:
WHO
AM
I
HELLO
hive> load data local inpath '/home/data/data.txt' into table dual;
hive> select info from dual;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201105150525_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201105150525_0003
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201105150525_0003
2011-05-15 06:46:05,459 Stage-1 map = 0%, reduce = 0%
2011-05-15 06:46:10,905 Stage-1 map = 100%, reduce = 0%
2011-05-15 06:46:13,963 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201105150525_0003
OK
WHO
AM
I
HELLO
Use the UDF function
hive> select my_lower(info) from dual;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201105150525_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201105150525_0002
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201105150525_0002
2011-05-15 06:43:26,100 Stage-1 map = 0%, reduce = 0%
2011-05-15 06:43:34,364 Stage-1 map = 100%, reduce = 0%
2011-05-15 06:43:37,484 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201105150525_0002
OK
who
am
i
hello
The test passed.
Reference article: http://landyer.iteye.com/blog/1070377