When analyzing logs, Hive is used on top of the Hadoop framework. However, some of Hive's built-in log-processing functions are insufficient, so a UDF (user-defined function) is needed to extend them.
1. Create a Java project named hiveudf in Eclipse, then create a class named UDFLower in the package com.afan.
2. Add the two JAR libraries hadoop-0.20.2-core.jar and hive-exec-0.7.0-cdh3u0.jar to the project's build path.
3. Write the code:

package com.afan;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class UDFLower extends UDF {
    // Hive calls evaluate() once per row; returning null propagates NULL.
    public Text evaluate(final Text s) {
        if (null == s) {
            return null;
        }
        return new Text(s.toString().toLowerCase());
    }
}
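Before exporting the JAR, the evaluate() method can be exercised directly from a plain main() method. This is just a quick local sketch (the class UDFLowerTest is made up here; only the Hadoop core JAR is needed on the classpath):

package com.afan;

import org.apache.hadoop.io.Text;

// Hypothetical quick check, not one of the tutorial's steps.
public class UDFLowerTest {
    public static void main(String[] args) {
        UDFLower udf = new UDFLower();
        System.out.println(udf.evaluate(new Text("HELLO")));  // prints: hello
        System.out.println(udf.evaluate(null));               // prints: null
    }
}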
4. Compile the class and export it as the JAR file udf_hive.jar.
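If you prefer the command line to Eclipse's export wizard, the same JAR can be produced roughly as follows (a sketch that assumes the two library JARs and the com/afan source tree sit in the current directory):

javac -cp hadoop-0.20.2-core.jar:hive-exec-0.7.0-cdh3u0.jar com/afan/UDFLower.java
jar cf udf_hive.jar com/afan/UDFLower.class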
5. Put udf_hive.jar into a folder on the Linux machine; here the path is /home/udf/udf_hive.jar.
6. Open the Hive command line and test:
hive> add jar /home/udf/udf_hive.jar;
Added udf_hive.jar to class path
Added resource: udf_hive.jar
Create the UDF:
hive> create temporary function my_lower as 'com.afan.UDFLower';
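Note that a temporary function only lasts for the current Hive session. If it needs to be replaced (for example after rebuilding the JAR), it can be dropped and recreated:

hive> drop temporary function my_lower;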
Create test data:
hive> create table dual (info string);
Import the data file data.txt, whose content is:
Who
AM
I
Hello
hive> load data local inpath '/home/data/data.txt' into table dual;
hive> select info from dual;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201105150525_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201105150525_0003
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201105150525_0003
06:46:05,459 Stage-1 map = 0%, reduce = 0%
06:46:10,905 Stage-1 map = 100%, reduce = 0%
06:46:13,963 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201105150525_0003
OK
Who
AM
I
Hello
Use the UDF:
hive> select my_lower(info) from dual;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201105150525_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201105150525_0002
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201105150525_0002
06:43:26,100 Stage-1 map = 0%, reduce = 0%
06:43:34,364 Stage-1 map = 100%, reduce = 0%
06:43:37,484 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201105150525_0002
OK
who
am
i
hello
The test passed successfully.
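Since the temporary function must be re-registered in every new session, a common convenience is to put the two registration commands into a .hiverc file in the home directory, which the Hive CLI executes at startup (assuming your Hive build supports .hiverc):

add jar /home/udf/udf_hive.jar;
create temporary function my_lower as 'com.afan.UDFLower';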
Reference: http://landyer.iteye.com/blog/1070377