Hive's own query language, HQL, can handle most tasks, but when special requirements come up you need to write your own UDF. The following is a complete example.
1. Writing UDFs in Eclipse
① Add to the project all the jar files under Hive's lib directory, plus hadoop-common-2.5.1.jar from Hadoop's share directory (2.5.1 was the latest Hadoop version at the time of writing).
② The UDF class must extend org.apache.hadoop.hive.ql.exec.UDF and implement an evaluate method.
When the custom UDF is used in Hive, Hive invokes the class's evaluate method to perform the actual work.
③ Export the project as a jar file.
Note: The JDK used to build the project should match the JDK on the cluster.
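To illustrate step ②, here is a minimal sketch of how a framework can find and call a method named evaluate by reflection. It is not Hive's actual resolver, which is more involved. LengthUdf and callEvaluate are hypothetical names, and LengthUdf deliberately does not extend Hive's UDF class so that the sketch runs without any Hive jars.

```java
import java.lang.reflect.Method;

class EvaluateDispatchSketch {

    // Hypothetical stand-in for a UDF class. It does not extend Hive's UDF,
    // so the sketch compiles without any Hive jars.
    static class LengthUdf {
        public Integer evaluate(String s) {
            return s == null ? null : s.length();
        }
    }

    // Looks up a public method named "evaluate" that takes one parameter
    // of type argType, then invokes it on the given object.
    static Object callEvaluate(Object udf, Class<?> argType, Object arg) {
        try {
            Method m = udf.getClass().getMethod("evaluate", argType);
            return m.invoke(udf, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable evaluate method", e);
        }
    }

    public static void main(String[] args) {
        LengthUdf udf = new LengthUdf();
        System.out.println(callEvaluate(udf, String.class, "hive")); // prints 4
        System.out.println(callEvaluate(udf, String.class, null));   // prints null
    }
}
```

The point of the sketch is that the method name and parameter types are what matter: if the method is misspelled or its case is wrong (Evaluate instead of evaluate), a lookup like this fails.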
Detailed example:
package com.zx.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class UdfTestLength extends UDF {
    // Returns the length of s, or null when s is null.
    public Integer evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.length();
    }
}
To package the class above as a jar, I exported it directly from Eclipse as test-udf.jar and then placed it in the /root directory.
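One way to check the JDK note above is to look at the class-file version of a compiled class taken from the exported jar. The following is a minimal sketch, not part of the original workflow. It reads the class-file header, where major version 50 corresponds to Java 6, 51 to Java 7 and 52 to Java 8. ClassVersionSketch and majorVersion are hypothetical names, and the path passed on the command line is assumed to point at a .class file extracted from the jar.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

class ClassVersionSketch {

    // Returns the class-file major version stored in the header bytes.
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Missing class-file magic number");
        }
        buf.getShort();                 // minor version, ignored here
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassVersionSketch <path-to-.class-file>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("class-file major version: " + majorVersion(bytes));
    }
}
```

If the reported version is newer than what the cluster's JDK supports, the UDF class will likely fail to load there, so the project's compiler level should be lowered before exporting again.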
2. Calling the custom function:
① Add the jar (run in the Hive command line)
hive> add jar /root/test-udf.jar;
② Create a temporary function. It is no longer available once the Hive command line is closed.
hive> create temporary function testlength as 'com.zx.hive.udf.UdfTestLength';
③ Call the function
hive> select id, name, testlength(name) from student;
④ Save the query results to HDFS
hive> create table result row format delimited fields terminated by '\t' as select id, testlength(nation) from student;
(Please credit the source when reprinting. More content at: http://blog.csdn.net/hwwn2009/article/details/41289197)
3. Problems encountered:
① The UDF needs to reference a third-party package. There are two ways to handle this:
1) When running HQL in Hive, manually add the jar that the UDF requires with an add statement: add jar /root/***.jar (tested).
2) Install the Eclipse plugin Fat Jar (tested)
Installing Fat Jar online:
Eclipse menu bar: Help > Software Updates > Search for new features to install > New Update Site
Fill in the name and URL:
Name: any name, for example Fat Jar
URL: the Fat Jar update site address, http://kurucz-grafika.de/fatjar
To use the Fatjar packaging method:
② The program needs to read external resources
First, register the external resource before the UDF runs, using the command add file [file]. This registers the file temporarily in Hive.
Inside the UDF, the file is then referenced directly by a local path.
For example, if the code originally used String filepath = "/home/dev/test/test.txt", then once the file has been added to Hive the path only needs to be changed to String filepath = "./test.txt";
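As a rough illustration of reading such a resource, the sketch below loads a text file once and then answers lookups against it. It is only a sketch under stated assumptions: ResourceLookupSketch, load and writeTemp are hypothetical names, the class does not extend Hive's UDF so it can run without Hive jars, and the file is assumed to hold one entry per line. In a real UDF the path would be hard-coded as "./test.txt" after add file, since a UDF class is not constructed with arguments.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ResourceLookupSketch {
    private final Path resource;
    private Set<String> entries; // loaded lazily on the first non-null call

    // Hypothetical constructor for local testing; a real UDF would fix the path to "./test.txt".
    ResourceLookupSketch(String filepath) {
        this.resource = Paths.get(filepath);
    }

    // Mirrors a UDF evaluate method: null in, null out; otherwise reports
    // whether the value appears as a line in the resource file.
    public Boolean evaluate(String s) {
        if (s == null) {
            return null;
        }
        if (entries == null) {
            entries = load(resource);
        }
        return entries.contains(s);
    }

    // Reads all non-blank lines of the file into a set.
    static Set<String> load(Path p) {
        try {
            Set<String> set = new HashSet<>();
            for (String line : Files.readAllLines(p, StandardCharsets.UTF_8)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    set.add(trimmed);
                }
            }
            return set;
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read resource " + p, e);
        }
    }

    // Test helper: writes the given lines to a temporary file and returns its path.
    static Path writeTemp(String... lines) {
        try {
            Path tmp = Files.createTempFile("test", ".txt");
            Files.write(tmp, Arrays.asList(lines), StandardCharsets.UTF_8);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceLookupSketch udf = new ResourceLookupSketch(writeTemp("alice", "bob").toString());
        System.out.println(udf.evaluate("alice")); // prints true
        System.out.println(udf.evaluate("carol")); // prints false
    }
}
```

Loading the file lazily and caching the result keeps the cost of reading it to one pass per UDF instance rather than one pass per row.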