Hive UDF implementation and considerations

Source: Internet
Author: User

Hive's own query language, HQL, can handle most tasks, but when a special requirement comes up you need to write your own UDF (user-defined function). The following is a complete example.

1. Writing UDFs in Eclipse

① Add all the jar packages under Hive's lib directory to the project, along with hadoop-common-2.5.1.jar from Hadoop's share directory (2.5.1 is the latest Hadoop version at the time of writing).
② The UDF class must extend org.apache.hadoop.hive.ql.exec.UDF and implement an evaluate method. When the UDF is used in Hive, Hive calls the evaluate method of the class to perform the actual work.
③ Export the project as a jar file.
Note: The JDK used for the project must be consistent with the JDK on the cluster.
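
To illustrate the idea that evaluate is located by name and argument type, here is a minimal sketch using plain Java reflection. It is not Hive's actual resolver code, and the class names OverloadedLength and EvaluateLookup are hypothetical. The stand-in class deliberately does not extend UDF so that the sketch runs without any Hive jars.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a UDF class (no "extends UDF", so no Hive jars are needed).
class OverloadedLength {
    // evaluate may be overloaded for different argument types.
    public Integer evaluate(String s) {
        return s == null ? null : s.length();
    }

    public Integer evaluate(Integer n) {
        return n == null ? null : String.valueOf(n).length();
    }
}

class EvaluateLookup {
    // Look up the public method named "evaluate" that takes argType, then invoke it.
    static Object call(Object udf, Class<?> argType, Object arg) {
        try {
            Method m = udf.getClass().getMethod("evaluate", argType);
            return m.invoke(udf, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable evaluate(" + argType.getSimpleName() + ")", e);
        }
    }

    public static void main(String[] args) {
        OverloadedLength f = new OverloadedLength();
        System.out.println(call(f, String.class, "hive"));  // 4
        System.out.println(call(f, Integer.class, 12345));  // 5
    }
}
```

The point of the sketch is only that the method name evaluate matters: a class without a public method of that name cannot be found this way.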


Detailed example:

package com.zx.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class UdfTestLength extends UDF {
    // Hive calls evaluate for each row; a NULL input yields a NULL result.
    public Integer evaluate(String s) {
        if (s == null) {
            return null;
        } else {
            return s.length();
        }
    }
}
Package the class above into a jar. I used Eclipse to export it directly as test-udf.jar and then put it in the /root directory.
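
Before deploying the jar, the logic of evaluate can be checked locally. The sketch below copies the same logic into a hypothetical class, UdfTestLengthCheck, that does not extend UDF, so it compiles and runs without Hive on the classpath.

```java
// Same logic as the example's evaluate, minus "extends UDF", so it runs without Hive jars.
class UdfTestLengthCheck {
    static Integer evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("hive"));  // 4
        System.out.println(evaluate(""));      // 0
        System.out.println(evaluate(null));    // null
    }
}
```

Returning Integer rather than int is what allows the function to return null for a NULL input.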



2. Calling the custom function:

① Add the jar package (run in the Hive command line)
hive> add jar /root/test-udf.jar;

② Create a temporary function, which stops being available when the Hive command line is closed.
hive> create temporary function testlength as 'com.zx.hive.udf.UdfTestLength';

③ Call the function
hive> select id, name, testlength(name) from student;

④ Save the query results to HDFS

hive> create table result row format delimited fields terminated by '\t' as select id, testlength(nation) from student;

(Please credit the source when reprinting. For more, see: http://blog.csdn.net/hwwn2009/article/details/41289197)


3. Problems encountered:

① The UDF depends on a third-party package. There are two ways to handle this:

1) When running HQL in Hive, manually add each jar the UDF needs with the add statement: add jar /root/***.jar (tested, works).
2) Install the Eclipse plugin Fatjar and bundle the dependencies into the UDF jar (tested, works).

Installing Fatjar online:
In the Eclipse menu bar, choose Help > Software Updates > Search for new features to install > New Update Site
and fill in the name and URL:
Name: any name you like, for example fatjar
URL: the Fat Jar update site address, http://kurucz-grafika.de/fatjar

To use the Fatjar packaging method:



② The program needs to read an external resource

First, register the external resource before the UDF runs, using the command add file [file]; this is a temporary registration in Hive.
Inside the UDF, the file is then referenced by a path relative to the current directory rather than by its original absolute path.

For example, if the code originally used String filepath = "/home/dev/test/test.txt", then after the file has been added to Hive the path only needs to be changed to String filepath = "./test.txt";
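
As a rough sketch of how a UDF might read such a file, the hypothetical class below loads ./test.txt into a set on the first call and then checks whether each input value appears in it. The class name InListLookup, the one-entry-per-line file layout, and the extra constructor that takes a path are all assumptions made for illustration. In a real UDF the class would extend org.apache.hadoop.hive.ql.exec.UDF; that is left out here so the sketch runs without Hive jars.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical lookup function; a real UDF would add "extends UDF".
class InListLookup {
    private final String resourcePath;
    private Set<String> entries; // loaded lazily on the first evaluate call

    // Default path: the file registered with "add file", relative to the working directory.
    InListLookup() {
        this("./test.txt");
    }

    // Extra constructor (an assumption) so the class can be tried locally with another path.
    InListLookup(String resourcePath) {
        this.resourcePath = resourcePath;
    }

    public Boolean evaluate(String s) {
        if (s == null) {
            return null;
        }
        if (entries == null) {
            entries = load();
        }
        return entries.contains(s);
    }

    // Read the file once, one entry per line.
    private Set<String> load() {
        try {
            return new HashSet<>(Files.readAllLines(Paths.get(resourcePath), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read " + resourcePath, e);
        }
    }
}
```

Loading the file lazily and caching it means the file is read once per function instance instead of once per row.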

