Hive (IV): Hive functions and the Hive shell

Source: Internet
Author: User

First, Hive functions

1. Hive built-in functions

(1) For more detail, see the official Hive documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
(2) A detailed explanation:
http://blog.sina.com.cn/s/blog_83bb57b70101lhmk.html

(3) A quick way to test built-in functions:

1. Create a dual table: create table dual (id string);
2. Load a file containing one line with a single space into the dual table
3. Run the function against it: select substr('huangbo', 2, 3) from dual;

(4) List the built-in functions: show functions;

Show the description of a function: desc function abs;

Show extended information for a function: desc function extended concat;

(5) For detailed usage, see the documentation
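
The substr test in item (3) relies on Hive counting character positions from 1 rather than 0. The sketch below is a rough Python emulation of that behavior, not Hive's implementation; the helper name hive_substr is hypothetical, the input literal is assumed to be 'huangbo', and only a positive start index is covered.

```python
def hive_substr(s, start, length):
    """Emulate Hive's substr(s, start, length) for a positive, 1-based start."""
    if start < 1 or length < 0:
        raise ValueError("this sketch only covers start >= 1 and length >= 0")
    begin = start - 1  # Hive counts characters from 1, Python from 0
    return s[begin:begin + length]


print(hive_substr("huangbo", 2, 3))  # uan
```

Running the same expression against the dual table should return the same three characters.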

2. Hive custom functions

When Hive's built-in functions do not cover your business logic, you can write user-defined functions.
UDF (user-defined function): operates on a single input row and produces a single output row (for example, mathematical functions and string functions).
UDAF (user-defined aggregate function): receives multiple input rows and produces a single output row (for example, count and max).
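
The difference between the two kinds of function can be sketched in plain Python. This is only an illustration of the row-in/row-out versus many-rows-in/one-row-out contract; the sample rows and the names to_lower_udf and max_udaf are hypothetical and have nothing to do with Hive's Java API.

```python
# Hypothetical sample rows standing in for a Hive table.
rows = [
    {"name": "Alice", "age": 20},
    {"name": "BOB", "age": 25},
    {"name": "Carol", "age": 22},
]


def to_lower_udf(value):
    """UDF-style: called once per row, returns one value for that row."""
    return value.lower()


def max_udaf(values):
    """UDAF-style: consumes the values of many rows, returns one value."""
    return max(values)


udf_output = [to_lower_udf(row["name"]) for row in rows]  # one result per input row
udaf_output = max_udaf(row["age"] for row in rows)        # one result for all rows

print(udf_output)
print(udaf_output)
```

The UDF-style call yields as many results as there are input rows, while the UDAF-style call collapses them into one.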

3. An example of a simple UDF:

(1) First write a simple Java class that extends org.apache.hadoop.hive.ql.exec.UDF and overloads the evaluate method

(2) Package the class as a jar and upload it to the server

(3) Add the jar to Hive's classpath

hive> add jar /root/hivejar/udf.jar;

(4) Create a temporary function and associate it with the class you developed

hive> create temporary function toLowerCase as 'com.mazh.udf.toLowerCase';

(5) You can now use the custom function in HQL

select toLowerCase(name), age from studentss;

4. Transform implementation (converting the timestamp field in JSON data into a weekday number)

Hive's TRANSFORM keyword lets you call a script you wrote yourself from SQL. It suits functionality that Hive does not provide when you do not want to write a UDF.
The following example walks through it in detail.

(1) First load the rating.json file into a raw Hive table, rat_json:
create table rat_json (line string) row format delimited;
load data local inpath '/root/rating.json' into table rat_json;
(2) Create the rate table to store the fields parsed from the JSON:
create table rate (movie int, rate int, unixtime int, userid int) row format delimited fields
terminated by '\t';
Parse the JSON and insert the result into the rate table:
insert into table rate select get_json_object(line, '$.movie') as movie, get_json_object(line, '$.rate')
as rate, get_json_object(line, '$.timestamp') as unixtime, get_json_object(line, '$.uid') as userid
from rat_json;
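
To see what the insert statement does to a single line of rating.json, the extraction can be approximated in Python. This is a minimal sketch under assumptions: the sample record is hypothetical (only the four key names come from the query above), and json_line_to_row is an illustrative helper, not part of Hive.

```python
import json

# Keys read by the get_json_object calls, in the column order of the rate table.
FIELDS = ("movie", "rate", "timestamp", "uid")


def json_line_to_row(line):
    """Pick the four fields out of one JSON line and join them with tabs."""
    record = json.loads(line)
    # A missing key raises KeyError here; Hive's get_json_object returns NULL instead.
    return "\t".join(str(record[field]) for field in FIELDS)


# Hypothetical sample record; the real contents of rating.json are not shown here.
sample = '{"movie":"1193","rate":"5","timestamp":"978300760","uid":"1"}'
print(json_line_to_row(sample))
```

Each JSON line becomes one tab-separated row, which matches the delimiter declared for the rate table.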
(3) Use TRANSFORM with Python to convert unixtime to a weekday
First edit a Python script file:

vi weekday_mapper.py

#!/bin/python
import sys
import datetime

# Hive streams each input row to stdin as one tab-separated line
for line in sys.stdin:
    line = line.strip()
    movie, rate, unixtime, userid = line.split('\t')
    # isoweekday() returns 1 (Monday) through 7 (Sunday)
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    # Emit the output row as a tab-separated line on stdout
    print('\t'.join([movie, rate, str(weekday), userid]))

Save the file.
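Then add the file to Hive's classpath with the add file command, and run a TRANSFORM query over the rate table that pipes each row through the script and stores the result in lastjsontable.

Finally, query the result to check that the data is correct:
select distinct(weekday) from lastjsontable;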
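
Before running the script inside Hive, its per-line logic can be checked locally. The sketch below wraps the same conversion in a function; map_line is a hypothetical name, the input rows are made up, and the time zone is pinned to UTC only so the result is reproducible (the script itself uses the server's local time zone, so values near midnight may differ).

```python
from datetime import datetime, timezone


def map_line(line, tz=timezone.utc):
    """Convert one tab-separated row (movie, rate, unixtime, userid) to
    (movie, rate, weekday, userid), as weekday_mapper.py does per input line."""
    movie, rate, unixtime, userid = line.strip().split("\t")
    # isoweekday() returns 1 for Monday through 7 for Sunday
    weekday = datetime.fromtimestamp(float(unixtime), tz).isoweekday()
    return "\t".join([movie, rate, str(weekday), userid])


# 978300760 is 2000-12-31 22:12:40 UTC, a Sunday, so the weekday field is 7.
print(map_line("1193\t5\t978300760\t1"))
```

Since isoweekday() only returns 1 through 7, the final select distinct query should return no more than seven values.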

Second, Hive shell operations
