First, Hive functions
1. Hive built-in functions
(1) For more details, see the Hive official documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
(2) A detailed explanation:
http://blog.sina.com.cn/s/blog_83bb57b70101lhmk.html
(3) A quick way to test built-in functions:
1. Create a dual table: create table dual (id string);
2. Load a file containing a single line with one space into the dual table (see the sketch below)
3. select substr('huangbo', 2, 3) from dual;
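A minimal sketch of these three steps, assuming the one-line file is created at /root/dual.txt (the path is only an example):
echo ' ' > /root/dual.txt
hive> create table dual (id string);
hive> load data local inpath '/root/dual.txt' into table dual;
hive> select substr('huangbo', 2, 3) from dual;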
(4) View the built-in functions: show functions;
Show details of a function: desc function abs;
Show extended information for a function: desc function extended concat;
(5) For detailed usage, see the documentation
2. Hive custom functions
When the built-in functions provided by Hive cannot meet your business needs, consider writing user-defined functions.
A UDF (user-defined function) operates on a single data row and produces a single data row as output (e.g., mathematical functions, string functions).
A UDAF (user-defined aggregate function) receives multiple input data rows and produces a single output data row (e.g., count, max).
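A quick illustration of the difference using built-in functions (the students table here is only hypothetical): a UDF such as upper produces one output row per input row, while a UDAF such as count or max collapses many rows into one:
select upper(name) from students;          -- UDF: one result per input row
select count(*), max(age) from students;   -- UDAF: one result for all input rows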
3. A simple UDF example:
(1) First write a simple Java class that extends org.apache.hadoop.hive.ql.exec.UDF and overloads the evaluate method (a sketch of such a class follows step (5) below)
(2) Package it as a jar and upload the jar to the server
(3) Add the jar to Hive's classpath
hive> add jar /root/hivejar/udf.jar;
(4) Create a temporary function and associate it with the class you wrote.
hive> create temporary function toLowerCase as 'com.mazh.udf.toLowerCase';
(5) Now the custom function can be used in HQL
select toLowerCase(name), age from studentss;
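A minimal sketch of the Java class from step (1), assuming the package and class name com.mazh.udf.toLowerCase used above; it extends the classic UDF base class and overloads evaluate:

package com.mazh.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Lowercases its single string argument; one input row produces one output row.
public class toLowerCase extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toLowerCase());
    }
}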
4. TRANSFORM implementation (convert the timestamp field in the JSON data into a day-of-week number)
Hive's TRANSFORM keyword provides the ability to invoke a user-written script from SQL. It is suitable for functionality that Hive does not provide and that you do not want to write a UDF for.
The following example explains it in detail.
(1) First load the rating.json file into a raw table rat_json in Hive:
create table rat_json (line string) row format delimited;
load data local inpath '/root/rating.json' into table rat_json;
(2) Create a rate table to store the fields parsed out of the JSON:
create table rate (movie int, rate int, unixtime int, userid int) row format delimited fields terminated by '\t';
Parse the JSON, then insert the result into the rate table:
insert into table rate
select get_json_object(line, '$.movie') as movie,
       get_json_object(line, '$.rate') as rate,
       get_json_object(line, '$.timestamp') as unixtime,
       get_json_object(line, '$.uid') as userid
from rat_json;
(3) Use TRANSFORM plus a Python script to convert unixtime into a weekday
First edit a Python script file:
vi weekday_mapper.py
#!/bin/python
import sys
import datetime

for line in sys.stdin:
    line = line.strip()
    movie, rate, unixtime, userid = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print '\t'.join([movie, rate, str(weekday), userid])
Save the file.
Then add the file to Hive's classpath:
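A sketch of this step and of the TRANSFORM query itself, assuming the script was saved as /root/weekday_mapper.py and that the target table lastjsontable (queried below) is created here with columns matching the script's output:
hive> add file /root/weekday_mapper.py;
hive> create table lastjsontable (movie int, rate int, weekday int, userid int) row format delimited fields terminated by '\t';
hive> insert overwrite table lastjsontable
      select transform(movie, rate, unixtime, userid)
      using 'python weekday_mapper.py'
      as (movie, rate, weekday, userid)
      from rate;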
Finally, query to check that the data is correct:
select distinct(weekday) from lastjsontable;
Second, Hive shell operations