[Hive-languagemanual] Transform [not understand]

Last Update:2015-01-26 Source: Internet

Author: User

Transform/map-reduce Syntax
- SQL Standard Based Authorization disallows TRANSFORM
- TRANSFORM Examples
Schema-less Map-reduce Scripts
Typing the output of TRANSFORM

Transform/map-reduce Syntax

Users can also plug their own custom mappers and reducers into the data stream by using features natively supported in the Hive language. For example, to run a custom mapper script (map_script) and a custom reducer script (reduce_script), the user can issue the following command, which uses the TRANSFORM clause to embed the mapper and the reducer scripts.

By default, columns are transformed to STRING and delimited by TAB before being fed to the user script. Similarly, all NULL values are converted to the literal string \N in order to differentiate NULL values from empty strings. The standard output of the user script is treated as TAB-separated STRING columns; any cell containing only \N is re-interpreted as NULL, and the resulting STRING column is then cast to the data type specified in the table declaration in the usual way. User scripts can write debug information to standard error, which is shown on the task detail page in Hadoop. These defaults can be overridden with ROW FORMAT ....
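The default wire format described above can be sketched as a small user script. This is only an illustration, not part of Hive: the helper names (parse_row, format_row, transform) and the per-row logic are hypothetical, and the sample rows stand in for what Hive would send on standard input.

```python
# Sketch of the default TRANSFORM wire format: one row per line,
# columns separated by TAB, NULL written as the two characters \N.
NULL_MARKER = r"\N"

def parse_row(line):
    """Split one input line into columns, mapping the NULL marker to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if f == NULL_MARKER else f for f in fields]

def format_row(values):
    """Join columns into one output line, mapping None to the NULL marker."""
    return "\t".join(NULL_MARKER if v is None else str(v) for v in values)

def transform(lines):
    """Hypothetical per-row logic: upper-case the second of two columns."""
    for line in lines:
        key, value = parse_row(line)
        yield format_row([key, None if value is None else value.upper()])

# In a real script the input would be sys.stdin and each result would be
# printed to sys.stdout; debug messages would go to sys.stderr.
sample = ["1\tabc\n", "2\t\\N\n", "3\t\n"]
for out in transform(sample):
    print(out)
```

Note that the second sample row carries a NULL and the third an empty string; keeping the two apart is the reason for the \N marker.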

On Windows, use "cmd /c your_script" instead of just "your_script".

Warning

It is your responsibility to sanitize any STRING columns prior to transformation. If your STRING column contains tabs, an identity transformer will not give you back what you started with! To help with this, see REGEXP_REPLACE and replace the tabs with some other character on their way into the TRANSFORM() call.
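A minimal sketch of why embedded tabs break an identity transformer. The functions below only simulate the default TAB-delimited serialization in Python; their names are hypothetical. On the Hive side, the corresponding cleanup would be done with REGEXP_REPLACE before the column enters TRANSFORM().

```python
def serialize(columns):
    """Join column values with TAB, as TRANSFORM does by default."""
    return "\t".join(columns)

def deserialize(line):
    """Split a script output line back into columns on TAB."""
    return line.split("\t")

def sanitize(value, replacement=" "):
    """Replace embedded tabs before the value reaches the script."""
    return value.replace("\t", replacement)

row = ["42", "hello\tworld"]           # second column contains a tab

# Identity transformer: the script echoes its input unchanged.
echoed = deserialize(serialize(row))
print(len(row), len(echoed))            # prints: 2 3

clean = [sanitize(c) for c in row]
print(deserialize(serialize(clean)) == clean)   # prints: True
```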

Warning

Formally, MAP ... and REDUCE ... are syntactic transformations of SELECT TRANSFORM (...). In other words, they serve as comments or notes to the reader of the query. Beware: use of these keywords may be dangerous, as (for example) typing "REDUCE" does not force a reduce phase to occur and typing "MAP" does not force a new map phase!

Please also see Sort By / Cluster By / Distribute By and Larry Ogrodnek's blog post.

clusterBy: CLUSTER BY colName (',' colName)*
distributeBy: DISTRIBUTE BY colName (',' colName)*
sortBy: SORT BY colName (ASC | DESC)? (',' colName (ASC | DESC)?)*

rowFormat
  : ROW FORMAT
    (DELIMITED [FIELDS TERMINATED BY char]
               [COLLECTION ITEMS TERMINATED BY char]
               [MAP KEYS TERMINATED BY char]
               [ESCAPED BY char]
               [LINES SEPARATED BY char]
     |
     SERDE serde_name [WITH SERDEPROPERTIES
                            property_name=property_value,
                            property_name=property_value, ...])

outRowFormat : rowFormat
inRowFormat : rowFormat
outRecordReader : RECORDREADER className

query:
  FROM (
    FROM src
    MAP expression (',' expression)*
    (inRowFormat)?
    USING 'my_map_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
    ( clusterBy? | distributeBy? sortBy? ) src_alias
  )
  REDUCE expression (',' expression)*
    (inRowFormat)?
    USING 'my_reduce_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?

  FROM (
    FROM src
    SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'my_map_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?
    ( clusterBy? | distributeBy? sortBy? ) src_alias
  )
  SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'my_reduce_script'
    ( AS colName (',' colName)* )?
    (outRowFormat)? (outRecordReader)?

SQL Standard Based Authorization disallows TRANSFORM

The TRANSFORM clause is disallowed when SQL standard based authorization is configured in Hive 0.13.0 and later releases (HIVE-6415).

TRANSFORM Examples

Example #1:

FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.dt, map_output.uid
  USING 'reduce_script'
  AS date, count;

FROM (
  FROM pv_users
  SELECT TRANSFORM(pv_users.userid, pv_users.date)
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  SELECT TRANSFORM(map_output.dt, map_output.uid)
  USING 'reduce_script'
  AS date, count;
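The source does not show what reduce_script contains. As a sketch only, assuming it is meant to count rows per dt, a Python version might look like the following. It relies on CLUSTER BY dt delivering rows with the same dt next to each other; the function name count_by_date and the sample rows are hypothetical.

```python
from itertools import groupby

def count_by_date(lines):
    """Yield 'dt<TAB>count' for each run of consecutive rows sharing dt."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for dt, group in groupby(rows, key=lambda r: r[0]):
        yield f"{dt}\t{sum(1 for _ in group)}"

# In a real script the rows would come from sys.stdin and the results
# would be printed to sys.stdout.
sample = ["2015-01-01\tu1\n", "2015-01-01\tu2\n", "2015-01-02\tu1\n"]
for out in count_by_date(sample):
    print(out)
```

Because groupby only merges adjacent rows, the script would produce duplicate dates if the input were not clustered or sorted on dt first.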

Example #2

FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
  USING '/bin/cat'
  AS (tkey, tvalue)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe'
    RECORDREADER 'org.apache.hadoop.hive.ql.exec.TypedBytesRecordReader'
) tmap
INSERT OVERWRITE TABLE dest1
  SELECT tkey, tvalue

Schema-less Map-reduce Scripts

If there is no AS clause after USING my_script, Hive assumes that the output of the script contains two parts: key, which is everything before the first tab, and value, which is the rest after the first tab. Note that this is different from specifying AS key, value, because in that case value will contain only the portion between the first tab and the second tab if there are multiple tabs.

Note that we can directly do CLUSTER BY key without specifying the output schema of the scripts.

FROM (
  FROM pv_users
  MAP pv_users.userid, pv_users.date
  USING 'map_script'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE pv_users_reduced
  REDUCE map_output.key, map_output.value
  USING 'reduce_script'
  AS date, count;
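The difference between the schema-less split and an explicit AS key, value can be simulated in a few lines of Python. This is a sketch of the splitting rule described in this section, not Hive's actual implementation; both function names are hypothetical, and the second one assumes the line contains at least one tab.

```python
def split_schemaless(line):
    """No AS clause: key is before the first tab, value is everything after."""
    key, _, value = line.partition("\t")
    return key, value

def split_as_key_value(line):
    """AS key, value: only the first two tab-separated fields are kept."""
    return tuple(line.split("\t")[:2])

line = "k1\ta\tb\tc"
print(split_schemaless(line))     # ('k1', 'a\tb\tc')
print(split_as_key_value(line))   # ('k1', 'a')
```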

Typing the output of TRANSFORM

The output fields from a script are typed as strings by default; for example, in

SELECT TRANSFORM(stuff)
  USING 'script'
  AS thing1, thing2

They can be cast immediately with the syntax:

SELECT TRANSFORM(stuff)
  USING 'script'
  AS (thing1 INT, thing2 INT)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.