Hive custom TextInputFormat (legacy MapReduce API works; is the new MapReduce API implementation a bug?)

Source: Internet
Author: User

Our input file, Hello0, has the following content:

Xiaowang [email protected][email protected] [email protected][email protected] Unknown

Logically there are 3 records, separated by @[email protected]. We will implement a custom TextInputFormat twice, once with the legacy MapReduce API and once with the new MapReduce API, and then configure Hive to use it to load the data.
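To make the idea of "logical records" concrete, here is a minimal, Hadoop-free sketch of splitting file content on a multi-character delimiter. The delimiter `@_@`, the sample content, and the class and method names are all hypothetical stand-ins chosen for illustration; they are not taken from the input file above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: splits text into logical records on a literal delimiter.
class DelimitedRecordsSketch {

    static List<String> splitRecords(String content, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        // Treat the delimiter as a literal string, not a regular expression.
        while ((idx = content.indexOf(delimiter, start)) >= 0) {
            records.add(content.substring(start, idx));
            start = idx + delimiter.length();
        }
        // The last record has no trailing delimiter.
        if (start < content.length()) {
            records.add(content.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        String delimiter = "@_@"; // hypothetical delimiter
        String content = "first record" + delimiter + "second record" + delimiter + "third record";
        for (String record : splitRecords(content, delimiter)) {
            System.out.println(record);
        }
    }
}
```

A file with two delimiters therefore yields three records, which is the behavior the custom input format is expected to reproduce inside Hive.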

Start with the Legacy API

1. Define a custom Format6 that inherits from TextInputFormat

    package mytestpackage;

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.LineRecordReader;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class Format6 extends TextInputFormat {
        @Override
        public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException {
            // Use the custom record delimiter instead of the default line break.
            byte[] recordDelimiterBytes = "@[email protected]".getBytes();
            return new LineRecordReader(job, (FileSplit) split, recordDelimiterBytes);
        }
    }

2. Export it to Myinputformat.jar and put the jar into hive/lib

3. Configure its use in the Hive DDL

    CREATE TABLE ... (... varchar(...), ...) STORED AS INPUTFORMAT 'mytestpackage.Format6' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

The data loads successfully.
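The legacy reader in step 1 is handed the delimiter as raw bytes. As a rough illustration of what reading records with a byte delimiter involves, the following is a simplified, Hadoop-free sketch. It is not Hadoop's LineRecordReader and ignores input splits, buffering, and compression; the class name, the `@_@` delimiter, and the sample data are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: reads records from a stream, one delimiter-terminated chunk at a time.
class DelimiterReaderSketch {
    private final InputStream in;
    private final byte[] delimiter;

    DelimiterReaderSketch(InputStream in, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.in = in;
        this.delimiter = delimiter.clone();
    }

    // Returns the next record without its delimiter, or null at end of input.
    String nextRecord() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                buf.write(b);
                // Only do the full comparison when the last delimiter byte was just read.
                if ((byte) b == delimiter[delimiter.length - 1] && endsWithDelimiter(buf.toByteArray())) {
                    return new String(buf.toByteArray(), 0, buf.size() - delimiter.length, StandardCharsets.UTF_8);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // End of input: return the trailing record if there is one.
        return buf.size() == 0 ? null : new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    private boolean endsWithDelimiter(byte[] data) {
        if (data.length < delimiter.length) {
            return false;
        }
        int offset = data.length - delimiter.length;
        for (int i = 0; i < delimiter.length; i++) {
            if (data[offset + i] != delimiter[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] delimiter = "@_@".getBytes(StandardCharsets.UTF_8); // hypothetical delimiter
        byte[] data = "first@_@second@_@third".getBytes(StandardCharsets.UTF_8);
        DelimiterReaderSketch reader = new DelimiterReaderSketch(new ByteArrayInputStream(data), delimiter);
        String record;
        while ((record = reader.nextRecord()) != null) {
            System.out.println(record);
        }
    }
}
```

The sketch rescans the buffer on candidate matches for simplicity; a production reader would work on buffered blocks and handle records that straddle split boundaries.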

New API

1. Define a custom Format5 that inherits from TextInputFormat

    package mytestpackage;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class Format5 extends TextInputFormat {
        @Override
        public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext tac) {
            // Use the custom record delimiter instead of the default line break.
            byte[] recordDelimiterBytes = "@[email protected]".getBytes();
            return new LineRecordReader(recordDelimiterBytes);
        }
    }

2. Export it to Myinputformat.jar and put the jar into hive/lib

3. Configure its use in the Hive DDL

    CREATE TABLE ... (... varchar(...), ...) STORED AS INPUTFORMAT 'mytestpackage.Format5' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

This fails with an error!

Format5 works fine when debugged in a plain MapReduce job (http://www.cnblogs.com/silva/p/4490532.html), so why can't Hive use it? I don't understand. If anyone knows the reason, please advise. Thanks!
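One possible explanation, offered as an assumption rather than a confirmed diagnosis of the error above: Hive's STORED AS INPUTFORMAT clause expects a class that implements the legacy org.apache.hadoop.mapred.InputFormat interface, while the new-API TextInputFormat extends org.apache.hadoop.mapreduce.InputFormat, which is a separate abstract class. The sketch below illustrates that kind of type check with hypothetical stand-in types so that it runs without Hadoop on the classpath; none of these names are Hive or Hadoop classes.

```java
// Illustrative sketch only: stand-in types model the two unrelated InputFormat hierarchies.
class InputFormatApiCheckSketch {

    // Stand-in for the legacy interface org.apache.hadoop.mapred.InputFormat.
    interface LegacyInputFormat { }

    // Stand-in for the new-API abstract class org.apache.hadoop.mapreduce.InputFormat.
    static abstract class NewInputFormat { }

    // Plays the role of Format6 (legacy API).
    static class Format6Like implements LegacyInputFormat { }

    // Plays the role of Format5 (new API).
    static class Format5Like extends NewInputFormat { }

    // A loader that only accepts legacy-API input formats would perform a check like this.
    static boolean usableAsLegacyInputFormat(Class<?> candidate) {
        return LegacyInputFormat.class.isAssignableFrom(candidate);
    }

    public static void main(String[] args) {
        System.out.println("Format6Like accepted: " + usableAsLegacyInputFormat(Format6Like.class));
        System.out.println("Format5Like accepted: " + usableAsLegacyInputFormat(Format5Like.class));
    }
}
```

If this assumption holds, a class built on the new API can run correctly in a standalone MapReduce job and still be rejected by a caller that only knows the legacy interface.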
