Our input file, Hello0, has the following content:
Xiaowang [email protected][email protected] [email protected][email protected] Unknown
Logically it contains 3 records, separated by a custom record delimiter (the delimiter string and the email-like fields above were mangled by the page's email obfuscation). We will implement a custom TextInputFormat with both the legacy MapReduce API and the new MapReduce API, and then configure Hive to load the data with it.
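To make the record layout concrete, here is a minimal, Hadoop-free sketch of splitting raw file content into logical records on a multi-byte delimiter, which is essentially what the custom LineRecordReader does for us. The class name `DelimiterSplit` and the sample delimiter `-@-` are illustrative only, since the real delimiter string was garbled in the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: splits raw file content into logical records on a
// multi-byte delimiter, mirroring what a delimiter-aware record reader yields.
class DelimiterSplit {
    static List<String> split(String content, String delimiter) {
        List<String> records = new ArrayList<>();
        // Pattern.quote so the delimiter is treated literally, not as a regex
        for (String record : content.split(Pattern.quote(delimiter))) {
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Three logical records on one physical line, as in the Hello0 example
        List<String> records = split("Xiaowang a-@-Xiaoli b-@-Unknown", "-@-");
        System.out.println(records.size()); // 3
    }
}
```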
Start with the Legacy API
1. A custom Format6 that inherits from TextInputFormat
package mytestpackage;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class Format6 extends TextInputFormat {
    @Override
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
            throws IOException {
        // custom record delimiter (the literal here was mangled by the
        // source page's email obfuscation)
        byte[] recordDelimiterBytes = "@[email protected]".getBytes();
        return new LineRecordReader(job, (FileSplit) split, recordDelimiterBytes);
    }
}
2. Export to MyInputFormat.jar and copy it into hive/lib
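As an alternative to copying the jar into hive/lib (which typically requires restarting the Hive service), the jar can be registered per session with ADD JAR; the path below is a placeholder:

```sql
-- hypothetical path; adjust to where the jar was exported
ADD JAR /path/to/MyInputFormat.jar;
```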
3. Configure the use in Hive DDL
The table name and varchar length were garbled in the source; t6 and 100 below are placeholders:

CREATE TABLE t6 (a varchar(100))
STORED AS
  INPUTFORMAT 'mytestpackage.Format6'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
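With the table defined, loading and querying would look roughly like this; the file path and table name are placeholders:

```sql
-- placeholder path and table name
LOAD DATA LOCAL INPATH '/tmp/Hello0' INTO TABLE t6;
-- each logical record (split on the custom delimiter) becomes one row
SELECT * FROM t6;
```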
The data loads successfully.
New API
1. A custom Format5 that inherits from TextInputFormat
package mytestpackage;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class Format5 extends TextInputFormat {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext tac) {
        // custom record delimiter (the literal here was mangled by the
        // source page's email obfuscation)
        byte[] recordDelimiterBytes = "@[email protected]".getBytes();
        return new LineRecordReader(recordDelimiterBytes);
    }
}
2. Export to MyInputFormat.jar and copy it into hive/lib
3. Configure the use in Hive DDL
The table name and varchar length were garbled in the source; t5 and 100 below are placeholders:

CREATE TABLE t5 (a varchar(100))
STORED AS
  INPUTFORMAT 'mytestpackage.Format5'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
It fails with an error!
Format5 works fine when debugged directly in a MapReduce job (http://www.cnblogs.com/silva/p/4490532.html), so why can't Hive use it? I don't understand. If any readers know, please advise. Thanks!
Hive custom TextInputFormat (works with the old MapReduce API; is there a bug in the new MapReduce API implementation?)