Hadoop input format (InputFormat)

Source: Internet
Author: User

The InputFormat abstract class declares two methods: getSplits() and createRecordReader(). The first defines how the job's input is divided into logical splits; the second defines how the records in each split are read.

    package org.apache.hadoop.mapreduce;

    import java.io.IOException;
    import java.util.List;

    // InputSplit, JobContext, Mapper, RecordReader and TaskAttemptContext
    // live in this same package, so they need no import.
    public abstract class InputFormat<K, V> {

      /**
       * Logically split the set of input files for the job.
       *
       * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
       * for processing.</p>
       *
       * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
       * input files are not physically split into chunks. For e.g. a split could
       * be <i>&lt;input-file-path, start, offset&gt;</i> tuple. The InputFormat
       * also creates the {@link RecordReader} to read the {@link InputSplit}.
       *
       * @param context job configuration.
       * @return an array of {@link InputSplit}s for the job.
       */
      public abstract
        List<InputSplit> getSplits(JobContext context
                                   ) throws IOException, InterruptedException;

      /**
       * Create a record reader for a given split. The framework will call
       * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
       * the split is used.
       * @param split the split to be read
       * @param context the information about the task
       * @return a new record reader
       * @throws IOException
       * @throws InterruptedException
       */
      public abstract
        RecordReader<K,V> createRecordReader(InputSplit split,
                                             TaskAttemptContext context
                                            ) throws IOException,
                                                     InterruptedException;

    }
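
To make the division of labor between the two methods concrete, here is a minimal sketch that uses plain Java only and does not depend on Hadoop. The names LogicalSplitSketch, Split, and readRecords are hypothetical stand-ins for illustration, not Hadoop classes. It assumes a single in-memory "file" of newline-terminated records and a fixed split size. It also assumes one possible boundary convention for line-oriented input: a reader that does not start at offset 0 skips its first (possibly partial) line, and every reader finishes the line that crosses the end of its split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LogicalSplitSketch {

    /** Hypothetical stand-in for an InputSplit: a byte range, not a copy of the data. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "<" + path + ", " + start + ", " + length + ">";
        }
    }

    /** Plays the role of getSplits(): describes byte ranges without reading the file. */
    static List<Split> getSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    /** Plays the role of a record reader: turns one split into line records. */
    static List<String> readRecords(byte[] data, Split split) {
        List<String> lines = new ArrayList<>();
        // The sketch keeps the data in memory, so int offsets are sufficient here.
        int pos = (int) split.start;
        int end = (int) (split.start + split.length);
        if (pos != 0) {
            // Skip the first line: the reader of the previous split consumes it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        // Read every line that starts at or before the end of the split,
        // running past the boundary if needed to finish the last one.
        while (pos <= end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        for (Split split : getSplits("demo.txt", data.length, 4)) {
            System.out.println(split + " -> " + readRecords(data, split));
        }
    }
}
```

With the 9-byte sample input and a split size of 4, the sketch produces three splits, and the three lines are each read exactly once even though the split boundaries fall in the middle of a line. This is the sense in which the splits are logical: getSplits only describes ranges, and the reader decides how records map onto them.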

 

 

 

 
