The InputFormat abstract class declares two methods: getSplits() and createRecordReader(). getSplits() defines how the input is logically divided into splits, and createRecordReader() defines how records are read from each split.
public abstract class InputFormat<K, V> {

  /**
   * Logically split the set of input files for the job.
   *
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be <i>&lt;input-file-path, start, offset&gt;</i> tuple. The InputFormat
   * also creates the {@link RecordReader} to read the {@link InputSplit}.
   *
   * @param context job configuration.
   * @return an array of {@link InputSplit}s for the job.
   */
  public abstract
    List<InputSplit> getSplits(JobContext context
                               ) throws IOException, InterruptedException;

  /**
   * Create a record reader for a given split. The framework will call
   * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
   * the split is used.
   * @param split the split to be read
   * @param context the information about the task
   * @return a new record reader
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract
    RecordReader<K,V> createRecordReader(InputSplit split,
                                         TaskAttemptContext context
                                         ) throws IOException,
                                                  InterruptedException;

}
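The key idea in the Javadoc above is that a split is only a logical (start, length) range over the input; the data is never physically chunked. The following self-contained sketch illustrates that contract with simplified stand-ins (Split, getSplits, readSplit are illustrative names invented here, not the real Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not real Hadoop classes: a "split" is a logical
// (start, length) range over the input; the data itself is never copied.
public class LogicalSplitDemo {

    // Simplified stand-in for Hadoop's InputSplit: just a byte range.
    static class Split {
        final int start, length;
        Split(int start, int length) { this.start = start; this.length = length; }
    }

    // Simplified stand-in for InputFormat.getSplits(): divide the input
    // into fixed-size logical chunks without reading any data.
    static List<Split> getSplits(int totalLength, int splitSize) {
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < totalLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, totalLength - start)));
        }
        return splits;
    }

    // Simplified stand-in for createRecordReader(): yields only the
    // records (here, single characters) inside the given split.
    static List<Character> readSplit(String data, Split split) {
        List<Character> records = new ArrayList<>();
        for (int i = split.start; i < split.start + split.length; i++) {
            records.add(data.charAt(i));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "abcdefghij";                  // the "input file"
        List<Split> splits = getSplits(data.length(), 4);
        System.out.println(splits.size());           // number of logical splits
        for (Split s : splits) {
            System.out.println(readSplit(data, s));  // records in each split
        }
    }
}
```

In real Hadoop, each split would be handed to its own Mapper, and the framework calls RecordReader.initialize() with the split before reading begins.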