What each class in a MapReduce program does

Source: Internet
Author: User
Tags: map class

1. Map class

The map class extends the library class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>. A map class usually overrides the map method, which receives one key-value pair at a time, processes it, and emits the result. The default map method is:

protected void map(KEYIN key, VALUEIN value, Context context)
        throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
}

2. Reducer class

The reduce class extends the library class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>. Apart from the reduce method, its other methods are the same as in the map class and serve the same purpose. A reduce method looks like this:

protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    for (IntWritable value : values) {
        context.write(key, value);
    }
}
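To make the division of labour between map and reduce concrete, here is a minimal plain-Java sketch of the flow: map is called once per input record, the emitted pairs are grouped by key, and reduce is called once per key with all of that key's values. It does not use the Hadoop API; the class MapReduceFlowSketch, its methods, and the word-count logic are hypothetical names chosen for illustration only.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the map -> group-by-key -> reduce flow.
// None of these types are Hadoop classes; all names are hypothetical.
class MapReduceFlowSketch {

    // "map": called once per input record; emits (word, 1) for every word in the line.
    static void map(long offset, String line, List<Map.Entry<String, Integer>> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
    }

    // "reduce": called once per key with every value emitted for that key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            map(offset, line, emitted);
            offset += line.length() + 1; // rough offset; not used by this map
        }
        // Stand-in for the shuffle phase: group emitted values by key, sorted by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

In a real job the framework performs the grouping step across machines; the sketch only shows which method sees which data.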

3. MapReduce driver

In the simplest case, the code in the main function typically includes:

Configuration conf = new Configuration();
// Get the input and output file paths
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <in> <out>");
    System.exit(2);
}
Job job = new Job(conf, "Dedup");
job.setJarByClass(Dedup.class);        // main class
job.setMapperClass(Map.class);         // map class
job.setCombinerClass(Reduce.class);    // combiner class
job.setReducerClass(Reduce.class);     // reduce class
job.setOutputKeyClass(Text.class);     // key class of the job output data
job.setOutputValueClass(Text.class);   // value class of the job output data
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    // input path
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));  // output path
System.exit(job.waitForCompletion(true) ? 0 : 1);

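There is also a minimal MapReduce driver, called the Minimapreducedriver class, which sets only the job, the jar class, and the input and output paths:

Job job = new Job(conf, "Dedup");
job.setJarByClass(Dedup.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);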

4. InputFormat interface

InputFormat has a hierarchy of implementing classes. TextInputFormat is the default implementation. It is useful when the input data has no explicit key-value structure: the returned key is the byte offset of a line within the file, and the value is the content of that line.
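The (offset, line) records described above can be pictured with a small plain-Java sketch. It does not call Hadoop; LineRecordSketch and Record are hypothetical names, and the sketch assumes UTF-8 text with "\n" line terminators only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Imitates the kind of records a line-oriented input format hands to map:
// key = byte offset where the line starts, value = line without its terminator.
// Hypothetical names; assumes UTF-8 and '\n' terminators.
class LineRecordSketch {

    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    static List<Record> readRecords(String content) {
        List<Record> records = new ArrayList<>();
        long offset = 0; // byte offset of the current line
        int start = 0;   // char index of the current line
        while (start < content.length()) {
            int end = content.indexOf('\n', start);
            boolean hasNewline = end >= 0;
            if (!hasNewline) {
                end = content.length();
            }
            String line = content.substring(start, end);
            records.add(new Record(offset, line));
            // Advance by the encoded length of the line plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + (hasNewline ? 1 : 0);
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        for (Record r : readRecords("hello\nworld\n")) {
            System.out.println(r.offset + " -> " + r.line);
        }
        // prints:
        // 0 -> hello
        // 6 -> world
    }
}
```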

5. InputSplit class

By default, FileInputFormat and its subclasses split a file into chunks of 64 MB (the same as the recommended block size). Splitting a file this way lets several map tasks process one file in parallel, which greatly improves performance on large files. The input to each map task is one of these input splits, an InputSplit.
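As a rough illustration of this chunking, the following plain-Java sketch cuts a file length into (start, length) ranges of at most a given split size. SplitSketch and computeSplits are hypothetical names, not Hadoop methods, and the sketch ignores refinements of the real implementation such as how a small trailing remainder is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified picture of cutting one file into fixed-size splits.
// Hypothetical helper; not the Hadoop implementation.
class SplitSketch {
    static final long MB = 1024L * 1024L;

    // Returns one {start, length} pair per split.
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        long start = 0;
        while (start < fileLength) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new long[] {start, length});
            start += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 200 MB file with 64 MB splits: three full splits and one 8 MB tail.
        for (long[] s : computeSplits(200 * MB, 64 * MB)) {
            System.out.println("start=" + s[0] / MB + " MB, length=" + s[1] / MB + " MB");
        }
    }
}
```

Each range would be handed to its own map task, which is where the parallelism comes from.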

InputSplit has the subclasses FileSplit and CombineFileSplit. Both record the file path, the start offset of the split, the split length, and the list of hosts that store the split's data. CombineFileSplit is aimed at small files: it packs many small files into a single InputSplit, so that one map task can process many small files.
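The idea of packing small files into one split can be sketched in plain Java as a greedy grouping by size. CombineSketch and pack are hypothetical names; the sketch only looks at file sizes and ignores data locality, which a real combining input format would also take into account.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Greedy grouping of small files into combined splits, by size only.
// Hypothetical helper; not the Hadoop implementation.
class CombineSketch {

    // Packs file sizes, in order, into groups whose total stays within maxSplitSize.
    // A single file larger than the limit ends up in a group of its own.
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentSize + size > maxSplitSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Five small files (sizes in MB) packed into splits of at most 64 MB.
        System.out.println(pack(Arrays.asList(10L, 20L, 30L, 40L, 5L), 64L));
        // prints [[10, 20, 30], [40, 5]]
    }
}
```

Five files would otherwise mean five map tasks; after packing, two tasks cover the same data.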

Some files should not be split. There are two ways to prevent splitting: set the minimum split size to a value larger than the file size, or subclass FileInputFormat and override the isSplitable method so that it returns false.
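The first approach works because the effective split size is bounded below by the minimum split size. The plain-Java sketch below uses the formula max(minSize, min(maxSize, blockSize)) to show the effect; SplitSizeSketch and countSplits are hypothetical names, the 64 MB block size is taken from the text above, and the split count is a simple ceiling division rather than the framework's exact logic.

```java
// Shows why a minimum split size larger than the file yields a single split.
// Hypothetical helper; a sketch, not the Hadoop implementation.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Effective split size: the block size, clamped to [minSize, maxSize],
    // with minSize winning if the two bounds conflict.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits as a ceiling division of file length by split size.
    static long countSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long fileLength = 200 * MB;
        long blockSize = 64 * MB;

        long defaultSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(fileLength, defaultSize)); // prints 4

        // Minimum split size raised above the file size: the file stays whole.
        long raisedSize = computeSplitSize(blockSize, 256 * MB, Long.MAX_VALUE);
        System.out.println(countSplits(fileLength, raisedSize)); // prints 1
    }
}
```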

6. RecordReader class

InputSplit defines how the work is sliced, while the RecordReader class defines how to load the data and convert it into key-value pairs suitable for the map method to read. The default input format is TextInputFormat.
