MapReduce program template (new and legacy API)

Source: Internet
Author: User
While learning MapReduce programming recently, I read two books, "Hadoop in Action" and "Hadoop: The Definitive Guide", and finally got a MapReduce program of my own to run. MapReduce programs are usually written by modifying a template, so I am posting my template here. One key point first: the MapReduce API changed in hadoop-0.20.0. The differences between the new and old APIs are as follows:

(1) The new API favors abstract classes over interfaces. Mapper and Reducer are abstract classes in the new API.

(2) The new API lives in the org.apache.hadoop.mapreduce package and its subpackages; the old API lives in org.apache.hadoop.mapred. Take care not to mix the two packages or pick the wrong one: a program should import consistently from either the new packages or the old one. When I first started writing code, I ran into errors caused by exactly this carelessness, especially when setting up the map or reduce classes and the job configuration.

(3) The new API makes extensive use of context objects. MapContext, for example, essentially unifies the roles of JobConf, OutputCollector, and Reporter.

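(4) The new API supports both "push" and "pull" styles of iteration. In both APIs, key-value records are pushed to the mapper; in addition, the new API lets a mapper pull records itself.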

(5) Configuration has been unified. The old API uses a JobConf object for job configuration; in the new API, job configuration is done through a Configuration.

(6) Job control in the new API is performed through the Job class, rather than through JobClient as in the old API. This is another thing to watch for when writing code. The points above come from "Hadoop: The Definitive Guide"; I have added some notes of my own on what I consider important, and I hope they are useful.
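Points (1), (3), and (4) above can be illustrated without Hadoop on the classpath. The following is only a minimal sketch: SketchContext, SketchMapper, UpperCaseMapper, and PairingMapper are hypothetical stand-ins written for this illustration, not Hadoop classes. They loosely model how the new API's Mapper is an abstract class whose run() method loops over the context (via nextKeyValue(), getCurrentValue(), and write() in real Hadoop), so that a subclass can either implement only map() (push) or override run() to fetch records itself (pull).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

/** Stdlib-only model of a context object and push/pull iteration; not Hadoop code. */
public class PushPullSketch {

    /** Hypothetical stand-in for a map context: an input cursor plus an output sink. */
    static class SketchContext {
        private final Iterator<String> input;
        private final List<String> output = new ArrayList<>();
        private String current;

        SketchContext(List<String> records) {
            this.input = records.iterator();
        }

        /** Advances to the next record; returns false once the input is exhausted. */
        boolean nextValue() {
            if (!input.hasNext()) {
                return false;
            }
            current = input.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }

        /** Plays the role that OutputCollector had in the old API. */
        void write(String value) {
            output.add(value);
        }

        List<String> getOutput() {
            return output;
        }
    }

    /** An abstract class rather than an interface, so run() can ship a default loop. */
    abstract static class SketchMapper {
        /** Push style: called once per record by the default run() loop. */
        protected abstract void map(String value, SketchContext context);

        /** Default driver loop; subclasses may override it to pull records themselves. */
        public void run(SketchContext context) {
            while (context.nextValue()) {
                map(context.getCurrentValue(), context);
            }
        }
    }

    /** Push style: only map() is implemented. */
    static class UpperCaseMapper extends SketchMapper {
        @Override
        protected void map(String value, SketchContext context) {
            context.write(value.toUpperCase(Locale.ROOT));
        }
    }

    /** Pull style: run() is overridden to consume two records per step. */
    static class PairingMapper extends SketchMapper {
        @Override
        protected void map(String value, SketchContext context) {
            context.write(value);
        }

        @Override
        public void run(SketchContext context) {
            while (context.nextValue()) {
                String first = context.getCurrentValue();
                if (context.nextValue()) {
                    context.write(first + "+" + context.getCurrentValue());
                } else {
                    map(first, context); // an unpaired record is left over
                }
            }
        }
    }

    /** Runs a mapper over the records and returns everything it wrote. */
    static List<String> runMapper(SketchMapper mapper, List<String> records) {
        SketchContext context = new SketchContext(records);
        mapper.run(context);
        return context.getOutput();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c");
        System.out.println(runMapper(new UpperCaseMapper(), records)); // [A, B, C]
        System.out.println(runMapper(new PairingMapper(), records));   // [a+b, c]
    }
}
```

The single context argument is what replaces the separate collector and reporter parameters of the old-style map signature. In a real job the same pattern would be written against org.apache.hadoop.mapreduce.Mapper and its Context.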
Old API version of the template:
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

/**
 * The old and new API versions of the classes referenced here differ.
 * JobConf can be used with mapred, but only Job with mapreduce.
 * FileInputFormat and FileOutputFormat exist in both packages, which leads to
 * errors when the two are mixed; importing everything from a single package
 * avoids the problem.
 */
//import org.apache.hadoop.mapreduce.Job;
//import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
//import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
//import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author napoleongjc
 * @version 1.0
 */
/*
 * OutputCollector is the general mechanism the Map/Reduce framework provides
 * for collecting the data output by a mapper or reducer (both intermediate
 * results and the output of the job).
 * Reporter is the mechanism Map/Reduce applications use to report progress,
 * set application-level status messages, and update counters.
 */
public class MyJob extends Configured implements Tool {

    // Remember the class signature format of the map and reduce classes
    public
