MapReduce Program Templates (New and Legacy API)

Last Update: 2018-07-26 Source: Internet

Author: User



I have recently been learning MapReduce programming. After reading two books, "Hadoop in Action" and "Hadoop: The Definitive Guide", I finally got a MapReduce program of my own to run. MapReduce programs are usually written by modifying a template, so I am posting the templates here. One key point to know: the MapReduce API changed with Hadoop 0.20.0, in the following ways:

(1) The new API favors abstract classes over interfaces. For example, Mapper and Reducer, which are interfaces that you implement in the old API, are classes that you extend in the new API.

(2) The new API lives in the org.apache.hadoop.mapreduce package and its subpackages, while the old API remains in org.apache.hadoop.mapred. Take care not to mix the two packages or import from the wrong one: a program should import consistently from either the new package or the old one. When I first started writing code, my program failed because I did not pay attention to this, especially when writing the map or reduce classes and the job configuration.

(3) The new API makes extensive use of context objects. For example, MapContext essentially takes over the roles of the old API's JobConf, OutputCollector, and Reporter.

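(4) The new API supports both "push" and "pull" styles of iteration. In both APIs, records are pushed to the mapper and reducer, but the new API also lets a mapper or reducer pull records itself through its context object.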

(5) Configuration has been unified. The old API uses a special JobConf object (an extension of Configuration) for job configuration, whereas in the new API a job is configured through a plain Configuration object.

(6) In the new API, job control is handled by the Job class, whereas the old API uses JobClient. This is another point to watch when writing code.

The points above come from "Hadoop: The Definitive Guide". In the code I have added comments on points I consider important, and I hope they are useful.
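To make point (4) concrete, here is a small plain-Java model of the two iteration styles. It is only a sketch: MiniContext, MiniMapper, PairingMapper and runMapper are hypothetical names that loosely mimic the shape of the new API's Mapper.run() loop and Context (nextKeyValue(), getCurrentValue(), write()); they are not Hadoop classes, so the example runs without a Hadoop installation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of push vs. pull iteration. All classes here are
// hypothetical stand-ins that only imitate the shape of Hadoop's new-API
// Mapper.run()/Context; they are not part of Hadoop.
public class PushPullDemo {

    /** Hypothetical record source plus output sink, loosely modelled on Context. */
    static class MiniContext {
        private final Iterator<String> input;
        private String current;
        final List<String> output = new ArrayList<>();

        MiniContext(List<String> records) { this.input = records.iterator(); }

        /** Advances to the next record; returns false when input is exhausted. */
        boolean nextKeyValue() {
            if (!input.hasNext()) return false;
            current = input.next();
            return true;
        }
        String getCurrentValue() { return current; }
        void write(String out) { output.add(out); }
    }

    /** Push style: the driver loop calls map() once per record. */
    static class MiniMapper {
        void map(String value, MiniContext ctx) { ctx.write(value.toUpperCase()); }

        // Default driver loop, mirroring how the new API's Mapper.run() calls map().
        void run(MiniContext ctx) {
            while (ctx.nextKeyValue()) {
                map(ctx.getCurrentValue(), ctx);
            }
        }
    }

    /** Pull style: run() is overridden and the mapper fetches records itself. */
    static class PairingMapper extends MiniMapper {
        @Override
        void run(MiniContext ctx) {
            while (ctx.nextKeyValue()) {
                String first = ctx.getCurrentValue();
                // Pull one extra record so two inputs are combined into one output.
                String second = ctx.nextKeyValue() ? ctx.getCurrentValue() : "";
                ctx.write(first + "+" + second);
            }
        }
    }

    static List<String> runMapper(MiniMapper mapper, List<String> records) {
        MiniContext ctx = new MiniContext(records);
        mapper.run(ctx);
        return ctx.output;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("a", "b", "c");
        System.out.println(runMapper(new MiniMapper(), in));    // [A, B, C]
        System.out.println(runMapper(new PairingMapper(), in)); // [a+b, c+]
    }
}
```

In the push version the mapper only sees one record per call; in the pull version it controls the loop and can consume several records before emitting output, which is what overriding run() makes possible in the new API.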
Old API version of the template:
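```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
/*
 * The imports in this class differ between the old and new APIs: JobConf exists
 * only in mapred, and Job only in mapreduce. FileInputFormat and FileOutputFormat
 * exist in both packages, so mixing them causes the errors below; importing
 * everything from one package avoids the problem.
 */
//import org.apache.hadoop.mapreduce.Job;
//import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
//import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
//import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author napoleongjc
 * @version 1.0
 */
/*
 * OutputCollector is the general mechanism the Map/Reduce framework provides for
 * collecting data output by a Mapper or Reducer (both intermediate output and the
 * job's final output).
 * Reporter is the mechanism Map/Reduce applications use to report progress, set
 * application-level status messages, and update counters.
 */
public class MyJob extends Configured implements Tool {
    // Remember the class signature format of the Map and Reduce classes.
    // Minimal completion (an assumption consistent with the imports above):
    // MapClass swaps each key/value pair, Reduce joins the values with commas.
    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output,
                           Reporter reporter) throws IOException {
            StringBuilder csv = new StringBuilder();
            while (values.hasNext()) {  // the old API passes values as an Iterator
                if (csv.length() > 0) csv.append(',');
                csv.append(values.next().toString());
            }
            output.collect(key, new Text(csv.toString()));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Old API: the job is configured through a JobConf...
        JobConf job = new JobConf(getConf(), MyJob.class);
        job.setJobName("MyJob");
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        JobClient.runJob(job);  // ...and submitted through JobClient
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}
```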
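New API version of the template: the sketch below shows how a Tool-based job of the same shape might look against org.apache.hadoop.mapreduce. It is not taken from either book. It assumes Hadoop 2.x or later (for Job.getInstance and the new-API KeyValueTextInputFormat), and the class name MyNewJob plus the map and reduce logic (swap each key and value, then join the values for each key with commas) are purely illustrative. Note how it reflects points (1) to (6): Mapper and Reducer are extended rather than implemented, Context replaces OutputCollector and Reporter, the reducer receives an Iterable, configuration is a plain Configuration, and the Job class both configures and submits the job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
// Every MapReduce import comes from the new org.apache.hadoop.mapreduce package.
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical example class; assumes Hadoop 2.x+ on the classpath.
public class MyNewJob extends Configured implements Tool {

    // New API: Mapper is a class to extend; Context replaces OutputCollector/Reporter.
    public static class MapClass extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, key);  // illustrative: swap key and value
        }
    }

    // New API: the reducer receives an Iterable, so a for-each loop works.
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder csv = new StringBuilder();
            for (Text v : values) {
                if (csv.length() > 0) csv.append(',');
                csv.append(v.toString());
            }
            context.write(key, new Text(csv.toString()));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Job replaces both JobConf (configuration) and JobClient (submission).
        Job job = Job.getInstance(getConf(), "MyNewJob");
        job.setJarByClass(MyNewJob.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyNewJob(), args));
    }
}
```

Packaged into a jar (myjob.jar is a hypothetical name), it could be launched with something like `hadoop jar myjob.jar MyNewJob <input> <output>`. Because main() goes through ToolRunner, generic options such as `-D property=value` are parsed into the Configuration before run() is called.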

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.