Advanced MapReduce (4): a small reduce-side join example


Data set files:

Customers

1,Stephanie Leung,555-555-555
2,Edward Kim,123-456-7890
3,Jose Madriz,281-330-8004
4,David Stork,408-555-0000

Orders

3,a,12.95,02-jun-2008
1,b,88.25,20-may-2008
2,c,32.00,30-nov-2007
3,d,25.02,22-jan-2009

The program is expected to produce the following result (key and joined record separated by a tab):

1	Stephanie Leung,555-555-555,b,88.25,20-may-2008
2	Edward Kim,123-456-7890,c,32.00,30-nov-2007
3	Jose Madriz,281-330-8004,d,25.02,22-jan-2009
3	Jose Madriz,281-330-8004,a,12.95,02-jun-2008

Note that customer 4 (David Stork) has no orders, so the inner join performed by the reducer drops him.

Next, let's implement this small program:

As explained in the previous article, we need to implement several classes: a subclass of TaggedMapOutput, plus a mapper and a reducer built on the datajoin base classes (DataJoinMapperBase and DataJoinReducerBase). The concrete implementations follow.

The TaggedWritable class inherits from TaggedMapOutput:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/* TaggedMapOutput is an abstract type that encapsulates a tag together with the record
 * contents. Because it serves here as the output value type of DataJoinMapperBase, it
 * must honor the Writable contract, so the two serialization methods are implemented. */
public class TaggedWritable extends TaggedMapOutput {

    private Writable data;

    public TaggedWritable() {
        this.tag = new Text();
    }

    public TaggedWritable(Writable data) {
        this.tag = new Text(); // the tag partitions records by data set; set it via setTag()
        this.data = data;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        tag.readFields(in);
        String dataClz = in.readUTF();
        if (this.data == null || !this.data.getClass().getName().equals(dataClz)) {
            try {
                this.data = (Writable) ReflectionUtils.newInstance(Class.forName(dataClz), null);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
            }
        }
        data.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        tag.write(out);
        out.writeUTF(this.data.getClass().getName());
        data.write(out);
    }

    @Override
    public Writable getData() {
        return data;
    }
}
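The framework instantiates TaggedWritable by reflection during the shuffle, which is why the no-argument constructor is needed and why write() and readFields() must mirror each other exactly. Here is a minimal round-trip sketch (a hypothetical test class, not part of the original post) that exercises the serialization contract:

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Text;

public class TaggedWritableRoundTrip {
    public static void main(String[] args) throws Exception {
        TaggedWritable tw = new TaggedWritable(new Text("1,Stephanie Leung,555-555-555"));
        tw.setTag(new Text("Customers.txt"));

        // Serialize: tag, data class name, then the data itself.
        DataOutputBuffer out = new DataOutputBuffer();
        tw.write(out);

        // Deserialize into a fresh instance, as the framework does via reflection.
        DataInputBuffer in = new DataInputBuffer();
        in.reset(out.getData(), out.getLength());
        TaggedWritable copy = new TaggedWritable();
        copy.readFields(in);

        System.out.println(copy.getTag() + " -> " + copy.getData());
        // prints: Customers.txt -> 1,Stephanie Leung,555-555-555
    }
}

Writing the class name with writeUTF() is what lets readFields() re-create the correct Writable subtype on the receiving side.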

The JoinMapper class inherits from DataJoinMapperBase:

import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;

import com.demo.writables.TaggedWritable;

public class JoinMapper extends DataJoinMapperBase {

    // Called at the start of the task to generate the tag; using the file name
    // directly as the tag lets it identify which data set a record came from.
    @Override
    protected Text generateInputTag(String inputFile) {
        System.out.println("inputFile = " + inputFile);
        return new Text(inputFile);
    }

    // The delimiter is hard-coded here as ","; more generally, the user should be
    // able to specify both the delimiter and the group key. This sets the group key.
    @Override
    protected Text generateGroupKey(TaggedMapOutput record) {
        String tag = ((Text) record.getTag()).toString();
        System.out.println("tag = " + tag);
        String line = ((Text) record.getData()).toString();
        String[] tokens = line.split(",");
        return new Text(tokens[0]);
    }

    // Returns a TaggedWritable carrying whatever tag we want.
    @Override
    protected TaggedMapOutput generateTaggedMapOutput(Object value) {
        TaggedWritable retv = new TaggedWritable((Text) value);
        retv.setTag(this.inputTag); // do not forget to tag the current record
        return retv;
    }
}
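To see what generateGroupKey() extracts, consider a single Orders record: the first comma-separated token is the customer ID, which becomes the join key. A standalone sketch (hypothetical, outside Hadoop):

public class GroupKeySketch {
    public static void main(String[] args) {
        String line = "3,a,12.95,02-jun-2008"; // one Orders record
        String[] tokens = line.split(",");
        System.out.println(tokens[0]); // prints "3": the customer ID used as the group key
    }
}

A Customers record and an Orders record for the same customer therefore map to the same group key and meet at the same reducer, each carrying a different tag.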

The JoinReducer class inherits from DataJoinReducerBase:

import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;

import com.demo.writables.TaggedWritable;

public class JoinReducer extends DataJoinReducerBase {

    // The two array parameters are always the same size, at most the number of data sources.
    @Override
    protected TaggedMapOutput combine(Object[] tags, Object[] values) {
        if (tags.length < 2) return null; // this check implements the inner join
        String joinedStr = "";
        for (int i = 0; i < values.length; i++) {
            if (i > 0) joinedStr += ","; // comma separates the records from the two sources
            TaggedWritable tw = (TaggedWritable) values[i];
            String line = ((Text) tw.getData()).toString();
            // Split the record into the group key and the rest, keeping only the rest.
            String[] tokens = line.split(",", 2);
            joinedStr += tokens[1];
        }
        TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
        retv.setTag((Text) tags[0]); // the group key becomes the final output key
        return retv;
    }
}
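The following standalone sketch (hypothetical, not from the original post) traces what combine() produces for group key 3, where one Customers record meets one Orders record:

public class CombineSketch {
    public static void main(String[] args) {
        // The values the reducer sees for group key "3", one per data source.
        String[] values = {
            "3,Jose Madriz,281-330-8004", // tagged Customers
            "3,d,25.02,22-jan-2009"       // tagged Orders
        };
        String joinedStr = "";
        for (int i = 0; i < values.length; i++) {
            if (i > 0) joinedStr += ",";
            joinedStr += values[i].split(",", 2)[1]; // drop the leading group key
        }
        System.out.println("3\t" + joinedStr);
        // prints: 3	Jose Madriz,281-330-8004,d,25.02,22-jan-2009
    }
}

When a group key has several records from the same source, as customer 3 does with two orders, the framework calls combine() once per cross-product combination, which is what yields the two joined lines for key 3 in the expected output.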

Finally, there is the MapReduce program entry point (the driver):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.demo.mappers.JoinMapper;
import com.demo.reducers.JoinReducer;
import com.demo.writables.TaggedWritable;

public class DataJoinDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        if (args.length != 2) {
            System.err.println("Usage: DataJoin <input path> <output path>");
            System.exit(-1);
        }
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        JobConf job = new JobConf(conf, DataJoinDriver.class);
        job.setJobName("DataJoin");
        // FileSystem hdfs = FileSystem.get(conf);
        FileSystem hdfs = in.getFileSystem(conf);
        FileInputFormat.setInputPaths(job, in);
        if (hdfs.exists(out)) {
            hdfs.delete(out, true); // remove any previous output directory
        }
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(JoinMapper.class);
        job.setReducerClass(JoinReducer.class);
        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(TaggedWritable.class);
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Hard-coded paths override the command-line arguments.
        args = new String[]{"hdfs://localhost:9000/input/different data source data/*.txt",
                "hdfs://localhost:9000/output/secondoutput1"};
        int res = ToolRunner.run(new Configuration(), new DataJoinDriver(), args);
        System.exit(res);
    }
}
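Note that main() overwrites args with hard-coded HDFS paths; removing that line lets the input and output paths be passed on the command line, matching the usage string. Assuming the compiled classes and the hadoop-datajoin contrib jar are packaged into a jar named datajoin.jar (a hypothetical name), the job could then be launched along these lines:

hadoop jar datajoin.jar DataJoinDriver <input path> <output path>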

Program run result: (screenshot of the job output)

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
