Advanced MapReduce (4): a small reduce-side join example


Data set files:

Customers

1,Stephanie Leung,555-555-555
2,Edward Kim,123-456-7890
3,Jose Madriz,281-330-8004
4,David Stork,408-555-0000

Orders

3,a,12.95,02-jun-2008
1,b,88.25,20-may-2008
2,c,32.00,30-nov-2007
3,d,25.02,22-jan-2009

The program is expected to produce the following result (key and joined record separated by a tab):

1	Stephanie Leung,555-555-555,b,88.25,20-may-2008
2	Edward Kim,123-456-7890,c,32.00,30-nov-2007
3	Jose Madriz,281-330-8004,d,25.02,22-jan-2009
3	Jose Madriz,281-330-8004,a,12.95,02-jun-2008

Note that customer 4 (David Stork) has no orders, so the inner join performed by the reducer drops him.

Next, let's implement this small program:

As explained in the previous article, we need to implement several classes: a subclass of TaggedMapOutput, plus a mapper and a reducer built on the datajoin base classes (DataJoinMapperBase and DataJoinReducerBase). The concrete implementations follow.

The TaggedWritable class inherits from TaggedMapOutput:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/* TaggedMapOutput is an abstract type that encapsulates a tag together with the record
 * contents. Because it serves here as the output value type of DataJoinMapperBase, it
 * must honor the Writable contract, so the two serialization methods are implemented. */
public class TaggedWritable extends TaggedMapOutput {

    private Writable data;

    public TaggedWritable() {
        this.tag = new Text();
    }

    public TaggedWritable(Writable data) {
        this.tag = new Text(); // the tag partitions records by data set; set it via setTag()
        this.data = data;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        tag.readFields(in);
        String dataClz = in.readUTF();
        if (this.data == null || !this.data.getClass().getName().equals(dataClz)) {
            try {
                this.data = (Writable) ReflectionUtils.newInstance(Class.forName(dataClz), null);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
            }
        }
        data.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        tag.write(out);
        out.writeUTF(this.data.getClass().getName());
        data.write(out);
    }

    @Override
    public Writable getData() {
        return data;
    }
}
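The framework instantiates TaggedWritable by reflection during the shuffle, which is why the no-argument constructor is needed and why write() and readFields() must mirror each other exactly. Here is a minimal round-trip sketch (a hypothetical test class, not part of the original post) that exercises the serialization contract:

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Text;

public class TaggedWritableRoundTrip {
    public static void main(String[] args) throws Exception {
        TaggedWritable tw = new TaggedWritable(new Text("1,Stephanie Leung,555-555-555"));
        tw.setTag(new Text("Customers.txt"));

        // Serialize: tag, data class name, then the data itself.
        DataOutputBuffer out = new DataOutputBuffer();
        tw.write(out);

        // Deserialize into a fresh instance, as the framework does via reflection.
        DataInputBuffer in = new DataInputBuffer();
        in.reset(out.getData(), out.getLength());
        TaggedWritable copy = new TaggedWritable();
        copy.readFields(in);

        System.out.println(copy.getTag() + " -> " + copy.getData());
        // prints: Customers.txt -> 1,Stephanie Leung,555-555-555
    }
}

Writing the class name with writeUTF() is what lets readFields() re-create the correct Writable subtype on the receiving side.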

The JoinMapper class inherits from DataJoinMapperBase:

import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;

import com.demo.writables.TaggedWritable;

public class JoinMapper extends DataJoinMapperBase {

    // Called at the start of the task to generate the tag; using the file name
    // directly as the tag lets it identify which data set a record came from.
    @Override
    protected Text generateInputTag(String inputFile) {
        System.out.println("inputFile = " + inputFile);
        return new Text(inputFile);
    }

    // The delimiter is hard-coded here as ","; more generally, the user should be
    // able to specify both the delimiter and the group key. This sets the group key.
    @Override
    protected Text generateGroupKey(TaggedMapOutput record) {
        String tag = ((Text) record.getTag()).toString();
        System.out.println("tag = " + tag);
        String line = ((Text) record.getData()).toString();
        String[] tokens = line.split(",");
        return new Text(tokens[0]);
    }

    // Returns a TaggedWritable carrying whatever tag we want.
    @Override
    protected TaggedMapOutput generateTaggedMapOutput(Object value) {
        TaggedWritable retv = new TaggedWritable((Text) value);
        retv.setTag(this.inputTag); // do not forget to tag the current record
        return retv;
    }
}
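To see what generateGroupKey() extracts, consider a single Orders record: the first comma-separated token is the customer ID, which becomes the join key. A standalone sketch (hypothetical, outside Hadoop):

public class GroupKeySketch {
    public static void main(String[] args) {
        String line = "3,a,12.95,02-jun-2008"; // one Orders record
        String[] tokens = line.split(",");
        System.out.println(tokens[0]); // prints "3": the customer ID used as the group key
    }
}

A Customers record and an Orders record for the same customer therefore map to the same group key and meet at the same reducer, each carrying a different tag.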

The JoinReducer class inherits from DataJoinReducerBase:

import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
import org.apache.hadoop.io.Text;

import com.demo.writables.TaggedWritable;

public class JoinReducer extends DataJoinReducerBase {

    // The two array parameters are always the same size, at most the number of data sources.
    @Override
    protected TaggedMapOutput combine(Object[] tags, Object[] values) {
        if (tags.length < 2) return null; // this check implements the inner join
        String joinedStr = "";
        for (int i = 0; i < values.length; i++) {
            if (i > 0) joinedStr += ","; // comma separates the records from the two sources
            TaggedWritable tw = (TaggedWritable) values[i];
            String line = ((Text) tw.getData()).toString();
            // Split the record into the group key and the rest, keeping only the rest.
            String[] tokens = line.split(",", 2);
            joinedStr += tokens[1];
        }
        TaggedWritable retv = new TaggedWritable(new Text(joinedStr));
        retv.setTag((Text) tags[0]); // the group key becomes the final output key
        return retv;
    }
}
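The following standalone sketch (hypothetical, not from the original post) traces what combine() produces for group key 3, where one Customers record meets one Orders record:

public class CombineSketch {
    public static void main(String[] args) {
        // The values the reducer sees for group key "3", one per data source.
        String[] values = {
            "3,Jose Madriz,281-330-8004", // tagged Customers
            "3,d,25.02,22-jan-2009"       // tagged Orders
        };
        String joinedStr = "";
        for (int i = 0; i < values.length; i++) {
            if (i > 0) joinedStr += ",";
            joinedStr += values[i].split(",", 2)[1]; // drop the leading group key
        }
        System.out.println("3\t" + joinedStr);
        // prints: 3	Jose Madriz,281-330-8004,d,25.02,22-jan-2009
    }
}

When a group key has several records from the same source, as customer 3 does with two orders, the framework calls combine() once per cross-product combination, which is what yields the two joined lines for key 3 in the expected output.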

Finally, there is the MapReduce program entry point (the driver):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.demo.mappers.JoinMapper;
import com.demo.reducers.JoinReducer;
import com.demo.writables.TaggedWritable;

public class DataJoinDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        if (args.length != 2) {
            System.err.println("Usage: DataJoin <input path> <output path>");
            System.exit(-1);
        }
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        JobConf job = new JobConf(conf, DataJoinDriver.class);
        job.setJobName("DataJoin");
        // FileSystem hdfs = FileSystem.get(conf);
        FileSystem hdfs = in.getFileSystem(conf);
        FileInputFormat.setInputPaths(job, in);
        if (hdfs.exists(out)) {
            hdfs.delete(out, true); // remove any previous output directory
        }
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(JoinMapper.class);
        job.setReducerClass(JoinReducer.class);
        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(TaggedWritable.class);
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Hard-coded paths override the command-line arguments.
        args = new String[]{"hdfs://localhost:9000/input/different data source data/*.txt",
                "hdfs://localhost:9000/output/secondoutput1"};
        int res = ToolRunner.run(new Configuration(), new DataJoinDriver(), args);
        System.exit(res);
    }
}
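Note that main() overwrites args with hard-coded HDFS paths; removing that line lets the input and output paths be passed on the command line, matching the usage string. Assuming the compiled classes and the hadoop-datajoin contrib jar are packaged into a jar named datajoin.jar (a hypothetical name), the job could then be launched along these lines:

hadoop jar datajoin.jar DataJoinDriver <input path> <output path>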

Program run result: (screenshot of the job output)

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
