MapReduce Programming Example (v): MapReduce Implements Single-Table Association

Source: Internet
Author: User
Tags: shuffle, static class

Prerequisite Preparation:

1. Hadoop is installed and running normally. For Hadoop installation and configuration, see: Installing and Configuring Hadoop 1.2.1 on Ubuntu

2. The integrated development environment works normally. For IDE configuration, see: Building a Hadoop Source-Reading Environment on Ubuntu


MapReduce Programming Examples:

MapReduce Programming Example (i): running the first MapReduce program (WordCount) in the integrated environment, with code analysis

MapReduce Programming Example (ii): calculating average student scores

MapReduce Programming Example (iii): data deduplication

MapReduce Programming Example (iv): sorting

MapReduce Programming Example (v): MapReduce implements single-table association

MapReduce Programming Example (vi): MapReduce implements multi-table association


Single-Table Association:

Description:

Solve a self-join problem on a single table: from the child-parent table below, derive the grandchild-grandparent pairs. For example, Tom's parents are Lucy and Jim, and Lucy's parents are David and Lili, so David and Lili are Tom's grandparents.

Child Parent
Tom Lucy
Tom Jim
Lucy David
Lucy Lili
Jim Lilei
Jim SuSan
Lily Green
Lily Bians
Green Well
Green Millshell
Havid James
James LiT
Richard Cheng
Cheng Lihua

Problem Analysis:

Clearly the table needs to be joined with itself: both the left and right sides of the join are the same child-parent table, connected through a shared person's name as the key. This fits MapReduce naturally, because the shuffle phase brings all values with the same key to a single reduce call. So the idea is: for the left table, emit the parent as the key with the child as the value; for the right table, emit the child as the key with the parent as the value. After the shuffle, the left-table and right-table records for the same person arrive together in one reduce call, and taking the Cartesian product of the two sides yields the grandchild-grandparent pairs.
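To make the shuffle idea concrete, here is a minimal standalone sketch (the class and method names are hypothetical, not part of the actual job below) of what the reducer sees for a single key. For the key "Lucy", the left table contributes "1+Tom" (Tom is Lucy's child) and the right table contributes "2+David" and "2+Lili" (David and Lili are Lucy's parents):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinDemo {
    // For one shuffle key (a shared person's name), split the tagged values
    // into grandchildren ("1+...") and grandparents ("2+..."), then emit the
    // Cartesian product of the two sides as tab-separated pairs.
    static List<String> joinForKey(List<String> taggedValues) {
        List<String> grandchildren = new ArrayList<>();
        List<String> grandparents = new ArrayList<>();
        for (String v : taggedValues) {
            if (v.startsWith("1+")) {
                grandchildren.add(v.substring(2)); // left table: child name
            } else if (v.startsWith("2+")) {
                grandparents.add(v.substring(2));  // right table: parent name
            }
        }
        List<String> pairs = new ArrayList<>();
        for (String c : grandchildren) {
            for (String p : grandparents) {
                pairs.add(c + "\t" + p);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Values grouped under the key "Lucy" after the shuffle.
        List<String> values = Arrays.asList("1+Tom", "2+David", "2+Lili");
        for (String pair : joinForKey(values)) {
            System.out.println(pair); // Tom-David, Tom-Lili
        }
    }
}
```

This is exactly the logic the reducer in the full job performs for every key; keys that have only left-side or only right-side records produce an empty product and emit nothing.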

package com.t.hadoop;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Single-table association (self-join).
 * @author DaT dev.tao@gmail.com
 */
public class STjoin {

	public static int time = 0;

	public static class STjoinMapper extends Mapper<Object, Text, Text, Text> {
		@Override
		protected void map(Object key, Text value, Context context)
				throws IOException, InterruptedException {
			String line = value.toString();
			// Find the separator between the child and parent columns.
			int i = 0;
			while (line.charAt(i) != ' ') {
				i++;
			}
			String[] values = {line.substring(0, i), line.substring(i + 1)};
			if (values[0].compareTo("child") != 0) { // skip the header row
				String childName = values[0];
				String parentName = values[1];
				// Left table: parent as key, "1" marks the left-table record.
				String relation = "1";
				context.write(new Text(parentName), new Text(relation + "+" + childName));
				// Right table: child as key, "2" marks the right-table record.
				relation = "2";
				context.write(new Text(childName), new Text(relation + "+" + parentName));
			}
		}
	}

	public static class STjoinReduce extends Reducer<Text, Text, Text, Text> {
		@Override
		protected void reduce(Text key, Iterable<Text> values, Context context)
				throws IOException, InterruptedException {
			if (time == 0) { // output the header once
				context.write(new Text("grandchild"), new Text("grandparent"));
				time++;
			}
			int grandChildNum = 0;
			String[] grandChild = new String[10];
			int grandParentNum = 0;
			String[] grandParent = new String[10];
			Iterator<Text> ite = values.iterator();
			while (ite.hasNext()) {
				String record = ite.next().toString();
				int len = record.length();
				if (len == 0) continue;
				int i = 2;
				char relation = record.charAt(0);
				if (relation == '1') { // left table: take the child
					String childName = "";
					while (i < len) { // parse the name
						childName = childName + record.charAt(i);
						i++;
					}
					grandChild[grandChildNum] = childName;
					grandChildNum++;
				} else { // right table: take the parent
					String parentName = "";
					while (i < len) { // parse the name
						parentName = parentName + record.charAt(i);
						i++;
					}
					grandParent[grandParentNum] = parentName;
					grandParentNum++;
				}
			}
			// Cartesian product of the left and right tables.
			if (grandChildNum != 0 && grandParentNum != 0) {
				for (int m = 0; m < grandChildNum; m++) {
					for (int n = 0; n < grandParentNum; n++) {
						System.out.println("grandchild " + grandChild[m]
								+ " grandparent " + grandParent[n]);
						context.write(new Text(grandChild[m]), new Text(grandParent[n]));
					}
				}
			}
		}
	}

	public static void main(String[] args)
			throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length < 2) {
			System.out.println("parameter error");
			System.exit(2);
		}
		Job job = new Job(conf);
		job.setJarByClass(STjoin.class);
		job.setMapperClass(STjoinMapper.class);
		job.setReducerClass(STjoinReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}


Input parameters:

hdfs://localhost:9000/user/dat/stjon_input hdfs://localhost:9000/user/dat/stjon_output

Output Result:

Grandchild grandparent
Richard Lihua
Lily Well
Lily Millshell
Havid LiT
Tom Lilei
Tom SuSan
Tom Lili
Tom David


OK! Comments and exchanges are welcome.
