Prerequisite Preparation:
1. Hadoop is installed and running normally. For Hadoop installation and configuration, refer to: Installing and Configuring Hadoop 1.2.1 on Ubuntu
2. The integrated development environment works normally. For its configuration, refer to: Building a Hadoop Source-Reading Environment on Ubuntu
MapReduce Programming Examples:
MapReduce Programming Example (i), running the first MapReduce program, WordCount, in the integrated environment, with code analysis
MapReduce Programming Example (ii), calculating average student scores
MapReduce Programming Example (iii), data deduplication
MapReduce Programming Example (iv), sorting
MapReduce Programming Example (v), single-table association with MapReduce
MapReduce Programming Example (vi), multi-table association with MapReduce
Single-Table Association:
Description:
Use a self-join on a single table: from the child-parent table shown below, derive the grandchild-grandparent pairs.
Child Parent
Tom Lucy
Tom Jim
Lucy David
Lucy Lili
Jim Lilei
Jim SuSan
Lily Green
Lily Bians
Green Well
Green Millshell
Havid James
James LiT
Richard Cheng
Cheng Lihua
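For example, Tom's parent is Lucy, and Lucy's parents are David and Lili, so the grandchild-grandparent table should contain the pairs Tom David and Tom Lili.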
Problem Analysis:
Clearly the table needs to be joined with itself. Conceptually we split it into a left table and a right table, both of which are the same child-parent table, and join them with the parent field as the key. This fits MapReduce nicely: during the shuffle, all records with the same key are grouped together and handed to a single reduce call. So the idea is: emit the left table keyed by parent (with child as the value), and emit the right table keyed by child (with parent as the value). After the shuffle, matching left- and right-table records naturally land in the same reduce call, and taking the Cartesian product of the left-table children and the right-table parents there yields exactly the grandchild-grandparent pairs we need.
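To make the data flow concrete, consider just two input rows, Tom Lucy and Lucy David. The mapper emits each row twice, tagging left-table records with 1 and right-table records with 2:

Lucy   1+Tom     (left table: parent as key, child as value)
Tom    2+Lucy    (right table: child as key, parent as value)
David  1+Lucy
Lucy   2+David

After the shuffle, the reduce call for key Lucy receives {1+Tom, 2+David}: the tag-1 value Tom goes into the grandchild list, the tag-2 value David into the grandparent list, and their Cartesian product gives the pair Tom David.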
package com.t.hadoop;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Single-table association (self-join)
 * @author DaT dev.tao@gmail.com
 */
public class STjoin {

    public static int time = 0;

    public static class STjoinMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            // find the space that separates the two columns
            int i = 0;
            while (line.charAt(i) != ' ') {
                i++;
            }
            String[] values = { line.substring(0, i), line.substring(i + 1) };
            if (values[0].compareTo("child") != 0) { // skip the header row
                String childName = values[0];
                String parentName = values[1];
                String relation = "1"; // flag distinguishing the left table from the right table
                // left table: parent as key, child as value
                context.write(new Text(parentName), new Text(relation + "+" + childName));
                relation = "2";
                // right table: child as key, parent as value
                context.write(new Text(childName), new Text(relation + "+" + parentName));
            }
        }
    }

    public static class STjoinReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            if (time == 0) { // write the header once (assumes a single reducer)
                context.write(new Text("grandchild"), new Text("grandparent"));
                time++;
            }
            int grandChildNum = 0;
            String[] grandChild = new String[10];
            int grandParentNum = 0;
            String[] grandParent = new String[10];
            Iterator<Text> ite = values.iterator();
            while (ite.hasNext()) {
                String record = ite.next().toString();
                int len = record.length();
                int i = 2; // names start after the flag and the '+' separator
                if (len == 0) continue;
                char relation = record.charAt(0);
                if (relation == '1') { // from the left table: collect the child
                    String childName = "";
                    while (i < len) { // parse the name
                        childName = childName + record.charAt(i);
                        i++;
                    }
                    grandChild[grandChildNum] = childName;
                    grandChildNum++;
                } else { // from the right table: collect the parent
                    String parentName = "";
                    while (i < len) { // parse the name
                        parentName = parentName + record.charAt(i);
                        i++;
                    }
                    grandParent[grandParentNum] = parentName;
                    grandParentNum++;
                }
            }
            // Cartesian product of the left and right tables
            if (grandChildNum != 0 && grandParentNum != 0) {
                for (int m = 0; m < grandChildNum; m++) {
                    for (int n = 0; n < grandParentNum; n++) {
                        System.out.println("grandchild " + grandChild[m] + " grandparent " + grandParent[n]);
                        context.write(new Text(grandChild[m]), new Text(grandParent[n]));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.out.println("parameter error");
            System.exit(2);
        }
        Job job = new Job(conf);
        job.setJarByClass(STjoin.class);
        job.setMapperClass(STjoinMapper.class);
        job.setReducerClass(STjoinReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Input arguments:
hdfs://localhost:9000/user/dat/stjon_input hdfs://localhost:9000/user/dat/stjon_output
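As a sketch of how to launch the job, assuming the class above has been packaged into a jar named stjoin.jar (a hypothetical file name):

hadoop jar stjoin.jar com.t.hadoop.STjoin hdfs://localhost:9000/user/dat/stjon_input hdfs://localhost:9000/user/dat/stjon_output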
Output Result:
grandchild grandparent
Richard Lihua
Lily Well
Lily Millshell
Havid LiT
Tom Lilei
Tom SuSan
Tom Lili
Tom David
OK~! Everyone is welcome to exchange ideas~