Prerequisites:
1. A working Hadoop installation. For installation and configuration, see: Installing and Configuring Hadoop 1.2.1 on Ubuntu
2. A working integrated development environment. For setup, see: Building a Hadoop Source-Reading Environment on Ubuntu
MapReduce Programming Examples:
MapReduce Programming Example (i): running the first MapReduce program, WordCount, in the integrated environment, with code analysis
MapReduce Programming Example (ii): calculating average student scores
MapReduce Programming Example (iii): data deduplication
MapReduce Programming Example (iv): sorting
MapReduce Programming Example (v): single-table join with MapReduce
MapReduce Programming Example (vi): multi-table join with MapReduce
Input:
2013-11-01 AA
2013-11-02 BB
2013-11-03 cc
2013-11-04 AA
2013-11-05 DD
2013-11-06 DD
2013-11-07 AA
2013-11-09 cc
2013-11-10 EE
2013-11-01 BB
2013-11-02 on 33
2013-11-03 cc
2013-11-04 BB
2013-11-05 on 23
2013-11-06 DD
2013-11-07 on 99
2013-11-09 on 99
2013-11-10 EE
.....
.....
.....
Data deduplication: the map phase emits each input line as the key, with an arbitrary value. After the shuffle, all identical keys reach reduce as a single group, so reduce only has to write each key once.
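The idea can be tried out without a cluster. The following is a minimal sketch in plain Java that imitates the three phases with ordinary collections. It does not use the Hadoop API; the class name DedupSketch and its helper methods are hypothetical names chosen for illustration, and the TreeMap only approximates the sort-and-group step that the framework performs during the shuffle.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone simulation of map -> shuffle -> reduce for deduplication.
public class DedupSketch {

    // Map phase: emit every input line as a key paired with an empty value.
    static List<Map.Entry<String, String>> map(List<String> lines) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(new AbstractMap.SimpleEntry<>(line, ""));
        }
        return pairs;
    }

    // Shuffle phase (approximation): group values by key, with keys kept in sorted order.
    static TreeMap<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: each distinct key is seen once, so write it once and ignore the values.
    static List<String> reduce(TreeMap<String, List<String>> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    public static List<String> dedup(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "2013-11-01 AA", "2013-11-02 BB", "2013-11-01 AA",
                "2013-11-03 cc", "2013-11-03 cc");
        for (String line : dedup(input)) {
            System.out.println(line);
        }
    }
}
```

Because duplicates collapse into one group per key, running the same reduce logic on partial data first (as a combiner does) and again on the merged data gives the same result.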
The code is simple enough that it needs little explanation. Here it is:
package com.t.hadoop;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Data deduplication
 * @author DaT dev.tao@gmail.com
 */
public class Dedup {

    public static class MyMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit the whole input line as the key; the value is irrelevant.
            context.write(value, new Text(""));
        }
    }

    public static class MyReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Each distinct key arrives once, so writing it once removes duplicates.
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.out.println("parameter errors!");
            System.exit(2);
        }
        Job job = new Job(conf, "Dedup");
        job.setJarByClass(Dedup.class);
        job.setMapperClass(MyMapper.class);
        // The reducer is safe to reuse as a combiner: it only collapses duplicate keys.
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Output:
2013-11-01 AA
2013-11-01 BB
2013-11-02 on 33
2013-11-02 BB
2013-11-03 cc
2013-11-03 cc
2013-11-04 on 98
2013-11-04 AA
2013-11-04 BB
2013-11-05 on 23
2013-11-05 on 93
2013-11-05 DD
2013-11-06 on 99
2013-11-06 DD
2013-11-07 on 92
2013-11-07 on 99
2013-11-07 AA
2013-11-09 on 99
2013-11-09 AA
2013-11-09 cc
2013-11-10 EE