Combiner components of MapReduce

Brief Introduction
The role of the combiner is to combine the multiple <KEY, VALUE> pairs produced by one map into a new <KEY, VALUE> pair, which then serves as the input of reduce.
A combine function runs between the map function and the reduce function to shrink the intermediate results of the map output; this reduces the amount of data the map emits and therefore the network transmission load.
A combiner cannot be used in all cases. It works for summarizing records when the operation is associative and commutative (such as summing), but a scenario like averaging cannot use a combiner. When a combiner can be used, it is generally identical to the reduce function.
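To see why summing tolerates a combiner while averaging does not, consider a standalone sketch (plain Java, not MapReduce; the class and method names are illustrative only). It applies the same operation first per partition, then across partitions, the way a combiner would:

```java
import java.util.Arrays;
import java.util.List;

public class CombinerMath {

    // Sum with a "combiner": sum each partition first, then sum the partials.
    static int sumViaCombiner(List<int[]> partitions) {
        int total = 0;
        for (int[] p : partitions) {
            total += Arrays.stream(p).sum(); // per-partition partial sum
        }
        return total;
    }

    // Naive "combined" average: average the per-partition averages.
    static double avgOfAvgs(List<int[]> partitions) {
        double acc = 0;
        for (int[] p : partitions) {
            acc += Arrays.stream(p).average().orElse(0);
        }
        return acc / partitions.size();
    }

    // True average over all values, regardless of partitioning.
    static double trueAvg(List<int[]> partitions) {
        int total = 0, count = 0;
        for (int[] p : partitions) {
            total += Arrays.stream(p).sum();
            count += p.length;
        }
        return (double) total / count;
    }

    public static void main(String[] args) {
        List<int[]> parts = Arrays.asList(new int[]{1, 2, 3}, new int[]{10});
        System.out.println(sumViaCombiner(parts)); // 16, same as summing everything directly
        System.out.println(trueAvg(parts));        // 4.0
        System.out.println(avgOfAvgs(parts));      // 6.0 -- wrong, so averaging is not combiner-safe
    }
}
```

Summing partial sums yields the same total, but the average of per-partition averages differs from the true average, which is why an averaging job must ship the (sum, count) pair instead if it wants a combiner.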
When does the combiner run?
1. When the job has a combiner set and the number of spill files reaches min.num.spill.for.combine (the default is 3), the combiner is executed before the merge.
2. In some cases the merge starts before the number of spill files reaches that threshold; in that case the combiner may be executed after the merge.
3. The combiner may also not run at all. The framework considers the load of the cluster at the time; if the load is heavy, it tries to finish the map as early as possible to free resources, so the combiner is not executed.
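The spill threshold in point 1 is a cluster-side setting. A sketch of how it might be raised in mapred-site.xml (the property name is taken from this article and corresponds to the old MRv1-era key; the exact name can differ across Hadoop versions, so check your version's mapred-default.xml):

```xml
<property>
  <name>min.num.spill.for.combine</name>
  <value>5</value>
  <description>Run the combiner during the merge phase only when at
  least this many spill files exist (default is 3).</description>
</property>
```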
Example code:
package mycombiner;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinerExp {

    private final static String INPUT_PATH = "hdfs://master:8020/input";
    private final static String OUTPUT_PATH = "hdfs://master:8020/output.txt";

    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] str = value.toString().split("\\s+");
            for (String string : str) {
                System.out.println(string);
                word.set(string);
                context.write(word, one);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Configuration
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        if (fileSystem.exists(new Path(OUTPUT_PATH))) {
            fileSystem.delete(new Path(OUTPUT_PATH), true);
        }
        Job job = Job.getInstance(conf, "word count");
        // 2. Required so the packaged jar can locate its classes
        job.setJarByClass(CombinerExp.class);
        // 3. Input path
        FileInputFormat.addInputPath(job, new Path(INPUT_PATH));
        // 4. Mapper
        job.setMapperClass(MyMapper.class);
        // 5. Combiner
        job.setCombinerClass(MyReducer.class);
        // 6. Reducer
        // job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(0); // the default number of reduce tasks is 1
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 7. Output path
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        // 8. Submit the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
[root@master liguodong]# hdfs dfs -ls -R /input/
-rw-r--r--   1 root      supergroup   ...   /input/input1
-rw-r--r--   1 root      supergroup   ...   /input/input2

When we have only map and combine without reduce, the combiner does not execute, and the result of the output is not summed:

[root@master liguodong]# hdfs dfs -ls -R /output/
-rw-r--r--   3 liguodong supergroup    0   ...   /output/_SUCCESS
-rw-r--r--   3 liguodong supergroup   ...   /output/part-m-00000
-rw-r--r--   3 liguodong supergroup   ...   /output/part-m-00001
[root@master liguodong]# hdfs dfs -cat /output/part-m-00000
hello     1
you       1
hello     1
everyone  1
hello     1
hadoop    1
[root@master liguodong]# hdfs dfs -cat /output/part-m-00001
hello   1
you     1
hello   1
me      1
hi      1
baby    1

When the job.setReducerClass(MyReducer.class) line is uncommented (and setNumReduceTasks(0) is removed), the combine function is executed:

[main] INFO org.apache.hadoop.mapreduce.Job - Counters: ...
        File System Counters
                ...
        Map-Reduce Framework
                Map input records=6
                Map output records=12
                ...
                Input split bytes=192
                Combine input records=12
                Combine output records=9
                ...
                Reduce input records=9
                Reduce output records=7
                Spilled Records=...
                ...
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=457912320
        File Input Format Counters
                Bytes Read=...
        File Output Format Counters
                Bytes Written=51

[root@master hadoop]# hdfs dfs -ls -R /output/
-rw-r--r--   3 liguodong supergroup    0   ...   /output/_SUCCESS
-rw-r--r--   3 liguodong supergroup   51   ...   /output/part-r-00000
[root@master hadoop]# hdfs dfs -cat /output/pa*
baby      1
everyone  1
hadoop    1
hello     5
hi        1
me        1
you       2
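As a sanity check on that final output (a standalone sketch, not part of the job; the class name is illustrative only), merging the word lists from the two map output files and counting occurrences reproduces the reduced result:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountCheck {

    // Count occurrences of each word; TreeMap keeps keys sorted like the job output.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Words from part-m-00000 and part-m-00001 above.
        List<String> words = Arrays.asList(
            "hello", "you", "hello", "everyone", "hello", "hadoop",
            "hello", "you", "hello", "me", "hi", "baby");
        System.out.println(count(words));
        // {baby=1, everyone=1, hadoop=1, hello=5, hi=1, me=1, you=2}
    }
}
```

The 12 mapped pairs collapse to 7 distinct words, matching Map output records=12 and Reduce output records=7 in the counters.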