Combiner components of MapReduce

Source: Internet
Author: User
Tags: hdfs dfs

Brief Introduction

The role of the Combiner is to merge the many <KEY, VALUE> pairs produced by a map task into a smaller set of <KEY, VALUE> pairs, and these new <KEY, VALUE> pairs then become the input of the reduce phase.

A combine function runs between the map function and the reduce function to condense the intermediate results of the map output. This shrinks the data written by the map task and reduces the network transfer load.

A Combiner cannot be used in every case. It works for aggregating records, such as summing, but an averaging scenario cannot use a Combiner, because the average of partial averages is not the overall average. When a Combiner can be used, it is generally identical to the reduce function, as illustrated below.
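To make the sum-versus-average distinction concrete, here is a minimal standalone sketch (hypothetical, not part of the original job) showing that partial sums can be combined safely while partial averages cannot:

public class CombinerSafetyDemo {
    public static void main(String[] args) {
        // Suppose two map tasks emit these values for the same key:
        // map task 1 -> 1, 2, 3      map task 2 -> 10, 20

        // Summing: combining partial sums per map task gives the same result
        // as summing everything in the reducer, so a sum reducer can double as a combiner.
        int partialSum1 = 1 + 2 + 3;   // 6
        int partialSum2 = 10 + 20;     // 30
        System.out.println("sum via combiner: " + (partialSum1 + partialSum2)); // 36, correct

        // Averaging: the average of the partial averages is NOT the overall average,
        // so an averaging reducer cannot be reused as a combiner.
        double partialAvg1 = (1 + 2 + 3) / 3.0;  // 2.0
        double partialAvg2 = (10 + 20) / 2.0;    // 15.0
        System.out.println("avg of averages: " + (partialAvg1 + partialAvg2) / 2); // 8.5 (wrong)
        System.out.println("true average:    " + (1 + 2 + 3 + 10 + 20) / 5.0);     // 7.2
    }
}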

When does the Combiner run?

1. When the job has a Combiner set and the number of spill files reaches min.num.spill.for.combine (the default is 3), the Combiner is executed before the merge (see the configuration sketch after this list).
2. In some cases, however, the merge starts before the number of spill files reaches that threshold; in that case the Combiner may be executed after the merge.
3. The Combiner may also not run at all: the framework considers the load on the cluster at the time. If the cluster load is very heavy, it will try to finish the map task as early as possible and free its resources, so the Combiner will not be executed.
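As a rough configuration sketch of the points above (a minimal sketch, assuming the min.num.spill.for.combine property name quoted in item 1; the exact key may differ across Hadoop versions), the spill threshold and the Combiner can be wired up like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CombinerJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Minimum number of spill files before the Combiner runs during spilling/merging
        // (default 3; property name as quoted above, treated here as an assumption).
        conf.setInt("min.num.spill.for.combine", 3);

        Job job = Job.getInstance(conf, "word count with combiner");
        // Reuse the reduce class (MyReducer from the full example below) as the Combiner;
        // this is safe for word count because summing is associative and commutative.
        job.setCombinerClass(CombinerExp.MyReducer.class);
        // ... mapper, reducer, and input/output paths are set as in the full example below ...
    }
}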

Example code:
package mycombiner;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinerExp {

    private final static String INPUT_PATH = "hdfs://master:8020/input";
    private final static String OUTPUT_PATH = "hdfs://master:8020/output.txt";

    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] str = value.toString().split("\\s+");
            for (String string : str) {
                System.out.println(string);
                word.set(string);
                context.write(word, one);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Configuration
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        if (fileSystem.exists(new Path(OUTPUT_PATH))) {
            fileSystem.delete(new Path(OUTPUT_PATH), true);
        }
        Job job = Job.getInstance(conf, "word count");

        // 2. Required when running from a packaged jar
        job.setJarByClass(CombinerExp.class);

        // 3. Input path
        FileInputFormat.addInputPath(job, new Path(INPUT_PATH));

        // 4. Map
        job.setMapperClass(MyMapper.class);

        // 5. Combiner
        job.setCombinerClass(MyReducer.class);

        // 6. Reducer
        // job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(0); // number of reduce tasks; the default is 1

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 7. Output path
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));

        // 8. Submit the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

# hdfs dfs -ls -R /input/
-rw-r--r--   1 root supergroup   ...   /input/input1
-rw-r--r--   1 root supergroup   ...   /input/input2

With only map and combine and no reduce (job.setNumReduceTasks(0)), the combine phase does not execute, and the output is not summed:

# hdfs dfs -ls -R /output/
-rw-r--r--   3 liguodong supergroup    0   /output/_SUCCESS
-rw-r--r--   3 liguodong supergroup   ...   /output/part-m-00000
-rw-r--r--   3 liguodong supergroup   ...   /output/part-m-00001

# hdfs dfs -cat /output/part-m-00000
hello      1
you        1
hello      1
everyone   1
hello      1
hadoop     1

# hdfs dfs -cat /output/part-m-00001
hello   1
you     1
hello   1
me      1
hi      1
baby    1

When the job.setReducerClass(MyReducer.class) line is uncommented (and job.setNumReduceTasks(0) is commented out), the combine function is executed:

[main] INFO org.apache.hadoop.mapreduce.Job - Counters:
    File System Counters
        ...
    Map-Reduce Framework
        Map input records=6
        Map output records=12
        ...
        Input split bytes=192
        Combine input records=12
        Combine output records=9
        ...
        Reduce input records=9
        Reduce output records=7
        Spilled Records=...
        ...
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=457912320
    File Input Format Counters
        Bytes Read=...
    File Output Format Counters
        Bytes Written=51

# hdfs dfs -ls -R /output/
-rw-r--r--   3 liguodong supergroup    0   /output/_SUCCESS
-rw-r--r--   3 liguodong supergroup   51   /output/part-r-00000

# hdfs dfs -cat /output/pa*
baby       1
everyone   1
hadoop     1
hello      5
hi         1
me         1
you        2
