Combiner components of MapReduce

Brief Introduction
The role of the combiner is to combine the multiple <KEY, VALUE> pairs produced by one map into a new <KEY, VALUE> pair, which then serves as the input of reduce.
A combine function runs between the map function and the reduce function to shrink the intermediate results of the map output; this reduces the amount of data the map emits and therefore the network transmission load.
A combiner cannot be used in all cases. It works for summarizing records when the operation is associative and commutative (such as summing), but a scenario like averaging cannot use a combiner. When a combiner can be used, it is generally identical to the reduce function.
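To see why summing tolerates a combiner while averaging does not, consider a standalone sketch (plain Java, not MapReduce; the class and method names are illustrative only). It applies the same operation first per partition, then across partitions, the way a combiner would:

```java
import java.util.Arrays;
import java.util.List;

public class CombinerMath {

    // Sum with a "combiner": sum each partition first, then sum the partials.
    static int sumViaCombiner(List<int[]> partitions) {
        int total = 0;
        for (int[] p : partitions) {
            total += Arrays.stream(p).sum(); // per-partition partial sum
        }
        return total;
    }

    // Naive "combined" average: average the per-partition averages.
    static double avgOfAvgs(List<int[]> partitions) {
        double acc = 0;
        for (int[] p : partitions) {
            acc += Arrays.stream(p).average().orElse(0);
        }
        return acc / partitions.size();
    }

    // True average over all values, regardless of partitioning.
    static double trueAvg(List<int[]> partitions) {
        int total = 0, count = 0;
        for (int[] p : partitions) {
            total += Arrays.stream(p).sum();
            count += p.length;
        }
        return (double) total / count;
    }

    public static void main(String[] args) {
        List<int[]> parts = Arrays.asList(new int[]{1, 2, 3}, new int[]{10});
        System.out.println(sumViaCombiner(parts)); // 16, same as summing everything directly
        System.out.println(trueAvg(parts));        // 4.0
        System.out.println(avgOfAvgs(parts));      // 6.0 -- wrong, so averaging is not combiner-safe
    }
}
```

Summing partial sums yields the same total, but the average of per-partition averages differs from the true average, which is why an averaging job must ship the (sum, count) pair instead if it wants a combiner.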
When does the combiner run?
1. When the job has a combiner set and the number of spill files reaches min.num.spill.for.combine (the default is 3), the combiner is executed before the merge.
2. In some cases the merge starts before the number of spill files reaches that threshold; in that case the combiner may be executed after the merge.
3. The combiner may also not run at all. The framework considers the load of the cluster at the time; if the load is heavy, it tries to finish the map as early as possible to free resources, so the combiner is not executed.
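The spill threshold in point 1 is a cluster-side setting. A sketch of how it might be raised in mapred-site.xml (the property name is taken from this article and corresponds to the old MRv1-era key; the exact name can differ across Hadoop versions, so check your version's mapred-default.xml):

```xml
<property>
  <name>min.num.spill.for.combine</name>
  <value>5</value>
  <description>Run the combiner during the merge phase only when at
  least this many spill files exist (default is 3).</description>
</property>
```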
Example code:
package mycombiner;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinerExp {

    private final static String INPUT_PATH = "hdfs://master:8020/input";
    private final static String OUTPUT_PATH = "hdfs://master:8020/output.txt";

    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] str = value.toString().split("\\s+");
            for (String string : str) {
                System.out.println(string);
                word.set(string);
                context.write(word, one);
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Configuration
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        if (fileSystem.exists(new Path(OUTPUT_PATH))) {
            fileSystem.delete(new Path(OUTPUT_PATH), true);
        }
        Job job = Job.getInstance(conf, "word count");
        // 2. Required so the packaged jar can locate its classes
        job.setJarByClass(CombinerExp.class);
        // 3. Input path
        FileInputFormat.addInputPath(job, new Path(INPUT_PATH));
        // 4. Mapper
        job.setMapperClass(MyMapper.class);
        // 5. Combiner
        job.setCombinerClass(MyReducer.class);
        // 6. Reducer
        // job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(0); // the default number of reduce tasks is 1
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 7. Output path
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        // 8. Submit the job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
[root@master liguodong]# hdfs dfs -ls -R /input/
-rw-r--r--   1 root      supergroup   ...   /input/input1
-rw-r--r--   1 root      supergroup   ...   /input/input2

When we have only map and combine without reduce, the combiner does not execute, and the result of the output is not summed:

[root@master liguodong]# hdfs dfs -ls -R /output/
-rw-r--r--   3 liguodong supergroup    0   ...   /output/_SUCCESS
-rw-r--r--   3 liguodong supergroup   ...   /output/part-m-00000
-rw-r--r--   3 liguodong supergroup   ...   /output/part-m-00001
[root@master liguodong]# hdfs dfs -cat /output/part-m-00000
hello     1
you       1
hello     1
everyone  1
hello     1
hadoop    1
[root@master liguodong]# hdfs dfs -cat /output/part-m-00001
hello   1
you     1
hello   1
me      1
hi      1
baby    1

When the job.setReducerClass(MyReducer.class) line is uncommented (and setNumReduceTasks(0) is removed), the combine function is executed:

[main] INFO org.apache.hadoop.mapreduce.Job - Counters: ...
        File System Counters
                ...
        Map-Reduce Framework
                Map input records=6
                Map output records=12
                ...
                Input split bytes=192
                Combine input records=12
                Combine output records=9
                ...
                Reduce input records=9
                Reduce output records=7
                Spilled Records=...
                ...
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=457912320
        File Input Format Counters
                Bytes Read=...
        File Output Format Counters
                Bytes Written=51

[root@master hadoop]# hdfs dfs -ls -R /output/
-rw-r--r--   3 liguodong supergroup    0   ...   /output/_SUCCESS
-rw-r--r--   3 liguodong supergroup   51   ...   /output/part-r-00000
[root@master hadoop]# hdfs dfs -cat /output/pa*
baby      1
everyone  1
hadoop    1
hello     5
hi        1
me        1
you       2
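As a sanity check on that final output (a standalone sketch, not part of the job; the class name is illustrative only), merging the word lists from the two map output files and counting occurrences reproduces the reduced result:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountCheck {

    // Count occurrences of each word; TreeMap keeps keys sorted like the job output.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Words from part-m-00000 and part-m-00001 above.
        List<String> words = Arrays.asList(
            "hello", "you", "hello", "everyone", "hello", "hadoop",
            "hello", "you", "hello", "me", "hi", "baby");
        System.out.println(count(words));
        // {baby=1, everyone=1, hadoop=1, hello=5, hi=1, me=1, you=2}
    }
}
```

The 12 mapped pairs collapse to 7 distinct words, matching Map output records=12 and Reduce output records=7 in the counters.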