Hadoop Study Notes (III): Combiner Functions

Source: Internet
Author: User

Many MapReduce programs are limited by the bandwidth available on the cluster, so it pays to minimize the intermediate data transferred between map and reduce tasks. Hadoop lets you declare a combiner function that runs on the map output; the combiner's output then forms the input to the reduce function. Because the combiner is only an optimization, Hadoop does not guarantee how many times it will be called for a given map output, if at all. In other words, whether the combiner is called zero, one, or many times, the reduce output must be the same.
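
To make the last point concrete, here is a minimal sketch in plain Java (no Hadoop dependency), assuming a max-style combiner. The class and method names are hypothetical, and the three values are just sample readings. It only checks that applying the combiner zero or more times leaves the reduce result unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CombinerInvariance {
    // Hypothetical stand-in for a max-style combiner:
    // collapses a non-empty list of values to a single-element list holding the maximum.
    public static List<Integer> combine(List<Integer> values) {
        return Collections.singletonList(Collections.max(values));
    }

    // Hypothetical stand-in for the reduce function: the final maximum.
    public static int reduce(List<Integer> values) {
        return Collections.max(values);
    }

    // Applies the combiner `times` times to one map's output, then reduces.
    public static int runWithCombiner(List<Integer> mapOutput, int times) {
        List<Integer> current = new ArrayList<>(mapOutput);
        for (int i = 0; i < times; i++) {
            current = combine(current);
        }
        return reduce(current);
    }

    public static void main(String[] args) {
        List<Integer> mapOutput = Arrays.asList(0, 20, 10);
        for (int times = 0; times <= 3; times++) {
            System.out.println(times + " combiner pass(es) -> "
                + runWithCombiner(mapOutput, times));
        }
    }
}
```

Every line printed by main reports the same result, 20, which is the property a combiner has to satisfy.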

Next, using the example from Study Notes (1), assume that the weather data for 1950 is processed by two maps. The output of the first map is as follows:
(1950, 0)
(1950, 20)
(1950, 10)

The output of the second map is:
(1950, 25)
(1950, 15)

Without a combiner, the reduce function receives (1950, [0, 20, 10, 25, 15]) and outputs (1950, 25), since 25 is the maximum value in the list.

We can use a combiner function that, like the reduce function, finds the maximum temperature in each map's output. The reduce input then becomes:
(1950, [20, 25])

The reduce function produces the same output as before. The function calls on the temperature values can be expressed as follows: max(0, 20, 10, 25, 15) = max(max(0, 20, 10), max(25, 15)) = max(20, 25) = 25
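
The identity above can be checked with a short plain-Java sketch. The class and method names are hypothetical, and each map output is assumed to be non-empty.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MaxDecomposition {
    // Maximum over all values at once: what reduce sees without a combiner.
    public static int maxOfAll(List<List<Integer>> mapOutputs) {
        int best = Integer.MIN_VALUE;
        for (List<Integer> output : mapOutputs) {
            for (int v : output) {
                best = Math.max(best, v);
            }
        }
        return best;
    }

    // Maximum of the per-map maxima: what reduce sees after a max combiner ran on each map.
    public static int maxOfMaxima(List<List<Integer>> mapOutputs) {
        int best = Integer.MIN_VALUE;
        for (List<Integer> output : mapOutputs) {
            best = Math.max(best, Collections.max(output));
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<Integer>> mapOutputs = Arrays.asList(
            Arrays.asList(0, 20, 10),   // first map
            Arrays.asList(25, 15));     // second map
        System.out.println(maxOfAll(mapOutputs));    // prints 25
        System.out.println(maxOfMaxima(mapOutputs)); // prints 25
    }
}
```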

Note: not all functions have this property (functions that do are called commutative and associative). For example, if we were calculating the mean temperature, we could not use mean as the combiner function, because mean(0, 20, 10, 25, 15) = 14, but mean(mean(0, 20, 10), mean(25, 15)) = mean(10, 20) = 15.
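
The sketch below reproduces the mismatch in plain Java. It also shows one common workaround that is not part of the original notes and is offered here only as an assumption: have the combiner emit a partial (sum, count) pair and divide once in the reduce step. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MeanWithPartials {
    // Plain arithmetic mean of a non-empty list.
    public static double mean(List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.size();
    }

    // Partial aggregate a combiner could emit safely: {sum, count}.
    public static long[] partial(List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new long[] {sum, values.size()};
    }

    // Reduce step for the workaround: add up sums and counts, divide once at the end.
    public static double meanFromPartials(List<long[]> partials) {
        long sum = 0;
        long count = 0;
        for (long[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return (double) sum / count;
    }

    public static void main(String[] args) {
        List<Integer> map1 = Arrays.asList(0, 20, 10);
        List<Integer> map2 = Arrays.asList(25, 15);

        double trueMean = mean(Arrays.asList(0, 20, 10, 25, 15));
        double meanOfMeans = mean(Arrays.asList((int) mean(map1), (int) mean(map2)));
        double viaPartials = meanFromPartials(Arrays.asList(partial(map1), partial(map2)));

        System.out.println("true mean:     " + trueMean);     // 14.0
        System.out.println("mean of means: " + meanOfMeans);  // 15.0 (wrong)
        System.out.println("via partials:  " + viaPartials);  // 14.0
    }
}
```

The mean of means is wrong because the two maps contribute different numbers of readings; carrying the count alongside the sum keeps that information.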

The combiner function cannot replace the reduce function, because the reduce function is still needed to process records with the same key from different maps. However, it can cut down the amount of data transferred between map and reduce tasks, so it is always worth considering whether your job can use one.
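
A rough plain-Java simulation of this data flow, using the 1950 readings from the example, is sketched below. The Reading type and all method names are hypothetical helpers, not Hadoop APIs. It shows that combining per map shrinks the shuffled data from five records to two, yet both remaining records share the key 1950, so a reduce step is still required.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombineThenReduce {
    // One (year, temperature) record; a hypothetical helper type for this sketch.
    public static final class Reading {
        public final int year;
        public final int temp;

        public Reading(int year, int temp) {
            this.year = year;
            this.temp = temp;
        }
    }

    // Per-year maximum. Used as the combiner (on one map's output)
    // and as the reducer (on the records gathered from all maps).
    public static Map<Integer, Integer> maxPerYear(List<Reading> records) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (Reading r : records) {
            result.merge(r.year, r.temp, Integer::max);
        }
        return result;
    }

    // Runs the combiner on each map's output separately and
    // returns the records that would be sent on to reduce.
    public static List<Reading> combineEachMap(List<List<Reading>> mapOutputs) {
        List<Reading> shuffled = new ArrayList<>();
        for (List<Reading> output : mapOutputs) {
            for (Map.Entry<Integer, Integer> e : maxPerYear(output).entrySet()) {
                shuffled.add(new Reading(e.getKey(), e.getValue()));
            }
        }
        return shuffled;
    }

    // The two map outputs from the example in these notes.
    public static List<List<Reading>> sampleMapOutputs() {
        return Arrays.asList(
            Arrays.asList(new Reading(1950, 0), new Reading(1950, 20), new Reading(1950, 10)),
            Arrays.asList(new Reading(1950, 25), new Reading(1950, 15)));
    }

    public static void main(String[] args) {
        List<Reading> shuffled = combineEachMap(sampleMapOutputs());
        System.out.println("records sent to reduce: " + shuffled.size()); // 2 instead of 5
        System.out.println("reduce output: " + maxPerYear(shuffled));     // {1950=25}
    }
}
```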

Declaring a Combiner Function

Now let's return to the program from Study Notes (1). The combiner implementation is the same as the reducer's, so the only change is to set the combiner class on the Job (here, the reducer class). See the setCombinerClass call in the code below.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MaxTemperatureWithCombiner {
      public static void main(String[] args) throws Exception {
        if (args.length != 2) {
          System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +
              "<output path>");
          System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MaxTemperatureWithCombiner.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // The mapper and reducer classes are the ones from the earlier program.
        job.setMapperClass(MaxTemperatureMapper.class);
        // The only change: reuse the reducer class as the combiner.
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

When reprinting, please credit the source: http://www.cnblogs.com/beanmoon/archive/2012/12/09/2805684.html
