Many MapReduce jobs are limited by the bandwidth available on the cluster, so it pays to minimize the amount of intermediate data transferred between map and reduce tasks. Hadoop allows you to declare a combiner function that runs on the map output; the combiner's output then becomes the input to the reduce function. Because the combiner function is only an optimization, Hadoop does not guarantee how many times it will be called for a given map output, if at all. In other words, calling the combiner function zero, one, or many times must produce the same output from the reducer.
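To make the "zero, one, or many times" requirement concrete, here is a minimal plain-Java sketch (no Hadoop involved). The helper `combineInChunks` is hypothetical: it applies a max combiner to consecutive chunks of one map's output, loosely mimicking the combiner being run separately on different portions of the map output. Whatever the chunk size, and even when the combiner is applied twice or not at all, the final max stays the same.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CombinerInvocationDemo {
    // The max function, used here as both the combiner and the reducer.
    static int max(List<Integer> values) {
        return Collections.max(values);
    }

    // Hypothetical helper: runs the combiner on consecutive chunks of a map's
    // output. chunkSize <= 0 means "the combiner never runs".
    static List<Integer> combineInChunks(List<Integer> mapOutput, int chunkSize) {
        if (chunkSize <= 0) {
            return new ArrayList<>(mapOutput);
        }
        List<Integer> combined = new ArrayList<>();
        for (int i = 0; i < mapOutput.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, mapOutput.size());
            combined.add(max(mapOutput.subList(i, end)));
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Integer> mapOutput = List.of(0, 20, 10, 25, 15);
        // Vary how the combiner is applied; the reducer's answer never changes.
        for (int chunkSize = 0; chunkSize <= mapOutput.size(); chunkSize++) {
            System.out.println("chunk size " + chunkSize + " -> "
                    + max(combineInChunks(mapOutput, chunkSize)));
        }
        // Applying the combiner twice is also safe for max.
        System.out.println("combined twice -> "
                + max(combineInChunks(combineInChunks(mapOutput, 2), 2)));
    }
}
```

Every line printed reports 25, which is exactly the property that lets max be used as a combiner.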
Continuing the example from Study Notes (1), assume that the 1950 weather data is processed by two maps. The first map produces:
(1950, 0)
(1950, 20)
(1950, 10)
The output of the second map is:
(1950, 25)
(1950, 15)
The reduce function is called with (1950, [0, 20, 10, 25, 15]) and outputs (1950, 25).
Because 25 is the maximum value in the list, we can use a combiner function that, like the reduce function, finds the maximum temperature in each map's output. The reduce function would then be called with:
(1950, [20, 25])
In other words, the function calls on the temperature values can be expressed as: max(0, 20, 10, 25, 15) = max(max(0, 20, 10), max(25, 15)) = max(20, 25) = 25
Note: not all functions have this property (functions that do are called commutative and associative). For example, if we wanted to calculate the mean temperature, we could not use the mean as a combiner function in this way, because mean(0, 20, 10, 25, 15) = 14, whereas mean(mean(0, 20, 10), mean(25, 15)) = mean(10, 20) = 15.
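The following plain-Java sketch reproduces the numbers above: the true mean is 14, while the mean of the per-map means is 15. It also shows a common workaround that is not part of the original notes: have the combiner emit partial (sum, count) pairs instead of means, since pairs can be merged in any order and grouping. The `SumCount` class is a hypothetical illustration, not a Hadoop type.

```java
import java.util.List;

public class MeanCombinerPitfall {
    // Arithmetic mean of a list of temperatures.
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Hypothetical partial aggregate: (sum, count) pairs merge associatively
    // and commutatively, so computing them in a combiner is safe.
    static final class SumCount {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        static SumCount of(List<Double> values) {
            double s = 0;
            for (double v : values) {
                s += v;
            }
            return new SumCount(s, values.size());
        }

        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double mean() {
            return sum / count;
        }
    }

    public static void main(String[] args) {
        List<Double> map1 = List.of(0.0, 20.0, 10.0);
        List<Double> map2 = List.of(25.0, 15.0);

        double trueMean = mean(List.of(0.0, 20.0, 10.0, 25.0, 15.0));
        double meanOfMeans = mean(List.of(mean(map1), mean(map2)));
        double viaSumCount = SumCount.of(map1).merge(SumCount.of(map2)).mean();

        System.out.println(trueMean);    // 14.0
        System.out.println(meanOfMeans); // 15.0 (wrong)
        System.out.println(viaSumCount); // 14.0
    }
}
```

With this design the reducer would merge the incoming pairs and divide only once at the end.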
A combiner function cannot replace the reduce function, because the reducer is still needed to process records with the same key coming from different maps. However, it can reduce the amount of data transferred between the map and reduce tasks, so a combiner function is worth considering.
Declaring a Combiner Function
Returning to the program from Study Notes (1): the combiner is implemented by the same class as the reducer. The only change is to set the combiner class (here, the reducer class MaxTemperatureReducer) on the Job, using the setCombinerClass() call in the code below.
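import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// MaxTemperatureMapper and MaxTemperatureReducer are the classes from Study Notes (1).
public class MaxTemperatureWithCombiner {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +
                    "<output path>");
            System.exit(-1);
        }

        // On Hadoop 2.x and later, Job.getInstance() is the non-deprecated equivalent.
        Job job = new Job();
        job.setJarByClass(MaxTemperatureWithCombiner.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        // The only change: reuse the reducer as the combiner, since max is
        // commutative and associative.
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}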
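Why can one class serve as both combiner and reducer? The reducer's input and output types are the same (Text keys, IntWritable values), and max gives the same answer however its inputs are grouped. The plain-Java sketch below simulates the data flow without Hadoop: each map's output is grouped by key and passed through the combiner, the combined pairs are "shuffled" together, and the same function then acts as the reducer. The `ReduceFn` interface, `applyPerKey`, and `run` are hypothetical stand-ins, not Hadoop's `Reducer` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReducerAsCombinerSketch {
    // Hypothetical stand-in for a reducer whose input and output types match,
    // which is what lets one implementation be reused as the combiner.
    interface ReduceFn<K, V> {
        V reduce(K key, List<V> values);
    }

    // Plays the part of MaxTemperatureReducer in this sketch.
    static final ReduceFn<String, Integer> MAX_TEMPERATURE = (year, temps) -> {
        int max = Integer.MIN_VALUE;
        for (int t : temps) {
            max = Math.max(max, t);
        }
        return max;
    };

    // Groups (key, value) pairs by key and applies fn to each group.
    static Map<String, Integer> applyPerKey(List<Map.Entry<String, Integer>> pairs,
                                            ReduceFn<String, Integer> fn) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, Integer> out = new TreeMap<>();
        grouped.forEach((k, vs) -> out.put(k, fn.reduce(k, vs)));
        return out;
    }

    // Combines each map's output (unless combiner is null), merges the results
    // as a shuffle would, then runs the reducer over the merged pairs.
    static Map<String, Integer> run(List<List<Map.Entry<String, Integer>>> mapOutputs,
                                    ReduceFn<String, Integer> combiner,
                                    ReduceFn<String, Integer> reducer) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> oneMap : mapOutputs) {
            if (combiner == null) {
                shuffled.addAll(oneMap);
            } else {
                shuffled.addAll(applyPerKey(oneMap, combiner).entrySet());
            }
        }
        return applyPerKey(shuffled, reducer);
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("1950", 0), Map.entry("1950", 20), Map.entry("1950", 10)),
                List.of(Map.entry("1950", 25), Map.entry("1950", 15)));

        System.out.println(run(mapOutputs, MAX_TEMPERATURE, MAX_TEMPERATURE)); // {1950=25}
        System.out.println(run(mapOutputs, null, MAX_TEMPERATURE));            // {1950=25}
    }
}
```

With the combiner enabled, the reducer sees only [20, 25] for 1950 instead of all five readings, yet produces the same result.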
When reprinting, please credit the source: http://www.cnblogs.com/beanmoon/archive/2012/12/09/2805684.html