The Combiner merges the mapper's output locally on the map side. The combiner's output then becomes the reducer's input, which reduces the amount of data transferred between the map tasks and the reduce tasks.
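To make the effect concrete, here is a minimal plain-Java sketch that simulates it without any Hadoop classes. The class name CombinerSketch, its methods, and the sample (year, temperature) records are all made up for illustration. It assumes each map task's output is combined exactly once, which real Hadoop does not guarantee.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of a combiner; hypothetical names, no Hadoop classes. */
final class CombinerSketch {

    /** Combines one map task's (year, temperature) records: keeps the max per year. */
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> record : mapOutput) {
            combined.merge(record.getKey(), record.getValue(), Math::max);
        }
        return combined;
    }

    /** Counts the records the reducer would receive from all map tasks. */
    static int reducerInputRecords(List<List<Map.Entry<String, Integer>>> mapTasks,
                                   boolean useCombiner) {
        int total = 0;
        for (List<Map.Entry<String, Integer>> task : mapTasks) {
            total += useCombiner ? combine(task).size() : task.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample data: two map tasks, one key (the year 1950).
        List<Map.Entry<String, Integer>> task1 = List.of(
                Map.entry("1950", 0), Map.entry("1950", 20), Map.entry("1950", 10));
        List<Map.Entry<String, Integer>> task2 = List.of(
                Map.entry("1950", 25), Map.entry("1950", 15));
        List<List<Map.Entry<String, Integer>>> tasks = List.of(task1, task2);

        System.out.println("Without combiner: " + reducerInputRecords(tasks, false)); // 5
        System.out.println("With combiner: " + reducerInputRecords(tasks, true));     // 2
    }
}
```

With the combiner, each map task forwards one record per key instead of one record per input line, so the reducer sees 2 records rather than 5 in this sample.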
1. Run the job once with a combiner set and once without, and compare the reducer input in each case.
Use the following code to set the combiner:
job.setCombinerClass(MaxTemperatureReducer.class);
// Needs Job (org.apache.hadoop.mapreduce), Path (org.apache.hadoop.fs), Text and IntWritable (org.apache.hadoop.io),
// FileInputFormat (org.apache.hadoop.mapreduce.lib.input) and FileOutputFormat (org.apache.hadoop.mapreduce.lib.output).
@Override
public int run(String[] args) throws Exception {
    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max temperature");
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(MaxTemperatureMapper.class);
    job.setCombinerClass(MaxTemperatureReducer.class); // set combiner
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.waitForCompletion(true);
    // Output task completion status
    System.out.println("Task name: " + job.getJobName());
    System.out.println("Task succeeded: " + (job.isSuccessful() ? "Yes" : "No"));
    System.out.println("Map input records: " + job.getCounters().findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS").getValue());
    System.out.println("Map output records: " + job.getCounters().findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_OUTPUT_RECORDS").getValue());
    System.out.println("Reduce input records: " + job.getCounters().findCounter("org.apache.hadoop.mapred.Task$Counter", "REDUCE_INPUT_RECORDS").getValue());
    return job.isSuccessful() ? 0 : 1;
}
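The job above passes the same class, MaxTemperatureReducer, as both combiner and reducer. That is safe for a maximum because taking the max of partial maxima gives the same answer as taking the max of all values at once. The sketch below checks this in plain Java and contrasts it with a mean, where the same shortcut gives a different result. The class CombinerSafetyCheck and the sample temperatures are hypothetical and not part of the original job.

```java
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

/** Hypothetical check: which aggregations survive being applied in two stages? */
final class CombinerSafetyCheck {

    static int max(int... values) {
        return IntStream.of(values).max().getAsInt();
    }

    static double mean(double... values) {
        return DoubleStream.of(values).average().getAsDouble();
    }

    public static void main(String[] args) {
        // Made-up temperatures split across two map tasks: {0, 20, 10} and {25, 15}.
        int maxDirect = max(0, 20, 10, 25, 15);               // 25
        int maxCombined = max(max(0, 20, 10), max(25, 15));   // max(20, 25) = 25
        System.out.println("max direct = " + maxDirect + ", max combined = " + maxCombined);

        double meanDirect = mean(0.0, 20.0, 10.0, 25.0, 15.0);                 // 14.0
        double meanOfMeans = mean(mean(0.0, 20.0, 10.0), mean(25.0, 15.0));    // mean(10, 20) = 15.0
        System.out.println("mean direct = " + meanDirect + ", mean of means = " + meanOfMeans);
    }
}
```

The general rule of thumb is that a reducer can double as a combiner only when its operation is commutative and associative, since Hadoop may run the combiner zero, one, or several times on a map task's output.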
2. The following output is from the run without a combiner. The number of reducer input records equals the number of mapper output records.
Task name: Max temperature
Task succeeded: Yes
Map input records: 1207
Map output records: 1190
Reduce input records: 1190
Task start: 2015-04-24 14:26:00
Task end: 2015-04-24 14:26:03
Task time: 0.04995 minutes
3. The following output is from the run with the combiner set. After the combiner runs, the number of reducer input records drops sharply, from 1190 to 1.
Task name: Max temperature
Task succeeded: Yes
Map input records: 1207
Map output records: 1190
Reduce input records: 1
Task start: 2015-04-24 14:28:23
Task end: 2015-04-24 14:28:25
Task time: 0.030966667 minutes
This article is from the "10110275" blog. Please keep this source link when reposting: http://10120275.blog.51cto.com/10110275/1637950
Combiner practices in Hadoop