Sinsing's Notes on Hadoop: The Definitive Guide, Part 3: Combiner

Source: Internet
Author: User
Tags: types of functions

Many MapReduce jobs are limited by the bandwidth available on the cluster, so it pays to minimize the data transferred between map tasks and reduce tasks. Hadoop lets the user specify a combiner function (also called a merge function in these notes) to run on the output of each map task. Like the mapper and the reducer, it is a function supplied by the user.

The output of the combiner function forms the input to the reduce function. Because the combiner is an optimization, Hadoop does not guarantee how many times it will be called for a given record in the map output, if it is called at all. The reducer must therefore produce the same output whether the combiner runs zero, one, or many times. This contract restricts the types of functions that can be used as a combiner: max qualifies, because the maximum of partial maxima is the overall maximum, but mean does not, because a mean of partial means is generally not the overall mean.
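
The contract can be checked without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class name CombinerContractDemo and the sample values are assumptions made for illustration. It treats two groups of values as the outputs of two map tasks for one key and compares applying a function once over everything with applying it per group first.

```java
final class CombinerContractDemo {

    // Largest of the given values (assumes at least one value).
    static double max(double... values) {
        double best = values[0];
        for (double v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    // Arithmetic mean of the given values (assumes at least one value).
    static double mean(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Hypothetical values for one key: 0, 20, 10 from the first map task
        // and 25, 15 from the second.
        double maxDirect = max(0, 20, 10, 25, 15);
        double maxCombined = max(max(0, 20, 10), max(25, 15));
        // Both are 25.0, so max is safe to use as a combiner.
        System.out.println("max:  direct=" + maxDirect + " combined=" + maxCombined);

        double meanDirect = mean(0, 20, 10, 25, 15);
        double meanCombined = mean(mean(0, 20, 10), mean(25, 15));
        // 14.0 versus 15.0, so mean is not safe to use as a combiner.
        System.out.println("mean: direct=" + meanDirect + " combined=" + meanCombined);
    }
}
```

A mean can still be computed with a combiner if the combiner emits partial sums and counts and only the reducer performs the division; that variation is not shown here.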

The combiner does not replace the reduce function: the reducer is still needed to process records with the same key that come from different map outputs. What the combiner does is cut down the amount of data transferred between map and reduce, so it is always worth considering whether a MapReduce job can use one.
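
To make the saving concrete, here is a small plain-Java simulation, again not Hadoop code. The class ShuffleSizeDemo, its method names, and the sample records are all hypothetical. It counts how many records would leave the map side with and without a per-map combiner, then applies the same max-per-key logic as the reduce step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ShuffleSizeDemo {

    // Convenience constructor for one (key, value) record.
    static Map.Entry<String, Integer> rec(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Shared logic for the combiner and the reducer: keep the maximum value per key.
    static Map<String, Integer> maxPerKey(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            out.merge(r.getKey(), r.getValue(), Integer::max);
        }
        return out;
    }

    // Records sent from all map tasks to the reducer, optionally combined per map task.
    static List<Map.Entry<String, Integer>> shuffledRecords(
            List<List<Map.Entry<String, Integer>>> mapOutputs, boolean useCombiner) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> mapOutput : mapOutputs) {
            if (useCombiner) {
                all.addAll(maxPerKey(mapOutput).entrySet());
            } else {
                all.addAll(mapOutput);
            }
        }
        return all;
    }

    // Hypothetical outputs of two map tasks, all for the key "1950".
    static List<List<Map.Entry<String, Integer>>> sampleMapOutputs() {
        List<List<Map.Entry<String, Integer>>> outputs = new ArrayList<>();
        outputs.add(Arrays.asList(rec("1950", 0), rec("1950", 20), rec("1950", 10)));
        outputs.add(Arrays.asList(rec("1950", 25), rec("1950", 15)));
        return outputs;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> outputs = sampleMapOutputs();
        for (boolean useCombiner : new boolean[] {false, true}) {
            List<Map.Entry<String, Integer>> shuffled = shuffledRecords(outputs, useCombiner);
            // The reduce step still runs in both cases, because each map task
            // contributes its own record(s) for the same key.
            System.out.println("combiner=" + useCombiner
                    + " shuffled=" + shuffled.size()
                    + " result=" + maxPerKey(shuffled));
        }
    }
}
```

With this sample data, five records are shuffled without the combiner and two with it, and the reducer arrives at the same maximum either way.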

In a MapReduce program, the combiner function is defined using the Reducer interface, and the combiner class is set on the JobConf by calling its setCombinerClass method.
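
As an illustration, a job driver using the older org.apache.hadoop.mapred API might look like the sketch below. It needs the Hadoop libraries on the classpath. The class names MaxValueWithCombiner, ParseMapper, and MaxReducer, and the tab-separated "key, integer" input format, are assumptions made for this example and do not come from the notes. Because the reduce logic here is a maximum, the same Reducer class can serve as both combiner and reducer.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MaxValueWithCombiner {

    // Hypothetical input: one "key<TAB>integer" record per line.
    public static class ParseMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String[] fields = line.toString().split("\t");
            if (fields.length != 2) {
                return; // skip lines that do not match the assumed format
            }
            int value = Integer.parseInt(fields[1].trim());
            output.collect(new Text(fields[0]), new IntWritable(value));
        }
    }

    // Emits the maximum value per key; input and output types are identical,
    // which is what allows the class to double as a combiner.
    public static class MaxReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int max = Integer.MIN_VALUE;
            while (values.hasNext()) {
                max = Math.max(max, values.next().get());
            }
            output.collect(key, new IntWritable(max));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MaxValueWithCombiner.class);
        conf.setJobName("Max value with combiner");

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        conf.setMapperClass(ParseMapper.class);
        conf.setCombinerClass(MaxReducer.class); // runs on map output; may run zero or more times
        conf.setReducerClass(MaxReducer.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        JobClient.runJob(conf);
    }
}
```

The only line specific to the combiner is the setCombinerClass call; removing it should leave the job output unchanged and only increase the amount of data shuffled.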
