Analysis of the MapReduce wordcount of Hadoop

Source: Internet
Author: User
Tags map class

The design idea of MapReduce

The main idea of MapReduce is divide and conquer. The map phase divides a big problem into small problems and executes them on the nodes of the cluster. After the map phase is over, the reduce phase brings together the outputs of all the map tasks.

Steps to write a MapReduce program:
1. Turn the problem into a MapReduce model
2. Set the parameters for the run
3. Write the Map class
4. Write the Reduce class

Example: counting the number of words

The input files are divided into splits. In this example each file is one split, and each split is cut line by line into <key,value> pairs. The MapReduce framework does this step automatically. The key is the offset of the line within the file, and this offset counts the line-ending characters (such as the carriage return) of the preceding lines. The value is the text of the line.
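The way the line offset works can be sketched in a few lines of Python. This is only a simulation of the idea and not the Hadoop API: the function name `to_records` and the sample input are assumptions made for illustration.

```python
def to_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of the input."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # The value is the line without its line ending.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        # The line ending is counted, so it shifts the next key.
        offset += len(raw)


# Hypothetical two-line input file.
records = list(to_records(b"Hello World\nBye World\n"))
print(records)
```

The second key is 12 rather than 11 because the newline after "Hello World" is counted in the offset.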
Each <key,value> pair produced by the split is passed to the user-defined map method (TokenizerMapper), which generates new <key,value> pairs: one <word,1> pair for every word in the line.
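A minimal Python sketch of what such a map method does, assuming words are separated by whitespace. The real TokenizerMapper is a Java class; the function name `tokenizer_map` and the sample line below are illustrative only.

```python
def tokenizer_map(key, value):
    """Emit a (word, 1) pair for every whitespace-separated word in the line.

    The key (the line offset) is not needed for counting and is ignored.
    """
    for word in value.split():
        yield word, 1


# Hypothetical input record: offset 0, one line of text.
pairs = list(tokenizer_map(0, "Hello World Bye World"))
print(pairs)
```

Note that the map step does not add anything up yet: "World" appears twice in the line, so it is emitted twice.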
Once the map method has produced its <key,value> pairs, the mapper sorts them by key and runs the combine step, which adds up the values that share the same key. The result is the final output of the mapper.
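The sort and combine steps on the mapper side might look like this in plain Python. This is a sketch under the assumption that the combine step is a simple sum, as it is for word counting; `sort_and_combine` and the sample pairs are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_combine(pairs):
    """Sort map output by key, then sum the values of each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (word, sum(count for _, count in group))
        for word, group in groupby(ordered, key=itemgetter(0))
    ]


# Hypothetical output of one map task.
map_output = [("Hello", 1), ("World", 1), ("Bye", 1), ("World", 1)]
combined = sort_and_combine(map_output)
print(combined)
```

Combining on the mapper side means that one pair per distinct word leaves the mapper, instead of one pair per occurrence.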
The reducer first sorts the data it receives from the mappers. The sorted data is then processed by the user-defined reduce method (IntSumReducer), which produces new <key,value> pairs. These pairs are the output of WordCount.
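As a rough illustration, the reduce side can be simulated as follows. The real IntSumReducer is a Java class; the function `int_sum_reduce` and the `received` data are assumptions, and the data is taken to be already grouped by key.

```python
def int_sum_reduce(key, values):
    """Sum all the counts received for one word."""
    return key, sum(values)


# Hypothetical data received from two mappers, already grouped by key.
received = {"World": [2, 1], "Hello": [1, 1], "Bye": [1]}

# Keys are handled in sorted order, with one reduce call per key.
output = [int_sum_reduce(word, received[word]) for word in sorted(received)]
print(output)
```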

Let's look at the official examples:
1. The input blocks are divided into three splits.
2. Each split corresponds to one mapper.
3. The outputs of the three mappers are shuffled. Each map output is a plain key-value pair rather than a key-valuelist pair, so the job of shuffling is to convert the map output into the reducer input. Before the reducer starts, the shuffle is done in two stages: copy and sort.
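The conversion from key-value pairs to key-valuelist pairs can be sketched in Python as below. This is a simplified simulation that assumes a single reducer and ignores partitioning; the function `shuffle` and the three sample mapper outputs are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(map_outputs):
    """Turn per-mapper (key, value) pairs into (key, [values]) pairs."""
    # Copy stage: gather the pairs from every mapper.
    copied = [pair for output in map_outputs for pair in output]
    # Sort stage: order by key so that equal keys become adjacent.
    copied.sort(key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(copied, key=itemgetter(0))
    ]


# Hypothetical outputs of three mappers, one per split.
map_outputs = [
    [("Deer", 1), ("Bear", 1), ("River", 1)],
    [("Car", 1), ("Car", 1), ("River", 1)],
    [("Deer", 1), ("Car", 1), ("Bear", 1)],
]
reducer_input = shuffle(map_outputs)
print(reducer_input)
```

Each entry of `reducer_input` is what a reduce call would receive: one key together with the list of all values emitted for it.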
