Understanding MapReduce in Depth

Source: Internet
Author: User

My previous posts focused on HDFS, Hadoop's storage layer. The next few posts cover MapReduce, Hadoop's computational framework. This post mainly explains how the MapReduce framework carries out a job, including the shuffle process. Many excellent technical posts on this topic already exist; I read several of them before writing this one and benefited a great deal. The blogs and materials I referenced are listed in the resources below.

Understanding MapReduce

MapReduce is easiest to understand by splitting it into its two parts.

Mapping: apply the same operation to every element of a collection. For example, if you want to multiply every cell in a table by two, applying that function to each cell individually is mapping (this reflects the principle of moving the computation rather than moving the data).

Reducing: iterate through the elements of a collection and return a single combined result. For example, taking a table that contains a list of numbers and outputting their total is a reducing task.
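The two ideas above correspond directly to the map and reduce operations of Java's Stream API, which is a convenient way to see them outside Hadoop. This is only an illustration (the class name is hypothetical): it doubles every cell of a small table column with map, then combines the cells into one total with reduce.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceConcept {
    // Mapping: apply the same operation (doubling) to every element independently.
    static List<Integer> doubleAll(List<Integer> cells) {
        return cells.stream()
                .map(x -> x * 2)
                .collect(Collectors.toList());
    }

    // Reducing: walk through the elements and combine them into one result (a sum).
    static int sumAll(List<Integer> cells) {
        return cells.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> cells = List.of(1, 2, 3, 4);
        List<Integer> doubled = doubleAll(cells);
        System.out.println(doubled);          // [2, 4, 6, 8]
        System.out.println(sumAll(doubled));  // 20
    }
}
```

Because each mapped element is computed independently, the map step can run in parallel wherever the data lives, while the reduce step needs to see all the values it combines.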

Calculation framework
A simple MapReduce execution process
A simple understanding of the MapReduce computational framework:

You hand MapReduce the data you want to compute on, and it returns the result you expect. So we first need a source (the data to be computed), which is the input. MapReduce then processes that input according to a calculation model that we define, and finally produces the output (the expected result).
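The input, calculation model, output flow can be illustrated without a cluster. The following single-process Java sketch runs word count, the example used later in this post. The class and method names are hypothetical (they are not Hadoop APIs), splitting lines on whitespace is an assumption, and the TreeMap grouping step is only a simplified stand-in for the shuffle that a real cluster performs across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {
    // Map phase: one input line becomes a list of (word, 1) key-value pairs.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {            // skip the empty token from leading whitespace
                out.add(new SimpleEntry<>(word, 1L));
            }
        }
        return out;
    }

    // Reduce phase: all values collected for one key are combined into a single total.
    static long reduce(String word, List<Long> counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    // Input -> map -> group by key (simplified shuffle) -> reduce -> output.
    static Map<String, Long> run(List<String> input) {
        Map<String, List<Long>> grouped = new TreeMap<>();  // keys kept in sorted order
        for (String line : input) {
            for (Map.Entry<String, Long> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Long> output = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            output.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello world", "hello hadoop");
        System.out.println(run(input));  // {hadoop=1, hello=2, world=1}
    }
}
```

The programmer only writes the equivalents of map and reduce; in Hadoop, reading the input, grouping by key, and writing the output are handled by the framework.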

Computational models
Map and reduce
Our main discussion here is the MapReduce computational model:

When a MapReduce job runs, it is divided into two phases, the map phase and the reduce phase, and each phase takes key-value pairs (key/value) as input and produces key-value pairs as output. The programmer's job is to define the function for each phase: the map function and the reduce function. The example code below uses word count as the example (in simplified form) and walks through the four main modules of the computational framework shown in the figure above:

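Input: reading in the data

// Imports (at the top of the driver source file)
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Set the data input source
FileInputFormat.setInputPaths(job, args[0]);
FileInputFormat.setInputDirRecursive(job, true);   // Read input directories recursively
job.setInputFormatClass(TextInputFormat.class);    // Set the input format

// TextInputFormat is the default text input format: it reads the text one line at a time,
// and each line becomes one record passed to the mapper
// (key: the byte offset of the line in the file, value: the line's contents).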
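To make the (K1, V1) pairs concrete, here is a plain-Java sketch of the records TextInputFormat hands to the mapper: the key is the byte offset at which each line starts and the value is the line itself. It is a simplified model rather than Hadoop's own line reader: the class name is hypothetical, it assumes UTF-8 text with '\n' line endings, it ignores input splits (a real split can begin partway through a file), and it uses a Java record (Java 16 or later).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineRecordsSketch {
    // One input record: the byte offset where the line starts (K1) and the line text (V1).
    record LineRecord(long offset, String line) {}

    // Split file content into (offset, line) records, assuming UTF-8 and '\n' terminators.
    static List<LineRecord> toRecords(String content) {
        List<LineRecord> records = new ArrayList<>();
        int start = 0;
        long offset = 0;
        while (start < content.length()) {
            int end = content.indexOf('\n', start);
            if (end < 0) {
                end = content.length();  // last line without a trailing newline
            }
            String line = content.substring(start, end);
            records.add(new LineRecord(offset, line));
            // Advance past the line's bytes plus the 1-byte '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String file = "hello world\nhello hadoop\n";
        for (LineRecord r : toRecords(file)) {
            System.out.println(r.offset() + " -> " + r.line());
        }
        // 0 -> hello world
        // 12 -> hello hadoop
    }
}
```

This is why K1 in the word-count mapper is a LongWritable: it is a position in the file, not a line number.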

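Computing with the Mapper

// Set the job's Mapper class and the K2/V2 output types
job.setMapperClass(WordCountMapper.class);       // 1. Set the Mapper class
job.setMapOutputKeyClass(Text.class);            // Set the type of the mapper output key (K2)
job.setMapOutputValueClass(LongWritable.class);  // Set the type of the mapper output value (V2)

// WordCountMapper class (these imports go at the top of the source file)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * A custom mapper must extend Mapper.
 * K1: line offset (with TextInputFormat, the byte offset at which the line starts)
 * V1: line content
 * K2: word
 * V2: count
 */
public static class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private final Text k2 = new Text();
    private final LongWritable v2 = new LongWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1. Get the line
        String line = value.toString();
        // 2. Split the line into words (whitespace as the delimiter is an assumption)
        String[] words = line.split("\\s+");
        // 3. Emit (word, 1) for every word, reusing k2/v2 instead of allocating per record
        for (String word : words) {
            if (word.isEmpty()) continue;  // skip the empty token from leading whitespace
            k2.set(word);
            v2.set(1L);
            context.write(k2, v2);
        }
    }
}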
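The mapper's (K2, V2) output does not go straight to a reducer. During the shuffle, each pair is assigned to a partition (one per reduce task), and each partition is sorted by key so that a reducer sees every value for a key together. Hadoop's default partitioner, HashPartitioner, computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The following plain-Java sketch mimics that routing and the per-partition grouping in memory. The class and method names are hypothetical, it substitutes String.hashCode() for Text.hashCode() (so the partition numbers will differ from a real job), and it ignores spilling to disk, combiners, and merging of sorted runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Same formula as HashPartitioner; String.hashCode() stands in for Text.hashCode().
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Route each (word, 1) pair to its partition, then group and sort by key within it.
    static List<Map<String, List<Long>>> shuffle(List<String> mapOutputKeys, int numReduceTasks) {
        List<Map<String, List<Long>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());  // TreeMap keeps keys sorted, like the reduce input
        }
        for (String word : mapOutputKeys) {
            int p = partitionFor(word, numReduceTasks);
            partitions.get(p).computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> mapOutputKeys = List.of("hello", "world", "hello", "hadoop");
        List<Map<String, List<Long>>> partitions = shuffle(mapOutputKeys, 2);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("reducer " + i + " <- " + partitions.get(i));
        }
    }
}
```

Masking with Integer.MAX_VALUE clears the sign bit, so a negative hash code still yields a valid partition index, and identical keys always land on the same reducer.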
