A quick introduction to the MapReduce 1 framework

This article sorts out the basic concepts of MapReduce 1, for reference only.

The figure above shows the MapReduce workflow. The rest of this article walks through an example.

MapReduce divides processing into two stages: the map stage and the reduce stage. Each stage takes key-value pairs as input and produces key-value pairs as output, and the programmer chooses their types. The programmer also supplies two functions: the map function and the reduce function.
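The two-stage model can be sketched in a few lines of single-process Python. This is only an illustration of the data flow, not how Hadoop is implemented: `run_mapreduce` and the toy word-counting functions are hypothetical names introduced here, and a real framework distributes the same steps across many machines.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, then group by key, then reduce, all in one process."""
    # Map stage: each input pair may produce zero or more intermediate pairs.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Reduce stage: one call per distinct intermediate key, in sorted key order.
    output = []
    for out_key in sorted(groups):
        output.extend(reduce_fn(out_key, groups[out_key]))
    return output

# Toy functions (hypothetical): count how often each word occurs.
def word_map(offset, line):
    for word in line.split():
        yield word, 1

def word_reduce(word, counts):
    yield word, sum(counts)

result = run_mapreduce([(0, "a b"), (4, "a")], word_map, word_reduce)
print(result)  # [('a', 2), ('b', 1)]
```

The programmer only writes the two functions passed in; the grouping between them belongs to the framework.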

A simple example illustrates the details. Suppose we need to process a set of weather data and are interested only in the year and temperature fields. Here the map function is a data-preparation step: it filters out erroneous records and records that do not follow the expected format, and passes the valid data on in a form the reduce stage can process. The reduce function then finds the maximum among the temperature readings for each year.

To visualize the map stage, assume the input file contains the following records.

0067011990999991950051507004...9999999425+00001+99999999999...
0043011990999991950051512004...9999999425+00221+99999999999...
0043011990999991950051518004...9999999N9-00111+99999999999...
0043012650999991949032412004...0500001133+01111+99999999999...
0043012650999991949032418004...0500001133+00781+99999999999...

These lines are presented to the map function as key-value pairs:

(0, 0067011990999991950051507004...9999999425+00001+99999999999...)
(106, 0043011990999991950051512004...9999999425+00221+99999999999...)
(212, 0043011990999991950051518004...9999999N9-00111+99999999999...)
(318, 0043012650999991949032412004...0500001133+01111+99999999999...)
(424, 0043012650999991949032418004...0500001133+00781+99999999999...)
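The first element of each pair is the position at which the line starts in the file. The sketch below shows one way such pairs could be produced. `lines_with_offsets` is a hypothetical helper, not a Hadoop API, and the 105-byte record length is an assumption inferred from the offsets above, which advance by 106 per line (105 bytes plus a newline).

```python
def lines_with_offsets(data: bytes):
    """Yield (byte offset of line start, line text) for each line in data."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # The value is the line without its trailing newline.
        yield offset, raw.rstrip(b"\r\n").decode("ascii")
        # The next line starts after this line and its line terminator.
        offset += len(raw)

# Assumed record size: 105 bytes of data plus one newline byte per line.
fake_file = (b"x" * 105 + b"\n") * 5
print([offset for offset, _ in lines_with_offsets(fake_file)])
# [0, 106, 212, 318, 424]
```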

The key is the offset of the start of each line within the file, which the map function can ignore. The map function extracts the year and temperature fields from each record and emits them as key-value pairs for the reduce function to process. The emitted data is as follows:

(1950, 0)
(1950, 22)
(1950, −11)
(1949, 111)
(1949, 78)
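A map function for this example might look like the following sketch. Several details are assumptions rather than facts from this article: the year is taken from character positions 15 to 18, the temperature is located with a regular expression because the sample records are abbreviated with "..." (real fixed-width records would be sliced at fixed positions instead), and the missing-value sentinel `9999` and the accepted quality codes `01459` follow the commonly used NCDC weather example. The sample records are transcribed from the input listing above, and the parser ignores any spaces inside a record.

```python
import re

MISSING = 9999  # assumed sentinel for "no reading"
# A signed four-digit temperature followed by a one-digit quality code.
TEMP_FIELD = re.compile(r"([+-]\d{4})(\d)")

def map_max_temperature(offset, line):
    """Emit (year, temperature) for a valid record; the offset key is ignored."""
    record = line.replace(" ", "")
    year = int(record[15:19])  # assumed position of the year field
    match = TEMP_FIELD.search(record)
    if match is None:
        return  # malformed record: emit nothing
    temperature = int(match.group(1))
    quality = match.group(2)
    if temperature != MISSING and quality in "01459":
        yield year, temperature

SAMPLE = [
    (0, "0067011990999991950051507004...9999999425+00001+99999999999..."),
    (106, "0043011990999991950051512004...9999999425+00221+99999999999..."),
    (212, "0043011990999991950051518004...9999999N9-00111+99999999999..."),
    (318, "0043012650999991949032412004...0500001133+01111+99999999999..."),
    (424, "0043012650999991949032418004...0500001133+00781+99999999999..."),
]
pairs = [p for off, line in SAMPLE for p in map_max_temperature(off, line)]
print(pairs)  # [(1950, 0), (1950, 22), (1950, -11), (1949, 111), (1949, 78)]
```

Records that fail the checks simply produce no output, which is how the map stage filters bad data before the reduce stage sees it.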

The MapReduce framework processes this output before sending it to the reduce function: it sorts the key-value pairs and groups them by key. The data that the reduce function sees is as follows:

(1949, [111, 78])
(1950, [0, 22, −11])
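The sort-and-group step can be imitated locally with a stable sort followed by `itertools.groupby`. This is a single-process sketch with a hypothetical function name, `shuffle_and_group`; in a real cluster the framework performs this step across machines, and the order of values within a key should not be relied on.

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_group(pairs):
    """Sort pairs by key and collect the values of each key into a list."""
    # sorted() is stable, so values keep their emitted order within a key.
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]

map_output = [(1950, 0), (1950, 22), (1950, -11), (1949, 111), (1949, 78)]
grouped = shuffle_and_group(map_output)
print(grouped)  # [(1949, [111, 78]), (1950, [0, 22, -11])]
```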

For each key, the reduce function iterates over the list of values, selects the maximum, and outputs it as the value for that key.

(1949, 111)
(1950, 22)
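The reduce step itself is small. A minimal sketch, with the hypothetical name `reduce_max_temperature`, applied to the grouped data shown earlier:

```python
def reduce_max_temperature(year, temperatures):
    """Emit the highest temperature seen for one year."""
    # The framework only calls reduce for keys that have at least one value,
    # so max() is never given an empty list here.
    yield year, max(temperatures)

grouped_input = [(1949, [111, 78]), (1950, [0, 22, -11])]
final = [
    pair
    for year, temps in grouped_input
    for pair in reduce_max_temperature(year, temps)
]
print(final)  # [(1949, 111), (1950, 22)]
```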

This is the final output. The following flowchart describes the entire process.

In a future article, we will introduce MapReduce 2, which runs on YARN (Hadoop 2's new architecture). To be continued!
