Big Data Computing Model: The MapReduce Principle


2016-01-24, Du Yishu

MapReduce is a parallel computing model for processing large data sets, proposed by Google. Hadoop, which is widely used today, uses MapReduce as its computational model.

A plain-language explanation of MapReduce


Suppose a library wants to count its books. There are 10 shelves, so to speed up the count, the administrator recruits 10 students and makes each student responsible for counting the books on one shelf:

Zhang counts bookshelf 1
Wang counts bookshelf 2
Liu counts bookshelf 3
......


After a while, the 10 students report their counts to the administrator one after another, and the administrator adds the numbers together to get the total number of books.

This process is essentially how MapReduce works.

MapReduce has two core operations:

(1) Map

The administrator assigns each student a bookshelf, and every student performs the same "counting" operation on their own shelf. This step is Map.

(2) Reduce

The administrator adds up the results reported by all the students. This step is Reduce.
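To make the analogy concrete, here is a minimal Python sketch. The shelf contents are made-up placeholder data and `count_shelf` is a hypothetical helper; Python's built-in `map` plays the students and `functools.reduce` plays the administrator.

```python
from functools import reduce

# Hypothetical shelves: each shelf is a list of book titles (placeholder data).
shelves = [
    ["Dune", "Emma", "Ulysses"],
    ["Hamlet", "Beloved"],
    ["Walden", "Ivanhoe", "Persuasion", "Middlemarch"],
]

def count_shelf(shelf):
    """Map: one 'student' counts the books on a single shelf."""
    return len(shelf)

# Map phase: the same counting operation is applied to every shelf independently.
per_shelf_counts = list(map(count_shelf, shelves))

# Reduce phase: the 'administrator' adds up the reported counts.
total_books = reduce(lambda a, b: a + b, per_shelf_counts, 0)

print(per_shelf_counts)  # [3, 2, 4]
print(total_books)       # 9
```

Because every shelf is counted independently, the map calls could run on different machines; only the final sum needs all the results.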

Breaking down the MapReduce workflow


Let's walk through a classic example, word count, to see how MapReduce works.

Suppose a text file has been split into 4 parts, each stored on a different one of 4 servers:

Text 1: the weather is good
Text 2: today is good
Text 3: good weather is good
Text 4: today has good weather

Requirement: count the number of occurrences of each word.

Processing steps

01

Word splitting


Map Node 1

Input: (Text 1, "the weather is good")

Output: (the, 1), (weather, 1), (is, 1), (good, 1)



Map Node 2

Input: (Text 2, "today is good")

Output: (today, 1), (is, 1), (good, 1)



Map Node 3

Input: (Text 3, "good weather is good")

Output: (good, 1), (weather, 1), (is, 1), (good, 1)



Map Node 4

Input: (Text 4, "today has good weather")

Output: (today, 1), (has, 1), (good, 1), (weather, 1)
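The map step can be sketched in Python as follows. This is an illustration, not Hadoop's API: `splits` and `map_words` are hypothetical names, and the sketch assumes words are separated by whitespace and lowercased so that "Today" and "today" count as the same word.

```python
# The four input splits from the example, keyed by split name.
splits = {
    "Text 1": "the weather is good",
    "Text 2": "today is good",
    "Text 3": "good weather is good",
    "Text 4": "today has good weather",
}

def map_words(name, text):
    """Map: emit a (word, 1) pair for every word in one split.

    `name` is the input key; this word-count map function ignores it.
    """
    return [(word, 1) for word in text.lower().split()]

# In a real cluster each split would be mapped on its own server.
map_outputs = {name: map_words(name, text) for name, text in splits.items()}

for name, pairs in map_outputs.items():
    print(name, pairs)
```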



02

Sort


Each map node sorts its own output pairs by key, that is, alphabetically by word.

[Figure: sorted output of Map Nodes 1-4]

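A sketch of the per-node sort, assuming each node simply sorts its own pairs alphabetically by word. The `map_outputs` dict reproduces the map output above; the variable names are hypothetical.

```python
# Map output of each node, copied from the word-splitting step.
map_outputs = {
    "Map Node 1": [("the", 1), ("weather", 1), ("is", 1), ("good", 1)],
    "Map Node 2": [("today", 1), ("is", 1), ("good", 1)],
    "Map Node 3": [("good", 1), ("weather", 1), ("is", 1), ("good", 1)],
    "Map Node 4": [("today", 1), ("has", 1), ("good", 1), ("weather", 1)],
}

# Each node sorts only its own pairs, by key (the word); no data moves between nodes yet.
sorted_outputs = {
    node: sorted(pairs, key=lambda pair: pair[0])
    for node, pairs in map_outputs.items()
}

for node, pairs in sorted_outputs.items():
    print(node, pairs)
```

Sorting puts identical words next to each other, which makes the following merge step a simple scan over adjacent pairs.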


03

Merge


Within each map node, pairs that share the same word are merged together.

[Figure: merged output of Map Nodes 1-4]

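One way to sketch the merge step, assuming it works like an optional combiner that sums the counts of identical words inside a single node (Hadoop's combiner is one such mechanism, but this Python code is only an illustration, and `combine` is a hypothetical name):

```python
from itertools import groupby

# Sorted output of each node, from the sort step.
sorted_outputs = {
    "Map Node 1": [("good", 1), ("is", 1), ("the", 1), ("weather", 1)],
    "Map Node 2": [("good", 1), ("is", 1), ("today", 1)],
    "Map Node 3": [("good", 1), ("good", 1), ("is", 1), ("weather", 1)],
    "Map Node 4": [("good", 1), ("has", 1), ("today", 1), ("weather", 1)],
}

def combine(sorted_pairs):
    """Merge adjacent pairs with the same word by summing their counts.

    groupby only groups *adjacent* equal keys, so the input must already
    be sorted by word.
    """
    return [
        (word, sum(count for _, count in group))
        for word, group in groupby(sorted_pairs, key=lambda pair: pair[0])
    ]

merged_outputs = {node: combine(pairs) for node, pairs in sorted_outputs.items()}

for node, pairs in merged_outputs.items():
    print(node, pairs)
```

Only Map Node 3 changes here: its two `("good", 1)` pairs become one `("good", 2)`, so less data has to be sent to the reduce nodes.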


04

Aggregation


MapReduce introduces the concept of a barrier, also called a "synchronization barrier". I think of it as a "dividing line": the boundary at which processing moves into the reduce phase.

The barrier's role is to bring together the merged results of all the map nodes before reduction starts.
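A minimal local sketch of the barrier idea, assuming threads in one Python process stand in for map nodes on separate servers. `concurrent.futures.wait` blocks until every map task has finished, and only then are the partial counts combined; `map_task` is a hypothetical helper that does the map and the local merge in one step using `collections.Counter`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, wait

# The four splits from the example; each thread stands in for one map server.
splits = [
    "the weather is good",
    "today is good",
    "good weather is good",
    "today has good weather",
]

def map_task(text):
    """Count the words in one split (map plus local merge)."""
    return Counter(text.split())

with ThreadPoolExecutor(max_workers=len(splits)) as pool:
    futures = [pool.submit(map_task, text) for text in splits]
    # The barrier: block here until *every* map task has completed.
    wait(futures)

# Past the barrier, all partial results are available for the reduce step.
partial_counts = [future.result() for future in futures]
total = sum(partial_counts, Counter())
print(dict(total))
```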

For example, with 3 reduce nodes, the results of the 4 map nodes above must be regrouped: all pairs for the same word are put together, and the groups are distributed across the 3 reduce nodes.
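The regrouping can be sketched as hash partitioning, which is what Hadoop's default `HashPartitioner` does with Java's `hashCode`. This Python sketch uses CRC-32 from `zlib` instead, because Python's built-in string hash is randomized per process. `partition` and `NUM_REDUCERS` are hypothetical names, and the input assumes the merge step summed duplicate words within each node.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 3

# Merged output of the 4 map nodes (duplicate words already summed per node).
merged_outputs = [
    [("good", 1), ("is", 1), ("the", 1), ("weather", 1)],     # Map Node 1
    [("good", 1), ("is", 1), ("today", 1)],                   # Map Node 2
    [("good", 2), ("is", 1), ("weather", 1)],                 # Map Node 3
    [("good", 1), ("has", 1), ("today", 1), ("weather", 1)],  # Map Node 4
]

def partition(word, num_reducers=NUM_REDUCERS):
    """Pick a reduce node for a word with a deterministic hash (CRC-32 here).

    Every occurrence of the same word maps to the same reduce node.
    """
    return zlib.crc32(word.encode("utf-8")) % num_reducers

# reducer_inputs[r][word] -> list of counts sent to reduce node r for that word
reducer_inputs = [defaultdict(list) for _ in range(NUM_REDUCERS)]
for node_pairs in merged_outputs:
    for word, count in node_pairs:
        reducer_inputs[partition(word)][word].append(count)

for r, groups in enumerate(reducer_inputs):
    print(f"Reduce node {r + 1}:", dict(groups))
```

Which words land on which reduce node depends entirely on the hash function; what matters is that each word goes to exactly one node, so that node sees all of that word's counts.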

Each reduce node then adds up the counts for its words to produce the final result.
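Finally, a sketch of the reduce function, assuming the shuffled input has been grouped into a list of counts per word. The `grouped` dict is derived from the merged map output above, collected across all reduce nodes for brevity (splitting it across 3 nodes does not change the totals); `reduce_counts` is a hypothetical name.

```python
# Each word with every partial count shuffled to it from the 4 map nodes.
grouped = {
    "good": [1, 1, 2, 1],
    "has": [1],
    "is": [1, 1, 1],
    "the": [1],
    "today": [1, 1],
    "weather": [1, 1, 1],
}

def reduce_counts(word, counts):
    """Reduce: sum every partial count for one word."""
    return word, sum(counts)

final_counts = dict(reduce_counts(w, c) for w, c in sorted(grouped.items()))
print(final_counts)
# {'good': 5, 'has': 1, 'is': 3, 'the': 1, 'today': 2, 'weather': 3}
```

The totals add up to 15, the number of words across the four texts.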
