Schematic of the MapReduce principle and execution process
Source: Internet
Author: User
The following figures are from the MapReduce course slides of Professor Huang Yihua of the Department of Computer Science at Nanjing University; they are lightly reorganized and summarized here.
This article is aimed at readers who have come into contact with MapReduce but are still unclear about its workflow (the blogger included); I hope we can learn together.
The principle of MapReduce
MapReduce borrows ideas from the functional programming language Lisp. Lisp (List Processing) is a list-processing language that can operate on whole lists of elements at once.
For example, (add #(1 2 3 4) #(4 3 2 1)) produces the result #(5 5 5 5).
MapReduce is similar to Lisp in that its final reduce phase also operates on the list of values grouped under each key.
The following picture shows how MapReduce works.
1. First, the input records (such as lines of a text file, or rows of a database table) are passed to the map function as key-value pairs. The map function processes them (for example, counting word frequencies) and emits intermediate key-value pairs.
2. Before the intermediate key-value pairs enter the reduce phase, all map functions must have finished. To enforce this synchronization while keeping execution efficient, MapReduce introduces a barrier (synchronization barrier) in the middle of the process. While synchronizing, the framework also aggregates the intermediate map results: (a) on each map node, values with the same key are merged; (b) key-value pairs with the same key from different map nodes are routed to the same reduce node for processing.
3. In the reduce phase, each reduce node receives, from all map nodes, the key-value pairs that share the same key, and merges the values of those pairs into the final result for that key.
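The three steps above can be sketched as a single-process simulation. This is only an illustrative skeleton, not Hadoop's actual API: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the shuffle here is just an in-memory grouping by key that stands in for the barrier between the two phases.

```python
from collections import defaultdict

# Hypothetical map function: split each record's text into (word, 1) pairs.
def map_fn(key, value):
    for word in value.split():
        yield word, 1

# Hypothetical reduce function: merge all values collected for one key.
def reduce_fn(key, values):
    yield key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: each input record is processed independently.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))
    # Barrier + shuffle: group intermediate pairs by key.
    # This grouping cannot start until every map call has finished.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: each key is handled by exactly one reduce call.
    result = {}
    for key, values in groups.items():
        for out_key, out_value in reduce_fn(key, values):
            result[out_key] = out_value
    return result

# Records are (key, value) pairs, e.g. (line number, line text).
result = run_mapreduce([(1, "a b a"), (2, "b c")], map_fn, reduce_fn)
```

In a real cluster the map calls run on different nodes and the shuffle moves data over the network, but the data flow is the same.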
Take word-frequency counting as an example. Word-frequency counting means counting how many times each word occurs across all texts; the sample program for this in Hadoop is WordCount, commonly known as the "Hello World" of Hadoop programming.
Because we usually have more than one text, we can count the words in each text in parallel and then combine the partial counts into the final totals.
This task therefore illustrates the map and reduce phases very well.
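The "count each text in parallel, then total" idea can be shown with plain Python counters; this is a sketch of the decomposition, not Hadoop code, and the sample documents are made up for illustration.

```python
from collections import Counter

# Three hypothetical input documents.
docs = ["hello world", "hello hadoop", "world of hadoop"]

# Map side: count words in each document independently.
# Each Counter could be built on a different node, in parallel.
partial_counts = [Counter(doc.split()) for doc in docs]

# Reduce side: total the per-document counts.
total = Counter()
for partial in partial_counts:
    total += partial
```

Each `Counter` plays the role of one map task's output, and the final addition plays the role of the reduce phase.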
It can be seen that this picture is a further refinement of the diagram above, mainly in two respects:
1. The Combiner node merges pairs with the same key within a single map node, as mentioned above, avoiding repeated transmission of identical keys and thereby reducing communication overhead.
2. The Partitioner node divides the intermediate results produced by the map nodes, ensuring that all pairs with the same key are sent to the same reduce node.
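The two roles can be sketched as follows. This is an assumption-laden toy, not Hadoop's interface: `combine` and `partition` are hypothetical names, the reducer count is chosen arbitrarily, and the hash-modulo scheme mirrors the common default partitioning strategy.

```python
from collections import defaultdict

NUM_REDUCERS = 2  # assumed number of reduce tasks

# Partitioner: decide which reduce task receives a key.
# Hashing the key guarantees that every occurrence of the same key
# is routed to the same reducer within a run.
def partition(key, num_reducers=NUM_REDUCERS):
    return hash(key) % num_reducers

# Combiner: merge values for the same key locally on one map node
# before anything is sent over the network.
def combine(map_output):
    combined = defaultdict(int)
    for key, value in map_output:
        combined[key] += value
    return list(combined.items())

# Example: one map task emitted these intermediate pairs.
map_output = [("a", 1), ("b", 1), ("a", 1), ("a", 1)]
local = combine(map_output)  # repeated "a" pairs collapse to one

# Route each locally combined pair to its reduce shard.
shards = defaultdict(list)
for key, value in local:
    shards[partition(key)].append((key, value))
```

Without the combiner, three separate `("a", 1)` pairs would cross the network; with it, a single `("a", 3)` does.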
Original link: http://blog.csdn.net/michael_kong_nju/article/details/23826979