The process of the Map/Reduce algorithm

Source: Internet
Author: User

Http://hi.baidu.com/wuxiaoming1733/blog/item/a860bcfbe1f1f92a4e4aeae8.html

The process of the map/reduce algorithm is:

1. Partition (dividing the data)
Divide the data into parts, for example 1000. Skynet does this automatically.

2. Map
Besides dividing the data, the code that operates on the data must also be mapped to each compute node so that the nodes execute concurrently. Each of the 1000 nodes performs its own task and returns its result when it finishes.

3. Partition
The 1000 results need to be merged, so the data is divided again, for example into 10 parts. Skynet also does this automatically.

4. Reduce
The reduce code and the reduce data are distributed to 10 nodes for execution, and each node returns its data when it completes. If necessary, reduce can be executed again. The final reduce produces a single overall result.
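The four steps can be sketched in plain Ruby inside a single process. This is only an illustration under stated assumptions: the helper names (partition, map_chunk, reduce_counts, word_count) and the part counts are hypothetical, nothing here uses Skynet, and the "nodes" are simulated by ordinary method calls rather than real concurrent workers.

```ruby
# Hypothetical single-process simulation of the four steps; no Skynet involved.

# Steps 1 and 3: split a list into at most `parts` chunks of similar size.
def partition(items, parts)
  return [] if items.empty?
  size = (items.size / parts.to_f).ceil
  items.each_slice(size).to_a
end

# Step 2: the work one "node" does on its own chunk (here: count the words).
def map_chunk(words)
  words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Step 4: merge several partial count hashes into one.
def reduce_counts(partials)
  partials.each_with_object(Hash.new(0)) do |partial, total|
    partial.each { |word, n| total[word] += n }
  end
end

def word_count(words, map_parts: 4, reduce_parts: 2)
  chunks   = partition(words, map_parts)               # 1. partition the input
  partials = chunks.map { |chunk| map_chunk(chunk) }   # 2. map each chunk
  groups   = partition(partials, reduce_parts)         # 3. partition the results
  merged   = groups.map { |group| reduce_counts(group) } # 4. reduce each group
  reduce_counts(merged)                                # reduce again for the total
end

p word_count(%w[a b a c b a])
```

The last line of word_count shows why reduce can be run again: its output has the same shape as its input elements, so partial totals can be merged in as many rounds as needed.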

The principle of the map/reduce algorithm is very simple. So how do we implement it with Skynet? We only need to write two methods: a map method, which tells Skynet how to process each piece of data, and a reduce method, which tells Skynet how to merge the results. Written with Skynet, the parallel algorithm is therefore also very simple:


Ruby Code

    class MapreduceTest
      include SkynetDebugger

      # Count how often each item occurs in this node's share of the data.
      def self.map(datas)
        results = {}
        datas.each do |data|
          results[data] ||= 0
          results[data] += 1
        end
        [results]
      end

      # Merge the count hashes returned by the map tasks into one hash.
      def self.reduce(datas)
        results = {}
        datas.each do |hashes|
          hashes.each do |key, value|
            results[key] ||= 0
            results[key] += value
          end
        end
        results
      end
    end





This is the simplest, yet complete, Ruby version of map/reduce code. We write a map method that tells Skynet to count the occurrences of each word, and a reduce method that tells Skynet to merge the counts from each map. Skynet does all the rest of the work.
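To see what the two methods compute, they can be called by hand without Skynet. The sketch below assumes a stand-in class, LocalWordCount (a hypothetical name), with the same map and reduce shape but no Skynet include. How Skynet itself schedules the calls and passes data between nodes is not shown here.

```ruby
# Hypothetical stand-in class: same map/reduce contract, no Skynet dependency.
class LocalWordCount
  # Returns an array holding one hash of word => count for this slice.
  def self.map(datas)
    results = {}
    datas.each do |data|
      results[data] ||= 0
      results[data] += 1
    end
    [results]
  end

  # Takes an array of count hashes and merges them into a single hash.
  def self.reduce(datas)
    results = {}
    datas.each do |hashes|
      hashes.each do |key, value|
        results[key] ||= 0
        results[key] += value
      end
    end
    results
  end
end

# Two simulated map tasks, each given its own slice of the words.
map_outputs = LocalWordCount.map(%w[apple pear apple]) +
              LocalWordCount.map(%w[pear plum])

# map returns an array of hashes, so concatenating the outputs
# gives reduce the list of partial counts it expects.
totals = LocalWordCount.reduce(map_outputs)
p totals
```

Because map wraps its hash in an array, the outputs of any number of map tasks can be concatenated and handed to reduce unchanged.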







The idea of map/reduce is very simple, and it can be implemented in any language. Google's map/reduce is famous not because the idea is ingenious, but because it sums up distributed computing in a very simple way. The core tasks of any distributed computation are (1) dividing the task and (2) merging the data. If the task cannot be divided, no distributed framework will help. For example, in a clustering computation over a very large matrix, if the algorithm itself cannot be divided, there is no way to distribute it. So in any distributed problem, division is the most important thing.
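A small example of why the way a task is divided matters, using an average rather than the clustering case mentioned above. This is an illustrative sketch with hypothetical helper names (map_mean, reduce_mean): per-chunk averages cannot simply be merged, but per-chunk [sum, count] pairs can.

```ruby
# Each chunk is summarized as [sum, count], which merges correctly
# no matter how the chunks are grouped.
def map_mean(chunk)
  [chunk.sum, chunk.size]
end

def reduce_mean(partials)
  partials.reduce([0, 0]) { |(sum, count), (s, n)| [sum + s, count + n] }
end

chunks = [[1, 2, 3], [10]]

sum, count = reduce_mean(chunks.map { |chunk| map_mean(chunk) })
mean = sum.to_f / count # 16 / 4 = 4.0

# Averaging the per-chunk averages gives a different, wrong answer
# because the chunks differ in size: (2.0 + 10.0) / 2 = 6.0
naive = chunks.map { |chunk| chunk.sum.to_f / chunk.size }.sum / chunks.size

p mean
p naive
```

The computation only distributes once it is rewritten so that partial results carry enough information to be merged.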

