1. Partition (dividing the data): the data is divided into, say, 1000 parts. Skynet performs this step automatically.
2. Map: besides dividing the data, the map code that operates on the data is distributed to each worker node to execute concurrently. Each of the 1000 nodes runs its own task and returns its result when done.
3. Partition: the 1000 partial results now need to be merged, so the data is divided again, for example into 10 parts. Skynet also does this automatically.
4. Reduce: the reduce code and reduce data are distributed to the 10 nodes for execution, and each node returns its result when finished. If necessary, reduce can be run again on those results, until a single final result remains.
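The four steps above can be sketched in plain Ruby, with no Skynet involved; the partition sizes and the word-counting task here are illustrative assumptions, not part of the framework:

```ruby
# A local sketch of the partition/map/partition/reduce flow,
# using word counting as the example task.

words = %w[a b a c b a]

# Step 1: partition the data (here into 3 parts instead of 1000).
partitions = words.each_slice(2).to_a

# Step 2: map -- each "node" counts the words in its own partition.
map_results = partitions.map do |part|
  counts = Hash.new(0)
  part.each { |w| counts[w] += 1 }
  counts
end

# Step 3: re-partition the map results (trivially one group here).
groups = [map_results]

# Step 4: reduce -- merge the partial counts into a total result.
total = groups.map do |group|
  group.each_with_object(Hash.new(0)) do |counts, merged|
    counts.each { |k, v| merged[k] += v }
  end
end.first

# total => {"a"=>3, "b"=>2, "c"=>1}
```

In a real Skynet run, steps 1 and 3 happen automatically; only the bodies of the map and reduce blocks are user code.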
The principle of the map/reduce algorithm is really that simple. So how do we implement it with Skynet? We only need to write two methods: a map method that tells Skynet how to process each piece of data, and a reduce method that tells Skynet how to merge the partial results. Written with Skynet, the parallel algorithm ends up very simple:
Ruby Code
class MapreduceTest
  include SkynetDebugger

  def self.map(datas)
    results = {}
    datas.each do |data|
      results[data] ||= 0
      results[data] += 1
    end
    [results]
  end

  def self.reduce(datas)
    results = {}
    datas.each do |hashes|
      hashes.each do |key, value|
        results[key] ||= 0
        results[key] += value
      end
    end
    results
  end
end
This is the simplest possible, yet complete, Ruby version of the map/reduce code. We write a map method that tells Skynet how to count the occurrences of each word, and a reduce method that tells Skynet how to merge the per-map statistics. All the remaining work is Skynet's.
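To see what the two methods actually compute, they can be called directly outside of Skynet. In this sketch `SkynetDebugger` is stubbed as an empty module so the class loads without the Skynet gem; the word lists are made up:

```ruby
# Stub so the example runs without the Skynet gem installed.
module SkynetDebugger; end

class MapreduceTest
  include SkynetDebugger

  def self.map(datas)
    results = {}
    datas.each do |data|
      results[data] ||= 0
      results[data] += 1
    end
    [results]
  end

  def self.reduce(datas)
    results = {}
    datas.each do |hashes|
      hashes.each do |key, value|
        results[key] ||= 0
        results[key] += value
      end
    end
    results
  end
end

# Simulate two map tasks running on two data partitions...
partial1 = MapreduceTest.map(%w[apple banana apple])
partial2 = MapreduceTest.map(%w[banana cherry])

# ...then merge their outputs with one reduce task.
final = MapreduceTest.reduce(partial1 + partial2)
# final => {"apple"=>2, "banana"=>2, "cherry"=>1}
```

Each map call returns an array containing one hash of partial counts, which is why the reduce method iterates over hashes before merging keys.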
The idea of map/reduce is very simple; any language can implement it. Google's MapReduce is famous not because the idea is ingenious, but because it sums up distributed computing in a very simple form. For any distributed computation, the two core tasks are: 1. dividing the work, and 2. merging the results. If the task cannot be divided, then no distributed framework will help. For example, in clustering computations over very large matrices, if the algorithm itself cannot be partitioned, there is no way to distribute it. So in every distributed problem, the division of work matters most.
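As a minimal illustration of that division step, here is one way to split a dataset into roughly equal partitions for worker nodes; the helper name and sizes are arbitrary choices for the sketch, not part of Skynet:

```ruby
# Divide a dataset into `parts` roughly equal partitions,
# one per worker node.
def partition(data, parts)
  size = (data.length.to_f / parts).ceil
  data.each_slice(size).to_a
end

jobs = partition((1..10).to_a, 3)
# jobs => [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

Word counting divides trivially this way; an algorithm whose steps all depend on the whole dataset at once admits no such split, which is exactly the limitation described above.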