Introduction to Skynet, a Ruby implementation of Google's Map/Reduce framework

Source: Internet
Author: User

Skynet is a bold name: it is the supercomputer network that rules over humanity in the classic "Terminator" films starring Arnold Schwarzenegger. The Skynet in this article is far less frightening. It is the name of a Ruby implementation of Google's Map/Reduce framework.

Google's Map/Reduce framework is famous. It splits a task into many pieces, sends them to n computers to run in parallel, merges the results they return, and finally produces the result of the whole computation. It is said that a single Google search is mapped to 7000 servers running in parallel, which gives a sense of how much distributed computing power is involved. With Map/Reduce, programmers can write robust, parallel, distributed applications in simple code without having to deal with the details of the distributed framework, and can make full use of a computer cluster's processing power.

Several frameworks already implement the Map/Reduce algorithm. The best known is probably Hadoop, the open-source project backed by Yahoo, but Hadoop is not written in Ruby. In the Ruby world, Adam Pisoni has developed a Ruby version of the Map/Reduce framework: Skynet.

Adam Pisoni developed Skynet for his company, Geni.com, a website positioned as a family-oriented social network. The site's news feed feature has to extract the content a user is interested in from the activity of a large number of users and push it to that user. This is essentially a distributed computing problem: tasks must be distributed to many servers for execution and the results merged at the end. Pisoni could not find a suitable framework, so he eventually developed his own, Skynet, which uses the Map/Reduce algorithm to provide this distributed computing platform.

Developing Map/Reduce distributed applications with Skynet is very simple. Consider a simple example: suppose there is a 1 GB text file, and the task is to count how many times each word occurs in it. The traditional approach is straightforward: read the file sequentially and count the words as you go. But it will undoubtedly be slow. If we have a compute cluster of 1000 servers, how can we use Skynet to run this program concurrently and shorten the counting time?
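For comparison, the sequential approach mentioned above can be sketched in a few lines of plain Ruby. The method name count_words and the whitespace-based, case-insensitive tokenization are assumptions made for illustration; the article does not say how words are delimited.

```ruby
require "stringio"

# Count word occurrences by reading the input one line at a time.
# `io` can be any object that responds to #each_line (File, StringIO, ...).
def count_words(io)
  counts = Hash.new(0)
  io.each_line do |line|
    # Assumption: words are separated by whitespace and compared
    # case-insensitively.
    line.downcase.split.each { |word| counts[word] += 1 }
  end
  counts
end

# A small in-memory sample stands in for the 1 GB file.
sample = StringIO.new("the quick fox\nthe lazy dog\nThe end\n")
word_counts = count_words(sample)
puts word_counts.inspect
```

Passing an open File instead of the StringIO streams a real file the same way, but a single process still has to touch every line, which is the bottleneck the rest of the article addresses.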

The process of the map/reduce algorithm is:

1. Partition (dividing the data)
The data is divided into 1000 parts. Skynet does this automatically.

2. Map
Besides the data, the code that operates on it must also be sent to every compute node so that it can run concurrently. Each of the 1000 nodes processes its own part and returns its result when it finishes.

3. Partition
The results from the 1000 nodes have to be merged, so the data is divided again, for example into 10 parts. Skynet also does this automatically.

4. Reduce
The reduce code and the data to be reduced are distributed to 10 nodes, and each node returns its result when it finishes. If necessary, reduce can be run again on those results until a single overall result remains.
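The four steps above can be simulated inside a single Ruby process to see how the data flows. This is only a sketch: the helper names (partition, map_chunk, reduce_group) and the chunk counts are assumptions made for illustration, threads stand in for worker nodes, and Skynet's own partitioning may work differently.

```ruby
# Steps 1 and 3 helper: split an array into at most `parts` slices
# of roughly equal size.
def partition(items, parts)
  return [] if items.empty?
  size = (items.size.to_f / parts).ceil
  items.each_slice(size).to_a
end

# Step 2: each "node" counts the words in its own chunk.
def map_chunk(words)
  words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Step 4: each reducer merges the partial counts it was given.
def reduce_group(partials)
  partials.each_with_object(Hash.new(0)) do |partial, totals|
    partial.each { |word, n| totals[word] += n }
  end
end

words = %w[a b a c b a d a]

# 1. Partition the input into 4 chunks.
chunks = partition(words, 4)

# 2. Map: run each chunk on its own thread (stand-in for a worker node).
partials = chunks.map { |chunk| Thread.new { map_chunk(chunk) } }.map(&:value)

# 3. Partition the partial results into 2 groups for the reducers.
groups = partition(partials, 2)

# 4. Reduce each group, then reduce once more to get a single total.
reduced = groups.map { |group| Thread.new { reduce_group(group) } }.map(&:value)
totals = reduce_group(reduced)

puts totals.inspect
```

The final call to reduce_group corresponds to running reduce again on the reducers' own output, which works because merging counts is associative.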

The principle of the Map/Reduce algorithm is quite simple. So how do we implement it with Skynet? We only need to write two methods: a map method, which tells Skynet how to process each piece of data, and a reduce method, which tells Skynet how to merge the results. Written with Skynet, the parallel algorithm is also very simple:

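class MapreduceTest
  include SkynetDebugger

  # Map step: count how often each item occurs in this node's share
  # of the data. The hash is wrapped in an array for the reduce step.
  def self.map(datas)
    results = {}
    datas.each do |data|
      results[data] ||= 0
      results[data] += 1
    end
    [results]
  end

  # Reduce step: merge the per-node count hashes into one total.
  def self.reduce(datas)
    results = {}
    datas.each do |hashes|
      hashes.each do |key, value|
        results[key] ||= 0
        results[key] += value
      end
    end
    results
  end
end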

This is about the simplest complete Ruby Map/Reduce code. The map method tells Skynet how to count the occurrences of each word, and the reduce method tells Skynet how to merge the counts produced by each map. Skynet takes over all the rest of the work. Isn't that easy?

Of course, getting this Map/Reduce job to run takes some more work, such as installing Skynet and configuring its parallel nodes. For these details, see Skynet's own documentation at http://skynet.rubyforge.org/doc/index.html; they are not covered here.

It is worth mentioning that Skynet integrates well with the Rails framework. You can hand the most time-consuming Map/Reduce-style jobs in a Rails application to Skynet and run them asynchronously, for example:

MyModel.distributed_find(:all, :conditions => "created_on < '#{3.days.ago}'").each(:some_method)

This queries all models created more than 3 days ago and hands the time-consuming some_method call on each of them to Skynet, which carries it out on its computing network.
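Conceptually, a distributed each splits the matching records into batches and runs the method on each batch in a separate worker. The sketch below imitates that idea in plain Ruby with threads. It is not Skynet's implementation, and the names Record, distributed_each, and batch_size are assumptions made for illustration.

```ruby
# Hypothetical stand-in for a model instance.
Record = Struct.new(:id, :processed) do
  # Stand-in for a time-consuming operation.
  def some_method
    self.processed = true
  end
end

# Split the records into batches and call `method_name` on every record,
# one thread per batch (a thread plays the role of a remote worker).
def distributed_each(records, method_name, batch_size: 2)
  threads = records.each_slice(batch_size).map do |batch|
    Thread.new { batch.each { |record| record.public_send(method_name) } }
  end
  threads.each(&:join)
  records
end

records = (1..5).map { |id| Record.new(id, false) }
distributed_each(records, :some_method)
puts records.count(&:processed)
```

Each record is touched by exactly one batch, so the workers never need to coordinate with one another.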

You can also execute asynchronously:

model_object.send_later(:method, options, :save)

This hands a time-consuming task to Skynet, which executes it asynchronously.
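The fire-and-forget pattern behind send_later can be illustrated in plain Ruby with a queue and a background worker thread. This is an analogy only: the names Report and enqueue_later are hypothetical, and Skynet hands the call to its own workers rather than to a local thread.

```ruby
require "thread"

# Hypothetical object with a slow method.
class Report
  attr_reader :built

  def build(title)
    @built = "report: #{title}"
  end
end

# Record the call on the queue and return immediately;
# the worker performs the call later.
def enqueue_later(queue, object, method_name, *args)
  queue << [object, method_name, args]
  nil
end

jobs = Queue.new

# Background worker: pops jobs until it receives the :stop marker.
worker = Thread.new do
  while (job = jobs.pop) != :stop
    object, method_name, args = job
    object.public_send(method_name, *args)
  end
end

report = Report.new
enqueue_later(jobs, report, :build, "weekly")

# Shut the worker down and wait for it so the result can be inspected.
jobs << :stop
worker.join

puts report.built
```

The caller gets control back as soon as the job is queued; only the final join, added here so the result can be printed, waits for the work to finish.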

Skynet is a great tool for Web 2.0 sites that have a powerful computing network and need to perform many time-consuming operations. It makes it easy for programmers to write robust and efficient distributed applications.
