Building a Parallel Computing Platform Based on Redis and the R Language (yiyou)

Source: Internet
Author: User

I have recently been studying Gearman and ran into many problems. I spent half a month on queue persistence without solving it, and very little reference material is available domestically, so I decided to try a different approach. The architecture of the Gearman cluster is as follows:


The problem with this architecture is easy to see. When persistence does not work, the only way to keep the cluster running normally is to run multiple JobServers simultaneously. In addition, all data passed between the client and the worker must go through the JobServer and cannot be transferred directly in one step. This becomes a disadvantage when the volume of data is large.

I have been working with the R language for some time. It has become quite popular in recent years and is highly efficient for statistics, analysis, modeling, and visualization. To build on the results of my earlier research, the ideal approach is to implement distributed scheduling and parallel computing on top of the existing R language. Based on my understanding of the Gearman framework, this article analyzes its implementation mechanism.

Gearman connects over TCP/IP. The client knows nothing about the workers; dispatch is completed through the JobServer, which acts as an intermediary. Redis is an open-source key-value database written in ANSI C. It supports networking, can run in memory while also persisting data, and provides APIs for multiple languages. From March 15, 2010, the development of Redis was sponsored by VMware, and since May 2013 it has been sponsored by Pivotal. A Redis database can serve as the communication channel between the client and the worker, and it offers fast reads and writes. Through the information saved in it, client and worker information becomes transparent to both sides, which gives this approach many advantages over Gearman.
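The idea of letting client and worker communicate through a shared key-value store can be sketched roughly as below. This is only an illustration, written in Python for brevity; it is not yiyou's actual code. The key names (`yiyou:tasks`, `yiyou:result:<id>`), the function names, and the `InMemoryStore` class are all hypothetical. `InMemoryStore` is an in-process stand-in that mimics four Redis-style operations (`lpush`, `rpop`, `set`, `get`) so that the example runs without a Redis server.

```python
import json
import uuid
from collections import defaultdict, deque


class InMemoryStore:
    """Hypothetical stand-in exposing the few Redis-style calls used below."""

    def __init__(self):
        self._lists = defaultdict(deque)
        self._values = {}

    def lpush(self, key, value):
        # Push onto the head of the list stored at key.
        self._lists[key].appendleft(value)

    def rpop(self, key):
        # Pop from the tail of the list, or return None when it is empty.
        queue = self._lists[key]
        return queue.pop() if queue else None

    def set(self, key, value):
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)


QUEUE_KEY = "yiyou:tasks"  # assumed key name, for illustration only


def submit_task(store, func_name, args):
    """Client side: enqueue a task description and return its id."""
    task_id = uuid.uuid4().hex
    payload = json.dumps({"id": task_id, "func": func_name, "args": args})
    store.lpush(QUEUE_KEY, payload)
    return task_id


def work_once(store, handlers):
    """Worker side: pop one task, run it, save the result.

    Returns False when the queue is empty, True otherwise.
    """
    raw = store.rpop(QUEUE_KEY)
    if raw is None:
        return False
    task = json.loads(raw)
    result = handlers[task["func"]](*task["args"])
    store.set("yiyou:result:" + task["id"], json.dumps(result))
    return True


def fetch_result(store, task_id):
    """Client side: read the result, or None if it is not ready yet."""
    raw = store.get("yiyou:result:" + task_id)
    return None if raw is None else json.loads(raw)


store = InMemoryStore()
task_id = submit_task(store, "add", [2, 3])
work_once(store, {"add": lambda a, b: a + b})
print(fetch_result(store, task_id))  # prints 5
```

Because `lpush` adds at the head and `rpop` takes from the tail, tasks are processed in first-in, first-out order. The worker writes its result straight into the store, so the client can read it without a second hop through a dispatcher. A real deployment would replace `InMemoryStore` with an actual Redis client that offers the same four operations.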

Based on this idea, I combined Redis with the R language and optimized Gearman's implementation mechanism. The result is yiyou, a parallel computing framework that supports the R language and a NoSQL database. The code structure is as follows:

Watch is used to view queue information. Yiyou-server is similar to the JobServer and is mainly used for assigning tasks and managing the NoSQL database. Yiyou-stop shuts down the entire parallel cluster. Yiyou-client-lib is the library that the client program depends on; custom client programs can be built on it. Yiyou-worker-lib is the library that the worker program depends on; custom worker programs can be built on it. Yiyou-dlib is the library for parallel computing. With it, single-threaded, decomposable R code can be ported smoothly to the cluster without changing the existing code. An example is as follows:

As can be seen, parallel computation based on the yiyou framework draws on the computing power of the cluster and provides a feasible solution for big data analysis in the R language.

At present the framework is still immature and provides only parallel decomposition of tasks. MapReduce-like logic could be written on top of it. More to come ...
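To make "parallel decomposition" and "MapReduce-like logic" concrete, here is a minimal generic sketch in Python. It does not use yiyou or its API. The function names are hypothetical, and a local thread pool stands in for the cluster's workers purely so the example is self-contained. The input is split into chunks, each chunk is processed independently, and the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_map_reduce(items, map_chunk, reduce_results, n_workers=4):
    """Decompose the work, run map_chunk per chunk, then combine."""
    chunks = split_into_chunks(items, n_workers)
    # The thread pool is a local stand-in for remote workers.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_results(partials)


def sum_of_squares(chunk):
    """Map step: the single-threaded computation applied to one chunk."""
    return sum(x * x for x in chunk)


total = parallel_map_reduce(list(range(1, 101)), sum_of_squares, sum)
print(total)  # prints 338350
```

The map step (`sum_of_squares`) is ordinary single-threaded code; only the splitting and combining are added around it, which mirrors the goal of porting decomposable code without rewriting it.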
