Analysis of Distributed Parallel Computing in the Cloud: Programming Models

Source: Internet
Author: User
Keywords: Cloud
MapReduce is a distributed programming model developed by Google for processing massive data sets on large clusters. It is built on two functions. Map applies a function to every member of a collection and returns a result set based on that processing. Reduce groups and summarizes the result sets produced by two or more map tasks, which may be processed in parallel by multiple threads, processes, or stand-alone systems. The Map() and Reduce() functions can both run in parallel, though not necessarily at the same time or on the same system.
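The map and reduce steps described above can be illustrated with a small word count. This is only a minimal single-machine sketch in Python, not Google's implementation: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and a thread pool stands in for the cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_results):
    """Group the pairs produced by all map tasks by their key."""
    groups = defaultdict(list)
    for pairs in mapped_results:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: summarize all values collected for one key."""
    return key, sum(values)


def word_count(chunks, workers=2):
    """Run the map tasks in parallel, group by key, then reduce in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, chunks))
        groups = shuffle(mapped)
        reduced = pool.map(lambda item: reduce_counts(*item), groups.items())
        return dict(reduced)


if __name__ == "__main__":
    print(word_count(["the cloud", "the cluster the cloud"]))
```

Each chunk is mapped independently, so the map tasks need no coordination; only the grouping step has to see the output of every map task before the reduce tasks can start.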

Microsoft released a beta of Dryad, its distributed parallel computing platform, on December 21, 2010, making it a competitor to Google's MapReduce distributed data computing platform. Dryad lets developers write large-scale parallel applications on the Windows or .NET platform, and a program written for a single machine can easily be run on the distributed parallel computing platform. Programmers can use the server clusters of a data center to process data in parallel, and when running a program on thousands of machines they do not need to concern themselves with the details of the distributed parallel processing system. This article focuses on the functional principles and applications of Microsoft's latest Dryad platform.

The Dryad platform is also one of the key technologies in Microsoft's cloud computing infrastructure. Putting cloud computing into practice faces two major problems. The first is how to build a massive infrastructure that is tightly coupled to the applications that run on it. The frameworks used to build such distributed platforms include Dryad, Dynamo, and MapReduce.


▲ Figure 1 Data parallel computation

The second problem is how to build new cloud computing applications that provide a richer user experience on the web. Yahoo extended MapReduce and proposed the Map-Reduce-Merge framework, which can also be applied to multi-core processors. HP has focused on distributed shared memory, an approach that differs from MapReduce programming. IBM mainly uses Linux system images and Hadoop (an open source implementation of the Google File System and MapReduce). Microsoft independently developed Dryad and DryadLINQ, which help C# developers process large-scale data in a distributed, parallel fashion on a computer cluster or in a data center, improving the performance and efficiency of program execution several times over.

Dryad and DryadLINQ are research projects created by Microsoft Research to provide a distributed parallel computing platform. DryadLINQ provides a high-level language interface that lets ordinary programmers carry out large-scale distributed computing easily. It combines two key Microsoft technologies, Dryad and LINQ, for building applications on this platform. Figure 2 shows where Dryad sits within the Microsoft architecture.


▲ Fig. 2 Relationship between Dryad and Microsoft architecture

Like MapReduce, Dryad is not only a programming model but also an efficient task scheduling model. The Dryad programming model is suited to cloud computing, and it also performs well on multi-core and multiprocessor machines and on heterogeneous clusters.

Visual Studio C++ includes a parallel computing programming framework that supports cooperative task scheduling and the management of hardware resources (such as CPU and memory). Its work-stealing algorithm takes full advantage of fine-grained parallelism: idle threads "steal" tasks from the queues of other threads according to a set policy, so that both tasks and data can be parallelized at a fine granularity. By contrast, if a time-consuming task is only coarsely split into four subtasks that run concurrently, then even on a four-core CPU there is no dynamic load balancing at run time. Three of the subtasks may finish very early while the fourth is still running on a single core.
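The difference between a coarse four-way split and fine-grained tasks with work stealing can be shown with a small simulation. This is a hedged sketch in Python, not the Visual Studio runtime: the `makespan` function, the task durations, and the policy of stealing from the longest queue are all assumptions made for illustration.

```python
from collections import deque


def makespan(queues, steal=True):
    """Simulate workers draining task queues and return the total finish time.

    queues: one list of task durations per worker (hypothetical workload).
    steal:  if True, an idle worker takes a task from the longest queue.
    """
    queues = [deque(q) for q in queues]
    busy_until = [0] * len(queues)  # time at which each worker becomes idle
    finish = 0
    while any(queues):
        # The worker that becomes idle first picks the next task.
        w = min(range(len(queues)), key=lambda i: busy_until[i])
        if queues[w]:
            task = queues[w].pop()  # the owner takes from one end of its queue
        elif steal:
            victim = max(range(len(queues)), key=lambda i: len(queues[i]))
            task = queues[victim].popleft()  # the thief takes from the other end
        else:
            busy_until[w] = float("inf")  # no stealing: this worker stays idle
            continue
        busy_until[w] += task
        finish = max(finish, busy_until[w])
    return finish


if __name__ == "__main__":
    coarse = [[2], [2], [2], [10]]                  # four uneven subtasks
    fine = [[1] * 2, [1] * 2, [1] * 2, [1] * 10]    # same work, unit-sized tasks
    print(makespan(coarse))             # 10: three cores idle after time 2
    print(makespan(fine, steal=False))  # 10: fine tasks alone do not help
    print(makespan(fine))               # 4: idle workers steal the backlog
```

In this sketch the total work is 16 time units in every case. Only the combination of small tasks and stealing lets the four workers share it evenly.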

Dryad is similar to the parallel framework described above in that it also schedules work across computers and their CPUs. The difference is that Dryad is designed to scale across a wide range of cluster computing platforms. Whether the target is a single multi-core computer, a cluster of several computers, or a data center with thousands of computers, Dryad provides a programming framework for distributed parallel computing by applying scheduling policies to the tasks in its queues.
