Grid Computing Based on the Azure Cloud Computing Platform, Part 2: Developing a Grid Application

In Part 1 of this series, we introduced the design pattern for grid computing on Azure. In this article, we will use C# to develop a grid application that implements this pattern; in Part 3, we will run the application, first locally and then in the cloud. To do this, we need the supporting functionality that a grid computing framework provides.

Role of the grid framework

Unless you are prepared to write a large amount of low-level infrastructure software, you should choose a framework for your grid application that takes over the heavy lifting and lets you focus on the application code. Although Azure already provides many of the services you want in a grid computing infrastructure, you still need a layer of grid-specific features between Azure and your grid application. A good grid computing framework should do the following for you:

    • Provide scheduling and control capabilities for running jobs
    • Retrieve input data from the underlying storage
    • Generate tasks for grid executors to execute
    • Distribute tasks to available executors
    • Track task status while the application executes on the grid
    • Collect results from the executors
    • Store results in the underlying storage

The framework brings the grid application and the Azure platform together. Application developers only need to write the application-specific code that loads input data, generates tasks, executes tasks, and saves result data. The framework provides everything else, relying heavily on the features of the Azure platform.

In this article, we will use Azure Grid, community edition. Azure Grid provides four software components that together implement all the features listed above:

    • The loader allows you to add your own code to extract input data from underlying resources and generate tasks.
    • Executor roles allow you to add your own code to execute application tasks.
    • Aggregators allow you to add your own code to store the results back to the underlying resources.
    • The grid manager allows you to start jobs and monitor their execution.
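
To make the division of labor concrete, the following is a minimal in-process sketch of how loader, executor, and aggregator code plug together. Every name here (`GridTask`, `LocalGrid`, `RunJob`) is hypothetical and chosen for illustration; this is not the Azure Grid API, and a real grid would queue tasks through cloud storage and run executors in parallel rather than in a single loop.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical task type for illustration; not the Azure Grid class.
public sealed class GridTask
{
    public int TaskId { get; set; }
    public string TaskType { get; set; } = "";
    public Dictionary<string, string> Parameters { get; } = new Dictionary<string, string>();
    public Dictionary<string, string> Results { get; } = new Dictionary<string, string>();
}

public static class LocalGrid
{
    // Runs one job sequentially: the loader generates tasks, the executor
    // runs each task, and the aggregator stores each result.
    // Returns the number of tasks processed.
    public static int RunJob(
        string jobId,
        Func<string, IEnumerable<GridTask>> loader,
        Action<GridTask> executor,
        Action<GridTask> aggregator)
    {
        int completed = 0;
        foreach (GridTask task in loader(jobId))
        {
            executor(task);    // application task code
            aggregator(task);  // application result-storage code
            completed++;
        }
        return completed;
    }
}
```

The three delegates stand in for the application-specific code; everything inside `RunJob` is the kind of plumbing a framework takes over.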

Azure Grid uses cloud resources only while your grid application is executing, which keeps your charges to a minimum. The underlying storage holds the input data, the results, and the Azure Grid tracking database. Cloud storage is used to pass parameters to the executors and to collect results from them, and it is cleared once your grid application has finished executing. When the application has finished, you can also suspend the idle grid executor instances, so that you do not pay for storage or compute time you are not using.

Application: fraud check

The application we will code is a fictitious fraud check program that applies a set of rules to applicant data to compute a fraud likelihood score. Each applicant record is processed as one grid task. The applicant record has the following structure:

By applying business rules to the applicant record, the fraud check program calculates a fraud likelihood score between 0 and 1000, where 0 is the worst possible score. If the score is lower than 500, the application may be rejected.

Designing the grid application

When designing a grid application, you need to determine the best way to divide the work into independent tasks that can execute in parallel. Start by considering two key questions:

    • On what basis do you divide your work into tasks?
    • How many different types of tasks are there?

In the fraud check example, it makes sense to create a separate task for each applicant record: scoring a record is an atomic operation, and the order in which records are processed does not matter.

Fraud check requires only one task type, which we name "FraudScore". The FraudScore task calculates the fraud score for one applicant.

Tasks read input data and generate result data. The input data for FraudScore is the applicant record, and the result data is the fraud score plus a text field that explains the reasoning behind the score. The parameters FraudScore requires and the results it returns are listed below together with their names.

In some grid computing applications, tasks need access to additional resources, such as databases or web services, to do their work. FraudScore has no such requirement. If it did, you could pass the required information, such as a web service address or a database connection string, as parameters.

Developing the grid application

Now that the input parameters, tasks, and result fields of our grid application have been defined, we can move on to writing the application. Azure Grid only requires us to write code for the loader, the application tasks, and the aggregator.

Writing the loader code

The loader code reads input data and generates tasks with parameters. Most of the time this data comes from a database, but fraud check is written to read its input data from a spreadsheet.

Azure Grid gives you a starting-point template for your loader in a class named AppLoader. You implement the GenerateTasks method to read your input data and generate tasks, each with a task type name and parameters. Your code creates Task objects and returns them as an array. The base class, GridLoader, takes care of queuing your tasks to cloud storage, from which they are picked up for execution.

To implement the fraud check loader, we replace the sample task-creation code with the following code, which reads records from the CSV spreadsheet and creates a task for each record.

The first row of the input spreadsheet should contain the parameter names, and the subsequent rows should contain the values, as shown previously. Creating a task is simple: instantiate a Task object and pass the following information to its constructor:

    • Project name: the project name of your application. This is read from a configuration file setting.
    • Job ID: the ID of the job being run, a string. This value is passed in to the GenerateTasks method.
    • Task ID: the unique identifier of the task, an integer.
    • Task type: the name of the task to be run.
    • Task status: set to Task.Status.Pending to indicate that the task has not yet run.
    • Parameters: a dictionary of parameter names and values.
    • Results: null; the results are set later, when a grid executor runs the task.

Each task is added to a list collection. Once all the tasks have been generated, the result of list.ToArray() is returned to the loader base class, which queues the tasks to cloud storage.
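
The steps above can be sketched as follows. This is a minimal illustration rather than the article's original listing: `GridTask`, `GridTaskStatus`, and `FraudCheckLoader` are hypothetical stand-ins for the framework's task class and the AppLoader template, the constructor simply mirrors the seven items listed above, and the CSV handling assumes plain comma-separated fields with no quoting.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-ins for the framework's task type and status values.
public enum GridTaskStatus { Pending, Running, Complete }

public sealed class GridTask
{
    public GridTask(string projectName, string jobId, int taskId, string taskType,
                    GridTaskStatus status, Dictionary<string, string> parameters,
                    Dictionary<string, string> results)
    {
        ProjectName = projectName; JobId = jobId; TaskId = taskId; TaskType = taskType;
        Status = status; Parameters = parameters; Results = results;
    }

    public string ProjectName { get; }
    public string JobId { get; }
    public int TaskId { get; }
    public string TaskType { get; }
    public GridTaskStatus Status { get; }
    public Dictionary<string, string> Parameters { get; }
    public Dictionary<string, string> Results { get; }
}

public static class FraudCheckLoader
{
    // Reads CSV text whose first row holds parameter names and whose remaining
    // rows hold values, and returns one pending FraudScore task per data row.
    // Assumption: simple CSV, no quoted fields or embedded commas.
    public static GridTask[] GenerateTasks(string projectName, string jobId, TextReader csv)
    {
        var tasks = new List<GridTask>();
        string header = csv.ReadLine();
        if (header == null) return tasks.ToArray();   // empty input: no tasks
        string[] names = header.Split(',');

        int nextTaskId = 1;
        string line;
        while ((line = csv.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue;     // skip blank rows
            string[] values = line.Split(',');
            var parameters = new Dictionary<string, string>();
            for (int i = 0; i < names.Length && i < values.Length; i++)
                parameters[names[i].Trim()] = values[i].Trim();

            // Results stay null until an executor has run the task.
            tasks.Add(new GridTask(projectName, jobId, nextTaskId++, "FraudScore",
                                   GridTaskStatus.Pending, parameters, null));
        }
        return tasks.ToArray();
    }
}
```

Taking a `TextReader` instead of a file path keeps the sketch easy to try against an in-memory string before pointing it at a real file.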

Writing the aggregator code

With the loader written, next comes the aggregator, which processes the task results and stores them locally.

Azure Grid gives you a starting-point template for your aggregator in a class named AppAggregator. You implement three methods:

    • OpenStorage, called when the first result is ready for processing, gives you the opportunity to open storage resources.
    • StoreResult, called for each result that needs to be saved. The input parameters and the results are passed in as XML.
    • CloseStorage, called after the last result has been saved, gives you the opportunity to close storage resources.

The base class, GridAggregator, retrieves results from cloud storage and calls your methods to store them.

In StoreResult, the parameters and results of the current task are passed in as XML in the following format:

To implement the fraud check aggregator, we do the reverse of the loader: each result is appended to a CSV spreadsheet file.

    • In OpenStorage, a .csv file is opened for output and the row of column headings is written.
    • In StoreResult, the results (along with the first and last name input parameters, included for context) are extracted from the XML and written to the CSV file.
    • In CloseStorage, the file is closed.
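
The three methods above might look roughly like this. The sketch rests on several assumptions that are not taken from Azure Grid: the class and method signatures are hypothetical, the XML is assumed to contain one child element per parameter or result name (for example `<Parameters><FirstName>Ann</FirstName></Parameters>`), and the column names `Score` and `Notes` are invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical aggregator sketch; the real AppAggregator signatures may differ.
public sealed class FraudCheckAggregator
{
    private readonly TextWriter _output;
    private bool _open;

    public FraudCheckAggregator(TextWriter output) { _output = output; }

    // Called before the first result: write the row of column headings.
    public void OpenStorage()
    {
        _output.WriteLine("FirstName,LastName,Score,Notes");
        _open = true;
    }

    // Called once per result: pull the fields out of the XML and append a CSV row.
    public void StoreResult(string parametersXml, string resultsXml)
    {
        if (!_open) throw new InvalidOperationException("OpenStorage must be called first.");
        XElement parameters = XElement.Parse(parametersXml);
        XElement results = XElement.Parse(resultsXml);
        _output.WriteLine(string.Join(",",
            Field(parameters, "FirstName"), Field(parameters, "LastName"),
            Field(results, "Score"), Field(results, "Notes")));
    }

    // Called after the last result: flush and close the output.
    public void CloseStorage()
    {
        _output.Flush();
        _output.Dispose();
        _open = false;
    }

    // Returns the named child element's text (empty if absent), quoted when
    // it contains a comma, a double quote, or a line break.
    private static string Field(XElement parent, string name)
    {
        string value = (string)parent.Element(name) ?? "";
        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0;
        return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
    }
}
```

In a real run the `TextWriter` would typically be a `StreamWriter` over the output .csv file; a `StringWriter` works the same way for local experiments.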

Writing the application task code

With the loader and aggregator written, what remains is the application code itself. The AppWorker class holds the application task code. The current task is passed to an Execute method, which checks the task type to determine which task code to run.

For fraud check, a switch statement tests for our one task type, FraudScore, and runs the code that calculates the fraud likelihood score from the applicant data in the input parameters.

The first order of business for the FraudScore code is to extract the input parameters. They are available on the task object through a dictionary of names and string values.

Next, a series of business rules is executed to calculate the score. The following is an excerpt:

Finally, FraudScore updates the task's results. This is just as simple: set names and string values in a dictionary.

The base classes GridWorker and WorkerRole queue the results to cloud storage, where the aggregator retrieves them later.
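
The flow described in this section (switch on the task type, extract parameters, apply rules, set results) can be sketched as below. The types are hypothetical stand-ins for the framework's classes, and the two rules, the `Email` parameter, and the `Score` and `Notes` result names are invented purely for illustration; they are not the article's actual business rules. The sketch follows the scoring convention stated earlier: scores run from 0 to 1000, and lower is worse.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical task type for illustration; not the Azure Grid class.
public sealed class GridTask
{
    public string TaskType { get; set; } = "";
    public Dictionary<string, string> Parameters { get; } = new Dictionary<string, string>();
    public Dictionary<string, string> Results { get; } = new Dictionary<string, string>();
}

public static class FraudCheckWorker
{
    // Dispatches on the task type; fraud check has only one.
    public static void Execute(GridTask task)
    {
        switch (task.TaskType)
        {
            case "FraudScore":
                FraudScore(task);
                break;
            default:
                throw new ArgumentException("Unknown task type: " + task.TaskType);
        }
    }

    private static void FraudScore(GridTask task)
    {
        // 1. Extract input parameters by name.
        string lastName = Get(task, "LastName");
        string email = Get(task, "Email");

        // 2. Apply rules (invented examples): start from the best score and deduct.
        int score = 1000;
        var notes = new List<string>();
        if (lastName.Length == 0) { score -= 400; notes.Add("Missing last name"); }
        if (!email.Contains("@")) { score -= 300; notes.Add("Invalid email"); }
        score = Math.Max(0, score);

        // 3. Store the results as names and string values.
        task.Results["Score"] = score.ToString(CultureInfo.InvariantCulture);
        task.Results["Notes"] = string.Join("; ", notes);
    }

    // Returns the trimmed parameter value, or an empty string if it is absent.
    private static string Get(GridTask task, string name)
    {
        return task.Parameters.TryGetValue(name, out var value) && value != null
            ? value.Trim()
            : "";
    }
}
```

Because every input and output travels as a string in a dictionary, the task code stays independent of how the framework queues and transports tasks.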

Ready to run

We have developed our grid application and are ready to run it. To review what we have just done: with a framework in place, implementing the loader, the aggregator, and the task code meant writing only application-specific code.

All that remains is to run the application. With a grid application, you should always test carefully first by running a small number of tasks locally. Once you are confident in the design and correctness of your application, you can move to the cloud for large-scale execution. The next article in this series (Part 3) describes how to run the application.
