Developing with HDInsight on Windows Azure

Source: Internet
Author: User
Keywords: Azure, HDInsight

Windows Azure HDInsight lets you dynamically provision Apache Hadoop clusters to process large volumes of data. You can find more information in the first blog post in this series, or get started in the Windows Azure Portal. This article describes several ways developers can interact with HDInsight: it first discusses the main usage scenarios and then looks at the individual features of HDInsight. Because the product is built on Apache Hadoop, developers can take advantage of an ecosystem with a wide range of tools and capabilities.

Among the customers we have worked with, there are two distinct scenarios: authoring jobs, where you use tools to work with big data, and integrating HDInsight into an application, where job inputs and outputs become part of a larger application architecture. A key design decision in HDInsight is the use of Windows Azure Blob storage as the default file system. This means that you can access the data in Blob storage using existing tools and APIs. This article explains in more detail how we use Blob storage.
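Because Blob storage acts as the cluster's file system, a job refers to a blob by URI in the same way it would refer to an HDFS path. The sketch below builds such a URI. It assumes the `wasb://container@account.blob.core.windows.net/path` form used in later HDInsight documentation; this article does not name the scheme, and the account, container, and path shown are hypothetical.

```python
def blob_uri(container, account, path, scheme="wasb"):
    """Build a Hadoop-style URI for a blob in Windows Azure Blob storage.

    The default scheme is an assumption; pass a different one if your
    cluster version expects it.
    """
    # Drop any leading slash so the URI has exactly one separator.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical container, account, and blob path, for illustration only.
example = blob_uri("logs", "myaccount", "/2013/01/events.csv")
```

A URI of this shape can be passed wherever a job expects an input or output location, while the same blob remains reachable through the regular Blob storage tools.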

There are a number of tools available for authoring jobs. At a high level, there is the set of tools that are part of the existing Hadoop ecosystem, a set of projects we have built to help .NET developers get started with Hadoop, and a new project we have started to help developers interact with Hadoop using JavaScript.

Authoring Jobs

Existing Hadoop Tools

HDInsight uses Apache Hadoop by way of the Hortonworks Data Platform, which stays very close to the Hadoop ecosystem. As a result, many capabilities work exactly as they do in stock Hadoop. This means that any investment in, and knowledge of, the tools listed below carries over to HDInsight. A cluster is created with the following Apache projects for distributed processing:

Map/Reduce: Map/Reduce is the foundation of distributed processing in Hadoop. To write a job, programmers can use Java, or use Hadoop Streaming to work in other languages and runtimes.

Hive: Hive uses a SQL-like syntax to express queries that are compiled into a set of map/reduce programs. Hive supports many SQL constructs (aggregation, grouping, filtering, and so on) and makes it easy to parallelize these queries across the nodes in your cluster.

Pig: Pig is a data flow platform whose language, Pig Latin, compiles into a series of map/reduce programs.

Oozie: Oozie is a workflow scheduler that manages a directed acyclic graph of actions, where each action can be a map/reduce, Pig, Hive, or other job.

For more information, see the QuickStart Guide.
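Hadoop Streaming runs any executable that reads records from standard input and writes tab-separated key/value lines to standard output. As a minimal sketch, assuming a simple word-count job, the mapper and reducer logic could look like this in Python. The function names and the `run_locally` helper are illustrative and are not part of Hadoop or HDInsight.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each word.

    Input must be sorted by key, which is what the Hadoop shuffle
    phase guarantees between the map and reduce steps.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce on one machine."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

In a real streaming job, each function would be wrapped in its own script that iterates over `sys.stdin` and prints every record, and the two scripts would be supplied to Hadoop Streaming as the mapper and the reducer. Keeping the logic in plain functions makes it easy to check on a small sample before submitting it to a cluster.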

A list of Hadoop component updates is also available. The following table shows the component versions in the current preview release:

Component           Version
Apache Hadoop       1.0.3
Apache Hive         0.9.0
Apache Pig          0.9.3
Apache Sqoop        1.4.2
Apache Oozie        3.2.0
Apache HCatalog     0.4.1
Apache Templeton    0.1.4

In addition, other projects in the Hadoop space, such as Mahout (see example) or Cascading, can also be used easily on HDInsight. We will write about these topics in future articles.

.NET Tools

We are building a set of tools that let developers apply their .NET skills and investments to Hadoop. These projects are hosted on CodePlex, and the packages are available from NuGet for authoring jobs that run on HDInsight. For an introduction, see the QuickStart on the CodePlex site.

.NET Map/Reduce

LINQ to Hive
