Savanna: Run Hadoop on top of OpenStack

Apache Hadoop is now widely adopted by organizations as the industry-standard MapReduce implementation, and the Savanna project is designed to let users run and manage Hadoop on top of OpenStack. Amazon has been providing a Hadoop service, EMR (Elastic MapReduce), for years.

Savanna asks users for the information needed to build a cluster, such as the Hadoop version, cluster topology, and node hardware details. With that, Savanna helps users set up the cluster within minutes, and it can also scale a running cluster (adding or removing worker nodes) as needed.

The solution targets the following use cases:

Quickly configure Hadoop clusters for Dev and QA

Provide "analytics as a service" for dedicated or unexpected analytic workloads (similar to EMR in AWS)

Take advantage of unused computing power in a general-purpose OpenStack IaaS cloud.

The main features are as follows:

Appears as an OpenStack component

Managed through a REST API, with a user interface provided as part of the OpenStack Dashboard (a sample API call follows this list).

Support for multiple Hadoop distributions:

Pluggable system of Hadoop installation engines.

Integration with vendor-specific management tools, such as Apache Ambari or Cloudera Management Console.

Predefined templates of Hadoop configurations, with the ability to modify parameters.
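As a minimal sketch of what driving Savanna through its REST API could look like, the snippet below lists a tenant's clusters. The endpoint URL, port, and response layout are assumptions for illustration; the exact paths are defined by the Savanna API reference.

```python
# Sketch: list Hadoop clusters through a hypothetical Savanna REST endpoint.
import requests

SAVANNA_URL = "http://savanna.example.com:8386/v1.0"  # hypothetical endpoint
TENANT_ID = "my-tenant-id"                            # placeholder
TOKEN = "keystone-auth-token"                         # obtained from Keystone

headers = {"X-Auth-Token": TOKEN}

# List the Hadoop clusters visible to this tenant.
resp = requests.get(f"{SAVANNA_URL}/{TENANT_ID}/clusters", headers=headers)
resp.raise_for_status()
for cluster in resp.json().get("clusters", []):
    print(cluster["name"], cluster["status"])
```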

Details

Savanna communicates primarily with the following OpenStack components:

Horizon - Provides a GUI to use all of Savanna's features.

Keystone - Authenticates users and provides security tokens for communicating with OpenStack, limiting users to their assigned OpenStack privileges (a token-request sketch follows this list).

Nova - Provisions virtual machines for Hadoop clusters.

Glance - Stores Hadoop virtual machine images; each image contains an installed OS and Hadoop, and pre-installing Hadoop speeds up node deployment.

Swift - Can be used as storage for the data processed by Hadoop jobs.
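The Keystone step can be sketched as follows: the client requests a token once and then passes it to other services such as Savanna. This uses the standard Keystone v3 tokens API; the URL, credentials, and project name are placeholders.

```python
# Sketch: obtain a Keystone token that OpenStack services will accept.
import requests

KEYSTONE_URL = "http://keystone.example.com:5000/v3"  # placeholder

auth_body = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "name": "demo",                    # placeholder user
                    "domain": {"id": "default"},
                    "password": "secret",              # placeholder password
                }
            },
        },
        "scope": {"project": {"name": "demo", "domain": {"id": "default"}}},
    }
}

resp = requests.post(f"{KEYSTONE_URL}/auth/tokens", json=auth_body)
resp.raise_for_status()
# Keystone v3 returns the token in the X-Subject-Token response header.
token = resp.headers["X-Subject-Token"]
```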

General workflow

Savanna provides two levels of abstraction in its API and UI, matching the two use cases: cluster provisioning and analytics as a service.

The workflow for fast cluster provisioning includes the following steps (a request sketch follows the list):

Choose Hadoop version

Select a base image with or without pre-installed Hadoop:

For base images without pre-installed Hadoop, Savanna provides a pluggable deployment engine that integrates with vendor tooling.

Define the cluster configuration, including cluster size and topology, and set Hadoop parameters (such as heap size):

Configurable templates will be provided to simplify parameter configuration.

Launch the cluster: Savanna provisions the virtual machines and installs and configures Hadoop.

Operate on the cluster: add or remove nodes.

Terminate the cluster when it is no longer needed.
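The provisioning step could look like the sketch below: a cluster-creation request naming a base image and a topology of node templates with node counts. The endpoint and field names follow the spirit of Savanna's early API but should be treated as assumptions; the authoritative schema is in the Savanna API reference.

```python
# Sketch: request a new Hadoop cluster from Savanna.
import requests

SAVANNA_URL = "http://savanna.example.com:8386/v1.0"  # hypothetical endpoint
TENANT_ID = "my-tenant-id"                            # placeholder
headers = {"X-Auth-Token": "keystone-auth-token",     # token from Keystone
           "Content-Type": "application/json"}

cluster_request = {
    "cluster": {
        "name": "dev-cluster",
        "base_image_id": "hadoop-base-image-id",  # image with pre-installed Hadoop
        # Topology: node templates and the number of nodes to start from each.
        "node_templates": {
            "jt_nn.medium": 1,   # combined JobTracker + NameNode
            "tt_dn.small": 5,    # combined TaskTracker + DataNode workers
        },
    }
}

resp = requests.post(f"{SAVANNA_URL}/{TENANT_ID}/clusters",
                     json=cluster_request, headers=headers)
resp.raise_for_status()
```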

The workflow for analytics as a service includes the following steps (a job sketch follows the list):

Choose a predefined Hadoop version

Configure the job:

Choose the type of job: pig, hive, jar-file, and so on

Provide job script source or jar path

Choose input and output data paths (initially, only Swift is supported)

Choose the log path

Set the cluster size limit

Execute the job:

All cluster provisioning and job execution happens transparently to the user

After the job is completed, the cluster is automatically removed

Retrieve the results of the computation (for example, from Swift)
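Since this part of Savanna was still on the roadmap when the project was announced, the job-submission sketch below is purely illustrative: the /jobs endpoint and field names such as jar_path and cluster_size_limit are assumptions, not the real API.

```python
# Sketch: submit an "analytics as a service" job (hypothetical API).
import requests

SAVANNA_URL = "http://savanna.example.com:8386/v1.0"  # hypothetical endpoint
TENANT_ID = "my-tenant-id"                            # placeholder
headers = {"X-Auth-Token": "keystone-auth-token",
           "Content-Type": "application/json"}

job_request = {
    "job": {
        "type": "jar-file",                          # or "pig", "hive", ...
        "jar_path": "swift://jobs/wordcount.jar",    # job binary location
        "input_path": "swift://data/input/",         # initially Swift only
        "output_path": "swift://data/output/",
        "log_path": "swift://logs/wordcount/",
        "cluster_size_limit": 10,                    # cap on provisioned nodes
    }
}

resp = requests.post(f"{SAVANNA_URL}/{TENANT_ID}/jobs",
                     json=job_request, headers=headers)
resp.raise_for_status()
# The cluster is provisioned, the job runs, and the cluster is removed
# automatically; results are then fetched from the output path in Swift.
```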

The user's perspective

When provisioning a cluster with Savanna, the user operates on two types of entities: Node Template and Cluster.

A Node Template describes the nodes in a cluster and contains several parameters. The node type is one of the Node Template's properties: it determines which Hadoop processes run on the node and, therefore, the role the node plays in the cluster, which may be JobTracker, NameNode, TaskTracker, DataNode, or a logical combination of these. The Node Template also holds hardware parameters for the node's virtual machine and settings for the Hadoop processes running on the node.
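A hypothetical node template might look like the following; the field names are illustrative assumptions, but they show how the node type fixes the Hadoop roles while the remaining fields capture VM hardware and Hadoop process settings.

```python
# Sketch: a node template combining TaskTracker and DataNode roles.
node_template = {
    "node_template": {
        "name": "tt_dn.small",
        "node_type": "TT+DN",       # TaskTracker and DataNode on one node
        "flavor_id": "m1.small",    # VM hardware profile (Nova flavor)
        "task_tracker": {"heap_size": "896"},  # Hadoop parameter, in MB
        "data_node": {"heap_size": "896"},
    }
}
```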

The Cluster entity describes a Hadoop cluster. Its main characteristics are the pre-installed Hadoop virtual machine image used for cluster deployment and the cluster topology. The topology is a list of node templates together with the number of nodes deployed from each template. With regard to the topology, Savanna verifies that the NameNode and JobTracker in the cluster are unique, as sketched below.
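The uniqueness check can be illustrated with a small validation sketch. The names and structures here are assumptions; the point is that, summed over all node templates in the topology, the NameNode and JobTracker roles must each occur exactly once.

```python
# Sketch: validate that a topology has exactly one NameNode and one JobTracker.
def validate_topology(node_templates: dict[str, int],
                      roles_by_template: dict[str, set[str]]) -> None:
    for role in ("NameNode", "JobTracker"):
        count = sum(n for tmpl, n in node_templates.items()
                    if role in roles_by_template[tmpl])
        if count != 1:
            raise ValueError(f"cluster needs exactly one {role}, got {count}")

validate_topology(
    {"jt_nn.medium": 1, "tt_dn.small": 5},          # template -> node count
    {"jt_nn.medium": {"JobTracker", "NameNode"},    # template -> Hadoop roles
     "tt_dn.small": {"TaskTracker", "DataNode"}},
)
```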

Each node template and cluster belongs to the tenant specified by the user, and users can only access objects in the tenants they have access to. Users can only edit or delete objects they created themselves; administrators, of course, can access all objects. In this, Savanna follows the standard OpenStack access policy.

Savanna supports a variety of Hadoop cluster topologies. The JobTracker and NameNode processes can run either on a single virtual machine or on two separate ones. A cluster can also contain worker nodes of several types: a worker node can run both the TaskTracker and DataNode processes, or only one of the two. Savanna allows users to set up a cluster with any combination of these options.

Integration with Swift

Swift is the standard object storage in OpenStack, comparable to Amazon S3, and is typically deployed on physical hosts. A couple of enhancements allow it to be used as "HDFS on OpenStack."

The first is a Hadoop FileSystem implementation for Swift, HADOOP-8545, which lets Hadoop jobs run against Swift. The second is a change request to Swift itself, Change I6b1ba25b, which adds the ability to list the endpoints of an object, account, or container, so that Swift can be integrated with software that relies on data locality information to avoid network overhead.
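For illustration, the sketch below shows roughly what the Hadoop-side configuration for the HADOOP-8545 Swift filesystem looks like, so that jobs can address data as swift:// paths. The exact property names, and the provider-name convention ("savanna" here), are assumptions that may differ between patch versions.

```python
# Sketch: core-site properties (as a dict) for the Swift filesystem patch.
swift_fs_conf = {
    "fs.swift.impl":
        "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem",
    # "savanna" is an arbitrary provider name referenced from swift:// URIs.
    "fs.swift.service.savanna.auth.url":
        "http://keystone.example.com:5000/v2.0/tokens",  # placeholder
    "fs.swift.service.savanna.username": "demo",         # placeholder
    "fs.swift.service.savanna.password": "secret",       # placeholder
    "fs.swift.service.savanna.tenant": "demo",           # placeholder
}

# A job would then read input from a path such as:
input_path = "swift://data.savanna/input/"
```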

Pluggable deployment and monitoring

In addition to the monitoring provided by vendor-specific Hadoop management tools, Savanna integrates with pluggable external monitoring systems such as Nagios and Zabbix.

Deployment and monitoring tools will be installed on separate virtual machines, allowing a single instance to manage or monitor different clusters at the same time.
