Statement:
Reprints of this blog are welcome, but please keep the original author information and cite the source.
Guo Deqing
Team: Huawei Hangzhou OpenStack Team
Sahara is designed to let users deploy Hadoop clusters simply, by supplying a few configuration parameters: the Hadoop version, the cluster topology, node hardware information, and so on. Once the user has provided these parameters, Sahara quickly deploys the Hadoop cluster. It also supports scaling a cluster out and in (adding and removing nodes).
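To make "a simple configuration" concrete, here is a minimal sketch of the parameters a user hands over, assuming a cluster is described as a Hadoop version plus groups of identical nodes. The names `NodeGroup`, `ClusterSpec`, `scale` and the example version and flavor strings are hypothetical illustrations, not Sahara's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    """One group of identically configured nodes (hypothetical model)."""
    name: str
    flavor: str           # hardware profile, e.g. an OpenStack flavor name
    processes: list[str]  # Hadoop services running on each node in the group
    count: int            # number of nodes in the group


@dataclass
class ClusterSpec:
    """The few parameters a user supplies; the deployment work is left to Sahara."""
    hadoop_version: str
    node_groups: list[NodeGroup] = field(default_factory=list)

    def total_nodes(self) -> int:
        return sum(g.count for g in self.node_groups)

    def scale(self, group_name: str, new_count: int) -> int:
        """Grow or shrink one node group; return the change in node count."""
        if new_count < 0:
            raise ValueError("node count cannot be negative")
        for g in self.node_groups:
            if g.name == group_name:
                delta = new_count - g.count
                g.count = new_count
                return delta
        raise KeyError(group_name)


# Illustrative values only: one master node and three workers.
spec = ClusterSpec(
    hadoop_version="2.7.1",
    node_groups=[
        NodeGroup("master", "m1.large", ["namenode", "resourcemanager"], 1),
        NodeGroup("worker", "m1.medium", ["datanode", "nodemanager"], 3),
    ],
)
print(spec.total_nodes())       # 4
print(spec.scale("worker", 5))  # 2 (two workers added)
print(spec.total_nodes())       # 6
```

Describing the cluster as node groups keeps scaling a one-number change per group, which matches the expand/shrink operations mentioned above.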
Its application scenarios include:
1) Quickly configuring and deploying Hadoop clusters on OpenStack.
2) Leveraging the computing power of the OpenStack IaaS layer.
3) Providing analytics-as-a-service (data analytics offered as a service), similar to Amazon EMR.
The main features of Sahara include:
1) Sahara is a component of OpenStack.
2) It is managed through the OpenStack Dashboard, which calls Sahara's REST API.
3) It supports different Hadoop versions.
4) It provides configurable Hadoop configuration templates.
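The idea behind configurable templates can be sketched as "template defaults, overridden by user values". This is a minimal illustration assuming templates are nested dicts keyed by Hadoop config file; the `TEMPLATE` contents and `apply_overrides` helper are hypothetical, though the property names (`dfs.replication`, `dfs.blocksize`, `mapreduce.map.memory.mb`) are standard Hadoop settings.

```python
# Hypothetical template: default Hadoop properties grouped by config file.
TEMPLATE = {
    "hdfs-site": {"dfs.replication": 3, "dfs.blocksize": 134217728},
    "mapred-site": {"mapreduce.map.memory.mb": 1024},
}


def apply_overrides(template: dict, overrides: dict) -> dict:
    """Return a new config: template defaults, with user values taking precedence."""
    # Copy each section so the shared template is never mutated.
    merged = {section: dict(props) for section, props in template.items()}
    for section, props in overrides.items():
        merged.setdefault(section, {}).update(props)
    return merged


cfg = apply_overrides(TEMPLATE, {"hdfs-site": {"dfs.replication": 2}})
print(cfg["hdfs-site"])  # {'dfs.replication': 2, 'dfs.blocksize': 134217728}
print(TEMPLATE["hdfs-site"]["dfs.replication"])  # 3 (template unchanged)
```

Copying before merging means one template can back many clusters, each with its own overrides.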
Sahara interacts with the following OpenStack components: Horizon (GUI), Keystone (authentication), Nova (creates the virtual machines for a Hadoop cluster), Heat (Sahara can be configured to use Heat to orchestrate the resources a Hadoop cluster needs), Glance (stores Hadoop virtual machine images), Swift (can store the data processed by Hadoop jobs), Cinder (provides block storage), Neutron (provides networking), and Ceilometer (collects information from the cluster for metering and monitoring).
Main workflows:
Typical steps for quickly configuring a cluster:
1) Select the Hadoop version
2) Select an image (if the image does not have Hadoop pre-installed, Sahara can still install it through its pluggable deployment engine)
3) Set the cluster parameters: size, topology, etc.
4) Create the cluster: Sahara creates the virtual machines and installs and configures Hadoop on them.
5) Manage the cluster: add or remove nodes.
6) Delete the cluster
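The lifecycle above can be sketched as a small state machine, plus a planner for step 4 that reflects the image choice in step 2. This is a simplified illustration under assumed state and action names (`new`, `active`, `deleted`, `create`, `scale`, `delete`); Sahara's real cluster statuses are more detailed, and `next_state` and `plan_create` are hypothetical helpers.

```python
# Hypothetical, simplified lifecycle: (current state, action) -> next state.
TRANSITIONS = {
    ("new", "create"): "active",
    ("active", "scale"): "active",   # adding or removing nodes keeps it active
    ("active", "delete"): "deleted",
}


def next_state(state: str, action: str) -> str:
    """Apply one management action, rejecting ones that make no sense."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a cluster in state {state!r}") from None


def plan_create(image_has_hadoop: bool, plugin_can_install: bool) -> list[str]:
    """Steps for 'create the cluster', depending on the image chosen in step 2."""
    steps = ["boot virtual machines"]
    if not image_has_hadoop:
        if not plugin_can_install:
            raise ValueError("image lacks Hadoop and the plugin cannot install it")
        steps.append("install Hadoop via plugin")
    steps.append("configure and start Hadoop services")
    return steps


state = "new"
for action in ["create", "scale", "scale", "delete"]:
    state = next_state(state, action)
print(state)  # deleted
print(plan_create(image_has_hadoop=False, plugin_can_install=True))
# ['boot virtual machines', 'install Hadoop via plugin', 'configure and start Hadoop services']
```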
Typical analytics-as-a-service workflow:
1) Select a predefined Hadoop version
2) Edit the job
a) Select the job type: Pig, Hive, JAR file, etc.
b) Provide the location of the job's script or JAR file
c) Select the location of the input/output data
d) Select the location of the logs
3) Set the cluster size
4) Run the job
5) Get the job results
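The job-editing steps above map naturally onto a small job description with a validity check. This is a minimal sketch assuming only the three job types named in the list; `JobSpec`, its field names, the validation rules, and the example `swift://` paths are all hypothetical and do not reflect Sahara's EDP API or its actual URL format.

```python
from dataclasses import dataclass

SCRIPT_TYPES = {"Pig", "Hive"}  # jobs driven by a script (hypothetical grouping)
JAR_TYPES = {"Jar"}             # jobs packaged as a JAR file


@dataclass
class JobSpec:
    job_type: str     # step 2a: job type
    main_path: str    # step 2b: script or JAR location
    input_path: str   # step 2c: input data location
    output_path: str  # step 2c: output data location
    log_path: str     # step 2d: log location

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the job looks runnable."""
        problems = []
        if self.job_type not in SCRIPT_TYPES | JAR_TYPES:
            problems.append(f"unknown job type {self.job_type!r}")
        if self.job_type in JAR_TYPES and not self.main_path.endswith(".jar"):
            problems.append("Jar jobs need a .jar file")
        if self.input_path == self.output_path:
            problems.append("input and output must differ")  # rule of this sketch
        return problems


# Made-up locations for illustration.
job = JobSpec("Pig", "swift://demo/wordcount.pig",
              "swift://demo/in", "swift://demo/out", "swift://demo/logs")
print(job.validate())  # []
```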
Sahara System Architecture Diagram:
Sahara's architecture consists of the following components:
- Authentication component: responsible for authentication and authorization; communicates with Keystone.
- DAL (Data Access Layer): handles database access, persisting Sahara's internal models.
- Provisioning engine: communicates with Nova, Heat, Cinder, and Glance.
- Vendor plugins: a pluggable mechanism for configuring and starting Hadoop services on the provisioned virtual machines. Existing management solutions such as Apache Ambari and the Cloudera (a Hadoop data management software and service provider) Management Console can also be used for this purpose.
- EDP (Elastic Data Processing): responsible for scheduling and managing jobs on the Hadoop clusters provisioned by Sahara.
- REST API: exposes Sahara's functionality over REST.
- Sahara Python client: like other OpenStack components, Sahara has its own Python client and CLI.
- Sahara GUI: Sahara's pages are available in Horizon.
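The vendor-plugin idea can be sketched as an abstract interface plus a registry the provisioning engine looks plugins up in. This is a simplified, hypothetical contract: `VendorPlugin`, `register`, `provision`, and `ToyPlugin` are illustrative names, and Sahara's real plugin interface has more methods and different details.

```python
from abc import ABC, abstractmethod


class VendorPlugin(ABC):
    """Hypothetical, simplified plugin contract."""
    name: str

    @abstractmethod
    def versions(self) -> list[str]: ...

    @abstractmethod
    def configure_cluster(self, cluster: dict) -> None: ...

    @abstractmethod
    def start_cluster(self, cluster: dict) -> None: ...


PLUGINS: dict[str, VendorPlugin] = {}


def register(plugin: VendorPlugin) -> None:
    """Make a plugin available by name (the 'pluggable' part)."""
    PLUGINS[plugin.name] = plugin


class ToyPlugin(VendorPlugin):
    name = "toy"

    def versions(self) -> list[str]:
        return ["1.0"]

    def configure_cluster(self, cluster: dict) -> None:
        cluster["configured"] = True

    def start_cluster(self, cluster: dict) -> None:
        if not cluster.get("configured"):
            raise RuntimeError("configure before starting")
        cluster["status"] = "active"


def provision(plugin_name: str, version: str, cluster: dict) -> dict:
    """After VMs exist, delegate Hadoop setup to the chosen plugin."""
    plugin = PLUGINS[plugin_name]
    if version not in plugin.versions():
        raise ValueError(f"{plugin_name} does not support {version}")
    plugin.configure_cluster(cluster)
    plugin.start_cluster(cluster)
    return cluster


register(ToyPlugin())
print(provision("toy", "1.0", {"name": "demo"}))
# {'name': 'demo', 'configured': True, 'status': 'active'}
```

Keeping Hadoop-specific logic behind one interface lets the provisioning engine stay vendor-neutral while Ambari- or Cloudera-based plugins plug in behind it.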
Resources
http://docs.openstack.org/developer/sahara/overview.html
http://docs.openstack.org/developer/sahara/architecture.html
Basic concepts and architecture of Sahara