Basic concepts and architecture of Sahara


Statement:

Reprints of this blog are welcome, but please keep the original author information and cite the source.

Guo Deqing

Team: Huawei Hangzhou OpenStack Team


Sahara is designed to let users deploy Hadoop clusters easily by supplying a few parameters, such as the Hadoop version, the cluster topology, and the node hardware details. Once the user has provided these parameters, Sahara deploys the Hadoop cluster. It also supports scaling the cluster up and down.
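
To make these parameters concrete, here is a minimal Python sketch of what such a cluster description might contain. The class and field names (NodeGroup, ClusterSpec, flavor, processes) and all sample values are illustrative assumptions, not Sahara's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeGroup:
    """A set of identical nodes (hypothetical model, not Sahara's schema)."""
    name: str
    flavor: str   # hardware profile of each node (placeholder flavor names below)
    count: int    # number of nodes in this group
    processes: List[str] = field(default_factory=list)  # Hadoop services to run


@dataclass
class ClusterSpec:
    """Parameters a user supplies before deployment (illustrative only)."""
    name: str
    hadoop_version: str
    node_groups: List[NodeGroup]

    def total_nodes(self) -> int:
        # The cluster size is the sum of the node counts of all groups.
        return sum(group.count for group in self.node_groups)


spec = ClusterSpec(
    name="demo",
    hadoop_version="2.7.1",  # placeholder version string
    node_groups=[
        NodeGroup("master", flavor="m1.large", count=1,
                  processes=["namenode", "resourcemanager"]),
        NodeGroup("worker", flavor="m1.medium", count=3,
                  processes=["datanode", "nodemanager"]),
    ],
)
print(spec.total_nodes())
```

Grouping nodes by role keeps the topology and the hardware information in one place, so changing the cluster size only means changing a count.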

Its use cases include:

1) Quickly configuring and deploying Hadoop clusters on OpenStack.

2) Making use of the computing power of the OpenStack IaaS layer.

3) Providing analytics as a service for data analysis workloads, similar to Amazon EMR.

The main features of Sahara include:

1) It is designed as an OpenStack component.

2) It is managed through a REST API, with a user interface available in the OpenStack dashboard.

3) It supports different Hadoop versions.

4) It provides configurable Hadoop configuration templates.

Sahara interacts with the following OpenStack components: Horizon (GUI), Keystone (authentication), Nova (creating the virtual machines for a Hadoop cluster), Heat (Sahara can be configured to use Heat to orchestrate the services a Hadoop cluster requires), Glance (storing Hadoop virtual machine images), Swift (storing the data processed by Hadoop jobs), Cinder (block storage), Neutron (network services), and Ceilometer (collecting information from the cluster for metering and monitoring).



The main workflows are described below.

A typical workflow for quickly provisioning a cluster is as follows:

1) Select the Hadoop version.

2) Select an image. Images without Hadoop pre-installed are also supported, through a pluggable deployment engine.

3) Set the cluster parameters: size, topology, and so on.

4) Create the cluster: Sahara provisions the virtual machines and configures Hadoop.

5) Manage the cluster, which includes adding or removing nodes.

6) Delete the cluster.
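
The steps above can be sketched as a small in-memory model. This is only an illustration of the lifecycle: the Cluster class, its methods, and the sample values are hypothetical, and nothing here calls Sahara or provisions real virtual machines.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    DELETED = "deleted"


class Cluster:
    """In-memory stand-in for a deployed cluster (hypothetical, no real provisioning)."""

    def __init__(self, name: str, hadoop_version: str, image: str, workers: int):
        if workers < 1:
            raise ValueError("a cluster needs at least one worker node")
        self.name = name
        self.hadoop_version = hadoop_version  # step 1: chosen Hadoop version
        self.image = image                    # step 2: chosen image
        self.workers = workers                # step 3: cluster size
        self.state = State.ACTIVE             # step 4: a real system would boot VMs here

    def scale(self, delta: int) -> int:
        """Step 5: add (delta > 0) or remove (delta < 0) worker nodes."""
        if self.state is not State.ACTIVE:
            raise RuntimeError("cannot scale a deleted cluster")
        if self.workers + delta < 1:
            raise ValueError("cannot remove every worker node")
        self.workers += delta
        return self.workers

    def delete(self) -> None:
        """Step 6: release the cluster."""
        self.state = State.DELETED
        self.workers = 0


cluster = Cluster("demo", hadoop_version="2.7.1", image="hadoop-image", workers=3)
cluster.scale(+2)             # grow from 3 to 5 workers
cluster.scale(-1)             # shrink from 5 to 4 workers
remaining = cluster.workers   # 4
cluster.delete()
```

Guarding scale() against a deleted cluster and against removing the last worker is a design choice of this sketch, not a documented Sahara rule.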

A typical analytics-as-a-service workflow is as follows:

1) Select a predefined Hadoop version.

2) Configure the job:

a) Select the job type: Pig, Hive, jar file, etc.

b) Provide the location of the job script or jar file.

c) Select the locations of the input and output data.

d) Select the location for the logs.

3) Set the size of the cluster.

4) Run the job.

5) Get the job results.
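
As a rough illustration of step 2, the job settings can be collected in one structure and checked before the job runs. The JobSpec class, its field names, the set of accepted job types, and the swift:// locations are all assumptions made for this sketch; they do not reproduce Sahara's EDP API.

```python
from dataclasses import dataclass
from typing import List

# Job types named in the text; this set is illustrative, not exhaustive.
SUPPORTED_JOB_TYPES = {"pig", "hive", "jar"}


@dataclass
class JobSpec:
    job_type: str         # step 2a
    main_location: str    # step 2b: script address or jar location
    input_location: str   # step 2c
    output_location: str  # step 2c
    log_location: str     # step 2d
    cluster_size: int     # step 3

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the job is ready to run."""
        problems = []
        if self.job_type.lower() not in SUPPORTED_JOB_TYPES:
            problems.append(f"unsupported job type: {self.job_type}")
        for label in ("main_location", "input_location",
                      "output_location", "log_location"):
            if not getattr(self, label):
                problems.append(f"{label} is empty")
        if self.cluster_size < 1:
            problems.append("cluster_size must be at least 1")
        return problems


job = JobSpec(
    job_type="Pig",
    main_location="swift://demo/scripts/wordcount.pig",  # placeholder locations
    input_location="swift://demo/input",
    output_location="swift://demo/output",
    log_location="swift://demo/logs",
    cluster_size=4,
)
print(job.validate())
```

Returning every problem at once, rather than failing on the first one, lets a user fix the whole job definition in a single pass.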

Sahara System Architecture Diagram:


The Sahara architecture consists of the following modules:

    1. Authentication module: Responsible for authentication and authorization; communicates with Keystone.
    2. DAL (Data Access Layer): Handles database access.
    3. Provisioning engine: Responsible for communicating with Nova, Heat, Cinder, and Glance.
    4. Vendor plug-ins: A pluggable mechanism for configuring and starting Hadoop services on the virtual machines. Existing solutions include Apache Ambari and the Management Console from Cloudera (a Hadoop data management software and services provider).
    5. EDP (Elastic Data Processing): Responsible for scheduling and managing compute jobs on the Hadoop clusters provisioned by Sahara.
    6. REST API: Exposes Sahara functionality through a REST interface.
    7. Sahara Python client: A Python client and CLI, like those of other OpenStack components.
    8. Sahara GUI pages: The Sahara GUI is available in Horizon.
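
To illustrate how a provisioning engine and vendor plug-ins can divide the work, here is a minimal Python sketch. The VendorPlugin interface, the FakeVanillaPlugin class, the registry, and the provision function are all hypothetical; Sahara's real plug-in interface is different and considerably richer.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class VendorPlugin(ABC):
    """Hypothetical plug-in contract (not Sahara's real interface)."""

    name: str

    @abstractmethod
    def configure(self, hosts: List[str]) -> Dict[str, str]:
        """Decide which role each provisioned host plays."""

    @abstractmethod
    def start(self, roles: Dict[str, str]) -> List[str]:
        """Start the Hadoop services for each host and report what was done."""


class FakeVanillaPlugin(VendorPlugin):
    name = "fake-vanilla"

    def configure(self, hosts: List[str]) -> Dict[str, str]:
        # The first host becomes the master; the rest are workers.
        return {host: ("master" if i == 0 else "worker")
                for i, host in enumerate(hosts)}

    def start(self, roles: Dict[str, str]) -> List[str]:
        return [f"{host}: started {role} services" for host, role in roles.items()]


PLUGINS: Dict[str, VendorPlugin] = {}


def register(plugin: VendorPlugin) -> None:
    PLUGINS[plugin.name] = plugin


def provision(plugin_name: str, hosts: List[str]) -> List[str]:
    """Stand-in for the provisioning engine handing its VMs to a plug-in."""
    plugin = PLUGINS[plugin_name]
    roles = plugin.configure(hosts)
    return plugin.start(roles)


register(FakeVanillaPlugin())
result = provision("fake-vanilla", ["vm-1", "vm-2"])
print(result)
```

Looking plug-ins up by name in a registry means the engine does not need to know which vendor tooling configures Hadoop, which is the point of making this part pluggable.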

Resources

http://docs.openstack.org/developer/sahara/overview.html

http://docs.openstack.org/developer/sahara/architecture.html


