Use Windows Azure VM to install and configure CDH to build a Hadoop Cluster

Source: Internet
Author: User

This document describes how to use Windows Azure virtual machines and virtual networks to install CDH (Cloudera Distribution Including Apache Hadoop) and build a Hadoop cluster.

The project uses CDH in a private cloud to build a Hadoop cluster for big data computing. As a loyal Microsoft fan, I naturally chose to deploy CDH to Windows Azure virtual machines. Because CDH contains multiple open-source services, the virtual machines need to expose a large number of ports. Virtual machine networks in Windows Azure are securely isolated, so when multiple virtual machines are created in the Windows Azure Virtual Machines service to host a Hadoop cluster, the best solution is to create a virtual network for the cluster. Resources and services inside the virtual network can then reach one another as if they were in a virtual private cloud, while remaining isolated from resources outside the virtual network.

What is CDH?

CDH is Cloudera's distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source, and Cloudera describes it as the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls.

 

Create a virtual network in Windows Azure

 

  1. Log on to the Windows Azure Management Portal and click "New" in the lower-left corner.

  2. In the navigation pane, click "Network", then "Virtual Network", then "Custom Create".

  3. On the "virtual network details" screen, enter the virtual network configuration information and click the "Next" arrow. The configuration information entered here includes the virtual network name, the affinity group region, and the affinity group name.

     An affinity group is a way to physically group Windows Azure services in the same data center to improve performance. A virtual network can be assigned to only one affinity group.

  4. Set the DNS server and VPN connectivity. Do not skip this step; you will need to set these again after the virtual network is created.

  5. On the "address space and subnet" screen, enter the following information and click the "Next" arrow. The address space must be a private address range specified in CIDR notation: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16 (as defined by RFC 1918). In this example, the starting IP address is 192.168.0.0.

Click the checkmark icon in the lower-right corner. Windows Azure then creates your virtual network based on the submitted configuration.
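Before typing an address space into the portal, it can help to confirm that the address really falls inside one of the three RFC 1918 ranges. The following is only a minimal sketch: the function name is_rfc1918 is my own and is not part of any Azure tool, and it checks a single dotted-quad address rather than a full CIDR block.

```shell
#!/bin/sh
# Hypothetical helper (not an Azure tool): return 0 if the dotted-quad IPv4
# address in $1 lies inside one of the RFC 1918 private ranges
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), otherwise return 1.
is_rfc1918() {
    # Split the address on dots into the positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs

    # A valid address has exactly four octets.
    [ "$#" -eq 4 ] || return 1

    # Each octet must be one to three digits and no larger than 255.
    for octet in "$@"; do
        case $octet in
            ''|*[!0-9]*|????*) return 1 ;;
        esac
        [ "$octet" -le 255 ] || return 1
    done

    # 10.0.0.0/8
    if [ "$1" -eq 10 ]; then return 0; fi
    # 172.16.0.0/12 covers 172.16.x.x through 172.31.x.x
    if [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ]; then return 0; fi
    # 192.168.0.0/16
    if [ "$1" -eq 192 ] && [ "$2" -eq 168 ]; then return 0; fi
    return 1
}

# Usage:
#   is_rfc1918 192.168.0.0 && echo "private address"
```

The check works on octets directly instead of doing bit arithmetic, which keeps it portable across plain sh implementations.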

 

 

You now have a virtual network in Windows Azure, and you can view it on the "virtual network" tab of the portal. For more detailed configuration steps, refer to the official Windows Azure documentation on creating a virtual network.

 

Create a Linux VM from the Windows Azure Image Library

 

To create a Linux virtual machine, see the document "Create a virtual machine running Linux" on Windows Azure: http://www.windowsazure.cn/zh-cn/manage/linux/tutorials/virtual-machine-from-gallery/

Note that in the "virtual machine configuration" dialog box, under the "REGION/affinity group/virtual network" option, you must select the virtual network created in the previous step. In this example, that is the author's virtual network, "hadoopclusternetwork".

 

Open the following ports for each virtual machine by adding them as endpoints in the virtual machine configuration.

  • Ports to open on the virtual machines
    • 7180 (Cloudera Manager web UI)
    • 8020, 50010, 50020, 50070, 50075 (HDFS NameNode and DataNode)
    • 8021 (MapReduce JobTracker)
    • 8888 (Hue web UI)
    • 9083 (Hive/HCatalog metastore)
    • 41415 (Flume agent)
    • 11000 (Oozie server)
    • 21050 (Impala JDBC port)
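Once the endpoints are set and the services are running, the port list can be probed from another machine. This is a rough sketch, not part of the original procedure: it assumes a netcat binary named nc that supports the -z (scan only) and -w (timeout) options, which most but not all variants do, and check_ports is a name I made up. A port also shows as unreachable when the service behind it has not been installed or started yet.

```shell
#!/bin/sh
# The ports from the endpoint list, in the same order.
CDH_PORTS="7180 8020 50010 50020 50070 50075 8021 8888 9083 41415 11000 21050"

# Hypothetical helper: try a TCP connection to every port in CDH_PORTS on the
# host given as $1 and print one "<port> open" or "<port> unreachable" line
# per port. Assumes an nc that accepts -z and -w.
check_ports() {
    host=$1
    for port in $CDH_PORTS; do
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "$port open"
        else
            echo "$port unreachable"
        fi
    done
}

# Usage (replace the host name with one of your own virtual machines):
#   check_ports hcc1.cloudapp.net
```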

 

You can create multiple VMs in the same way to form a Hadoop cluster. Note that all of the VMs must use the same virtual network.

Install CDH

 

Configure the HOSTNAME of each host

vi /etc/sysconfig/network

Modify the HOSTNAME entry.

Configure /etc/hosts on each host

vi /etc/hosts

After modifying HOSTNAME and /etc/hosts, we recommend that you restart the host.
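Editing /etc/hosts by hand on every node is error-prone, so the entries can also be appended with a small script. The sketch below is only an illustration: add_hosts_entries is a hypothetical helper, and the host names and 192.168.0.x addresses in the usage comment are placeholders for whatever your own virtual network assigned.

```shell
#!/bin/sh
# Hypothetical helper: read "IP hostname" pairs from standard input and append
# them to the hosts file given as $1, skipping any host name that already
# appears as a name in that file.
add_hosts_entries() {
    hosts_file=$1
    while read -r ip name; do
        # Ignore blank or incomplete input lines.
        [ -n "$ip" ] && [ -n "$name" ] || continue

        # Exact match against every name field (field 2 onwards) of the file.
        if awk -v n="$name" \
            '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
             END { exit !found }' "$hosts_file"; then
            continue
        fi

        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    done
}

# Usage (run as root; addresses and names are placeholders):
#   printf '192.168.0.4 hcc1\n192.168.0.5 hcc2\n' | add_hosts_entries /etc/hosts
```

Because existing names are skipped, the same command can be run again on a node without creating duplicate lines.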

Disable Firewall

Run the following command as root to temporarily disable the firewall.

service iptables stop

Run the following command as root to permanently disable the firewall (that is, the firewall stays off at every boot). This setting takes effect only after a restart.

chkconfig iptables off

 

Disable SELinux

$ setenforce 0

To disable SELinux permanently, edit /etc/selinux/config and set SELINUX=disabled.
Then complete the installation.
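The permanent SELinux change can be scripted instead of made in an editor. The following is a sketch under the assumption that the config file contains a line starting with SELINUX=, as it does on a stock CentOS install; disable_selinux_config is a name I chose for illustration.

```shell
#!/bin/sh
# Hypothetical helper: set SELINUX=disabled in the config file given as $1
# (default: /etc/selinux/config). The previous content is kept in a .bak copy.
disable_selinux_config() {
    config=${1:-/etc/selinux/config}

    # Keep a backup, then rewrite the file in place from the backup so the
    # original file's permissions are preserved.
    cp "$config" "$config.bak" || return 1

    # Only the SELINUX= line is changed; SELINUXTYPE= is left untouched.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$config.bak" > "$config"
}

# Usage (run as root; the change applies from the next boot):
#   disable_selinux_config
```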

 

Change the cloudera-manager-installer.bin permissions and run the installer

$ chmod u+x cloudera-manager-installer.bin

$ ./cloudera-manager-installer.bin

Next, accept the license agreement, then press Enter and select Next to continue.

The installation interface is as follows:

 

Start the Cloudera Manager Admin Console

Through the Cloudera Manager Admin Console, you can configure, manage, and monitor Hadoop on the cluster. The web URL is http://myhost.example.com:7180, where myhost.example.com is the domain name of the host on which you ran cloudera-manager-installer.bin; an IP address can also be used. For example, my URL is http://hcc1.cloudapp.net:7180. Follow the prompts to complete the installation. A page confirming the successful installation is shown at the end.
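The Admin Console can take a while to start answering after the installer finishes. One way to avoid reloading the browser is to poll port 7180 from a script. This is a sketch only: cm_url and wait_for_cm are hypothetical helper names, the retry count and delay are arbitrary defaults, and curl is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: build the Admin Console URL for the host in $1.
cm_url() {
    printf 'http://%s:7180\n' "$1"
}

# Hypothetical helper: poll the Admin Console on host $1 until it answers.
#   $2 = number of attempts (default 30), $3 = seconds between attempts
#   (default 10). Returns 0 as soon as curl gets any HTTP response.
wait_for_cm() {
    url=$(cm_url "$1")
    tries=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: quiet, -o /dev/null: discard the page, --max-time: per-try limit
        if curl -s -o /dev/null --max-time 5 "$url"; then
            echo "Cloudera Manager is answering at $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "No answer from $url after $tries attempts" >&2
    return 1
}

# Usage:
#   wait_for_cm hcc1.cloudapp.net
```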

 
