How to use Windows Azure to build a Hadoop cluster

Source: Internet
Author: User

Our project runs a Hadoop cluster built on CDH (Cloudera Distribution Including Apache Hadoop) in a private cloud for big data computing. As a big fan of Microsoft, I naturally chose to deploy CDH on Windows Azure virtual machines. CDH bundles many open source services, so the virtual machines need to open many ports. Virtual machines in Windows Azure are securely isolated from one another. To install a Hadoop cluster on multiple virtual machines created with the Windows Azure Virtual Machines service, the best approach is therefore to create a dedicated virtual network for the cluster. Resources and services inside the virtual network can then reach each other as if they were in a private cloud, while staying isolated from resources outside the network for security.

What is CDH?

CDH is Cloudera's distribution of Apache Hadoop and related projects. It is 100% Apache-licensed open source, and Cloudera describes it as the only Hadoop solution that offers unified batch processing, interactive SQL, interactive search, and role-based access control.

Create a virtual network in Windows Azure

Log in to the Windows Azure Management Portal and click New in the lower left corner.

In the navigation pane, click Network, click Virtual Network, and then click Create Custom.

On the Virtual Network Details screen, enter the configuration for your virtual network and click the Next arrow. This includes the virtual network name, the region, and the affinity group and its name.

An affinity group is a way to physically group Windows Azure services in the same data center to improve performance. An affinity group can be assigned to only one virtual network.

On the DNS Servers and VPN Connectivity screen, skip this step for now. These settings can be configured after the virtual network has been created.

On the Address Spaces and Subnets screen, enter the address space and subnets for the virtual network. The address space must be a private address range written in CIDR notation, as specified by RFC 1918: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16. In this example, the starting IP is 192.168.0.0.
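Before typing an address space into the portal, you can check that it really lies inside one of the three RFC 1918 blocks. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `is_rfc1918_space` is hypothetical. It deliberately does not use `is_private`, because that property also accepts ranges such as 127.0.0.0/8 that are not valid choices here.

```python
import ipaddress

# The three private IPv4 blocks reserved by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918_space(cidr: str) -> bool:
    """Return True if the CIDR block lies entirely inside one RFC 1918 block."""
    # strict=True raises ValueError if host bits are set (e.g. "192.168.0.1/16").
    net = ipaddress.ip_network(cidr, strict=True)
    if net.version != 4:
        return False  # RFC 1918 only covers IPv4
    return any(net.subnet_of(block) for block in RFC1918_BLOCKS)
```

For example, `is_rfc1918_space("192.168.0.0/16")` is True, while `is_rfc1918_space("172.32.0.0/16")` is False because 172.16.0.0/12 ends at 172.31.255.255. The /16 prefix is an assumption; the article only gives the starting IP.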

Click the checkmark button in the lower right corner, and Windows Azure will create your virtual network based on the submitted configuration.

At this point, you have a virtual network in Windows Azure, which you can see on the portal's "Virtual Networks" tab. For more detailed configuration steps, refer to the official Windows Azure documentation article "Create a Virtual Network in Windows Azure".

Create a Linux virtual machine from your Windows Azure Image Gallery

For the steps to create a Linux virtual machine, refer to the Windows Azure document "Create a virtual machine running Linux": http://www.windowsazure.cn/en/manage/linux/tutorials/virtual-machine-from-gallery/

Note: in the Virtual Machine Configuration dialog box, select the virtual network created in the previous step in the "REGION / AFFINITY GROUP / VIRTUAL NETWORK" option. In this example, the author selected the virtual network "hadoopclusternetwork".

Open the following ports on the virtual machine, that is, add the following endpoints in the virtual machine configuration.

Ports to open on the virtual machines

7180 (Cloudera Manager web UI)

8020, 50010, 50020, 50070, 50075 (HDFS NameNode and DataNode)

8021 (MapReduce JobTracker)

8888 (Hue web UI)

9083 (Hive / HCatalog metastore)

41415 (Flume agent)

11000 (Oozie server)

21050 (Impala JDBC port)
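Once the CDH services are running, you may want to confirm that these endpoints actually accept connections. The following is a minimal sketch using only Python's standard library; `CDH_PORTS`, `is_port_open`, and `check_endpoints` are hypothetical helper names, and the port numbers are simply the ones listed above. A port reports as open only when its service is running and both the Azure endpoint and the guest firewall let the traffic through.

```python
import socket

# Ports listed above, grouped by service. These are the values given in this
# article; adjust them if your cluster uses different ports.
CDH_PORTS = {
    "Cloudera Manager web UI": [7180],
    "HDFS NameNode and DataNode": [8020, 50010, 50020, 50070, 50075],
    "MapReduce JobTracker": [8021],
    "Hue web UI": [8888],
    "Hive / HCatalog metastore": [9083],
    "Flume agent": [41415],
    "Oozie server": [11000],
    "Impala JDBC": [21050],
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False


def check_endpoints(host: str, timeout: float = 2.0) -> dict:
    """Map each (service, port) pair to whether it accepted a TCP connection."""
    return {
        (service, port): is_port_open(host, port, timeout)
        for service, ports in CDH_PORTS.items()
        for port in ports
    }
```

For example, `check_endpoints("hcc1.cloudapp.net")` would probe the host name the author uses for the Cloudera Manager console. A plain TCP connect is used instead of per-service protocol checks, so it only shows that something is listening, not that the service is healthy.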

Create the other virtual machines for the Hadoop cluster in the same way, making sure that all of them use the same virtual network.

Install CDH

Configure /etc/hosts on each host
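Every node needs the same name-to-address mapping for the whole cluster. The article does not list its entries, so the following minimal Python sketch only shows one way to generate consistent lines to append to /etc/hosts on each node. The function name `hosts_entries` and the example addresses and host names are hypothetical; use the internal IPs Azure assigned in your virtual network.

```python
import ipaddress
from typing import Iterable, Tuple


def hosts_entries(nodes: Iterable[Tuple[str, str]]) -> str:
    """Render one '<ip>\t<fqdn>\t<short name>' line per (ip, fqdn) pair."""
    lines = []
    for ip, fqdn in nodes:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        short_name = fqdn.split(".", 1)[0]  # "hadoop1.example.com" -> "hadoop1"
        lines.append(f"{ip}\t{fqdn}\t{short_name}")
    return "\n".join(lines) + "\n"
```

For example, `hosts_entries([("192.168.0.4", "hadoop1.example.com"), ("192.168.0.5", "hadoop2.example.com")])` yields two lines, each with the IP, the fully qualified name, and the short name. Generating the block once and copying it to every node keeps the files identical across the cluster.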

Turn off the firewall

Run the following command as root to stop the firewall temporarily:

service iptables stop

Run the following command as root to disable the firewall permanently, so that it no longer starts at boot. This setting takes effect from the next reboot.

chkconfig iptables off

Turn off SELinux

$ setenforce 0

The command above switches SELinux to permissive mode until the next reboot. To disable it permanently, edit /etc/selinux/config, set SELINUX=disabled, and reboot.
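If you prepare several nodes, the edit to /etc/selinux/config described above can be scripted. This is a minimal Python sketch; `disable_selinux` and `disable_selinux_file` are hypothetical names. It rewrites only lines that start with `SELINUX=`, so the `SELINUXTYPE=` line and commented lines are left alone, and it appends the setting if no such line exists.

```python
import re

# Matches "SELINUX=..." at the start of a line, but not "SELINUXTYPE=..."
# and not commented-out lines such as "# SELINUX= can take ...".
_SELINUX_LINE = re.compile(r"^SELINUX=.*$", re.MULTILINE)


def disable_selinux(config_text: str) -> str:
    """Return config_text with SELINUX set to disabled."""
    if _SELINUX_LINE.search(config_text):
        return _SELINUX_LINE.sub("SELINUX=disabled", config_text)
    # No SELINUX= line found: append one, keeping line endings tidy.
    separator = "" if not config_text or config_text.endswith("\n") else "\n"
    return config_text + separator + "SELINUX=disabled\n"


def disable_selinux_file(path: str = "/etc/selinux/config") -> None:
    """Apply disable_selinux to a file in place (writing the real file needs root)."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(disable_selinux(text))
```

The reboot is still required after the file is changed.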

Then proceed with the installation.

Make cloudera-manager-installer.bin executable, then run it as root:

$ chmod u+x cloudera-manager-installer.bin

$ sudo ./cloudera-manager-installer.bin
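When the same preparation is scripted across nodes, the two commands above map directly onto Python's standard library. This is a minimal sketch; `make_user_executable` and `run_installer` are hypothetical names, and it assumes the installer sits in the current directory and that `sudo` is available, since the installer must run as root.

```python
import os
import stat
import subprocess

INSTALLER = "./cloudera-manager-installer.bin"


def make_user_executable(path: str) -> None:
    """Equivalent of `chmod u+x path`: add the owner's execute bit, keep other bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IXUSR)


def run_installer(path: str = INSTALLER) -> None:
    """Make the installer executable, then run it through sudo."""
    make_user_executable(path)
    # check=True raises CalledProcessError if the installer exits non-zero.
    subprocess.run(["sudo", path], check=True)
```

The installer is interactive, so `run_installer` should be called from a terminal where you can answer its prompts.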

Next, accept the license agreements, selecting Next and pressing Enter on each screen to continue.

The installation interface is as follows:

Start the Cloudera Manager Admin console

Through the Cloudera Manager Admin Console, you can configure, manage, and monitor Hadoop on the cluster. The console is web-based, at http://myhost.example.com:7180, where myhost.example.com is the domain name of the host on which you ran cloudera-manager-installer.bin. You can also use the host's IP address. For example, the URL in the author's setup is http://hcc1.cloudapp.net:7180. Follow the prompts to complete the installation. The screen after a successful installation is shown below.
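The Cloudera Manager server can take a while to start after the installer finishes, so the console URL may not respond right away. The following minimal Python sketch builds the console URL from a host name and polls it until the server answers. `console_url` and `wait_for_console` are hypothetical names, and the retry counts and delays are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def console_url(host: str, port: int = 7180) -> str:
    """Build the Cloudera Manager Admin Console URL (7180 is the port used above)."""
    return f"http://{host}:{port}"


def wait_for_console(url: str, attempts: int = 30, delay: float = 10.0,
                     timeout: float = 5.0) -> bool:
    """Poll url until the server answers; return False if it never does."""
    # An empty ProxyHandler bypasses any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for attempt in range(attempts):
        try:
            with opener.open(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            return True  # an HTTP error status still means the server is up
        except OSError:
            pass  # connection refused, timed out, etc.; retry after a pause
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_for_console(console_url("hcc1.cloudapp.net"))` waits for the author's console to come up before you open it in a browser.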
