Using Windows Azure to build a Hadoop cluster


The project uses CDH (Cloudera's Distribution Including Apache Hadoop) to build a Hadoop cluster in a private cloud for big data computation. As a loyal Microsoft fan, I chose to deploy CDH on Windows Azure virtual machines. Because CDH bundles multiple open source services, the virtual machines need many ports opened. Virtual machines in Windows Azure are network-isolated for security, so the best approach is to create a virtual network for the Hadoop cluster and place all of the cluster's virtual machines in it. Resources and services inside a virtual network can reach each other as if they were in a virtual private cloud, and they are isolated from resources outside the virtual network, which provides security.

What is CDH?

CDH is Cloudera's distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and, according to Cloudera, is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls.

Create a virtual network in Windows Azure

Log on to the Windows Azure management portal, and in the lower-left corner, click New.

On the Virtual Network Details screen, enter the configuration information for the virtual network, and then click the next arrow. The information entered here includes the name of the virtual network, the affinity group region, and the affinity group name.

An affinity group is a way to improve performance by physically placing Windows Azure services in the same data center. An affinity group can be assigned to only one virtual network.

The next screen sets up the DNS server and VPN connectivity. This step can be skipped for now and configured after the virtual network is created, when it is needed.

On the Address Space and Subnets screen, enter the address space and subnet information, and then click the next arrow. The address space must be a private address range in CIDR notation: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16 (as specified by RFC 1918). In this example, the starting IP is 192.168.0.0.
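The private-range requirement can be checked before typing an address into the portal. The following is a minimal sketch, not part of the original procedure; the function names ip_to_int and is_rfc1918 are made up for illustration, and the arithmetic assumes a shell with 64-bit integers (the usual case on 64-bit Linux).

```shell
# Convert a dotted-quad IPv4 address (e.g. 192.168.0.0) to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if the address lies in an RFC 1918 private range.
is_rfc1918() {
  n=$(ip_to_int "$1")
  [ $(( n >> 24 )) -eq 10 ] && return 0                       # 10.0.0.0/8
  [ $(( n >> 20 )) -eq $(( (172 << 4) + 1 )) ] && return 0    # 172.16.0.0/12
  [ $(( n >> 16 )) -eq $(( (192 << 8) + 168 )) ] && return 0  # 192.168.0.0/16
  return 1
}

# The starting address used in this article.
if is_rfc1918 192.168.0.0; then
  echo "192.168.0.0 is a private (RFC 1918) address"
fi
```

Each test compares only the prefix bits of the address: the top 8 bits for 10.0.0.0/8, the top 12 for 172.16.0.0/12, and the top 16 for 192.168.0.0/16.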

Click the check mark in the lower-right corner. Windows Azure then creates your virtual network based on the submitted configuration.

At this point, you have a virtual network in Windows Azure that you can see on the portal's Virtual Network tab. For more detailed configuration options, refer to the official Windows Azure documentation on creating virtual networks.

Create a Linux virtual machine from the Windows Azure image gallery

To create a Linux virtual machine, refer to the Windows Azure documentation, "Creating a virtual machine running Linux": http://www.windowsazure.cn/zh-cn/manage/linux/tutorials/virtual-machine-from-gallery/

Note: in the Region/Affinity Group/Virtual Network option of the virtual machine configuration dialog box, select the virtual network that you created in the previous step. In this example, the author's virtual network "Hadoopclusternetwork" is selected.

Open the following ports for the virtual machine by adding them as endpoints in the virtual machine configuration.

Ports to enable for the virtual machines

7180 (Cloudera Manager web UI)

8020, 50010, 50020, 50070, 50075 (HDFS NameNode and DataNode)

8021 (MapReduce JobTracker)

8888 (Hue Web UI)

9083 (Hive/HCatalog Metastore)

41415 (Flume agent)

11000 (Oozie server)

21050 (Impala JDBC Port)
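Once the endpoints are configured, it can help to confirm from a client machine that each port is reachable. The sketch below is an illustration under assumptions, not part of the original procedure: the function names cdh_ports and check_endpoints are made up, and the probe assumes a netcat (nc) build that supports the -z and -w options.

```shell
# Ports from the list above, one "port service" pair per line.
cdh_ports() {
  cat <<'EOF'
7180 Cloudera Manager web UI
8020 HDFS NameNode/DataNode
50010 HDFS NameNode/DataNode
50020 HDFS NameNode/DataNode
50070 HDFS NameNode/DataNode
50075 HDFS NameNode/DataNode
8021 MapReduce JobTracker
8888 Hue Web UI
9083 Hive/HCatalog Metastore
41415 Flume agent
11000 Oozie server
21050 Impala JDBC port
EOF
}

# Probe every listed port on the given host and report open/closed.
# Assumption: nc supports -z (scan only) and -w (timeout in seconds).
check_endpoints() {
  host=$1
  cdh_ports | while read -r port service; do
    if nc -z -w 3 "$host" "$port" </dev/null 2>/dev/null; then
      echo "open    $port  $service"
    else
      echo "closed  $port  $service"
    fi
  done
}

# Example (not run here): check_endpoints hcc1.cloudapp.net
```

A port reported as closed may simply mean that the corresponding service is not installed or not started yet, so the check is most useful after CDH is running.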

Create the remaining virtual machines for the Hadoop cluster in the same way, making sure that all of them use the same virtual network.

Install CDH

Configure the hosts file on each host
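The article does not list the actual entries, so the following is only a sketch of what the hosts configuration might look like. The helper name hosts_entries, the host names hcc1 to hcc3, and the addresses 192.168.0.4 to 192.168.0.6 are all hypothetical; use the names and internal IPs of your own virtual machines.

```shell
# Print /etc/hosts lines for a small cluster.
# Usage: hosts_entries NETWORK FIRST_HOST_NUMBER COUNT NAME_PREFIX
# All values passed below are hypothetical examples.
hosts_entries() {
  net=$1
  first=$2
  count=$3
  prefix=$4
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$net.$((first + i)) $prefix$((i + 1))"
    i=$((i + 1))
  done
}

# Preview the entries for three hypothetical nodes.
hosts_entries 192.168.0 4 3 hcc

# To apply them, run as root on every node (not run here):
#   hosts_entries 192.168.0 4 3 hcc >> /etc/hosts
```

Generating the same lines on every node keeps the name-to-address mapping identical across the cluster.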

Turn off the firewall

As root, run the following command to turn off the firewall temporarily:

service iptables stop

As root, run the following command to keep the firewall off permanently (that is, it no longer starts at boot). The change takes effect after a reboot.

chkconfig iptables off

Disable SELinux

$ setenforce 0

The setenforce 0 command only switches SELinux to permissive mode until the next reboot. To disable SELinux permanently, edit /etc/selinux/config and set SELINUX=disabled. Then continue with the installation.
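The edit to /etc/selinux/config can also be scripted. This is a minimal sketch with a made-up function name, set_selinux_disabled; it rewrites only the SELINUX= line and leaves other settings such as SELINUXTYPE untouched. It takes the file path as an argument so that it can be tried on a copy first.

```shell
# Set SELINUX=disabled in the given config file
# (defaults to /etc/selinux/config, which requires root to modify).
set_selinux_disabled() {
  cfg=${1:-/etc/selinux/config}
  tmp=$(mktemp) || return 1
  # Rewrite into a temporary file, then copy the content back so the
  # original file keeps its ownership and permissions.
  sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
  status=$?
  rm -f "$tmp"
  return $status
}

# Example (not run here, requires root):
#   set_selinux_disabled /etc/selinux/config
```

As with a manual edit, the new setting applies from the next reboot.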

Change the permissions on cloudera-manager-installer.bin and run it

$ chmod u+x cloudera-manager-installer.bin

$ ./cloudera-manager-installer.bin

Next, accept the license agreement: press Enter, and then choose Next.

The installation interface looks like this:

Start the Cloudera Manager Admin console

The Hadoop cluster can be configured, managed, and monitored through the web-based Cloudera Manager Admin Console at http://myhost.example.com:7180, where myhost.example.com is the host name of the machine on which you ran cloudera-manager-installer.bin. An IP address works as well. For example, my URL is http://hcc1.cloudapp.net:7180. Follow the prompts until the installation reports success.
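The console may take a little while to come up after the installer finishes. The sketch below polls the URL until it answers; the function names cm_admin_url and wait_for_cm, the retry count, and the delay are illustrative assumptions, and the script assumes curl is installed.

```shell
# Build the Admin Console URL for a host (port 7180, as used in this article).
cm_admin_url() {
  echo "http://$1:7180"
}

# Poll the console until it answers or the attempts run out.
# Usage: wait_for_cm HOST [ATTEMPTS] [DELAY_SECONDS]
# Any HTTP response counts as "answering"; curl fails only on
# connection errors or timeouts.
wait_for_cm() {
  url=$(cm_admin_url "$1")
  attempts=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "Cloudera Manager is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}

# Print the URL for the author's host.
cm_admin_url hcc1.cloudapp.net

# Example (not run here): wait_for_cm hcc1.cloudapp.net
```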

Original link: http://www.cnblogs.com/xuesong/p/3604080.html
