Use Windows Azure VMs to install and configure CDH and build a Hadoop cluster
This document describes how to use Windows Azure virtual machines and virtual networks to install CDH (Cloudera Distribution Including Apache Hadoop) and build a Hadoop cluster.
The project uses CDH in a private cloud to build a Hadoop cluster for big data computing. As a loyal Microsoft fan, I naturally chose to deploy CDH on Windows Azure virtual machines. Because CDH contains multiple open-source services, the virtual machines need to expose a large number of open ports. Virtual machines in Windows Azure are securely isolated from one another at the network level, so the Hadoop cluster is installed on multiple virtual machines created in the Windows Azure Virtual Machines service. The best approach is to create a virtual network for the Hadoop cluster: resources and services inside the virtual network can reach each other as if they were in a virtual private cloud, while remaining isolated from resources outside it for security.
What is CDH?
CDH is Cloudera's distribution of Apache Hadoop and related projects. According to Cloudera, CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls.
Create a virtual network in Windows Azure
- Log on to the Windows Azure Management Portal and click "New" in the lower-left corner.
- In the navigation pane, click "Network", then "Virtual Network", and then "Custom Create".
- On the "Virtual Network Details" screen, enter the virtual network configuration information and click the "Next" arrow. The configuration information includes the virtual network name, the affinity group region, and the affinity group name.
An affinity group physically groups Windows Azure services in the same data center to improve performance. A virtual network can be assigned to only one affinity group.
- Set the DNS servers and VPN connectivity. This step can be skipped for now; you can configure these settings after the virtual network is created.
- On the "Address Space and Subnets" screen, enter the address space and subnet information and click the "Next" arrow. The address space must be a private address range specified in CIDR notation: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16 (as specified by RFC 1918). In this example, the starting IP address is 192.168.0.0.
Click the check mark in the lower-right corner. Windows Azure then creates your virtual network based on the submitted configuration.
You now have a virtual network in Windows Azure, which you can view on the "Virtual Network" tab of the portal. For more detailed configuration steps, refer to the official Windows Azure documentation on creating a virtual network.
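As a quick sanity check on the address space chosen above, the following sketch tests whether an IPv4 address falls inside one of the three RFC 1918 ranges. The function name is_rfc1918 is hypothetical, and the sketch assumes its argument is a well-formed dotted-quad address.

```shell
# Succeed (exit status 0) if the IPv4 address lies in one of the RFC 1918
# private ranges: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
# Assumption: $1 is a well-formed dotted-quad address such as 192.168.0.0.
is_rfc1918() {
    # Extract the first two octets of the address.
    first=${1%%.*}
    rest=${1#*.}
    second=${rest%%.*}
    case "$first" in
        10)  return 0 ;;
        172) [ "$second" -ge 16 ] && [ "$second" -le 31 ] ;;
        192) [ "$second" -eq 168 ] ;;
        *)   return 1 ;;
    esac
}

# Example: the starting address used in this article.
if is_rfc1918 192.168.0.0; then
    echo "192.168.0.0 is a private address"
fi
```

Only the first two octets matter here because all three private ranges are decided by them.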
Create a Linux VM from the Windows Azure Image Library
To create a Linux virtual machine, see the document "Create a Virtual Machine Running Linux" on Windows Azure: http://www.windowsazure.cn/zh-cn/manage/linux/tutorials/virtual-machine-from-gallery/
Note that in the "Virtual machine configuration" dialog box, you must select the virtual network created in the previous step under the "Region/Affinity Group/Virtual Network" option. In this example, that is the virtual network "hadoopclusternetwork" created by the author.
Open the following ports for the virtual machine by setting the following endpoints in the virtual machine configuration:
- 7180 (Cloudera Manager web UI)
- 8020, 50010, 50020, 50070, 50075 (HDFS NameNode and DataNode)
- 8021 (MapReduce JobTracker)
- 8888 (Hue web UI)
- 9083 (Hive/HCatalog metastore)
- 41415 (Flume agent)
- 11000 (Oozie server)
- 21050 (Impala JDBC port)
You can create multiple VMs in the same way to form a Hadoop cluster. Make sure that all of the VMs use the same virtual network.
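Once the endpoints are in place, a rough way to confirm them is to probe each port from a machine that should be able to reach the VM. The sketch below is only an illustration: the function name check_cdh_ports is hypothetical, and it assumes a netcat (nc) build that supports the -z and -w options. A port also reports "closed" until the corresponding service is actually installed and running.

```shell
# Ports from the endpoint list above, space-separated.
CDH_PORTS="7180 8020 50010 50020 50070 50075 8021 8888 9083 41415 11000 21050"

# Print "<port> open" or "<port> closed" for each port on the given host.
# Assumption: nc is installed and supports -z (scan only) and -w (timeout).
check_cdh_ports() {
    host=$1
    for port in $CDH_PORTS; do
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
}

# Example (host name taken from later in this article):
# check_cdh_ports hcc1.cloudapp.net
```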
Install CDH
Configure the HOSTNAME of each host
vi /etc/sysconfig/network
Modify the HOSTNAME entry.
Configure /etc/hosts on each host
vi /etc/hosts
After modifying the HOSTNAME and hosts entries, we recommend that you restart the host.
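Editing /etc/hosts by hand on every node is easy to get wrong, so the entries can also be appended with a small helper. The following is a minimal sketch: the function name add_host_entry is hypothetical, and the IP addresses and the host name hcc2 in the usage comment are placeholders rather than values from this cluster.

```shell
# Append "IP NAME" to a hosts file unless NAME is already listed there.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # awk exits 0 only if some line already carries NAME as a host field.
    if ! awk -v n="$name" '
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example, run as root on each host (addresses are placeholders):
# add_host_entry /etc/hosts 192.168.0.4 hcc1
# add_host_entry /etc/hosts 192.168.0.5 hcc2
```

Because the helper skips names that are already present, it can be run repeatedly on the same host without producing duplicate lines.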
Disable Firewall
Run the following command as root to temporarily disable the firewall:
service iptables stop
Run the following command as root to permanently disable the firewall, so that it no longer starts at boot. This setting takes effect after a restart.
chkconfig iptables off
Disable SELinux
$ setenforce 0
To disable SELinux permanently, edit /etc/selinux/config and set SELINUX=disabled.
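The permanent SELinux change can also be scripted instead of made in an editor. This is a minimal sketch, assuming a sed that accepts the -i.bak form (GNU and BSD sed both do); the function name disable_selinux_config is hypothetical.

```shell
# Set SELINUX=disabled in an SELinux config file.
# The path defaults to /etc/selinux/config; a .bak copy of the file is kept.
disable_selinux_config() {
    config=${1:-/etc/selinux/config}
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comments are untouched.
    sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}

# Example, run as root:
# disable_selinux_config
```

As with a manual edit, the change takes effect only after the host is restarted.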
With the firewall and SELinux disabled, proceed with the installation.
Change cloudera-manager-installer.bin permissions
$ chmod u+x cloudera-manager-installer.bin
$ ./cloudera-manager-installer.bin
Next, accept the license agreement, then press Enter and select Next to continue.
The installation interface is as follows:
Start the Cloudera Manager Admin Console
Through the Cloudera Manager Admin Console, you can configure, manage, and monitor Hadoop on the cluster. The web URL is http://myhost.example.com:7180, where myhost.example.com is the domain name of the host on which you ran cloudera-manager-installer.bin; an IP address can also be used. For example, my web URL is http://hcc1.cloudapp.net:7180. Follow the prompts to complete the installation. When the installation succeeds, a success page is displayed.
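The Admin Console can take a little while to start answering after the installer finishes. The sketch below polls port 7180 until something responds over HTTP. It assumes curl is installed; the function name wait_for_cm, the retry count, and the 10-second pause are illustrative choices rather than values from Cloudera.

```shell
# Poll http://HOST:7180/ until it answers or the attempts run out.
# Usage: wait_for_cm HOST [ATTEMPTS]   (ATTEMPTS defaults to 30)
wait_for_cm() {
    host=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # curl exits 0 once any HTTP response is received from the server.
        if curl -s -o /dev/null --max-time 5 "http://$host:7180/"; then
            echo "Cloudera Manager is answering on $host:7180"
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 10
        fi
    done
    echo "No answer from $host:7180" >&2
    return 1
}

# Example:
# wait_for_cm hcc1.cloudapp.net
```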