Our private-cloud projects use a CDH (Cloudera Distribution Including Apache Hadoop) cluster for big data computing. As a big fan of Microsoft, I naturally chose to deploy CDH on Windows Azure virtual machines. Because CDH bundles many open source services, the virtual machines need to open many ports. Virtual machines in Windows Azure are securely isolated from one another, so when you create several virtual machines to form a Hadoop cluster, the best approach is to create a virtual network for the cluster. Resources and services inside the virtual network are then accessed as if they were in a virtual private cloud, and they are isolated from resources outside the virtual network for security.
What is CDH?
CDH is Cloudera's distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and, according to Cloudera, is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls.
Create a virtual network in Windows Azure
Log in to the Windows Azure Management Portal and click New in the lower left corner.
In the navigation pane, click Network, click Virtual Network, and then click Create Custom.
On the Virtual Network Details screen, enter the configuration information for your virtual network and click the Next arrow. The configuration information entered here includes the name of the virtual network, the region, and the affinity group.
An affinity group is a way to physically group Windows Azure services in the same data center to improve performance. An affinity group can be assigned to only one virtual network.
Set DNS Server and VPN Connectivity. This step can be skipped for now; these parameters can be set after the virtual network is created.
On the Address Spaces and Subnets screen, enter the following information and click the Next arrow. The address space must be a private address range in CIDR notation: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16, as specified by RFC 1918. In this example, the starting IP 192.168.0.0 is selected.
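As a quick sanity check before entering an address space in the portal, the RFC 1918 rule can be sketched in shell. The function name is_rfc1918 is hypothetical, and the check only looks at the first two octets of a dotted-quad address; it does not validate the CIDR prefix length.

```shell
# Hypothetical helper: succeed if the IPv4 address in $1 starts inside
# 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16 (RFC 1918).
is_rfc1918() {
  first=${1%%.*}       # first octet
  rest=${1#*.}
  second=${rest%%.*}   # second octet
  case "$first" in
    10)  return 0 ;;
    172) [ "$second" -ge 16 ] && [ "$second" -le 31 ] ;;
    192) [ "$second" -eq 168 ] ;;
    *)   return 1 ;;
  esac
}

if is_rfc1918 192.168.0.0; then
  echo "192.168.0.0 is a private address"
fi
```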
Click the checkmark button in the lower right corner, and Windows Azure will create your virtual network based on the submitted configuration.
At this point, you have a virtual network in Windows Azure that you can see on the portal's "Virtual Networks" tab. For more detailed configuration methods, refer to the official Windows Azure documentation on creating a virtual network in Windows Azure.
Create a Linux virtual machine from your Windows Azure Image Gallery
For the steps to create a Linux virtual machine, refer to the Windows Azure document "Create a virtual machine running Linux": http://www.windowsazure.cn/en/manage/linux/tutorials/virtual-machine-from-gallery/
Note that in the Virtual Machine Configuration dialog box, you must select the virtual network created in the previous step under the "REGION / AFFINITY GROUP / VIRTUAL NETWORK" option. In this example, the virtual network I created, "hadoopclusternetwork", is selected.
Open the following ports for the virtual machine, that is, set the following endpoints in the virtual machine configuration.
Ports to enable on the virtual machines
7180 (Cloudera Manager web UI)
8020, 50010, 50020, 50070, 50075 (HDFS NameNode and DataNode)
8021 (MapReduce JobTracker)
8888 (Hue web UI)
9083 (Hive / HCatalog metastore)
41415 (Flume agent)
11000 (Oozie server)
21050 (Impala JDBC port)
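Once the endpoints are set and the services are running, one way to confirm that the ports are reachable is to probe them from another machine. The following is a minimal sketch, assuming the nc (netcat) utility is installed and supports the -z and -w options; the function name check_cdh_ports is hypothetical, and the port list is copied from the list above. A port is also reported as closed when the corresponding service is simply not running yet.

```shell
# Ports from the endpoint list above.
CDH_PORTS="7180 8020 50010 50020 50070 50075 8021 8888 9083 41415 11000 21050"

# Hypothetical helper: probe every listed port on the host given in $1.
# Assumes nc (netcat) is installed; -z only tests the connection,
# -w 3 gives up after 3 seconds.
check_cdh_ports() {
  for port in $CDH_PORTS; do
    if nc -z -w 3 "$1" "$port" >/dev/null 2>&1; then
      echo "$port open"
    else
      echo "$port closed or filtered"
    fi
  done
}

# Example (not run here): check_cdh_ports hcc1.cloudapp.net
```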
Create the other virtual machines of the Hadoop cluster in the same way. Note that all of the virtual machines must use the same virtual network.
Install CDH
Configure /etc/hosts on each host
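Each host needs to resolve the names of the other cluster members. As an illustration, the entries might look like the output of the sketch below. The IP addresses and the host names hcc2 and hcc3 are hypothetical (only hcc1 appears in this article), so substitute the internal addresses that Windows Azure assigned to your virtual machines.

```shell
# Hypothetical helper: print /etc/hosts entries for a three-node cluster.
# The addresses and names are placeholders, not values from a real deployment.
hosts_entries() {
  cat <<'EOF'
192.168.0.4 hcc1
192.168.0.5 hcc2
192.168.0.6 hcc3
EOF
}

hosts_entries
# To append them for real, run as root: hosts_entries >> /etc/hosts
```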
Turn off the firewall
Run the following command as root to stop the firewall temporarily
service iptables stop
Run the following command as root to disable the firewall permanently (that is, it stays off every time the machine boots). This setting takes effect after a restart.
chkconfig iptables off
Turn off SELinux
$ setenforce 0
To disable SELinux permanently, edit /etc/selinux/config and set SELINUX=disabled, then complete the installation.
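The permanent SELinux change can also be scripted. This is a sketch only: the function name set_selinux_disabled is hypothetical, it rewrites whichever file you pass to it, and it assumes the file contains a line starting with SELINUX=. Try it on a copy before touching /etc/selinux/config.

```shell
# Hypothetical helper: set SELINUX=disabled in the config file given as $1.
set_selinux_disabled() {
  tmp="$1.tmp"
  # Replace only the SELINUX= line; SELINUXTYPE= and comments are left alone.
  # Writing back with cat keeps the original file's owner and permissions.
  sed 's/^SELINUX=.*/SELINUX=disabled/' "$1" > "$tmp" &&
    cat "$tmp" > "$1" &&
    rm -f "$tmp"
}

# Demonstrate on a scratch file rather than the real configuration.
demo=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
set_selinux_disabled "$demo"
cat "$demo"
# As root, the real call would be: set_selinux_disabled /etc/selinux/config
```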
Make cloudera-manager-installer.bin executable, then run it
$ chmod u+x cloudera-manager-installer.bin
$ ./cloudera-manager-installer.bin
Next, accept the license agreement and press Enter on Next to continue.
The installation interface is as follows:
Start the Cloudera Manager Admin console
Through the Cloudera Manager Admin console, you can configure, manage, and monitor Hadoop on the cluster. The console URL is http://myhost.example.com:7180, where myhost.example.com is the domain name of the host on which you ran cloudera-manager-installer.bin; an IP address also works. For example, my console URL is http://hcc1.cloudapp.net:7180. Follow the prompts to complete the installation; the successful installation screen is shown below.
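The Cloudera Manager server can take a little while to start after the installer finishes. A minimal sketch for waiting until the console answers, assuming curl is installed; the function names, the retry count, and the delay are hypothetical choices, not part of Cloudera Manager.

```shell
# Build the Admin console URL for the host given in $1 (port 7180, as above).
cm_console_url() {
  echo "http://$1:7180"
}

# Hypothetical helper: poll until the console answers or the attempts run out.
# Assumes curl is installed; 30 attempts and a 10 second delay are arbitrary.
wait_for_console() {
  url=$(cm_console_url "$1")
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    # curl exits 0 as soon as any HTTP response comes back.
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "console is up at $url"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 10
  done
  echo "console did not respond at $url" >&2
  return 1
}

cm_console_url myhost.example.com
# Example (not run here): wait_for_console hcc1.cloudapp.net
```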