Build a 5-node Hadoop cluster environment (CDH5)

Source: Internet
Author: User
Tags: zookeeper, ssh, centos, ssh server, hadoop ecosystem

Tip: If you are not familiar with Hadoop, you can first read this article on the Hadoop ecosystem, which gives an overview of the usage scenarios of Hadoop and of the tools in its ecosystem.

The detailed steps below build a distributed Hadoop cluster environment using CDH5.
First, hardware preparation

Basic configuration:

Operating system: 64-bit
CPU: Intel(R) Core i3 processor
Memory: 8.00 GB
Free hard disk space: 50 GB

Recommended configuration (for smooth operation):
Operating system: 64-bit
CPU: Intel(R) Core i5 processor or better
Memory: 16.00 GB
Free hard disk space: 100 GB


Note: The figures above assume the whole cluster runs on a single PC, which is why the memory requirements are high. If you spread the cluster across several PCs, each PC only needs enough memory for the virtual machines it hosts.


Second, software environment preparation

Virtual machine: VMware
Operating system: CentOS 6.5
JDK: jdk-7u79-linux-x64.tar.gz
Remote connection: Xshell





Hadoop ecosystem

hadoop-2.6.0-cdh5.4.5.tar.gz

hbase-1.0.0-cdh5.4.4.tar.gz

hive-1.1.0-cdh5.4.5.tar.gz

flume-ng-1.5.0-cdh5.4.5.tar.gz

sqoop-1.4.5-cdh5.4.5.tar.gz

zookeeper-3.4.5-cdh5.4.5.tar.gz


This article builds a CDH5 cluster environment; all of the software above can be downloaded from this website.


Third, host planning

Because we are installing a 5-node cluster, we first assign each host an IP address and its roles:

 

                 CDHNode1        CDHNode2        CDHNode3        CDHNode4        CDHNode5
                 192.168.3.188   192.168.3.189   192.168.3.190   192.168.3.191   192.168.3.192
NameNode         yes             yes             no              no              no
DataNode         no              no              yes             yes             yes
ResourceManager  yes             yes             no              no              no
JournalNode      yes             yes             yes             yes             yes
ZooKeeper        yes             yes             yes             no              no

Note: The number of JournalNodes and of ZooKeeper nodes should be odd and no fewer than 3. The reasons are explained in detail later.
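Both ZooKeeper and the HDFS JournalNodes keep working only while a strict majority of their members is alive. The minimal Python sketch below encodes the role table above as a dictionary (ROLE_PLAN and the function names are illustrative, not part of any Hadoop tool) and computes how many failures each ensemble size tolerates. It shows why an even count buys nothing: 4 nodes survive the same single failure as 3.

```python
# Illustrative encoding of the host planning table: role -> hosts running it.
ROLE_PLAN = {
    "NameNode": ["CDHNode1", "CDHNode2"],
    "DataNode": ["CDHNode3", "CDHNode4", "CDHNode5"],
    "ResourceManager": ["CDHNode1", "CDHNode2"],
    "JournalNode": ["CDHNode1", "CDHNode2", "CDHNode3", "CDHNode4", "CDHNode5"],
    "ZooKeeper": ["CDHNode1", "CDHNode2", "CDHNode3"],
}


def majority(n):
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many members may fail while a majority is still alive."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} nodes: majority {majority(n)}, survives {tolerated_failures(n)} failure(s)")
    for role in ("JournalNode", "ZooKeeper"):
        count = len(ROLE_PLAN[role])
        print(f"{role}: {count} nodes, survives {tolerated_failures(count)} failure(s)")
```

With this plan the 5 JournalNodes survive 2 failures and the 3 ZooKeeper nodes survive 1; adding a fourth ZooKeeper node would raise the majority to 3 without raising the number of tolerated failures.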


I install the CentOS virtual machines across two PCs, allocated as follows:

     CDHNode1  CDHNode2  CDHNode3  CDHNode4  CDHNode5
PC1  yes       no        yes       no        no
PC2  no        yes       no        yes       yes

The reason for this allocation is HA (high availability): the two NameNodes run on different PCs, so if one PC fails and its NameNode stops working, the standby NameNode switches to the active state, and the cluster as a whole keeps running.
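The HA argument above can be checked mechanically: whichever single PC fails, at least one NameNode must remain. The placement below is assumed from the allocation table (CDHNode1 and CDHNode3 on PC1, the other three on PC2), and the names are illustrative. The sketch only checks placement; it says nothing about how the standby NameNode actually takes over.

```python
# Assumed placement from the allocation table: PC -> virtual machines on it.
PLACEMENT = {
    "PC1": ["CDHNode1", "CDHNode3"],
    "PC2": ["CDHNode2", "CDHNode4", "CDHNode5"],
}
NAMENODES = ["CDHNode1", "CDHNode2"]


def surviving_namenodes(failed_pc):
    """NameNodes still running after every VM on failed_pc goes down."""
    lost = set(PLACEMENT.get(failed_pc, []))
    return [nn for nn in NAMENODES if nn not in lost]


def survives_any_single_pc_failure():
    """True if at least one NameNode remains whichever single PC fails."""
    return all(surviving_namenodes(pc) for pc in PLACEMENT)


if __name__ == "__main__":
    for pc in PLACEMENT:
        print(f"{pc} fails -> remaining NameNodes: {surviving_namenodes(pc)}")
```

If both NameNodes were placed on the same PC, survives_any_single_pc_failure() would return False, which is exactly the situation the allocation avoids.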


Fourth, detailed installation steps

We first install CentOS 6.5 on one host (CDHNode1/192.168.3.188). Then, as the root user, we configure the network, create the hadoop user, turn off the firewall, and install some prerequisite software, in preparation for the cluster software installation that follows.
CentOS6.5 Installation

On the host CDHNode1/192.168.3.188, install the CentOS 6.5 operating system. For the detailed installation steps, see the CentOS installation article; they are not repeated here.

Network Configuration

1. Open the installed CentOS virtual machine CDHNode1


2. Log in to the CentOS system

3. First enter the ifconfig command to view the IP address.


4. At this point we find that, apart from the loopback address, the virtual machine cannot communicate with the outside world. We can test this with the ping command.

Note: On Linux, ping keeps sending ICMP messages (for example, when you ping 127.0.0.1) until you press Ctrl+C to stop it.

First, ping Baidu: the ping fails, which shows that the virtual machine cannot reach the external network.

Second, ping the virtual machine's NAT gateway: the ping fails.

Note: To find the virtual machine's gateway:


In VMware, open the Virtual Network Editor and click VMnet8.

Click NAT Settings.


Third, ping the physical machine's IP address: the ping fails.

Note: To view the physical machine's IP address, open cmd.exe and enter ipconfig.



Fourth, ping the virtual machine's loopback address: the ping succeeds, which shows that the virtual machine's network protocol stack is working correctly.

5. Modify the network card configuration file, /etc/sysconfig/network-scripts/ifcfg-eth0.

You can see that the virtual machine's NIC is not enabled at boot, so set ONBOOT=yes, then save and exit (press Esc, then enter :wq).

6. Restart the Network Service



7. Enter the ifconfig command again to view the IP address.

Note: My virtual machine uses bridged mode, so its IP address is in the 192.168.2.x or 192.168.3.x network segment. Bridged mode uses the physical network card directly; my physical host's gateway is 192.168.0.1 and its subnet mask is 255.255.252.0, so my virtual machine can use any address from 192.168.0.2 to 192.168.3.254, except the physical host's own IP and any other address already in use. If your virtual machine uses NAT mode, its addresses come from VMware's NAT network instead. For example, in my virtual machine the NAT gateway is 192.168.117.2 and the subnet mask is 255.255.255.0, so the virtual machine can use any unused address from 192.168.117.3 to 192.168.117.254.
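The address ranges in the note above follow directly from the gateway and the subnet mask. A small Python sketch using the standard ipaddress module can compute the assignable range and check a planned static IP before it goes into the configuration file. The function names are illustrative, and the check cannot know which addresses other machines are already using.

```python
import ipaddress


def usable_range(gateway, netmask):
    """First and last assignable host addresses of the gateway's subnet."""
    net = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    return net.network_address + 1, net.broadcast_address - 1


def is_valid_static_ip(ip, gateway, netmask):
    """True if ip is in the gateway's subnet and is not the network,
    broadcast, or gateway address."""
    net = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    addr = ipaddress.ip_address(ip)
    return (
        addr in net
        and addr not in (net.network_address, net.broadcast_address)
        and addr != ipaddress.ip_address(gateway)
    )


if __name__ == "__main__":
    print(usable_range("192.168.0.1", "255.255.252.0"))
    print(is_valid_static_ip("192.168.3.188", "192.168.0.1", "255.255.252.0"))
```

For the bridged example, usable_range returns 192.168.0.1 and 192.168.3.254 (the gateway is the first of these, so virtual machines start at .2). The broadcast address 192.168.3.255 is rejected, which is why the upper bound is .254 rather than .255.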


At this point the NIC has been enabled successfully.

8. Ping the addresses and the domain name from step 4 again to check the result:

Check the local network protocol stack


Check the NIC link


Check the NAT gateway


Check access to the external network
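The checks in steps 4 and 8 go from the inside out, and the first one that fails shows where to look. Below is a minimal Python sketch of that ladder. It assumes the Linux iputils ping, where -c sets the number of echo requests (so ping exits by itself and Ctrl+C is not needed) and -W sets the reply timeout in seconds. The targets are only examples taken from this article; substitute your own.

```python
import subprocess

# Example targets from steps 4 and 8, ordered from the inside out.
EXAMPLE_CHECKS = [
    ("local protocol stack", "127.0.0.1"),
    ("NAT gateway", "192.168.117.2"),
    ("external network", "www.baidu.com"),
]


def ping_ok(host, count=1, timeout_s=2):
    """Return True if iputils ping gets a reply from host."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:  # ping is not installed
        return False
    return result.returncode == 0


def first_failure(checks, probe=ping_ok):
    """Run (label, target) checks in order; return the first failing label, or None."""
    for label, target in checks:
        if not probe(target):
            return label
    return None
```

On the virtual machine in step 4, first_failure(EXAMPLE_CHECKS) would stop at "NAT gateway" even though loopback answers. After the NIC is enabled, it should return None.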

At this point the virtual machine is connected to the Internet. However, its IP address is assigned dynamically by DHCP (Dynamic Host Configuration Protocol) and may change, which is inconvenient for building the Hadoop cluster later. We therefore need to configure a static IP address, as described below.

9. The ifconfig command shows that the dynamic IP address is 192.168.3.188, so we use this IP as the static IP address of CDHNode1. Note: You can use your own dynamic IP as the static IP of the current host and assign the following addresses to the other hosts in sequence, for example 192.168.3.189. Because the address DHCP assigns varies, adapt these values to your own network.

10. Modify the NIC configuration: change BOOTPROTO=dhcp to BOOTPROTO=static, and add the IP address, subnet mask, and gateway.

Note: Because I am configuring the cluster across two PCs, I use bridged mode. If you are working on a single host, NAT (network address translation) mode is recommended. In NAT mode, VMware on each computer creates its own virtual network segment with its own gateway, so virtual machines on different PCs cannot easily reach each other, and connecting to them with Xshell is also inconvenient.

For bridged mode, set IPADDR to the chosen IP address, and set NETMASK (subnet mask) and GATEWAY (gateway) to the same values as the physical host's. Note: For how to view the physical host's IP configuration, see step 4 above.

For NAT mode, set IPADDR to the chosen IP address, and set NETMASK and GATEWAY to the values shown in VMware's NAT settings. Note: For how to view the NAT settings, see step 4 above.

As shown above, in NAT mode the gateway is 192.168.117.2 and the subnet mask is 255.255.255.0.

So the configuration can be:

BOOTPROTO=static

IPADDR=192.168.117.40

NETMASK=255.255.255.0

GATEWAY=192.168.117.2
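Steps 5 and 10 both edit KEY=value lines in ifcfg-eth0. If you prefer to script the edit rather than use vi, a minimal Python sketch like the one below can set the keys. set_ifcfg_values is a hypothetical helper, not part of CentOS, and SAMPLE_IFCFG is illustrative content. Keys are matched case-insensitively, so a mistyped Bootproto line is rewritten as BOOTPROTO, and commented-out lines such as #HWADDR=... are left alone.

```python
import re

# Illustrative ifcfg-eth0 content as it might look after a DHCP install.
SAMPLE_IFCFG = """DEVICE=eth0
#HWADDR=00:0C:29:AA:BB:CC
TYPE=Ethernet
ONBOOT=no
Bootproto=dhcp
"""

# Static NAT-mode values from the configuration above.
STATIC_NAT = {
    "ONBOOT": "yes",
    "BOOTPROTO": "static",
    "IPADDR": "192.168.117.40",
    "NETMASK": "255.255.255.0",
    "GATEWAY": "192.168.117.2",
}

KEY_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def set_ifcfg_values(text, updates):
    """Rewrite existing KEY= lines in place (keys in updates are upper case);
    keys not found are appended. Lines starting with '#' never match."""
    pending = dict(updates)
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = KEY_RE.match(line)
        key = match.group(1).upper() if match else None
        if key in pending:
            lines[i] = f"{key}={pending.pop(key)}"
    lines.extend(f"{key}={value}" for key, value in pending.items())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(set_ifcfg_values(SAMPLE_IFCFG, STATIC_NAT), end="")
```

To apply it for real, read /etc/sysconfig/network-scripts/ifcfg-eth0 as root, pass its contents through set_ifcfg_values, write the result back, and then restart the network service.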

Finally press Esc and enter :wq to save and exit. (Note: to edit, press i or a to enter insert mode; see the vi command reference for details.)

11. Restart the Network Service


This completes the network configuration.

Download prerequisite software

Note: 1. The software is installed on the CDHNode1 node with the yum command. The -y option automatically answers yes to the prompts during the download; if you are interested, try the command without it. install downloads the package from the Internet and installs it.

2. Software must be installed with yum as the root user.

1. Install lrzsz, which makes it easy to transfer files in Xshell: the rz command uploads files, and the sz command downloads files from the remote host to the local machine.

2. Install the SSH server.

3. Install the SSH client.
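The exact commands for these three installs were in screenshots that are missing here. As a sketch, assuming the standard CentOS 6 package names lrzsz, openssh-server and openssh-clients are meant, all three come down to one yum -y install call. The Python wrapper below builds that command and runs it only when started as root, as note 2 requires; the function names are illustrative.

```python
import os
import subprocess

# Assumed CentOS 6 package names for the three items above.
PACKAGES = ["lrzsz", "openssh-server", "openssh-clients"]


def yum_install_command(packages):
    """Build the yum command line; -y answers yes to every prompt."""
    return ["yum", "-y", "install", *packages]


def install(packages=PACKAGES):
    """Install the packages, refusing early unless running as root."""
    if os.geteuid() != 0:
        raise PermissionError("yum install must be run as root")
    subprocess.run(yum_install_command(packages), check=True)
```

Typed directly into the shell as root, the equivalent is yum -y install lrzsz openssh-server openssh-clients.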


Create a user account

1. Use the useradd command to add the user hadoop and create its home directory at the same time. To see the parameters of useradd, run useradd -h.

2. Switch to the /home directory to check whether the home directory was created successfully.


3. Set a password for the hadoop user with the passwd command; this prepares for connecting to the CDHNode1 node remotely with Xshell later. A success message means the password was set. Note: setting another user's password requires the root user.

4. Switch to the hadoop user with the su command. You can see that the prompt has changed from root@cdhnode1 to hadoop@cdhnode1.

5. Leave the hadoop user session with the exit command.
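Steps 1 and 2 can also be verified from a script. The sketch below uses Python's standard pwd module to look the account up in the password database and then checks that its home directory exists on disk. It assumes the account was created beforehand as root, for example with useradd -m hadoop followed by passwd hadoop; the helper names are illustrative.

```python
import os
import pwd


def home_of(user):
    """Home directory recorded in /etc/passwd for user, or None if there is no such user."""
    try:
        return pwd.getpwnam(user).pw_dir
    except KeyError:
        return None


def check_user(user="hadoop"):
    """Mirror steps 1 and 2: the account exists and its home directory is on disk."""
    home = home_of(user)
    return home is not None and os.path.isdir(home)
```

On CDHNode1, check_user() should return True once the steps above are done; for an account that does not exist, it returns False.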

Cloning a virtual machine

Because our CentOS virtual machines were created in VMware, we can clone them directly instead of installing each one, which saves installation time.

If you are configuring the cluster on a single PC, follow the steps below to clone four virtual machines in turn: CDHNode2, CDHNode3, CDHNode4, and CDHNode5. Because I configured the cluster on two PCs, I installed CDHNode1 on the first PC, installed CDHNode2 separately on the other PC, and cloned CDHNode4 and CDHNode5 from CDHNode2.

Below, I use cloning the CDHNode5 virtual machine from CDHNode2 as an example to demonstrate the cloning steps.

1. Right-click the CDHNode2 virtual machine and choose Snapshot > Take Snapshot.

2. Click Take Snapshot; the snapshot is created.

3. Then right-click the CDHNode2 virtual machine and choose Manage > Clone.

4. Click Next.

5. Select the existing snapshot and click Next.

6. Choose Create a full clone and click Next.

7. Enter the name of the virtual machine, click Finish, and wait for the clone to complete.

8. We have completed the task of cloning a virtual machine.


9. Next, modify the NIC configuration of the cloned virtual machine. We use CDHNode5 as the example; configure the other nodes the same way.

First start CDHNode5. Its hostname is still CDHNode2, because CDHNode5 was cloned from CDHNode2.

10. Leave the hostname unchanged for now and check the network first: at this point no network card is shown.

11. After cloning, the network card has become eth1. To change it back to eth0, edit the configuration file /etc/udev/rules.d/70-persistent-net.rules.



In vi, first enter :set number to show line numbers. We need to modify lines 8 and 11: press i or a to enter insert mode, comment out line 8 with #, and change eth1 to eth0 on line 11. Note down the MAC (hardware) address of this second network card.
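Editing by line number depends on the exact layout of the file. A Python sketch of the same edit, which finds the rules by their NAME= value instead, is shown below. It assumes the usual one-rule-per-line format of 70-persistent-net.rules; SAMPLE_RULES is made-up content with illustrative MAC addresses, and the function name is hypothetical. It also returns the MAC address of the renamed rule, the address you are asked to note down.

```python
import re

# Made-up rules file as it might look right after a VMware clone:
# the first rule belongs to the source VM's card, the second to the new one.
SAMPLE_RULES = """# PCI device 0x8086:0x100f (e1000)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:aa:aa:aa", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

# PCI device 0x8086:0x100f (e1000)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:bb:bb:bb", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
"""

NAME_RE = re.compile(r'NAME="(eth\d+)"')
MAC_RE = re.compile(r'ATTR\{address\}=="([0-9A-Fa-f:]{17})"')


def fix_persistent_net_rules(text, target="eth0", clone_name="eth1"):
    """Comment out the old rule that claims `target` and rename the
    `clone_name` rule to `target`. Returns (new_text, mac_of_renamed_rule)."""
    out, mac = [], None
    for line in text.splitlines():
        match = NAME_RE.search(line)
        if match and not line.lstrip().startswith("#"):
            if match.group(1) == target:
                line = "# " + line  # stale rule from the source VM's NIC
            elif match.group(1) == clone_name:
                line = line.replace(f'NAME="{clone_name}"', f'NAME="{target}"')
                mac_match = MAC_RE.search(line)
                mac = mac_match.group(1) if mac_match else None
        out.append(line)
    return "\n".join(out) + "\n", mac
```

In the article, the edited rules are then put into effect by unloading and reloading the e1000 driver in steps 12 and 13.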


12. Unload the e1000 network driver with the modprobe -r e1000 command.

13. Load the e1000 driver again with modprobe e1000.

14. Modify NIC configuration information


15. Set the device name to DEVICE=eth0, comment out the MAC address (HWADDR) for now, and then change the IP address.

16. Restart the Network Service

Note: If the IP address turns out to be already in use, set a different IP address in the same way as above.

17. Next, change the hostname from CDHNode2 to CDHNode5 (on CentOS 6 it is set by HOSTNAME= in /etc/sysconfig/network).

18. Reboot the host; the hostname is now CDHNode5.

19. Because we commented out the MAC address, we now need to fill in the network card's new MAC address. Use ifconfig to view it and note it down.

20. Open the ifcfg-eth0 file and set HWADDR to the MAC address you just noted.



Press Esc, then enter :wq to save and exit. Then restart the network service with the service network restart command. The configuration is now complete.

Then make the corresponding configuration on the other nodes.


Configuring the hosts file

Configure the /etc/hosts file on all 5 nodes; note that this must be done as the root user.
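Every node needs the same five name-to-address mappings from the host planning table. The Python sketch below renders those lines and appends only the ones missing from an existing /etc/hosts text, so running it twice changes nothing. HOSTS mirrors the planned addresses, the helper names are illustrative, and writing the result back to /etc/hosts must be done as root.

```python
# Host plan from the host planning table: hostname -> static IP.
HOSTS = {
    "CDHNode1": "192.168.3.188",
    "CDHNode2": "192.168.3.189",
    "CDHNode3": "192.168.3.190",
    "CDHNode4": "192.168.3.191",
    "CDHNode5": "192.168.3.192",
}


def hosts_entries(hosts):
    """Render one 'IP<TAB>hostname' line per node, sorted by hostname."""
    return "\n".join(f"{ip}\t{name}" for name, ip in sorted(hosts.items())) + "\n"


def merge_into_hosts_file(existing, hosts):
    """Append entries for nodes not already named in existing; keep existing lines."""
    known = set()
    for line in existing.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        known.update(fields[1:])  # every field after the IP is a name or alias
    missing = {name: ip for name, ip in hosts.items() if name not in known}
    if not missing:
        return existing
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + hosts_entries(missing)
```

For example, merging into a file that only contains the localhost line adds the five CDHNode lines; merging the result again returns it unchanged.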
