Hadoop Fully Distributed Setup
Posted on Friday, November 6, 2015
- Preparatory work
- Hardware and Software Environment
- Host operating system: Windows 64-bit; processor: Intel i5 @ 3.2 GHz; memory: 8 GB
- Virtual machine software: VMware Workstation 10
- Virtual operating system: CentOS 6.5 64-bit
- JDK: 1.8.0_65 64-bit
- Hadoop: 1.2.1
- Cluster network environment
The cluster consists of 3 nodes: 1 NameNode and 2 DataNodes. All nodes can ping each other. The node IP addresses and host names are as follows:
| Ordinal | IP address    | Machine name  | Type     | User name |
|---------|---------------|---------------|----------|-----------|
| 1       | 192.168.1.127 | Master.hadoop | NameNode | hadoop    |
| 2       | 192.168.1.128 | Slave1.hadoop | DataNode | hadoop    |
| 3       | 192.168.1.129 | Slave2.hadoop | DataNode | hadoop    |
All nodes run CentOS with the firewall disabled. A hadoop user is created on every node, with home directory /home/hadoop. A directory /usr/hadoop is also created on every node and owned by the hadoop user; since Hadoop will be installed there, that user needs rwx permissions on it. (The usual practice is to create /usr/hadoop as root and then change its owner with chown -R hadoop:hadoop /usr/hadoop; otherwise you may lack the permissions needed to distribute the Hadoop files to the other machines over SSH.) A command sketch follows.
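A minimal sketch of that setup, run as root on every node (adjust names and paths to your own layout):
useradd hadoop                        # create the hadoop user (home directory /home/hadoop)
passwd hadoop                         # set its password
mkdir -p /usr/hadoop                  # directory that will hold the Hadoop installation
chown -R hadoop:hadoop /usr/hadoop    # give the hadoop user full control of it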
- Attention
Hadoop requires the same deployment directory structure on every machine (at startup, the daemons on the other nodes are started from the same path as on the primary node) and an identical user account. The usual approach in the documentation is therefore to create a hadoop user on all machines and use that account for passwordless authentication. For convenience, a hadoop user is created on all three machines here.
- Environment construction
- Operating system Installation
For the DataNode systems, you can install one system first and then clone several identical copies of it using VMware's cloning feature.
Tutorials for installing CentOS under VMware are widely available online. One point deserves attention: all virtual machines use bridged networking, and because the host is on a wireless network, the VMnet0 settings must also be configured under Edit > Virtual Network Editor.
Bridged networking connects the local physical NIC and the virtual NIC through the VMnet0 virtual switch, so the two NICs sit at the same level in the topology and effectively belong to the same network segment; the virtual switch behaves like a real network switch. The IP addresses of the two NICs should therefore be set in the same segment.
In the "Bridged to" column, select the physical network adapter that is actually in use.
- Local Environment configuration
- Network configuration
Bridged networking is used (suitable when a router or switch is available), and a static IP is configured so that each node has Internet access and can communicate within the LAN.
vim /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0                  # device name of the NIC
BOOTPROTO=static             # how the NIC obtains its IP address: static
HWADDR="00:23:54:de:01:69"   # hardware (MAC) address of the NIC
ONBOOT="yes"                 # bring this interface up at system startup
TYPE="Ethernet"
USERCTL=no
IPV6INIT=no
PEERDNS=yes
NETMASK=255.255.255.0        # network mask for this NIC
IPADDR=192.168.1.127         # required only when BOOTPROTO=static
GATEWAY=192.168.1.1          # usually the router's address
DNS1=202.112.17.33           # DNS server for this network, or 8.8.8.8 (Google's DNS)
(You can configure this file on the master first, copy it to every slave with scp, and then change only IPADDR on each slave, leaving everything else unchanged.)
Note: one more thing must then be changed, because the copied file still contains the previous (master's) hardware address. On each slave:
First, edit /etc/udev/rules.d/70-persistent-net.rules:
Delete the entry for the NIC named eth0 and rename the eth1 entry to eth0.
Second, edit /etc/sysconfig/network-scripts/ifcfg-eth0:
Change HWADDR to the MAC address of the eth1 entry you just saw.
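A brief sketch of those two fixes on a slave (run as root; the files are edited by hand):
vim /etc/udev/rules.d/70-persistent-net.rules    # delete the old eth0 entry and rename the eth1 entry to eth0
vim /etc/sysconfig/network-scripts/ifcfg-eth0    # set HWADDR to that eth1 MAC address and IPADDR to this slave's IP
service network restart                          # restart networking (or reboot) so the changes take effect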
- Configuring the Hosts File
The "/etc/hosts" file is used to configure the DNS server information that the host will use, which is the corresponding [HostName IP] for each host that is recorded in the LAN. When the user is in the network connection, first look for the file, look for the corresponding host name corresponding IP address.
In a Hadoop cluster configuration, the IP and hostname of all machines in the cluster need to be added to the "/etc/hosts" file, so that master and all slave machines can communicate not only through IP, but also through host names.
So in the "/etc/hosts" file on all the machines, add the following:
192.168.1.127 Master.hadoop
192.168.1.128 Slave1.hadoop
192.168.1.129 Slave2.hadoop
(Again, you can edit the file on the master and then copy it to every slave with the scp command.)
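To confirm that name resolution works, a quick check from any node (assuming the hosts entries above are in place):
ping -c 3 Master.hadoop
ping -c 3 Slave1.hadoop
ping -c 3 Slave2.hadoop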
- Operating system settings
The firewall and SELinux need to be turned off during Hadoop installation, or an exception will occur.
- Shutting down the firewall
- Check the firewall state with service iptables status; if rules are listed, iptables is running.
- Turn off the firewall: chkconfig iptables off (see the command sketch after this list).
- Turn off SELinux
- Use the getenforce command to check whether SELinux is currently enabled.
- Modify the /etc/selinux/config file and set SELINUX=disabled.

Note: You must restart the system after these modifications for them to take effect.
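A minimal command sketch of the two steps above (run as root on every node):
service iptables status      # show the current firewall state
service iptables stop        # stop the firewall immediately
chkconfig iptables off       # keep it disabled across reboots
getenforce                   # show the current SELinux mode (Enforcing/Permissive/Disabled)
vim /etc/selinux/config      # change the SELINUX= line to SELINUX=disabled, then reboot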
- SSH passwordless authentication configuration
Preparatory work:
1. Create a hadoop user on all three virtual machines:
adduser hadoop    # run as the root user
passwd hadoop     # enter the password twice
2. Create a .ssh folder in the hadoop user's home directory:
mkdir ~/.ssh
While Hadoop is running, the remote Hadoop daemons must be managed: after startup, the NameNode uses SSH (Secure Shell) to start and stop the various daemons on each DataNode. These commands have to run between the nodes without prompting for a password, so SSH is configured for passwordless public-key authentication. The NameNode can then log in to each DataNode over SSH without a password to start the DataNode processes, and by the same principle a DataNode can log in to the NameNode without a password.
SSH is secure because it uses public-key cryptography. The login process is roughly:
(1) The remote host receives the user's login request and sends its public key to the user.
(2) The user encrypts the login password with this public key and sends it back.
(3) The remote host decrypts the password with its private key; if the password is correct, the login is accepted.
Note: if SSH is not installed on your Linux system, install it first (a quick check-and-install sketch follows).
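A minimal check-and-install sketch for CentOS 6 (assumes the yum repositories are reachable):
rpm -qa | grep openssh                           # check whether the openssh packages are installed
yum install -y openssh-server openssh-clients    # install them if they are missing
service sshd start                               # make sure the SSH daemon is running
chkconfig sshd on                                # start sshd automatically at boot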
- Configure passwordless login from master to all slaves

- Execute the following command on the master node:
ssh-keygen -t rsa -P ''

When asked where to save the key, just press Enter to accept the default path. The generated key pair, id_rsa (private key) and id_rsa.pub (public key), is stored by default in the /home/<user name>/.ssh directory.

Check that a ".ssh" folder now exists under "/home/<user name>/" and that the two newly generated key files are inside it.
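A quick way to check (run as the hadoop user on the master):
ls -la ~/.ssh    # should now show id_rsa and id_rsa.pub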
- Then, still on the master node, append id_rsa.pub to the authorized keys file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

List the ".ssh" folder again to confirm that authorized_keys has been created.
Check the permissions on the .ssh directory and on authorized_keys. (Very important!)
If the permissions are not correct, set them with the following commands:

chmod 700 ~/.ssh    # Note: these two permission settings are critical; they decide success or failure
chmod 600 ~/.ssh/authorized_keys
On the master machine, run ssh localhost to test whether you can log in without a password.
- Send the public key to the slaves
On the master, send the authorized_keys file containing the public key id_rsa.pub to the same location on each slave (the /home/hadoop/.ssh folder) with scp, and then set the permissions (very important):
scp ~/.ssh/authorized_keys hadoop@192.168.1.128:~/.ssh/
scp ~/.ssh/authorized_keys hadoop@192.168.1.129:~/.ssh/

Set the permissions on each slave (as the root user):
chown -R hadoop:hadoop /home/hadoop/.ssh
chmod -R 700 /home/hadoop/.ssh
chmod 600 /home/hadoop/.ssh/authorized_keys
- Test
On the master, enter:
ssh Slave1.hadoop
If no password is requested, the configuration succeeded.

Key point: set the permissions correctly!
- Software Installation and Environment configuration
The following software is installed on the master first; once everything is installed and configured, it is copied to the slaves.
- Java installation and its environment configuration
The JDK must be installed on all machines, and the version must be the same everywhere. Install it on the master first and then copy the installed files to the slaves. Installing the JDK and configuring the environment variables must be done as root.
- First log in to "Master.hadoop" as root, create a "java" folder under "/usr", copy "jdk-8u65-linux-x64.tar.gz" into "/usr/java", and unpack it:
tar -zxvf jdk-8u65-linux-x64.tar.gz

Looking under "/usr/java" you will now find a folder named "jdk1.8.0_65", which means the JDK has been unpacked. Delete the installation package and move on to the "Configure environment variables" step.
- Configure environment variables. Edit the "/etc/profile" file and add the Java "JAVA_HOME", "CLASSPATH" and "PATH" settings as follows:
# set Java environment
export JAVA_HOME=/usr/java/jdk1.8.0_65/
export JRE_HOME=/usr/java/jdk1.8.0_65/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
Save and exit, then run one of the following commands to make the configuration take effect immediately:
source /etc/profile    (or:  . /etc/profile)
Key note: put $JAVA_HOME/bin at the front of PATH so that the newly installed JDK is found first; otherwise the system will keep using its original JDK.
- Verify
java -version
- Hadoop installation and its environment configuration
- First, log in to the "Master.hadoop" machine as root and copy the downloaded "hadoop-1.2.1.tar.gz" to the /usr directory. Then go to "/usr", unpack "hadoop-1.2.1.tar.gz" with the commands below, rename the result to "hadoop", give the ordinary hadoop user ownership of the folder, and delete the "hadoop-1.2.1.tar.gz" package:
cd /usr
tar -xzvf hadoop-1.2.1.tar.gz
mv hadoop-1.2.1 hadoop
chown -R hadoop:hadoop hadoop
rm -rf hadoop-1.2.1.tar.gz
- Add the Hadoop installation path to "/etc/profile":
# set Hadoop path
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
- Configure the hadoop-env.sh and confirm that it takes effect
The "hadoop-env.sh" file is located under the "/usr/hadoop/conf" directory.
Modify the following line in the file:
export JAVA_HOME=/usr/java/jdk1.8.0_65/
(This JAVA_HOME is the same as the one used in the Java environment configuration above.)

Then make the change take effect and check the Hadoop version:
source hadoop-env.sh
hadoop version
- Create subdirectories under the /usr/hadoop directory
cd /usr/hadoop
mkdir tmp
mkdir hdfs
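To be safe, also create the name and data subdirectories that hdfs-site.xml (configured below) will point to, and make sure the tree is owned by the hadoop user:
mkdir -p /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data
chown -R hadoop:hadoop /usr/hadoop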
- Configuring the Core-site.xml File
Modify the Hadoop core configuration file core-site.xml, which sets the address and port of HDFS (that is, of the NameNode).
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<!-- Note: create the tmp folder under the /usr/hadoop directory first -->
<description>a base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.127:9000</value>
</property>
</configuration>
- Configuring the Hdfs-site.xml File
This file sets dfs.replication, the number of block replicas (2 here, matching the two DataNodes), and dfs.name.dir / dfs.data.dir, the local paths where the NameNode metadata and the DataNode blocks are stored.
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/hadoop/hdfs/data</value>
</property>
</configuration>
- Configuring the Mapred-site.xml File
Modify the MapReduce configuration file, which sets the address and port of the JobTracker.
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>http://192.168.1.127:9001</value>
</property>
</configuration>
- Configuring the Masters File
There are two options:

(1) The first option:
Change localhost to Master.hadoop.
(2) The second option:
Remove "localhost" and add the master machine's IP, 192.168.1.127.
To be safe, use the second option: if you forget to configure "/etc/hosts", LAN name resolution will fail and cause unexpected errors, whereas once the IP is set and the network is reachable, the corresponding host can always be found by IP.
vim /usr/hadoop/conf/masters
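After editing, the masters file contains a single line, the master's IP address (option 2 above):
192.168.1.127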
- Configuring the Slaves file (master host specific)
Similar to configuring the masters file:
vim /usr/hadoop/conf/slaves
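By the same logic, the slaves file lists one DataNode per line, here by IP address (host names also work once /etc/hosts is configured):
192.168.1.128
192.168.1.129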
- File installation and configuration on the slave server
After the JDK and Hadoop have been installed and configured on the master, proceed as follows:
- Copy the /usr/java folder to every slave
On the master, enter:
scp -r /usr/java root@192.168.1.128:/usr/
scp -r /usr/java root@192.168.1.129:/usr/

- Copy /etc/profile to every slave
On the master, enter:
scp /etc/profile root@192.168.1.128:/etc/
scp /etc/profile root@192.168.1.129:/etc/

- Copy the /usr/hadoop folder to every slave
On the master, enter:
scp -r /usr/hadoop root@192.168.1.128:/usr/
scp -r /usr/hadoop root@192.168.1.129:/usr/
- Change permissions
Change the owner of /usr/java and /usr/hadoop to the hadoop user and set the permissions to 755, as sketched below.
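A minimal sketch of those permission changes, run as root on each slave:
chown -R hadoop:hadoop /usr/java /usr/hadoop    # hand ownership to the hadoop user
chmod -R 755 /usr/java /usr/hadoop              # rwx for the owner, rx for group and others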
- Start-up and verification
- Start
- Format HDFs File system
Run this as the ordinary hadoop user on "Master.hadoop". (Note: format only once; on subsequent boots there is no need to format again, just run start-all.sh.)
hadoop namenode -format
- Start Hadoop
Before starting, shut down the firewall on every machine in the cluster, otherwise the DataNodes will shut themselves down. Start the cluster with the following command:
start-all.sh
- Turn off Hadoop
stop-all.sh
- Validation test
- Test with the "JPS" command
On master, use the Java-brought gadget JPS to view the process.
View the process on Slave1 with a Java-brought gadget JPS.
Note: The top two pictures show success!
?
- View the cluster status with "hadoop dfsadmin -report"
- Viewing a cluster from a Web page
Visit the JobTracker page: http://192.168.1.127:50030

Visit the NameNode page: http://192.168.1.127:50070
- The problems encountered and the solving methods
- About "Warning: $HADOOP_HOME is deprecated."
After the installation is complete, this warning appears every time a hadoop command is typed:
Warning: $HADOOP_HOME is deprecated.
Solution one: edit the "/etc/profile" file, remove the HADOOP_HOME variable setting, and re-run the hadoop fs command; the warning disappears.
Solution two: edit the "/etc/profile" file and add the following environment variable, after which the warning disappears:
export HADOOP_HOME_WARN_SUPPRESS=1
- SSH setup not successful
Most likely the permissions were set incorrectly.
- DataNode cannot connect
The firewalls on the master and slaves were probably not turned off.