Build Hadoop2 HA


1. Introduction

1.1 Background:

In Hadoop 1.x the NameNode is a single point of failure (SPOF): if it goes down, or is taken offline briefly for maintenance, the cluster becomes unavailable. Hadoop 2.x improves on this by adding a second NameNode. Since only one NameNode actually serves requests at any time, of the two NameNodes one is in the active state and the other is in the standby state. The standby NameNode does not provide service; it only synchronizes the state of the active NameNode, so that it can be switched to the active state in a timely manner when a problem occurs.

1.2 Architecture:

The two NameNodes in Hadoop 2.x are typically deployed on two separate machines. The active NameNode responds to the cluster's clients, while the standby NameNode is simply a backup of the active NameNode, ready to replace it quickly when a problem occurs.

The standby NameNode stays synchronized with the active NameNode by communicating with the JournalNodes.

Which node holds the active NameNode and which holds the standby is determined by ZooKeeper through its active/standby election mechanism.

1.3 HDFS HA configuration:

NameNode: two physical machines with identical configuration, running the active NameNode and the standby NameNode respectively.

JournalNode: the JournalNode process does not consume many resources and can be deployed together with other processes such as the NameNode, DataNode, or ResourceManager. At least 3 JournalNodes (an odd number) are required, which tolerates the failure of (N-1)/2 JournalNode processes; with 3 JournalNodes, for example, 1 may fail.

DataNode: configured according to the size of the data and the resources required to process it; in practice there are usually many of them, distributed across many machines.

2. Installation and configuration

Version information for each installation file:

CentOS-7-x86_64-Minimal-1511.iso

jdk-8u101-linux-x64.tar.gz

zookeeper-3.4.8.tar.gz

hadoop-2.6.0.tar.gz

2.1 Installing CentOS 7

2.1.1 Installing the virtual machine and Linux system:

I used CentOS 7 installed under VMware Workstation; install VMware first (that step is not covered here).

Installing a minimal CentOS is straightforward and is not described in detail. I did not create a separate user during installation, so after booting the only account is root; set its password yourself.

2.1.2 Configuring and connecting to the network

After installing CentOS, the network must be configured manually:

Log in as root and edit the network interface configuration file:

cd /etc/sysconfig/network-scripts/

vi ifcfg-eno16777736

and set the IP address (the host name is set separately below):

IPADDR=192.168.152.153   # on the other nodes, add 1 to the last octet; set according to the cluster IP plan
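For reference, a minimal static configuration for this interface might look like the following sketch; the gateway and DNS values are assumptions for a typical VMware NAT network and must be adjusted to your environment:

TYPE=Ethernet
BOOTPROTO=static
NAME=eno16777736
DEVICE=eno16777736
ONBOOT=yes
IPADDR=192.168.152.153
NETMASK=255.255.255.0
GATEWAY=192.168.152.2   # assumed VMware NAT gateway, adjust as needed
DNS1=192.168.152.2      # assumed DNS server, adjust as needed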

Restart the network service:

service network restart

Modify the host name:

hostnamectl set-hostname <hostname>   # e.g. hadoop-namenode1; use each node's planned host name

2.1.3 Setting IP address and hostname mappings

su root

vim /etc/hosts

192.168.152.155 hadoop-namenode1

192.168.152.153 hadoop-namenode2

192.168.152.154 hadoop-datanode1

Shut down the firewall and SELinux:

systemctl stop firewalld.service

systemctl disable firewalld.service

vim /etc/selinux/config

SELINUX=disabled

Reboot, then check the SELinux status:

getenforce

2.2 Hadoop Pre-Installation Preparation:

2.2.1 Create the group and user and add permissions:

groupadd hadoop   # create the group hadoop

useradd -g hadoop hadoop   # create the user hadoop in the group hadoop

passwd hadoop   # set the password for the user hadoop

yum install vim   # install vim

vim /etc/sudoers   # grant sudo privileges to the hadoop user by adding the following line:

hadoop ALL=(ALL) ALL

2.2.2 Configuring SSH Password-free login:

Generate an SSH key pair on the namenode1 node:

su hadoop

ssh-keygen -t rsa

Copy the public key to every node in the cluster:

ssh-copy-id hadoop-namenode1

ssh-copy-id hadoop-namenode2

ssh-copy-id hadoop-datanode1

Log in to each node via SSH to verify that password-free login works.
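For example, from hadoop-namenode1 the following logins should succeed without a password prompt:

ssh hadoop-namenode2
exit
ssh hadoop-datanode1
exit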

2.3 Hadoop installation and configuration

2.3.1 Installing the JDK

Uninstall OpenJDK if it is present (the minimal CentOS 7 installation does not ship with OpenJDK, so the JDK can be installed directly).

Create the installation path:

mkdir apache

tar -xvf jdk-8u101-linux-x64.tar.gz -C /home/hadoop/apache/

Configure environment variables:

vim ~/.bash_profile

Add the following content:

export JAVA_HOME=/home/hadoop/apache/jdk1.8.0_101
export PATH=$PATH:$JAVA_HOME/bin

Save, then make the environment variables take effect:

source ~/.bash_profile

Test whether the JDK was installed successfully:

java -version

2.3.2 Installing the ZooKeeper cluster

Unzip the ZooKeeper installation package:

tar -xvf zookeeper-3.4.8.tar.gz -C /home/hadoop/apache/

Delete the installation package:

rm zookeeper-3.4.8.tar.gz

Set ownership to the hadoop user:

chown -R hadoop:hadoop zookeeper-3.4.8

Modify the ZooKeeper configuration file:

cd apache/zookeeper-3.4.8/conf

cp zoo_sample.cfg zoo.cfg

vim zoo.cfg

Settings are as follows:

tickTime=2000   # basic time unit / heartbeat interval (ms)

initLimit=10   # maximum number of ticks allowed for followers to connect and sync with the leader

syncLimit=5   # maximum number of ticks allowed between a request and an acknowledgement

dataDir=/home/hadoop/apache/zookeeper-3.4.8/data   # data directory

dataLogDir=/home/hadoop/apache/zookeeper-3.4.8/data/log   # transaction log directory

clientPort=2181   # client port

maxClientCnxns=2000   # maximum number of client connections

server.1=hadoop-namenode1:2888:3888   # ZooKeeper ensemble members
server.2=hadoop-namenode2:2888:3888
server.3=hadoop-datanode1:2888:3888

Create the ZooKeeper data directory and log directory:

cd ..

mkdir -p data/log

Set the ownership of the data directory and log directory:

chown -R hadoop:hadoop data

cd data

chown -R hadoop:hadoop log

Create the file myid in the data directory with the content 1:

echo "1" >> data/myid   # after the working directory is synchronized to the other two nodes, change the content to 2 and 3 respectively

Synchronize the ZooKeeper working directory to the other nodes in the cluster:

scp -r zookeeper-3.4.8 hadoop@hadoop-namenode2:/home/hadoop/apache/

scp -r zookeeper-3.4.8 hadoop@hadoop-datanode1:/home/hadoop/apache/

On those nodes, change the myid values to 2 and 3 respectively, then configure the environment variables on all nodes:

vim ~/.bash_profile

export ZOOKEEPER_HOME=/home/hadoop/apache/zookeeper-3.4.8
export PATH=$PATH:$ZOOKEEPER_HOME/bin

The ZooKeeper cluster is now set up; start it on every node:

zkServer.sh start

View the processes:

jps
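If ZooKeeper started correctly, jps on each node should list a QuorumPeerMain process, roughly like this (the process IDs are illustrative):

2345 QuorumPeerMain
2501 Jps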

2.3.3 Installation and configuration of Hadoop

Extract the installation file on the namenode1 node:

tar -xvf hadoop-2.6.0.tar.gz -C /home/hadoop/apache/

Delete the installation file:

rm hadoop-2.6.0.tar.gz

Set user permissions:

cd apache

chown -R hadoop:hadoop hadoop-2.6.0/

Configure hadoop-env.sh:

cd hadoop-2.6.0/etc/hadoop/

vim hadoop-env.sh
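The exact contents depend on your layout; as a sketch, assuming the JDK path from section 2.3.1 and the pids and data/logs directories created below, hadoop-env.sh would contain entries along these lines:

export JAVA_HOME=/home/hadoop/apache/jdk1.8.0_101
export HADOOP_PID_DIR=/home/hadoop/apache/hadoop-2.6.0/pids        # PID file directory (assumed path)
export HADOOP_LOG_DIR=/home/hadoop/apache/hadoop-2.6.0/data/logs   # log directory (assumed path)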

After saving, create the directories you just set:

cd /home/hadoop/apache/hadoop-2.6.0

mkdir pids

mkdir -p data/logs

Configure core-site.xml:

cd etc/hadoop

vim core-site.xml
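A minimal HA core-site.xml sketch, assuming the nameservice ID mycluster, the data/tmp directory created below, and the three ZooKeeper nodes configured above (adjust names and paths to your own plan):

<configuration>
  <!-- logical name of the HA nameservice; must match dfs.nameservices in hdfs-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- temporary data directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/apache/hadoop-2.6.0/data/tmp</value>
  </property>
  <!-- ZooKeeper quorum used by the ZKFC for automatic failover -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop-namenode1:2181,hadoop-namenode2:2181,hadoop-datanode1:2181</value>
  </property>
</configuration>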

Configure hdfs-site.xml:

vim hdfs-site.xml
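A sketch of the HA-related hdfs-site.xml properties, again assuming the nameservice ID mycluster, the default RPC port 8020, and the data directories created below; the fencing settings assume the SSH key generated in section 2.2.2:

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop-namenode1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop-namenode2:8020</value>
  </property>
  <!-- shared edit log stored on the JournalNode quorum -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop-namenode1:8485;hadoop-namenode2:8485;hadoop-datanode1:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/apache/hadoop-2.6.0/data/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- fence a failed active NameNode over SSH -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/apache/hadoop-2.6.0/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/apache/hadoop-2.6.0/data/datanode</value>
  </property>
</configuration>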

Configure mapred-site.xml:

cp mapred-site.xml.template mapred-site.xml

vim mapred-site.xml
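A minimal mapred-site.xml sketch; running MapReduce on YARN is the essential setting, and the JobHistory addresses are an assumption placing the history server on hadoop-namenode2, where it is started later in this guide:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- assumed JobHistory server location -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-namenode2:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-namenode2:19888</value>
  </property>
</configuration>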

Configure the yarn-site.xml file:

vim yarn-site.xml
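A sketch of yarn-site.xml with ResourceManager HA enabled, assuming ResourceManager IDs rm1/rm2 on the two NameNode hosts and the same ZooKeeper quorum as above:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- assumed placement: rm1 on namenode1, rm2 on namenode2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop-namenode1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop-namenode2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop-namenode1:2181,hadoop-namenode2:2181,hadoop-datanode1:2181</value>
  </property>
</configuration>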

Configure the slaves file:

vim slaves

hadoop-datanode1

Create the directories referenced in the configuration files:

cd ../..

mkdir -p data/tmp

mkdir -p data/journal

mkdir -p data/namenode

mkdir -p data/datanode

Synchronize the Hadoop working directory to the other nodes in the cluster:

scp -r hadoop-2.6.0 hadoop@hadoop-namenode2:/home/hadoop/apache/

scp -r hadoop-2.6.0 hadoop@hadoop-datanode1:/home/hadoop/apache/

Configure environment variables on all nodes:

vim ~/.bash_profile

export HADOOP_HOME=/home/hadoop/apache/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Make the modified environment variables take effect:

source ~/.bash_profile

Hadoop cluster initialization

Start the ZooKeeper cluster on all nodes:

zkServer.sh start

Format the ZKFC znode on hadoop-namenode1:

hdfs zkfc -formatZK

Start the JournalNodes (on namenode1, namenode2, and datanode1):

hadoop-daemon.sh start journalnode

Format HDFS (on hadoop-namenode1):

hadoop namenode -format

After formatting, copy the metadata directory from the namenode1 node's Hadoop working directory to the namenode2 node:

scp -r /home/hadoop/apache/hadoop-2.6.0/data/namenode/* hadoop@hadoop-namenode2:/home/hadoop/apache/hadoop-2.6.0/data/namenode/

Start the Hadoop cluster

Start DFS on hadoop-namenode1:

start-dfs.sh

The start-dfs.sh command starts the following processes:

NameNode

JournalNode

DFSZKFailoverController

DataNode

Start YARN (operate on namenode2):

start-yarn.sh

Start YARN's second ResourceManager manually on the other master node (hadoop-namenode1 in this plan):

yarn-daemon.sh start resourcemanager

Start YARN's security proxy:

yarn-daemon.sh start proxyserver

Note: the proxy server acts like a firewall and improves the security of access to the cluster.

Start YARN's job history service:

mr-jobhistory-daemon.sh start historyserver

At this point, the Hadoop cluster installation and configuration is complete.
