1. Introduction
1.1 Background:
In Hadoop 1.x the NameNode is a single point of failure (SPOF): if it goes down, even briefly, the cluster is unavailable. Hadoop 2.x addresses this by adding a second NameNode. Only one NameNode serves requests at a time, so of the two NameNodes one is active and one is standby. The standby does not serve clients; it only synchronizes the state of the active NameNode so that it can be switched to the active role promptly when a problem occurs.
1.2 Architecture:
The two NameNodes of Hadoop 2.x are typically deployed on two separate machines. The active NameNode responds to cluster clients, while the standby NameNode is simply a hot backup, ready to replace the active NameNode quickly when it fails.
The standby NameNode stays synchronized with the active NameNode by communicating through the JournalNodes.
Which node becomes the active NameNode and which becomes the standby is determined by ZooKeeper through its leader-election mechanism.
1.3 HDFS HA configuration:
NameNode: two physical machines with the same configuration, running the active NameNode and the standby NameNode, respectively.
JournalNode: JournalNodes do not consume many resources and can be deployed alongside other processes such as the NameNode, DataNode, or ResourceManager. At least 3 are required (an odd number is recommended); a set of N JournalNodes tolerates (N-1)/2 failures. For example, 3 JournalNodes tolerate 1 failure.
DataNode: sized according to the volume of data and the resources needed to process it; in practice there are usually many, distributed across many machines.
2. Installation and configuration
Version information for the installation files:
CentOS-7-x86_64-Minimal-1511.iso
jdk-8u101-linux-x64.tar.gz
zookeeper-3.4.8.tar.gz
hadoop-2.6.0.tar.gz
2.1 Installing CentOS 7
2.1.1 Installing virtual machines and Linux systems:
I installed CentOS 7 in VMware Workstation; installing VMware itself is not covered here.
Installing a minimal CentOS is straightforward and is not described in detail. I did not create a separate account during installation, so the account after boot is root; set the password yourself.
2.1.2 Configuring the network
Manual network configuration is required after installing CentOS.
Log in as root and edit the network configuration file:
cd /etc/sysconfig/network-scripts/
vi ifcfg-eno16777736
Set the IP address:
IPADDR=192.168.152.153   # other nodes increment the last octet; set according to the cluster's IP plan
Restart the network service:
service network restart
Modify the host name:
hostnamectl set-hostname <hostname>
2.1.3 Setting IP address and hostname mappings
Su Root
Vim/etc/hosts
192.168.152.155 Hadoop-namenode1
192.168.152.153 Hadoop-namenode2
192.168.152.154 Hadoop-datanode1
Shut down the firewall and SELinux:
systemctl stop firewalld.service
systemctl disable firewalld.service
vim /etc/selinux/config
SELINUX=disabled
Reboot, then check the SELinux status:
getenforce
2.2 Hadoop Pre-Installation Preparation:
2.2.1 Create the group and user and add permissions:
groupadd hadoop            # create group hadoop
useradd -g hadoop hadoop   # create user hadoop in group hadoop
passwd hadoop              # set the password for user hadoop
yum install vim            # install vim
vim /etc/sudoers           # grant sudo permissions to the hadoop user by adding the following line:
hadoop ALL=(ALL) ALL
2.2.2 Configure SSH password-free login:
Generate an SSH key pair on the hadoop-namenode1 node:
su hadoop
ssh-keygen -t rsa
Copy the public key to all node machines in the cluster:
ssh-copy-id hadoop-namenode1
ssh-copy-id hadoop-namenode2
ssh-copy-id hadoop-datanode1
Log in to each node via SSH to verify that password-free login works.
2.3 Hadoop installation, configuration
2.3.1 Installing the JDK
Minimal CentOS 7 does not ship with OpenJDK, so there is nothing to uninstall; install the JDK directly.
Create the installation path and extract:
mkdir apache
tar -xvf jdk-8u101-linux-x64.tar.gz -C /home/hadoop/apache/
Configure environment variables:
vim ~/.bash_profile
Add the following content:
export JAVA_HOME=/home/hadoop/apache/jdk1.8.0_101
export PATH=$PATH:$JAVA_HOME/bin
Save, then make the environment variables take effect:
source ~/.bash_profile
Test whether the JDK was installed successfully:
java -version
2.3.2 Install the ZooKeeper cluster
Unzip the ZooKeeper installation package:
tar -xvf zookeeper-3.4.8.tar.gz -C /home/hadoop/apache/
Delete the installation package:
rm zookeeper-3.4.8.tar.gz
Give ownership to the hadoop user:
chown -R hadoop:hadoop zookeeper-3.4.8
Modify the ZooKeeper configuration file:
cd apache/zookeeper-3.4.8/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
Settings are as follows:
tickTime=2000        # basic heartbeat interval (ms)
initLimit=10         # max heartbeats allowed for followers to connect and sync with the leader
syncLimit=5          # max heartbeats allowed between a request and an acknowledgement
dataDir=/home/hadoop/apache/zookeeper-3.4.8/data          # data storage directory
dataLogDir=/home/hadoop/apache/zookeeper-3.4.8/data/log   # transaction log storage directory
clientPort=2181      # client connection port
maxClientCnxns=2000  # maximum number of client connections
server.1=hadoop-namenode1:2888:3888   # ZooKeeper nodes (2888: quorum port, 3888: election port)
server.2=hadoop-namenode2:2888:3888
server.3=hadoop-datanode1:2888:3888
Create the ZooKeeper data storage directory and log storage directory:
cd ..
mkdir -p data/log
Give ownership of the data directory (including the log directory) to the hadoop user:
chown -R hadoop:hadoop data
Create the file myid in the data directory with content 1:
echo "1" > data/myid   # after syncing the working directory to the other two nodes, change the content to 2 and 3 respectively
Synchronize the ZooKeeper working directory to the other nodes in the cluster:
scp -r zookeeper-3.4.8 hadoop@hadoop-namenode2:/home/hadoop/apache/
scp -r zookeeper-3.4.8 hadoop@hadoop-datanode1:/home/hadoop/apache/
Modify the myid values to 2 and 3, respectively, and configure the environment variables on all nodes:
vim ~/.bash_profile
export ZOOKEEPER_HOME=/home/hadoop/apache/zookeeper-3.4.8
export PATH=$PATH:$ZOOKEEPER_HOME/bin
The ZooKeeper cluster is now set up; start it:
zkServer.sh start
View the processes:
jps
2.3.3 Install and configure Hadoop
Extract the installation file on the hadoop-namenode1 node:
tar -xvf hadoop-2.6.0.tar.gz -C /home/hadoop/apache/
Delete the installation file:
rm hadoop-2.6.0.tar.gz
Set user permissions:
cd apache
chown -R hadoop:hadoop hadoop-2.6.0/
Configure hadoop-env.sh:
cd hadoop-2.6.0/etc/hadoop/
vim hadoop-env.sh
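The contents of hadoop-env.sh are not shown in the original. A minimal sketch, assuming the JDK path from section 2.3.1 and the pids and data/logs directories created in the next step (the PID and log locations are assumptions, not confirmed by the source):

```shell
# hadoop-env.sh -- sketch; all paths are assumptions based on this guide's layout
export JAVA_HOME=/home/hadoop/apache/jdk1.8.0_101
# PID files and logs, matching the pids and data/logs directories created below
export HADOOP_PID_DIR=/home/hadoop/apache/hadoop-2.6.0/pids
export HADOOP_LOG_DIR=/home/hadoop/apache/hadoop-2.6.0/data/logs
```

Setting JAVA_HOME explicitly here matters because Hadoop daemons started over SSH do not inherit the login shell's environment.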
After saving, create the directories just configured:
cd /home/hadoop/apache/hadoop-2.6.0
mkdir pids
mkdir -p data/logs
Configure core-site.xml:
cd etc/hadoop
vim core-site.xml
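The file's contents are omitted in the original. A minimal HA sketch for this three-node plan, assuming the nameservice is named mycluster (that name, and the use of data/tmp, are assumptions):

```xml
<configuration>
    <!-- Default filesystem points at the HA nameservice, not a single host -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- Temporary directory; matches the data/tmp directory created later -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/apache/hadoop-2.6.0/data/tmp</value>
    </property>
    <!-- ZooKeeper quorum used by the ZKFC for automatic failover -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop-namenode1:2181,hadoop-namenode2:2181,hadoop-datanode1:2181</value>
    </property>
</configuration>
```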
Configure hdfs-site.xml:
vim hdfs-site.xml
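Again the contents are not shown; a sketch of the usual HA properties for this layout, assuming the mycluster nameservice, NameNode IDs nn1/nn2, and default ports (all assumptions):

```xml
<configuration>
    <!-- Logical nameservice and its two NameNodes (names assumed) -->
    <property><name>dfs.nameservices</name><value>mycluster</value></property>
    <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>hadoop-namenode1:8020</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>hadoop-namenode2:8020</value></property>
    <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>hadoop-namenode1:50070</value></property>
    <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>hadoop-namenode2:50070</value></property>
    <!-- Shared edits via the three JournalNodes; local dirs match data/* created below -->
    <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://hadoop-namenode1:8485;hadoop-namenode2:8485;hadoop-datanode1:8485/mycluster</value></property>
    <property><name>dfs.journalnode.edits.dir</name><value>/home/hadoop/apache/hadoop-2.6.0/data/journal</value></property>
    <property><name>dfs.namenode.name.dir</name><value>/home/hadoop/apache/hadoop-2.6.0/data/namenode</value></property>
    <property><name>dfs.datanode.data.dir</name><value>/home/hadoop/apache/hadoop-2.6.0/data/datanode</value></property>
    <!-- Automatic failover via ZKFC, with SSH fencing of the old active -->
    <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
    <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
    <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
    <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hadoop/.ssh/id_rsa</value></property>
    <!-- Single DataNode in this plan, so replication 1 (assumed) -->
    <property><name>dfs.replication</name><value>1</value></property>
</configuration>
```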
Configure mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
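The contents are not shown; a minimal sketch that runs MapReduce on YARN and matches the history server started at the end of the guide (the history-server host and ports are assumptions):

```xml
<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server addresses (host assumed to be hadoop-namenode1) -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-namenode1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-namenode1:19888</value>
    </property>
</configuration>
```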
Configure yarn-site.xml:
vim yarn-site.xml
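The contents are not shown. Since the guide later starts YARN on hadoop-namenode2 and then a second ResourceManager, ResourceManager HA is presumably enabled; a sketch under that assumption (the cluster-id, rm-ids, and host-to-RM assignment are all assumptions):

```xml
<configuration>
    <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
    <!-- ResourceManager HA: two RMs coordinated through ZooKeeper -->
    <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
    <property><name>yarn.resourcemanager.cluster-id</name><value>yarn-cluster</value></property>
    <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
    <property><name>yarn.resourcemanager.hostname.rm1</name><value>hadoop-namenode2</value></property>
    <property><name>yarn.resourcemanager.hostname.rm2</name><value>hadoop-namenode1</value></property>
    <property><name>yarn.resourcemanager.zk-address</name><value>hadoop-namenode1:2181,hadoop-namenode2:2181,hadoop-datanode1:2181</value></property>
</configuration>
```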
Configure the slaves file:
vim slaves
hadoop-datanode1
Create the directories referenced in the configuration files:
cd ../..
mkdir -p data/tmp
mkdir -p data/journal
mkdir -p data/namenode
mkdir -p data/datanode
Synchronize the Hadoop working directory to the other nodes in the cluster:
scp -r hadoop-2.6.0 hadoop@hadoop-namenode2:/home/hadoop/apache/
scp -r hadoop-2.6.0 hadoop@hadoop-datanode1:/home/hadoop/apache/
Configure environment variables on all nodes:
vim ~/.bash_profile
export HADOOP_HOME=/home/hadoop/apache/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the modified environment variables take effect:
source ~/.bash_profile
Hadoop Cluster initialization
Start the ZooKeeper cluster on all nodes:
zkServer.sh start
Format the ZKFC on hadoop-namenode1:
hdfs zkfc -formatZK
Start the JournalNodes (on hadoop-namenode1, hadoop-namenode2 and hadoop-datanode1):
hadoop-daemon.sh start journalnode
Format HDFS (on hadoop-namenode1):
hdfs namenode -format
After formatting, copy the metadata directory from the hadoop-namenode1 working directory to the hadoop-namenode2 node:
scp -r /home/hadoop/apache/hadoop-2.6.0/data/namenode/* hadoop@hadoop-namenode2:/home/hadoop/apache/hadoop-2.6.0/data/namenode/
Start the Hadoop cluster
Start DFS on hadoop-namenode1:
start-dfs.sh
The start-dfs.sh command starts the following processes:
NameNode
JournalNode
DFSZKFailoverController
DataNode
Start YARN (operate on hadoop-namenode2):
start-yarn.sh
Start the second ResourceManager (on the other node):
yarn-daemon.sh start resourcemanager
Start YARN's security proxy:
yarn-daemon.sh start proxyserver
Note: the ProxyServer acts as a firewall-like layer that improves the security of access to the cluster.
Start YARN's job history service:
mr-jobhistory-daemon.sh start historyserver
At this point, the Hadoop cluster installation and configuration are complete.