Configure a highly available Hadoop Platform
1. Overview
Hadoop 2.x introduced the HA (High Availability) solution to eliminate the single point of failure. This blog explains how to build highly available HDFS and YARN. The steps are as follows:
- Create a hadoop user
- Install JDK
- Configure hosts
- Install SSH
- Disable Firewall
- Modify Time Zone
- ZK (installation, startup, verification)
- Structure of HDFS + HA
- Role Assignment
- Environment variable configuration
- Core File Configuration
- Slave
- Startup command (commands related to hdfs and yarn)
- HA Switching
- Effect
The download packages are as follows:
- Hadoop 2.x
- Zookeeper
- JDK
NOTE: If JDK cannot be downloaded, go to the official Oracle website to download JDK.
Now the installation package is ready, and we will start to build and configure it.
2. Build
2.1 Create a hadoop user
- useradd hadoop
- passwd hadoop
Set the password as prompted. Then grant the hadoop user passwordless sudo; you can also add other permissions as needed.
- chmod +w /etc/sudoers
- hadoop ALL=(root) NOPASSWD:ALL
- chmod -w /etc/sudoers
2.2 install JDK
Decompress the downloaded installation package to /usr/java/jdk1.7 and set the environment variables. The command is as follows:
- sudo vi /etc/profile
Edit the configuration as follows:
- export JAVA_HOME=/usr/java/jdk1.7
- export PATH=$PATH:$JAVA_HOME/bin
Then make the environment variable take effect immediately. The command is as follows:
- source /etc/profile
Then verify that JDK is configured successfully. The command is as follows:
- java -version
If the corresponding version number is displayed, the JDK is configured successfully; otherwise, recheck the configuration.
2.3 configure hosts
The hosts configuration should be identical on all machines in the cluster (recommended). This avoids unnecessary trouble and lets you use hostnames in place of IP addresses, which simplifies configuration. The configuration information is as follows:
- 10.211.55.12 nna # NameNode Active
- 10.211.55.13 nns # NameNode Standby
- 10.211.55.14 dn1 # DataNode1
- 10.211.55.15 dn2 # DataNode2
- 10.211.55.16 dn3 # DataNode3
Then, use the scp command to distribute the hosts configuration to each node. The command is as follows:
- # Here we use the nns node as an example
- scp /etc/hosts hadoop@nns:/etc/
2.4 Install SSH
Run the following command:
- ssh-keygen -t rsa
Press Enter through all the prompts, then append id_rsa.pub to authorized_keys. The command is as follows:
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
For the hadoop user, grant authorized_keys the 600 permission; otherwise passwordless login will not work. On the other nodes, just run the ssh-keygen -t rsa command to generate the corresponding key pairs, then append each node's id_rsa.pub to the authorized_keys file on the nna node. Finally, distribute the authorized_keys file on the nna node to the ~/.ssh/ directory of every node with the scp command, as follows:
- # Here we use the nns node as an example
- scp ~/.ssh/authorized_keys hadoop@nns:~/.ssh/
Then, use the ssh command to log on to each other and check whether password-free logon is enabled. The logon command is as follows:
- # Here we use the nns node as an example
- ssh nns
If you can log on without being prompted for a password, passwordless login is configured successfully.
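The key-distribution step above can be sketched as a small loop over the other nodes from the hosts table. This is a hedged sketch: the NODES list and the DRY_RUN guard are my additions; set DRY_RUN=0 only after reviewing the printed commands.

```shell
# Sketch: push nna's authorized_keys to the other nodes' ~/.ssh/ directories.
# Hostnames come from the hosts table above; with DRY_RUN=1 the
# commands are only printed, not executed.
NODES="nns dn1 dn2 dn3"
DRY_RUN=1
for node in $NODES; do
  cmd="scp $HOME/.ssh/authorized_keys hadoop@${node}:~/.ssh/"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    $cmd
  fi
done
```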
2.5 disable Firewall
Because hadoop nodes communicate with each other through RPC, the corresponding ports must be listened on. Here I simply disable the firewall. The command is as follows:
- chkconfig iptables off
Note: in a production environment, disabling the firewall outright poses a security risk. Instead, you can add the ports hadoop needs to listen on to the firewall's accept rules. For more information about firewall rule configuration, see "linux firewall configuration", or ask the company's O&M staff to help with the configuration.
You also need to disable SELinux: edit the /etc/selinux/config file and change SELINUX=enforcing to SELINUX=disabled.
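If you do keep the firewall on, the accept rules mentioned above might look like the fragment below in /etc/sysconfig/iptables. This is a hedged sketch: the ports listed are common Hadoop 2.x and ZooKeeper defaults, and you should verify them against your own configuration before applying.

```
# Accept Hadoop/ZooKeeper traffic instead of disabling the firewall
-A INPUT -p tcp --dport 8020  -j ACCEPT   # NameNode RPC
-A INPUT -p tcp --dport 50070 -j ACCEPT   # NameNode web UI
-A INPUT -p tcp --dport 8485  -j ACCEPT   # JournalNode
-A INPUT -p tcp --dport 8088  -j ACCEPT   # ResourceManager web UI
-A INPUT -p tcp --dport 2181  -j ACCEPT   # ZooKeeper client port
-A INPUT -p tcp --dport 2888  -j ACCEPT   # ZooKeeper peer port
-A INPUT -p tcp --dport 3888  -j ACCEPT   # ZooKeeper election port
```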
2.6 modify the time zone
If the nodes' clocks are out of sync, startup exceptions or other problems may occur. Set the time zone to Shanghai. The commands are as follows:
- cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
- cp: overwrite '/etc/localtime'? yes
- # Change to UTC+8 (China)
- vi /etc/sysconfig/clock
- ZONE="Asia/Shanghai"
- UTC=false
- ARC=false
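To confirm that the Asia/Shanghai zone data really resolves to UTC+8 before relying on it, you can query it directly. A small sketch; it assumes the tzdata package is installed:

```shell
# Print the zone abbreviation and numeric offset for Asia/Shanghai;
# the numeric offset should be +0800 (China uses a fixed UTC+8, no DST).
TZ=Asia/Shanghai date +'%Z %z'
```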
2.7 ZK (Installation, Startup, Verification)
2.7.1 Installation
Decompress the downloaded installation package to the specified location. The command is as follows:
- tar -zxvf zookeeper-3.4.6.tar.gz
Modify the zk configuration: rename conf/zoo_sample.cfg in the zk installation directory to conf/zoo.cfg, and modify its content:
- # The number of milliseconds of each tick
- # Basic time unit (ms) for interaction between the server and the client
- tickTime=2000
-
- # The number of ticks that the initial
- # synchronization phase can take
- initLimit=10
-
- # The number of ticks that can pass between
- # sending a request and getting an acknowledgement
- syncLimit=5
-
- # The directory where the snapshot is stored.
- # Do not use /tmp for storage; /tmp here is just
- # for example's sake.
- # Path where zookeeper saves its data and logs
- dataDir=/home/hadoop/data/zookeeper
-
- # The port at which the clients will connect
- clientPort=2181
- server.1=dn1:2888:3888
- server.2=dn2:2888:3888
- server.3=dn3:2888:3888
-
- # server.A=B:C:D
Here, A is a number indicating the server's id; B is the server's IP address (or hostname); C is the port the server uses to exchange information with the cluster's leader; and D is the port the servers use to communicate with one another during leader election after the leader fails.
Next, create a myid file under the configured dataDir directory and write this server's number into it. The number in this file is different on each zk node: starting from 1, the numbers are written to each server in sequence, and the number in the file must match the server number in the zk configuration for that dn node. For example, for server.1=dn1:2888:3888, the myid file on the dn1 node should contain 1.
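The myid step can be scripted per node as below. A minimal sketch: it assumes you run it as the hadoop user (so $HOME is /home/hadoop, matching the dataDir above), and that you substitute 2 and 3 on dn2 and dn3.

```shell
# On dn1: create the dataDir from zoo.cfg and write this server's id.
# Use 2 on dn2 and 3 on dn3, matching server.N=... in zoo.cfg.
ZK_DATA_DIR="$HOME/data/zookeeper"
mkdir -p "$ZK_DATA_DIR"
echo 1 > "$ZK_DATA_DIR/myid"
cat "$ZK_DATA_DIR/myid"    # prints 1
```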
2.7.2 Startup
Run the following command on each dn node to start the zk process:
- bin/zkServer.sh start
Then run the jps command on each node; the following process should appear:
- QuorumPeerMain
2.7.3 Verification
If the jps command above shows the corresponding process, the startup succeeded. You can also check with the zk status command. The command is as follows:
- bin/zkServer.sh status
One leader and two followers will appear.
2.8 structure of HDFS + HA
The overall architecture of HDFS with HA includes:
1. Shared storage is used to synchronize edits between the two NNs. HDFS used to be share-nothing apart from the NN; now the NNs share storage, which really just relocates the single point of failure. However, high-end storage devices have redundant hardware of every kind (RAID, power supplies, NICs) and are somewhat more reliable than servers. Data consistency is ensured by the NN flushing after every metadata change, combined with NFS's close-to-open semantics.
2. The DNs report block information to both NNs at the same time. This is required to keep the Standby NN up to date.
3. A FailoverController process monitors and controls the NN process. We obviously cannot synchronize heartbeats and other state from inside the NN process itself; the simplest reason is that a full GC can stall the NN for more than ten minutes, so there must be an independent, lightweight watchdog dedicated to monitoring. This loosely coupled design also makes it easy to extend or modify. Currently ZooKeeper (ZK) provides the coordination lock, but you can easily replace the ZooKeeper FailoverController (ZKFC) with another HA or leader-election scheme.
4. Fencing, which prevents split-brain, ensures that there is only one active NN at any time. It covers three aspects:
- Shared storage fencing: ensures that only one NN can write edits.
- Client fencing: ensures that only one NN responds to client requests.
- DN fencing: ensures that only one NN sends commands (such as deleting or copying blocks) to the DNs.
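In Hadoop 2.x, the fencing described in point 4 is wired up in hdfs-site.xml. A hedged sketch: the property names are the standard Hadoop 2.x HA ones, and the key path is an assumption matching the SSH setup in section 2.4; the full HA configuration belongs to the core-file section.

```xml
<!-- hdfs-site.xml: fence the old active NN over SSH during failover -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
```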
2.9 Role Assignment
| Name | Host | Responsibilities |
| --- | --- | --- |
| NNA | 10.211.55.12 | zkfc |
| NNS | 10.211.55.13 | zkfc |
| DN1 | 10.211.55.14 | zookeeper |
| DN2 | 10.211.55.15 | zookeeper |
| DN3 | 10.211.55.16 | zookeeper |
2.10 environment variable configuration
All environment variable configurations are listed here; other components configured later can refer to this. After the configuration is complete, run . /etc/profile (or source /etc/profile) to make it take effect immediately. To check whether the environment variables are configured successfully, run echo $HADOOP_HOME; if the corresponding configured path is printed, the configuration succeeded.
Note: after hadoop2.x, the conf folder is changed to the etc folder.
The configuration is as follows:
- export JAVA_HOME=/usr/java/jdk1.7
- export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
- export ZK_HOME=/home/hadoop/zookeeper-3.4.6
- export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOM