Configure a highly available Hadoop Platform

Source: Internet
Author: User
Tags: scp command

1. Overview

Hadoop 2.x introduced the HA (High Availability) solution to eliminate the NameNode single point of failure. This blog explains how to build highly available HDFS and YARN. The steps are as follows:

  1. Create a hadoop user
  2. Install JDK
  3. Configure hosts
  4. Install SSH
  5. Disable Firewall
  6. Modify Time Zone
  7. ZK (installation, startup, verification)
  8. Structure of HDFS + HA
  9. Role Assignment
  10. Environment variable configuration
  11. Core File Configuration
  12. Slave
  13. Startup command (commands related to hdfs and yarn)
  14. HA Switching
  15. Effect

The download package URL is as follows:

Hadoop2.x

Zookeeper

JDK

NOTE: If JDK cannot be downloaded, go to the official Oracle website to download JDK.

Now the installation package is ready, and we will start to build and configure it.

2. Build

2.1 Create a Hadoop user

  useradd hadoop
  passwd hadoop

Set the password as prompted. Then grant the hadoop user passwordless sudo; you can also grant other permissions on your own.

  chmod +w /etc/sudoers
  # add this line to /etc/sudoers:
  hadoop ALL=(root) NOPASSWD:ALL
  chmod -w /etc/sudoers
2.2 Install JDK

Extract the downloaded package to /usr/java/jdk1.7 and set the environment variables. The command is as follows:

  sudo vi /etc/profile

Edit the configuration as follows:

  export JAVA_HOME=/usr/java/jdk1.7
  export PATH=$PATH:$JAVA_HOME/bin

Then make the environment variables take effect immediately. The command is as follows:

  source /etc/profile

Then verify that the JDK is configured successfully. The command is as follows:

  java -version

If the corresponding version number is displayed, the JDK is configured successfully; otherwise, the configuration did not take effect.

2.3 Configure hosts

The hosts configuration of all machines in the cluster should be identical (recommended). This avoids unnecessary trouble: replacing IP addresses with hostnames simplifies configuration. The configuration information is as follows:

  10.211.55.12 nna # NameNode Active
  10.211.55.13 nns # NameNode Standby
  10.211.55.14 dn1 # DataNode1
  10.211.55.15 dn2 # DataNode2
  10.211.55.16 dn3 # DataNode3

Then, use the scp command to distribute the hosts configuration to each node. The command is as follows:

  # here we use the nns node as an example
  scp /etc/hosts hadoop@nns:/etc/
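With five nodes, looping over the hostnames from the list above is less error-prone than copying one by one. The sketch below is a dry run that only prints the command for each remaining node; remove the echo to actually copy:

```shell
# Dry run: print the distribution command for every remaining node.
# Remove "echo" to actually perform the copies.
for host in nns dn1 dn2 dn3; do
  echo scp /etc/hosts "hadoop@$host:/etc/"
done
```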
2.4 Install SSH

Run the following command:

  ssh-keygen -t rsa

Press Enter through all the prompts, then append id_rsa.pub to authorized_keys. The command is as follows:

  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

For the hadoop user, authorized_keys must be given 600 permissions; otherwise passwordless login fails. On the other nodes, you only need to run ssh-keygen -t rsa to generate the corresponding key pair, and then append each node's id_rsa.pub to the authorized_keys file on the nna node. Finally, distribute the authorized_keys file from the nna node to each node's ~/.ssh/ directory via scp. The command is as follows:

  # here we use the nns node as an example
  scp ~/.ssh/authorized_keys hadoop@nns:~/.ssh/
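Since wrong permission bits are the most common reason passwordless login silently fails, it is worth setting them explicitly on every node. A minimal sketch (the mkdir and touch only ensure the commands also work before any key has been appended):

```shell
# Ensure ~/.ssh and authorized_keys carry the permissions sshd requires.
mkdir -p ~/.ssh && touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh                    # directory: owner only
chmod 600 ~/.ssh/authorized_keys    # file: owner read/write only
ls -ld ~/.ssh ~/.ssh/authorized_keys
```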

Then, use the ssh command to log on to each node from the others and check whether passwordless login works. The logon command is as follows:

  # here we use the nns node as an example
  ssh nns

If you are not prompted for a password during logon, passwordless login is configured successfully.

2.5 Disable the firewall

Because hadoop nodes need to communicate with each other (via the RPC mechanism), the corresponding ports must be reachable. Here I simply disable the firewall. The command is as follows:

  chkconfig iptables off

Note: for production environments, disabling the firewall outright poses a security risk. Instead, configure the firewall's filtering rules so that the ports hadoop listens on are covered by the firewall's accept rules. For more information about firewall rule configuration, see "linux firewall configuration", or ask the company's O&M personnel to help with configuration management.

You also need to disable SELinux: modify the /etc/selinux/config file and change SELINUX=enforcing to SELINUX=disabled.

2.6 Modify the time zone

If the nodes' clocks are not synchronized, startup exceptions or other problems may occur. Set the time zone to Shanghai on every node. The commands are as follows:

  # cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
  cp: overwrite '/etc/localtime'? yes
  # change to UTC+8 (China)
  # vi /etc/sysconfig/clock
  ZONE="Asia/Shanghai"
  UTC=false
  ARC=false

 

2.7 ZK (installation, startup, verification)

2.7.1 Installation

Extract the downloaded package to the desired location. The command is as follows:

  tar -zxvf zookeeper-3.4.6.tar.gz

Then modify the zk configuration: in the zk installation directory, rename conf/zoo_sample.cfg to conf/zoo.cfg and modify its contents:

  # The number of milliseconds of each tick
  # basic time unit (ms) for interaction between server and client
  tickTime=2000

  # The number of ticks that the initial
  # synchronization phase can take
  initLimit=10

  # The number of ticks that can pass between
  # sending a request and getting an acknowledgement
  syncLimit=5

  # The directory where the snapshot is stored.
  # do not use /tmp for storage, /tmp here is just
  # example sakes.
  # path where zookeeper saves its data and logs
  dataDir=/home/hadoop/data/zookeeper

  # the port at which the clients will connect
  clientPort=2181

  server.1=dn1:2888:3888
  server.2=dn2:2888:3888
  server.3=dn3:2888:3888

  # server.A=B:C:D
  # A is a number identifying the server; B is the server's IP address (or hostname);
  # C is the port the server uses to exchange information with the cluster leader;
  # D is the port the servers use to communicate with each other during leader election.

Next, create a myid file under the configured dataDir and write a single number into it. This number is different on each zk node: the numbers start at 1 and are assigned to the servers in sequence, and the number in the file must match the server id in the zk configuration for that dn node. For example, given server.1=dn1:2888:3888, the myid file on dn1 must contain 1.
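To make this concrete, here is a sketch for dn1; on dn2 and dn3 write 2 and 3 instead. The dataDir must match the value in zoo.cfg; the /tmp default below is only so the snippet can be tried safely outside the cluster:

```shell
# Write dn1's server id into the myid file under dataDir
# (matching server.1=dn1:2888:3888 in zoo.cfg).
dataDir="${dataDir:-/tmp/zookeeper-demo}"   # on a real node: /home/hadoop/data/zookeeper
mkdir -p "$dataDir"
echo 1 > "$dataDir/myid"
cat "$dataDir/myid"
```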

2.7.2 Startup

Run the following command to start the zk process on each dn node:

  bin/zkServer.sh start

Then enter the jps command on each node, and the following process should appear:

  QuorumPeerMain
2.7.3 Verification

If the jps command above shows the corresponding process, the startup was successful. You can also check zk's status. The command is as follows:

  bin/zkServer.sh status

One leader and two followers will appear.

2.8 Structure of HDFS + HA

The structure of HDFS configuration HA is as follows:

The general architecture includes:

1. Shared storage is used to synchronize the edits log between the two NNs. Previously, HDFS shared nothing except the NN; now the NNs share storage, which in effect just moves the single point of failure. However, high-end storage devices come with RAID and redundant hardware of all kinds, including power supplies and NICs, making them somewhat more reliable than a server. Data consistency is ensured by a flush after each metadata change in the NN, together with NFS close-to-open semantics.

2. The DNs report block information to both NNs at the same time. This is required to keep the Standby NN up to date.

3. A FailoverController process monitors and controls the NN process. Obviously, we cannot rely on heartbeats and other signals generated inside the NN process itself; the simplest reason is that a full GC can stall the NN for more than ten minutes. So there must be an independent, lightweight watchdog dedicated to monitoring. This is also a loosely coupled design that is easy to extend or modify: currently ZooKeeper (ZK) is used for the synchronization lock, but you can easily replace the ZooKeeper FailoverController (ZKFC) with another HA or leader-election scheme.

4. Fencing prevents split-brain, that is, it guarantees there is only one active NN at any time. It covers three aspects:

Shared storage fencing ensures that only one NN can write the edits log.

Client fencing ensures that only one NN can respond to client requests.

DN fencing ensures that only one NN can send commands to the DNs, such as deleting or copying blocks.

2.9 Role Assignment

  Name    Host            Responsibilities
  NNA     10.211.55.12    zkfc
  NNS     10.211.55.13    zkfc
  DN1     10.211.55.14    zookeeper
  DN2     10.211.55.15    zookeeper
  DN3     10.211.55.16    zookeeper

2.10 Environment variable configuration

All environment variables are listed here; the components configured later refer back to this configuration. After the configuration is complete, run source /etc/profile to make it take effect immediately. To check whether the environment variables are configured successfully, enter echo $HADOOP_HOME; if the corresponding configuration path is printed, the configuration succeeded.

Note: in hadoop 2.x and later, the conf folder has been replaced by the etc folder.

The configuration is as follows:

  export JAVA_HOME=/usr/java/jdk1.7
  export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
  export ZK_HOME=/home/hadoop/zookeeper-3.4.6
  export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZK_HOME/bin
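To confirm the variables took effect, echo one of them. The block below restates the exports so it is self-contained; on a real node you would simply run source /etc/profile first:

```shell
# Values as configured in /etc/profile above.
export JAVA_HOME=/usr/java/jdk1.7
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export ZK_HOME=/home/hadoop/zookeeper-3.4.6
echo "$HADOOP_HOME"   # should print /home/hadoop/hadoop-2.6.0
```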


For more details, please continue to read the highlights on the next page.

