The NameNode in an HDFS cluster is a single point of failure (SPOF): in a cluster with only one NameNode, an unexpected outage of the NameNode machine makes the entire cluster unavailable until the NameNode is restarted. HDFS HA provides a hot standby for the NameNode by configuring an Active/Standby pair of NameNodes. If the Active NN goes down, the system switches over to the Standby so that the NameNode service is uninterrupted. HDFS HA depends on ZooKeeper. The following is the test process.
The environment is as follows:
HOST: debugo0[1-3], CentOS 6.5
Hadoop 2.4.1
ZooKeeper 3.4.6
Host     | HDFS                      | ZooKeeper
debugo01 | NN, ZKFC, JournalNode, DN | Server
debugo02 | NN, ZKFC, JournalNode, DN | Server
debugo03 | NN, JournalNode, DN       | Server
1. Start ZooKeeper
Edit the ZooKeeper configuration file:
$ mkdir -p /home/hadoop/zookeeper /home/hadoop/log/zoolog
$ cd $ZOOKEEPER_HOME/conf
$ cp zoo_sample.cfg zoo.cfg
$ vim zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/zookeeper
dataLogDir=/home/hadoop/log/zoolog
clientPort=2181
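The zoo.cfg above does not list the ensemble members. For a replicated three-node ensemble, ZooKeeper normally also needs one server.N line per node, where N matches that node's myid; a minimal sketch, assuming the default peer and election ports 2888 and 3888:
server.1=debugo01:2888:3888
server.2=debugo02:2888:3888
server.3=debugo03:2888:3888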
Copy the configuration file to the other two nodes, create a myid file on each, and start ZooKeeper (a sketch of the copy step follows after the output below):
$ echo "1" > /home/hadoop/zookeeper/myid
$ zkServer.sh start
...
$ zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader
2. Modify Hadoop Configuration
In core-site.xml, ha.zookeeper.quorum must point to the ZooKeeper server nodes. In addition, fs.defaultFS must be set to the logical service name of HDFS (it must match dfs.nameservices in hdfs-site.xml).
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://myhdfs</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.logfile.size</name>
    <value>104857600</value>
  </property>
  <property>
    <name>hadoop.logfile.count</name>
    <value>10</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>debugo01,debugo02,debugo03</value>
  </property>
</configuration>
hdfs-site.xml needs several additional settings:
dfs.nameservices
-- The logical name of the HDFS nameservice; use the myhdfs set above
dfs.ha.namenodes.myhdfs
-- The list of NameNode IDs under the nameservice myhdfs
dfs.namenode.rpc-address.myhdfs.nn1
-- The RPC address on which the nn1 node of myhdfs serves clients
dfs.namenode.http-address.myhdfs.nn1
-- The HTTP address on which the nn1 node of myhdfs serves its web UI
dfs.namenode.shared.edits.dir
-- The URI of the group of JournalNodes. The active NN writes the edit log to these JournalNodes, while the standby NameNode reads the edit log from them and applies it to its in-memory directory tree. Multiple JournalNodes are separated by semicolons, and the value must follow the format qjournal://host1:port1;host2:port2;host3:port3/journalId
dfs.journalnode.edits.dir
-- A local directory on each JournalNode host, used to store edit logs and other state.
dfs.ha.automatic-failover.enabled
-- Enables automatic failover. Automatic failover depends on the ZooKeeper cluster and on the ZKFailoverController (ZKFC), a ZooKeeper client that monitors the NN state. Every node running a NN must also run a ZKFC. ZKFC provides the following functions:
Health monitoring: ZKFC periodically runs a health-check command against the local NN. If the NN responds correctly, it is considered healthy; otherwise the node is marked as failed.
ZooKeeper session management: while the local NN is healthy, ZKFC keeps a session open in ZooKeeper. If the local NN is the active one, ZKFC also holds an ephemeral znode as a lock. Once the local NN fails, the session expires and the znode is deleted automatically.
ZooKeeper-based election: if the local NN is healthy and ZKFC sees that no other NN currently holds the lock znode, it tries to acquire the lock. If it succeeds, it performs a failover and the local NN becomes active. The failover proceeds in two steps: first, fence the previous active NN if necessary; second, transition the local NN to the active state.
ZKFC can be started manually with hadoop-daemon.sh start zkfc, but it is started automatically by start-dfs.sh, so manual start and stop are normally unnecessary.
dfs.client.failover.proxy.provider.myhdfs
-- The Java class through which clients talk to the active NameNode; the DFS client uses it to find the current active NN. (The suffix must match the nameservice name, here myhdfs.)
dfs.ha.fencing.methods
-- Addresses the split-brain problem of an HA cluster (two NameNodes acting as master at the same time, leaving the system inconsistent). In HDFS HA, the JournalNodes only allow one NameNode to write, so two NameNodes cannot both commit edits.
However, during an active/standby switch the previously active NameNode may still be serving RPC requests from clients, so a fencing mechanism is needed to cut off the previous active NameNode. The common fencing method is sshfence; it requires dfs.ha.fencing.ssh.private-key-files, the private key used for the ssh connection, plus a connection timeout (see the quick ssh check after the configuration below).
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>myhdfs</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.myhdfs</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhdfs.nn1</name>
    <value>debugo01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhdfs.nn2</name>
    <value>debugo02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhdfs.nn1</name>
    <value>debugo01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhdfs.nn2</name>
    <value>debugo02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://debugo01:8485;debugo02:8485;debugo03:8485/hadoop-journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journal</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.myhdfs</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <description>how to communicate in the switch process</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
    <description>the location of the stored ssh key</description>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>5000</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>8</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
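Because sshfence logs in to the other NameNode host over ssh to kill the stale process, it is worth confirming beforehand that key-based ssh works between the two NN hosts with the key configured above. A minimal check (the hadoop user is an assumption based on the paths used in this setup):
$ ssh -i /home/hadoop/.ssh/id_rsa hadoop@debugo02 'echo fencing-ssh-ok'
fencing-ssh-ok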
3. Start NameNode HA
Initialize the ZKFC state znode in ZooKeeper:
$ mkdir /home/hadoop/journal /home/hadoop/data /home/hadoop/namenode
$ hdfs zkfc -formatZK
14/09/13 21:17:03 INFO zookeeper.ClientCnxn: Opening socket connection to server debugo02/192.168.46.202:2181. Will not attempt to authenticate using SASL (unknown error)
14/09/13 21:17:03 INFO zookeeper.ClientCnxn: Socket connection established to debugo02/192.168.46.202:2181, initiating session
14/09/13 21:17:03 INFO zookeeper.ClientCnxn: Session establishment complete on server debugo02/192.168.46.202:2181, sessionid = 0x2487208163e0000, negotiated timeout = 5000
14/09/13 21:17:03 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/myhdfs in ZK.
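As an optional sanity check (not part of the original procedure), the znode created by -formatZK can be inspected from any node with the ZooKeeper CLI:
$ zkCli.sh -server debugo01:2181
[zk: debugo01:2181(CONNECTED) 0] ls /hadoop-ha
[myhdfs]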
Next, format HDFS for the first time. During formatting, the NameNode communicates with the JournalNodes, so the JournalNodes on all three nodes must be started first:
$ hdfs journalnode
$ hdfs namenode -format
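Note that hdfs journalnode runs in the foreground. An equivalent sketch that starts the JournalNodes as background daemons on all three hosts before formatting on debugo01 (assuming passwordless ssh and $HADOOP_HOME set on each host):
$ for h in debugo01 debugo02 debugo03; do ssh $h '$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode'; done
$ hdfs namenode -format    # on debugo01 only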
Start all services directly through start-dfs.sh
$ start-dfs.sh
Starting namenodes on [debugo01 debugo02]
debugo01: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-debugo01.out
debugo02: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-debugo02.out
debugo01: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-debugo01.out
debugo02: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-debugo02.out
debugo03: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-debugo03.out
Starting journal nodes [debugo01 debugo02 debugo03]
debugo01: starting journalnode, logging to /opt/hadoop/logs/hadoop-hadoop-journalnode-debugo01.out
debugo03: starting journalnode, logging to /opt/hadoop/logs/hadoop-hadoop-journalnode-debugo03.out
debugo02: starting journalnode, logging to /opt/hadoop/logs/hadoop-hadoop-journalnode-debugo02.out
Starting ZK Failover Controllers on NN hosts [debugo01 debugo02]
debugo01: starting zkfc, logging to /opt/hadoop/logs/hadoop-hadoop-zkfc-debugo01.out
debugo02: starting zkfc, logging to /opt/hadoop/logs/hadoop-hadoop-zkfc-debugo02.out
$ jps
11562 Jps
11031 NameNode
11494 DFSZKFailoverController
11324 JournalNode
11136 DataNode
7657 QuorumPeerMain
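Besides the browser check that follows, the HA state can also be queried from the command line using the NameNode IDs configured above; one should report active, and the other standby (or fail to respond until it has been bootstrapped in the next step):
$ hdfs haadmin -getServiceState nn1
$ hdfs haadmin -getServiceState nn2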
Access debugo01:50070 in a browser; this node is in the active state. The NameNode that starts first becomes active. In the standby NameNode's log you can see periodic scans like the following:
2014-09-13 21:25:46,132 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
2014-09-13 21:25:46,132 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning because of pending operations
2014-09-13 21:25:46,132 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
...
The metadata then needs to be synchronized once on the standby NameNode:
$ hdfs namenode -bootstrapStandby
......
About to bootstrap Standby ID nn1 from:
  Nameservice ID: myhdfs
  Other Namenode ID: nn2
  Other NN's HTTP address: http://debugo02:50070
  Other NN's IPC address: debugo02/192.168.46.202:8020
  Namespace ID: 863538584
  Block pool ID: BP-351445905-192.168.46.202-1410670136650
  Cluster ID: CID-c98eb846-66b5-4663-9a35-a091eb1718d1
  Layout version: -56
=====================================================
Re-format filesystem in Storage Directory /home/hadoop/namenode ? (Y or N) Y
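If the NameNode process on the bootstrapped host is not running at this point, it can be started again and its state checked; a sketch using the standard Hadoop scripts:
$ hadoop-daemon.sh start namenode
# the bootstrapped NameNode (Standby ID nn1 in the output above) should now report standby:
$ hdfs haadmin -getServiceState nn1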
Access the other NameNode's web UI in a browser; it is in the standby state. Kill the active NN process on debugo01, and the standby NN becomes active.
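A concrete way to run this failover test, reusing the pid reported by jps earlier (kill -9 simulates a crash; which ID ends up active depends on which NameNode was active before the kill):
$ jps | grep NameNode
11031 NameNode
$ kill -9 11031
# after a few seconds, check the surviving NameNode's state:
$ hdfs haadmin -getServiceState nn2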
Note: a manual switchover produces the warning below, so when ZKFC is running no manual switchover is needed.
$ hdfs haadmin -transitionToActive nn1
Automatic failover is enabled for NameNode at debugo01/192.168.46.201:8020
Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please specify the forcemanual flag.
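If a controlled switch is ever needed while automatic failover is enabled, the coordinated form goes through the ZKFCs rather than forcing the state directly; to the best of my understanding of the haadmin tool (verify against your Hadoop version):
$ hdfs haadmin -failover nn1 nn2    # asks the ZKFCs to hand the active role from nn1 to nn2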
Reference
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-High-Availability-Guide/cdh4hag_topic_2_3.html
http://blog.csdn.net/u010967382/article/details/30976935
http://blog.csdn.net/chenpingbupt/article/details/7922089