Hadoop 2.2.0 HA Configuration

The previous article, Distributed Configuration of Hadoop 2.2.0 on Ubuntu and CentOS, introduced the most basic configuration of Hadoop 2.2.0. Hadoop 2.2.0 provides an HA (high availability) feature for HDFS. Building on the previous article, this one describes how to configure Hadoop 2.2.0 HA.

Note:

The two NameNode machines below are named namenode1 and namenode2; namenode1 is the active NameNode and namenode2 is the standby NameNode.

There are three JournalNode machines (at least three are required): journalnode1, journalnode2, and journalnode3. (The number of JournalNodes should be odd: 3, 5, 7, and so on.)

In addition, pay attention to keeping the two NameNodes consistent: most operations performed on namenode1 must also be performed on namenode2.
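One way to keep the configuration files identical on both machines is to copy them over after every change. A minimal sketch, assuming the same Hadoop installation path on both machines and passwordless SSH for root (the remote path is a placeholder; adjust user and paths to your environment):

# run from the Hadoop installation directory on namenode1
scp etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml root@namenode2:/path/to/hadoop/etc/hadoop/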

Configuration File

The related configuration of core-site.xml and hdfs-site.xml is as follows:

1. core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop2.2.0</value>
    </property>
</configuration>
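With fs.defaultFS pointing at the logical nameservice hdfs://mycluster rather than a single host, clients resolve the active NameNode through the failover proxy configured below. As a quick sanity check (using the getconf tool shipped with Hadoop 2.x):

bin/hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://mycluster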
2. hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/dfs/data</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>namenode1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>namenode2:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>namenode1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>namenode2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://journalnode1:8485;journalnode2:8485;journalnode3:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/dfs/journal</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/root/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>6000</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>false</value>
    </property>
</configuration>
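The sshfence method only works if the node initiating a failover can SSH to the other NameNode without a password, using the private key named in dfs.ha.fencing.ssh.private-key-files. A minimal sketch of that key setup, assuming the root account as in the configuration above (adjust user and paths to your environment):

# on namenode1: generate a key pair if one does not already exist
ssh-keygen -t rsa -f /home/root/.ssh/id_rsa
# authorize the public key on namenode2; repeat in the other direction from namenode2
ssh-copy-id -i /home/root/.ssh/id_rsa.pub root@namenode2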
Restart to make the configuration take effect.

Startup Process:

1. Start journalnode on each journalnode machine

sbin/hadoop-daemon.sh start journalnode
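Run this on each of the three JournalNode machines. You can verify that the daemon came up with jps (part of the JDK) or by checking the JournalNode log under logs/:

jps
# the output should include a JournalNode entry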
2. Start namenode on the two namenode machines (namenode1 will be the active node and namenode2 the standby)

A) If this is the first startup, run the format command on namenode1:

bin/hadoop namenode -format
B) If the cluster already contains data, that is, you are converting an existing non-HA NameNode to HA rather than starting fresh, initialize the shared edits directory on namenode1 instead:

bin/hdfs namenode -initializeSharedEdits
C) Start namenode on namenode1:

sbin/hadoop-daemon.sh start namenode
D) On namenode2, bootstrap the standby NameNode by copying the metadata from namenode1:

bin/hdfs namenode -bootstrapStandby
If this fails, copy the contents of the dfs.namenode.name.dir directory on namenode1 directly to the dfs.namenode.name.dir directory on namenode2.
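A minimal sketch of that manual copy, using the directory configured in hdfs-site.xml above (assuming passwordless SSH for root; adjust user and paths to your environment):

# run on namenode1; /dfs/name is the dfs.namenode.name.dir value
scp -r /dfs/name root@namenode2:/dfs/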

Then start namenode on namenode2:

sbin/hadoop-daemon.sh start namenode
Now both namenode1 and namenode2 are started, and both are in the "standby" state.

E) run the following command on namenode1:

bin/hdfs haadmin -transitionToActive nn1
In this way, the state of namenode1 becomes "active".
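You can confirm the state of each NameNode with the haadmin tool:

bin/hdfs haadmin -getServiceState nn1
# expected output: active
bin/hdfs haadmin -getServiceState nn2
# expected output: standby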

3. Start datanode on each datanode machine

sbin/hadoop-daemon.sh start datanode
At this point, the HDFS of Hadoop 2.2.0 can be used normally, and the HA feature is available.

Check

You can view the status of the active NameNode (namenode1) and the standby NameNode (namenode2) on the following pages:

http://namenode1:50070/dfshealth.jsp

http://namenode2:50070/dfshealth.jsp

In addition, you can run common HDFS shell commands to test whether HDFS is normal.
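For example, a quick smoke test (the HDFS paths here are arbitrary):

bin/hadoop fs -mkdir /ha-test
bin/hadoop fs -put etc/hadoop/core-site.xml /ha-test/
bin/hadoop fs -cat /ha-test/core-site.xml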

HA Failover Test

Stop the NameNode process on namenode1 (to simulate a failure of namenode1); HDFS becomes unavailable.

Run the following command on namenode2:

bin/hdfs haadmin -transitionToActive nn2
After the command is run successfully, the status of namenode2 changes to "active", and HDFS returns to normal.

Alternatively, with both NameNodes running (restart the NameNode on namenode1 first if you stopped it), you can have haadmin coordinate a graceful failover:

bin/hdfs haadmin -failover nn1 nn2
At this point, namenode2 becomes "active" and namenode1 becomes "standby".
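As before, the result can be confirmed with haadmin:

bin/hdfs haadmin -getServiceState nn1
# expected output: standby
bin/hdfs haadmin -getServiceState nn2
# expected output: active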

When reprinting, please indicate the source: http://blog.csdn.net/iAm333
