Hadoop2 HA Introduction

Source: Internet
Author: User
Tags: failover

This article describes the principle of the HA mechanism and the Hadoop2 HA configuration process.

The Principle of the HA Mechanism

In HA there are two NameNodes: the active NameNode and the standby NameNode. The active NN is the primary node, and the standby NN is a backup of the primary that can be switched in when the active NN crashes. Metadata is synchronized between the active NN and the standby NN through the third-party JournalNode processes. If the active NN crashes, the standby NameNode can be switched to active manually, or the switch can happen automatically through the ZooKeeper service, as the sketch below illustrates.
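
A minimal sketch of checking NameNode state and forcing a manual failover with the stock hdfs haadmin tool (the NameNode names hadoop1 and hadoop2 match the configuration later in this article):

# check which NameNode is currently active
hdfs haadmin -getServiceState hadoop1
hdfs haadmin -getServiceState hadoop2

# manually fail over from hadoop1 to hadoop2 (hadoop1 is fenced if needed)
hdfs haadmin -failover hadoop1 hadoop2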

Hadoop2 Architecture

There is a reason for the appearance of Hadoop2. We know that the NameNode is the core node that maintains all the metadata in HDFS, so its capacity is limited by the memory of the server it runs on. When the NameNode server's memory can no longer hold the metadata, the HDFS cluster can no longer take in data, and its life is over; its scalability is therefore limited. HDFS Federation means running multiple HDFS namespaces at the same time, so in theory capacity is unlimited, or, to exaggerate a little, infinitely expandable. You can think of it as carving two or more independent small clusters out of one physical cluster, with the data shared between the small clusters in real time, because a NameNode and its DataNodes no longer stand alone in a Hadoop cluster. When one of the small clusters fails, the NameNode in another small cluster can be started and work continues. Because the data is shared in real time, even if a NameNode and a DataNode die together, the normal operation of the whole cluster is not affected. A rough configuration sketch is shown below.
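
As an illustrative sketch only (these values are not part of the configuration later in this article), a federated deployment is declared in hdfs-site.xml by listing several nameservices, each of which then gets its own NameNodes and address properties:

<!-- illustrative: two independent nameservices sharing the same DataNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>cluster1,cluster2</value>
</property>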

Hadoop2 HA Configuration

1. File hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- number of replicas of each block stored on the DataNodes -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>cluster1</value>
  </property>
  <!-- name of the HDFS nameservice -->
  <property>
    <name>dfs.ha.namenodes.cluster1</name>
    <value>hadoop1,hadoop2</value>
  </property>
  <!-- the NameNodes of nameservice cluster1 -->
  <property>
    <name>dfs.namenode.rpc-address.cluster1.hadoop1</name>
    <value>hadoop1:9000</value>
  </property>
  <!-- RPC address of hadoop1 -->
  <property>
    <name>dfs.namenode.http-address.cluster1.hadoop1</name>
    <value>hadoop1:50070</value>
  </property>
  <!-- HTTP address of hadoop1 -->
  <property>
    <name>dfs.namenode.rpc-address.cluster1.hadoop2</name>
    <value>hadoop2:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster1.hadoop2</name>
    <value>hadoop2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.cluster1.hadoop1</name>
    <value>hadoop1:53310</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.cluster1.hadoop2</name>
    <value>hadoop2:53310</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled.cluster1</name>
    <value>true</value>
  </property>
  <!-- enable automatic failover for cluster1 -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485;hadoop4:8485;hadoop5:8485/cluster1</value>
  </property>
  <!-- JournalNode cluster holding the edits shared by the two cluster1 NameNodes -->
  <property>
    <name>dfs.client.failover.proxy.provider.cluster1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- class responsible for performing failover when the active NameNode fails -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/muzili/yarn/yarn_data/tmp/journal</value>
  </property>
  <!-- local disk path where each JournalNode stores its data -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/muzili/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>10000</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
</configuration>
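
With hdfs-site.xml distributed to all nodes, a typical first-time start-up sequence for this HA pair is sketched below (run each step on the node named in the comment; hostnames follow the configuration above):

# on hadoop1..hadoop5: start the JournalNodes first
hadoop-daemon.sh start journalnode

# on hadoop1: format HDFS and start the first NameNode
hdfs namenode -format
hadoop-daemon.sh start namenode

# on hadoop2: copy the formatted metadata over and start the standby NameNode
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode

# on hadoop1: initialize the failover state in ZooKeeper,
# then start a zkfc daemon on both NameNode hosts
hdfs zkfc -formatZK
hadoop-daemon.sh start zkfc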

2. File mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- run MapReduce on YARN, unlike Hadoop1 -->
</configuration>
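
To confirm that jobs really run on YARN rather than the Hadoop1 JobTracker, one of the bundled examples can be submitted (the jar path assumes the hadoop-2.2.0 layout used elsewhere in this article):

# the job should show up in the ResourceManager web UI on hadoop1:8088
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 10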

3. File yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <!-- ResourceManager address; note this is still a single point -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- shuffle service name; hadoop-2.2.0 expects mapreduce_shuffle -->
</configuration>
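
Once yarn-site.xml is in place, YARN can be brought up and given a quick sanity check (hadoop1 is the ResourceManager host from the configuration above):

# on hadoop1: start the ResourceManager and the NodeManagers
start-yarn.sh

# list the NodeManagers that have registered
yarn node -list
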
4. Add Environment Variables

Environment variables are added in much the same way on every node; the following configuration is for reference only.

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51
export PATH=$PATH:$JAVA_HOME/bin
export HBASE_HOME=/home/muzili/hadoop-2.2.0/app/hbase-0.94.6-cdh4.4.0
export HIVE_HOME=/home/muzili/hadoop-2.2.0/app/hive-0.12.0
export HADOOP_HOME=/home/muzili/hadoop-2.2.0
export PATH=$PATH:$HBASE_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export ZOOKEEPER_HOME=/home/muzili/yarn/hadoop-2.2.0/app/zookeeper-3.4.5
export PATH=$PATH:$ZOOKEEPER_HOME/bin
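
Assuming the variables above were appended to ~/.bashrc (any login script works), they can be loaded and verified as follows:

source ~/.bashrc
hadoop version    # should report Hadoop 2.2.0
echo $HADOOP_HOME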

Summary

By introducing the standby NameNode, HA solves the HDFS single point of failure that existed in Hadoop1. Interested readers can refer to this blog for the HA configuration and installation steps.
