Configuring HDFS HA in CDH with Shell Scripts


I recently installed a Hadoop cluster and configured HDFS HA. CDH4 supports two HA scenarios: Quorum-based Storage (QJM) and shared storage over NFS; CDH5 supports only the first, the QJM-based scenario.

For installing and deploying the Hadoop cluster itself, refer to the earlier posts on installing a CDH Hadoop cluster with yum and on installing a Hadoop cluster manually.

Cluster Planning

I installed a three-node cluster. For this HA setup, the nodes run the following services:

cdh1: hadoop-hdfs-namenode (primary), hadoop-hdfs-journalnode, hadoop-hdfs-zkfc
cdh2: hadoop-hdfs-namenode (standby), hadoop-hdfs-journalnode, hadoop-hdfs-zkfc
cdh3: hadoop-hdfs-journalnode

Install the corresponding services on each node according to the plan above.

Installation Steps

Shut Down the Cluster

Stop all services on the cluster.

$ sh /opt/cmd.sh 'for x in `ls /etc/init.d/ | grep spark`; do service $x stop; done'
$ sh /opt/cmd.sh 'for x in `ls /etc/init.d/ | grep impala`; do service $x stop; done'
$ sh /opt/cmd.sh 'for x in `ls /etc/init.d/ | grep hive`; do service $x stop; done'
$ sh /opt/cmd.sh 'for x in `ls /etc/init.d/ | grep hbase`; do service $x stop; done'
$ sh /opt/cmd.sh 'for x in `ls /etc/init.d/ | grep hadoop`; do service $x stop; done'

The content of cmd.sh is given as /opt/shell/cmd.sh in the earlier post summarizing Hadoop cluster deployment permissions; a sketch is shown below.
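
The original script is not reproduced in this post; a minimal sketch of such a run-everywhere helper, assuming passwordless ssh as root to each node and the three-node list from the plan above, might look like this:

#!/bin/bash
# /opt/shell/cmd.sh (sketch): run the given command on every node in the cluster.
# The node list is illustrative; adjust it to your cluster.
for node in cdh1 cdh2 cdh3; do
    echo "==== $node ===="
    ssh "$node" "$1"
done

For example, sh /opt/cmd.sh 'date' would print the current date on every node.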

Stop Client Programs

Stop all client programs that use the cluster, including scheduled tasks.

Back Up HDFS Metadata

a. Find the locally configured metadata directory (the property named dfs.name.dir, dfs.namenode.name.dir, or hadoop.tmp.dir):

grep -C1 hadoop.tmp.dir /etc/hadoop/conf/hdfs-site.xml

# or
grep -C1 dfs.namenode.name.dir /etc/hadoop/conf/hdfs-site.xml

With the above command, you can see information similar to the following:

<property>
<name>hadoop.tmp.dir</name>
<value>/data/dfs/nn</value>
</property>

b. Back up the HDFS metadata:

cd /data/dfs/nn
tar -cvf /root/nn_backup_data.tar .
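
As a quick sanity check (not in the original post), list the archive to confirm the backup is readable:

$ tar -tf /root/nn_backup_data.tar | head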
Install Services

Install hadoop-hdfs-journalnode on cdh1, cdh2, and cdh3:

$ ssh cdh1 'yum install hadoop-hdfs-journalnode -y'
$ ssh cdh2 'yum install hadoop-hdfs-journalnode -y'
$ ssh cdh3 'yum install hadoop-hdfs-journalnode -y'

Install hadoop-hdfs-zkfc on cdh1 and cdh2:

$ ssh cdh1 'yum install hadoop-hdfs-zkfc -y'
$ ssh cdh2 'yum install hadoop-hdfs-zkfc -y'

Modify the Configuration Files

Modify /etc/hadoop/conf/core-site.xml as follows:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster:8020</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>cdh1:21088,cdh2:21088,cdh3:21088</value>
</property>

Modify /etc/hadoop/conf/hdfs-site.xml: remove the original NameNode configuration and add the following:

<!-- Hadoop HA -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>cdh1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>cdh2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>cdh1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>cdh2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://cdh1:8485;cdh2:8485;cdh3:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/dfs/jn</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence(hdfs)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
Synchronize Configuration Files

Synchronize the configuration files to the other nodes in the cluster:

$ sh /opt/syn.sh /etc/hadoop/conf /etc/hadoop/
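
Like cmd.sh, syn.sh is not reproduced in the original post; a minimal rsync-based sketch, assuming the same three-node list, could be:

#!/bin/bash
# /opt/syn.sh (sketch): push a local file or directory to the same path on all nodes.
# Usage: sh /opt/syn.sh <source-path> <destination-dir>
SRC=$1
DEST=$2
for node in cdh1 cdh2 cdh3; do
    # rsyncing to the local node is harmless, so no special-casing is needed
    rsync -avz "$SRC" "$node:$DEST"
done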

Create the JournalNode edits directory on all three nodes:

$ ssh cdh1 'mkdir -p /data/dfs/jn; chown -R hdfs:hdfs /data/dfs/jn'
$ ssh cdh2 'mkdir -p /data/dfs/jn; chown -R hdfs:hdfs /data/dfs/jn'
$ ssh cdh3 'mkdir -p /data/dfs/jn; chown -R hdfs:hdfs /data/dfs/jn'

Configure Passwordless Login

Configure passwordless login for the hdfs user between the two NameNodes:

For cdh1:

$ passwd hdfs
$ su - hdfs
$ ssh-keygen
$ ssh-copy-id cdh2

For cdh2:

$ passwd hdfs
$ su - hdfs
$ ssh-keygen
$ ssh-copy-id cdh1
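
As a quick check (not in the original post), the hdfs user on each NameNode should now reach the other node without a password prompt:

$ ssh cdh2 date    # run as hdfs on cdh1; run 'ssh cdh1 date' as hdfs on cdh2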
Start the JournalNodes

Start the hadoop-hdfs-journalnode service on cdh1, cdh2, and cdh3:

$ ssh cdh1 'service hadoop-hdfs-journalnode start'
$ ssh cdh2 'service hadoop-hdfs-journalnode start'
$ ssh cdh3 'service hadoop-hdfs-journalnode start'
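
You can confirm each JournalNode came up (a verification step not in the original post; it reuses cmd.sh from above):

$ sh /opt/cmd.sh 'service hadoop-hdfs-journalnode status'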
Initialize Shared Storage

Initialize the shared storage on the primary NameNode. (If the NameNode has not been formatted yet, run hdfs namenode -format first.)

$ hdfs namenode -initializeSharedEdits

Start the NameNode:

$ service hadoop-hdfs-namenode start
Synchronize the Standby NameNode

cdh2 acts as the standby NameNode; first install the NameNode service on that node:

$ yum install hadoop-hdfs-namenode -y

Then run:

$ sudo -u hdfs hadoop namenode -bootstrapStandby

If Kerberos is used, obtain a ticket for the hdfs principal before executing the command:

$ kinit -k -t /etc/hadoop/conf/hdfs.keytab hdfs/cdh1@javachem.com
$ hadoop namenode -bootstrapStandby

Then start the standby NameNode:

$ service hadoop-hdfs-namenode start
Configure Automatic Failover

Install hadoop-hdfs-zkfc on the two NameNodes, cdh1 and cdh2:

$ ssh cdh1 'yum install hadoop-hdfs-zkfc -y'
$ ssh cdh2 'yum install hadoop-hdfs-zkfc -y'

On either NameNode, run the following command to create the znode used for automatic failover:

$ hdfs zkfc -formatZK
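
To verify the znode was created (a quick check, not in the original post; /hadoop-ha is the default parent znode), connect with the ZooKeeper CLI:

$ /usr/lib/zookeeper/bin/zkCli.sh -server cdh1:21088
[zk: cdh1:21088(CONNECTED) 0] ls /hadoop-ha
[mycluster]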

If you want to secure access to ZooKeeper, refer to the "Securing access to ZooKeeper" section of the Enabling HDFS HA documentation; the gist is sketched below.
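
Securing ZKFC's access to ZooKeeper boils down to pointing two core-site.xml properties at credential files (a sketch based on the Apache HDFS HA documentation; the file paths are illustrative):

<property>
  <name>ha.zookeeper.auth</name>
  <value>@/etc/hadoop/conf/zk-auth.txt</value>
</property>
<property>
  <name>ha.zookeeper.acl</name>
  <value>@/etc/hadoop/conf/zk-acl.txt</value>
</property>

Here zk-auth.txt holds a digest credential (for example, digest:hdfs-zkfcs:yourpassword) and zk-acl.txt the matching ACL; rerun hdfs zkfc -formatZK after changing them.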

Then start ZKFC on both NameNode nodes:

$ ssh cdh1 "service HADOOP-HDFS-ZKFC start"
$ ssh cdh2 "service HADOOP-HDFS-ZKFC start"
Test

Visit http://cdh1:50070/ and http://cdh2:50070/ to see which NameNode is active and which is standby.

To view the state of a NameNode from the command line:

# check cdh1's state
$ sudo -u hdfs hdfs haadmin -getServiceState nn1
active

# check cdh2's state
$ sudo -u hdfs hdfs haadmin -getServiceState nn2
standby

To perform a manual switchover:

$ sudo -u hdfs hdfs haadmin -failover nn1 nn2
Failover to NameNode at cdh2/192.168.56.122:8020 successful

Visit http://cdh1:50070/ and http://cdh2:50070/ again to confirm that the active and standby roles have switched.

Configure HBase HA

First stop HBase, then modify /etc/hbase/conf/hbase-site.xml as follows:

<!-- Configure HBase to use the HA NameNode nameservice -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster:8020/hbase</value>
</property>

Run /usr/lib/zookeeper/bin/zkCli.sh on a ZooKeeper node and remove the stale split-log znodes:

$ ls /hbase/splitlogs
$ rmr /hbase/splitlogs

Finally, start the HBase service.

Configure Hive HA

Run the following commands to update the Hive metastore's root location to the HDFS nameservice:

$ /usr/lib/hive/bin/metatool -listFSRoot
Initializing HiveMetaTool.
Listing FS Roots.
hdfs://cdh1:8020/user/hive/warehouse

$ /usr/lib/hive/bin/metatool -updateLocation hdfs://mycluster hdfs://cdh1 -tablePropKey avro.schema.url -serdePropKey schema.url

$ metatool -listFSRoot
Initializing HiveMetaTool.
Listing FS Roots.
hdfs://mycluster:8020/user/hive/warehouse
Configure Impala

Impala requires no changes, but keep in mind that the fs.defaultFS value in core-site.xml must carry the port number, which is 8020 in CDH (see the snippet below).
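
For reference, this is the value already configured in core-site.xml earlier:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster:8020</value>
</property>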

Configure YARN

Not used here for now; refer to MapReduce (MRv1) and YARN (MRv2) High Availability for details.

Configure Hue

Not used here for now; refer to Hue High Availability for details.

Configure Llama

Not used here for now; refer to Llama High Availability for details.
Original article: JavaChen Blog, http://blog.javachen.com/2014/07/18/install-hdfs-ha-in-cdh.html
