1. Background of HDFS Federation
In Hadoop 1.0, the single-NameNode design of HDFS causes many problems: a single point of failure (SPOF), memory limits, restricted cluster scalability, and a lack of isolation (different businesses sharing the same NameNode affect each other). To solve these problems, Hadoop 2.0 introduces an HA solution based on shared storage, as well as HDFS Federation; this article focuses on HDFS Federation.
HDFS Federation means that one HDFS cluster can contain multiple NameNodes at the same time. Each NameNode manages part of the data independently, while all of them share the storage resources of all DataNodes. This design solves the following problems of a single NameNode:
(1) Scalability. Multiple NameNodes, each managing part of the namespace, allow a cluster to scale out to more nodes; the number of files that can be stored is no longer limited by a single NameNode's memory, as it was in 1.0.
(2) Higher performance. Multiple NameNodes manage different data and serve clients at the same time, giving users higher aggregate read/write throughput.
(3) Better isolation. Data for different businesses can be assigned to different NameNodes as needed, so different businesses have little impact on each other.
Note that HDFS Federation does not solve the single point of failure: each NameNode is still a SPOF, and you need to deploy a standby NameNode for each one to shield the business from the impact of a NameNode going down.
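For example, once the two nameservices configured below (cloud-1 and cloud-2) are running, a client addresses each namespace through its own nameservice URI, while both namespaces share the same DataNodes. A minimal sketch, assuming the configuration from this article:
hadoop fs -ls hdfs://cloud-1/    # namespace managed by the cloud-1 NameNodes
hadoop fs -ls hdfs://cloud-2/    # namespace managed by the cloud-2 NameNodes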
2. Installation environment
Since HDFS Federation still leaves each NameNode as a single point of failure, we build HA and HDFS Federation together. The node roles are assigned as follows:
| Host Name | IP Address | NameNode (Active) | SecondaryNameNode | NameNode (Standby) | JournalNode | DataNode | Zookeeper | Nameservice |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| centos94 | 192.168.1.94 | Y | Y | | | Y | | cloud-1 |
| centos105 | 192.168.1.105 | | Y | Y | | Y | | cloud-1 |
| centos95 | 192.168.1.95 | Y | Y | | Y | Y | Y | cloud-2 |
| centos112 | 192.168.1.112 | | Y | Y | Y | Y | Y | cloud-2 |
| centos111 | 192.168.1.111 | | | | Y | Y | Y | |
Software versions:
Hadoop: hadoop-2.2.0.tar.gz (recompiled from source to run on 64-bit systems)
Zookeeper: zookeeper-3.4.6.tar.gz
For details on preparing the installation environment, see the Hadoop, HBase, and Hive integrated installation documents.
The main parameters are as follows:
HA + Federation: hdfs-site.xml, common part for all nodes
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/admin/hadoop-2.2.0/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/admin/hadoop-2.2.0/dfs/data</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>cloud-1,cloud-2</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.ha.namenodes.cloud-1</name>
  <value>centos94,centos105</value>
</property>
<property>
  <name>dfs.ha.namenodes.cloud-2</name>
  <value>centos95,centos112</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cloud-1.centos94</name>
  <value>centos94:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cloud-1.centos94</name>
  <value>centos94:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cloud-1.centos105</name>
  <value>centos105:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cloud-1.centos105</name>
  <value>centos105:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cloud-2.centos95</name>
  <value>centos95:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cloud-2.centos95</name>
  <value>centos95:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cloud-2.centos112</name>
  <value>centos112:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cloud-2.centos112</name>
  <value>centos112:50070</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/admin/hadoop-2.2.0/tmp/journal</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/admin/.ssh/id_rsa</value>
</property>
Configurations that differ between cloud-1 and cloud-2:
cloud-1
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://centos95:8485;centos111:8485;centos112:8485/cloud-1</value>
  <description>The edits directory shared by the two NameNodes of cloud-1, maintained by the JournalNode cluster.</description>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.cloud-1</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cloud-1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
cloud-2
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://centos95:8485;centos111:8485;centos112:8485/cloud-2</value>
  <description>The edits directory shared by the two NameNodes of cloud-2, maintained by the JournalNode cluster.</description>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.cloud-2</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cloud-2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
Configure core-site.xml (all nodes)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cloud-1</value>
    <description>The default HDFS path. Use cloud-1 on the centos94 and centos105 nodes, and cloud-2 on the centos95 and centos112 nodes.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/admin/hadoop-2.2.0/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>centos95:2181,centos111:2181,centos112:2181</value>
    <description>The Zookeeper cluster.</description>
  </property>
</configuration>
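As the description above notes, on centos95 and centos112 the default filesystem points at cloud-2 instead; a sketch of the only property that differs:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cloud-2</value>
</property>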
Configure slaves:
vi slaves
centos94
centos95
centos111
centos112
centos105
Configure mapred-site.xml (all nodes)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Configure yarn-site.xml (all nodes)
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Startup:
1. Start Zookeeper
Run the following command on centos95, centos111, and centos112:
bin/zkServer.sh start
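Optionally, you can confirm that the quorum is healthy (one leader, two followers) by running on each of the three nodes:
bin/zkServer.sh status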
2. Start the JournalNodes
Run the following command on centos95, centos111, and centos112:
sbin/hadoop-daemon.sh start journalnode
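You can verify that the daemon came up on each of the three nodes with jps:
jps    # the output should include a JournalNode process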
3. Initialize the HA state in Zookeeper (only needed the first time)
Run the following command on one NameNode of each nameservice (for example, centos94 for cloud-1 and centos95 for cloud-2):
bin/hdfs zkfc -formatZK
When building the Federation environment, you must keep the ${cluster_id} value consistent so that all NameNodes in the same cluster can share all of its resources. The way to do this is to format the first NameNode, note the ${cluster_id} value it produces, and then format each remaining NameNode with the following command:
hdfs namenode -format -clusterId ${cluster_id}
4. In cloud-1, run the following commands on the centos94 node:
bin/hdfs namenode -format -clusterId hadoop    (the cluster ID is either a name you specify yourself or one generated by the cluster)
sbin/hadoop-daemon.sh start namenode
The cluster ID "hadoop" specified here is shared by the entire cluster; it ensures that the two nameservices can share all of the DataNodes. Otherwise the two nameservices would end up with inconsistent clusterIDs, and each DataNode would randomly register with a different NameNode.
On the centos105 node (the standby NameNode), synchronize the metadata from the primary NameNode:
bin/hdfs namenode -bootstrapStandby
Start the standby NameNode:
sbin/hadoop-daemon.sh start namenode
Start zkfc on centos94 and centos105:
sbin/hadoop-daemon.sh start zkfc
After these commands complete, one of centos94 and centos105 will become active.
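You can check which NameNode is currently active with hdfs haadmin; here -ns selects the nameservice, and the NameNode IDs are the ones configured in dfs.ha.namenodes.cloud-1:
bin/hdfs haadmin -ns cloud-1 -getServiceState centos94
bin/hdfs haadmin -ns cloud-1 -getServiceState centos105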
5. In cloud-2, run the format on the centos95 node:
bin/hdfs namenode -format -clusterId hadoop
sbin/hadoop-daemon.sh start namenode
On the centos112 node (the standby NameNode), synchronize the metadata from the primary NameNode:
bin/hdfs namenode -bootstrapStandby
Start the standby NameNode:
sbin/hadoop-daemon.sh start namenode
Start zkfc on centos95 and centos112:
sbin/hadoop-daemon.sh start zkfc
6. Start all DataNodes
Run the following command on the active NameNode node:
sbin/hadoop-daemons.sh start datanode
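Once the DataNodes are up, a quick sanity check is:
bin/hdfs dfsadmin -report
All five DataNodes should be listed as live; because the DataNodes are shared, this should hold no matter which nameservice the command runs against.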
7. Effect after startup
We can see that the clusterIDs of the four NameNodes are consistent.
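The clusterID can also be checked from the shell: it is recorded in the VERSION file under the metadata directory configured in dfs.namenode.name.dir. A sketch, assuming the paths from this article:
cat /home/admin/hadoop-2.2.0/dfs/name/current/VERSION    # on a NameNode; look for the clusterID line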
8. Start YARN
Run the following command on centos94:
sbin/start-yarn.sh
9. Shut down the cluster
Run the following commands on the master node where the ResourceManager and NameNode are located:
Stop YARN:
sbin/stop-yarn.sh
Stop HDFS:
sbin/stop-dfs.sh
Stop Zookeeper:
bin/zkServer.sh stop
10. Summary
Problem 1: after formatting, the clusterIDs of the two nameservices are inconsistent.
Solution: delete the tmp and dfs directories on all nodes, then reformat and start again.
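A sketch of that cleanup, assuming the directory layout used in this article (run on every node before reformatting):
rm -rf /home/admin/hadoop-2.2.0/tmp
rm -rf /home/admin/hadoop-2.2.0/dfs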
11. Installing an HBase cluster on a Hadoop cluster with Federation and HA configured
When you configure HBase clusters against a federation, each federated namespace gets its own HBase cluster: several namespaces means several HBase clusters, running isolated and in parallel. You need to place the Hadoop hdfs-site.xml file of each federated namespace under HBase's conf directory, and then set hbase.rootdir to that namespace's prefix, as sketched below.
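For example, a sketch of the hbase-site.xml entry for the HBase cluster bound to cloud-1 (the /hbase directory name here is just a conventional choice, not mandated by this setup):
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://cloud-1/hbase</value>
</property>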
12. Converting the NameNode from non-HA to HA
After converting the NameNode from non-HA to HA, clear /hbase/splitWAL. Perform the following steps on a Zookeeper node:
1. Run /usr/lib/zookeeper/bin/zkCli.sh
2. ls /hbase/splitWAL; if the znode exists, go to step 3
3. rmr /hbase/splitWAL
4. Restart HBase
Also put Hadoop's hdfs-site.xml and core-site.xml under hbase/conf, then restart HBase.
13. Handling HMaster startup failure in traditional (master-slave) Hadoop mode (when no data needs to be preserved)
1. Run ~/zookeeper/bin/zkCli.sh
2. ls /hbase/splitWAL; if the znode exists, go to step 3
3. rmr /hbase/splitWAL
4. Restart HBase