HDFS Federation and NameNode HA

1. Background of HDFS Federation

In Hadoop 1.0, the single-NameNode design of HDFS causes several problems: a single point of failure (SPOF), a memory-bound limit on namespace size, limited cluster scalability, and a lack of isolation (different businesses share the same NameNode and affect each other). To address these problems, Hadoop 2.0 introduces an HA solution based on shared storage as well as HDFS Federation; this article focuses on HDFS Federation.

With HDFS Federation, one HDFS cluster can run multiple NameNodes at the same time. Each NameNode manages a portion of the namespace independently, while all NameNodes share the storage provided by the DataNodes. This design solves the following problems of a single NameNode:

(1) Cluster scalability. Multiple NameNodes, each managing part of the namespace, allow a cluster to scale out to more nodes and remove the 1.0-era limit on the number of files imposed by a single NameNode's memory.

(2) Higher performance. Multiple NameNodes manage different data and serve clients concurrently, which provides higher aggregate read/write throughput.

(3) Good isolation. Data for different businesses can be assigned to different NameNodes as needed, so the businesses barely affect each other.

Note that HDFS Federation does not solve the single point of failure: each NameNode is still a SPOF, so you need to deploy a standby NameNode for each one to keep the business running when a NameNode goes down.
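For a concrete picture, the short shell sketch below shows how a client addresses the two namespaces of a federated cluster. The nameservice names cloud-1 and cloud-2 are the ones configured later in this article; the directories and file names are made up for illustration.

# Browse the namespace owned by the first pair of NameNodes
hdfs dfs -ls hdfs://cloud-1/

# Write business data into the namespace owned by the second pair;
# blocks for both namespaces land on the same shared DataNodes
hdfs dfs -mkdir -p hdfs://cloud-2/business-b
hdfs dfs -put report.csv hdfs://cloud-2/business-b/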

2. Installation Environment

Since HDFS Federation by itself still has the SPOF problem, we build HA and HDFS Federation together. The roles are assigned to the nodes as follows:

Host Name | IP Address    | Namenode (Active) | Secondarynamenode | Namenode (Standby) | Journalnode | Datanode | Zookeeper | Nameservice Group
centos94  | 192.168.1.94  | Y                 | Y                 |                    |             | Y        |           | cloud-1
centos105 | 192.168.1.105 |                   | Y                 | Y                  |             | Y        |           | cloud-1
centos95  | 192.168.1.95  | Y                 | Y                 |                    | Y           | Y        | Y         | cloud-2
centos112 | 192.168.1.112 |                   | Y                 | Y                  | Y           | Y        | Y         | cloud-2
centos111 | 192.168.1.111 |                   |                   |                    | Y           | Y        | Y         |

Software Versions:

Hadoop: hadoop-2.2.0.tar.gz (recompiled from source so that it runs on 64-bit systems)

Zookeeper: zookeeper-3.4.6.tar.gz

For details on preparing the installation environment, see the Hadoop, HBase, and Hive integrated installation documents.

The following are the main configuration parameters:

HA + Federation: the part of hdfs-site.xml that is common to all nodes

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/admin/hadoop-2.2.0/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/admin/hadoop-2.2.0/dfs/data</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>cloud-1,cloud-2</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.ha.namenodes.cloud-1</name>
  <value>centos94,centos105</value>
</property>
<property>
  <name>dfs.ha.namenodes.cloud-2</name>
  <value>centos95,centos112</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cloud-1.centos94</name>
  <value>centos94:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cloud-1.centos94</name>
  <value>centos94:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cloud-1.centos105</name>
  <value>centos105:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cloud-1.centos105</name>
  <value>centos105:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cloud-2.centos95</name>
  <value>centos95:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cloud-2.centos95</name>
  <value>centos95:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cloud-2.centos112</name>
  <value>centos112:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cloud-2.centos112</name>
  <value>centos112:50070</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/admin/hadoop-2.2.0/tmp/journal</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/admin/.ssh/id_rsa</value>
</property>

Configurations that differ between cloud-1 and cloud-2:

cloud-1:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://centos95:8485;centos111:8485;centos112:8485/cloud-1</value>
  <description>The directory in which the two NameNodes of cloud-1 share edit logs; it is maintained by the JournalNode cluster.</description>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.cloud-1</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cloud-1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

cloud-2:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://centos95:8485;centos111:8485;centos112:8485/cloud-2</value>
  <description>The directory in which the two NameNodes of cloud-2 share edit logs; it is maintained by the JournalNode cluster.</description>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.cloud-2</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cloud-2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

Configure core-site.xml (all nodes)

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cloud-1</value>
    <description>The default HDFS path. cloud-1 is used on the centos94 and centos105 nodes; cloud-2 is used on the centos95 and centos112 nodes.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/admin/hadoop-2.2.0/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>centos95:2181,centos111:2181,centos112:2181</value>
    <description>The ZooKeeper cluster.</description>
  </property>
</configuration>
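For clarity, on the cloud-2 nodes (centos95 and centos112) the only value that differs in core-site.xml is fs.defaultFS; a sketch of that entry:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cloud-2</value>
</property>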

Configure slaves

vi slaves

centos94
centos95
centos111
centos112
centos105

Configure mapred-site.xml (all nodes)

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Configure yarn-site.xml (all nodes)

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Start:

1. Start ZooKeeper

Run the following command on centos95, centos111, and centos112:

bin/zkServer.sh start
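To confirm that the quorum formed, you can also check each node's role (leader or follower) with the status subcommand:

bin/zkServer.sh status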

2. Start JournalNode

Run the following command on centos95, centos111, and centos112:

sbin/hadoop-daemon.sh start journalnode

3. Initialize the HA state in ZooKeeper (only needed the first time)

Run the following on centos95 and centos112 (the NameNode nodes):

bin/hdfs zkfc -formatZK

When building the Federation environment, you must keep the same ${cluster_id} value so that all NameNodes in the same cluster can share the DataNode resources. To do this, format the first NameNode, obtain the generated ${cluster_id}, and then format the other NameNodes with the following command:

hdfs namenode -format -clusterId ${cluster_id}
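As a sanity check (assuming the dfs.namenode.name.dir path configured above), the generated cluster ID can be read back from the first NameNode's VERSION file and reused when formatting the remaining NameNodes:

# prints a line such as: clusterID=CID-...
grep clusterID /home/admin/hadoop-2.2.0/dfs/name/current/VERSION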

4. On the centos94 node of cloud-1, run the following:

./hdfs namenode -format -clusterId hadoop    ("hadoop" is a cluster ID you specify yourself, or one generated by the cluster itself)

sbin/hadoop-daemon.sh start namenode

The cluster ID generated here is shared by the entire cluster. Make sure the two nameservices share all of the DataNodes; otherwise the cluster IDs generated for the two nameservices will be inconsistent and the DataNodes will register with different NameNodes at random.

On the centos105 node (the standby NameNode), synchronize the metadata from the primary NameNode:

bin/hdfs namenode -bootstrapStandby

Start the standby NameNode:

sbin/hadoop-daemon.sh start namenode

Start zkfc on centos94 and centos105:

sbin/hadoop-daemon.sh start zkfc

After these commands finish, one of centos94 and centos105 becomes the active NameNode.
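To verify which NameNode of cloud-1 is currently active, a check along these lines can be used (assuming the -ns option of the haadmin tool, which selects a nameservice when more than one is configured):

bin/hdfs haadmin -ns cloud-1 -getServiceState centos94
bin/hdfs haadmin -ns cloud-1 -getServiceState centos105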

5. On the centos95 node of cloud-2, execute the format:

./hdfs namenode -format -clusterId hadoop

sbin/hadoop-daemon.sh start namenode

On the centos112 node (the standby NameNode), synchronize the metadata from the primary NameNode:

bin/hdfs namenode -bootstrapStandby

Start the standby NameNode:

sbin/hadoop-daemon.sh start namenode

Start zkfc on centos95 and centos112:

sbin/hadoop-daemon.sh start zkfc

6. Start all DataNodes

Run the following on the active NameNode node: sbin/hadoop-daemons.sh start datanode

7. Effect after startup

You can see that the cluster IDs of the four NameNodes are identical.
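A rough way to check this from the shell is sketched below; the path comes from dfs.namenode.name.dir above, and the loop assumes passwordless SSH to the four NameNode hosts:

for h in centos94 centos105 centos95 centos112; do
    ssh $h "grep clusterID /home/admin/hadoop-2.2.0/dfs/name/current/VERSION"
done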

 

8. Start YARN

Run the following command on centos94:

sbin/start-yarn.sh

9. Shut Down the Cluster

Run the following commands on the master node where the ResourceManager and NameNode are located:

Stop YARN:

stop-yarn.sh

Stop HDFS:

stop-dfs.sh

Stop ZooKeeper:

zkServer.sh stop

10. Summary

Problem 1: After formatting, the cluster IDs of the two namespaces are inconsistent.

Solution: delete all of the tmp and dfs directories, then reformat and start again.
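For reference, with the directories configured earlier in this article, that cleanup would look roughly like this on every node (double-check the paths before deleting anything):

rm -rf /home/admin/hadoop-2.2.0/tmp /home/admin/hadoop-2.2.0/dfs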

11. Installing an HBase cluster on a Hadoop cluster with Federation and HA configured

When you deploy HBase on a federated cluster, each federation (nameservice) gets its own HBase cluster, and these HBase clusters run in parallel and are isolated from each other. Place the hdfs-site.xml of the corresponding federation under HBase's conf directory, and then set hbase.rootdir to a path prefixed with that federation's nameservice, as shown in the sketch below.
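As a sketch (assuming the cloud-1 nameservice from this article and a root directory named /hbase), the hbase-site.xml entry for the HBase cluster that lives on cloud-1 would look like this:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://cloud-1/hbase</value>
</property>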

12. Solution: clearing /hbase/splitwal after converting the NameNode from non-HA to HA

After converting the NameNode from non-HA to HA, clear /hbase/splitwal. Perform the following steps on a ZooKeeper node:

1. Run /usr/lib/zookeeper/bin/zkCli.sh
2. Run ls /hbase/splitwal; if the znode exists, go to step 3.
3. rmr /hbase/splitwal
4. Restart HBase

Then put Hadoop's hdfs-site.xml and core-site.xml under hbase/conf and restart HBase.

13. Handling HMaster startup failure in the traditional Hadoop (master-slave) mode (when there is no data to preserve)

1. Run ~/zookeeper/bin/zkCli.sh
2. Run ls /hbase/splitwal; if the znode exists, go to step 3.
3. rmr /hbase/splitwal
4. Restart HBase

 

 
