HDFS Federation (Hadoop 2.3)

The term "Federation" was first used by IBM for its DB2 federated database.

First-generation Hadoop HDFS:

The architecture consists of a single NameNode and multiple DataNodes.

The NameNode's functions are divided into two layers: the namespace and the block storage service.

HDFS Federation introduces multiple NameNodes (and therefore multiple namespaces).

 

This design introduces the concept of a block pool: each namespace has its own pool of blocks. Every DataNode stores blocks for all of the block pools in the cluster, but the pools are managed independently of one another. When a namespace generates a block ID, it does not need to coordinate with the other namespaces, and the failure of one NameNode does not affect the DataNodes' service to the other NameNodes.
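This independence is visible on disk: each block pool gets its own subtree under the DataNode's storage directory. A minimal sketch of the Hadoop 2.x layout (the block pool IDs below are illustrative; real IDs encode a random number, the owning NameNode's IP address, and a creation timestamp):

${dfs.datanode.data.dir}/current/
    VERSION                                  <- records the shared ClusterID
    BP-1073741825-10.0.0.1-1393000000000/    <- block pool for namespace ns1
        current/finalized/...                <- block files belonging to ns1
    BP-1073741826-10.0.0.2-1393000000001/    <- block pool for namespace ns2
        current/finalized/...                <- block files belonging to ns2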

A namespace together with its block pool forms a self-contained management unit (a "namespace volume"). When a namespace is deleted, its corresponding block pool on the DataNodes is deleted as well, and each management unit is upgraded independently when the cluster is upgraded.

A ClusterID is introduced to identify all the nodes in the cluster. The ID is generated when the first NameNode is formatted, and every other NameNode in the cluster is formatted with the same ID.
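In practice, the ID is supplied at format time. A short sketch following the hdfs namenode -format command from the Hadoop 2.x Federation guide (<cluster_id> is a placeholder; pick a value or copy the auto-generated one from the first format):

# Format the first NameNode; if -clusterId is omitted, a unique ID is generated
$HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]

# Format every additional NameNode with the SAME ID so it joins the same cluster
$HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>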

Benefits of multiple NameNodes:

1. Namespace scalability. Previously only the storage layer (the DataNodes) could be scaled horizontally; Federation lets the namespace scale out as well, reducing the memory and service pressure on any single NameNode.

2. Performance. Multiple NameNodes increase the aggregate metadata read/write throughput of the file system.

3. Isolation. Different types of applications can be assigned to different namespaces, which provides a degree of control over resource allocation.

Federation Configuration:

The Federation configuration is backward compatible: an existing single-NameNode deployment continues to work without any configuration changes. The new configuration scheme is designed so that all nodes in the cluster share the same configuration files.

The concept of a NameServiceID is introduced; it is used as a suffix on the NameNode-related configuration parameters.

Step 1: configure the dfs.nameservices property so that DataNodes can recognize all the NameNodes in the cluster.

Step 2: for each NameNode (and its Secondary NameNode), add the corresponding NameServiceID suffix to its configuration parameters.

Example:

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn-host1:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>nn-host1:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.secondaryhttp-address.ns1</name>
    <value>snn-host1:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:rpc-port</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>nn-host2:http-port</value>
  </property>
  <property>
    <name>dfs.namenode.secondaryhttp-address.ns2</name>
    <value>snn-host2:http-port</value>
  </property>

  .... Other common configuration ...
</configuration>
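Once both nameservices are running, a client can address each namespace explicitly through its NameNode's RPC endpoint. A hypothetical sketch using the placeholder hosts and ports from the example above:

# Each namespace is browsed independently of the other
hdfs dfs -ls hdfs://nn-host1:rpc-port/
hdfs dfs -ls hdfs://nn-host2:rpc-port/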

Cluster management:

Start and stop the cluster with start-dfs.sh and stop-dfs.sh.

Unlike first-generation Hadoop, these two scripts can be run from any valid node in the cluster; they start (or stop) every NameNode and DataNode according to the configuration. In first-generation Hadoop, the node that ran the startup script became the single NameNode.
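A minimal sketch, assuming the standard Hadoop 2.x layout where these scripts live under sbin:

# Run from any configured node in the cluster; every NameNode and DataNode
# listed in the configuration is started (or stopped)
$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/stop-dfs.sh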

Balancer:

Because there are now multiple NameNodes, the balancer has changed as well. Run it with the following command:

"$HADOOP_PREFIX"/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer [-policy <policy>]

The policy can be datanode (the default, which balances storage at the DataNode level) or blockpool (which balances storage at the block pool level and, by extension, at the DataNode level as well).
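For example, balancing at the block pool level would look like the following (a sketch reusing the command above; "$bin" is assumed to point at the Hadoop binaries directory):

# Balance each block pool, which also balances storage across DataNodes
"$HADOOP_PREFIX"/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer -policy blockpool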

Decommission a node:

As in previous versions, add the nodes to be decommissioned to the exclude file of each NameNode.

Step 1: distribute the exclude file to all NameNodes:

"$HADOOP_PREFIX"/bin/distributed-exclude.sh <exclude_file>

Step 2: refresh all NameNodes so they pick up the new exclude file:

"$HADOOP_PREFIX"/bin/refresh-namenodes.sh
 
Cluster Web console:

A monitoring page for the federated cluster, similar to the NameNode status page, is available at:

http://<any_nn_host:port>/dfsclusterhealth.jsp
