Hadoop 2.0 NameNode HA and Federation practices


This article is partially adapted from "Hadoop 2.0 NameNode HA and Federation Practice" and is part of a detailed tutorial on automatic HA + Federation + YARN configuration in Hadoop 2.

Contents

I. How to implement HA in Hadoop 2.0
    1.1 Using shared storage to synchronize edits information between the two NNs
    1.2 DataNodes (DN) report block information to both NNs simultaneously
    1.3 FailoverController processes monitor and control the NN processes
    1.4 Isolation (fencing)
II. How Federation is implemented in Hadoop 2.0
    2.1 Federation work steps
    2.2 Federation advantages
III. Test environment
    3.1 HDFS HA configured separately
    3.2 HDFS Federation configured separately
    3.3 HDFS Federation and HA configured together
IV. HA and Federation start-up process
    4.1 Distribution of roles across hosts
    4.2 Start-up process
    4.3 Verify that HDFS works
V. HA test scenarios and results
    5.1 System failure
    5.2 Client connection
    5.3 Test results
    5.4 Extended tests
VI. HA recommended configuration and others
    6.1 HA recommended configuration
    6.2 Client retry mechanism
VII. Outstanding issues

I. How to implement HA in Hadoop 2.0

For an overview of the HDFS HA solution in Hadoop 2.0, see "Hadoop 2.0 NameNode HA and Federation Practice". Two HA schemes are currently available in HDFS 2: one based on NFS shared storage, and one based on the Paxos-like Quorum Journal Manager (QJM). The basic principle of QJM is to use 2n+1 JournalNodes to store the edit log; a write operation is considered successful once a majority (>= n+1) of them acknowledge it, so data is not lost as long as a quorum survives. The community is also trying to use BookKeeper as the shared storage system.
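For reference, when the QJM scheme is used instead of NFS, the shared edits location in hdfs-site.xml is expressed as a qjournal:// URI listing the JournalNodes, and each JournalNode needs a local directory for the edits it stores. A minimal sketch (the JournalNode hostnames and the local path are hypothetical; the tests later in this article use the NFS variant instead):

<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://host-jn1:8485;host-jn2:8485;host-jn3:8485/ns1</value>
    <description>Shared edits stored on a quorum of JournalNodes instead of NFS</description>
</property>

<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/journal</value>
    <description>Local directory where each JournalNode keeps its copy of the edits</description>
</property>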

The HDFS HA framework proposed in HDFS-1623 is composed as follows:

1.1 Using shared storage to synchronize edits information between the two NNs

HDFS used to be "share nothing but NN"; now the NNs share storage as well. This really just moves the single point of failure, but since high-end storage devices have various forms of RAID and redundant hardware (power supplies, network cards, and so on), reliability is somewhat better than relying on a single server. Data consistency is ensured by flushing after every metadata change inside the NN, together with the close-to-open semantics of NFS. The community is also trying to put the metadata store on BookKeeper to remove the dependency on shared storage, and Cloudera provides a Quorum Journal Manager implementation and code.

1.2 DataNodes (hereinafter referred to as DN) report block information to both NNs simultaneously

This is the necessary step to keep the standby NN's view of the cluster up to date; it needs no further explanation.

1.3 FailoverController processes monitor and control the NN processes

Obviously, we cannot handle heartbeats and other synchronization inside the NN process itself; the simplest reason is that a full GC can suspend the NN for more than ten minutes, so there must be a separate, lightweight watchdog dedicated to monitoring it. This is also a loosely coupled design that is easy to extend or change: the current version uses ZooKeeper (hereinafter referred to as ZK) for the synchronization lock, but users can conveniently replace the ZooKeeper FailoverController (hereinafter referred to as ZKFC) with another HA scheme or leader-election mechanism.

1.4 Isolation (fencing)

This prevents split-brain, i.e., it guarantees that there is only one master NN at any time. It covers three aspects:

Shared storage fencing: ensure that only one NN can write edits.
Client fencing: ensure that only one NN can respond to client requests.
DataNode fencing: ensure that only one NN can send commands to the DNs, such as deleting or copying blocks.


II. How Federation is implemented in Hadoop 2.0

2.1 Federation work steps

Multiple NNs share the storage resources on the DNs of a single cluster, and each NN can serve clients independently

Each NN defines a storage pool with a separate ID and each DN provides storage for all storage pools

A DN reports block information to the corresponding NN according to the storage pool ID, and reports its locally available storage resources to all NNs

If a client needs convenient access to resources on several NNs, it can use a client mount table to map different directories to different NNs, but the corresponding directories must already exist on the NNs.

2.2 Federation advantages

1. Minimal changes and compatibility with existing deployments: an existing NN keeps working without any configuration changes, and if an existing client connects to only one NN, neither its code nor its configuration needs to change.

2. Separating namespace management from block storage management provides good extensibility and allows other file systems or applications to use the block storage pools directly; unified block storage management keeps resource utilization high; and a degree of file access isolation can be achieved through firewall configuration alone, without complex Kerberos authentication.

3. The client mount table automatically maps paths to NNs, making Federation configuration changes transparent to applications.


III. Test environment

The above is an introduction to HA and Federation; for friends who are already familiar with HDFS, it should be enough to understand the architecture and implementation quickly. If you need more detail, read the design documents or the code. The main purpose of this article is to summarize our test results, so the real content starts here.

To understand the HA and Federation configuration thoroughly, we went straight to the following test scenario, which combines HA and Federation:


There is a concept in this figure that was not mentioned earlier: the nameservice. In Hadoop 2.0 the NN is abstracted, and the unit of service is no longer the NN itself but the nameservice (hereinafter referred to as NS). A Federation is made up of multiple NSs, and each NS is made up of one or two (HA) NNs. The test configuration below gives more concrete examples.

In the figure, DN-1 to DN-6 are six DataNodes and NN-1 to NN-4 are four NameNodes, forming two HA NSs, which are then combined through Federation to serve clients. Storage Pool 1 and Storage Pool 2 correspond to these two NSs respectively. On the client we have a mount table mapping /share to NS1 and /user to NS2; the mapping specifies not only the NS but also a directory on it, as you will see in the configuration below.

Let's look at what needs to change in the configuration files. For ease of understanding, we first introduce HA and Federation separately, and then describe how to configure HA and Federation together. First, the HA configuration:

3.1 HDFS HA configured separately

For all nodes in the HA cluster, including NNs, DNs, and clients, the following changes need to be made:

1. HA, all nodes, hdfs-site.xml

<property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
    <description>The NS logical name of the service provided, corresponding to core-site.xml</description>
</property>

<property>
    <name>dfs.ha.namenodes.${NS_ID}</name>
    <value>nn1,nn3</value>
    <description>Lists the NameNode logical names under this NS logical name</description>
</property>

<property>
    <name>dfs.namenode.rpc-address.${NS_ID}.${NN_ID}</name>
    <value>host-nn1:9000</value>
    <description>Specifies the RPC address of the NameNode</description>
</property>

<property>
    <name>dfs.namenode.http-address.${NS_ID}.${NN_ID}</name>
    <value>host-nn1:50070</value>
    <description>Specifies the web server address of the NameNode</description>
</property>

In the example above, ${} denotes a variable; expanded, the configuration looks roughly like this:

<property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn3</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>host-nn1:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>host-nn1:50070</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.ns1.nn3</name>
    <value>host-nn3:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.ns1.nn3</name>
    <value>host-nn3:50070</value>
</property>

In addition, the NameNodes and clients in the HA cluster need the following configuration changes:


2. HA, NameNode, hdfs-site.xml

<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>file:///nfs/ha-edits</value>
    <description>Specifies the shared storage where HA keeps the edits, typically an NFS mount point</description>
</property>

<property>
    <name>ha.zookeeper.quorum</name>
    <value>host-zk1:2181,host-zk2:2181,host-zk3:2181</value>
    <description>Specifies the list of ZooKeeper cluster machines used for HA</description>
</property>

<property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>5000</value>
    <description>Specifies the ZooKeeper session timeout, in milliseconds</description>
</property>

<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
    <description>Specifies the HA fencing method; the default is sshfence, but it can also be set to a shell command (see the sketch below)</description>
</property>
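The sshfence method usually also needs an SSH private key so that the ZKFC can log in to the other NN and kill it, and a shell command can be listed as a fallback fencing method. A minimal sketch, with hypothetical key and script paths:

<property>
    <name>dfs.ha.fencing.methods</name>
    <value>
        sshfence
        shell(/path/to/custom_fence.sh)
    </value>
    <description>Methods are tried in order until one succeeds; shell(...) runs an arbitrary command</description>
</property>

<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hdfs/.ssh/id_rsa</value>
    <description>Private key used by sshfence (hypothetical path)</description>
</property>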


3. HA, client, hdfs-site.xml

<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Whether automatic failover is enabled; true or false</description>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.${NS_ID}</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>Specifies the proxy class the client uses for HA failover; different NSs can use different proxy classes. The value above is the default proxy class shipped with Hadoop 2.0</description>
</property>

Finally, to facilitate the use of relative paths rather than using hdfs://ns1 as a prefix for file paths each time, we also need to modify core-site.xml on each of the role nodes:


4. HA, all nodes, core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
    <description>The default file service protocol and NS logical name, matching the configuration in hdfs-site.xml; this replaces fs.default.name from 1.0</description>
</property>
3.2 HDFS Federation configured separately

Let's look at what to do if we use Federation alone. Here we assume there is no HA; instead, NN1 and NN2 make up the Federation cluster, with corresponding NS logical names ns1 and ns2. For ease of understanding, we start with the core-site.xml and the mount table used by the client:

1. Federation, all nodes, core-site.xml

<xi:include href="cmt.xml"/>
<property>
    <name>fs.defaultFS</name>
    <value>viewfs://nsX</value>
    <description>The NS logical name under which the whole Federation cluster provides service. Note that the scheme here is no longer hdfs but the newly introduced viewfs; this logical name is used in the mount table below</description>
</property>

In the core-site.xml above we included a cmt.xml file, the client mount table, which maps virtual paths to a specific NS and a physical subdirectory on it; for example, /share maps to /real_share on ns1 and /user maps to /real_user on ns2, as shown in the following example:


2. Federation, all nodes, cmt.xml

<configuration>
    <property>
        <name>fs.viewfs.mounttable.nsX.link./share</name>
        <value>hdfs://ns1/real_share</value>
    </property>
    <property>
        <name>fs.viewfs.mounttable.nsX.link./user</name>
        <value>hdfs://ns2/real_user</value>
    </property>
</configuration>

Note that nsX here corresponds to the nsX in core-site.xml, and that for each NS you can create multiple virtual paths mapping to different physical paths. Meanwhile, the details of each NS need to be given in hdfs-site.xml:

<property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
    <description>The NS logical names of the services provided, corresponding to core-site.xml and cmt.xml</description>
</property>

<property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>host-nn1:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>host-nn1:50070</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>host-nn2:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>host-nn2:50070</value>
</property>

As you can see, with Federation only and no HA, the configuration names only need the ${NS_ID} suffix, and the value is the actual host name and port number; there is no further .${NN_ID} suffix.

One situation concerns the configuration on the NN itself. As you can see from the above, the physical target paths of the client mount table, such as /real_share, must be created on the NN in advance before they can be accessed through the mappings. However, if you do not give the full path and instead use "mount point + relative path", the client can only operate under the virtual directories of the mount points, and therefore cannot create the physical directory backing a mapping itself. So, to create the mount-point target directories on the NN, we must use the hdfs scheme and an absolute path on the command line:

hdfs dfs -mkdir hdfs://ns1/real_share

Not configuring viewfs:// on the NN machines and using hdfs:// there instead also works around the problem, but it is not the best solution and does not make the root cause clear.

3.3 HDFS Federation and HA configured together

Finally, we combine HA and Federation to actually build the test environment shown in the diagram at the beginning of this section. From the previous descriptions, experienced readers will have guessed that the key to the HA + Federation configuration is the combination of dfs.nameservices and dfs.ha.namenodes.${NS_ID} in hdfs-site.xml, and then listing all the NN information with names composed from ${NS_ID} and ${NN_ID}. The rest of the configuration follows the same pattern as before.

1. HA + Federation, all nodes, hdfs-site.xml

<property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
</property>

<property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn3</value>
</property>

<property>
    <name>dfs.ha.namenodes.ns2</name>
    <value>nn2,nn4</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>host-nn1:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>host-nn1:50070</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.ns1.nn3</name>
    <value>host-nn3:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.ns1.nn3</name>
    <value>host-nn3:50070</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.ns2.nn2</name>
    <value>host-nn2:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.ns2.nn2</name>
    <value>host-nn2:50070</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.ns2.nn4</name>
    <value>host-nn4:9000</value>
</property>

<property>
    <name>dfs.namenode.http-address.ns2.nn4</name>
    <value>host-nn4:50070</value>
</property>
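One detail not shown in the listing above: with HA enabled on both nameservices, the client-side failover proxy provider from section 3.1 is needed once per NS. A minimal sketch, reusing the default provider that ships with Hadoop 2.0 for both:

<property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.ns2</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>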

For items without the ${NS_ID} suffix, i.e., settings that are not NS-specific, each NN needs to be configured separately with its own value. This is especially true for the NFS location (dfs.namenode.shared.edits.dir), because different NSs must use different NFS directories for their own internal HA (unless the local mount point is the same but maps to a different directory on the NFS server side, which is a very bad practice), while settings such as the ZK location and the fencing method can actually use the same values everywhere.
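As an illustration of that point, the NNs of ns1 and the NNs of ns2 might each point dfs.namenode.shared.edits.dir at a different NFS directory in their local hdfs-site.xml; a minimal sketch with hypothetical mount paths:

<!-- hdfs-site.xml on host-nn1 and host-nn3 (ns1) -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>file:///nfs/ha-edits-ns1</value>
</property>

<!-- hdfs-site.xml on host-nn2 and host-nn4 (ns2) -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>file:///nfs/ha-edits-ns2</value>
</property>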

Beyond the configuration, cluster initialization has some extra steps. For example, to create an HA environment you need to format one NN, synchronize the data under its name.dir to the second NN, and then start the cluster (we did not test upgrading from a single NN to HA, but it should work the same way). When creating a Federation environment, care must be taken to keep ${CLUSTER_ID} consistent so that all NNs can share the storage resources of the same cluster: obtain the ${CLUSTER_ID} value after formatting the first NN, then format the other NNs with the following command:

hadoop namenode -format -clusterId ${CLUSTER_ID}

Of course, you can also specify your own ${CLUSTER_ID} value when formatting the first NN.

In an HA + Federation scenario, you format one NN in each HA pair the Federation way, making sure their ${CLUSTER_ID} values are consistent, then synchronize the metadata under name.dir to the other NN of each HA pair, and finally start the cluster.

The HDFS client and API in Hadoop 2.0 also change slightly: the command line introduces the new hdfs command, which is equivalent to the previous hadoop fs command. The API introduces a new ViewFileSystem class that can be used to read the contents of the mount table; if you do not need to read the mount table but simply want to use the file system, you can open or create files directly by path and ignore the mount table. A code example follows:

ViewFileSystem fsView = (ViewFileSystem) ViewFileSystem.get(conf);
MountPoint[] m = fsView.getMountPoints();
for (MountPoint m1 : m)
    System.out.println(m1.getSrc());

// Create a file directly via /share/test.txt.
// With the configuration above, the client automatically resolves ns1,
// then learns through the failover proxy class that NN1 is the active NN
// and communicates with it.
Path p = new Path("/share/test.txt");
FSDataOutputStream fos = fsView.create(p);


IV. HA and Federation start-up process

4.1 Distribution of roles across hosts

Host name | NameNode? | DataNode? | JournalNode? | ZooKeeper? | ZKFC?
nn-1      | Yes, belongs to cluster ns1 | No
