HDFS HA Introduction and Configuration

1. HDFS HA Introduction

Compared to HDFS in Hadoop 1.0, Hadoop 2.0 added two significant features: HA and Federation. HA (High Availability) addresses the NameNode single point of failure by keeping a hot standby for the active NameNode; if the active NameNode fails, the cluster can quickly switch to the standby and continue serving clients without interruption. Federation allows multiple NameNodes in one HDFS cluster to serve concurrently: each NameNode manages part of the namespace (a horizontal partition), the NameNodes are isolated from one another, but they share the underlying DataNode storage resources.

A typical HDFS HA deployment consists of two NameNodes, one in the active state and the other in standby. The active NameNode serves clients, for example by handling their RPC requests, while the standby NameNode serves no clients at all; it only synchronizes the active NameNode's state so that it can take over quickly when the active fails.

To synchronize the metadata of the active and standby NameNodes in real time (in practice, the edit log), a shared storage system is used; it can be NFS, QJM (Quorum Journal Manager), or BookKeeper. The active NameNode writes its edits to the shared storage, and the standby watches it: whenever new edits appear, the standby reads them and applies them to its own in-memory state. This keeps the standby's memory state essentially consistent with the active NameNode's, so that in an emergency the standby can be promoted to active quickly.

Prior to Hadoop 0.23.2, the NameNode was a single point of failure in an HDFS cluster: each cluster had exactly one NameNode, and if that machine or process became unavailable, the entire cluster was unavailable until the NameNode was restarted or started on another node. Two scenarios mainly affect HDFS cluster availability:

1) An unplanned event, such as a machine crash, makes the cluster unavailable until the NameNode is restarted.

2) A planned software or hardware upgrade of the NameNode machine makes the cluster unavailable for a short window.

HDFS HA addresses these problems with two redundant NameNodes in an active/standby configuration, the hot standby running in the same cluster. This allows a fast failover to the other NameNode when a machine crashes, or a graceful, administrator-initiated transition during planned maintenance.

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one NameNode is active and the other is standby. The active NameNode handles all client operations in the cluster, while the standby simply acts as a slave, maintaining enough state to provide a fast failover if necessary.

To keep the standby node synchronized with the active node, the current implementation requires that both nodes have access to a directory on a shared storage device (for example, an NFS mount from a NAS); this restriction may be relaxed in a future release. When the active node modifies the namespace, it writes a record of the change to an edit log file in the shared directory. The standby node watches this directory and, as it sees new changes, applies them to its own namespace. During a failover, the standby guarantees that it has read all of the edits from the shared directory before promoting itself, which ensures its state exactly matches the active node's state as of the failure.

To provide a fast failover, the standby node must also hold up-to-date block location information for the cluster. To achieve this, each DataNode is configured with the location of both NameNodes and sends block location information and heartbeats to both.

It is vital for the correct operation of an HA cluster that only one of the NameNodes is active at a time. Otherwise the state of the two nodes would diverge, risking data loss or other incorrect results; this is the so-called "split-brain scenario". To prevent it, the administrator must configure at least one fencing method for the shared storage. During a failover, if it cannot be verified that the previously active node has given up its active state, the fencing process is responsible for cutting off that node's access to the shared edits storage. This prevents it from making any further modifications to the namespace, allowing the new active node to take over safely.

Note: Currently, only manual failover is supported. The HA NameNodes cannot automatically detect a failure of the active NameNode; instead, an operator must initiate the failover by hand. Automatic failure detection and automatic failover will be implemented in future releases.

2. HA Deployment Hardware Resources

To deploy an HA cluster environment, you need to prepare the following resources:

1) NameNode machines: the machines that run the active and standby NameNodes should have hardware equivalent to what would be used for the NameNode in a non-HA environment.

2) Shared storage: a shared directory that both NameNode machines can read and write, typically a remote filer that supports NFS and is mounted on each NameNode machine. Currently only a single shared edits directory is supported, so the availability of the system is limited by the availability of this directory. To eliminate all single points of failure, the shared directory needs redundancy: specifically, multiple network paths to the storage, and redundancy in the storage system itself. For this reason, the shared storage server should be a high-quality dedicated NAS appliance rather than a simple Linux server.
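
As an illustrative sketch only (the filer hostname and export path are assumptions, not values from this guide), the shared directory might be NFS-mounted at the same local path on both NameNode machines:

# Run on each NameNode machine; filer1.example.com:/export/dfs is a placeholder
mount -t nfs filer1.example.com:/export/dfs /mnt/filer1/dfs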

Note: In an HA cluster, the standby NameNode also performs checkpoints of the namespace state, so there is no need to run a Secondary NameNode, CheckpointNode, or BackupNode; in fact, doing so is an error. This also allows the hardware that previously ran the Secondary NameNode to be reused when reconfiguring a non-HA cluster to be HA-enabled.

3. Introduction to HA Deployment Configuration

Like the Federation configuration, the HA configuration is backwards compatible and allows an existing single-NameNode configuration to keep working unchanged. The new configuration scheme is designed so that all nodes in the cluster share the same configuration files, with no need to deploy different files to different nodes.

As with Federation, an HA cluster reuses the nameservice ID to identify a single HDFS instance, which may in fact consist of multiple HA NameNodes. In addition, HA introduces the NameNode ID: each NameNode in the cluster has a distinct ID that identifies it. So that all NameNodes can share a single configuration file, the relevant configuration parameters are suffixed with the nameservice ID and the NameNode ID. To configure the HA NameNodes, you add a few configuration options to the hdfs-site.xml configuration file.

The order in which you set these configuration options is not important, but the values you choose for dfs.federation.nameservices and dfs.ha.namenodes.[nameservice ID] determine the keys of the options that follow. Therefore, decide on the values of these two options before configuring the rest.

1) dfs.federation.nameservices: the logical name of the new nameservice. Choose a logical name for the nameservice, for example "mycluster", and use that name as the value of this option. The name is arbitrary; it will be used both in configuration and as the authority component of absolute HDFS paths in the cluster.

Note: If you are also using HDFS Federation, this configuration item should include the list of all nameservices, HA or otherwise, separated by commas.

<property>
  <name>dfs.federation.nameservices</name>
  <value>mycluster</value>
</property>

2) dfs.ha.namenodes.[nameservice ID]: a unique identifier for each NameNode in the nameservice. Configure a comma-separated list of NameNode IDs; DataNodes use this list to determine all of the NameNodes in the cluster. Continuing the example, with "mycluster" as the nameservice ID and "nn1" and "nn2" as the NameNode IDs, you would configure:

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>

Note: Currently, at most two NameNodes may be configured per nameservice.

3) dfs.namenode.rpc-address.[nameservice ID].[name node ID]: the fully-qualified RPC address each NameNode listens on. For each of the two NameNode IDs configured above, set the full address and port of that NameNode. Note that this results in two separate configuration options, for example:

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>

Note: If you wish, you may similarly configure the servicerpc-address setting.

4) dfs.namenode.http-address.[nameservice ID].[NameNode ID]: the fully-qualified HTTP address each NameNode listens on. As with rpc-address above, set the HTTP listen address for both NameNodes, for example:

<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>machine1.example.com:50070</value>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>machine2.example.com:50070</value>
</property>

Note: If you have Hadoop's security features enabled, you should also set the HTTPS address similarly for each NameNode.

5) dfs.namenode.shared.edits.dir: the location of the shared storage directory. This is the remote shared directory in which the standby node keeps up with all the changes made by the active node. You may configure only one of these directories; it must be mounted read-write on both NameNode machines, and its value must be an absolute path. For example:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/filer1/dfs/ha-name-dir-shared</value>
</property>

6) dfs.client.failover.proxy.provider.[nameservice ID]: the Java class that HDFS clients use to contact the active NameNode. This class is used by the DFS client to determine which NameNode is currently active and is therefore the one handling client requests. The only implementation that currently ships with Hadoop is ConfiguredFailoverProxyProvider, so use this class unless you write one of your own. For example:

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

7) dfs.ha.fencing.methods: a list of scripts or Java classes that will be used to fence the active NameNode during a failover. It is vital for the correctness of the HA cluster that only one NameNode be active at any time, so during a failover, before transitioning the standby node to active, we must first ensure that the active node is either back in the standby state or that its process has been terminated. For this purpose you must configure at least one fencing method, given as a carriage-return-separated list; the methods are attempted in order until one of them reports success, indicating that the active node has been fenced. Hadoop ships with two methods, shell and sshfence; to implement your own, see the org.apache.hadoop.ha.NodeFencer class.

sshfence: SSH to the active NameNode's host and kill the process. For the SSH login and kill to work, you must also configure a passphrase-less SSH private key, as follows:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>
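
As a minimal sketch of the passphrase-less key setup referenced above (the user and host names are taken from the examples in this guide; adjust them for your environment):

# On each NameNode machine, as the user that runs the NameNode:
# generate a key pair with an empty passphrase ...
ssh-keygen -t rsa -N "" -f /home/exampleuser/.ssh/id_rsa
# ... and install the public key on the other NameNode machine
ssh-copy-id -i /home/exampleuser/.ssh/id_rsa.pub exampleuser@machine2.example.com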

Optionally, you can specify a non-standard username and SSH port. You can also set a timeout for the SSH connection, in milliseconds, after which this fencing method is considered to have failed. These can be configured like this:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence([[username][:port]])</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>

shell: run an arbitrary shell command to fence the active NameNode. It is configured as follows:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
</property>

The shell command runs in an environment that contains all of the Hadoop configuration variables, with the '.' characters in the configuration keys replaced by '_' characters, e.g. dfs_namenode_rpc-address. In addition, the following variables describing the target node to be fenced are available:

$target_host: hostname of the node to be fenced
$target_port: IPC port of the node to be fenced
$target_address: the two above, combined as host:port
$target_nameserviceid: the nameservice ID of the NameNode to be fenced
$target_namenodeid: the NameNode ID of the NameNode to be fenced

These variables may also be used in the shell command itself, for example:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>

If the shell command returns an exit code of 0, the fence is considered successful; any other return value indicates failure, and the next fencing method in the list will be attempted.

Note: This fencing method implements no timeout of its own; any timeouts must be implemented in the shell script itself.
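
To make the variable substitution concrete, here is a purely illustrative sketch of such a fencing script; it is not part of Hadoop, and the PID file location is an assumption:

#!/bin/sh
# Kill the NameNode process on the host being fenced. $target_host is
# supplied by Hadoop; the PID file path below is only a placeholder.
ssh "$target_host" 'kill -9 "$(cat /tmp/hadoop-namenode.pid)"' && exit 0
# A non-zero exit code tells Hadoop the fence failed, so the next
# method in dfs.ha.fencing.methods will be attempted.
exit 1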

8) fs.defaultFS: the default path prefix used by the Hadoop FS client when none is given. Optionally, you may configure the logical URI of the HA cluster as the default path for Hadoop clients. With "mycluster" as the nameservice ID, this is configured in the core-site.xml file:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
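
With this set, clients address the cluster by its logical name instead of a specific NameNode host. For example (a usage sketch based on the configuration above; the path is hypothetical):

# Paths without a scheme resolve against fs.defaultFS (hdfs://mycluster)
hdfs dfs -ls /user/exampleuser
# The logical URI can also be given explicitly; the configured proxy
# provider transparently directs the request to the active NameNode
hdfs dfs -ls hdfs://mycluster/user/exampleuser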

4. HA Deployment Details

After all of the configuration options have been set, you must initially synchronize the two HA NameNodes' on-disk metadata. If you are setting up a fresh HDFS cluster, first run the format command (hdfs namenode -format) on one of the NameNodes. If you have already formatted the NameNode, or are converting a non-HA cluster to an HA cluster, copy the contents of the NameNode metadata directories to the other, unformatted NameNode using scp or a similar utility; the locations of these directories are configured with the dfs.namenode.name.dir and dfs.namenode.edits.dir options. At this point you must also ensure that the shared edits directory configured earlier contains all of the recent edits files from your NameNode metadata directory. After that, you can start both HA NameNodes just as you would normally start a NameNode.
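
Summarizing the above as a command sketch (the metadata path below is a placeholder for whatever your dfs.namenode.name.dir actually points to):

# Fresh cluster only: format the namespace on the first NameNode
hdfs namenode -format
# Converting an existing cluster: copy the metadata directory to the
# other NameNode (run on machine2; /data/dfs/name is a placeholder)
scp -r machine1.example.com:/data/dfs/name /data/dfs/name
# Then start the NameNode daemon on both machines as usual
hadoop-daemon.sh start namenode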

You can visit each NameNode's configured HTTP address to check its current HA state (active or standby). Whenever a NameNode starts, its initial state is standby.

5. HA Management Commands

Now that your HA NameNodes are configured and started, you have access to some additional commands for administering your HA cluster. Specifically, you should familiarize yourself with the subcommands of hdfs haadmin. Running it without any arguments displays the following usage information:

Usage: DFSHAAdmin [-ns <nameserviceId>]
    [-transitionToActive <serviceId>]
    [-transitionToStandby <serviceId>]
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]

This guide describes the high-level use of each of these subcommands. For specific usage information for each subcommand, run:

hdfs haadmin -help <command>

transitionToActive and transitionToStandby: transition the state of the given NameNode to active or to standby, respectively.

Note: These commands cause the given NameNode to transition to the active or standby state, but they make no attempt to fence the other NameNode first, so they should rarely be used; prefer hdfs haadmin -failover instead.
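
If you do need a direct transition, using the NameNode IDs configured earlier it would look like this:

# Make nn1 active and nn2 standby; no fencing of the other node is attempted
hdfs haadmin -transitionToActive nn1
hdfs haadmin -transitionToStandby nn2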

failover: initiate a failover between two NameNodes. This subcommand fails over from the first NameNode given to the second. If the first NameNode is already in the standby state, the command simply transitions the second to the active state without error. If the first NameNode is active, an attempt is made to gracefully transition it to standby; if that fails, the fencing methods configured in dfs.ha.fencing.methods are tried in order until one succeeds. Only after this process does the second NameNode transition to the active state. If no fencing method succeeds, the second NameNode is not transitioned to active, and an error is returned.
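
For example, to fail over from nn1 to nn2, fencing nn1 first if it cannot be transitioned gracefully:

# Transition nn1 to standby (fencing it if needed), then nn2 to active
hdfs haadmin -failover nn1 nn2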

getServiceState: determine whether the given NameNode is active or standby. Connects to the given NameNode, determines its current state, and prints either "active" or "standby". This subcommand might be used by cron jobs or monitoring scripts that need to behave differently depending on the NameNode's current state.

checkHealth: check the health of the given NameNode. Connects to the given NameNode and asks it to check its own health; the NameNode is able to perform some diagnostics on itself, including checking that its internal services are running as expected. The command returns 0 if the NameNode is healthy and non-zero otherwise. This command is intended for monitoring purposes.

Note: checkHealth is not yet fully implemented; at present it always returns success unless the NameNode is completely down.
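
A minimal monitoring sketch built on getServiceState (illustrative only; for example, run from cron):

#!/bin/sh
# Print a warning when nn1 is not the active NameNode
state=$(hdfs haadmin -getServiceState nn1)
if [ "$state" != "active" ]; then
    echo "WARNING: nn1 is in state '$state', expected 'active'"
fi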
