HDFS Federation and HDFS High Availability Explained


  HDFS Federation
The namenode keeps a reference to every file and every data block in the filesystem in memory, which means that on a very large cluster with many files, memory becomes the bottleneck that limits how far the system can scale. HDFS Federation, introduced in the 2.x release series, allows the system to scale by adding namenodes, each of which manages a portion of the filesystem namespace. In a federated environment, each namenode maintains a namespace volume, made up of the metadata for the namespace and a block pool containing all of the data blocks for the files in that namespace. Namespace volumes are independent of each other and do not communicate; even the failure of one namenode does not affect the availability of the namespaces maintained by the other namenodes. Block pool storage, however, is not partitioned, so every datanode in the cluster registers with every namenode and stores blocks from multiple block pools.
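As a rough illustration, the sketch below shows the kind of properties that describe a federated deployment, written here as programmatic Hadoop Configuration calls purely for readability; in a real cluster they would live in hdfs-site.xml on every node. The nameservice IDs and hostnames are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;

// A minimal sketch, assuming a cluster with two namespace volumes.
// Hostnames (nn1.example.com, nn2.example.com) and nameservice IDs
// (ns1, ns2) are made up for the example.
public class FederationConfSketch {
    public static Configuration federatedConf() {
        Configuration conf = new Configuration();

        // Two independent namenodes, each owning one namespace volume.
        conf.set("dfs.nameservices", "ns1,ns2");

        // Each nameservice maps to its own namenode RPC endpoint.
        conf.set("dfs.namenode.rpc-address.ns1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.ns2", "nn2.example.com:8020");

        // Datanodes are not partitioned: every datanode reads
        // dfs.nameservices and registers with both namenodes, storing
        // blocks from both block pools.
        return conf;
    }
}
```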
    The main reason for adopting federation is simple: it quickly solves most of the problems caused by having a single namenode. The core federation design took about four months to implement. Most of the changes are in the datanodes, configuration, and tools; the namenode itself changes very little, so its original robustness is not affected and the scheme remains compatible with earlier versions of HDFS. For horizontal scaling, federation uses multiple independent namenodes/namespaces. These namenodes are federated, that is, independent of one another and needing no coordination; each manages its own part of the namespace. The datanodes serve as common block storage: each datanode registers with every namenode in the cluster, periodically sends heartbeats and block reports to all of them, and executes commands from all of them. A block pool consists of the data blocks belonging to a single namespace, and every datanode may store blocks from all of the block pools in the cluster. Each block pool is managed autonomously, so the block pools do not communicate with one another, and the failure of one namenode does not affect the other namenodes. A namespace together with its corresponding block pool is called a namespace volume; it is the basic unit of management. When a namenode/namespace is deleted, the corresponding block pool on every datanode is deleted as well, and each namespace volume is upgraded as a unit when the cluster is upgraded.
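One common way to give clients a single view over several independent namespace volumes is a client-side mount table (ViewFS). The following is only a sketch under that assumption; the mount-table name, mount points, and namenode hosts are hypothetical, and the exact property names can vary between Hadoop releases.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// A ViewFS sketch: paths under /user resolve to one namenode, paths
// under /data to another, so the federation is invisible to the client.
public class ViewFsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Route /user to the namespace served by one namenode and
        // /data to the other (hypothetical hosts and mount table name).
        conf.set("fs.viewfs.mounttable.cluster.link./user",
                 "hdfs://nn1.example.com:8020/user");
        conf.set("fs.viewfs.mounttable.cluster.link./data",
                 "hdfs://nn2.example.com:8020/data");

        // The client sees one logical filesystem; each path is resolved
        // to the namenode that owns that part of the namespace.
        FileSystem viewFs = FileSystem.get(URI.create("viewfs://cluster/"), conf);
        for (FileStatus st : viewFs.listStatus(new Path("/"))) {
            System.out.println(st.getPath());
        }
    }
}
```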
HDFS High Availability
Backing up the namenode metadata on multiple filesystems and creating checkpoints on a secondary namenode can be used together to protect against data loss, but they still do not make the filesystem highly available. The namenode remains a single point of failure (SPOF). If the namenode fails, all clients, including MapReduce jobs, are unable to read, write, or list files, because the namenode is the only place where the metadata and the file-to-block mapping are kept; the Hadoop system is unable to provide service until a new namenode is brought online. In such a case, to recover from a failed namenode, the system administrator has to start a new namenode with a copy of the filesystem metadata and configure the datanodes and clients to use it. The new namenode cannot serve requests until it has 1) loaded its namespace image into memory, 2) replayed the edit log, and 3) received enough block reports from the datanodes to leave safe mode. On a large cluster with many files and blocks, a cold start of the namenode can take 30 minutes or more. Such a long recovery time also affects routine maintenance. In practice the probability of a namenode failure is low, so careful planning for downtime is particularly important in real deployments.
Hadoop 2.x addresses these problems with support for HDFS high availability (HA). In this implementation, a pair of namenodes is configured in an active-standby arrangement. When the active namenode fails, the standby takes over its duties and continues serving client requests without a noticeable interruption. Making this work requires a few architectural changes: 1) the namenodes must share the edit log via highly available shared storage; early HA implementations required an NFS filer for this, but later releases offer more choices, such as a BookKeeper-based system built on ZooKeeper. When the standby namenode takes over, it reads the shared edit log through to the end to synchronize its state with the active namenode, and then continues reading new entries as the active namenode writes them. 2) Datanodes must send block reports to both namenodes, because the block mapping is held in a namenode's memory, not on disk. 3) Clients must use a mechanism, transparent to users, for handling namenode failover.
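To make this concrete, the sketch below shows the kind of configuration that defines a single logical nameservice backed by an active-standby pair with a shared edit log. Hostnames and the nameservice ID ("mycluster") are hypothetical; a quorum of journal nodes is shown for the shared edits directory, though an NFS filer path can be used instead.

```java
import org.apache.hadoop.conf.Configuration;

// A minimal sketch of an HA nameservice, shown as Configuration calls
// for illustration; in practice these properties live in hdfs-site.xml.
public class HaConfSketch {
    public static Configuration haConf() {
        Configuration conf = new Configuration();

        // One logical nameservice served by two namenodes.
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");

        // Highly available shared storage for the edit log: here a quorum
        // of journal nodes (hypothetical hosts); an NFS filer path is
        // another option.
        conf.set("dfs.namenode.shared.edits.dir",
                 "qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster");
        return conf;
    }
}
```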
After the active namenode fails, the standby can take over very quickly (within a few tens of seconds), because it already has the latest state in memory: the latest edit log entries and an up-to-date block mapping. The failover time observed in practice is somewhat longer (around a minute), because the system must be conservative in deciding whether the active namenode has really failed.
In the unlikely event that the standby namenode is also down when the active namenode fails, the administrator can still cold-start a standby namenode. This is no worse than the non-HA case, and from an operational point of view it is an improvement, because the procedure is a standard one built into Hadoop.
    Failover and fencing: a new entity in the system, the failover controller, manages the transition from the active namenode to the standby. Failover controllers are pluggable, but the first implementation is based on ZooKeeper, which also ensures that only one namenode is active at a time. Each namenode runs a lightweight failover controller process (DFSZKFailoverController) that monitors its namenode for failure through a simple heartbeat mechanism and triggers a failover when the namenode fails. Administrators can also initiate a failover manually, for example during routine maintenance; this is known as a graceful failover, because the failover controller arranges an orderly transition between the two namenodes.
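The sketch below shows, again as illustrative Configuration calls, the properties that typically enable the ZooKeeper-based automatic failover just described; the ZooKeeper hosts are hypothetical, and a manual graceful failover can be requested with the hdfs haadmin tool.

```java
import org.apache.hadoop.conf.Configuration;

// A sketch of the settings behind automatic failover: one
// DFSZKFailoverController process runs alongside each namenode and uses
// ZooKeeper to elect the single active namenode. A graceful failover for
// maintenance can be triggered manually, e.g. "hdfs haadmin -failover nn1 nn2".
public class AutoFailoverConfSketch {
    public static Configuration autoFailoverConf() {
        Configuration conf = new Configuration();

        // Let the failover controllers, rather than an administrator,
        // decide when the standby should take over.
        conf.set("dfs.ha.automatic-failover.enabled", "true");

        // The ZooKeeper ensemble used for leader election (hypothetical hosts).
        conf.set("ha.zookeeper.quorum",
                 "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181");
        return conf;
    }
}
```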
In the case of an ungraceful failover, however, it is impossible to be sure that the failed namenode has actually stopped running. A slow network or a network partition, for example, can trigger a failover even though the previously active namenode is still running and still believes it is the active namenode. The HA implementation goes to great lengths to ensure that the previously active namenode is prevented from doing anything harmful or causing corruption, a method known as fencing. The system employs a range of fencing mechanisms, including killing the namenode's process, revoking its access to the shared storage directory (typically with a vendor-specific NFS command), and disabling its network port via a remote management command. As a last resort, the previously active namenode can be fenced with a technique rather graphically known as STONITH (shoot the other node in the head), which uses a specialized power distribution unit to forcibly power down the host machine.
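Fencing methods are declared in configuration and tried in order until one succeeds. The sketch below is a plausible example rather than a prescribed setup; the fencing script path and SSH key location are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;

// A sketch of declaring fencing methods: sshfence logs into the
// previously active namenode's host and kills the process; shell(...)
// runs an arbitrary script, which is where a vendor-specific NFS or
// power-unit command would go. Paths are hypothetical.
public class FencingConfSketch {
    public static Configuration fencingConf() {
        Configuration conf = new Configuration();

        // Methods are newline-separated and attempted in order.
        conf.set("dfs.ha.fencing.methods",
                 "sshfence\nshell(/usr/local/bin/fence-old-active.sh)");

        // Key used by sshfence to reach the old active namenode's host.
        conf.set("dfs.ha.fencing.ssh.private-key-files",
                 "/home/hdfs/.ssh/id_rsa");
        return conf;
    }
}
```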
Client failover is handled transparently by the client library. The simplest implementation uses client-side configuration to control failover: the HDFS URI uses a logical hostname that is mapped to a pair of namenode addresses (set in the configuration file), and the client library tries each namenode address in turn until the operation succeeds.
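A minimal client-side sketch of this behavior might look as follows, assuming the hypothetical logical nameservice "mycluster" from the earlier sketch and the standard ConfiguredFailoverProxyProvider.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// The client addresses the logical nameservice, not a particular host;
// the failover proxy provider tries the configured namenode addresses
// until one responds as the active namenode.
public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        // Assumes the HA settings from the earlier sketch are also present.
        Configuration conf = new Configuration();
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The URI names the nameservice, so a failover is invisible here.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println(fs.exists(new Path("/user")));
    }
}
```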
    
