Solution Analysis of Single Points of Failure in Hadoop 2.0


The Hadoop 1.0 kernel consists mainly of two branches, MapReduce and HDFS. A well-known design flaw of both systems is the single point of failure: the JobTracker in MapReduce and the NameNode in HDFS. The single points of failure in these two core services went unresolved for a long time, which restricted Hadoop to offline storage and offline computation for quite some time.

Thankfully, these issues have been fully addressed in Hadoop 2.0. The Hadoop 2.0 kernel consists of three branches, HDFS, MapReduce, and YARN, and other systems in the Hadoop ecosystem, such as HBase, Hive, and Pig, are built on top of them. As of this writing, the single points of failure of the three Hadoop 2.0 subsystems have been resolved or are being addressed (Hadoop HA); this article introduces the current progress and the specific solutions.

Before formally introducing the single-point-of-failure solutions, let us briefly review the three systems. All three use a simple master/slaves architecture, in which the master is a single point of failure.

(1) HDFS: a distributed storage system modeled on Google's GFS, consisting of two kinds of services, NameNode and DataNode. The NameNode stores the metadata (fsimage) and the operation log (edits); because it is unique, its availability directly determines the availability of the entire storage system;

(2) YARN: the resource management system introduced in Hadoop 2.0. It frees Hadoop from being confined to MapReduce-style computation and supports diverse computing frameworks. It consists of two kinds of services, ResourceManager and NodeManager; the ResourceManager, being unique in the whole system, is a single point of failure;

(3) MapReduce: there are currently two MapReduce implementations. The first can run independently and consists of two kinds of services, JobTracker and TaskTracker; the JobTracker is a single point of failure. The second is MapReduce on YARN, in which each job has its own independent job tracker (ApplicationMaster), so there is no single point of failure. The JobTracker single point of failure discussed in this article refers to the first implementation.

First, the current progress on Hadoop's single points of failure. As of this article's publication (2013-08-13), the HDFS single point of failure has been resolved, with two feasible solutions available. The MapReduce single point of failure (JobTracker) has been resolved and released by CDH4 (CDH4 packages both MRv1 and MRv2; the single point of failure here refers to MRv1). The YARN single point of failure has not yet been resolved, but a design has been proposed; since it draws on the HDFS HA and MapReduce HA implementations, it should be resolved soon.

In general, the single-point-of-failure solution architectures for HDFS, MapReduce, and YARN in Hadoop are fully consistent. They are divided into manual mode and automatic mode. In manual mode, an administrator performs the active/standby switchover with a command, which is useful during service upgrades; automatic mode reduces operational cost but carries potential risks. The architectures of the two modes are shown below.

[Figure: Manual mode]

[Figure: Automatic mode]

Hadoop HA consists mainly of the following components:

(1) MasterHADaemon: runs in the same process as the master service and receives external RPC commands to control starting and stopping the master service;

(2) SharedStorage: a shared storage system. The active master writes its information to the shared storage, and the standby master reads it to stay in sync with the active master, reducing switchover time. Common shared storage systems are ZooKeeper (used by YARN HA), NFS (used by HDFS HA), HDFS (used by MapReduce HA), and BookKeeper-like systems (under consideration for HDFS HA);
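The standby-synchronization idea can be illustrated with a small sketch. This is a hypothetical simplification (none of these classes are real Hadoop code): the active master appends operations to a shared edit log, and the standby replays the log so its in-memory state stays close to the active's, shortening failover time.

```python
class SharedEditLog:
    """Stands in for the NFS/QJM/ZooKeeper-backed shared storage."""
    def __init__(self):
        self.entries = []

    def append(self, op):
        self.entries.append(op)

class Master:
    def __init__(self, log):
        self.log = log
        self.state = {}    # in-memory namespace, e.g. path -> size
        self.applied = 0   # how many log entries we have replayed

    def write(self, path, size):
        # Only the active master calls this: persist first, then apply.
        self.log.append((path, size))
        self.catch_up()

    def catch_up(self):
        # The standby calls this periodically to tail the shared log.
        for path, size in self.log.entries[self.applied:]:
            self.state[path] = size
        self.applied = len(self.log.entries)

log = SharedEditLog()
active, standby = Master(log), Master(log)
active.write("/a", 1)
active.write("/b", 2)
standby.catch_up()   # standby is now in sync and could take over quickly
```

Because the standby has already replayed almost all of the log, a failover only needs to apply the few entries written since its last catch-up, which is what keeps switchover time short.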

(3) ZKFailoverController: a ZooKeeper-based switchover controller, consisting of two core components, ActiveStandbyElector and HealthMonitor. The ActiveStandbyElector interacts with the ZooKeeper cluster, attempting to acquire a global lock to decide whether the managed master enters the active or the standby state; the HealthMonitor monitors the health of the master so that state transitions can be made accordingly;
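The election logic can be sketched as follows. This is a hypothetical, self-contained simulation of the ActiveStandbyElector idea, not the real implementation: whichever controller acquires the global lock becomes active, and the rest stay standby. In a real ZKFC the lock is an ephemeral ZooKeeper znode, so it is released automatically when the active master's session dies.

```python
class GlobalLock:
    """Stands in for the ZooKeeper-held global lock (an ephemeral znode)."""
    def __init__(self):
        self.holder = None

    def try_acquire(self, owner):
        if self.holder is None:
            self.holder = owner
            return True
        return False

    def release(self, owner):
        if self.holder == owner:
            self.holder = None

class FailoverController:
    def __init__(self, name, lock):
        self.name, self.lock = name, lock
        self.state = "standby"

    def elect(self):
        # Enter active state only if we win the global lock.
        self.state = "active" if self.lock.try_acquire(self.name) else "standby"

    def on_master_unhealthy(self):
        # HealthMonitor detected a failure: give up the lock so others can win.
        self.lock.release(self.name)
        self.state = "standby"

lock = GlobalLock()
zkfc1 = FailoverController("m1", lock)
zkfc2 = FailoverController("m2", lock)
zkfc1.elect()              # m1 wins the lock and becomes active
zkfc2.elect()              # m2 loses and stays standby
zkfc1.on_master_unhealthy()  # m1's master fails; lock is released
zkfc2.elect()              # m2 now wins and becomes active
```

The key design point is that activeness is decided by lock ownership alone, so at any moment at most one controller can believe it manages the active master.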

(4) ZooKeeper cluster: its core function is to control the whole cluster by maintaining a global lock so that there is only one active master. If SharedStorage also uses ZooKeeper, some additional state and runtime information is recorded there as well.

In particular, the following issues must be considered when solving the HA problem:

(1) Split-brain (brain-split): split-brain occurs when, during an active/standby switchover, the switchover is incomplete or fails for other reasons, causing clients and slaves to mistakenly believe there are two active masters, eventually leaving the whole cluster in a chaotic state. Split-brain is usually addressed with a fencing (isolation) mechanism covering three aspects:

  • Shared-storage fencing: ensure that only one master can write data to the shared storage.
  • Client fencing: ensure that only one master can respond to client requests.
  • Slave fencing: ensure that only one master can send commands to the slaves.
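A common way to implement shared-storage fencing is with epoch numbers; the sketch below is a hypothetical illustration of that pattern (QJM, for example, fences writers this way). Each new active master obtains a higher epoch, and the storage rejects writes tagged with a stale epoch, so a "zombie" old active cannot corrupt the data even if it is still running.

```python
class FencedStorage:
    """Shared storage that rejects writes from fenced (stale-epoch) masters."""
    def __init__(self):
        self.accepted_epoch = 0
        self.data = []

    def new_epoch(self):
        # Called during failover; all older epochs are fenced from now on.
        self.accepted_epoch += 1
        return self.accepted_epoch

    def write(self, epoch, record):
        if epoch < self.accepted_epoch:
            return False          # stale master: write rejected
        self.data.append(record)
        return True

storage = FencedStorage()
old_epoch = storage.new_epoch()              # first active master
assert storage.write(old_epoch, "op1")       # accepted

new_epoch = storage.new_epoch()              # failover: new active master
assert not storage.write(old_epoch, "op2")   # old active is fenced out
assert storage.write(new_epoch, "op3")       # new active writes normally
```

The same epoch idea extends to client and slave fencing: slaves and clients remember the highest epoch they have seen and ignore commands from any master presenting a lower one.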

Two fencing implementations are provided in the Hadoop common library, sshfence and shellfence (the default implementation). sshfence logs into the target master node via SSH and uses the fuser command to kill the process (locating the process PID by its TCP port number, which is more accurate than the jps command); shellfence executes a user-defined shell command (script) to perform the isolation.
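For reference, these fencing methods are enabled through configuration. The snippet below is a sketch for HDFS HA using the documented Hadoop property names; the script path and key path are illustrative placeholders:

```xml
<!-- hdfs-site.xml: try sshfence first; if SSH-based fencing fails,
     fall back to a user-defined script. Paths are placeholders. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
    sshfence
    shell(/path/to/my-fence-script.sh)
  </value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
```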

(2) Switchover transparency: to keep the entire switchover transparent, Hadoop must ensure that all clients and slaves are automatically redirected to the new active master. This is usually done by retrying the connection to the old master several times and, once that fails, trying the new master instead, which introduces some delay. In the newer Hadoop RPC, users can configure parameters such as the RPC client retry mechanism, the number of retries, and the retry timeout.
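The retry-then-failover behavior described above can be sketched in a few lines. This is a hypothetical simplification (real Hadoop clients do this inside the RPC layer, driven by the retry parameters just mentioned): the client tries each known master in turn, retrying a bounded number of times before failing over to the next.

```python
def call_with_failover(masters, request, max_retries=3):
    """masters: list of callables standing in for RPC proxies to each
    master; each raises ConnectionError if that master is unreachable."""
    for master in masters:
        for _ in range(max_retries):
            try:
                return master(request)
            except ConnectionError:
                continue   # retry the same master
        # retries exhausted: fail over to the next master in the list
    raise RuntimeError("no active master reachable")

def dead_master(request):
    # Stands in for the old active master, which is now down.
    raise ConnectionError("old active is gone")

def new_active(request):
    return f"handled: {request}"

result = call_with_failover([dead_master, new_active], "read /a")
```

The delay the article mentions comes from the inner loop: the client must exhaust its retries against the old master before it ever contacts the new one, which is why the retry count and timeout are tunable.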

To make the general scheme concrete, take MapReduce HA in CDH4 as an example; an introduction to this HA scheme can be found in my article "CDH JobTracker HA Scheme Introduction". The architecture diagram is as follows:

The HDFS HA solution in Hadoop 2.0 is described in the article "Hadoop 2.0 NameNode HA and Federation Practices". Two HA schemes are currently available in HDFS 2: one uses shared storage based on NFS; the other, based on the Paxos algorithm, is the Quorum Journal Manager (QJM). Its basic principle is to store the edit log on 2N+1 JournalNodes: each write operation is considered successful once a majority (>= N+1) of them return success, so no data is lost. The community is also trying to use BookKeeper as the shared storage system; see HDFS-1623 for details. The HDFS HA architecture diagram is as follows:
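The QJM majority-write rule can be shown with a tiny sketch. This is a hypothetical illustration of the quorum arithmetic only (the real QJM also handles epochs, ordering, and recovery): with 2N+1 JournalNodes, an edit is durable once at least N+1 acknowledge it, so up to N node failures are tolerated.

```python
def quorum_write(journal_nodes, record):
    """journal_nodes: list of callables standing in for JournalNodes;
    each returns True if it successfully appended the record."""
    acks = sum(1 for jn in journal_nodes if jn(record))
    majority = len(journal_nodes) // 2 + 1   # N+1 out of 2N+1
    return acks >= majority

up = lambda record: True      # healthy JournalNode
down = lambda record: False   # failed JournalNode

# 2N+1 = 3 JournalNodes (N = 1): one failure is tolerated, two are not.
ok = quorum_write([up, up, down], "edit-1")        # 2 acks >= 2: success
failed = quorum_write([up, down, down], "edit-2")  # 1 ack < 2: failure
```

Any two majorities of 2N+1 nodes must overlap in at least one node, which is why a new active master can always recover the last durable edit from the surviving majority.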

The slowest progress is on the YARN HA solution, for which a design document exists and development is under way; see https://issues.apache.org/jira/browse/YARN-149. Overall, its architecture is similar to HDFS HA and MapReduce HA, but it uses ZooKeeper as its shared storage. Such a lightweight "storage system" suffices (ZooKeeper is designed not for storage but for distributed coordination; still, it can safely and reliably store small amounts of data to solve data sharing among services in a distributed environment) because most of YARN's information can be dynamically reconstructed from NodeManager and ApplicationMaster heartbeats; the ResourceManager itself only needs to record a small amount of information in ZooKeeper.

Overall, the difficulty of an HA solution depends on how much information the master itself records and whether that information can be reconstructed. If the recorded information is large and cannot be dynamically reconstructed, as with the NameNode, a shared storage system with high reliability and performance is needed. If the master holds a lot of information but most of it can be dynamically reconstructed from the slaves, the HA solution is much easier; MapReduce and YARN are typical examples. From another perspective, because computing frameworks are not very sensitive to information loss (for example, if the record of a completed task is lost, it can simply be recomputed), designing HA for a computing framework is much less difficult than for a storage system.

Hadoop HA configuration methods:

(1) HDFS HA: Hadoop 2.0 NameNode HA and Federation Practices

(2) MapReduce HA: Configuring JobTracker High Availability

Reference links:

Reprinted from Dong's blog: http://dongxicheng.org/mapreduce-nextgen/hadoop-2-0-ha/

Hadoop 2.0 NameNode HA and Federation Practices: http://www.sizeofvoid.net/hadoop-2-0-namenode-ha-federation-practice-zh/

Collection of articles from this blog: http://dongxicheng.org/recommend/
