As is well known, the Hadoop 1.0 kernel consists of two branches, MapReduce and HDFS, and both systems share the same design flaw: a single point of failure. That is, the two core services, the MapReduce JobTracker and the HDFS NameNode, are each single points, and because this problem went unresolved for a long time, Hadoop was for years suitable only for offline storage and offline computation.
Thankfully, these issues have been fully addressed in Hadoop 2.0. The Hadoop 2.0 kernel consists of three branches, HDFS, MapReduce, and YARN, and the other systems in the Hadoop ecosystem, such as HBase, Hive, and Pig, are built on top of these three. As of this writing, the single points of failure of the three Hadoop 2.0 subsystems have been resolved or are being resolved (Hadoop HA), and this article introduces the current progress and the specific solutions.
Before formally introducing the single-point-of-failure solutions, let us briefly review the three systems (all three use a simple master/slaves architecture, in which the master is a single point of failure).
(1) HDFS: a distributed storage system modeled on Google GFS, consisting of two services, NameNode and DataNode. The NameNode stores the metadata (fsimage) and the operation log (edits); since it is unique, its availability directly determines the availability of the entire storage system;
(2) YARN: the new resource management system introduced in Hadoop 2.0, which frees Hadoop from being confined to MapReduce-style computation and lets it support diverse computing frameworks. It consists of two types of services, ResourceManager and NodeManager; the ResourceManager, as the sole master component of the whole system, has a single-point-of-failure problem;
(3) MapReduce: there are two MapReduce implementations. The first is the self-contained MRv1, which consists of two types of services, JobTracker and TaskTracker, and in which the JobTracker is a single point of failure. The second is MapReduce on YARN, in which each job has its own independent job manager (the ApplicationMaster); jobs no longer affect one another, so there is no single-point-of-failure problem. The MapReduce single point of failure discussed in this article is the JobTracker of the first implementation.
First, the current progress on resolving Hadoop's single points of failure. As of this writing, the HDFS single point of failure has been resolved, with two feasible solutions provided; the MapReduce single point of failure (the JobTracker) has been resolved and released by Cloudera in CDH4 (CDH4 packages both MRv1 and MRv2, and the single point referred to here is MRv1's); the YARN single point of failure has not yet been resolved, but a solution has been proposed, and since it draws on the HDFS HA and MapReduce HA implementations, it should be resolved soon.
In general, the single-point-of-failure solutions for HDFS, MapReduce, and YARN in Hadoop share a fully consistent architecture, divided into a manual mode and an automatic mode. In manual mode the active/standby switchover is performed by an administrator, which is useful, for example, during service upgrades; automatic mode reduces operational cost but carries potential risks. The architectures of the two modes are shown below.
"Manual Mode"
"Auto Mode"
In Hadoop HA, the architecture consists mainly of the following components:
(1) MasterHADaemon: runs in the same process as the master service and can receive external RPC commands to control the start and stop of the master service;
(2) SharedStorage: the shared storage system. The active master writes information to it, while the standby master reads that information to stay synchronized with the active master, which shortens switchover time. Common shared storage systems are ZooKeeper (used by YARN HA), NFS (used by HDFS HA), HDFS (used by MapReduce HA), and BookKeeper-like systems (used by HDFS HA);
(3) ZKFailoverController: a ZooKeeper-based switchover controller, consisting of two core components: the ActiveStandbyElector and the HealthMonitor. The ActiveStandbyElector interacts with the ZooKeeper cluster, attempting to acquire a global lock to decide whether the managed master enters the active or the standby state (see the election sketch after this list); the HealthMonitor monitors the health of the master so that state transitions can be triggered accordingly;
(4) ZooKeeper cluster: its core function is to ensure that the whole cluster has only one active master by maintaining a global lock. Of course, if SharedStorage also uses ZooKeeper, some other state and runtime information is recorded there as well.
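To make the ActiveStandbyElector's global-lock idea concrete, here is a minimal election sketch using the raw ZooKeeper Java API (not the real elector): each master tries to create the same ephemeral znode, the one that succeeds becomes active, and the losers go standby and watch the lock node so they can re-elect when the active master's session dies. The znode path is illustrative.

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.ZooDefs.Ids;

// Minimal leader-election sketch (not the real ActiveStandbyElector):
// whoever creates the ephemeral lock znode first becomes the active master.
public class ElectionSketch implements Watcher {
    private static final String LOCK = "/ha-demo/active-lock"; // illustrative path
    private final ZooKeeper zk;

    public ElectionSketch(String quorum) throws Exception {
        zk = new ZooKeeper(quorum, 5000, this);
    }

    void tryToBecomeActive(byte[] myId) throws Exception {
        try {
            // Ephemeral: the lock disappears automatically if our session dies,
            // which is what lets a standby take over after a crash.
            // (Assumes the parent znode /ha-demo already exists.)
            zk.create(LOCK, myId, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("Acquired lock -> entering ACTIVE state");
        } catch (KeeperException.NodeExistsException e) {
            System.out.println("Lock held elsewhere -> entering STANDBY state");
            zk.exists(LOCK, true); // watch the lock so we hear when it goes away
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
            try {
                tryToBecomeActive("standby-candidate".getBytes()); // re-elect
            } catch (Exception ignored) { }
        }
    }
}
```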
In particular, the following issues must be considered when solving the HA problem:
(1) Split-brain (brain-split): split-brain occurs when, during an active/standby switchover, the switch is incomplete or fails for some other reason, so that clients and slaves mistakenly believe there are two active masters, and the whole cluster ends up in a chaotic state. To prevent split-brain, an isolation (fencing) mechanism is usually employed, covering three aspects:
- Shared-storage fencing: ensure that only one master can write data to the shared storage.
- Client fencing: ensure that only one master can respond to client requests.
- Slave fencing: ensure that only one master can send commands to the slaves.
Two fencing implementations are provided in the Hadoop common library: sshfence and shellfence (the default implementation). sshfence logs into the target master node via SSH and kills the process with the fuser command (locating the process PID by TCP port number, which is more accurate than the jps command); shellfence executes a user-defined shell command (script) to complete the isolation.
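Taking HDFS HA as an example, the fencing methods are declared in configuration, where the two built-in options above appear as the sshfence and shell(...) entries. Below is a minimal sketch using property names from Hadoop 2.x HDFS HA; the script path and SSH key path are placeholders, and these values would normally be set in hdfs-site.xml rather than in code.

```java
import org.apache.hadoop.conf.Configuration;

// Sketch of the fencing-related HDFS HA settings (normally in hdfs-site.xml);
// the script and key paths are placeholders.
public class FencingConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Try sshfence first; fall back to a user-defined script if SSH fencing fails.
        conf.set("dfs.ha.fencing.methods",
                 "sshfence\nshell(/path/to/my-fence.sh)");
        // Private key sshfence uses to log into the node running the old active master.
        conf.set("dfs.ha.fencing.ssh.private-key-files", "/home/hdfs/.ssh/id_rsa");
        conf.set("dfs.ha.fencing.ssh.connect-timeout", "30000"); // give up after 30 s
        return conf;
    }
}
```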
(2) Switchover transparency: to keep the entire switchover transparent, Hadoop must ensure that all clients and slaves are automatically redirected to the new active master. This is usually achieved by retrying against the new master after several failed attempts to connect to the old one, so the whole process incurs some delay. In the newer Hadoop RPC, users can set parameters such as the RPC client retry mechanism, the number of retries, and the retry timeout.
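For HDFS HA specifically, these client-side knobs look roughly like the sketch below (Hadoop 2.x property names; "mycluster" is a placeholder nameservice ID). The failover proxy provider retries against the other NameNode on a bounded, exponentially backed-off schedule.

```java
import org.apache.hadoop.conf.Configuration;

// Sketch of the client-side retry/failover knobs mentioned above
// (HDFS 2.x names; "mycluster" is a placeholder nameservice ID).
public class ClientFailoverConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        // Proxy provider that redirects the client to the other NameNode on failure.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        conf.setInt("dfs.client.failover.max.attempts", 15);        // failovers to attempt
        conf.setInt("dfs.client.failover.sleep.base.millis", 500);  // initial backoff
        conf.setInt("dfs.client.failover.sleep.max.millis", 15000); // backoff cap
        return conf;
    }
}
```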
To make the general scheme above concrete, take MapReduce HA as an example. For an introduction to the HA scheme in CDH4, see my article "Introduction to the CDH JobTracker HA Scheme"; its architecture is as follows:
[Figure: MapReduce (JobTracker) HA architecture]
For the HDFS HA solution in Hadoop 2.0, see "Hadoop 2.0 NameNode HA and Federation Practices". Two HA schemes are currently available in HDFS2: one based on NFS shared storage, and one based on the Quorum Journal Manager (QJM), which builds on the Paxos algorithm. Its basic principle is to store the editlog on 2N+1 JournalNodes; each write operation is considered successful once a majority (>= N+1) of them return success, and the data is then not lost. The community is also experimenting with BookKeeper as the shared storage system. The HDFS HA architecture given in HDFS-1623 is as follows:
[Figure: HDFS HA architecture (HDFS-1623)]
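The quorum arithmetic behind QJM is worth spelling out: with 2N+1 JournalNodes, a write is durable once N+1 of them acknowledge it, so the system tolerates up to N JournalNode failures. A tiny sketch of the rule:

```java
// Quorum rule used by QJM: with 2N+1 JournalNodes, an edit-log write
// succeeds once a majority (>= N+1) of them acknowledge it.
public class QuorumSketch {
    static boolean writeSucceeded(int acks, int journalNodes) {
        return acks >= journalNodes / 2 + 1; // strict majority
    }

    public static void main(String[] args) {
        // 3 JournalNodes (N=1): 2 acks suffice, so 1 failure is tolerated.
        System.out.println(writeSucceeded(2, 3)); // true
        // 5 JournalNodes (N=2): 2 acks are not a majority.
        System.out.println(writeSucceeded(2, 5)); // false
    }
}
```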
The slowest progress at present is on the YARN HA solution. A design document exists and is being refined and implemented; for details see https://issues.apache.org/jira/browse/YARN-149. Overall, its architecture is similar to HDFS HA and MapReduce HA, except that the shared storage system is ZooKeeper. The reason such a lightweight "storage system" suffices is that ZooKeeper is designed not for storage but for distributed coordination; nevertheless, it can store small amounts of data securely and reliably to solve data sharing among multiple services in a distributed environment. And most of YARN's information can be dynamically reconstructed from NodeManager and ApplicationMaster heartbeats, so the ResourceManager itself only needs to record a small amount of information in ZooKeeper.
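To illustrate why ZooKeeper suffices as this lightweight store, the sketch below persists a small piece of ResourceManager state as a znode. The znode layout here is hypothetical, invented for the example; the actual YARN state store being developed defines its own layout.

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.ZooDefs.Ids;

// Sketch: persisting a small piece of ResourceManager state in ZooKeeper.
// The znode layout (/rm-demo/apps/...) is hypothetical, for illustration only.
public class RmStateSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, event -> { });
        String appPath = "/rm-demo/apps/application_0001"; // assumes parents exist
        byte[] smallState = "submission-context-bytes".getBytes();
        // Persistent znode: survives RM restarts, unlike the ephemeral election lock.
        if (zk.exists(appPath, false) == null) {
            zk.create(appPath, smallState, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData(appPath, smallState, -1); // -1: skip the version check
        }
        zk.close();
    }
}
```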
Overall, the difficulty of an HA solution depends on how much information the master itself records and whether that information can be reconstructed dynamically. If the recorded information is very large and cannot be dynamically reconstructed, as with the NameNode, a highly reliable and performant shared storage system is required. If the master holds a great deal of information but the vast majority of it can be dynamically reconstructed from the slaves, the HA solution is much easier; the typical representatives are MapReduce and YARN. From another point of view, computing frameworks are not very sensitive to information loss: if the information about a completed task is lost, it can simply be recomputed. This makes HA design far less difficult for computing frameworks than for storage systems.
Hadoop HA configuration guides:
(1) HDFS HA: Hadoop 2.0 NameNode HA and Federation Practices
(2) MapReduce HA: Configuring JobTracker High Availability
Summary of Hadoop 2.0 Single-Point-of-Failure Solutions --- Lao Dong