HDFS Principle Analysis: The AvatarNode HA Mechanism

Source: Internet
Author: User

I. Problem Description

The NameNode is the brain of HDFS, and that brain is a single point: if it fails, the entire distributed storage system is paralyzed. An HA (High Availability) mechanism exists to solve this problem. The first instinct is redundant backup, and there are many ways to do it. Earlier engineers designed several metadata backup schemes, including the Secondary NameNode and the AvatarNode. The most advantageous of these schemes is the one that lets HDFS complete a failover in the shortest possible time, and that is the AvatarNode discussed here.

II. Basic Structure

Primary: the NameNode that handles normal service, that is, it serves metadata queries and operations from clients.

Standby: a hot-standby NameNode that keeps a full backup of the Primary's metadata and performs checkpoints on behalf of the Primary (checkpointing is a metadata persistence mechanism, described later).

NFS: a network file server. The Primary writes a copy of its log to this server in real time, so that the backed-up metadata is still complete when the Primary fails.

III. The Data Persistence Mechanism: Checkpoint

The Primary manages all metadata, which is normally kept in memory so that access is efficient. The hidden danger is that if the Primary node crashes or loses power, all of the metadata is gone. If a copy of the in-memory metadata is also saved on the hard disk, the data can be recovered even after a power loss.

The checkpoint mechanism is the mechanism that persists the metadata to the hard disk: changes are logged as they happen, and the log is periodically merged into an on-disk image.

First, we introduce several key concepts:

Edits: a log file that records every operation that changes the metadata.

Fsimage: an image file of the metadata, which can be understood as a copy of the metadata saved on disk.
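To make the relationship between the two files concrete, here is a minimal Python sketch. It is only an illustrative model, not HDFS code: the dictionary-based image, the `mkdir`/`create`/`delete` operation names, and the `apply_edit` and `recover` helpers are all assumptions made for this example. It shows that the current metadata can be rebuilt by loading the image and replaying the log on top of it.

```python
# Toy model of fsimage + edits (illustrative only; not the real HDFS formats).
# fsimage: a snapshot of the namespace, modeled here as {path: kind}.
# edits:   an ordered list of (operation, path) records logged after the snapshot.


def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace."""
    op, path = edit
    if op == "mkdir":
        namespace[path] = "dir"
    elif op == "create":
        namespace[path] = "file"
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def recover(fsimage, edits):
    """Rebuild the current metadata: load the snapshot, then replay the log."""
    namespace = dict(fsimage)  # copy, so the saved image itself is not modified
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


fsimage = {"/": "dir", "/logs": "dir"}
edits = [("create", "/logs/a.txt"), ("mkdir", "/tmp"), ("delete", "/logs/a.txt")]
current = recover(fsimage, edits)
print(sorted(current))  # ['/', '/logs', '/tmp']
```

The longer the log grows, the longer this replay takes, which is why the image needs to be refreshed from time to time.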

Question 1: An fsimage captures the metadata at a single moment, but the metadata is constantly changing. How is the image kept up to date?

Question 2: How can the fsimage be generated while the Primary NameNode continues to serve clients normally?

The checkpoint steps are as follows:

Step 1: The Secondary NameNode asks the NameNode to stop writing to edits; new operations are temporarily recorded in the edits.new file.

Step 2: The Secondary NameNode copies fsimage and edits from the NameNode to its local disk.

Step 3: The Secondary NameNode merges fsimage and edits into fsimage.ckpt.

Step 4: The Secondary NameNode sends fsimage.ckpt to the NameNode.

Step 5: The NameNode replaces the old fsimage with the new fsimage, and the old edits with the new edits (edits.new).

Step 6: The checkpoint time is updated.

At this point the fsimage update is complete: the Primary kept serving clients normally throughout, and the fsimage was brought up to date.
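The checkpoint steps can be sketched in Python as follows. This is a toy model under stated assumptions, not the Hadoop implementation: `ToyNameNode`, `replay`, `do_checkpoint`, the `add`/`remove` operations, and the `during` parameter (which simulates client writes arriving while the merge is in progress) are all hypothetical names invented for illustration.

```python
# Toy model of the six checkpoint steps (illustrative only; not HDFS code).
import time


def replay(image, edits):
    """Return a new image with every logged (op, path) applied in order."""
    merged = dict(image)
    for op, path in edits:
        if op == "add":
            merged[path] = True
        elif op == "remove":
            merged.pop(path, None)
    return merged


class ToyNameNode:
    def __init__(self):
        self.fsimage = {}            # last persisted snapshot
        self.edits = []              # changes logged since that snapshot
        self.edits_new = None        # temporary log, used only during a checkpoint
        self.checkpoint_time = None

    def log(self, op, path):
        """Record a change; during a checkpoint it goes to edits.new."""
        target = self.edits if self.edits_new is None else self.edits_new
        target.append((op, path))

    def roll_edits(self):
        """Step 1: stop writing to edits and start edits.new."""
        self.edits_new = []

    def install_checkpoint(self, fsimage_ckpt):
        """Steps 5 and 6: swap in the new files and record the time."""
        self.fsimage = fsimage_ckpt
        self.edits = self.edits_new
        self.edits_new = None
        self.checkpoint_time = time.time()


def do_checkpoint(namenode, during=()):
    """Play the Secondary NameNode's role against a ToyNameNode."""
    namenode.roll_edits()                                          # step 1
    image, edits = dict(namenode.fsimage), list(namenode.edits)    # step 2
    for op, path in during:          # clients keep writing during the merge
        namenode.log(op, path)
    fsimage_ckpt = replay(image, edits)                            # step 3
    namenode.install_checkpoint(fsimage_ckpt)                      # steps 4 to 6


nn = ToyNameNode()
nn.log("add", "/a")
nn.log("add", "/b")
do_checkpoint(nn, during=[("remove", "/a")])
print(nn.fsimage)  # {'/a': True, '/b': True}
print(nn.edits)    # [('remove', '/a')]
```

In this sketch the write that arrived during the merge is not in the new image; it survives in the log that replaces edits, so no change is lost and the NameNode never has to pause.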

IV. AvatarNode Metadata Consistency

Checkpointing only guarantees that the metadata is persisted. If the Primary fails, loading the fsimage after it is repaired still takes a long time. How to keep the Standby's in-memory metadata synchronized with the Primary's is therefore the problem a highly available HDFS has to solve.

The NameNode's metadata actually consists of two parts:

Part 1: the directory tree, which manages the information for all files stored in HDFS.

Part 2: the mapping between data blocks and DataNodes.

As long as both parts are kept consistent, the metadata consistency problem is solved.

Part 1: The Primary writes its log to NFS in real time, and the Standby reads that log from NFS in real time. Replaying the log keeps the directory tree information consistent.

Part 2: The block-to-DataNode mapping is assembled from the reports that all DataNodes send to the NameNode. Having every DataNode report to both NameNodes therefore keeps the mapping consistent.
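Both synchronization paths can be sketched together in a few lines of Python. This is an illustrative toy, not AvatarNode code: `SharedLog` stands in for the log file on NFS, and `ToyNameNode`, `tail`, `block_report`, and `send_block_report` are hypothetical names assumed for this example.

```python
# Toy model of the two synchronization paths (illustrative only).


class SharedLog:
    """Stands in for the edit log that the Primary writes to NFS."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


class ToyNameNode:
    def __init__(self, name):
        self.name = name
        self.tree = set()        # part 1: directory tree, modeled as a set of paths
        self.block_map = {}      # part 2: block id -> set of DataNode names
        self.applied = 0         # Standby only: number of log records replayed so far

    def create(self, path, shared_log):
        """Primary path: change the tree and write the change to the shared log."""
        self.tree.add(path)
        shared_log.append(("create", path))

    def tail(self, shared_log):
        """Standby path: replay any log records not yet applied."""
        for op, path in shared_log.records[self.applied:]:
            if op == "create":
                self.tree.add(path)
            self.applied += 1

    def block_report(self, datanode, block_ids):
        """Record which blocks a DataNode says it holds."""
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)


def send_block_report(datanode, block_ids, namenodes):
    """A DataNode reports to every NameNode, not only to the Primary."""
    for namenode in namenodes:
        namenode.block_report(datanode, block_ids)


nfs_log = SharedLog()
primary, standby = ToyNameNode("primary"), ToyNameNode("standby")
primary.create("/data/part-0", nfs_log)
standby.tail(nfs_log)
send_block_report("dn1", ["blk_1", "blk_2"], [primary, standby])
print(standby.tree == primary.tree, standby.block_map == primary.block_map)  # True True
```

Because the Standby already holds both parts in memory, a failover in this model does not need to reload an fsimage or wait for fresh block reports.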

Problem: The newly introduced NFS is itself a new single point of failure. According to Facebook engineers, the failure rate of this single point is very low; they encountered it only once in four years.

That largely covers the AvatarNode principle, but some problems remain in practical use:

1. How does HDFS quickly detect that the Primary has failed?

2. How does the Standby quickly switch over to become the Primary?

Reposted from http://my.oschina.net/shiw019/blog/93481
