Principle analysis: the AvatarNode HA mechanism of HDFS

Last Update:2015-03-11 Source: Internet

Author: User


I. Problem Description

The NameNode is the brain of HDFS, and that brain is a single point of failure: if it goes down, the entire distributed storage system is paralyzed. An HA (High Availability) mechanism exists to solve this problem. The first instinct is redundant backup, and there are many ways to do it. Earlier engineers designed several metadata backup schemes, including the Secondary NameNode and the AvatarNode. The most advantageous of these schemes is the one that lets HDFS complete failover in the shortest possible time. That scheme is the AvatarNode, the subject of this article.

II. Basic Structure

Primary: the NameNode that handles normal business, that is, it serves metadata queries and operations to clients.

Standby: the hot standby NameNode. It keeps a full backup of the Primary's metadata and performs checkpoints for the Primary (checkpointing is a metadata persistence mechanism, described later).

NFS: a network file server. The Primary writes a copy of its log to this server in real time, so that a complete record of the metadata is still available if the Primary fails.

III. The Data Persistence Mechanism: Checkpoint

The Primary manages all metadata and normally keeps it in memory, which makes metadata access efficient. There is a hidden danger, though: if the Primary node crashes or loses power, all of the metadata is gone. If a copy of the in-memory metadata is also saved on the hard disk, the data can be recovered even after a power loss.

The checkpoint mechanism is the mechanism that keeps an up-to-date copy of the metadata on the hard disk.

First, a few key concepts:

Edits: the log file that records every operation that changes the metadata.

Fsimage: an image file of the metadata, which can be understood as a copy of the in-memory metadata saved on disk.
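The relationship between the two files can be illustrated with a small sketch: the current metadata is the fsimage with the edits replayed on top of it. This is a toy model, not HDFS code. The namespace is reduced to a set of paths, and the operation names ("mkdir", "create", "delete") and the functions apply_edit and replay are assumptions made for the example.

```python
# Toy model: an fsimage snapshot plus an edits log reconstructs the current
# metadata. Operation names are illustrative, not real HDFS edit-log opcodes.

def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace (a set of paths), in place."""
    op, path = edit
    if op in ("mkdir", "create"):
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def replay(fsimage, edits):
    """Return the namespace obtained by replaying edits on top of fsimage."""
    namespace = set(fsimage)  # work on a copy so the saved image is not mutated
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


fsimage = {"/", "/user"}                      # metadata at the last checkpoint
edits = [
    ("mkdir", "/user/alice"),
    ("create", "/user/alice/a.txt"),
    ("delete", "/user/alice/a.txt"),
]
current = replay(fsimage, edits)
print(sorted(current))  # ['/', '/user', '/user/alice']
```

The longer the edits log grows, the longer this replay takes at startup, which is why the log is periodically folded into a new fsimage.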

Question 1: An fsimage captures the metadata at a single moment, but the metadata changes constantly. How is the image kept up to date?

Question 2: How can a new fsimage be generated while the primary NameNode keeps serving clients normally?

The checkpoint steps are as follows:

Step 1: The Secondary NameNode asks the NameNode to stop writing to edits; new operations are temporarily recorded in the edits.new file.

Step 2: The Secondary NameNode copies fsimage and edits from the NameNode to its local disk.

Step 3: The Secondary NameNode merges fsimage and edits into fsimage.ckpt.

Step 4: The Secondary NameNode sends fsimage.ckpt back to the NameNode.

Step 5: The NameNode replaces the old fsimage with the new fsimage, and the old edits with the new edits (edits.new).

Step 6: The checkpoint time is updated.

At this point the fsimage update is complete: the Primary kept serving normally throughout, and the fsimage was brought up to date.
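The six checkpoint steps can be simulated in memory with the sketch below. It is an illustration under simplifying assumptions, not the HDFS implementation: files are modeled as Python objects, the network copy in steps 2 and 4 is a plain in-process copy, and the names NameNode, merge, do_checkpoint and the "during" callback are all hypothetical. The callback stands in for client writes that arrive while the checkpoint is running.

```python
import time


def merge(fsimage, edits):
    """Step 3: replay edits on a copy of fsimage and return the merged image."""
    namespace = set(fsimage)
    for op, path in edits:
        if op == "add":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


class NameNode:
    def __init__(self):
        self.fsimage = set()         # last persisted image
        self.edits = []              # changes logged since that image
        self.edits_new = None        # temporary log, only set during a checkpoint
        self.checkpoint_time = None

    def log(self, op, path):
        # While a checkpoint is running, new operations go to edits.new.
        target = self.edits if self.edits_new is None else self.edits_new
        target.append((op, path))

    def roll_edits(self):
        # Step 1: stop writing to edits and start edits.new.
        self.edits_new = []

    def install_checkpoint(self, fsimage_ckpt):
        # Step 5: new fsimage replaces the old one, edits.new replaces edits.
        self.fsimage = fsimage_ckpt
        self.edits = self.edits_new
        self.edits_new = None
        # Step 6: record the checkpoint time.
        self.checkpoint_time = time.time()


def do_checkpoint(namenode, during=None):
    """Run one checkpoint in the role of the Secondary NameNode."""
    namenode.roll_edits()                         # step 1
    image_copy = set(namenode.fsimage)            # step 2: copy fsimage
    edits_copy = list(namenode.edits)             # step 2: copy edits
    if during is not None:
        during()                                  # client traffic mid-checkpoint
    fsimage_ckpt = merge(image_copy, edits_copy)  # step 3
    namenode.install_checkpoint(fsimage_ckpt)     # steps 4 to 6


nn = NameNode()
nn.log("add", "/a")
nn.log("add", "/b")
do_checkpoint(nn, during=lambda: nn.log("delete", "/a"))
print(sorted(nn.fsimage))  # ['/a', '/b']
print(nn.edits)            # [('delete', '/a')]
```

The delete that arrived during the checkpoint is not lost: it ends up in the new edits log and will be folded into the image by the next checkpoint. This is how the sketch answers both questions above.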

IV. AvatarNode Metadata Consistency

Checkpointing only guarantees that the metadata is persisted. If the Primary fails, it still takes a long time to load the fsimage after the repair. How to keep the Standby's in-memory metadata synchronized with the Primary's is therefore the problem a highly available HDFS has to solve.

The NameNode's metadata actually consists of two parts:

Part 1: the directory tree, which manages the information about all files stored in HDFS.

Part 2: the mapping between data blocks and DataNodes.

As long as both parts are kept consistent, the metadata consistency problem is solved.

Part 1: The Primary writes its log to NFS in real time, and the Standby reads the log from NFS in real time. By replaying the log, the Standby keeps its directory tree consistent with the Primary's.

Part 2: The mapping between data blocks and DataNodes is built by aggregating the reports that all DataNodes send to the NameNode. If every DataNode reports to both NameNodes, the block-to-DataNode mapping stays consistent as well.
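Both synchronization paths can be sketched together. The code below is a toy model under stated assumptions, not the AvatarNode implementation: SharedLog stands in for the log file on NFS, the class and method names (AvatarNode, Primary, Standby, DataNode, tail_log, send_block_report) are hypothetical, and a block report is modeled as a full list of block IDs.

```python
class SharedLog:
    """Stands in for the log on NFS: append-only, readable from an offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read_from(self, offset):
        return self.records[offset:]


class AvatarNode:
    """State common to both NameNodes: directory tree and block map."""

    def __init__(self, shared_log):
        self.shared_log = shared_log
        self.tree = set()      # part 1: directory tree (set of paths)
        self.block_map = {}    # part 2: block id -> set of DataNode ids

    def receive_block_report(self, datanode_id, block_ids):
        # A full report replaces whatever this DataNode reported before.
        for holders in self.block_map.values():
            holders.discard(datanode_id)
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode_id)


class Primary(AvatarNode):
    def create(self, path):
        self.tree.add(path)
        self.shared_log.append(("create", path))  # logged to NFS as it happens

    def delete(self, path):
        self.tree.discard(path)
        self.shared_log.append(("delete", path))


class Standby(AvatarNode):
    def __init__(self, shared_log):
        super().__init__(shared_log)
        self.offset = 0        # how far into the shared log we have replayed

    def tail_log(self):
        """Replay every log record written since the last call."""
        for op, path in self.shared_log.read_from(self.offset):
            if op == "create":
                self.tree.add(path)
            elif op == "delete":
                self.tree.discard(path)
            self.offset += 1


class DataNode:
    def __init__(self, datanode_id, block_ids, namenodes):
        self.datanode_id = datanode_id
        self.block_ids = block_ids
        self.namenodes = namenodes

    def send_block_report(self):
        # Report to both the Primary and the Standby.
        for namenode in self.namenodes:
            namenode.receive_block_report(self.datanode_id, self.block_ids)


log = SharedLog()
primary, standby = Primary(log), Standby(log)

primary.create("/data")
primary.create("/data/f1")
primary.delete("/data/f1")
standby.tail_log()

for dn in (DataNode("dn1", ["blk_1", "blk_2"], [primary, standby]),
           DataNode("dn2", ["blk_2"], [primary, standby])):
    dn.send_block_report()

print(standby.tree == primary.tree)            # True
print(standby.block_map == primary.block_map)  # True
```

Because the Standby already holds both parts in memory, a failover does not need to reload the fsimage, which is the delay the section set out to avoid.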

Problem: the newly introduced NFS is itself a new single point of failure. According to Facebook engineers, the failure rate of this single point is very low; they encountered it only once in four years.

That basically covers the principle of the AvatarNode, but some problems remain in practical application:

1. How does HDFS quickly detect that the Primary has failed?

2. How does the Standby quickly switch over to become the Primary?

Reposted from http://my.oschina.net/shiw019/blog/93481
