How the pseudo-distribution of HDFS works


" Introduction "

1. HDFS Architecture

[Figure 1: HDFS pseudo-distributed architecture]

A pseudo-distributed HDFS deployment needs only three components: the NameNode is the "big brother" (the master), the DataNodes are the "juniors" (the workers), and the Secondary NameNode is the assistant.

Clients communicate with the NameNode through an RPC mechanism (described later); the Secondary NameNode is responsible for synchronizing the NameNode's metadata.


2. Storage Details of the Metadata

[Figure 2: NameNode metadata layout]

The NameNode's metadata is kept in memory.

Reading the example: there is a file /test/a.log, stored with 3 replicas and split into two blocks; the metadata records which machines hold the first block and which machines hold the second.
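The metadata record above can be pictured as a small in-memory mapping. The sketch below is not Hadoop code; it is a minimal Python model of the kind of lookup the NameNode performs, with hypothetical host names h0, h1, etc.

```python
# Minimal sketch (not Hadoop code) of the NameNode's in-memory metadata:
# file path -> ordered list of blocks, and block id -> replica hosts.
# Host names (h0, h1, ...) are hypothetical stand-ins.

namenode_metadata = {
    "/test/a.log": {
        "replication": 3,
        "blocks": ["blk_1", "blk_2"],
    }
}

block_locations = {
    "blk_1": ["h0", "h1", "h3"],  # three replicas of the first block
    "blk_2": ["h0", "h2", "h4"],  # three replicas of the second block
}

def locate(path):
    """Return, for each block of `path`, the hosts storing a replica."""
    meta = namenode_metadata[path]
    return [(blk, block_locations[blk]) for blk in meta["blocks"]]
```

A client would call `locate("/test/a.log")` once, then fetch each block directly from one of the listed DataNodes.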

When a client wants to download the file, it first queries the NameNode for the metadata to learn which blocks the file is split into and where they are stored. It then downloads blk_1 from machine H0, and next tries to download blk_2 from H0 as well. If H0's copy of blk_2 is corrupted (how is corruption detected? By the checksum mechanism), the client falls back, following the nearest-replica principle, to H2 to download blk_2, and so on until the whole file has been retrieved.
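The checksum-and-fallback logic in the read path can be sketched as follows. This is a simulation, not Hadoop's client code: `read_from` is a hypothetical callback standing in for a network read, and MD5 stands in for HDFS's actual block checksums.

```python
import hashlib

# Sketch of the client read path described above (not Hadoop code):
# try replicas in order, verify a checksum, fall back on corruption.

def fetch_block(block_id, replicas, read_from, expected_checksum):
    """Try each replica until one returns data matching the checksum."""
    for host in replicas:
        data = read_from(host, block_id)
        if hashlib.md5(data).hexdigest() == expected_checksum:
            return data  # checksum ok: use this replica
        # corrupted copy: fall back to the next-nearest replica
    raise IOError(f"all replicas of {block_id} are corrupted")
```

For example, if H0's copy of blk_2 is bad, the loop simply moves on and reads the intact copy from H2.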


"How the Namenode works "

The NameNode is the management node of the entire file system. It maintains the file system's directory tree, the metadata of every file and directory, and the list of data blocks belonging to each file, and it receives users' operation requests.

The NameNode persists three kinds of files, stored on the local Linux file system:

(1) fsimage: the metadata image file. It holds a snapshot of the NameNode's in-memory metadata for a period; the Secondary NameNode is responsible for synchronizing it, so for stretches of time it may lag behind the in-memory state.

(2) edits: the operation (edit) log file.

(3) fstime: records the time of the last checkpoint (the restore point).


1. The NameNode's working principle

The NameNode always keeps the metadata in memory so that read requests can be served quickly.

(1) When a "write request" arrives, Namenode will first write the Editlog to disk, that is, to write the log to the edits file, after the successful return, the memory will be modified and returned to the client

(2) Hadoop maintains an fsimage file, an on-disk image of the NameNode's in-memory metadata. fsimage does not always match the in-memory state; instead, every once in a while the Secondary NameNode merges the edits file into fsimage to bring it up to date.
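The log-first discipline of step (1) can be sketched in a few lines. This is a toy model, not Hadoop code: `edits` is a list standing in for the on-disk edit log, and the metadata is a plain dict.

```python
# Sketch (not Hadoop code) of the write-ahead discipline in step (1):
# record the operation durably BEFORE mutating the in-memory metadata.

class TinyNameNode:
    def __init__(self):
        self.edits = []        # stand-in for the on-disk edits file
        self.metadata = {}     # in-memory metadata: path -> block list

    def create_file(self, path, blocks):
        # 1. append the edit-log entry first (durability)
        self.edits.append(("create", path, blocks))
        # 2. only then update the in-memory metadata
        self.metadata[path] = blocks
        # 3. acknowledge the client
        return "ok"
```

The ordering matters: if the process crashes after step 1 but before step 2, the operation can still be replayed from the log on restart.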


2. The Secondary NameNode's working principle

The Secondary NameNode is a partial HA (high-availability) solution. It is not a hot standby (it does not synchronize in real time), and it is enabled through configuration.

Execution process: it downloads the metadata files (fsimage and edits) from the NameNode, merges the two into a new fsimage, saves it locally, and pushes it back to the NameNode, replacing the old fsimage.

By default it is installed on the same node as the NameNode, but that is not safe: a single machine failure would take out both.


3. The Secondary NameNode's workflow

[Figure 3: Secondary NameNode checkpoint workflow]

(1) The Secondary NameNode asks the NameNode to roll the edits file; new operations start going into edits.new.

(2) The NameNode's edits and fsimage files are copied to the Secondary NameNode (via HTTP).

(3) The Secondary NameNode loads fsimage into memory, then merges the edits into it, producing fsimage.ckpt.

(4) The Secondary NameNode sends fsimage.ckpt back to the NameNode via HTTP POST.

(5) The NameNode replaces fsimage with fsimage.ckpt.

(6) The NameNode replaces edits with edits.new.

(7) Wait for the next synchronization (checkpoint).
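The merge at the heart of steps (3)-(5) can be sketched as replaying a log onto a snapshot. This is a toy model, not Hadoop code: fsimage is a dict and edits is a list of logged operations.

```python
# Sketch (not Hadoop code) of the checkpoint merge in steps (3)-(5):
# replay the edit log onto the fsimage snapshot to produce fsimage.ckpt.

def checkpoint(fsimage, edits):
    """Merge the edit log into the image, returning the new image."""
    ckpt = dict(fsimage)                 # (3) load fsimage into memory
    for op, path, blocks in edits:       # ... then replay each edit
        if op == "create":
            ckpt[path] = blocks
        elif op == "delete":
            ckpt.pop(path, None)
    return ckpt                          # (4)-(5) becomes the new fsimage
```

Because the NameNode keeps writing to edits.new during the merge, no operations are lost while the checkpoint runs.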


When does a checkpoint happen? It is triggered in either of two cases:

(1) fs.checkpoint.period specifies the maximum interval between two checkpoints; the default is 3,600 seconds, i.e. a checkpoint every hour.

(2) fs.checkpoint.size specifies the maximum size of the edits file; once it is exceeded, a checkpoint is forced even if the time interval has not yet elapsed. The default is 64 MB.
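The two triggers combine with a simple OR, which can be sketched as below (a simulation, not Hadoop code; the defaults mirror the values in the text).

```python
# Sketch of the two checkpoint triggers above (not Hadoop code).
# Defaults mirror fs.checkpoint.period (3600 s) and
# fs.checkpoint.size (64 MB) from the text.

def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=3600, size_limit=64 * 1024 * 1024):
    """True if either the time limit or the edits-size limit is hit."""
    return seconds_since_last >= period or edits_size_bytes >= size_limit
```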


"How the Datenode works "

(1) The DataNode is the storage service that holds the actual file data.

(2) File block: the most basic unit of storage. A file of length `size` is split, starting from offset 0, into fixed-size pieces numbered in order; each piece is one block. The default HDFS block size is 128 MB, so a 256 MB file occupies 256/128 = 2 blocks.
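The block arithmetic above is just a ceiling division, sketched here with the 128 MB default from the text:

```python
import math

# Block-count arithmetic from the text (128 MB default block size).

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB

def num_blocks(file_size_bytes):
    """Number of HDFS blocks a file of the given size occupies."""
    return math.ceil(file_size_bytes / BLOCK_SIZE)
```

Note the ceiling: a 130 MB file needs two blocks even though the second one is mostly empty, which is exactly the case point (3) below addresses.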

(3) Unlike an ordinary file system, HDFS does not occupy a whole block's worth of storage when a file is smaller than the block size.

(4) Replication: each block is stored as multiple replicas; the default is three.

"Summary "

Although pseudo-distributed mode is rarely used any more, these concepts and mechanisms remain very important.

This article is from the "Mo" blog, please be sure to keep this source http://flycc258.blog.51cto.com/8624126/1615325

