How the pseudo-distribution of HDFS works


" Introduction "

1. HDFS Architecture

[Figure 1: HDFS pseudo-distributed architecture]

A pseudo-distributed HDFS deployment needs only three components: the NameNode is the "big brother" (the master), the DataNodes are the "juniors" (the workers), and the Secondary NameNode is the assistant.

Clients communicate with the NameNode through an RPC mechanism (described later); the Secondary NameNode is responsible for synchronizing the NameNode's metadata.


2. Storage Details of the Metadata

[Figure 2: NameNode metadata layout]

The NameNode's metadata is kept in memory.

Reading the example: there is a file /test/a.log, stored with 3 replicas and split into two blocks; the metadata records which machines hold the first block and which machines hold the second.
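The metadata record above can be pictured as a small in-memory mapping. The sketch below is not Hadoop code; it is a minimal Python model of the kind of lookup the NameNode performs, with hypothetical host names h0, h1, etc.

```python
# Minimal sketch (not Hadoop code) of the NameNode's in-memory metadata:
# file path -> ordered list of blocks, and block id -> replica hosts.
# Host names (h0, h1, ...) are hypothetical stand-ins.

namenode_metadata = {
    "/test/a.log": {
        "replication": 3,
        "blocks": ["blk_1", "blk_2"],
    }
}

block_locations = {
    "blk_1": ["h0", "h1", "h3"],  # three replicas of the first block
    "blk_2": ["h0", "h2", "h4"],  # three replicas of the second block
}

def locate(path):
    """Return, for each block of `path`, the hosts storing a replica."""
    meta = namenode_metadata[path]
    return [(blk, block_locations[blk]) for blk in meta["blocks"]]
```

A client would call `locate("/test/a.log")` once, then fetch each block directly from one of the listed DataNodes.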

When a client wants to download the file, it first queries the NameNode for the metadata to learn which blocks the file is split into and where they are stored. It then downloads blk_1 from machine H0, and next tries to download blk_2 from H0 as well. If H0's copy of blk_2 is corrupted (how is corruption detected? By the checksum mechanism), the client falls back, following the nearest-replica principle, to H2 to download blk_2, and so on until the whole file has been retrieved.
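The checksum-and-fallback logic in the read path can be sketched as follows. This is a simulation, not Hadoop's client code: `read_from` is a hypothetical callback standing in for a network read, and MD5 stands in for HDFS's actual block checksums.

```python
import hashlib

# Sketch of the client read path described above (not Hadoop code):
# try replicas in order, verify a checksum, fall back on corruption.

def fetch_block(block_id, replicas, read_from, expected_checksum):
    """Try each replica until one returns data matching the checksum."""
    for host in replicas:
        data = read_from(host, block_id)
        if hashlib.md5(data).hexdigest() == expected_checksum:
            return data  # checksum ok: use this replica
        # corrupted copy: fall back to the next-nearest replica
    raise IOError(f"all replicas of {block_id} are corrupted")
```

For example, if H0's copy of blk_2 is bad, the loop simply moves on and reads the intact copy from H2.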


"How the Namenode works "

The NameNode is the management node of the entire file system. It maintains the file system's directory tree, the metadata of every file and directory, and the list of data blocks belonging to each file, and it receives users' operation requests.

The NameNode persists three kinds of files, stored on the local Linux file system:

(1) fsimage: the metadata image file. It holds a snapshot of the NameNode's in-memory metadata for a period; the Secondary NameNode is responsible for synchronizing it, so for stretches of time it may lag behind the in-memory state.

(2) edits: the operation (edit) log file.

(3) fstime: records the time of the last checkpoint (the restore point).


1. The NameNode's working principle

The NameNode always keeps the metadata in memory so that read requests can be served quickly.

(1) When a "write request" arrives, Namenode will first write the Editlog to disk, that is, to write the log to the edits file, after the successful return, the memory will be modified and returned to the client

(2) Hadoop maintains an fsimage file, an on-disk image of the NameNode's in-memory metadata. fsimage does not always match the in-memory state; instead, every once in a while the Secondary NameNode merges the edits file into fsimage to bring it up to date.
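The log-first discipline of step (1) can be sketched in a few lines. This is a toy model, not Hadoop code: `edits` is a list standing in for the on-disk edit log, and the metadata is a plain dict.

```python
# Sketch (not Hadoop code) of the write-ahead discipline in step (1):
# record the operation durably BEFORE mutating the in-memory metadata.

class TinyNameNode:
    def __init__(self):
        self.edits = []        # stand-in for the on-disk edits file
        self.metadata = {}     # in-memory metadata: path -> block list

    def create_file(self, path, blocks):
        # 1. append the edit-log entry first (durability)
        self.edits.append(("create", path, blocks))
        # 2. only then update the in-memory metadata
        self.metadata[path] = blocks
        # 3. acknowledge the client
        return "ok"
```

The ordering matters: if the process crashes after step 1 but before step 2, the operation can still be replayed from the log on restart.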


2. The Secondary NameNode's working principle

The Secondary NameNode is a partial HA (high-availability) solution. It is not a hot standby (it does not synchronize in real time), and it is enabled through configuration.

Execution process: it downloads the metadata files (fsimage and edits) from the NameNode, merges the two into a new fsimage, saves it locally, and pushes it back to the NameNode, replacing the old fsimage.

By default it is installed on the same node as the NameNode, but that is not safe: a single machine failure would take out both.


3. The Secondary NameNode's workflow

[Figure 3: Secondary NameNode checkpoint workflow]

(1) The Secondary NameNode asks the NameNode to roll the edits file; new operations start going into edits.new.

(2) The NameNode's edits and fsimage files are copied to the Secondary NameNode (via HTTP).

(3) The Secondary NameNode loads fsimage into memory, then merges the edits into it, producing fsimage.ckpt.

(4) The Secondary NameNode sends fsimage.ckpt back to the NameNode via HTTP POST.

(5) The NameNode replaces fsimage with fsimage.ckpt.

(6) The NameNode replaces edits with edits.new.

(7) Wait for the next synchronization (checkpoint).
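The merge at the heart of steps (3)-(5) can be sketched as replaying a log onto a snapshot. This is a toy model, not Hadoop code: fsimage is a dict and edits is a list of logged operations.

```python
# Sketch (not Hadoop code) of the checkpoint merge in steps (3)-(5):
# replay the edit log onto the fsimage snapshot to produce fsimage.ckpt.

def checkpoint(fsimage, edits):
    """Merge the edit log into the image, returning the new image."""
    ckpt = dict(fsimage)                 # (3) load fsimage into memory
    for op, path, blocks in edits:       # ... then replay each edit
        if op == "create":
            ckpt[path] = blocks
        elif op == "delete":
            ckpt.pop(path, None)
    return ckpt                          # (4)-(5) becomes the new fsimage
```

Because the NameNode keeps writing to edits.new during the merge, no operations are lost while the checkpoint runs.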


When does a checkpoint happen? It is triggered in either of two cases:

(1) fs.checkpoint.period specifies the maximum interval between two checkpoints; the default is 3,600 seconds, i.e. a checkpoint every hour.

(2) fs.checkpoint.size specifies the maximum size of the edits file; once it is exceeded, a checkpoint is forced even if the time interval has not yet elapsed. The default is 64 MB.
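The two triggers combine with a simple OR, which can be sketched as below (a simulation, not Hadoop code; the defaults mirror the values in the text).

```python
# Sketch of the two checkpoint triggers above (not Hadoop code).
# Defaults mirror fs.checkpoint.period (3600 s) and
# fs.checkpoint.size (64 MB) from the text.

def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=3600, size_limit=64 * 1024 * 1024):
    """True if either the time limit or the edits-size limit is hit."""
    return seconds_since_last >= period or edits_size_bytes >= size_limit
```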


"How the Datenode works "

(1) The DataNode is the storage service that holds the actual file data.

(2) File block: the most basic unit of storage. A file of length `size` is split, starting from offset 0, into fixed-size pieces numbered in order; each piece is one block. The default HDFS block size is 128 MB, so a 256 MB file occupies 256/128 = 2 blocks.
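The block arithmetic above is just a ceiling division, sketched here with the 128 MB default from the text:

```python
import math

# Block-count arithmetic from the text (128 MB default block size).

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB

def num_blocks(file_size_bytes):
    """Number of HDFS blocks a file of the given size occupies."""
    return math.ceil(file_size_bytes / BLOCK_SIZE)
```

Note the ceiling: a 130 MB file needs two blocks even though the second one is mostly empty, which is exactly the case point (3) below addresses.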

(3) Unlike an ordinary file system, HDFS does not occupy a whole block's worth of storage when a file is smaller than the block size.

(4) Replication: each block is stored as multiple replicas; the default is three.

"Summary "

Although pseudo-distributed mode is rarely used any more, these concepts and mechanisms remain very important.

This article is from the "Mo" blog, please be sure to keep this source http://flycc258.blog.51cto.com/8624126/1615325

