Brief introduction
HDFS (Hadoop Distributed File System) is Hadoop's distributed filesystem. It is based on a paper published by Google: GFS (the Google File System).
HDFS features:
1. Stores multiple copies of the data and provides a fault-tolerance mechanism: a lost replica or a failed node is recovered automatically. The default is 3 replicas.
2. Can run on cheap commodity machines.
3. Suitable for big data processing. HDFS splits a file into blocks, 64 MB each by default, distributes the blocks across the cluster, and keeps the block-to-location mapping in memory.
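The block-splitting rule above can be sketched in a few lines. This is an illustrative sketch, not Hadoop's actual code; the function name is made up for this example.

```python
# Sketch (not Hadoop source): dividing a file into fixed-size
# blocks, using the 64 MB default mentioned above.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs for each block of a file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 200 MB file occupies 4 blocks; the last block holds only the
# remaining 8 MB and, unlike a local filesystem, does not reserve
# a full 64 MB on disk.
blocks = split_into_blocks(200 * 1024 * 1024)
print(len(blocks))                      # 4
print(blocks[-1][1] // (1024 * 1024))   # 8
```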
As shown in the figure, HDFS also uses a master/slave structure, with three roles: NameNode, SecondaryNameNode, and DataNode.
NameNode: the master node, the manager. It manages the block mappings, handles read and write requests from clients, applies the replica placement policy, and manages the HDFS namespace.
Which DataNodes hold each block is not stored on the NameNode's disk; each DataNode reports its blocks to the NameNode at startup, and the NameNode keeps this information in memory.
Block location information is therefore never written back into the fsimage.
The edits file logs the client's operations on the filesystem, such as file creation and deletion.
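The in-memory block map described above can be sketched as follows. The class and method names here are illustrative only, not Hadoop's real API.

```python
# Illustrative sketch (not Hadoop source): the NameNode keeps the
# block -> DataNode mapping only in memory, rebuilt from the block
# reports each DataNode sends at startup.
from collections import defaultdict

class NameNodeBlockMap:
    def __init__(self):
        # block_id -> set of DataNode ids; never written to fsimage
        self.block_locations = defaultdict(set)

    def process_block_report(self, datanode_id, block_ids):
        """Called when a DataNode reports the blocks it stores."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)

    def locate(self, block_id):
        """Return the DataNodes known to hold this block."""
        return sorted(self.block_locations[block_id])

nn = NameNodeBlockMap()
nn.process_block_report("dn1", ["blk_1", "blk_2"])
nn.process_block_report("dn2", ["blk_2", "blk_3"])
print(nn.locate("blk_2"))  # ['dn1', 'dn2']
```

If the NameNode restarts, this map is empty until the DataNodes report again, which is why block locations never need to be persisted in the fsimage.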
SecondaryNameNode: shares part of the NameNode's workload and acts as a cold backup of the NameNode. It merges the fsimage and edits files and sends the result to the NameNode.
After merging fsimage and edits, it ships the new fsimage back to replace the NameNode's copy, and keeps a copy for itself.
That copy can be used to recover part of the filesystem metadata if the NameNode crashes or dies.
1. You can change the merge interval through fs.checkpoint.period; the default is 1 hour.
2. You can also configure the size of the edits log: fs.checkpoint.size sets the maximum edits file size at which the SecondaryNameNode triggers a merge; the default is 64 MB.
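The two parameters above could be set in a configuration fragment like the following. This is a sketch using the legacy property names cited in the text (in newer Hadoop releases these were renamed to dfs.namenode.checkpoint.period and related properties):

```xml
<!-- Checkpoint tuning, using the legacy property names from the text -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value> <!-- seconds between merges; default 1 hour -->
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value> <!-- merge once edits reaches 64 MB -->
</property>
```

Whichever limit is hit first triggers the checkpoint.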
The merge process is as follows:
DataNode: the slave node, the worker. It stores the data blocks sent by the client and performs read and write operations on those blocks.
Hot backup: B is a hot backup of A; if A fails, B immediately takes over A's work.
Cold backup: B is a cold backup of A; if A fails, B cannot take over A's work immediately, but B stores some of A's information, which reduces the loss when A goes down.
fsimage: the metadata image file (the filesystem directory tree).
edits: the metadata operation log (a record of modification operations on the filesystem).
fsimage + edits together form the metadata held in the NameNode's memory.
The SecondaryNameNode periodically (every 1 hour by default) fetches the fsimage and edits from the NameNode, merges them, and sends the result back, reducing the NameNode's workload.
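The merge step can be sketched as replaying the edit log on top of the fsimage snapshot. This is a simplified illustration with made-up operation names, not Hadoop's actual on-disk formats.

```python
# Sketch of the checkpoint merge (illustrative, not Hadoop source):
# fsimage is a snapshot of the namespace, edits is a log of
# operations; merging replays edits on top of fsimage.
def merge_checkpoint(fsimage, edits):
    """Return a new fsimage with the edit log applied."""
    namespace = dict(fsimage)  # copy: the old fsimage is preserved
    for op, path in edits:
        if op == "create":
            namespace[path] = {}
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

fsimage = {"/user": {}, "/user/a.txt": {}}
edits = [("create", "/user/b.txt"), ("delete", "/user/a.txt")]
new_fsimage = merge_checkpoint(fsimage, edits)
print(sorted(new_fsimage))  # ['/user', '/user/b.txt']
```

After the merge, the edits log can be truncated and restarted, which is what keeps it from growing without bound between checkpoints.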
HDFS pros and cons:
Advantages
1. High fault tolerance
Data automatically saves multiple copies
Automatic recovery after a copy is lost
2. Suitable for batch processing
Computation moves to the data, rather than the data to the computation
Data locations are exposed to the computing framework
3. Suitable for big data processing
GB, TB, even PB of data
More than a million files
10K+ nodes
4. Can be built on cheap commodity machines
Reliability is improved through replicas
Provides fault tolerance and recovery mechanisms
Disadvantages
1. Not suited to low-latency data access
2. Small files waste resources (each file occupies NameNode memory)
3. No concurrent writes (a file can have only one writer at a time), and files cannot be modified at random positions (only appends are supported)
Operating principle of HDFS