Operating Principle of HDFS


Brief introduction

HDFS (Hadoop Distributed File System) is Hadoop's distributed filesystem. Its design is based on a paper published by Google: GFS (the Google File System).

HDFS features:

1. It keeps multiple copies of each block and provides fault tolerance: a lost replica or a downed node is recovered automatically. The default replication factor is 3.

2. It can run on clusters of cheap commodity machines.

3. It is suited to processing large data sets. HDFS splits each file into blocks, 64 MB per block by default; the blocks themselves are stored on DataNodes, while the NameNode holds the file-to-block mapping in memory as key-value pairs.
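As a rough illustration of the block split described above, here is a minimal sketch. The function name is hypothetical, not part of HDFS itself; only the 64 MB default block size comes from the text.

```python
# Sketch: how a file is divided into fixed-size blocks, as in HDFS.
# BLOCK_SIZE mirrors the 64 MB default mentioned above; names are illustrative.

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return (block_index, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    index = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((index, length))
        offset += length
        index += 1
    return blocks

# A 150 MB file needs three blocks: 64 MB + 64 MB + 22 MB.
blocks = split_into_blocks(150 * 1024 * 1024)
print(len(blocks))                      # 3
print(blocks[-1][1] // (1024 * 1024))   # 22
```

Note that only the last block of a file can be shorter than the block size, which is why huge files map cleanly onto uniform storage units.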

HDFS, too, is built on a master/slave architecture, with three roles: NameNode, SecondaryNameNode, and DataNode.

NameNode: the master node, the manager of the cluster. It manages the block mappings, handles read and write requests from clients, enforces the replica policy, and manages the HDFS namespace.

Which DataNodes hold each block is not stored on the NameNode's disk; each DataNode reports its blocks to the NameNode at startup, and the NameNode keeps this information in memory.

The location information of blocks is therefore never written back to the fsimage.

The edits file is a log of client operations against the namespace: file creations, deletions, and so on.

SecondaryNameNode: shares part of the NameNode's workload and serves as a cold backup of the NameNode. It merges the fsimage and edits files and sends the result back to the NameNode.

After merging the fsimage and edits files, it sends the merged file to the NameNode to replace the old fsimage, and keeps a copy for itself. That copy can be used to recover part of the filesystem metadata if the NameNode goes down or is destroyed.

1. You can change the merge interval by configuring fs.checkpoint.period; the default is 1 hour.

2. You can also cap the size of the edits log: fs.checkpoint.size specifies the maximum edits file size, letting the SecondaryNameNode know when to run a merge regardless of the interval. The default is 64 MB.
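The two trigger conditions above can be sketched as a single predicate. The function is hypothetical; only the two defaults (1 hour and 64 MB) come from the text:

```python
# Sketch: when the SecondaryNameNode decides to checkpoint, based on
# fs.checkpoint.period and fs.checkpoint.size described above.
# The constants mirror the defaults in the text: 1 hour and 64 MB.

CHECKPOINT_PERIOD_SECONDS = 3600          # fs.checkpoint.period default
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024  # fs.checkpoint.size default

def should_checkpoint(seconds_since_last: int, edits_size_bytes: int) -> bool:
    """Merge when either the period has elapsed or the edits log is full."""
    return (seconds_since_last >= CHECKPOINT_PERIOD_SECONDS
            or edits_size_bytes >= CHECKPOINT_SIZE_BYTES)

print(should_checkpoint(10, 1024))              # False
print(should_checkpoint(4000, 1024))            # True  (period elapsed)
print(should_checkpoint(10, 65 * 1024 * 1024))  # True  (edits too large)
```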

The merge process is as follows: the SecondaryNameNode asks the NameNode to roll over to a new edits log, downloads the current fsimage and edits files, merges them into a new fsimage, and uploads that fsimage back to the NameNode.

DataNode: the slave node, the worker. It stores the data blocks sent by clients and performs read and write operations on those blocks.

Hot backup: B is a hot backup of A; if A fails, B takes over A's work immediately.

Cold backup: B is a cold backup of A; if A fails, B cannot take over A's work immediately, but B stores some of A's information, reducing the loss when A goes down.

fsimage: the metadata image file (the filesystem directory tree).

edits: the metadata operation log (a record of modification operations on the filesystem).

The NameNode's in-memory metadata = fsimage + edits.

The SecondaryNameNode periodically (every 1 hour by default) fetches the fsimage and edits from the NameNode, merges them, and sends the result back to the NameNode, reducing the NameNode's workload.
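The merge itself amounts to replaying the edits log over the fsimage. A minimal sketch, with the fsimage modeled as a dict of path to metadata and the edits as a list of logged operations (both representations are illustrative, not the actual on-disk formats):

```python
# Sketch: the fsimage + edits merge performed by the SecondaryNameNode.
# fsimage is modeled as path -> metadata; edits as a list of logged ops
# that are replayed on top of it to produce the new fsimage.

def merge_checkpoint(fsimage: dict, edits: list) -> dict:
    """Replay the edits log over the fsimage to produce a new fsimage."""
    new_image = dict(fsimage)
    for op, path in edits:
        if op == "create":
            new_image[path] = {}       # record the new file's metadata
        elif op == "delete":
            new_image.pop(path, None)  # drop the deleted file
    return new_image

fsimage = {"/a": {}, "/b": {}}
edits = [("create", "/c"), ("delete", "/b")]
print(sorted(merge_checkpoint(fsimage, edits)))  # ['/a', '/c']
```

After the merge, the edits log can be truncated, which is what keeps NameNode restarts fast: the NameNode only replays the short post-checkpoint edits instead of the whole history.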

HDFS pros and cons:

Advantages

1. High fault tolerance

Data is automatically saved in multiple copies

A lost replica is recovered automatically

2. Suited to batch processing

Computation is moved to the data, rather than data to the computation

Block locations are exposed to the computing framework

3. Suited to big data processing

GB, TB, PB scale or larger

File counts in the millions

Clusters of 10K+ nodes

4. Can be built on cheap machines

Reliability is improved through replication

Fault-tolerance and recovery mechanisms are provided
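The fault-tolerance point above can be sketched concretely: when a DataNode dies, any block whose surviving replica count falls below the replication factor (3 by default, per the text) must be re-replicated. The function and node names are illustrative:

```python
# Sketch: detecting under-replicated blocks after a DataNode failure.
# Blocks with fewer live replicas than the replication factor (default 3)
# are candidates for re-replication to another live node.

REPLICATION_FACTOR = 3

def under_replicated(block_locations: dict, live_nodes: set) -> dict:
    """Return block -> surviving replica count for blocks needing repair."""
    result = {}
    for block, nodes in block_locations.items():
        alive = nodes & live_nodes
        if len(alive) < REPLICATION_FACTOR:
            result[block] = len(alive)
    return result

locations = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn1", "dn4", "dn5"}}
live = {"dn2", "dn3", "dn4", "dn5"}  # dn1 has failed
print(under_replicated(locations, live))  # {'blk_1': 2, 'blk_2': 2}
```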

Disadvantages

1. Not suited to low-latency data access

2. Large numbers of small files consume resources (each file occupies NameNode memory)

3. No concurrent writes (a file can have only one writer at a time), and files cannot be modified at arbitrary offsets (only append is supported)

