A detailed description of the HDFS principles in Hadoop 1


HDFS is short for Hadoop Distributed File System, the distributed file system used by Hadoop.

I. The main design concepts of HDFS

1. Storing very large files

"Very large" here means files that are hundreds of MB, several GB, or even several TB in size.

2. The most efficient access pattern is write-once, read-many (streaming data access)

The data sets stored in HDFS are the objects that Hadoop analyzes. Once a data set has been generated, various analyses are run against it over a long period of time. Each analysis touches most or even all of the data, so the time to read the entire data set matters more than the latency of reading the first record.

3. Runs on ordinary, inexpensive servers

One of HDFS's design goals is to run on commodity hardware; even when hardware fails, fault-tolerance policies keep the data highly available.

II. What HDFS is not suited for

1. Scenarios that require low-latency data access

2. Storing a large number of small files

The metadata in HDFS (the basic information about each file) is stored in the NameNode's memory, and the NameNode is a single point. Each file, directory, and block takes on the order of 150 bytes of NameNode memory, so once the number of small files grows large enough, the NameNode's memory simply cannot hold it.

III. Basic concepts of HDFS

Block: large files are split into multiple blocks for storage; the block size defaults to 64 MB. Each block is stored as multiple replicas on multiple DataNodes, with 3 replicas by default.

NameNode: the NameNode manages the file system directory tree, the mapping from files to blocks, and the mapping from blocks to DataNodes.

DataNode: the DataNode is responsible for actually storing the data; naturally, most of the fault-tolerance mechanisms are implemented on the DataNodes.
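
To make these three concepts concrete, the sketch below (not from the original article) asks the NameNode which blocks make up a file and which DataNodes hold each block, using the standard Hadoop Java API (FileSystem#getFileBlockLocations). The path /data/sample.log is only a placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlocks {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);       // a DistributedFileSystem when the default FS is HDFS
            Path file = new Path("/data/sample.log");   // placeholder path
            FileStatus status = fs.getFileStatus(file);
            // Ask the NameNode for the file-to-block and block-to-DataNode mappings
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("offset=" + b.getOffset()
                        + " length=" + b.getLength()
                        + " hosts=" + java.util.Arrays.toString(b.getHosts()));
            }
            fs.close();
        }
    }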

IV. Basic architecture of HDFS

Client: splits files into blocks, accesses or manages HDFS from the command line, interacts with the NameNode to obtain file location information, and interacts with DataNodes to read and write data.

NameNode: the master node, of which there is only one. It manages the HDFS namespace and block mapping information, configures the replica placement policy, and handles client requests.

DataNode: the slave nodes. They store the actual data, perform block reads and writes, and report storage information to the NameNode.

Secondary NameNode: assists the NameNode and shares part of its workload; it periodically merges the fsimage and fsedits files and pushes the result back to the NameNode. In an emergency it can help recover the NameNode, but the Secondary NameNode is not a hot standby for the NameNode.

Fsimage and Fsedits

Two very important files live on the NameNode:

Fsimage is the metadata image file (it stores the file system directory tree).

Fsedits is the metadata operation log (it records every HDFS operation between one fsimage save and the next).

The latest metadata (fsimage plus fsedits) is kept in the NameNode's memory.

If fsedits grows too large, the NameNode restarts slowly, so the Secondary NameNode is responsible for periodically merging the two files.

Merge Flowchart:

Mapping relationships of data blocks

1. There are two kinds of mappings: the file-to-block mapping and the DataNode-to-block mapping.

2. When the NameNode starts, it can rebuild the mapping information from heartbeats: DataNodes report their current block information while running, and the mappings are held in the NameNode's memory.

3. The NameNode restarts slowly (because it must load the fsimage and fsedits files to rebuild the latest directory tree and the DataNode block information).

Data blocks (block)

1. In HDFS, a file is cut into fixed-size blocks; the default size is 64 MB, and it can be configured.

2. Blocks are this large so that the time spent transferring data dominates the time spent seeking (high throughput). As a rough example, with a 10 ms seek time and a 100 MB/s transfer rate, a block of roughly 100 MB keeps the seek time at about 1% of the transfer time.

3. A file is stored by cutting it into blocks of this size and spreading the blocks across different nodes; by default each block has three replicas.
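
As an illustration of how a file's block size and replication factor can be chosen per file, the sketch below (an illustrative example, not from the original article) uses the FileSystem.create overload that takes an explicit replication factor and block size; the path and literal values are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateWithBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Per-file override of replication (3 copies) and block size (64 MB),
            // instead of relying on the cluster-wide dfs.replication / dfs.block.size defaults.
            FSDataOutputStream out = fs.create(
                    new Path("/data/big-output.bin"),  // placeholder path
                    true,                              // overwrite if the file exists
                    4096,                              // I/O buffer size
                    (short) 3,                         // replication factor
                    64L * 1024 * 1024);                // block size in bytes
            out.writeBytes("payload...");              // placeholder content
            out.close();
            fs.close();
        }
    }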

Default replica placement policy of HDFS
Since Hadoop 0.17: replica 1 is placed on the same node as the client; replica 2 on a node in a different rack; replica 3 on another node in the same rack as the second replica; any further replicas are placed on randomly selected nodes.

    • HDFS reliability mechanisms

Common failure conditions: file corruption; network or machine failure; the NameNode going down.
File integrity: verified with CRC32 checksums; if a block is found to be corrupted, it is replaced with another replica.
Heartbeat: DataNodes send heartbeats to the NameNode periodically.
Metadata: fsimage and the edit log are backed up in multiple copies, so when the NameNode goes down it can be restored manually.
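
To illustrate the checksum idea only (this is not HDFS's internal implementation, which checksums every io.bytes.per.checksum bytes, 512 by default, and stores the values in per-block metadata files on the DataNode), here is a minimal CRC32 sketch:

    import java.util.zip.CRC32;

    public class ChecksumSketch {
        // Compute a CRC32 over a chunk of block data.
        static long checksum(byte[] chunk) {
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, chunk.length);
            return crc.getValue();
        }

        public static void main(String[] args) {
            byte[] chunk = "block data...".getBytes();
            long stored = checksum(chunk);       // recorded when the block was written
            long recomputed = checksum(chunk);   // recomputed when the block is read back
            System.out.println(stored == recomputed
                    ? "block OK"
                    : "block corrupted, read another replica");
        }
    }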

    • HDFS physical network environment

    • HDFS read and write process



HDFS file read:


1. The client first calls the open method of the FileSystem object, which is in fact an instance of DistributedFileSystem.
2. DistributedFileSystem obtains the locations of the file's first blocks from the NameNode via RPC. For each block, multiple locations are returned according to the replication factor, and the locations are sorted by the Hadoop network topology so that the ones closest to the client come first.
3. The first two steps return an FSDataInputStream object, which wraps a DFSInputStream; DFSInputStream manages the traffic to the DataNodes and the NameNode. The client calls read, and DFSInputStream finds the DataNode closest to the client and connects to it.
4. Data streams from the DataNode to the client.
5. When the first block has been read completely, the connection to that block's DataNode is closed and the next block is read. These operations are transparent to the client; from the client's point of view it is simply reading one continuous stream.
6. When the first batch of blocks has been read, DFSInputStream asks the NameNode for the locations of the next batch of blocks and keeps reading; once all blocks have been read, it closes all the streams.
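
From the client's side, the read flow above boils down to a few calls. A minimal sketch, assuming the standard Hadoop 1 Java API; the path is a placeholder.

    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class ReadFromHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);               // DistributedFileSystem instance (step 1)
            InputStream in = null;
            try {
                in = fs.open(new Path("/data/sample.log"));     // FSDataInputStream wrapping DFSInputStream (steps 2-3)
                IOUtils.copyBytes(in, System.out, 4096, false); // data streams from the DataNodes (steps 4-6)
            } finally {
                IOUtils.closeStream(in);
                fs.close();
            }
        }
    }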

Exception handling for HDFS reads
If DFSInputStream encounters an error while communicating with a DataNode, it tries the next-closest DataNode holding the block being read, and it records the failed DataNode so that the remaining blocks skip it. DFSInputStream also verifies the block checksums; if it finds a corrupted block, it reports it to the NameNode and then reads a replica of that block from another DataNode.
Design considerations behind the HDFS read path
The client connects directly to the DataNodes to retrieve the data, while the NameNode only supplies the optimal DataNode for each block. The NameNode handles nothing but block-location requests, and that information fits entirely in the NameNode's memory, so HDFS can withstand concurrent access from a large number of clients through the DataNode cluster.

HDFS file write

1. The client creates a new file by calling the create method of DistributedFileSystem.
2. DistributedFileSystem calls the NameNode via RPC to create a new file that has no blocks associated with it yet. Before creating it, the NameNode performs various checks, such as whether the file already exists and whether the client has permission to create it. If the checks pass, the NameNode records the new file; otherwise an IO exception is thrown.
3. The first two steps return an FSDataOutputStream object which, as with reading, wraps a DFSOutputStream; DFSOutputStream coordinates the NameNode and the DataNodes. The client writes data to DFSOutputStream, which cuts the data into small packets and queues them in the data queue.
4. DataStreamer consumes the data queue. It first asks the NameNode which DataNodes are most suitable for storing the new block; if the replication factor is 3, it finds the 3 most suitable DataNodes and lines them up into a pipeline. DataStreamer then sends each packet to the first DataNode in the pipeline, the first DataNode forwards it to the second, and so on.
5. DFSOutputStream also keeps a queue called the ack queue, likewise made up of packets, which waits for acknowledgments from the DataNodes; only when every DataNode in the pipeline has acknowledged a packet is that packet removed from the ack queue.
6. When the client finishes writing the data, it calls the close method to close the write stream.
7. DataStreamer flushes the remaining packets into the pipeline and waits for their acks; after receiving the last ack, it notifies the NameNode to mark the file as complete.
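
Seen from the client, the whole write flow is equally short. A minimal sketch, assuming the standard Hadoop 1 Java API; the local input file and the HDFS path are placeholders.

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class WriteToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            InputStream in = new BufferedInputStream(
                    new FileInputStream("/tmp/local-input.txt"));   // placeholder local file
            // create() asks the NameNode to register the new file (steps 1-2);
            // the returned stream wraps DFSOutputStream, which packetizes the data
            // and pushes it down the DataNode pipeline (steps 3-5).
            FSDataOutputStream out = fs.create(new Path("/data/copied-input.txt"));
            try {
                IOUtils.copyBytes(in, out, 4096, false);
            } finally {
                out.close();   // flushes the last packets and marks the file complete (steps 6-7)
                in.close();
                fs.close();
            }
        }
    }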
HDFS write failure handling
If a DataNode fails while the data is being written, the following steps are taken:
1. The pipeline is closed.
2. To prevent packet loss, the packets in the ack queue are moved back to the data queue.
3. The partially written, incomplete block on the failed DataNode is deleted.
4. The remainder of the block is written to the two remaining healthy DataNodes.
5. The NameNode finds another DataNode on which to create a replica of this block.
None of this is perceptible to the client.
(After the client finishes a write, the completed blocks are visible, but the block currently being written is not visible to the client. Only by calling the sync method can the client be sure the file's contents have been fully written out; the close method calls sync by default. Whether you need to call it manually is a trade-off between data robustness and throughput, depending on what your application needs.)
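
A minimal sketch of that trade-off, assuming the Hadoop 1 FSDataOutputStream.sync() method (later releases replace it with hflush()/hsync()); the path and record contents are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncAfterCriticalWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/data/journal.log")); // placeholder path
            out.writeBytes("critical record 1\n");
            out.sync();   // force what has been written so far to be visible to readers,
                          // at the cost of some throughput
            out.writeBytes("less critical record 2\n");  // flushed later by close()
            out.close();  // close() syncs the remaining data by default
            fs.close();
        }
    }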

  
