HDFS basic concepts

Source: Internet
Author: User
1. Data blocks

HDFS (Hadoop Distributed File System) uses 64 MB data blocks by default. (This is the Hadoop 1.x default; later releases default to 128 MB.)

As in ordinary file systems, files in HDFS are split into blocks, here 64 MB each, and stored block by block.

In HDFS, however, a file that is smaller than one data block does not occupy a full block's worth of storage space.
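The block arithmetic can be sketched in a few lines of Python. This is only an illustration of the sizes involved, not how HDFS itself is implemented; the function name `split_into_blocks` is made up for this example.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size described above
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block needed to store file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only takes up as much space as the data it holds.
        sizes.append(remainder)
    return sizes


print([s // MB for s in split_into_blocks(150 * MB)])  # [64, 64, 22]
print([s // MB for s in split_into_blocks(10 * MB)])   # [10]
```

A 150 MB file needs two full blocks plus a 22 MB final block, and a 10 MB file uses a single 10 MB block rather than a full 64 MB.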

2. Metadata node (namenode) and data node (datanode)

The namenode manages the namespace of the file system.

It keeps the metadata of all files and folders in a file system tree.

This information is also persisted on the hard disk in two files: the namespace image and the edit log.

It also records which data blocks make up each file and which data nodes hold each block. This information is not stored on the hard disk, however; it is collected from the data nodes when the system starts.

The datanode is where the file system actually stores its data.

A client or the metadata node (namenode) can ask a data node to write or read data blocks.

Each data node periodically reports the list of blocks it stores back to the metadata node.
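To make the reporting relationship concrete, here is a minimal sketch of a metadata node rebuilding its block-to-datanode map purely from block reports. The class and method names (`NameNodeBlockMap`, `receive_block_report`, `datanodes_for`) are hypothetical and do not correspond to Hadoop's real classes.

```python
from collections import defaultdict


class NameNodeBlockMap:
    """Hypothetical in-memory map from block ID to the data nodes holding it."""

    def __init__(self):
        # Nothing is loaded from disk: the map starts empty on every start-up.
        self.locations = defaultdict(set)

    def receive_block_report(self, datanode_id, block_ids):
        """Replace whatever was known about datanode_id with its latest report."""
        for holders in self.locations.values():
            holders.discard(datanode_id)
        for block_id in block_ids:
            self.locations[block_id].add(datanode_id)

    def datanodes_for(self, block_id):
        """Return the sorted IDs of data nodes known to hold block_id."""
        return sorted(self.locations.get(block_id, ()))


nn = NameNodeBlockMap()
nn.receive_block_report("dn1", ["blk_1", "blk_2"])
nn.receive_block_report("dn2", ["blk_2"])
print(nn.datanodes_for("blk_2"))  # ['dn1', 'dn2']

# A later report from dn1 that no longer lists blk_2 removes that location.
nn.receive_block_report("dn1", ["blk_1"])
print(nn.datanodes_for("blk_2"))  # ['dn2']
```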

Secondary metadata node (secondary namenode)

The secondary namenode is not a standby that takes over when the metadata node fails. It has a different job.

Its main function is to periodically merge the namenode's namespace image with its edit log, so that the log file does not grow too large.

A copy of the merged namespace image is also kept on the secondary namenode, so that it can be used for recovery if the namenode fails.

2.1 namenode folder structure
 
${dfs.name.dir}/current/VERSION
                       /edits
                       /fsimage
                       /fstime

 

The VERSION file is a Java properties file that records the version information of HDFS.

#Fri Dec 21 16:45:25 CST 2012
namespaceID=1555019963
cTime=0
storageType=NAME_NODE
layoutVersion=-32

layoutVersion is a negative integer that records the format version of the persistent data structures HDFS keeps on the hard disk.

namespaceID is the unique identifier of the file system. It is generated when the file system is first formatted.

cTime is 0 here. It marks the creation time of the namenode's storage and stays 0 for newly formatted storage.

storageType indicates that this folder holds the data structures of a metadata node.
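Because the file is a simple list of key=value pairs, it is easy to read programmatically. The sketch below handles only the plain subset shown here (one `key=value` per line, `#` comments); the full Java properties format also allows escapes and other separators, which this example ignores. The function name `parse_version_file` is made up for illustration.

```python
def parse_version_file(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Sample contents taken from the namenode VERSION file shown in this section.
SAMPLE = """\
#Fri Dec 21 16:45:25 CST 2012
namespaceID=1555019963
cTime=0
storageType=NAME_NODE
layoutVersion=-32
"""

props = parse_version_file(SAMPLE)
print(props["storageType"])         # NAME_NODE
print(int(props["layoutVersion"]))  # -32
```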

 

Fsimage and edits:

When a file system client performs a write operation, the operation is first recorded in the edit log.

The metadata node keeps the file system metadata in memory. Once the edit has been logged, the metadata node updates its in-memory data structures.

Before a write operation is reported as successful, the edit log is synced to the file system.

The fsimage file, that is, the namespace image file, is an on-disk checkpoint of the in-memory metadata. It is stored in a serialized format and cannot be modified in place on the hard disk.

Much as a database recovers from a checkpoint plus a log, when the metadata node restarts after a failure, the metadata of the latest checkpoint is loaded from fsimage into memory, and the operations in the edit log are then replayed one by one.
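The log-first write path and the replay on recovery can be sketched as follows. This is a toy model under stated assumptions: the `MetadataNode` class, the JSON log entries, and the `create`/`delete` operations are all invented for illustration and are not the real edit log format.

```python
import json


class MetadataNode:
    """Toy metadata node: namespace in memory, edits appended to a log first."""

    def __init__(self, checkpoint=None, edit_log=None):
        # Recovery: start from the checkpoint, then replay the log in order.
        self.namespace = dict(checkpoint or {})
        self.edit_log = list(edit_log or [])
        for entry in self.edit_log:
            self._apply(json.loads(entry))

    def _apply(self, op):
        if op["op"] == "create":
            self.namespace[op["path"]] = op["blocks"]
        elif op["op"] == "delete":
            self.namespace.pop(op["path"], None)

    def write(self, op):
        # 1. Record the operation in the edit log (a real system syncs to disk here).
        self.edit_log.append(json.dumps(op))
        # 2. Only then update the in-memory structure.
        self._apply(op)


nn = MetadataNode()
nn.write({"op": "create", "path": "/a", "blocks": ["blk_1"]})
nn.write({"op": "create", "path": "/b", "blocks": ["blk_2"]})
nn.write({"op": "delete", "path": "/a"})

# Simulate a restart: an empty checkpoint plus the surviving edit log.
recovered = MetadataNode(checkpoint={}, edit_log=nn.edit_log)
print(recovered.namespace)  # {'/b': ['blk_2']}
```

Because every change reaches the log before memory is touched, replaying the log over the last checkpoint reproduces the state the node had before it failed.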

 

2.2. directory structure of secondary namenode
${fs.checkpoint.dir}/current/VERSION
                            /edits
                            /fsimage
                            /fstime
                    /previous.checkpoint/VERSION
                                        /edits
                                        /fsimage
                                        /fstime

The secondary namenode helps the namenode checkpoint its in-memory metadata to the hard disk.

The checkpoint process is as follows:

The secondary namenode tells the namenode to start a new log file, edits.new. Subsequent edits are written to this new log file.

The secondary namenode uses HTTP GET to fetch the fsimage file and the old log file from the namenode.

The secondary namenode loads the fsimage file into memory, applies the operations in the log file, and generates a new fsimage file.

The secondary namenode sends the new fsimage file back to the namenode using HTTP POST.

The namenode replaces the old fsimage file and the old log file with the new fsimage file and the new log file (the one created in the first step), then updates the fstime file with the time of this checkpoint.

In this way, the fsimage file on the namenode holds the metadata of the latest checkpoint, and the log file starts over, so it never grows very large.
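The checkpoint steps above can be sketched as a small in-memory simulation. Everything here is a simplification: the `ToyNameNode` class, its methods, and the `(op, path)` edit tuples are hypothetical, files are modelled as dictionary entries, and the HTTP transfers are replaced by plain function calls.

```python
def apply_edits(image, edits):
    """Return a new image with each (op, path) edit applied in order."""
    merged = set(image)
    for op, path in edits:
        if op == "create":
            merged.add(path)
        elif op == "delete":
            merged.discard(path)
    return merged


class ToyNameNode:
    def __init__(self):
        self.files = {"fsimage": set(), "edits": [], "fstime": None}

    def log(self, op, path):
        # Edits go to edits.new while a checkpoint is in progress.
        target = "edits.new" if "edits.new" in self.files else "edits"
        self.files[target].append((op, path))

    def roll_edits(self):
        # Step 1: start a new log file.
        self.files["edits.new"] = []

    def fetch(self):
        # Step 2: hand out copies of fsimage and the old log (HTTP GET in HDFS).
        return set(self.files["fsimage"]), list(self.files["edits"])

    def install(self, new_image, now):
        # Steps 4 and 5: accept the new image, swap the logs, update fstime.
        self.files["fsimage"] = new_image
        self.files["edits"] = self.files.pop("edits.new")
        self.files["fstime"] = now


def checkpoint(namenode, now):
    namenode.roll_edits()
    image, edits = namenode.fetch()
    new_image = apply_edits(image, edits)  # step 3, done on the secondary
    namenode.install(new_image, now)


nn = ToyNameNode()
nn.log("create", "/a")
nn.log("create", "/b")
nn.log("delete", "/a")
checkpoint(nn, now=100)
print(nn.files["fsimage"], nn.files["edits"], nn.files["fstime"])  # {'/b'} [] 100
```

Edits that arrive between `roll_edits` and `install` land in edits.new, so they survive the swap and become the start of the next log.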

 

2.3. directory structure of datanode
${dfs.data.dir}/current/VERSION
                       /blk_<id_1>
                       /blk_<id_1>.meta
                       /blk_<id_2>
                       /blk_<id_2>.meta
                       /...
                       /blk_<id_64>
                       /blk_<id_64>.meta
                       /subdir0/
                       /subdir1/
                       /...
                       /subdir63/

blk_<id> files hold HDFS data blocks, that is, the raw binary data itself.

blk_<id>.meta files hold the attributes of a data block: version information, type information, and checksums.

subdirxx: when the number of data blocks in a directory reaches a certain threshold, subfolders are created to hold further data blocks and their attribute files.
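To illustrate why checksums are kept alongside each block, the sketch below computes one checksum per fixed-size chunk of a block and uses the stored values to locate corruption. It is not the real .meta file format: the 512-byte chunk size, the use of CRC32, and the function names are assumptions made for this example.

```python
import zlib

CHUNK_SIZE = 512  # bytes per checksum; an assumed value for this sketch


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Return one CRC32 value per chunk_size-byte slice of data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data, expected, chunk_size=CHUNK_SIZE):
    """Return the indexes of chunks whose checksum differs from expected."""
    actual = chunk_checksums(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("chunk count does not match stored checksums")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


block = bytes(range(256)) * 5          # 1280 bytes: chunks of 512, 512, 256
stored = chunk_checksums(block)        # what a meta file would record

damaged = bytearray(block)
damaged[600] ^= 0xFF                   # flip one byte inside the second chunk
print(find_corrupt_chunks(bytes(damaged), stored))  # [1]
```

Keeping a checksum per chunk rather than per block means a reader can tell which part of a block is damaged without rereading all of it.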

The VERSION file of a data node looks like this:

namespaceID=1232737062
storageID=DS-1640411682-127.0.1.1-50010-1254997319480
cTime=0
storageType=DATA_NODE
layoutVersion=-18

 

 

From http://www.cnblogs.com/forfuture1978/archive/2010/03/14/1685351.html
