HDFS Metadata Management Mechanism


1. Metadata Management Overview
Grouped by type, HDFS metadata consists mainly of the following parts:
1. Attribute information of files and directories themselves, such as file names, directory names, and modification times.
2. Information about where file content is stored, such as block information, block locations, and the number of replicas.
3. Records of the DataNodes in HDFS, used for DataNode management.
Metadata exists in two forms, in-memory metadata and metadata files, residing in memory and on disk respectively. The metadata files persisted on the HDFS disk fall into two categories:
fsimage image file: a persisted checkpoint of the metadata. It contains all directory and file metadata of the Hadoop file system, but not the locations of file blocks. Block location information is kept only in memory: when a DataNode joins the cluster, the NameNode asks it for its block list, and the locations are refreshed periodically thereafter.
edits log file: a log of all change operations on the Hadoop file system (file creation, deletion, or modification). Change operations performed by file system clients are recorded in the edits file first.
Both fsimage and edits are serialized files. When the NameNode starts, it loads the contents of the fsimage file into memory and then replays the operations in the edits file, so that the in-memory metadata is synchronized with the actual state of the file system. This in-memory metadata serves client read operations and is the most complete copy of the metadata.
When a client adds or modifies a file in HDFS, the operation is first recorded in the edits log file; once the client operation succeeds, the corresponding metadata is updated in memory. fsimage files are generally large (GB-sized files are common), so appending every update operation directly to the fsimage file would make the system run very slowly.
HDFS adopts this design for two reasons: first, updating and querying in-memory data is fast, which greatly shortens operation response times; second, the risk of losing in-memory metadata is quite high (power outages, etc.), so the backup mechanism of a metadata image file (fsimage) plus an edit log file (edits) ensures the safety of the metadata.
The NameNode maintains the metadata of the entire file system; accurate metadata management therefore directly affects HDFS's ability to provide file storage services.
2. Metadata Directory-Related Files
After HDFS is deployed and configured for the first time, it is not ready for immediate use; the file system must be formatted first. Run the following on the NameNode (NN) node:
$HADOOP_HOME/bin/hdfs namenode -format
Two points deserve attention here. First, the file system does not physically exist yet at this stage. Second, "formatting" here does not mean traditional local disk formatting, but rather some cleanup and preparation work.
After formatting completes, the following file structure is created under the $dfs.namenode.name.dir/current directory; this is also the directory associated with the NameNode's metadata:
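A representative layout, assuming a freshly formatted NameNode (the transaction IDs in the file names will vary, and edits files only appear once the NameNode has started):

current/
  VERSION
  fsimage_0000000000000000000
  fsimage_0000000000000000000.md5
  seen_txid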

dfs.namenode.name.dir is configured in the hdfs-site.xml file, and its default value is as follows:
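From hdfs-default.xml:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/name</value>
</property>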

The dfs.namenode.name.dir property can be configured with multiple directories. The file structure and contents stored in each directory are exactly the same, acting as backups of one another. The benefit is that if one directory is corrupted, Hadoop's metadata is unaffected; in particular, if one of the directories is on NFS (Network File System), the metadata survives even if the machine itself is damaged.
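For example, a hypothetical configuration that writes the metadata to both a local disk and an NFS mount (the paths here are illustrative only) might look like:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/hadoop/dfs/name,file:///mnt/nfs/dfs/name</value>
</property>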
The files in the $dfs.namenode.name.dir/current/ directory are explained below.

VERSION
namespaceID=934548976
clusterID=CID-cdff7d73-93cd-4783-9399-0a22e6dce196
cTime=0
storageType=NAME_NODE
blockpoolID=BP-893790215-192.168.24.72-1383809616115
layoutVersion=-47
namespaceID/clusterID/blockpoolID are unique identifiers of the HDFS cluster. They prevent DataNodes from accidentally registering with a NameNode in another cluster, and they are especially important in federation deployments, where multiple NameNodes work independently: each NameNode provides a unique namespace (namespaceID) and manages a unique set of block pools (blockpoolID), while the clusterID binds the entire cluster together as a single logical unit and is the same on all nodes in the cluster.
storageType describes which kind of process stores its data structures in this directory (for a DataNode, storageType=DATA_NODE);
cTime is the creation time of the NameNode storage system. After the first format of the file system this property is 0; when the file system is upgraded, the value is updated to the timestamp of the upgrade;
layoutVersion represents the version of HDFS's persistent data structures and is a negative integer.
Additional notes:
When formatting a cluster you can specify its cluster_id, which must not conflict with any other cluster in your environment. If no cluster_id is provided, a unique clusterID is generated automatically:
$HADOOP_HOME/bin/hdfs namenode -format -clusterid <cluster_id>
seen_txid
$dfs.namenode.name.dir/current/seen_txid is very important: it holds the last transaction ID. After a format it contains 0. It represents the final transaction number covered by the edits_* files; when the NameNode restarts, it replays the edits files in order, from edits_0000001 up to the number recorded in seen_txid. So after an abnormal HDFS restart, be sure to check that the number in seen_txid matches the final transaction number of your edits files.
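To check the value on a running cluster, simply read the file on the NameNode host; a minimal sketch, assuming the metadata directory is /data/hadoop/dfs/name (substitute your own dfs.namenode.name.dir):

cat /data/hadoop/dfs/name/current/seen_txid

The command prints a single number, the last transaction ID the NameNode expects its edits files to cover.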
fsimage & edits
The fsimage and edits files, along with their corresponding MD5 checksum files, are also generated in the $dfs.namenode.name.dir/current directory at format time.
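Because both files are serialized, they cannot be read directly; Hadoop ships offline viewers for them. A sketch, with the input file names being illustrative:

hdfs oiv -p XML -i fsimage_0000000000000000002 -o fsimage.xml
hdfs oev -i edits_0000000000000000001-0000000000000000002 -o edits.xml

hdfs oiv (Offline Image Viewer) dumps an fsimage file to XML, and hdfs oev (Offline Edits Viewer) does the same for an edits file.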
3. Secondary NameNode

The NameNode's responsibility is to manage metadata, and the DataNode's responsibility is to store data, so what is the SecondaryNameNode for? This confuses many beginners: why does it appear in HDFS? From its name it sounds like a backup of the NameNode, but it actually is not.
As you can imagine, after an HDFS cluster has been running for some time, the following problems appear:
• The edits log file grows very large; managing this file becomes a challenge.
• A NameNode restart takes a long time, because many changes must be merged into the fsimage file.
• If the NameNode crashes, some changes are lost, because the fsimage file is by then very old.
To overcome these problems, we need an easy-to-manage mechanism that helps us limit the size of the edits log file and obtain an up-to-date fsimage file, which also reduces the pressure on the NameNode. This is much like Windows restore points: the restore point mechanism lets us take snapshots of the OS so that we can roll back to the latest restore point when something goes wrong with the system.
The SecondaryNameNode exists to solve exactly this problem: its responsibility is to merge the NameNode's edits log into the fsimage file.

4. Checkpoint
Each time a checkpoint is triggered, the SecondaryNameNode downloads the latest fsimage and all subsequent edits files from the NameNode to its local disk and loads them into memory for merging (this process is called a checkpoint).

4.1. Checkpoint Steps in Detail
• The NameNode manages metadata using two kinds of persisted files: the edits operation log file and the fsimage metadata image file. New operation logs are neither immediately merged into fsimage nor written straight into the NameNode's memory; they are first written to edits (because merging consumes significant resources), and the in-memory metadata is updated only after the edits write succeeds.
• Two settings, dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns, control checkpointing; as soon as either condition is met, the SecondaryNameNode performs a checkpoint.
• When the checkpoint action is triggered, the NameNode starts a new edits file, edits.new, and the SecondaryNameNode copies the existing edits files and the fsimage to its local disk (via HTTP GET).
• The SecondaryNameNode loads the downloaded fsimage into memory, then applies each update operation in the edits files one by one, bringing the in-memory fsimage up to date. This merging of edits and fsimage produces a new fsimage file, fsimage.ckpt.
• The SecondaryNameNode copies the newly generated fsimage.ckpt file back to the NameNode node.
• On the NameNode, the edits.new and fsimage.ckpt files replace the original edits and fsimage files, completing one full cycle and leaving the NameNode with fresh fsimage and edits files.
• The SecondaryNameNode then waits for the next checkpoint trigger and repeats this loop indefinitely.
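For testing, you do not have to wait for either trigger condition; a checkpoint can also be forced by hand. A minimal sketch, assuming it is run on the SecondaryNameNode host:

hdfs secondarynamenode -checkpoint force

Alternatively, the NameNode itself can persist its namespace while in safe mode:

hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave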
4.2. Checkpoint Trigger Conditions
The checkpoint operation is controlled by two parameters, which can be configured in hdfs-site.xml:
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
  <description>
    The number of seconds between two successive checkpoints. Default: 1 hour.
  </description>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
  <description>
    The maximum number of uncheckpointed transactions; once exceeded, an urgent checkpoint is forced even if the checkpoint period has not yet elapsed. Default: 1 million.
  </description>
</property>
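A related setting worth knowing about: dfs.namenode.checkpoint.check.period (default 60 seconds) controls how often the SecondaryNameNode polls the NameNode for the number of uncheckpointed transactions, so the transaction condition above is only evaluated at that polling interval:

<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60</value>
</property>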
As the description above makes clear, the SecondaryNameNode is not a hot standby for the NameNode; it simply merges fsimage and edits. Its fsimage is never fully up to date, because by the time it downloads the fsimage and edits files from the NameNode, new update operations have already been written to the edits.new file, and those updates are not synchronized to the SecondaryNameNode. Of course, if the fsimage on the NameNode really becomes corrupted, the fsimage on the SecondaryNameNode can still be used to replace it; although it is not the latest fsimage, it reduces the loss to a minimum.
