Interpreting the Functions of the Secondary NameNode

Source: Internet
Author: User

1. Overview

Recently a friend asked me what the secondary NameNode does: is it a backup of the NameNode? Is it there to remove the NameNode as a single point of failure? For someone new to Hadoop, it is easy to take the name literally and treat the secondary NameNode as a backup node. That is a misunderstanding. The official documentation makes clear that this is not the case, so below I restate what the secondary NameNode actually does.

2. Secondary NameNode?

In Hadoop, some modules are poorly named, and the secondary NameNode is a typical example. Its name suggests a backup node for the NameNode, but it is not one. Many Hadoop beginners are puzzled about what the secondary NameNode does and why it exists in HDFS. I explain below.

Judging by the name, it does have something to do with the NameNode, so before getting into the secondary NameNode, let's look at what the NameNode is for.

2.1 NameNode

The NameNode primarily holds the metadata of HDFS, such as namespace information and block information. While the NameNode is running, this information is kept in memory. It can also be persisted to disk.

The NameNode saves metadata to disk in two different files:

    • fsimage: a snapshot of the entire file system taken when the NameNode starts.
    • edits: the sequence of changes made to the file system after the NameNode started.
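The relationship between the two files can be sketched as a snapshot plus a replayable log. This is a toy model only: the real fsimage and edits files are binary and far richer, and the names `apply_edit` and `load_namespace`, as well as the two operations shown, are hypothetical and chosen for illustration.

```python
# Toy model: the "fsimage" is a set of paths, the "edits" a list of (op, path).
# Real HDFS files are binary; every name here is an illustrative assumption.

def apply_edit(namespace, edit):
    """Apply one logged change to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def load_namespace(fsimage, edits):
    """Rebuild current state: start from the snapshot, replay the log in order."""
    namespace = set(fsimage)          # copy, so the snapshot itself is untouched
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


fsimage = {"/data/a.txt", "/data/b.txt"}     # snapshot taken at startup
edits = [("create", "/data/c.txt"),          # changes logged since startup
         ("delete", "/data/a.txt")]

print(sorted(load_namespace(fsimage, edits)))  # ['/data/b.txt', '/data/c.txt']
```

The longer the log, the more work the replay loop has to do, which is the cost the rest of this section is about.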

The edits are merged into the fsimage file only when the NameNode restarts, which produces an up-to-date snapshot of the file system. However, the NameNode in a production cluster is rarely restarted, so the edits file can grow very large when the NameNode runs for a long time. This causes the following problems:

    1. The edits file becomes very large and hard to manage.
    2. A NameNode restart takes a long time, because many changes have to be merged into the fsimage file.
    3. If the NameNode goes down, we may lose many changes, because the fsimage file is out of date at that point.

To overcome these problems, we need an easy-to-manage mechanism that keeps the edits file small and the fsimage file up to date, which also reduces the pressure on the NameNode. The secondary NameNode was introduced to solve exactly these problems: its responsibility is to merge the NameNode's edits into the fsimage file.

It works as follows:

    1. It periodically fetches the edits from the NameNode and applies them to the fsimage.
    2. Once it has the new fsimage file, it copies it back to the NameNode.
    3. The NameNode uses this new fsimage file on its next restart, which shortens the restart time.
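The three steps above can be sketched with a small in-memory simulation. It is a simplification under stated assumptions: `ToyNameNode`, `install_checkpoint`, and `checkpoint` are hypothetical names, the namespace is just a set of paths, and nothing here reflects the actual Hadoop transfer protocol.

```python
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: a snapshot plus a log of changes."""

    def __init__(self, fsimage):
        self.fsimage = set(fsimage)   # last snapshot persisted on disk
        self.edits = []               # changes logged since that snapshot

    def log(self, op, path):
        self.edits.append((op, path))

    def install_checkpoint(self, new_fsimage, merged_count):
        # Step 2 (receiving side): adopt the merged snapshot and drop only
        # the edits that were merged; later edits stay in the log.
        self.fsimage = set(new_fsimage)
        self.edits = self.edits[merged_count:]


def checkpoint(namenode):
    """What a toy secondary NameNode would do in one checkpoint cycle."""
    # Step 1: fetch the current snapshot and edits, then merge them.
    fsimage = set(namenode.fsimage)
    edits = list(namenode.edits)
    for op, path in edits:
        if op == "create":
            fsimage.add(path)
        elif op == "delete":
            fsimage.discard(path)
    # Step 2: copy the merged snapshot back to the NameNode.
    namenode.install_checkpoint(fsimage, len(edits))
    return fsimage


nn = ToyNameNode({"/a"})
nn.log("create", "/b")
nn.log("delete", "/a")
checkpoint(nn)
print(sorted(nn.fsimage), nn.edits)  # ['/b'] []
```

After the cycle the log is empty, so a restart (step 3) would have nothing left to replay.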

The whole purpose of the secondary NameNode is to provide a checkpoint in HDFS, as the official documentation makes clear. It is only a helper node for the NameNode, which is why the community also calls it the checkpoint node.

So what the secondary NameNode does is create checkpoints of the file system to help the NameNode work better. It is neither a replacement for the NameNode nor a backup of it.

The start of the checkpoint process on the secondary NameNode is controlled by two configuration parameters:

    • fs.checkpoint.period specifies the maximum time interval between two consecutive checkpoints. The default value is 1 hour.
    • fs.checkpoint.size defines the maximum size of the edits log file. Once this size is exceeded, a checkpoint is forced even if the maximum interval has not been reached. The default value is 64 MB.
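How the two parameters combine can be sketched as a simple "either condition" check. The function and constant names below are hypothetical, and the exact boundary comparison is an assumption; only the defaults (1 hour, 64 MB) come from the text above.

```python
# Defaults taken from the text; names are illustrative, not Hadoop's own.
CHECKPOINT_PERIOD_SECONDS = 3600            # fs.checkpoint.period: 1 hour
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024    # fs.checkpoint.size: 64 MB


def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      size=CHECKPOINT_SIZE_BYTES):
    """Return True when either the time limit or the size limit is reached."""
    # Assumption: reaching a limit exactly counts as triggering.
    return seconds_since_last >= period or edits_size_bytes >= size


MB = 1024 * 1024
print(should_checkpoint(600, 10 * MB))   # False: neither limit reached
print(should_checkpoint(600, 70 * MB))   # True: edits log too large
print(should_checkpoint(4000, 0))        # True: more than an hour has passed
```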

If all other copies of the image and edits files on the NameNode are lost, the NameNode can import the latest checkpoint. To do so:

    • Create an empty directory at the location specified by the configuration parameter dfs.name.dir;
    • Set the configuration parameter fs.checkpoint.dir to the location of the checkpoint directory;
    • Start the NameNode with the -importCheckpoint option.

The NameNode reads the checkpoint from the fs.checkpoint.dir directory and saves it to the dfs.name.dir directory. If dfs.name.dir already contains a valid image file, the NameNode fails to start. The NameNode checks the consistency of the image in fs.checkpoint.dir, but does not modify it.
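The import rules described above (refuse if an image already exists, copy otherwise, never modify the checkpoint) can be sketched as follows. This is not how Hadoop implements the import: the flat directory layout, the file name constant, and the `import_checkpoint` function are assumptions made for illustration, and the consistency check is omitted.

```python
import shutil
import tempfile
from pathlib import Path

IMAGE_NAME = "fsimage"  # assumed flat layout; the real directory structure differs


def import_checkpoint(name_dir, checkpoint_dir):
    """Copy the checkpoint image into name_dir, refusing to overwrite an image."""
    name_dir, checkpoint_dir = Path(name_dir), Path(checkpoint_dir)
    target = name_dir / IMAGE_NAME
    if target.exists():
        # Mirrors the rule that startup fails when an image is already present.
        raise RuntimeError(f"{name_dir} already contains an image")
    source = checkpoint_dir / IMAGE_NAME
    if not source.is_file():
        raise FileNotFoundError(f"no checkpoint image in {checkpoint_dir}")
    name_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)     # the checkpoint itself is only read
    return target


with tempfile.TemporaryDirectory() as tmp:
    ckpt, name = Path(tmp) / "checkpoint", Path(tmp) / "name"
    ckpt.mkdir()
    name.mkdir()                     # empty directory, as in the first step
    (ckpt / IMAGE_NAME).write_text("snapshot")
    print(import_checkpoint(name, ckpt).read_text())   # snapshot
    try:
        import_checkpoint(name, ckpt)                  # second import is refused
    except RuntimeError as err:
        print("refused:", err)
```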

  Note: When are changes written to the edit log on the NameNode? The NameNode appends a record to the edit log each time an operation changes the file system metadata. For example, when a client writes a file, the NameNode records the file creation and the blocks allocated to it. Which DataNodes hold each block is not stored in the edit log; the DataNodes report that information to the NameNode.

  The official documentation describes it as follows:

  The NameNode stores modifications to the file system as a log appended to a native file system file, edits. When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file. It then writes new HDFS state to the fsimage and starts normal operation with an empty edits file. Since NameNode merges fsimage and edits files only during start up, the edits log file could get very large over time on a busy cluster. Another side effect of a larger edits file is that next restart of NameNode takes longer.

  The secondary NameNode merges the fsimage and the edits log files periodically and keeps edits log size within a limit. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode.

  The start of the checkpoint process on the secondary NameNode is controlled by two configuration parameters.

    • dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
    • dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.

  The secondary NameNode stores the latest checkpoint in a directory which is structured the same way as the primary NameNode's directory. So that the check pointed image is always ready to be read by the primary NameNode if necessary.

Reference Address: http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html

3. Summary

That is all for this article. If you have any questions while reading, you can join the discussion group or send me an email, and I will do my best to answer. Let us encourage each other!
