Wang Jialin's ninth lecture in the Hadoop graphic training course "Cloud Computing Distributed Big Data Hadoop Hands-On, from Scratch": an analysis of the working mechanisms and workflows of the NameNode and SecondaryNameNode


This article focuses on analyzing the SecondaryNameNode.

 

Complete release directory of "Cloud Computing Distributed Big Data Hadoop Hands-On"

Cloud computing distributed big data hands-on Hadoop exchange group: 312494188. Cloud computing practice material is posted in the group every day. Welcome to join us!

 

The SecondaryNameNode is started when Hadoop starts. Let's run the jps command to check which processes Hadoop launches:

 

Before explaining what the SecondaryNameNode does, let's first look at the characteristics and responsibilities of the NameNode:

1. A Hadoop cluster contains NameNode and DataNode processes. There can be many DataNodes at runtime, but only one NameNode;

2. The NameNode stores the metadata of the Hadoop cluster, that is, the metadata of the file system: the directory structure of the entire file system, the files in each directory, the blocks that make up each file, and which DataNodes store each block;

3. The NameNode keeps this metadata in memory so that client read requests can be served quickly;

4. However, data held only in memory is easily lost, for example on a power failure, so a copy of the metadata must also be kept on disk;

5. When a write request arrives and the file system must be changed, the NameNode first appends the change to the editlog and syncs it to disk. Only after that succeeds does it modify the metadata in memory and reply to the client; the client starts writing data to the DataNodes only after receiving that success response.

6. Hadoop maintains an fsimage file on disk, which is an image of the metadata held in the NameNode;

7. The fsimage is not kept consistent with the in-memory metadata at every moment; instead, it is updated at intervals by merging in the contents of the editlog;

8. This merge is a relatively memory- and CPU-intensive operation, so Hadoop hands the job of updating the fsimage file to the SecondaryNameNode.
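The editlog-first write path described in step 5 can be sketched as a toy Python model (illustrative only; the class and method names are not Hadoop's actual code):

```python
import os

class MiniNameNode:
    """Toy model of the NameNode write path: log to disk first, then update memory."""

    def __init__(self, editlog_path):
        self.metadata = {}                    # in-memory metadata: path -> block list
        self.editlog = open(editlog_path, "a")

    def handle_write_request(self, path, blocks):
        # 1. Append the change to the editlog and force it to disk.
        self.editlog.write(f"ADD {path} {','.join(blocks)}\n")
        self.editlog.flush()
        os.fsync(self.editlog.fileno())       # durable before we acknowledge
        # 2. Only after the log is safely on disk, update in-memory metadata.
        self.metadata[path] = blocks
        # 3. Acknowledge; the client now writes the blocks to the DataNodes.
        return "OK"
```

Because the editlog is synced before the acknowledgement, a crash after the reply can always be recovered by replaying the log.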

 

Now let's look at the workflow of the SecondaryNameNode:

1. The SecondaryNameNode asks the primary NameNode to roll (switch) its editlog;

2. The SecondaryNameNode fetches the fsimage and editlog from the primary NameNode over HTTP;

3. The SecondaryNameNode loads the fsimage into memory and then merges in the editlog;

4. The SecondaryNameNode sends the newly merged fsimage back to the primary NameNode;

5. On receiving the new fsimage from the SecondaryNameNode, the primary NameNode replaces the old fsimage with the new one.
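The core of steps 3 and 4, replaying the editlog onto the loaded image, can be sketched like this (a simplified model, not Hadoop's real merge code; the operation names are illustrative):

```python
def merge_checkpoint(fsimage, editlog_entries):
    """Toy checkpoint merge: replay editlog entries onto a loaded fsimage.

    fsimage: dict mapping path -> list of block ids (the image loaded in step 3)
    editlog_entries: iterable of (op, path, blocks) tuples
    Returns the merged image that would be shipped back in step 4.
    """
    merged = dict(fsimage)                 # work on a copy of the loaded image
    for op, path, blocks in editlog_entries:
        if op == "ADD":
            merged[path] = blocks          # apply each logged mutation in order
        elif op == "DELETE":
            merged.pop(path, None)
    return merged
```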

When does the SecondaryNameNode trigger the workflow above? In other words, when does a checkpoint happen? The workflow is triggered when either of the following conditions is met:

1. fs.checkpoint.period specifies the maximum interval between two checkpoints. The default is 3600 seconds, that is, one hour;

2. fs.checkpoint.size specifies the maximum size of the editlog file. The default is 64 MB; once the editlog exceeds this size, the SecondaryNameNode workflow is forcibly triggered even if the period has not elapsed.
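The trigger logic amounts to an either/or check, which can be expressed as a small Python sketch (the function name is illustrative; the defaults are the values quoted above):

```python
# Defaults from the article: fs.checkpoint.period = 3600 s, fs.checkpoint.size = 64 MB.
CHECKPOINT_PERIOD_SECS = 3600
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024

def should_checkpoint(secs_since_last, editlog_bytes,
                      period=CHECKPOINT_PERIOD_SECS,
                      size=CHECKPOINT_SIZE_BYTES):
    """A checkpoint fires when EITHER limit is reached."""
    return secs_since_last >= period or editlog_bytes >= size
```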

 

Starting with Hadoop 2.x, an active/standby NameNode mode was introduced, in which two NameNodes run at the same time: an active NameNode and a standby (backup) NameNode. When the active NameNode cannot provide service normally, the standby NameNode takes over and continues serving clients, ensuring that the Hadoop service is not interrupted.

As you can see, in Hadoop 1.x the normal operation of Hadoop depends entirely on a single primary NameNode. When the primary NameNode fails, the entire Hadoop file system cannot serve clients, which is unacceptable for critical applications. For this reason, the machine running the NameNode needs very good hardware, for example very fast disk I/O.
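The Hadoop 2.x active/standby setup mentioned above is declared in hdfs-site.xml. A minimal sketch might look like the following, assuming a nameservice called mycluster and hosts nn1-host and nn2-host (both names are illustrative, not from the article):

```xml
<!-- Sketch of an HDFS HA nameservice; hostnames and the nameservice id are assumptions. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1-host:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2-host:8020</value>
</property>
```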
