Notes on Studying Hadoop (I): A Summary of Small Knowledge Points

Source: Internet
Author: User

One, a brief summary of CDH:

CDH is Cloudera's distribution built on the Apache open-source Hadoop project. There have been five versions in total.
The first two are no longer updated; the two most recent are CDH4 (derived from Hadoop 2.0.0) and
CDH5 (which receives updates from time to time).

Differences between CDH and Apache Hadoop:
1. CDH's version lineage is clearer: there are currently two main lines, CDH3 (based on Hadoop 1.0) and
CDH4 (based on Hadoop 2.0), whereas Apache Hadoop's own version numbering is relatively chaotic. CDH also
improves on Apache Hadoop in compatibility, security, and stability;

2. CDH adds many patches and bug fixes on top of Hadoop and is updated faster than Apache Hadoop.

3. Security: CDH supports Kerberos authentication. Apache Hadoop defaults to simple username-based authentication; Kerberos is also available there, but it must be enabled explicitly (hadoop.security.authentication).

4. CDH can be installed in four ways: yum/apt packages, tar packages, RPM packages, or Cloudera Manager,
whereas Apache Hadoop supports only tar installation.


Two, the role of the SecondaryNameNode:

1. The SecondaryNameNode maintains a checkpoint (snapshot) of the NameNode's metadata. At the interval set by
fs.checkpoint.period (default 3,600 seconds), it fetches the fsimage image file and the edits log file
from the NameNode and merges the two, which keeps the edits file within a bounded size.
fs.checkpoint.size sets the size limit for the edits file (default 64 MB); once edits grows beyond this
value, a checkpoint is forced even if the period has not yet elapsed.

2. Acting as the checkpoint node, the SecondaryNameNode stores the directory structure of the latest checkpoint,
which is consistent with the directory structure on the NameNode; the previous fsimage and edits are discarded automatically.

3. If the NameNode fails unexpectedly, the checkpoint data on the SecondaryNameNode must be copied
to the NameNode node manually. The steps are as follows:

Premise: the NameNode's metadata directory has been lost.
A. Copy the entire contents of ${fs.checkpoint.dir} on the SecondaryNameNode node into the
${fs.checkpoint.dir} directory on the NameNode node.
B. Create an empty folder at the path that dfs.namenode.name.dir points to.
C. Start the NameNode: hadoop namenode -importCheckpoint
(This step restores the metadata from ${fs.checkpoint.dir} into ${dfs.namenode.name.dir}
and then starts the NameNode.)
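
The checkpoint trigger from item 1 and the manual recovery from item 3 can be sketched in Python as a rough local model. This is not Hadoop's own code: the function names should_checkpoint and stage_checkpoint are hypothetical, the exact boundary comparisons are an assumption, and the copy in step A is modeled as a local directory copy, whereas on a real cluster it would be a transfer between two machines. Step C is only returned as a command list, not executed.

```python
import shutil
import tempfile
from pathlib import Path

# Defaults quoted in the text (Hadoop 1.x property names).
FS_CHECKPOINT_PERIOD_S = 3600              # fs.checkpoint.period
FS_CHECKPOINT_SIZE_BYTES = 64 * 1024 ** 2  # fs.checkpoint.size


def should_checkpoint(seconds_since_last, edits_bytes,
                      period_s=FS_CHECKPOINT_PERIOD_S,
                      size_bytes=FS_CHECKPOINT_SIZE_BYTES):
    """Return True when the period has elapsed or edits exceeds the size limit.

    Boundary handling (>= for time, > for size) is an assumption of this sketch.
    """
    return seconds_since_last >= period_s or edits_bytes > size_bytes


def stage_checkpoint(snn_checkpoint_dir, nn_checkpoint_dir, nn_name_dir):
    """Model steps A and B locally and return the command for step C."""
    snn = Path(snn_checkpoint_dir)
    nn_ckpt = Path(nn_checkpoint_dir)
    nn_name = Path(nn_name_dir)

    # Step A: copy everything under the SecondaryNameNode's checkpoint
    # directory into the NameNode's checkpoint directory.
    shutil.copytree(snn, nn_ckpt, dirs_exist_ok=True)

    # Step B: the name directory must exist and be empty before the import.
    nn_name.mkdir(parents=True, exist_ok=True)
    if any(nn_name.iterdir()):
        raise RuntimeError(f"{nn_name} is not empty")

    # Step C: left to the operator; the sketch only returns the command.
    return ["hadoop", "namenode", "-importCheckpoint"]
```

For example, should_checkpoint(600, 70 * 1024 ** 2) returns True because the edits file is over 64 MB even though only ten minutes have passed. Refusing to continue when the name directory is not empty is a design choice of the sketch, meant to avoid overwriting metadata that may still be usable.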


Three, the resource request flow under YARN's architecture:

1. Each NodeManager registers its machine's resources with the ResourceManager;
2. The client submits a job to the ResourceManager;
3. The ApplicationMaster (running on one of the NodeManagers) requests resources from the
ResourceManager, which checks whether the resources currently available on the NodeManagers can meet the request;
4. The ResourceManager grants resources to the ApplicationMaster in the form of containers;
5. The ApplicationMaster distributes the granted resources to the NodeManagers, and each NodeManager
starts a number of tasks in its containers to run the job;
6. A container (which carries information such as CPU, hard disk, environment configuration, and startup command) is the unit of resource allocation and keeps jobs isolated from one another;
7. Each task periodically reports its progress to the ApplicationMaster through the heartbeat mechanism.
Finally, once all tasks are complete, the ApplicationMaster reports the completion to the ResourceManager.
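
Steps 1, 3, and 4 of the flow above can be illustrated with a toy Python model. This is only a sketch and not YARN's API: the classes Container and ResourceManager and the methods register_node and allocate are hypothetical names, resources are assumed to be memory (MB) and virtual cores, and the first-fit placement is a simplification rather than how a real YARN scheduler decides.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A granted slice of one node's resources (toy model)."""
    node: str
    memory_mb: int
    vcores: int


@dataclass
class ResourceManager:
    # node name -> [free memory in MB, free vcores]
    free: dict = field(default_factory=dict)

    def register_node(self, node, memory_mb, vcores):
        """Step 1: a NodeManager reports its machine's resources."""
        self.free[node] = [memory_mb, vcores]

    def allocate(self, memory_mb, vcores, count):
        """Steps 3-4: grant up to `count` containers, first fit per node."""
        granted = []
        for node, cap in self.free.items():
            while (len(granted) < count
                   and cap[0] >= memory_mb and cap[1] >= vcores):
                cap[0] -= memory_mb
                cap[1] -= vcores
                granted.append(Container(node, memory_mb, vcores))
        return granted


# Usage: two NodeManagers register, then an ApplicationMaster asks for
# five containers of 1024 MB and 1 vcore each.
rm = ResourceManager()
rm.register_node("nm1", 4096, 4)
rm.register_node("nm2", 2048, 2)
containers = rm.allocate(memory_mb=1024, vcores=1, count=5)
print([c.node for c in containers])
```

If the request cannot be satisfied in full, allocate simply returns fewer containers than asked for. A real ApplicationMaster would keep requesting on later heartbeats, which this sketch does not model.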
