An analysis of the HDFS architecture and its functions (Hadoop HDFS)

Source: Internet
Author: User
Tags: crc32, checksum, file permissions, server memory
HDFS system architecture: a diagram-level analysis

Hadoop Distributed File System (HDFS): a distributed file system

* Architecture: HDFS follows a master/slave design, with one master node (NameNode) and multiple slave nodes (DataNodes).

* HDFS service components: NameNode, DataNode, SecondaryNameNode.

* HDFS storage: files on HDFS are stored as blocks; the default block size in Hadoop 2.x is 128 MB.
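A minimal sketch (not Hadoop code) of how block-based storage divides a file: any file larger than the block size is split into full 128 MB blocks plus a final partial block.

```java
// Sketch: splitting a file into 128 MB blocks, the HDFS default in Hadoop 2.x.
public class BlockSplit {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB

    // Number of blocks needed for a file of the given size (ceiling division).
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        // A 300 MB file occupies three blocks: 128 MB + 128 MB + 44 MB.
        System.out.println(blockCount(300L * 1024 * 1024)); // prints 3
    }
}
```

Note that the last block only occupies as much space as it actually holds; a 44 MB remainder does not consume a full 128 MB on disk.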

HDFS service functions

* NameNode: the master node. It stores file metadata (file name, directory structure, and file attributes such as creation time, replica count, and file permissions), as well as the block list of each file and the DataNodes on which each block resides.

* DataNode: stores file data as blocks in its local file system, along with each block's metadata (length, creation time, CRC32 checksum).
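To make the checksum idea concrete, here is a simplified sketch using the JDK's `java.util.zip.CRC32`. It computes one checksum per whole block, whereas HDFS actually checksums data in small chunks (512 bytes by default); the class and method names below are illustrative, not Hadoop APIs.

```java
import java.util.zip.CRC32;

// Sketch: the kind of CRC32 checksum a DataNode keeps alongside a block,
// recomputed on read to detect corruption. Simplified to one checksum per block.
public class BlockChecksum {
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] block = "hello hdfs".getBytes();
        long stored = crc32(block);           // checksum written at store time
        long onRead = crc32(block);           // recomputed when a client reads
        System.out.println(stored == onRead); // prints true; false would mean corruption
    }
}
```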

* SecondaryNameNode: an auxiliary daemon that monitors HDFS state and takes a snapshot of the HDFS metadata at regular intervals. It periodically merges the NameNode's image file and edit log into a single new image file.
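The merge described above can be sketched as replaying an edit log on top of the previous image to produce a new image. This is a toy illustration under simplified assumptions (an image as a path-to-metadata map, edits as create/delete entries), not the real fsimage/edits format.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a checkpoint: apply the edit log to the last image,
// yielding a new merged image (what the SecondaryNameNode produces).
public class Checkpoint {
    // image: path -> metadata; each edit entry is [op, path, metadata]
    static Map<String, String> merge(Map<String, String> image,
                                     List<String[]> edits) {
        Map<String, String> merged = new HashMap<>(image);
        for (String[] e : edits) {
            if (e[0].equals("create")) {
                merged.put(e[1], e[2]);   // file created since the last image
            } else if (e[0].equals("delete")) {
                merged.remove(e[1]);      // file deleted since the last image
            }
        }
        return merged;                    // becomes the new image file
    }
}
```

After the merge, the NameNode can start from the new image plus a much shorter edit log, which is the point of the checkpoint.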


*************************************************************

There are two basic kinds of components in HDFS: one NameNode server and multiple DataNode servers.

A) NameNode: the name node, which stores metadata — the file-to-block mapping that lets clients find data blocks quickly.

HDFS storage (block operations): a file is logically divided into blocks, each 128 MB by default. The blocks are stored in sequence across different DataNode servers, and the NameNode records which blocks each DataNode holds. Reads also go to the NameNode first: the client asks the NameNode where each block lives, then fetches the block data from the corresponding DataNodes. The NameNode therefore stores only the mapping metadata, not the file data itself.
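The two mappings the NameNode keeps in memory can be sketched with plain maps. All class, field, and node names below are illustrative, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of NameNode metadata:
// file name -> ordered block IDs, and block ID -> DataNodes holding replicas.
public class NameNodeMap {
    Map<String, List<Long>> fileToBlocks = new HashMap<>();
    Map<Long, List<String>> blockToDataNodes = new HashMap<>();

    void addBlock(String file, long blockId, List<String> dataNodes) {
        fileToBlocks.computeIfAbsent(file, f -> new ArrayList<>()).add(blockId);
        blockToDataNodes.put(blockId, dataNodes);
    }

    // A read asks the NameNode which DataNodes hold each block of the file,
    // then the client fetches block data from those DataNodes directly.
    List<List<String>> locate(String file) {
        List<List<String>> locations = new ArrayList<>();
        for (long blockId : fileToBlocks.getOrDefault(file, List.of())) {
            locations.add(blockToDataNodes.get(blockId));
        }
        return locations;
    }

    public static void main(String[] args) {
        NameNodeMap nn = new NameNodeMap();
        nn.addBlock("/logs/a.txt", 1L, List.of("dn1", "dn2", "dn3"));
        nn.addBlock("/logs/a.txt", 2L, List.of("dn2", "dn3", "dn4"));
        System.out.println(nn.locate("/logs/a.txt"));
    }
}
```

Note that each block maps to several DataNodes because HDFS keeps multiple replicas of every block.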

B) DataNode: the data node, which stores the actual file data. In practice, a DataNode's job is to store data blocks.

Note: when many servers form an HDFS cluster, each server starts the process for its role — one server starts the NameNode process, and the others start DataNode processes.

If one DataNode server fails, access to the data is not affected, since blocks are replicated on other DataNodes. Likewise, when the existing DataNode servers fill up, we can simply add more servers to the cluster — arbitrarily, and without disruption. This elasticity is what makes it a distributed file storage system.

In short: in a cluster of many servers, one server runs the NameNode and the other servers run DataNodes.







Functions of the YARN service

ResourceManager: Handles client requests

Starts and monitors ApplicationMasters

Monitors NodeManagers

Allocates and schedules resources

ApplicationMaster: Splits the input data

Requests resources for the application and assigns them to its internal tasks

Monitors tasks and handles fault tolerance

NodeManager: Manages resources on a single node

Processes commands from the ResourceManager

Processes commands from ApplicationMasters

Container: an abstraction of the task execution environment. It encapsulates multi-dimensional resources such as CPU and memory, along with task launch information such as environment variables and launch commands.
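What a container encapsulates can be sketched as a simple record. The field names below are illustrative, not the actual YARN API.

```java
import java.util.List;
import java.util.Map;

// Sketch: the information bundled into a YARN container — multi-dimensional
// resource limits plus everything needed to launch the task inside it.
public class ContainerSpec {
    int vcores;                  // CPU share (virtual cores)
    long memoryMb;               // memory limit in MB
    Map<String, String> env;     // environment variables for the task
    List<String> launchCommand;  // command used to start the task

    ContainerSpec(int vcores, long memoryMb,
                  Map<String, String> env, List<String> launchCommand) {
        this.vcores = vcores;
        this.memoryMb = memoryMb;
        this.env = env;
        this.launchCommand = launchCommand;
    }
}
```

The NodeManager enforces the resource limits and runs the launch command, which is how the same node can host containers for many different applications.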


How MapReduce runs on YARN



1. The client submits the application to the ResourceManager.

2. The ResourceManager's ApplicationsManager finds a NodeManager with a free container and launches the MRAppMaster in it.

3. The MRAppMaster requests resources from the ResourceManager.

4. Once the resource requests are granted, the MRAppMaster asks the corresponding NodeManagers to launch the task containers.

5. The map tasks, and then the reduce tasks, start in those containers.

6. The map and reduce tasks report their progress and status to the MRAppMaster.

7. When the program finishes, the MRAppMaster reports completion to the ResourceManager and unregisters.



