Hadoop Knowledge Rollup

Source: Internet
Author: User

The two major capabilities of Hadoop: massive data storage and massive data analysis.

The three core components of Hadoop 2 are HDFS, MapReduce, and YARN:

1. HDFS: distributed file system, for massive data storage

2. MapReduce: computing framework, for massive data analysis

3. YARN: resource scheduling and management for the cluster
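The computing side of this division of labor is easiest to see through MapReduce's programming model. Below is a minimal, self-contained Python sketch of the map, shuffle, and reduce phases applied to word counting; it simulates the model only, and is not the Hadoop Java API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input split
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores data", "hadoop analyzes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
```

In real Hadoop, YARN schedules the map and reduce tasks across the cluster, and the input splits come from HDFS blocks.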

HDFS working mechanism: based on the NameNode and DataNodes.

1. NameNode: responds to client requests; maintains the directory tree of the entire HDFS file system, as well as the block information for each path (file), namely the block IDs and the DataNode servers holding them. In short, it manages the metadata.

2. DataNode: stores and manages the user's file data; periodically reports the block information it holds to the NameNode (via the heartbeat mechanism, over RPC).

NameNode safe mode: 1) when the NameNode finds that the number of missing blocks reaches a configured threshold, it enters safe mode; in this mode it waits for DataNodes to report their block information to it. 2) In safe mode, the NameNode can still serve metadata queries, but the metadata cannot be modified.
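The threshold logic can be sketched as a simple fraction check. The sketch below assumes the Hadoop 2 configuration key `dfs.namenode.safemode.threshold-pct` with its default of 0.999 (the fraction of blocks that must be reported before safe mode is left); the function is illustrative and does not mirror the actual NameNode internals:

```python
def in_safe_mode(reported_blocks, total_blocks, threshold_pct=0.999):
    # The NameNode stays in safe mode until DataNode block reports
    # cover the configured fraction of all known blocks.
    if total_blocks == 0:
        return False
    return reported_blocks / total_blocks < threshold_pct

print(in_safe_mode(990, 1000))   # True  -> still waiting for block reports
print(in_safe_mode(1000, 1000))  # False -> threshold reached, safe mode can be left
```

On a real cluster, the current state can be inspected or toggled with `hdfs dfsadmin -safemode get|enter|leave`.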

HDFS read process:

1. Communicate with the NameNode to query the metadata and locate the DataNode servers that hold the file's blocks.

2. Pick a DataNode server (nearest first, then random) and request a socket stream to it.

3. The DataNode starts sending data (reading the data from disk into the stream, packet by packet, with a checksum on each packet).

4. The client receives the data packet by packet, caches it locally, and then writes it to the destination file.
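The per-packet checksum verification in step 3 can be sketched as follows. This simulates a DataNode sending CRC32-checksummed packets and a client verifying each one before writing it out; the names and the 64 KB packet size are illustrative, not the real data transfer protocol:

```python
import zlib

PACKET_SIZE = 64 * 1024  # illustrative packet size

def send_packets(data):
    # DataNode side: split the block into packets, attach a CRC32 checksum to each
    for offset in range(0, len(data), PACKET_SIZE):
        chunk = data[offset:offset + PACKET_SIZE]
        yield chunk, zlib.crc32(chunk)

def receive(packets, out):
    # Client side: verify each packet's checksum, then append it to the local cache
    for chunk, checksum in packets:
        if zlib.crc32(chunk) != checksum:
            raise IOError("corrupt packet; re-read the block from another DataNode")
        out.extend(chunk)

block = bytes(200 * 1024)        # a fake 200 KB block
received = bytearray()
receive(send_packets(block), received)
print(len(received) == len(block))  # True
```

A failed checksum is what triggers the "then random" fallback in step 2: the client abandons that replica and reads the block from another DataNode.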

HDFS write process:

1. The client asks the NameNode for permission to upload a file; the NameNode checks whether the target file already exists and whether the parent directory exists.

2. The NameNode replies whether the upload is allowed.

3. The client asks which DataNode servers the first block should be transferred to.

4. The NameNode returns three DataNode servers: A, B, and C.

5. The client asks one of the three DataNodes (A) to accept the data (essentially an RPC call that establishes a pipeline); on receiving the request, A in turn calls B, and B calls C. Once the whole pipeline is established, acknowledgments return step by step to the client.

6. The client starts uploading the first block to A (first reading the data from disk into a local memory cache), packet by packet. When A receives a packet it passes it to B, and B passes it to C; for each packet A sends, the packet is placed in a reply queue to wait for acknowledgment.

7. When a block transfer is complete, the client again asks the NameNode for the DataNode servers for the second block.
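Steps 5 through 7 amount to a replication pipeline with per-packet acknowledgments. A minimal simulation, with each DataNode modeled as a node that stores a packet and forwards it downstream before acknowledging (illustrative only, not the real pipeline protocol):

```python
class DataNode:
    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream  # next DataNode in the pipeline
        self.stored = []

    def receive(self, packet):
        # Store locally, forward downstream, then acknowledge back upstream
        self.stored.append(packet)
        if self.downstream:
            self.downstream.receive(packet)
        return self.name  # ack

# The NameNode returned DataNodes A, B, C; the client builds the pipeline A -> B -> C
c = DataNode("C")
b = DataNode("B", downstream=c)
a = DataNode("A", downstream=b)

ack_queue = []
for packet in [b"pkt1", b"pkt2", b"pkt3"]:
    ack_queue.append(packet)  # client queues the packet while awaiting its ack
    a.receive(packet)         # send to A; A forwards to B, and B to C
    ack_queue.pop(0)          # ack received: the packet is on all three replicas

print(len(c.stored))  # 3 -> every packet reached the last replica
```

The chained calls mirror why the client only talks to A: replication to B and C is the pipeline's job, not the client's.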

Copyright notice: this is the blogger's original article; do not reproduce it without the blogger's permission.
