Hadoop Knowledge Rollup

Source: Internet
Author: User

The two major capabilities of Hadoop: massive data storage and massive data analysis.

The three core components of Hadoop 2 are HDFS, MapReduce, and YARN:

1. HDFS: distributed file system, for massive data storage

2. MapReduce: computing framework, for massive data analysis

3. YARN: resource scheduling and management for the cluster
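The computing side of this division of labor is easiest to see through MapReduce's programming model. Below is a minimal, self-contained Python sketch of the map, shuffle, and reduce phases applied to word counting; it simulates the model only, and is not the Hadoop Java API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input split
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores data", "hadoop analyzes data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
```

In real Hadoop, YARN schedules the map and reduce tasks across the cluster, and the input splits come from HDFS blocks.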

HDFS working mechanism: based on the NameNode and DataNodes.

1. NameNode: responds to client requests; maintains the directory tree of the entire HDFS file system, as well as the block information for each path (file), namely the block IDs and the DataNode servers holding them. In short, it manages the metadata.

2. DataNode: stores and manages the user's file data; periodically reports the block information it holds to the NameNode (via the heartbeat mechanism, over RPC).

NameNode safe mode: 1) when the NameNode finds that the number of missing blocks reaches a configured threshold, it enters safe mode; in this mode it waits for DataNodes to report their block information to it. 2) In safe mode, the NameNode can still serve metadata queries, but the metadata cannot be modified.
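The threshold logic can be sketched as a simple fraction check. The sketch below assumes the Hadoop 2 configuration key `dfs.namenode.safemode.threshold-pct` with its default of 0.999 (the fraction of blocks that must be reported before safe mode is left); the function is illustrative and does not mirror the actual NameNode internals:

```python
def in_safe_mode(reported_blocks, total_blocks, threshold_pct=0.999):
    # The NameNode stays in safe mode until DataNode block reports
    # cover the configured fraction of all known blocks.
    if total_blocks == 0:
        return False
    return reported_blocks / total_blocks < threshold_pct

print(in_safe_mode(990, 1000))   # True  -> still waiting for block reports
print(in_safe_mode(1000, 1000))  # False -> threshold reached, safe mode can be left
```

On a real cluster, the current state can be inspected or toggled with `hdfs dfsadmin -safemode get|enter|leave`.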

HDFS read process:

1. Communicate with the NameNode to query the metadata and locate the DataNode servers that hold the file's blocks.

2. Pick a DataNode server (nearest first, then random) and request a socket stream to it.

3. The DataNode starts sending data (reading the data from disk into the stream, packet by packet, with a checksum on each packet).

4. The client receives the data packet by packet, caches it locally, and then writes it to the destination file.
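The per-packet checksum verification in step 3 can be sketched as follows. This simulates a DataNode sending CRC32-checksummed packets and a client verifying each one before writing it out; the names and the 64 KB packet size are illustrative, not the real data transfer protocol:

```python
import zlib

PACKET_SIZE = 64 * 1024  # illustrative packet size

def send_packets(data):
    # DataNode side: split the block into packets, attach a CRC32 checksum to each
    for offset in range(0, len(data), PACKET_SIZE):
        chunk = data[offset:offset + PACKET_SIZE]
        yield chunk, zlib.crc32(chunk)

def receive(packets, out):
    # Client side: verify each packet's checksum, then append it to the local cache
    for chunk, checksum in packets:
        if zlib.crc32(chunk) != checksum:
            raise IOError("corrupt packet; re-read the block from another DataNode")
        out.extend(chunk)

block = bytes(200 * 1024)        # a fake 200 KB block
received = bytearray()
receive(send_packets(block), received)
print(len(received) == len(block))  # True
```

A failed checksum is what triggers the "then random" fallback in step 2: the client abandons that replica and reads the block from another DataNode.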

HDFS write process:

1. The client asks the NameNode for permission to upload a file; the NameNode checks whether the target file already exists and whether the parent directory exists.

2. The NameNode replies whether the upload is allowed.

3. The client asks which DataNode servers the first block should be transferred to.

4. The NameNode returns three DataNode servers: A, B, and C.

5. The client asks one of the three DataNodes (A) to accept the data (essentially an RPC call that establishes a pipeline); on receiving the request, A in turn calls B, and B calls C. Once the whole pipeline is established, acknowledgments return step by step to the client.

6. The client starts uploading the first block to A (first reading the data from disk into a local memory cache), packet by packet. When A receives a packet it passes it to B, and B passes it to C; for each packet A sends, the packet is placed in a reply queue to wait for acknowledgment.

7. When a block transfer is complete, the client again asks the NameNode for the DataNode servers for the second block.
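Steps 5 through 7 amount to a replication pipeline with per-packet acknowledgments. A minimal simulation, with each DataNode modeled as a node that stores a packet and forwards it downstream before acknowledging (illustrative only, not the real pipeline protocol):

```python
class DataNode:
    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream  # next DataNode in the pipeline
        self.stored = []

    def receive(self, packet):
        # Store locally, forward downstream, then acknowledge back upstream
        self.stored.append(packet)
        if self.downstream:
            self.downstream.receive(packet)
        return self.name  # ack

# The NameNode returned DataNodes A, B, C; the client builds the pipeline A -> B -> C
c = DataNode("C")
b = DataNode("B", downstream=c)
a = DataNode("A", downstream=b)

ack_queue = []
for packet in [b"pkt1", b"pkt2", b"pkt3"]:
    ack_queue.append(packet)  # client queues the packet while awaiting its ack
    a.receive(packet)         # send to A; A forwards to B, and B to C
    ack_queue.pop(0)          # ack received: the packet is on all three replicas

print(len(c.stored))  # 3 -> every packet reached the last replica
```

The chained calls mirror why the client only talks to A: replication to B and C is the pipeline's job, not the client's.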

Copyright notice: this is the blogger's original article; do not reproduce it without the blogger's permission.
