Hadoop HDFS Storage Principles

Source: http://www.36dsj.com/archives/41391


Based on Maneesh Varshney's comic, this article explains the HDFS storage mechanism and its operating principles in a concise, easy-to-understand comic form.

I. The Cast of Roles

As shown in the figure above, the HDFS storage-related roles and their functions are as follows:

Client: the system user. It invokes the HDFS API to operate on files, interacts with the NameNode (NN) to obtain file metadata, and reads and writes data with the DataNodes (DN).

NameNode: the metadata node and the system's single manager. It is responsible for metadata management, answers clients' metadata queries, assigns data storage nodes, and so on.

DataNode: the data storage node. It is responsible for storing data blocks and their redundant replicas and for executing block read and write operations.
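
Before moving on to the write path, here is a minimal sketch of these three roles in action, using the standard Hadoop FileSystem Java API. The cluster URI and file path are hypothetical; metadata calls are answered by the NameNode, while the actual bytes come from DataNodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/tmp/example.txt"); // hypothetical path

            // Metadata interaction: this query is answered by the NameNode.
            FileStatus status = fs.getFileStatus(file);
            System.out.println("size=" + status.getLen()
                    + " replication=" + status.getReplication());

            // Data interaction: the bytes are streamed from DataNodes.
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println("first byte: " + in.read());
            }
        }
    }
}
```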

II. Writing Data

1. Send a write request

The storage unit in HDFS is the block. Files are stored split into blocks, typically 64 MB or 128 MB each (the default depends on the Hadoop version). Unlike an ordinary file system, a file smaller than a block does not occupy the block's full storage space in HDFS.
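
As a small check of this, and assuming a reachable cluster configuration is on the classpath, the FileSystem API can report the default block size that will be used for new files:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            // Default block size the NameNode will use for new files under this path,
            // e.g. 134217728 bytes = 128 MB on recent Hadoop versions.
            long blockSize = fs.getDefaultBlockSize(new Path("/"));
            System.out.println("Default block size: " + blockSize + " bytes");
        }
    }
}
```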

2. File segmentation

3. DN Assignment

4. Data Write

5. Finish writing

6. Role positioning
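
Steps 2 through 6 above are driven by the file's replication factor: the client splits the file into blocks, the NameNode assigns a set of DataNodes for each block, and the client pipelines each block through those nodes. A hedged sketch (the path and replication factor below are illustrative) of a client setting the replication factor explicitly at create time:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedWrite {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path file = new Path("/tmp/replicated.txt"); // hypothetical path
            short replication = 3; // NameNode assigns 3 DataNodes per block
            try (FSDataOutputStream out = fs.create(file, replication)) {
                out.writeBytes("pipelined to three DataNodes\n");
            }
        }
    }
}
```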

III. Reading a File from HDFS

1. User requirements

HDFS uses a write-once-read-many file access model: a file need not change after it has been created, written to, and closed. This assumption simplifies data consistency and makes high-throughput data access possible.
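
One consequence of this model is that a file is either created fresh or overwritten wholesale; there is no in-place editing. A small sketch of this behavior, with a hypothetical path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnce {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path file = new Path("/tmp/example.txt"); // hypothetical path
            // overwrite=false: create() fails with an IOException if the file
            // already exists. HDFS offers whole-file create/overwrite (and
            // append), never in-place edits.
            try (FSDataOutputStream out = fs.create(file, false)) {
                out.writeBytes("written once\n");
            }
        }
    }
}
```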

2. Contact the metadata node first

3. Download data

As mentioned earlier, the DataNode locations recorded for each block during the write are sorted by their distance from the client, with the closest DataNode placed first; the client therefore reads each block from the nearest replica, preferring a local copy when one exists.
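
A hedged sketch, again with a hypothetical path, showing how a client can list which hosts hold replicas of each block via the standard getFileBlockLocations call:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path file = new Path("/tmp/example.txt"); // hypothetical path
            FileStatus status = fs.getFileStatus(file);
            // For each block: its offset in the file and the DataNode hosts
            // holding a replica; reads go to the nearest of these.
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("offset " + loc.getOffset() + " -> "
                        + String.join(",", loc.getHosts()));
            }
        }
    }
}
```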

4. Food for thought

IV. HDFS Fault Tolerance, Part One: Fault Types and Monitoring Methods

1. Three types of faults

(1) Type I: node failure

(2) Type II: network failure

(3) Type III: data corruption (dirty data)

2. Fault monitoring mechanisms

(1) Node failure monitoring mechanism

(2) Communication fault monitoring mechanism

(3) Data error monitoring mechanism


3. Review: heartbeat messages and block reports

HDFS's storage philosophy is to spend the least money on the cheapest machines and still obtain a highly reliable distributed file system (high fault tolerance at low cost). As the above shows, HDFS treats machine failure as the norm, so its design fully accounts for the failure of a single machine, a single disk, a single file, and so on.
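
The backbone of this detection is the heartbeat: each DataNode reports to the NameNode every few seconds (3 seconds by default), and periodically sends a block report listing the blocks it stores; a node that stays silent for too long (on the order of ten minutes by default) is marked dead. Purely as an illustration, not Hadoop's actual implementation, a sketch of heartbeat-timeout bookkeeping:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: models how a NameNode-like manager could mark a
// DataNode dead after its heartbeats stop arriving.
public class HeartbeatMonitor {
    // Assumption: ~10 minutes, roughly HDFS's default dead-node timeout.
    private static final long DEAD_TIMEOUT_MS = 10 * 60 * 1000L;

    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a heartbeat arrives from a DataNode.
    public void onHeartbeat(String datanodeId) {
        lastHeartbeat.put(datanodeId, System.currentTimeMillis());
    }

    // True if the node has never reported or has been silent too long.
    public boolean isDead(String datanodeId) {
        Long last = lastHeartbeat.get(datanodeId);
        return last == null || System.currentTimeMillis() - last > DEAD_TIMEOUT_MS;
    }
}
```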

V. Fault Tolerance, Part Two: Read and Write Fault Tolerance

1. Write fault tolerance

2. Read fault tolerance

VI. Fault Tolerance, Part Three: DataNode (DN) Failure
