This article explains the HDFS storage mechanism and how it operates, in a concise and easy-to-understand comic form.
I. The Cast of Roles
As shown in the comic, the storage-related roles in HDFS and their functions are as follows:
Client: the system user. It calls the HDFS API to operate on files, exchanges file metadata with the NameNode (NN), and reads and writes data with the DataNodes (DN).
NameNode (NN): the metadata node and the single manager of the system. It is responsible for metadata management, answers metadata queries from clients, assigns data storage nodes, and so on.
DataNode (DN): the data storage node. It stores data blocks and their redundant replicas, and executes block read and write operations.
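To make the division of labor concrete, here is a minimal sketch (not part of the original comic) of a client exercising all three roles through the standard Hadoop Java API: FileSystem.get() resolves paths and metadata through the NameNode, while the returned streams carry the actual bytes to and from DataNodes. The fs.defaultFS URI and the file path are illustrative assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRoles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NN address

            FileSystem fs = FileSystem.get(conf);             // client-side handle

            // Write: the NameNode assigns DataNodes; the stream ships bytes to them.
            try (FSDataOutputStream out = fs.create(new Path("/demo/hello.txt"))) {
                out.writeUTF("hello hdfs");
            }

            // Read: the NameNode returns block locations; bytes come from DataNodes.
            try (FSDataInputStream in = fs.open(new Path("/demo/hello.txt"))) {
                System.out.println(in.readUTF());
            }
            fs.close();
        }
    }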
II. Writing Data
1. Send a write request
The storage unit in HDFS is the block. Files are usually stored in blocks of 64 MB or 128 MB. Unlike in an ordinary file system, a file in HDFS that is smaller than one block does not occupy the whole block's storage space.
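As a quick worked example (assuming the common 128 MB block size): a 300 MB file is split into two full 128 MB blocks plus one 44 MB tail block, and that tail occupies only 44 MB on disk rather than a whole block.

    public class BlockMath {
        public static void main(String[] args) {
            long blockSize = 128L * 1024 * 1024;   // a common dfs.blocksize value
            long fileSize  = 300L * 1024 * 1024;   // hypothetical 300 MB file

            long fullBlocks  = fileSize / blockSize;              // 2 full blocks
            long tailBytes   = fileSize % blockSize;              // 44 MB remainder
            long totalBlocks = fullBlocks + (tailBytes > 0 ? 1 : 0);

            System.out.println(totalBlocks + " blocks; the last one stores only "
                    + tailBytes / (1024 * 1024) + " MB");
        }
    }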
2. File segmentation
3. DataNode assignment
4. Writing the data
5. Finishing the write
6. Recap: each role's part in the write
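From the client's point of view, steps 1 through 5 above are driven by a single create-and-write call: the client library splits the stream into blocks, the NameNode picks a pipeline of DataNodes for each block, and close() completes the file. Below is a hedged sketch using the real FileSystem.create() overload that takes replication and block size; the path and payload are made up.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.nio.charset.StandardCharsets;

    public class HdfsWrite {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Step 1: create() is the write request sent to the NameNode.
            FSDataOutputStream out = fs.create(
                    new Path("/demo/big.dat"),   // hypothetical target path
                    true,                        // overwrite if present
                    4096,                        // I/O buffer size
                    (short) 3,                   // replication factor
                    128L * 1024 * 1024);         // block size

            // Steps 2-4: as we write, the client splits the data into blocks,
            // asks the NameNode for DataNodes, and pipelines bytes DN1 -> DN2 -> DN3.
            out.write("some payload".getBytes(StandardCharsets.UTF_8));

            // Step 5: close() flushes the last packet and reports completion
            // to the NameNode.
            out.close();
            fs.close();
        }
    }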
III. Reading a File from HDFS
1. User needs
HDFS uses a write-once-read-many file access model. A file does not change after it has been created, written, and closed. This assumption simplifies data consistency and makes high-throughput data access possible.
2. Contact the metadata node first
3. Download data
As mentioned in the write process, the DataNode locations for each block are sorted by their distance from the client, with the closest DataNode listed first, so the client reads each data block from the nearest replica first (the local node, if a replica is stored there).
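A minimal read sketch under the same assumptions as the write example above: getFileBlockLocations() lists which DataNodes hold each block, nearest replicas first, and open() then streams every block from the closest holder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.util.Arrays;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/demo/big.dat");   // file from the write sketch

            // Ask the NameNode which DataNodes hold each block.
            FileStatus st = fs.getFileStatus(p);
            for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println("block @" + loc.getOffset() + " on "
                        + Arrays.toString(loc.getHosts()));
            }

            // Stream the file; each block is pulled from the nearest replica.
            try (FSDataInputStream in = fs.open(p)) {
                byte[] buf = new byte[4096];
                while (in.read(buf) > 0) { /* consume the bytes */ }
            }
            fs.close();
        }
    }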
4. Food for thought
IV. Fault Tolerance, Part I: Fault Types and Detection Methods
1. Three types of faults
(1) Type 1: node failure
(2) Type 2: network failure
(3) Type 3: data corruption (dirty data)
2. Fault detection mechanisms
(1) Detecting node failure
(2) Detecting communication failure
(3) Detecting data corruption (see the checksum sketch after this list)
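For data corruption in particular, HDFS stores a checksum for each small chunk of a block when the chunk is written (512 bytes per checksum by default) and re-verifies it whenever the data is read back. The sketch below uses plain CRC32 only to show the idea; it is not Hadoop's actual checksum code.

    import java.util.zip.CRC32;

    public class ChecksumCheck {
        // Compute a CRC over a chunk, as done when the chunk is first written.
        static long crcOf(byte[] chunk) {
            CRC32 crc = new CRC32();
            crc.update(chunk);
            return crc.getValue();
        }

        public static void main(String[] args) {
            byte[] chunk = "block contents".getBytes();
            long stored = crcOf(chunk);   // checksum persisted alongside the data

            chunk[0] ^= 0x01;             // simulate silent on-disk corruption

            // On read: recompute and compare. A mismatch marks this replica
            // dirty, and the client fetches another replica instead.
            if (crcOf(chunk) != stored) {
                System.out.println("corrupt chunk detected; try another replica");
            }
        }
    }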
3. Review: heartbeat messages and block reports
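In short: every DataNode sends the NameNode a heartbeat every few seconds (3 seconds by default), plus periodic block reports listing the blocks it holds; a node silent past the timeout (about 10.5 minutes with default settings) is declared dead. The class below is a toy stand-in for that bookkeeping, not Hadoop's code.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class HeartbeatMonitor {
        // Illustrative timeout; HDFS derives ~10.5 min from dfs.heartbeat.interval
        // and dfs.namenode.heartbeat.recheck-interval.
        static final long TIMEOUT_MS = (10 * 60 + 30) * 1000L;

        private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

        // Called whenever a DataNode heartbeat arrives.
        void onHeartbeat(String dataNodeId) {
            lastSeen.put(dataNodeId, System.currentTimeMillis());
        }

        // Periodic sweep: a node silent past the timeout is treated as dead,
        // and its blocks are scheduled for re-replication.
        boolean isDead(String dataNodeId) {
            Long t = lastSeen.get(dataNodeId);
            return t == null || System.currentTimeMillis() - t > TIMEOUT_MS;
        }
    }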
The HDFS storage philosophy is to spend the least money on the cheapest machines and still get the most reliable distributed file system (high fault tolerance at low cost). As the sections above show, HDFS treats machine failure as the norm, so its design fully accounts for the failure of a single machine, a single disk, a single file, and so on.
V. Fault Tolerance, Part II: Read and Write Fault Tolerance
1. Write fault tolerance
2. Read fault tolerance
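Read fault tolerance can be pictured as a failover loop over the replica list the NameNode returned: try the nearest replica first, and on an I/O error or a checksum mismatch move on to the next. The Replica interface below is hypothetical; only the retry shape matters.

    import java.util.List;

    public class ReadWithFailover {
        interface Replica { byte[] read() throws Exception; }   // hypothetical

        // Try each replica in order (nearest first); skip any that fails.
        static byte[] readBlock(List<Replica> replicas) throws Exception {
            Exception last = null;
            for (Replica r : replicas) {
                try {
                    return r.read();    // success: hand back the block's bytes
                } catch (Exception e) {
                    last = e;           // remember the failure, try the next DN
                }
            }
            throw new Exception("all replicas failed", last);
        }
    }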
VI. Fault Tolerance, Part III: DataNode (DN) Failure
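When a DataNode is declared dead, every block that just lost a copy falls below its target replica count, and the NameNode schedules new copies from the surviving replicas onto healthy nodes. A toy sketch of that bookkeeping follows; the map layout and names are assumptions, not Hadoop's internals.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class ReReplication {
        static final int TARGET_REPLICAS = 3;   // cluster replication factor

        // blockId -> DataNodes currently holding a replica of that block
        static List<String> underReplicated(Map<String, Set<String>> blockMap,
                                            String deadNode) {
            List<String> toCopy = new ArrayList<>();
            for (Map.Entry<String, Set<String>> e : blockMap.entrySet()) {
                e.getValue().remove(deadNode);             // drop the lost copy
                if (e.getValue().size() < TARGET_REPLICAS) {
                    toCopy.add(e.getKey());                // schedule a new copy
                }
            }
            return toCopy;
        }
    }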
VII. Replication Rules
1. Racks and DataNodes
2. Replica placement policy
The first replica of a data block is placed, by preference, on the node where the client is writing. If that node is short on space or currently overloaded, an appropriate DataNode elsewhere in the same rack is chosen as the block's local node instead.
If the client is not running on a DataNode at all, a suitable DataNode is picked at random from the whole cluster to act as the block's local node.
HDFS's storage strategy is therefore to keep one replica on the local node and place the other two replicas on two different nodes of another rack.
This lets the cluster survive the loss of an entire rack. At the same time, because each block lives on only two distinct racks, the strategy limits inter-rack data transfer during writes (improving write efficiency) and reduces the total bandwidth needed to read the data. To some extent it balances data safety against network transfer cost.
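Here is a sketch of that placement decision, with hypothetical Node and rack types: replica 1 goes to the writer's own node, and replicas 2 and 3 go to two different nodes chosen from a single other rack. It assumes the cluster has at least one other rack with two usable nodes.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    public class PlacementPolicy {
        record Node(String name, String rack) {}   // hypothetical node type

        static List<Node> pickTargets(Node writer, List<Node> cluster, Random rnd) {
            List<Node> targets = new ArrayList<>();
            targets.add(writer);                   // replica 1: the local node

            // Collect nodes on other racks and pick one remote rack at random.
            List<Node> remote = new ArrayList<>();
            for (Node n : cluster) {
                if (!n.rack().equals(writer.rack())) remote.add(n);
            }
            Collections.shuffle(remote, rnd);
            String remoteRack = remote.get(0).rack();

            // Replicas 2 and 3: two different nodes on that same remote rack.
            for (Node n : remote) {
                if (n.rack().equals(remoteRack) && targets.size() < 3) {
                    targets.add(n);
                }
            }
            return targets;
        }
    }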
"Comic reading" HDFs Storage principle (reprint)