"Comic Reading" HDFS Storage Principles (reprint)


Reprinted from: http://www.cnblogs.com/itboys/p/5497698.html

I. The Roles

The HDFS storage-related roles and their functions are as follows:

Client: the system user. It invokes the HDFS API to operate on files, interacts with the NameNode (NN) to obtain file metadata, and reads and writes data directly with the DataNodes (DN).

NameNode: the metadata node and the system's sole manager. It is responsible for metadata management, answers metadata queries from clients, assigns data storage nodes, and so on.

DataNode: the data storage node, responsible for storing data blocks, keeping redundant replicas, and executing block read and write operations.
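This division of labor can be sketched as a toy model: the NameNode holds only metadata, while file contents flow directly between the client and DataNodes. All class and method names here are hypothetical (the real client API is Hadoop's Java `FileSystem` class):

```python
# Toy model of the three HDFS roles (hypothetical names, not the real Hadoop API).
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]

class NameNode:
    """Sole manager: keeps metadata only, never touches file contents."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.metadata = {}        # path -> list of (block_id, DataNode)

    def allocate(self, path, num_blocks):
        # Round-robin assignment of blocks to DataNodes (real HDFS is rack-aware).
        placement = [(f"{path}#blk{i}", self.datanodes[i % len(self.datanodes)])
                     for i in range(num_blocks)]
        self.metadata[path] = placement
        return placement

    def lookup(self, path):
        return self.metadata[path]

# The client asks the NameNode where blocks live, then moves the bytes itself.
nn = NameNode([DataNode("dn1"), DataNode("dn2")])
for block_id, dn in nn.allocate("/logs/a.txt", 3):
    dn.write_block(block_id, b"payload")
first_block_id, first_dn = nn.lookup("/logs/a.txt")[0]
print(first_dn.read_block(first_block_id))   # data flows client <-> DataNode
```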

II. Writing Data

1. Send Write Data request

The storage unit in HDFS is the block. Files are usually stored in blocks of 64 MB or 128 MB. Unlike an ordinary file system, in HDFS a file smaller than one block does not occupy the entire block's storage space.
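The block arithmetic is a simple ceiling division, with the last block holding only the remainder. A minimal sketch (the 128 MB constant is a common HDFS default):

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, a common HDFS default block size

def split_into_blocks(file_size):
    """Return the byte sizes of the blocks a file of `file_size` bytes occupies."""
    if file_size == 0:
        return []
    n = math.ceil(file_size / BLOCK_SIZE)
    last = file_size - (n - 1) * BLOCK_SIZE
    return [BLOCK_SIZE] * (n - 1) + [last]

# A 300 MB file needs three blocks; the last holds only 44 MB and, unlike in
# many local file systems, does not waste the rest of the block's space.
sizes = split_into_blocks(300 * 1024 * 1024)
print([s // (1024 * 1024) for s in sizes])   # [128, 128, 44]
```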

2. File segmentation

3. DataNode Assignment

4. Data Write

5. Finish writing

6. Role Recap

III. Reading a File from HDFS

1. User needs

HDFS uses a write-once-read-many file access model. A file need not change after it has been created, written, and closed. This assumption simplifies data consistency and makes high-throughput data access possible.

2. Contact the metadata node first

3. Download data

As mentioned in the write path, the NameNode returns each block's DataNode list sorted by distance from the client, closest first. The client therefore reads each block from the nearest replica, preferring a local one.
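The "closest replica first" ordering amounts to a sort keyed on a network-distance metric. A sketch, using the conventional rack-aware distances (0 = same node, 2 = same rack, 4 = different rack); the helper names are hypothetical:

```python
# Sketch of nearest-replica ordering (hypothetical helpers; real HDFS
# computes distances from its rack-aware network topology).
def distance(client, datanode):
    """0 = same node, 2 = same rack, 4 = different rack."""
    if client["host"] == datanode["host"]:
        return 0
    if client["rack"] == datanode["rack"]:
        return 2
    return 4

def order_replicas(client, replicas):
    """Return the replica locations sorted nearest-first for this client."""
    return sorted(replicas, key=lambda dn: distance(client, dn))

client = {"host": "h1", "rack": "r1"}
replicas = [{"host": "h9", "rack": "r3"},   # remote rack
            {"host": "h2", "rack": "r1"},   # same rack
            {"host": "h1", "rack": "r1"}]   # same node -> read locally
print([dn["host"] for dn in order_replicas(client, replicas)])  # ['h1', 'h2', 'h9']
```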

4. Food for Thought

IV. HDFS Fault Tolerance, Part One: Fault Types and Monitoring Methods

1. Three Types of Fault

(1) Type I: Node failure

(2) Type II: Network failure

(3) Type III: Data corruption (dirty data)

2. Fault Monitoring Mechanisms

(1) Node failure monitoring mechanism

(2) Communication fault monitoring mechanism

(3) Data error monitoring mechanism

3. Review: Heartbeats and Block Reports

The HDFS storage philosophy is to spend as little money as possible on the cheapest machines and still achieve a safe, hard-to-break distributed file system (high fault tolerance at low cost). As the sections above show, HDFS treats machine failure as the norm, so its design fully accounts for single-machine failure, single-disk failure, loss of a single file, and so on.
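The heartbeat-based node-failure detection described above can be sketched as a timestamp table on the NameNode: a DataNode that misses heartbeats long enough is declared dead and its blocks become candidates for re-replication. The 3-second heartbeat interval and 10-minute dead-node timeout are HDFS's well-known defaults; the class itself is a toy, not the real NameNode code:

```python
# Toy heartbeat monitor (hypothetical class; real logic lives in the NameNode).
HEARTBEAT_INTERVAL = 3          # seconds between heartbeats (HDFS default)
DEAD_TIMEOUT = 10 * 60          # ~10 minutes silent -> node considered dead

class HeartbeatMonitor:
    def __init__(self):
        self.last_seen = {}     # datanode name -> timestamp of last heartbeat

    def heartbeat(self, datanode, now):
        self.last_seen[datanode] = now

    def dead_nodes(self, now):
        """Nodes whose blocks should be re-replicated elsewhere."""
        return [dn for dn, t in self.last_seen.items()
                if now - t > DEAD_TIMEOUT]

mon = HeartbeatMonitor()
mon.heartbeat("dn1", now=0)
mon.heartbeat("dn2", now=0)
mon.heartbeat("dn1", now=500)   # dn1 keeps reporting; dn2 has gone silent
print(mon.dead_nodes(now=700))  # ['dn2']
```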

V. Fault Tolerance, Part Two: Read and Write Fault Tolerance

1. Write Fault tolerance

2. Read fault tolerance

VI. Fault Tolerance, Part Three: DataNode (DN) Failure

VII. Replication Rules

1. Racks and DataNodes

2. Replica Placement Policy

The first replica of a data block is preferentially placed on the node where the client is writing. If that node is out of space or currently overloaded, an appropriate DataNode in the same rack is chosen as the local node instead.

If the client is not running on a DataNode, a suitable DataNode is chosen at random from the whole cluster to serve as the block's local node.

HDFS's default storage strategy is to keep one replica on the local rack and the other two replicas on two different nodes of a single remote rack.

This allows the cluster to survive the loss of an entire rack. At the same time, because each block lives on only two distinct racks, the strategy reduces inter-rack data transfer, which improves write efficiency and lowers the total bandwidth needed for reads. It thus balances data safety against network transfer cost.
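The three-replica placement described above can be sketched as follows. This is a toy topology and function (real HDFS implements the policy in `BlockPlacementPolicyDefault`, with many more edge cases):

```python
import random

def place_replicas(writer_node, topology, rng=random):
    """Pick 3 replica nodes: the writer's node, then two nodes on one remote rack.

    topology: dict mapping rack name -> list of node names. A toy version of
    HDFS's default rack-aware policy; load and capacity checks are omitted.
    """
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    first = writer_node                              # replica 1: writer's own node
    remote_racks = [r for r in topology if r != local_rack]
    remote = rng.choice(remote_racks)                # replicas 2 and 3 share one
    second, third = rng.sample(topology[remote], 2)  # remote rack, distinct nodes
    return [first, second, third]

topology = {"r1": ["h1", "h2"], "r2": ["h3", "h4"], "r3": ["h5", "h6"]}
placement = place_replicas("h1", topology)
print(placement)   # e.g. ['h1', 'h4', 'h3'] -- always spans exactly two racks
```

Losing rack r1 still leaves two replicas on the remote rack, while a read from the writer's rack never has to cross racks at all.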
