Understanding the HDFS storage mechanism


1. HDFS is built around a split-and-store file design: a file is first split into pieces, and the pieces are stored separately;

2. HDFS splits the large files to be stored and places the pieces into pre-allocated blocks. This preset optimization and preprocessing of the stored data addresses the needs of both storing and computing over very large files;

3. An HDFS cluster consists of two kinds of components: a NameNode and DataNodes. Typically, one NameNode and multiple DataNodes work together in a cluster;

4. The NameNode is the master server of the cluster. It maintains the metadata for all files and directories in HDFS and continuously records the status of every DataNode host in the cluster. It persists the namespace by reading and writing the fsimage and edit log files;

5. DataNodes are the worker nodes of the HDFS cluster and carry out its tasks. A file is divided into several equal-sized data blocks that are stored across several DataNodes. Each DataNode regularly reports its running status and the blocks it stores to the NameNode, and acts on the commands the NameNode sends back (a minimal sketch of querying this status appears after the list);

6. The NameNode receives requests from clients and replies with the storage locations of the requested file's blocks. The client then contacts the DataNodes directly to operate on the file data;

7. A block is the basic storage unit of HDFS. The default size is 64 MB (raised to 128 MB in Hadoop 2.x and later);

8. HDFS also keeps multiple backup copies of stored blocks, replicating each block onto mutually independent machines (three replicas by default) so that damaged data can be recovered quickly (see the block-size and replication sketch after this list);

9. Files in HDFS can be operated on through the provided APIs, such as the Java FileSystem API (see the file-operation sketch after this list);

10. When a read error occurs, the client reports it to the NameNode and asks the NameNode to exclude the faulty DataNode; the remaining replicas are then re-sorted by distance to obtain a new read path. Only if every DataNode holding the data reports a read failure does the whole read fail;

11. FSDataOutputStream does not shut down immediately when a problem occurs during a write. The client reports the error to the NameNode and continues writing directly to the DataNodes holding the backups: a backup DataNode is promoted to the primary, and the data is copied to the remaining DataNodes in the pipeline. The NameNode marks the faulty DataNode for later processing.
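As a rough illustration of points 4 and 5, the sketch below asks the NameNode for its current view of the DataNodes, which it builds from their regular heartbeats and block reports. It assumes a reachable HDFS cluster; the NameNode URI (hdfs://localhost:9000) and the class name are placeholders.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class HdfsClusterStatus {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Connect to the NameNode; this only works against a real HDFS
            // cluster, where the FileSystem is a DistributedFileSystem.
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
            DistributedFileSystem dfs = (DistributedFileSystem) fs;

            // The NameNode aggregates the DataNodes' heartbeats and block
            // reports; getDataNodeStats() returns its current view of them.
            for (DatanodeInfo node : dfs.getDataNodeStats()) {
                System.out.println(node.getHostName()
                        + " capacity=" + node.getCapacity()
                        + " remaining=" + node.getRemaining());
            }
            fs.close();
        }
    }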
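Points 7 and 8 translate into per-file settings with cluster-wide defaults. The sketch below, under the same placeholder URI and paths, creates a file with an explicit 64 MB block size and a replication factor of three, then asks the NameNode which DataNodes hold each block.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBlockInfo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
            Path file = new Path("/user/demo/big.dat");

            // Create the file with three replicas and a 64 MB block size,
            // overriding whatever the cluster defaults are.
            try (FSDataOutputStream out =
                    fs.create(file, true, 4096, (short) 3, 64L * 1024 * 1024)) {
                out.writeBytes("sample data");
            }

            // Ask the NameNode where each block of the file is stored.
            FileStatus status = fs.getFileStatus(file);
            for (BlockLocation loc :
                    fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("offset " + loc.getOffset()
                        + " hosts " + String.join(",", loc.getHosts()));
            }
            fs.close();
        }
    }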
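Finally, a minimal sketch of point 9 using the Java FileSystem API: copy a local file into HDFS, then stream it back. The local and HDFS paths are again placeholders. Note that the client fetches block locations from the NameNode but exchanges the actual bytes directly with the DataNodes, as described in point 6.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsFileOps {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

            // Upload: the NameNode chooses the target DataNodes for each block.
            fs.copyFromLocalFile(new Path("/tmp/input.txt"),
                                 new Path("/user/demo/input.txt"));

            // Read back: block data streams directly from the DataNodes.
            try (FSDataInputStream in = fs.open(new Path("/user/demo/input.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
            fs.close();
        }
    }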

