Understanding the HDFS storage mechanism
1. HDFS's file storage design splits a file into pieces first and then stores the pieces separately.
2. HDFS splits each large file to be stored into fixed-size Blocks and distributes those Blocks across the cluster. This preset splitting and preprocessing scheme addresses the need to both store and compute over very large files.
3. An HDFS cluster consists of two kinds of nodes, NameNode and DataNode; typically one NameNode and many DataNodes work together in a cluster.
4. The NameNode is the master server of the cluster. It maintains the metadata for all files and directories in HDFS and continuously tracks the status of the DataNode hosts in the cluster; this metadata is persisted by reading and writing the namespace image (fsimage) and edit log files.
5. The DataNodes are the worker nodes of the HDFS cluster and perform the actual storage tasks. A file is divided into data blocks of equal size that are stored on several DataNodes. Each DataNode regularly reports its running status and the blocks it stores to the NameNode, and carries out the commands the NameNode sends back.
6. The NameNode receives requests from clients and replies with the storage locations of the requested file's blocks; the client then contacts the DataNodes directly to operate on the file data.
7. The Block is the basic storage unit of HDFS. The default size is 64 MB (raised to 128 MB in Hadoop 2.x and later).
8. HDFS also replicates the stored Blocks: by default each Block is copied to three mutually independent machines, so that damaged data can be recovered quickly (the second sketch after this list shows how the replication factor can be set).
9. Files in HDFS can be operated on through the provided APIs, such as the Java FileSystem API; a minimal example follows this list.
10. When an error occurs during a client's read operation, the client reports it to the NameNode and asks the NameNode to exclude the faulty DataNode; the remaining replicas are sorted by distance again, giving the client a new DataNode to read from. Only if every DataNode holding a replica reports a read failure does the whole read fail.
11. When a problem occurs during a write operation, FSDataOutputStream is not closed immediately. The client reports the error to the NameNode and keeps writing directly to the DataNodes that hold the backups; a backup DataNode is promoted to the preferred DataNode, and the data is copied to the remaining two DataNodes. The NameNode marks the faulty DataNode for later processing.
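
To make point 9 concrete, here is a minimal sketch of operating on HDFS files through Hadoop's Java FileSystem API. The class name and the path /user/demo/sample.txt are illustrative assumptions; the cluster address is taken from the configuration files on the classpath.

    // Minimal sketch: write and read a file in HDFS via the Java FileSystem API.
    // Assumes a reachable cluster configured in core-site.xml / hdfs-site.xml;
    // the path below is hypothetical.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsApiSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();      // picks up site config
            FileSystem fs = FileSystem.get(conf);          // handle to HDFS

            Path path = new Path("/user/demo/sample.txt"); // hypothetical path

            // Write: the NameNode supplies block locations; the bytes are
            // streamed directly to DataNodes (points 6 and 9 above).
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.writeUTF("hello HDFS");
            }

            // Read: block locations again come from the NameNode, but the
            // data is read straight from a DataNode.
            try (FSDataInputStream in = fs.open(path)) {
                System.out.println(in.readUTF());
            }

            fs.close();
        }
    }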
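
Points 7 and 8 (block size and replication) can also be set per file rather than cluster-wide. The sketch below uses the FileSystem.create overload that accepts a replication factor and a block size; all concrete values here are illustrative assumptions.

    // Sketch: choosing a replication factor and block size when creating a file.
    // The values mirror the defaults described above (3 replicas, 64 MB blocks).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSettingsSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path path = new Path("/user/demo/big-file.dat"); // hypothetical path

            // create(path, overwrite, bufferSize, replication, blockSize)
            try (FSDataOutputStream out = fs.create(
                    path, true, 4096, (short) 3, 64L * 1024 * 1024)) {
                out.write(new byte[]{1, 2, 3});              // placeholder payload
            }

            // The replication factor of an existing file can be changed later.
            fs.setReplication(path, (short) 2);

            fs.close();
        }
    }

In practice these values usually come from the dfs.replication and block-size properties in hdfs-site.xml rather than from code; setting them programmatically is only needed for per-file overrides.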