Re-understanding the storage mechanism of HDFS



1. HDFS introduced a file storage design in which files are first split into pieces and then stored;


2. HDFS splits a large file into pieces and stores each piece in a fixed-size storage unit called a block; together with preset optimizations that preprocess the stored data, this design meets the storage and computation needs of very large files;
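
As a rough illustration of this split-then-store idea (a plain-Java sketch, not HDFS source code; the 64 MB figure matches the default block size in point 7), the number of blocks a file occupies can be computed like this:

    // Sketch: how many blocks a file of a given size occupies in HDFS,
    // assuming the classic 64 MB default block size (see point 7).
    public class BlockCount {
        static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB

        public static void main(String[] args) {
            long fileSize = 300L * 1024 * 1024; // a hypothetical 300 MB file
            long fullBlocks = fileSize / BLOCK_SIZE; // 4 full blocks
            long totalBlocks = fullBlocks + (fileSize % BLOCK_SIZE > 0 ? 1 : 0);
            // The last, partial block (44 MB here) only occupies as much
            // disk space as it actually contains.
            System.out.println(totalBlocks + " blocks"); // prints "5 blocks"
        }
    }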


3. An HDFS cluster consists of two kinds of nodes: a NameNode and DataNodes. In general, a cluster has one NameNode and multiple DataNodes working together;


4. The NameNode is the master server of the cluster. It maintains the metadata for all files and their contents in HDFS, continuously tracks the status of the DataNode hosts in the cluster, and persists this record by reading and writing the image and edit log files;


5. The DataNodes are the worker nodes of the cluster and carry out the actual storage work. A file is divided into several data blocks of the same size and stored on several DataNodes; each DataNode periodically reports its running state and the blocks it stores to the NameNode, and acts on the instructions the NameNode sends back;


6. The NameNode accepts requests sent by clients and returns the storage locations of the requested file's blocks to the client that submitted the request; the client then contacts the DataNodes directly to carry out the actual file operations.
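
A minimal read sketch using the Hadoop Java FileSystem API (the NameNode URI and file path are hypothetical placeholders). Note that the code never addresses a DataNode explicitly: open() asks the NameNode for block locations, and the returned stream then fetches the bytes directly from the DataNodes, exactly as described above:

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsRead {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Hypothetical NameNode address; use your cluster's fs.defaultFS.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
            // open() obtains the block locations from the NameNode; reading
            // the stream pulls the data straight from the DataNodes.
            try (FSDataInputStream in = fs.open(new Path("/data/example.txt"))) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }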


7. The block is the basic storage unit of HDFS; the default size is 64 MB (the classic Hadoop 1.x default; later releases raised it to 128 MB);
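
The default can be overridden per cluster or per client. A small sketch, assuming a Hadoop 2.x+ client where the property is named dfs.blocksize (the 1.x releases, where the 64 MB default applied, used dfs.block.size):

    import org.apache.hadoop.conf.Configuration;

    public class BlockSizeConfig {
        public static Configuration withLargerBlocks() {
            Configuration conf = new Configuration();
            // dfs.blocksize is the Hadoop 2.x+ property name;
            // Hadoop 1.x used dfs.block.size instead.
            conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // 128 MB
            return conf;
        }
    }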


8. HDFS also keeps multiple copies of the blocks it stores, replicating each block onto 3 separate machines by default, so that corrupted or lost data can be recovered quickly;
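
The replication factor can also be adjusted per file through the FileSystem API. A minimal sketch (the URI, path, and new factor are illustrative); setReplication only records the new target count, and the NameNode then schedules the extra copies in the background:

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplication {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://namenode:9000"), new Configuration());
            // Raise this file's target replica count from the default 3 to 5;
            // the NameNode detects the under-replication and copies the
            // blocks to additional DataNodes asynchronously.
            fs.setReplication(new Path("/data/example.txt"), (short) 5);
        }
    }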


9. Users can operate on the files in HDFS through the provided API;
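
A minimal write sketch with the same FileSystem API (the NameNode URI, path, and contents are hypothetical). create() returns the FSDataOutputStream discussed in point 11 below; closing it flushes the data through the DataNode write pipeline:

    import java.io.IOException;
    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWrite {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://namenode:9000"), new Configuration());
            // create() asks the NameNode to allocate blocks; the returned
            // stream pushes data through a pipeline of (by default) 3 DataNodes.
            try (FSDataOutputStream out = fs.create(new Path("/data/hello.txt"))) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
        }
    }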


10. When a client's read operation fails, the client reports the error to the NameNode and asks it to exclude the faulty DataNode and re-sort the remaining DataNodes by distance, producing a new read path. If every DataNode holding the data has reported a read failure, the entire read task fails;


11. When a problem occurs during a write operation, the FSDataOutputStream is not closed immediately. The client reports the error to the NameNode and writes the data directly to the DataNode that holds the backup replica; that backup DataNode is promoted to the primary DataNode, and the data is then replicated onto the remaining 2 DataNodes. The NameNode flags the faulty DataNode for subsequent processing.


