Based on Maneesh Varshney's comic, this article explains the HDFS storage mechanism and its operating principles in a concise, easy-to-understand comic form.

I. The Roles
As shown in the figure above, the HDFS storage roles and their functions are as follows:
Client: the system user; invokes the HDFS API to operate on files, interacts with the NameNode (NN) to obtain file metadata, and reads and writes data with the DataNodes (DN).
NameNode: the metadata node and the system's sole manager; responsible for managing metadata, answering clients' metadata queries, assigning data storage nodes, and so on.
DataNode: the data storage node; responsible for storing data blocks and their redundant replicas, and for executing block read and write operations.
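To make the roles concrete, here is a minimal client sketch using the standard Hadoop FileSystem API. The cluster address hdfs://namenode:9000 and the file path are hypothetical placeholders; the exists() call is a pure metadata query, so only the NameNode is involved.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        // The client obtains a handle through which metadata requests
        // go to the NameNode; no DataNode is contacted yet.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // A metadata-only call, answered by the NameNode alone.
        boolean exists = fs.exists(new Path("/user/demo/data.txt"));
        System.out.println("exists: " + exists);

        fs.close();
    }
}
```

In the read and write examples below, actual file data flows directly between the client and the DataNodes; the NameNode only ever serves metadata.

II. Writing Data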
1. Send a write request
The storage unit in HDFS is the block. Files are usually stored in blocks of 64 MB or 128 MB. Unlike an ordinary file system, a file smaller than one block does not occupy the entire block's storage space in HDFS. (A minimal write example appears after step 6 below.)
2. File segmentation
3. DN Assignment
4. Data Write
5. Finish writing
6. Role positioning
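Putting steps 1 through 6 together: a minimal write sketch with the standard Hadoop API. The address and path are hypothetical; dfs.replication and dfs.blocksize are real configuration keys, shown here with common values (3 replicas, 128 MB blocks).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");                  // 3 copies of each block
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024); // 128 MB blocks

        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        Path path = new Path("/user/demo/data.txt");

        // create() asks the NameNode to register the file and assign
        // DataNodes; the stream then writes to the DataNode pipeline
        // directly, without routing data through the NameNode.
        try (FSDataOutputStream out = fs.create(path)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        } // close() completes the file on the NameNode

        // The block size recorded for the file we just wrote.
        System.out.println("block size: " + fs.getFileStatus(path).getBlockSize());
        fs.close();
    }
}
```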
III. Reading a File from HDFS
1. User needs
HDFS uses a write-once-read-many file access model. Once a file has been created, written, and closed, it does not need to change. This assumption simplifies data-consistency issues and makes high-throughput data access possible.
2. Contact the metadata node first
3. Download data
As mentioned earlier, when the data was written, the list of DataNodes was already sorted by their distance from the client, with the closest DataNode placed first; when reading, the client therefore fetches each data block from the nearest replica, reading locally when possible.
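A minimal read sketch under the same assumptions as the write example. open() fetches the block locations from the NameNode; the stream then pulls each block from the nearest DataNode that holds a replica.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // open() returns after the NameNode supplies the sorted list of
        // block locations; data is then read from the closest replica.
        try (FSDataInputStream in = fs.open(new Path("/user/demo/data.txt"));
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```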
4. Thinking
IV. HDFS Fault Tolerance, Part One: Fault Types and Monitoring Methods
1. Three types of fault
(1) Type one: node failure
(2) Type two: network failure
(3) Type three: data corruption (dirty data)
2. Fault monitoring mechanisms
(1) Node failure monitoring mechanism
(2) Communication failure monitoring mechanism
(3) Data error monitoring mechanism
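To illustrate mechanism (3): a checksum is stored alongside the data when it is written and recomputed when it is read; a mismatch marks the replica as dirty. The toy sketch below uses java.util.zip.CRC32 purely to show the idea; real HDFS keeps per-chunk checksums in separate metadata, so nothing here reflects the actual on-disk format.

```java
import java.util.zip.CRC32;

public class ChecksumCheck {
    // Compute the CRC32 of a data chunk.
    static long checksum(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] chunk = "block data".getBytes();
        long stored = checksum(chunk); // kept alongside the block at write time

        chunk[0] ^= 0x01;              // simulate a bit flipped on disk
        boolean corrupt = checksum(chunk) != stored;
        System.out.println(corrupt ? "dirty data detected" : "data intact");
    }
}
```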
3. Review: heartbeats and block reports
The HDFS storage philosophy is to build the most reliable distributed file system out of the cheapest machines money can buy (high fault tolerance at low cost). As the above shows, HDFS treats machine failure as the norm, so the design fully accounts for the failure of a single machine, a single disk, a single file, and so on.
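A simplified model of the heartbeat side of this design, assuming roughly the HDFS defaults (heartbeats every 3 seconds; a DataNode declared dead after about 10 minutes of silence). This is an illustrative sketch, not the actual NameNode implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the NameNode's heartbeat bookkeeping.
public class HeartbeatMonitor {
    static final long DEAD_TIMEOUT_MS = 10 * 60 * 1000; // ~10 min, per HDFS defaults

    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called each time a DataNode's heartbeat arrives (every ~3 s).
    void onHeartbeat(String datanodeId) {
        lastHeartbeat.put(datanodeId, System.currentTimeMillis());
    }

    // A node silent for too long is declared dead; its blocks then become
    // candidates for re-replication (see Part Three below).
    boolean isDead(String datanodeId) {
        Long last = lastHeartbeat.get(datanodeId);
        return last == null || System.currentTimeMillis() - last > DEAD_TIMEOUT_MS;
    }
}
```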
V. HDFS Fault Tolerance, Part Two: Read and Write Fault Tolerance
1. Write fault tolerance
2. Read fault tolerance
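Read fault tolerance comes down to client-side failover: if the DataNode serving a block fails mid-read, the client moves on to the next replica in the NameNode's sorted location list (and, in real HDFS, also reports the bad node to the NameNode). A schematic sketch; the type names here are invented for illustration.

```java
import java.io.IOException;
import java.util.List;

public class ReadFailoverSketch {
    // Hypothetical stand-in for a connection to one DataNode replica.
    interface DataNodeClient {
        byte[] readBlock(String blockId) throws IOException;
    }

    static byte[] readWithFailover(String blockId, List<DataNodeClient> replicas)
            throws IOException {
        IOException lastError = null;
        for (DataNodeClient dn : replicas) { // nearest replica first
            try {
                return dn.readBlock(blockId);
            } catch (IOException e) {
                lastError = e;               // node/network failure: try the next one
            }
        }
        throw new IOException("all replicas failed for block " + blockId, lastError);
    }
}
```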
VI. HDFS Fault Tolerance, Part Three: DataNode (DN) Failure
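When a DataNode is declared dead, every block it held loses one replica, and the NameNode re-replicates any block whose live replica count has dropped below the target. A schematic sketch; the names are invented for illustration.

```java
import java.util.Map;

public class ReReplicationSketch {
    static final int TARGET_REPLICAS = 3; // default dfs.replication

    // Given each block's surviving replica count after a DataNode death,
    // flag the under-replicated blocks. In real HDFS the NameNode queues
    // these and asks a healthy DataNode to copy from a surviving replica.
    static void onDataNodeDead(Map<String, Integer> liveReplicaCount) {
        liveReplicaCount.forEach((blockId, live) -> {
            if (live < TARGET_REPLICAS) {
                System.out.println("re-replicate " + blockId
                        + " (" + live + "/" + TARGET_REPLICAS + " replicas)");
            }
        });
    }
}
```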