Data read process:
The client contacts the NameNode, specifying the file it wants to read
The client's identity is verified in one of two ways:
By trusting the client, which supplies its own user name
Through a mandatory authentication mechanism such as Kerberos
If the file exists, the NameNode checks the file's owner and its access permissions to confirm that the user may read it. The NameNode then returns the ID of the file's first block, along with the list of DataNodes that hold replicas of that block (the list is sorted by the distance between each DataNode and the client, where distance is computed from the rack topology of the Hadoop cluster)
Using the block ID and DataNode hostnames, the client connects to the most suitable DataNode and reads the required data blocks, until all blocks have been read or the client closes the file stream
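The replica sorting mentioned above can be sketched as follows. This is a simplified model, not Hadoop's actual implementation: it assumes each node's position is given as a topology path like "/dc1/rack1/host1", and counts tree hops between nodes (0 for the same node, 2 within a rack, 4 across racks).

```python
# Minimal sketch of rack-aware replica sorting (illustrative, not Hadoop's code).

def distance(path_a: str, path_b: str) -> int:
    """Number of hops between two nodes in the rack topology tree."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Hops up from a to the common ancestor, plus hops down to b.
    return (len(a) - common) + (len(b) - common)

def sort_replicas(client: str, replicas: list[str]) -> list[str]:
    """Order replica locations from nearest to farthest from the client."""
    return sorted(replicas, key=lambda r: distance(client, r))

replicas = ["/dc1/rack2/dn3", "/dc1/rack1/dn1", "/dc2/rack1/dn7"]
print(sort_replicas("/dc1/rack1/client", replicas))
# nearest first: same rack, then same datacenter, then remote datacenter
```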
Exception conditions:
If a process or host fails while data is being read from a DataNode, the read does not stop: the HDFS client library automatically tries to read the data from another DataNode that holds a replica. If no replica is reachable, the read operation fails and the client receives an exception
The block location information returned by the NameNode may have become stale by the time the client tries to read from a DataNode. If other DataNodes hold replicas of the block, the client tries to read from them; otherwise the read operation fails
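The client-side failover described in both exception cases can be sketched like this: try each DataNode holding a replica in order, and fail only when every replica is exhausted. The function names (`read_block`, `fetch`) are illustrative, not a real HDFS API.

```python
# Hypothetical sketch of client-side read failover across replicas.

def read_block(block_id, datanodes, fetch):
    """fetch(node, block_id) returns the block bytes or raises IOError."""
    dead = []
    for node in datanodes:
        try:
            return fetch(node, block_id)
        except IOError:
            dead.append(node)   # remember the failed replica and try the next
    raise IOError(f"block {block_id}: all replicas unreachable: {dead}")

def flaky_fetch(node, block_id):
    """Stand-in for a network read; dn1 is pretend-dead."""
    if node == "dn1":
        raise IOError("dn1 down")
    return b"block-data"

print(read_block("blk_1", ["dn1", "dn2"], flaky_fetch))  # b'block-data'
```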
Data write process:
The client opens a file for writing through the Hadoop FileSystem API; if the user has sufficient privileges, the NameNode is asked to create the metadata for the file
The client receives an "open file success" response
The client writes data to the stream; the data is automatically split into packets, which are held in an in-memory queue
A separate client thread reads packets from the queue and asks the NameNode for a list of DataNodes on which to write the replicas of the data block
The client connects directly to the first DataNode in the list; the first DataNode connects to the second, and the second to the third, forming the replication pipeline for the block
Packets are streamed to the first DataNode, which writes them to its disk and forwards them to the next DataNode in the pipeline, which writes them to its disk, and so on
Each DataNode in the replication pipeline confirms that the packets it receives have been successfully written to disk
The client maintains a list of packets that have not yet been acknowledged; each acknowledgment it receives tells it that the packet was successfully written by a DataNode in the pipeline
When the block is full, the client asks the NameNode for the next set of DataNodes
The client writes all remaining packets, closes the data stream, and notifies the NameNode that the write operation is complete
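The steps above can be simulated in a few lines: data is split into packets, queued, streamed through a pipeline of DataNodes (modelled here as plain in-memory lists), and removed from the acknowledgment queue once every node has confirmed the write. This is a single-threaded toy model, not the real multi-threaded client; the packet size and names are assumptions.

```python
# Simplified single-threaded simulation of the HDFS write pipeline.
from collections import deque

PACKET_SIZE = 4  # toy value; real HDFS packets are ~64 KB

def write_file(data: bytes, pipeline: list) -> int:
    """Split data into packets and push each through the whole pipeline.

    Returns the number of packets acknowledged by all nodes."""
    data_queue = deque(data[i:i + PACKET_SIZE]
                       for i in range(0, len(data), PACKET_SIZE))
    ack_queue = deque()
    acked = 0
    while data_queue:
        pkt = data_queue.popleft()
        ack_queue.append(pkt)      # track the packet until it is confirmed
        for node in pipeline:
            node.append(pkt)       # stands in for "write packet to disk"
        ack_queue.popleft()        # every node confirmed: drop the ack entry
        acked += 1
    return acked

dn1, dn2, dn3 = [], [], []
n = write_file(b"hello hdfs!", [dn1, dn2, dn3])
print(n, dn1 == dn2 == dn3)  # 3 packets, identical replicas on all nodes
```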
Exception conditions:
If one of the DataNodes in the replication pipeline cannot write data to disk, the pipeline is closed immediately. Packets that have been sent but not yet acknowledged are rolled back into the queue, so that the DataNodes downstream of the failed node still receive them. On the remaining healthy DataNodes, the block being written is given a new ID, so that when the failed DataNode recovers, its stale copy of the block is automatically discarded. A new replication pipeline is then opened with the remaining nodes, and the write continues until the file is closed
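The recovery steps just described can be sketched as one function: roll unacknowledged packets back to the front of the data queue, drop the failed node, and assign the block a new identifier before resuming on the survivors. Everything here (the `recover` name, the `"_recovered"` suffix standing in for a fresh block ID) is illustrative, not the actual HDFS mechanism.

```python
# Hedged sketch of pipeline-failure recovery during an HDFS write.
from collections import deque

def recover(data_queue: deque, ack_queue: deque, pipeline: list,
            failed_node, block_id: str):
    """Return the surviving pipeline and a new block ID after a node fails."""
    # Roll unacknowledged packets back so downstream nodes still receive them.
    while ack_queue:
        data_queue.appendleft(ack_queue.pop())
    survivors = [n for n in pipeline if n != failed_node]
    new_block_id = block_id + "_recovered"  # placeholder for a fresh block ID
    return survivors, new_block_id

dq = deque([b"p3"])            # packets not yet sent
aq = deque([b"p1", b"p2"])     # sent but unacknowledged when dn2 failed
nodes, blk = recover(dq, aq, ["dn1", "dn2", "dn3"], "dn2", "blk_42")
print(list(dq), nodes, blk)
```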
This article is from the "Lucas" blog; please keep this source: http://4292565.blog.51cto.com/4282565/1672863
HDFS read/write process