HDFS Read-Write File Flow

Last Update:2018-07-26 Source: Internet

Author: User

1. HDFS write process:

To write data to HDFS, the client first contacts the NameNode to confirm that the file can be written and to obtain the DataNodes that will receive each file block. The client then sends the file, block by block, to the corresponding DataNode, and the DataNode that receives a block is responsible for replicating it to the other DataNodes.
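The write path described above can be sketched as a toy in-memory simulation. This is only an illustration, not the HDFS client API: the `NameNode` and `DataNode` classes, `write_file`, and the round-robin placement are hypothetical names and simplifications (real HDFS placement is rack-aware, and the parent-directory check is omitted here). The 128 MB block size follows the example used in this article, and a replication factor of 3 is assumed.

```python
from dataclasses import dataclass, field

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size, matching this article's example
REPLICATION = 3        # assumed replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> stored size

    def receive(self, block_id, size, downstream):
        """Store the block, forward it down the pipeline, then acknowledge."""
        self.blocks[block_id] = size
        if downstream:
            return downstream[0].receive(block_id, size, downstream[1:])
        return True  # the last node in the pipeline acknowledges


@dataclass
class NameNode:
    datanodes: list
    files: dict = field(default_factory=dict)  # path -> [(block_id, node names)]

    def create(self, path):
        """Refuse the upload if the target file already exists."""
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def allocate_block(self, path):
        """Choose DataNodes for the next block (round-robin, a simplification)."""
        index = len(self.files[path])
        block_id = f"{path}#blk_{index}"
        count = len(self.datanodes)
        chosen = [self.datanodes[(index + i) % count] for i in range(REPLICATION)]
        self.files[path].append((block_id, [dn.name for dn in chosen]))
        return block_id, chosen


def write_file(namenode, path, file_size):
    """Client role: ask the NameNode, then push each block through a pipeline."""
    namenode.create(path)
    for size in split_into_blocks(file_size):
        block_id, pipeline = namenode.allocate_block(path)
        acked = pipeline[0].receive(block_id, size, pipeline[1:])
        if not acked:
            raise IOError(f"pipeline failed for {block_id}")
```

For instance, `write_file(NameNode([DataNode(f"dn{i}") for i in range(4)]), "/data/demo.bin", 300 * MB)` would request three blocks from the toy NameNode and store each of them on three of the four toy DataNodes.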

As shown in the figure:

Detailed write steps:

1. The client contacts the NameNode and requests to upload the file. The NameNode checks whether the target file already exists and whether the parent directory exists.
2. The NameNode replies whether the upload is allowed.
3. The client splits the file into blocks. With a block size of 128 MB, for example, a 300 MB file is cut into 3 blocks: 128 MB, 128 MB, and 44 MB. The client then asks the NameNode which DataNode servers the first block should be transferred to.
4. The NameNode returns the DataNode servers.
5. The client asks the first DataNode to accept the upload (essentially an RPC call that establishes a pipeline). On receiving the request, the first DataNode calls the second DataNode, which in turn calls the third. Once the entire pipeline is established, the result is returned to the client.
6. The client begins uploading the first block to the first DataNode (first reading the data from disk into a local in-memory cache), one packet at a time (a packet is 64 KB). When a DataNode writes the data, verification is not done once per packet but per chunk (512 bytes). The first DataNode passes each packet it receives to the second, and the second passes it to the third. Each packet that is sent is placed in a reply queue until it is acknowledged.
7. When a block transfer is complete, the client asks the NameNode which DataNode servers should receive the second block, and the same steps repeat for the remaining blocks.

2. HDFS read process:

The client sends the path of the file to be read to the NameNode. The NameNode looks up the meta information of the file (mainly the location information of its blocks) and returns it to the client. Using the information returned by the NameNode, the client locates the DataNodes that hold each block, fetches the blocks one by one, and appends the data on the client side to obtain the whole file.
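The read path described above can be sketched in a few lines. This is a toy model rather than the HDFS client API: `locate_blocks`, `pick_replica`, `read_file`, and the `metadata`, `storage`, and `distance` dictionaries are hypothetical stand-ins for the NameNode lookup, the replica choice, and the DataNodes' contents.

```python
def locate_blocks(metadata, path):
    """NameNode role: return the ordered block list for a path.

    Each entry is (block_id, [names of DataNodes holding a replica]).
    """
    return metadata[path]


def pick_replica(replicas, distance):
    """Choose the closest replica.

    Ties are resolved by list order here; the article describes a random
    choice among equally near nodes.
    """
    return min(replicas, key=lambda name: distance.get(name, float("inf")))


def read_file(metadata, storage, distance, path):
    """Client role: fetch each block from one DataNode and append in order."""
    data = bytearray()
    for block_id, replicas in locate_blocks(metadata, path):
        node = pick_replica(replicas, distance)
        data += storage[node][block_id]
    return bytes(data)


# Toy cluster: two blocks, each replicated on two DataNodes.
metadata = {"/demo.txt": [("blk_0", ["dn1", "dn2"]), ("blk_1", ["dn2", "dn3"])]}
storage = {
    "dn1": {"blk_0": b"hello "},
    "dn2": {"blk_0": b"hello ", "blk_1": b"world"},
    "dn3": {"blk_1": b"world"},
}
distance = {"dn1": 2, "dn2": 0, "dn3": 4}  # smaller means nearer to the client
content = read_file(metadata, storage, distance, "/demo.txt")
```

The client never receives file data from the NameNode in this sketch; it only receives block locations and then talks to the DataNodes directly, which is the division of work the paragraph above describes.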

As shown in the figure:

Detailed read steps:

1. The client contacts the NameNode to query the metadata and find the DataNode servers on which the file's blocks are located.
2. The client selects a DataNode server (the nearest one; if several are equally near, one at random) and requests a socket stream.
3. The DataNode starts sending data (it reads the data from disk into the stream, and verification is done per packet).
4. The client receives the data packet by packet, first caching it locally and then writing it to the target file. Each later block is appended to the block before it, and together they form the final required file.
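The data verification mentioned in both the write and read steps can be illustrated with per-chunk checksums. The sketch below is an assumption-laden illustration: it uses `zlib.crc32` as a stand-in checksum (the checksum type configured in a real HDFS cluster may differ), and the helper names `chunk_checksums` and `corrupt_chunks` are hypothetical. The 64 KB packet and 512-byte chunk sizes are the ones given in the write steps, and the arithmetic ignores packet header and checksum bytes.

```python
import zlib

CHUNK_SIZE = 512         # bytes covered by one checksum, as stated in the write steps
PACKET_SIZE = 64 * 1024  # bytes per packet, as stated in the write steps

# Ignoring header and checksum overhead, one packet spans this many chunks.
CHUNKS_PER_PACKET = PACKET_SIZE // CHUNK_SIZE


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Compute one CRC-32 value per chunk of data."""
    return [
        zlib.crc32(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def corrupt_chunks(data, expected, chunk_size=CHUNK_SIZE):
    """Return the indices of chunks whose checksum differs from expected."""
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# The sender computes checksums for one packet's worth of data.
packet = bytes(range(256)) * 256  # 65536 bytes
sums = chunk_checksums(packet)

# The receiver recomputes them; flipping one byte in transit is detected.
damaged = bytearray(packet)
damaged[1000] ^= 0xFF
bad = corrupt_chunks(bytes(damaged), sums)
```

Checking per chunk rather than per packet means a single flipped byte pinpoints one 512-byte region (here, the chunk containing byte offset 1000) instead of invalidating the whole 64 KB packet.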

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
