Hadoop HDFS File Read/Write Process


I. HDFS Read Process

1.1 Reading a File with the HDFS API

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("demo.txt");
    FSDataInputStream inStream = fs.open(file);   // asks the metadata node for block locations
    String data = inStream.readUTF();             // reads from the nearest data node
    System.out.println(data);
    inStream.close();
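Note that readUTF() only recovers data that was written with writeUTF(), which prefixes the string with a two-byte length (as in the write example in section 2.1). For an arbitrary file, a byte-copy loop is the usual pattern. The following is a minimal sketch, assuming demo.txt exists on the default file system:

    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsCat {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            InputStream in = null;
            try {
                // open() returns an FSDataInputStream backed by DFSInputStream.
                in = fs.open(new Path("demo.txt"));
                // Copy the stream to stdout in 4 KB chunks without auto-closing the streams.
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }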

1.2 HDFS File Reading Process

1. The client initializes FileSystem and opens the file with FileSystem's open() function.
2. FileSystem calls the metadata node (NameNode) over RPC to obtain the file's block information. For each block, the metadata node returns the addresses of the data nodes (DataNodes) that hold a copy of it (see the sketch after this list).
3. FileSystem returns an FSDataInputStream to the client, and the client calls the stream's read() function to start reading.
4. DFSInputStream connects to the nearest data node holding the first block of the file, and data is read from that node to the client.
5. When the block has been read completely, DFSInputStream closes the connection to that data node and connects to the nearest data node holding the next block of the file.
6. When the client finishes reading, it calls the close() function of FSDataInputStream.
7. If the client fails to communicate with a data node while reading, it tries the next data node that holds the block.
8. Failed data nodes are recorded and are not contacted again for later blocks. (Note: these step numbers do not correspond one-to-one with the numbers in the diagram.)
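The block-to-data-node mapping that the metadata node returns in step 2 can be inspected directly from the client through the FileSystem API. A minimal sketch, again assuming demo.txt already exists:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            FileStatus status = fs.getFileStatus(new Path("demo.txt"));
            // Ask the metadata node which data nodes hold each block of the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                        block.getOffset(), block.getLength(),
                        String.join(",", block.getHosts()));
            }
        }
    }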

1.3 HDFS File Read Process Diagram

II. HDFS Write Process

2.1 Writing a File with the HDFS API

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("demo.txt");
    FSDataOutputStream outStream = fs.create(file);   // asks the metadata node to create the file
    outStream.writeUTF("Welcome to HDFS Java API!!!");
    outStream.close();
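The create() call above uses the cluster defaults; the FileSystem API also lets the client request a specific replication factor per file, which ties into the default of three replicas mentioned in the write steps below. A minimal sketch (the file name and the factor 2 are only illustrative, and assume at least two data nodes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCreateWithReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Request 2 replicas for this file instead of the default (normally 3).
            FSDataOutputStream out = fs.create(new Path("demo2.txt"), (short) 2);
            out.writeUTF("Welcome to HDFS Java API!!!");
            out.close();
            // Confirm the replication factor recorded by the metadata node.
            System.out.println(fs.getFileStatus(new Path("demo2.txt")).getReplication());
        }
    }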

 

2.2 HDFS File Writing Process

1. The client initializes FileSystem and calls create() to create a file.
2. FileSystem calls the metadata node (NameNode) over RPC to create a new file in the file system namespace. The metadata node first checks that the file does not already exist and that the client has permission to create it, and then creates the file.
3. FileSystem returns an FSDataOutputStream (wrapping a DFSOutputStream) to the client, and the client begins writing data.
4. DFSOutputStream splits the data into blocks and places them on a data queue. The data queue is consumed by the DataStreamer, which asks the metadata node to allocate data nodes to store the blocks (three replicas by default). The allocated data nodes form a pipeline: the DataStreamer writes each block to the first data node in the pipeline, the first data node forwards it to the second, and the second forwards it to the third.
5. DFSOutputStream also keeps an ack queue for the blocks it has sent, and waits for the data nodes in the pipeline to acknowledge that the data has been written successfully (see the sketch after this list).
6. When the client finishes writing, it calls the stream's close() function. This flushes all remaining blocks to the data nodes in the pipeline, waits until the ack queue reports success, and finally notifies the metadata node that the write is complete.
7. If a data node fails while being written to, the pipeline is closed and the blocks still in the ack queue are moved back to the front of the data queue. The current block is given a new identifier on the data nodes that already hold it, so that when the failed node recovers it will notice the block is stale and delete it. The failed data node is removed from the pipeline, and the remaining data is written to the other two data nodes in the pipeline. The metadata node notices that the block is under-replicated and arranges for a third replica to be created later.
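The ack-queue behavior in steps 5 and 6 is what hflush() exposes to applications: it blocks until every data node in the current pipeline has acknowledged the buffered data. A minimal sketch (the file name and payload are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsFlushExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("flush-demo.txt"));
            out.writeBytes("first record\n");
            // Block until all data nodes in the write pipeline have acknowledged
            // the data written so far; it is then visible to new readers.
            out.hflush();
            out.writeBytes("second record\n");
            // close() flushes the remaining data, waits for the ack queue to drain,
            // and tells the metadata node the file is complete.
            out.close();
        }
    }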

2.3 HDFS File Writing Process Diagram
