File Reading Process
The client opens the file by calling the open() function of FileSystem.
DistributedFileSystem calls the metadata node (the NameNode) over RPC to obtain the block information of the file.
For each data block, the metadata node returns the addresses of the data nodes (DataNodes) that hold a replica of that block.
DistributedFileSystem returns an FSDataInputStream to the client for reading the data.
The client calls the stream's read() function to start reading.
DFSInputStream connects to the nearest data node that holds the first block of the file.
Data is streamed from that data node back to the client.
When the block has been read completely, DFSInputStream closes the connection to that data node and then connects to the nearest data node that holds the next block of the file.
When the client finishes reading, it calls the close() function of FSDataInputStream.
During reading, if the client fails to communicate with a data node, it tries the next data node that holds a replica of the block.
Failed data nodes are recorded and are not contacted again.
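The failover behavior described above can be sketched in Python. This is a toy model, not Hadoop code; the class, the fetch callback, and the node names are invented for illustration:

```python
class BlockReader:
    """Toy model of DFSInputStream's replica failover (illustrative only)."""

    def __init__(self, block_locations):
        # One replica list per block, nearest replica first,
        # e.g. [["dn1", "dn2", "dn3"], ["dn2", "dn3", "dn4"]].
        self.block_locations = block_locations
        self.dead_nodes = set()  # failed data nodes are remembered

    def read_block(self, index, fetch):
        """Try each replica in turn; fetch(node) raises IOError on failure."""
        for node in self.block_locations[index]:
            if node in self.dead_nodes:
                continue  # never contact a node that already failed
            try:
                return fetch(node)
            except IOError:
                self.dead_nodes.add(node)  # record the failure
        raise IOError("all replicas failed for block %d" % index)

    def read_all(self, fetch):
        # Read blocks in order, concatenating their contents.
        return b"".join(self.read_block(i, fetch)
                        for i in range(len(self.block_locations)))
```

With this model, a read that hits a dead node transparently falls over to the next replica, and that node is skipped for every later block.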
File Writing Process
The client calls create() to create a file.
DistributedFileSystem calls the metadata node over RPC to create a new file in the file system namespace.
The metadata node first checks that the file does not already exist and that the client has permission to create it, and then creates the new file.
DistributedFileSystem returns a DFSOutputStream, which the client uses to write data.
As the client writes, DFSOutputStream splits the data into packets and appends them to a data queue.
The data queue is consumed by the DataStreamer, which asks the metadata node to allocate data nodes for storing the blocks (each block is replicated three times by default). The allocated data nodes form a pipeline.
The DataStreamer writes each packet to the first data node in the pipeline; the first data node forwards it to the second, and the second forwards it to the third.
DFSOutputStream also keeps an ack queue of packets that have been sent, waiting for the data nodes in the pipeline to acknowledge that the data was written successfully.
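The interplay of the data queue, the pipeline, and the ack queue can be sketched in Python. This is a simplified model, not the real HDFS wire protocol; the function names, node names, and packet size are made up:

```python
from collections import deque

def pipeline_write(packet, pipeline, store):
    # Each data node persists the packet and forwards it downstream;
    # the ack flows back only after the last node has written it.
    for node in pipeline:
        store.setdefault(node, []).append(packet)
    return True  # ack from the whole pipeline

def write_file(data, packet_size, pipeline):
    store = {}  # node name -> list of packets written on that node
    data_queue = deque(data[i:i + packet_size]
                       for i in range(0, len(data), packet_size))
    ack_queue = deque()
    while data_queue:
        pkt = data_queue.popleft()
        ack_queue.append(pkt)     # keep the packet until the pipeline acks it
        if pipeline_write(pkt, pipeline, store):
            ack_queue.popleft()   # ack received: safe to drop
    return store
```

The point of the ack queue is visible here: a packet leaves it only after every node in the pipeline has confirmed the write, which is what makes the failure recovery below possible.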
If a data node fails during the write:
The pipeline is closed, and the packets in the ack queue are put back at the front of the data queue.
The current block on the data nodes that did write it successfully is given a new identity by the metadata node, so that when the failed node recovers, it can notice that its copy of the block is stale and delete it.
The failed data node is removed from the pipeline, and the rest of the block's data is written to the two remaining data nodes in the pipeline.
The metadata node is notified that the block is under-replicated; a third replica will be created later.
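The recovery steps above can be sketched as follows. Again a toy model with invented names; real HDFS recovery (block identity bumping, lease handling) is considerably more involved:

```python
from collections import deque

def handle_datanode_failure(data_queue, ack_queue, pipeline, failed_node):
    # Unacknowledged packets go back to the FRONT of the data queue,
    # oldest first, so nothing sent-but-unconfirmed is lost.
    while ack_queue:
        data_queue.appendleft(ack_queue.pop())
    # The failed node is dropped; the surviving nodes keep the pipeline.
    # (The metadata node later schedules a new replica elsewhere.)
    pipeline = [n for n in pipeline if n != failed_node]
    return data_queue, pipeline
```

Re-queuing from the ack queue is why no acknowledged-but-unpersisted data is lost: every packet the client has not seen confirmed gets resent through the shortened pipeline.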
When the client finishes writing, it calls the stream's close() function. This flushes all remaining packets to the data nodes in the pipeline, waits for the ack queue to report success, and finally notifies the metadata node that the write is complete.
From http://www.cnblogs.com/forfuture1978/archive/2010/03/14/1685351.html