1. File Reading
What interaction takes place between the client and HDFS, and how does data flow between the namenode and the datanodes, when the client performs a read? Figure 1 describes the process in detail.
Figure 1: The client reads data from HDFS
1) The client opens the file it wants to read by calling open() on a FileSystem object; in HDFS this object is an instance of DistributedFileSystem.
2) DistributedFileSystem calls the namenode over RPC to determine the locations of the requested file's blocks.
Note that the namenode returns the locations of only the first few blocks of the file, not all of them. For each block it returns the addresses of the datanodes holding a copy, and these datanodes are sorted by their distance from the client according to the cluster topology defined by Hadoop. If the client is itself a datanode holding a copy, it reads the block locally. DistributedFileSystem then returns to the client an FSDataInputStream, an input stream that supports seeking, from which the client reads the data. FSDataInputStream in turn wraps a DFSInputStream object, which manages the I/O with the datanodes and the namenode.
3) With the preceding steps complete, the client calls read() on the input stream.
4) The DFSInputStream object has stored the datanode addresses for the blocks at the start of the file. It connects to the nearest datanode holding the first block, and the client calls read() repeatedly on the stream until the block has been read completely.
5) When the first block is finished, DFSInputStream closes the connection and finds the datanode, closest to the client, that stores the next block. These steps are transparent to the client.
6) The client keeps reading from the stream that DFSInputStream maintains to the current datanode, and DFSInputStream calls the namenode as needed to retrieve the locations of the next batch of blocks. When the whole file has been read, the client calls close() on the input stream.
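The read path in steps 1)-6) can be sketched as follows. This is a minimal Python simulation with made-up names (FakeNamenode, read_file, a toy "rack/node" distance model), not the real DFSInputStream internals: the namenode hands out replica locations sorted by distance to the client, and the stream reads the blocks one after another, connecting to the nearest replica of each.

```python
def distance(a, b):
    # Simplified topology distance over "rack/node" names:
    # 0 = same node, 2 = same rack, 4 = different racks.
    if a == b:
        return 0
    return 2 if a.split("/")[0] == b.split("/")[0] else 4

class FakeNamenode:
    """Toy stand-in for the namenode's block-location service."""
    def __init__(self, blocks):
        self.blocks = blocks                  # block id -> {datanode: data}
    def get_block_locations(self, block_id, client):
        # Replica addresses are returned nearest-to-the-client first.
        replicas = self.blocks[block_id]
        return sorted(replicas, key=lambda dn: distance(dn, client))

def read_file(namenode, block_ids, client):
    data = b""
    for block_id in block_ids:                # one datanode connection per block
        nearest = namenode.get_block_locations(block_id, client)[0]
        data += namenode.blocks[block_id][nearest]  # stream the block, disconnect
    return data

nn = FakeNamenode({
    0: {"rack2/dn3": b"hello ", "rack1/dn1": b"hello "},
    1: {"rack2/dn3": b"world"},
})
print(read_file(nn, [0, 1], client="rack1/dn1"))  # b'hello world'
```

Note how a client that is itself a datanode ("rack1/dn1") ends up reading block 0 from its own node, exactly as described in step 2).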
What if something fails while the client is reading? HDFS handles this as follows: if the client fails to communicate with the datanode it is connected to, it tries the next-closest datanode holding that block, and it remembers the failed node so that it does not connect to it again. The client also verifies the checksum of the data transmitted from each datanode. If it finds a corrupt block, it tries to read that block from another datanode and reports the corruption to the namenode, which updates its metadata for the file accordingly.
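The failover behaviour just described can be sketched like this. It is a simplified simulation, using CRC32 as a stand-in for HDFS's block checksums and hypothetical names throughout: unreachable or corrupt replicas are skipped and remembered, and the next replica is tried.

```python
import zlib

def read_block_with_failover(replicas, expected_crc, dead_nodes):
    """replicas: list of (datanode, data); data=None means unreachable."""
    for dn, data in replicas:
        if dn in dead_nodes or data is None:
            dead_nodes.add(dn)        # remember the faulty node, don't retry it
            continue
        if zlib.crc32(data) != expected_crc:
            dead_nodes.add(dn)        # corrupt replica: would be reported upstream
            continue
        return dn, data               # healthy replica found
    raise IOError("no healthy replica found")

good = b"block-data"
dead = set()
dn, data = read_block_with_failover(
    [("dn1", None), ("dn2", b"corrupt!"), ("dn3", good)],
    zlib.crc32(good), dead)
print(dn, sorted(dead))               # dn3 ['dn1', 'dn2']
```

In real HDFS the "report to the namenode" part triggers re-replication of the corrupt block; here it is reduced to recording the bad node.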
A key point here is that the client obtains the most suitable datanode addresses from the namenode and then connects to the datanodes directly to read the data. The advantage of this design is that HDFS scales to a large number of clients reading in parallel, because the data traffic is spread across all the datanodes. It also reduces the load on the namenode, which only serves block location requests and never serves data itself, so the namenode does not become a system bottleneck as the number of clients grows.
2. File Writing
How are files written in HDFS? See Figure 2.
Figure 2: The client writes data to HDFS
1) The client creates a file by calling create() on the DistributedFileSystem object. DistributedFileSystem makes an RPC call to the namenode to create a new file in the file system's namespace; at this point no datanodes are associated with the file.
2) The namenode performs several checks to ensure that the file does not already exist and that the client has permission to create it. If a check fails, an IOException is thrown; if creation succeeds, the namenode makes a record of the new file and DistributedFileSystem returns an FSDataOutputStream for the client to write data to. As on the read path, FSDataOutputStream wraps a DFSOutputStream object, which handles the client's communication with the datanodes and the namenode.
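Step 2)'s checks can be sketched as below. FakeNamespace and its rules are hypothetical; the point is only the order of events: existence check, permission check, then a new file record with no blocks yet, with failures surfacing as an IOException.

```python
class FakeNamespace:
    """Toy model of the namenode's file-creation checks."""
    def __init__(self):
        self.files = {}               # path -> list of blocks

    def create(self, path, user, allowed_users):
        if path in self.files:
            raise IOError(f"{path} already exists")
        if user not in allowed_users:
            raise IOError(f"{user} lacks create permission")
        self.files[path] = []         # record the new file; no blocks yet
        return True

ns = FakeNamespace()
print(ns.create("/a.txt", "alice", {"alice"}))  # True
try:
    ns.create("/a.txt", "alice", {"alice"})     # duplicate -> rejected
except IOError as e:
    print("rejected:", e)
```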
3), 4) As the client writes data, DFSOutputStream splits it into packets and places them in an internal queue called the "data queue". A DataStreamer consumes this queue and asks the namenode to allocate suitable datanodes to store the replicas of each new block. The returned list of datanodes forms a "pipeline": if the replication factor is 3, there are three datanodes in the pipeline. The DataStreamer streams each packet to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode, and so on until the packet reaches the last datanode in the pipeline.
5) DFSOutputStream also maintains an internal queue of packets that are waiting for acknowledgment from the datanodes in the pipeline, called the "ack queue". A packet is removed from the ack queue only when every datanode in the pipeline has confirmed that it was written successfully.
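Steps 3)-5) can be sketched as one loop. This is a simplified, single-threaded simulation (the real DataStreamer runs concurrently, and acknowledgments travel back up the pipeline): data is split into packets, each packet is streamed through every datanode in the pipeline, and it sits in the ack queue until all of them have it.

```python
from collections import deque

def write_through_pipeline(data, packet_size, pipeline):
    # Split the data into packets: the "data queue".
    data_queue = deque(data[i:i + packet_size]
                       for i in range(0, len(data), packet_size))
    ack_queue = deque()
    stored = {dn: b"" for dn in pipeline}     # what each datanode has written
    while data_queue:
        pkt = data_queue.popleft()
        ack_queue.append(pkt)                 # awaiting acknowledgment
        for dn in pipeline:                   # each datanode forwards to the next
            stored[dn] += pkt
        ack_queue.popleft()                   # all datanodes acked -> drop packet
    return stored

stored = write_through_pipeline(b"abcdefgh", 3, ["dn1", "dn2", "dn3"])
print(stored["dn3"])                          # b'abcdefgh'
```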
Of course, HDFS also handles write failures. When a datanode fails while data is being written, HDFS reacts as follows. First, the pipeline is closed, and any packets in the ack queue are moved to the front of the data queue, so that no data downstream of the failed datanode is lost. The current block on the datanodes that are still working is given a new identity, which is communicated to the namenode, so that if the failed datanode recovers later, the partial block it holds will be deleted. The failed datanode is then removed from the pipeline, and the remainder of the block is written to the two remaining datanodes. Finally, the namenode notices that the block has fewer replicas than the configured replication factor requires and arranges for a further replica to be created on another datanode. Subsequent blocks are then written normally.
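The recovery sequence above can be sketched as a small function, a simplification with illustrative names: unacknowledged packets are pushed back to the front of the data queue so they are re-sent first, and the failed datanode is dropped from the pipeline.

```python
from collections import deque

def recover_pipeline(data_queue, ack_queue, pipeline, failed_dn):
    # Unacknowledged packets must not be lost: re-queue them at the front,
    # preserving their original order.
    while ack_queue:
        data_queue.appendleft(ack_queue.pop())
    # Remove the failed datanode; the survivors keep the pipeline going.
    return [dn for dn in pipeline if dn != failed_dn]

dq = deque([b"p3"])                    # not yet sent
aq = deque([b"p1", b"p2"])             # sent but not yet acknowledged
survivors = recover_pipeline(dq, aq, ["dn1", "dn2", "dn3"], "dn2")
print(list(dq), survivors)             # [b'p1', b'p2', b'p3'] ['dn1', 'dn3']
```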
It is possible, though very unlikely, for multiple datanodes to fail while a block is being written. As long as the number of replicas given by the dfs.replication.min property (1 by default) is written successfully, the write succeeds, and the block is then replicated asynchronously across the cluster until the target set by the dfs.replication property (3 by default) is reached.
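The two thresholds can be made concrete with a couple of predicates. The property names and default values are the ones quoted above; the functions themselves are illustrative, not part of any HDFS API.

```python
DFS_REPLICATION_MIN = 1    # dfs.replication.min default: write succeeds at 1 replica
DFS_REPLICATION = 3        # dfs.replication default: eventual target replica count

def write_succeeded(acked_replicas):
    # The client's write is reported successful at the minimum replica count.
    return acked_replicas >= DFS_REPLICATION_MIN

def needs_async_replication(current_replicas):
    # Between the two thresholds, the cluster copies the block in the background.
    return write_succeeded(current_replicas) and current_replicas < DFS_REPLICATION

print(write_succeeded(1), needs_async_replication(1))  # True True
```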
6) When the client has finished writing data, it calls close() on the stream. This flushes all remaining packets to the datanode pipeline and waits for their acknowledgments before contacting the namenode to signal that the file is complete. The namenode already knows which blocks make up the file (it allocated them at the DataStreamer's request), so it only has to wait for the blocks to be minimally replicated (dfs.replication.min) before returning success.
Reference: Hadoop, 2nd edition