HDFS reads:
The client opens the file it wants to read by calling the open() method on a FileSystem object; in HDFS this object is an instance of DistributedFileSystem. DistributedFileSystem communicates with the NameNode over RPC to determine the locations of the file's blocks. For each block, the NameNode returns the addresses of the DataNodes holding a replica, and these DataNodes are sorted by their distance from the client according to the cluster topology defined by Hadoop. If the client is itself a DataNode holding a replica, it reads the block locally. DistributedFileSystem returns to the client an FSDataInputStream, an input stream that supports seeking; its subclass DFSDataInputStream manages the I/O between the client, the NameNode, and the DataNodes. Once these steps are complete, the client calls the read() method on the stream. DFSDataInputStream, which has stored the DataNode addresses for the first blocks of the file, connects to the closest DataNode holding the first block and calls read() repeatedly until all the data on that block has been read.
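The replica-sorting step above can be sketched in a few lines. This is a minimal illustration, not Hadoop's actual NetworkTopology implementation: the node/rack representation and the distance values (0 for the same node, 2 for the same rack, 4 for a different rack) are simplified assumptions, though they match the conventional Hadoop topology costs for a single data center.

```java
import java.util.*;

class ReplicaSort {
    // Hypothetical topology distance: 0 = same node, 2 = same rack, 4 = off-rack.
    static int distance(String clientNode, String clientRack, String node, String rack) {
        if (clientNode.equals(node)) return 0;
        if (clientRack.equals(rack)) return 2;
        return 4;
    }

    // Sort replica locations (each a {node, rack} pair) by distance from the
    // client, so the nearest replica is tried first.
    static List<String[]> sortReplicas(String clientNode, String clientRack,
                                       List<String[]> replicas) {
        replicas.sort(Comparator.comparingInt(
                (String[] r) -> distance(clientNode, clientRack, r[0], r[1])));
        return replicas;
    }
}
```

With this ordering, a client that is itself one of the DataNodes holding a replica (distance 0) ends up reading locally, exactly as described above.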
When the end of a block is reached, DFSDataInputStream closes the connection and finds the closest DataNode holding the next block of the file. The client reads the blocks in order, seeing only one continuous data stream from the moment the stream was opened; as it reads, DFSDataInputStream also calls the NameNode to retrieve the locations of the next batch of blocks as needed. When all blocks have been read, the client calls close() on the FSDataInputStream. HDFS also accounts for read failures, as follows: if the connection to a DataNode fails, the client reads the block from the next closest DataNode holding a replica and records the failed node, so it will not try to connect to it again for subsequent blocks. The client also verifies the data received from each DataNode against its checksum; if it finds a corrupt block, it reports this to the NameNode before reading a replica from another DataNode, and the NameNode updates its records for the file accordingly.
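The failure handling described above, verify the checksum, skip corrupt or dead replicas, and remember bad nodes, can be sketched as follows. This is a toy model, not the DFSClient code: real HDFS uses CRC32-based block checksums, but the replica map, the deadNodes set, and the readBlock helper here are hypothetical names for illustration.

```java
import java.io.IOException;
import java.util.*;
import java.util.zip.CRC32;

class ChecksumRead {
    // Compute a CRC32 checksum over a block's data.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Try each replica in order of preference; skip nodes already known to be
    // bad, and record any node whose data fails the checksum so it is not
    // retried for later blocks.
    static byte[] readBlock(Map<String, byte[]> replicas, long expected,
                            Set<String> deadNodes) throws IOException {
        for (Map.Entry<String, byte[]> e : replicas.entrySet()) {
            if (deadNodes.contains(e.getKey())) continue;
            if (checksum(e.getValue()) == expected) return e.getValue();
            deadNodes.add(e.getKey()); // corrupt replica: report and avoid it
        }
        throw new IOException("no valid replica found for block");
    }
}
```

In real HDFS the corrupt replica would also be reported to the NameNode, which schedules re-replication from a healthy copy.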
Note here: when a client connects to the NameNode, the NameNode returns only the addresses of the DataNodes holding the requested blocks, never the block data itself. The benefit is that HDFS can scale to a very large number of clients, because the data traffic is distributed across all the DataNodes in the cluster. The NameNode only answers block-location requests, which keeps its load low and prevents it from becoming a bottleneck as the number of clients grows.
HDFS writes: The client calls the create() method on the DistributedFileSystem object, which contacts the NameNode over RPC to create a new file in the filesystem namespace. At this point the file has no blocks associated with any DataNode. The NameNode runs several checks to verify that the new file does not already exist in the filesystem and that the client has permission to create it; the file is created only after all checks pass. Failure throws an IOException; success returns an FSDataOutputStream, an output stream that supports seeking, which wraps a DFSOutputStream object that the client writes its data to. The client can use it to handle the communication with the NameNode and the DataNodes.
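The validation performed by the NameNode during create() can be sketched with a tiny in-memory namespace. All types here (CreateCheck, the namespace and writers sets) are hypothetical stand-ins, not Hadoop classes; only the behavior, reject an existing path or an unauthorized user with an IOException, and otherwise add a block-less file entry, mirrors the text above.

```java
import java.io.IOException;
import java.util.*;

class CreateCheck {
    final Set<String> namespace = new HashSet<>(); // existing file paths
    final Set<String> writers = new HashSet<>();   // users allowed to create files

    // Sketch of the NameNode-side checks for create(): the file must not
    // already exist and the caller must have permission; only then is a new
    // entry (with no blocks yet) added to the namespace.
    void create(String path, String user) throws IOException {
        if (namespace.contains(path))
            throw new IOException("File exists: " + path);
        if (!writers.contains(user))
            throw new IOException("Permission denied for user: " + user);
        namespace.add(path); // new file entry, no blocks associated yet
    }
}
```

Blocks are allocated later, as the client actually writes data, which is why a freshly created file has no DataNode associations.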
DFSOutputStream splits the file into packets and places them on an internal data queue; the DataStreamer consumes this queue and asks the NameNode to allocate suitable DataNodes for these new packets' blocks.
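The packet-splitting step can be sketched as below. This is a simplified model: the 64 KB packet size is an assumption for illustration (real DFSOutputStream packets carry headers and per-chunk checksums as well), and toPackets is a hypothetical helper, not a Hadoop method.

```java
import java.util.*;

class PacketQueue {
    static final int PACKET_SIZE = 64 * 1024; // assumed 64 KB packet payload

    // Split a write buffer into fixed-size packets and enqueue them, as
    // DFSOutputStream does before the DataStreamer thread sends them on
    // to the DataNodes allocated by the NameNode.
    static Deque<byte[]> toPackets(byte[] data) {
        Deque<byte[]> queue = new ArrayDeque<>();
        for (int off = 0; off < data.length; off += PACKET_SIZE) {
            int len = Math.min(PACKET_SIZE, data.length - off);
            queue.add(Arrays.copyOfRange(data, off, off + len));
        }
        return queue;
    }
}
```

Decoupling the writer (which fills the queue) from the DataStreamer (which drains it) lets the client keep writing while earlier packets are still in flight through the replication pipeline.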
Read and write data streams in HDFS