1. HDFS Write process:
To write data to HDFS, the client first contacts the Namenode to confirm that the file can be written and to obtain the Datanodes that will receive each file block. The client then sends the file block by block to the corresponding Datanodes, and the Datanode that receives a block is responsible for copying it to the other Datanodes.
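The flow described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: `Datanode`, `Namenode`, `write_file`, and their methods are hypothetical names invented for this sketch, not HDFS classes or APIs; Datanodes are chosen at random, and the block size is a few bytes rather than a real HDFS block size.

```python
import random


class Datanode:
    """Toy Datanode that keeps blocks in memory (hypothetical, not an HDFS class)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def receive(self, block_id, data, downstream):
        # Store the block locally, then forward it to the next Datanode in the pipeline.
        self.blocks[block_id] = data
        if downstream:
            downstream[0].receive(block_id, data, downstream[1:])


class Namenode:
    """Toy Namenode that only tracks file names and block placement."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.files = {}  # path -> list of (block_id, [Datanode names])

    def create(self, path):
        # Refuse the write if the target file already exists.
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def allocate_block(self, path):
        # Pick the Datanodes that will hold the next block (random choice is an assumption).
        block_id = f"{path}#blk{len(self.files[path])}"
        targets = random.sample(self.datanodes, self.replication)
        self.files[path].append((block_id, [d.name for d in targets]))
        return block_id, targets


def write_file(namenode, path, data, block_size):
    """Client side: ask the Namenode first, then send the blocks one after another."""
    namenode.create(path)
    for start in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(path)
        # The client sends each block to the first Datanode only;
        # that Datanode passes the copy on to the others.
        targets[0].receive(block_id, data[start:start + block_size], targets[1:])


# Usage: four Datanodes, a 300-byte "file", and a 128-byte block size.
nodes = [Datanode(f"dn{i}") for i in range(4)]
namenode = Namenode(nodes)
write_file(namenode, "/demo.txt", b"x" * 300, block_size=128)
```

The client never talks to the second or third Datanode directly in this sketch, which mirrors the point that replication is the receiving Datanode's job.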
As shown in the figure:
Detailed write steps:
1. The client contacts the Namenode and requests to upload a file. The Namenode checks whether the target file already exists and whether the parent directory exists.
2. The Namenode replies whether the upload can proceed.
3. The client splits the file into blocks. For example, with a block size of 128 MB, a 300 MB file is cut into 3 blocks: 128 MB, 128 MB, and 44 MB. The client then asks the Namenode which Datanode servers the first block should be transferred to.
4. The Namenode returns the Datanode servers.
5. The client asks one of the Datanodes to accept the upload (essentially an RPC call that establishes a pipeline). On receiving the request, the first Datanode calls the second Datanode, and the second calls the third. Once the entire pipeline is established, the result is returned to the client.
6. The client begins uploading the first block to the first Datanode (it first reads the data from disk into a local memory cache), sending it in packets of 64 KB. Data validation on write is not done once per packet but per chunk (512 bytes). When the first Datanode receives a packet it passes it to the second, and the second passes it to the third. For each packet it passes on, the first Datanode places it in a reply queue to wait for an acknowledgment.
7. When the transfer of one block is complete, the client again asks the Namenode for the servers to upload the second block to.
2. HDFS Read process:
The client sends the path of the file to be read to the Namenode. The Namenode looks up the file's metadata (mainly the location of each block) and returns it to the client. Using the returned information, the client locates the Datanodes that hold the file's blocks, fetches the blocks one by one, and appends their data locally to obtain the whole file.
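A minimal sketch of this lookup-then-fetch flow follows. Everything in it is an assumption made for illustration: the two dictionaries stand in for Namenode metadata and Datanode storage, and `get_block_locations`, `read_block`, and `read_file` are hypothetical helpers, not HDFS APIs.

```python
# Hypothetical Namenode metadata: file path -> ordered list of
# (block id, names of the Datanodes holding a replica).
NAMENODE_METADATA = {
    "/demo.txt": [
        ("blk_0", ["dn1", "dn2"]),
        ("blk_1", ["dn2", "dn3"]),
    ],
}

# Hypothetical Datanode storage: Datanode name -> {block id: block bytes}.
DATANODE_STORAGE = {
    "dn1": {"blk_0": b"hello, "},
    "dn2": {"blk_0": b"hello, ", "blk_1": b"hdfs"},
    "dn3": {"blk_1": b"hdfs"},
}


def get_block_locations(path):
    """Namenode side: return the file's blocks in file order."""
    return NAMENODE_METADATA[path]


def read_block(datanode, block_id):
    """Datanode side: return the bytes of one stored block."""
    return DATANODE_STORAGE[datanode][block_id]


def read_file(path):
    """Client side: look up the blocks, fetch each one, and append in order."""
    content = bytearray()
    for block_id, replicas in get_block_locations(path):
        # Every replica holds the same bytes; this sketch simply takes the first one.
        content += read_block(replicas[0], block_id)
    return bytes(content)
```

Note that the Namenode only hands out locations here; the block bytes come from the Datanodes.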
As shown in the figure:
Detailed read steps:
1. The client communicates with the Namenode to query the metadata (which Datanode nodes hold each block) and finds the Datanode servers where the file's blocks are located.
2. The client selects a Datanode server (nearest first, then random) and requests a socket stream.
3. The Datanode starts sending data (it reads the data from disk into the stream and verifies it packet by packet).
4. The client receives the data packet by packet, caches it locally first, and then writes it to the target file. Each later block is appended to the block before it, and together they form the final file.
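Steps 2 to 4 can be sketched roughly as follows, assuming the 64 KB packet and 512-byte chunk sizes quoted in the write steps also apply here. The helpers `choose_datanode`, `make_packets`, and `receive_block` and the `distance` mapping are hypothetical, and `zlib.crc32` is only a stand-in checksum; the checksum algorithm HDFS actually uses is not stated in the text above.

```python
import random
import zlib

PACKET_SIZE = 64 * 1024  # 64 KB per packet, as quoted in the write steps
CHUNK_SIZE = 512         # checksum granularity in bytes


def choose_datanode(replicas, distance):
    """Pick the nearest replica and break ties at random.

    `distance` is a hypothetical dict of Datanode name -> network distance.
    """
    nearest = min(distance[name] for name in replicas)
    return random.choice([name for name in replicas if distance[name] == nearest])


def make_packets(block):
    """Datanode side: yield (data, checksums) packets with one checksum per chunk."""
    for start in range(0, len(block), PACKET_SIZE):
        data = block[start:start + PACKET_SIZE]
        checksums = [zlib.crc32(data[i:i + CHUNK_SIZE])
                     for i in range(0, len(data), CHUNK_SIZE)]
        yield data, checksums


def receive_block(packets):
    """Client side: verify every chunk of a packet, then append the packet."""
    received = bytearray()
    for data, checksums in packets:
        for index, expected in enumerate(checksums):
            chunk = data[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]
            if zlib.crc32(chunk) != expected:
                raise IOError(f"checksum mismatch in chunk {index}")
        received += data
    return bytes(received)
```

With these sizes a full packet carries 64 * 1024 / 512 = 128 chunk checksums, so a corrupted chunk can be pinpointed without rejecting the whole packet blindly.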