Preface
HDFS is a distributed file system designed for large-scale data storage. It allows files to be shared across multiple hosts on a network, so that multiple users on multiple machines can share files and storage space. Programs and users access files over the network as if they were on a local disk. Even if some nodes in the system go offline, the system as a whole continues to operate without any data loss.
I. HDFS architecture
1, Namenode
The Namenode is the management node for the entire file system. It maintains the file system's directory tree, the metadata of each file and directory, and the list of data blocks belonging to each file, and it receives users' operation requests.
Its on-disk files include:
① fsimage: the metadata image file; it stores a snapshot of the Namenode's in-memory metadata at a certain point in time.
② edits: the operation (edit) log file.
③ fstime: records the time of the last checkpoint.
These files are stored in the Linux file system, in the directory set by the dfs.namenode.name.dir property in hdfs-site.xml.
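For reference, that property is set in hdfs-site.xml like this (the directory path here is an example value, not a required one):

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/var/hadoop/dfs/name</value>
</property>
```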
Viewing the contents of the Namenode's fsimage and edits
Neither file can be viewed directly with an ordinary text editor. Fortunately, Hadoop provides dedicated tools for this, oiv and oev, which are invoked through the hdfs command.
Start the viewer server: bin/hdfs oiv -i <fsimage file>
bash$ bin/hdfs oiv -i fsimage
14/04/07 13:25:14 INFO offlineImageViewer.WebImageViewer: WebImageViewer started.
Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.
View the contents: bin/hdfs dfs -ls -R webhdfs://127.0.0.1:5978/
bash$ bin/hdfs dfs -ls webhdfs://127.0.0.1:5978/
Found 2 items
drwxrwx--- - root supergroup 0 2014-03-26 20:16 webhdfs://127.0.0.1:5978/tmp
drwxr-xr-x - root supergroup 0 2014-03-31 14:08 webhdfs://127.0.0.1:5978/user
Export the contents of fsimage: bin/hdfs oiv -p XML -i tmp/dfs/name/current/fsimage_0000000000000000055 -o fsimage.xml
bash$ bin/hdfs oiv -p XML -i fsimage_0000000000000000055 -o fsimage.xml
View the contents of edits: bin/hdfs oev -i tmp/dfs/name/current/edits_0000000000000000057-0000000000000000186 -o edits.xml
bash$ bin/hdfs oev -i tmp/dfs/name/current/edits_0000000000000000057-0000000000000000186 -o edits.xml
2, Datanode
The Datanode provides the storage service for the actual file data.
File block: the most basic unit of storage.
A file of length size is divided sequentially, starting from offset 0, into fixed-size, numbered pieces; each such piece is a block. The default HDFS block size is 128 MB, so a 256 MB file occupies 256/128 = 2 blocks.
Unlike an ordinary file system, in HDFS a file smaller than one data block does not occupy the whole block's storage space.
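The arithmetic above can be sketched in a few lines of Python (the helper name is invented for illustration):

```python
# Compute how many HDFS blocks a file of a given size occupies,
# and how large the last block actually is on disk.
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (number_of_blocks, size_of_last_block)."""
    if file_size == 0:
        return 0, 0
    full, rest = divmod(file_size, block_size)
    if rest == 0:
        return full, block_size
    return full + 1, rest  # the last block only occupies `rest` bytes

# A 256 MB file occupies exactly 256/128 = 2 full blocks:
print(split_into_blocks(256 * 1024 * 1024))  # (2, 134217728)
# A 1 MB file occupies 1 block that uses only 1 MB of storage:
print(split_into_blocks(1 * 1024 * 1024))    # (1, 1048576)
```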
Replication: each block is stored as multiple replicas, three by default, set by the dfs.replication property in hdfs-site.xml.
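The corresponding hdfs-site.xml fragment, shown with the default value:

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```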
II. Data storage operations
1. Data storage: Block
The default data block size is 128 MB and is configurable. A file smaller than 128 MB is saved as a single block.
Why are data blocks so large?
So that data transfer time dominates seek time (high throughput).
How is a file stored?
It is cut by size into several blocks that are stored on different nodes; by default each block has three replicas.
The HDFS block design records two things: which blocks a file consists of, and which nodes each block is stored on. The benefit is that blocks can easily be distributed across individual nodes. For example:
Block1: node1, node2, node3
Block2: node2, node3, node4
Block3: node4, node5, node6
Block4: node5, node6, node7
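The mapping above can be modeled as a dictionary from block to replica nodes. Here is a toy placement sketch that strides through the node list; the policy is invented for illustration and is not HDFS's actual rack-aware placement:

```python
# Toy block placement: give each block 3 replicas by striding
# through the node list (NOT HDFS's real rack-aware policy).
def place_blocks(num_blocks, nodes, replication=3):
    placement = {}
    for b in range(num_blocks):
        replicas = [nodes[(b + i) % len(nodes)] for i in range(replication)]
        placement[f"Block{b + 1}"] = replicas
    return placement

nodes = ["node1", "node2", "node3", "node4", "node5", "node6", "node7"]
for block, replicas in place_blocks(4, nodes).items():
    print(block, "->", replicas)
# Block1 -> ['node1', 'node2', 'node3']
# Block2 -> ['node2', 'node3', 'node4']
# ...
```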
2. Data storage: Staging
When an HDFS client uploads data to HDFS, it first caches the data locally, and asks the Namenode to allocate a block once the cached data reaches one block in size. The Namenode tells the client the addresses of the Datanodes where the block will reside, and the client then communicates with those Datanodes directly, writing the data into a block file on each Datanode.
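A minimal sketch of this client-side staging, with all class and method names invented for illustration (the real client streams to Datanodes rather than counting blocks):

```python
# Simulate client-side staging: buffer writes locally and "request"
# a block from the namenode only when a full block has accumulated.
class StagingClient:
    def __init__(self, block_size=128 * 1024 * 1024):
        self.block_size = block_size
        self.buffer = bytearray()
        self.blocks_allocated = 0  # stands in for namenode block-allocation RPCs

    def write(self, data):
        self.buffer.extend(data)
        while len(self.buffer) >= self.block_size:
            self._flush_block(bytes(self.buffer[:self.block_size]))
            del self.buffer[:self.block_size]

    def _flush_block(self, block_bytes):
        # In real HDFS: ask the namenode for a block and a datanode list,
        # then stream block_bytes directly to those datanodes.
        self.blocks_allocated += 1

    def close(self):
        if self.buffer:  # flush the final partial block
            self._flush_block(bytes(self.buffer))
            self.buffer.clear()

client = StagingClient(block_size=4)  # tiny block size just for the demo
client.write(b"abcdefgh")             # two full 4-byte blocks
client.write(b"ij")                   # stays in the local cache
client.close()                        # flushes the 2-byte tail block
print(client.blocks_allocated)        # 3
```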
3. Data storage: Read File operation
1. The client first calls the open method of the FileSystem object, which is in fact an instance of DistributedFileSystem.
2. DistributedFileSystem obtains the locations of the file's first blocks via RPC. For the same block, multiple locations are returned according to the replication factor; these locations are sorted by the Hadoop network topology, with those closest to the client first.
3. The first two steps return an FSDataInputStream object, which wraps a DFSInputStream; DFSInputStream manages the traffic to the Datanodes and the Namenode. When the client calls the read method, DFSInputStream finds the Datanode closest to the client and connects to it.
4. Data flows from the Datanode to the client.
5. When the first block has been read completely, the connection to its Datanode is closed and the next block is read. These operations are transparent to the client, which simply sees a continuous stream.
6. When the first batch of blocks has been read, DFSInputStream asks the Namenode for the locations of the next batch and continues reading; once all blocks have been read, all the streams are closed.
If DFSInputStream hits a communication error with a Datanode while reading, it tries the next-nearest Datanode holding that block, and it records which Datanode failed so that the remaining blocks skip it. DFSInputStream also verifies each block's checksum; if a corrupt block is found, it is reported to the Namenode, and DFSInputStream reads a replica of that block from another Datanode.
This design has the client connect directly to the Datanodes to retrieve data, while the Namenode only serves block-location requests, which it answers from data held in memory. By spreading the data traffic across the Datanode cluster, HDFS can sustain concurrent access from a large number of clients.
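The replica selection and failover behaviour described above can be sketched as a toy simulation (the real logic lives inside DFSInputStream; the function and predicate here are invented):

```python
# Toy read path: try each block's replicas in topology order (nearest
# first), remember datanodes that failed, and skip them for later blocks.
def read_file(block_locations, is_healthy):
    """block_locations: one replica list per block, nearest replica first.
    is_healthy: predicate standing in for a successful datanode read."""
    dead_nodes = set()
    used = []
    for replicas in block_locations:
        for node in replicas:          # nearest replica first
            if node in dead_nodes:
                continue               # skip datanodes that already failed
            if is_healthy(node):
                used.append(node)
                break
            dead_nodes.add(node)       # record the failure for later blocks
        else:
            raise IOError("no live replica for block")
    return used

locations = [["node1", "node2", "node3"],
             ["node1", "node4", "node5"]]
# node1 fails: block 1 falls back to node2, block 2 skips node1 outright.
print(read_file(locations, is_healthy=lambda n: n != "node1"))
# ['node2', 'node4']
```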
4. Data storage: Write file Operation
1. The client creates a new file by calling the create method of DistributedFileSystem.
2. DistributedFileSystem calls the Namenode via RPC to create a new file with no blocks associated yet. Before creating it, the Namenode performs various checks, such as whether the file already exists and whether the client has permission to create it. If the checks pass, the Namenode records the new file; otherwise an IO exception is thrown.
3. The first two steps return an FSDataOutputStream object which, as on the read path, wraps a DFSOutputStream. DFSOutputStream coordinates with the Namenode and the Datanodes. The client writes data to DFSOutputStream, which cuts it into small packets and appends them to a data queue.
4. The DataStreamer consumes the data queue. It first asks the Namenode which Datanodes are most suitable for storing the new block (with a replication factor of 3, it finds the 3 most suitable Datanodes) and lines them up in a pipeline. The DataStreamer sends each packet to the first Datanode in the pipeline, the first Datanode forwards it to the second, and so on.
5. DFSOutputStream also maintains a queue called the ack queue, likewise composed of packets, which waits for acknowledgements from the Datanodes; only when every Datanode in the pipeline has acknowledged a packet is it removed from the ack queue.
If a Datanode fails during the write, the following steps are taken:
1) the pipeline is shut down;
2) to prevent packet loss, the packets in the ack queue are moved back onto the data queue;
3) the partially written, incomplete block on the failed Datanode is deleted;
4) the remainder of the block is written to the two remaining healthy Datanodes;
5) the Namenode finds another Datanode on which to create the missing replica of the block. All of this is invisible to the client.
6. After the client finishes writing the data, it calls the close method to close the write stream.
7. The DataStreamer flushes the remaining packets into the pipeline and waits for their acks; after receiving the last ack, it notifies the Namenode that the file is complete.
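The data queue / ack queue bookkeeping described in steps 3-5 can be modeled as a toy simulation (all names are invented; the real logic lives in DFSOutputStream and DataStreamer):

```python
from collections import deque

# Toy write pipeline: packets move data_queue -> (send) -> ack_queue,
# and leave the ack queue only once every datanode has acknowledged.
class Pipeline:
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.data_queue = deque()
        self.ack_queue = deque()

    def enqueue(self, packet):
        self.data_queue.append(packet)

    def send_one(self):
        # DataStreamer: pop a packet from the data queue, push it through
        # the pipeline, and park it on the ack queue until acks arrive.
        packet = self.data_queue.popleft()
        self.ack_queue.append(packet)
        return packet

    def ack(self, packet, acked_by):
        # Remove the packet only when ALL pipeline datanodes acknowledged it.
        if set(acked_by) == set(self.datanodes):
            self.ack_queue.remove(packet)
            return True
        return False

p = Pipeline(["dn1", "dn2", "dn3"])
p.enqueue("pkt-1")
sent = p.send_one()
print(p.ack(sent, acked_by=["dn1", "dn2"]))         # False: ack still pending
print(p.ack(sent, acked_by=["dn1", "dn2", "dn3"]))  # True: packet removed
print(len(p.ack_queue))                              # 0
```

On failure, the recovery in steps 1)-2) above amounts to moving everything still on the ack queue back to the front of the data queue before rebuilding the pipeline.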
Note: after a client write completes, the blocks already written are visible, but the block currently being written is not visible to clients. Only by calling the sync method can the client ensure the file's writes have been persisted; when the client calls the close method, sync is invoked by default. Whether you need to call it manually is a tradeoff between data robustness and throughput, depending on your program's needs.