The HDFS architecture adopts the master/slave mode: an HDFS cluster is composed of one Namenode and multiple Datanodes.
In an HDFS cluster, there is only one Namenode node. As the central server of the HDFS cluster, the Namenode is mainly responsible for:
1. Managing the namespace of the file system in the HDFS cluster, for example opening the file system, closing the file system, and renaming files or directories. In addition, any request that modifies the file system namespace or its attributes is recorded by the Namenode.
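The idea that every namespace mutation is recorded can be sketched with a toy model (illustrative only; the names `ToyNamenode`, `edit_log`, and the method signatures are assumptions, not Hadoop's actual implementation):

```python
# Toy sketch of Namenode-style namespace management: each mutating
# request is appended to an edit log before the namespace is changed.
class ToyNamenode:
    def __init__(self):
        self.namespace = set()   # file and directory paths
        self.edit_log = []       # record of every mutating request

    def create(self, path):
        self.edit_log.append(("create", path))
        self.namespace.add(path)

    def rename(self, src, dst):
        self.edit_log.append(("rename", src, dst))
        self.namespace.discard(src)
        self.namespace.add(dst)

nn = ToyNamenode()
nn.create("/user/data.txt")
nn.rename("/user/data.txt", "/user/archive.txt")
print(sorted(nn.namespace))   # ['/user/archive.txt']
print(len(nn.edit_log))       # 2
```

In real HDFS the same principle appears as the edit log that the Namenode persists so the namespace can be rebuilt after a restart.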
2. Managing client access to files in the HDFS cluster's file system. Files are actually stored on Datanodes as blocks. The file system client asks the Namenode which blocks make up the file it wants to operate on and on which Datanodes those blocks are stored, and then completes the read or write by interacting with those Datanode nodes directly. In other words, the client can perform file read and write operations only after obtaining from the Namenode the Datanode nodes corresponding to the requested file blocks. The Namenode is therefore also responsible for maintaining the mapping between each file block and the specific Datanode nodes.
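This read path can be sketched as follows (a minimal simulation with assumed names such as `block_map` and `read_file`; it is not the real HDFS client API): the client gets only metadata from the Namenode, then fetches block contents from the Datanodes.

```python
# Toy sketch of an HDFS read: block -> Datanode metadata comes from the
# "Namenode"; the block payloads come from the "Datanodes".
block_map = {                      # mapping maintained by the Namenode
    "/logs/app.log": [("blk_1", "dn1"), ("blk_2", "dn2")],
}
datanodes = {                      # block contents stored on Datanodes
    "dn1": {"blk_1": b"hello "},
    "dn2": {"blk_2": b"world"},
}

def read_file(path):
    data = b""
    for block_id, dn in block_map[path]:   # metadata lookup (Namenode)
        data += datanodes[dn][block_id]    # payload fetch (Datanode)
    return data

print(read_file("/logs/app.log"))  # b'hello world'
```

The design point the sketch illustrates is that the Namenode never sits on the data path; it only answers the metadata query.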
3. Managing the status reports from Datanode nodes, including each node's health report and a report on the data blocks it stores, so that failed data nodes can be handled in time.
In an HDFS cluster there are usually many Datanode nodes; generally, each machine runs one Datanode instance. The tasks of the Datanode process are:
1. Managing the reading and writing of the data stored on its own node. The file system client sends read and write requests for a block to the Datanode that stores it; the Datanode is the service process that deals with the client on that node. In addition, the creation, deletion, and replication of file blocks are carried out under the unified command and scheduling of the Namenode: a Datanode creates, deletes, or copies a block only after receiving the corresponding command from the Namenode during their interaction. The concrete operations on file data are not initiated by the Datanode itself, but are performed by the file system client process after the Datanode grants permission.
2. Reporting status to the Namenode. Each Datanode periodically sends heartbeat signals and block status reports to the Namenode, so that the Namenode has a global view of the Datanode nodes in the cluster and their states. If a Datanode fails, the Namenode schedules other Datanodes to copy the file blocks that were on the failed node, ensuring that the number of replicas of each block returns to the specified replication factor.
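The re-replication step above can be sketched as a small simulation (the names `block_locations`, `on_datanode_failure`, and the placement policy are assumptions for illustration; the real logic in the Namenode is far more involved):

```python
# Toy sketch of re-replication: when a Datanode is declared dead, every
# block that falls below the target replication gets a new copy scheduled
# on a surviving node.
REPLICATION = 2

block_locations = {                 # block -> set of Datanodes holding it
    "blk_1": {"dn1", "dn2"},
    "blk_2": {"dn2", "dn3"},
}
live_nodes = {"dn1", "dn2", "dn3"}

def on_datanode_failure(dead):
    """Remove the dead node and restore the replica count of its blocks."""
    live_nodes.discard(dead)
    for block, nodes in block_locations.items():
        nodes.discard(dead)
        while len(nodes) < REPLICATION:
            # naive placement: pick the first live node without a replica
            target = next(n for n in sorted(live_nodes) if n not in nodes)
            nodes.add(target)       # a surviving replica is copied here

on_datanode_failure("dn2")
print(block_locations)  # both blocks are back at 2 replicas on dn1/dn3
```

In production HDFS the trigger is the absence of heartbeats over a timeout window, and replica placement also considers racks and node load.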
3. Executing pipeline replication of data. When the file system client obtains from the Namenode the list of data blocks to be written, together with the storage locations of each replica (i.e. a list of Datanode nodes), the file block cached on the client is first copied to the first Datanode. The first Datanode does not wait until it has received the entire block before replication to the second Datanode begins: as data arrives, the first Datanode forwards it to the second Datanode, the second to the third, and so on, until the pipeline copy of the file block and all its replicas is complete.
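The pipeline idea can be sketched like this (a simplified simulation; the packet split, the `pipeline_write` helper, and the synchronous loop are assumptions, whereas real HDFS streams packets asynchronously with acknowledgements):

```python
# Toy sketch of pipeline replication: the client streams a block in small
# packets to the first Datanode, and each Datanode forwards every packet
# to the next one as soon as it arrives, instead of waiting for the
# whole block.
def pipeline_write(packets, pipeline, storage):
    for packet in packets:             # the client streams packet by packet
        for dn in pipeline:            # each node stores, then forwards
            storage.setdefault(dn, b"")
            storage[dn] += packet
    return storage

packets = [b"abc", b"def", b"ghi"]     # one block split into packets
storage = pipeline_write(packets, ["dn1", "dn2", "dn3"], {})
print(storage["dn3"])  # b'abcdefghi' -- every replica holds the full block
```

The benefit of pipelining is that the client uploads the block only once; the Datanodes fan the data out among themselves, overlapping transfer to all replicas.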
As described above, three main processes exist in an HDFS cluster: the Namenode process, the Datanode processes, and the file system client process. These three processes communicate with each other through the RPC mechanism implemented by Hadoop, whose IPC model follows the client/server mode. The three processes therefore have the following end-to-end communication and interactions: