The connection between NameNode and DataNode

Source: Internet
Author: User

The content of this article is reproduced or adapted from Chao Wu's Meditation; I still quite admire teacher Chao Wu O(∩_∩)O~

The following describes the roles played by the NameNode and the DataNode:

(1) NameNode

The NameNode manages the file directory structure and manages the data nodes. It maintains two sets of data: one is the mapping between files in the directory tree and their data blocks, and the other is the mapping between data blocks and the nodes that hold them. The first set is static and stored on disk, maintained through the fsimage and edits files. The second set is dynamic and is not persisted to disk; each time the cluster starts, this information is rebuilt automatically from what the DataNodes report.
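
To make the two sets of data more concrete, here is a minimal sketch in Python. It is a toy model only, not Hadoop's actual implementation: the class name `NameNodeMetadata`, its methods, and the block and node names are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


class NameNodeMetadata:
    """Toy model of the two mappings (hypothetical, not Hadoop's classes)."""

    def __init__(self):
        # Static mapping: file path -> ordered list of block IDs.
        # In real HDFS this is the part persisted via fsimage and edits.
        self.file_to_blocks = {}
        # Dynamic mapping: block ID -> set of DataNode names holding a replica.
        # Kept in memory only and rebuilt from block reports after a restart.
        self.block_to_nodes = defaultdict(set)

    def add_file(self, path, block_ids):
        self.file_to_blocks[path] = list(block_ids)

    def receive_block_report(self, datanode, block_ids):
        # Forget what this DataNode reported earlier, then record the new list.
        for nodes in self.block_to_nodes.values():
            nodes.discard(datanode)
        for block_id in block_ids:
            self.block_to_nodes[block_id].add(datanode)

    def locate(self, path):
        # Combine both mappings: for each block of the file, list its holders.
        return [
            (block_id, sorted(self.block_to_nodes.get(block_id, set())))
            for block_id in self.file_to_blocks[path]
        ]


meta = NameNodeMetadata()
meta.add_file("/logs/a.txt", ["blk_1", "blk_2"])
meta.receive_block_report("dn1", ["blk_1", "blk_2"])
meta.receive_block_report("dn2", ["blk_1"])
print(meta.locate("/logs/a.txt"))
# [('blk_1', ['dn1', 'dn2']), ('blk_2', ['dn1'])]
```

Note that only `file_to_blocks` would need to survive a restart in this model; `block_to_nodes` starts empty and fills in again as reports arrive.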

(2) DataNode

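DataNodes are where the actual data in HDFS is stored. HDFS has a concept called the block (data block). Assuming a file is 100GB in size, starting at byte position 0, every 64MB (the default block size) is split off as one block, and so on, so the file is divided into many blocks.

(3) Deployment situation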

A dedicated machine in the cluster runs the NameNode, and each of the other machines in the cluster runs one DataNode. (Of course, you can also run a DataNode on the machine running the NameNode, or run multiple DataNodes on one machine.)

(4) Data storage in HDFS

1. Redundant backup:

HDFS stores each file as a series of data blocks, with a default block size of 64MB (configurable; newer Hadoop releases default to 128MB). For fault tolerance, every data block of a file is replicated (the default number of replicas is 3; this is configurable, but I personally think 3 is generally the best choice). When a DataNode starts, it traverses its local file system, generates a list mapping HDFS data blocks to local files, and sends this report to the NameNode; this is the block report (Blockreport). The block report contains a list of all the blocks on that DataNode.
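
The block arithmetic can be checked with a short Python sketch. It assumes the 64MB block size and 3 replicas mentioned above, uses a 100GB file as the example, and treats GB and MB as binary units; the function name `split_into_blocks` is hypothetical and only for illustration.

```python
BLOCK_SIZE = 64 * 1024 ** 2   # 64MB default block size (assumed from the text)
REPLICATION = 3               # default replica count (assumed from the text)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering the file, starting at byte 0."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


file_size = 100 * 1024 ** 3             # 100GB
blocks = split_into_blocks(file_size)
print(len(blocks))                      # 1600
print(len(blocks) * REPLICATION)        # 4800 block replicas in the cluster
print(file_size * REPLICATION // 1024 ** 3)  # 300 (GB of raw storage used)
```

So a single 100GB file becomes 1600 blocks, and with 3 replicas the cluster stores 4800 block copies, which is why the replication setting has a direct cost in disk space.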

2. Replica placement:

HDFS clusters typically run across multiple racks, and machines on different racks communicate through switches. Replica placement is critical because the bandwidth between nodes within a rack is greater than the bandwidth across racks, so placement affects both the reliability and the performance of HDFS. HDFS employs a rack-aware strategy to improve data reliability, availability, and network bandwidth utilization. In general, the placement policy is to store one replica on a node in the local rack, another on a different node in the same rack, and the last on a node in a different rack. This policy reduces data transfer between racks and improves the efficiency of write operations. Rack failures are far less common than node failures, so this policy does not compromise the reliability and availability of the data.
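
The placement rule described above can be sketched in a few lines of Python. This is a simplified illustration, not HDFS's real placement code: the function `choose_replica_nodes`, the `topology` dictionary, and the node names are hypothetical, and the sketch simply takes the first eligible node where a real cluster would also weigh load and free space.

```python
def choose_replica_nodes(writer, topology):
    """Pick three nodes following the policy described in the text.

    topology maps a rack name to the list of DataNode names on that rack.
    Raises StopIteration if the cluster is too small to satisfy the policy.
    """
    local_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    # Replica 1: the writer's own node on the local rack.
    first = writer
    # Replica 2: a different node on the same rack.
    second = next(n for n in topology[local_rack] if n != writer)
    # Replica 3: a node on a different rack.
    third = next(
        n for rack, nodes in topology.items() if rack != local_rack for n in nodes
    )
    return [first, second, third]


topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5"],
}
print(choose_replica_nodes("dn2", topology))
# ['dn2', 'dn1', 'dn4']
```

Only the third replica crosses a rack boundary, so two of the three writes stay on the faster in-rack network while one copy still survives the loss of the whole rack.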

3. Heartbeat Detection:

The NameNode periodically receives heartbeats and block reports from each DataNode in the cluster, and it can validate block mappings and other file system metadata against these reports. Receiving a heartbeat indicates that the DataNode is working properly. If a DataNode cannot send heartbeats, the NameNode marks DataNodes without a recent heartbeat as down and stops sending them any I/O requests.
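
A rough Python sketch of this bookkeeping follows. The class `HeartbeatMonitor` and the 30-second timeout are hypothetical values picked for the example and are not Hadoop's defaults; time is passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Toy tracker of DataNode liveness (hypothetical, not Hadoop's code)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # DataNode name -> time of its latest heartbeat

    def heartbeat(self, datanode, now):
        self.last_seen[datanode] = now

    def live_nodes(self, now):
        # Nodes heard from recently enough may receive I/O requests.
        return sorted(
            dn for dn, seen in self.last_seen.items()
            if now - seen <= self.timeout_seconds
        )

    def dead_nodes(self, now):
        # Nodes silent for longer than the timeout are treated as down.
        return sorted(
            dn for dn, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30)  # illustrative timeout
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=25)
print(monitor.live_nodes(now=40))  # ['dn1']
print(monitor.dead_nodes(now=40))  # ['dn2']
```

Here dn2 stopped reporting after time 0, so by time 40 it has exceeded the timeout and would no longer be chosen for reads or writes.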
