Data management and fault tolerance in HDFS
1. Placement of data blocks
Each data block is stored as 3 copies (replicas), like block A in the figure above; 3 is the HDFS default replication factor. HDFS runs on cheap commodity machines, so any node can fail at any time, including while it is transferring data. Keeping 3 replicas means that losing a node does not lose the data: the block can still be read from the remaining replicas, so the cluster tolerates hardware faults.
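To make the fault-tolerance claim concrete, here is a small Python sketch that checks how many simultaneous node failures a block survives. It assumes a toy cluster of five hypothetical DataNodes and treats a block as readable while at least one replica sits on a live node; the node names and helper functions are illustrative only, not Hadoop APIs.

```python
from itertools import combinations

# Hypothetical toy cluster: names are illustrative, not from a real deployment.
ALL_NODES = {"dn1", "dn2", "dn3", "dn4", "dn5"}
REPLICAS = {"dn1", "dn2", "dn4"}  # the 3 DataNodes holding one block


def block_readable(replica_nodes: set[str], failed: set[str]) -> bool:
    """A block can still be read if at least one replica is on a live node."""
    return bool(replica_nodes - failed)


def max_tolerated_failures(replica_nodes: set[str], nodes: set[str]) -> int:
    """Largest k such that EVERY choice of k failed nodes leaves the block readable."""
    k = 0
    while k < len(nodes):
        next_k = k + 1
        if all(block_readable(replica_nodes, set(f)) for f in combinations(nodes, next_k)):
            k = next_k
        else:
            break
    return k


if __name__ == "__main__":
    print(max_tolerated_failures(REPLICAS, ALL_NODES))  # 2
```

With 3 replicas any 2 nodes can fail at once; only losing all three replica holders makes the block unreadable.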
The 3 replicas are placed on two racks. For example, rack 1 in the figure above holds 2 replicas and rack 2 holds 1:
(1) If the block on DataNode1 becomes unavailable, the data can still be read from DataNode2 or DataNode3 on rack 1, or from rack 2;
(2) If the whole of rack 1 becomes unavailable, the data can still be read from rack 2.
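The rack-aware placement described above can be sketched as follows. This is a simplified model of the default HDFS policy for 3 replicas, assuming the writer runs on a DataNode: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in the second replica's rack. The topology, node names, and functions are hypothetical, and the real policy also weighs disk space and load. Here the writer's rack ends up with 1 copy and the remote rack with 2, the mirror image of the example above; either way, losing one node or one whole rack still leaves a readable copy.

```python
import random

# Hypothetical topology: rack name -> DataNodes on that rack.
TOPOLOGY = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}


def rack_of(node: str) -> str:
    for rack, nodes in TOPOLOGY.items():
        if node in nodes:
            return rack
    raise KeyError(node)


def place_replicas(writer: str, rng: random.Random) -> list[str]:
    """Simplified rack-aware placement of 3 replicas (sketch, not Hadoop code)."""
    first = writer  # 1st replica: the writer's own node
    remote_racks = [r for r in TOPOLOGY if r != rack_of(first)]
    second_rack = rng.choice(remote_racks)
    second = rng.choice(TOPOLOGY[second_rack])  # 2nd replica: a different rack
    # 3rd replica: another node on the same rack as the 2nd replica
    third = rng.choice([n for n in TOPOLOGY[second_rack] if n != second])
    return [first, second, third]


def readable_from(replicas: list[str], failed_nodes: set[str], failed_racks: set[str]) -> list[str]:
    """Replicas still readable after the given node and rack failures."""
    return [n for n in replicas if n not in failed_nodes and rack_of(n) not in failed_racks]


if __name__ == "__main__":
    rng = random.Random(42)
    replicas = place_replicas("dn1", rng)
    print("replicas:", replicas)
    print("dn1 down, read from:", readable_from(replicas, {"dn1"}, set()))
    print("rack1 down, read from:", readable_from(replicas, set(), {"rack1"}))
```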
2. Heartbeat detection
Each DataNode periodically sends a heartbeat message to the NameNode. By processing these heartbeats, the NameNode tracks the state of every DataNode, for example which DataNodes are dead and which are still available.
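A minimal sketch of the NameNode side of heartbeat detection. The timeout uses the HDFS defaults dfs.heartbeat.interval = 3 s and dfs.namenode.heartbeat.recheck-interval = 5 minutes, which give the usual dead-node timeout of 2 × recheck + 10 × heartbeat = 630 s (10.5 minutes). The HeartbeatMonitor class itself is hypothetical, not a Hadoop API, and takes the current time as a parameter so it is easy to test. Once a DataNode is declared dead, the NameNode re-replicates the blocks it held onto other nodes.

```python
from dataclasses import dataclass, field

# HDFS defaults (hdfs-site.xml), converted to seconds.
HEARTBEAT_INTERVAL_S = 3   # dfs.heartbeat.interval
RECHECK_INTERVAL_S = 300   # dfs.namenode.heartbeat.recheck-interval (300000 ms)

# A DataNode is considered dead after 2 * recheck + 10 * heartbeat = 630 s.
EXPIRE_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S


@dataclass
class HeartbeatMonitor:
    """Toy NameNode-side tracker of the last heartbeat time per DataNode."""
    expire_s: float = EXPIRE_S
    last_seen: dict[str, float] = field(default_factory=dict)

    def receive_heartbeat(self, datanode: str, now: float) -> None:
        self.last_seen[datanode] = now

    def dead_nodes(self, now: float) -> set[str]:
        # Dead if no heartbeat has arrived for longer than the expiry interval.
        return {dn for dn, t in self.last_seen.items() if now - t > self.expire_s}

    def live_nodes(self, now: float) -> set[str]:
        return set(self.last_seen) - self.dead_nodes(now)


if __name__ == "__main__":
    mon = HeartbeatMonitor()
    mon.receive_heartbeat("dn1", now=0)
    mon.receive_heartbeat("dn2", now=600)
    print(mon.dead_nodes(now=700))  # {'dn1'}
```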
3. Secondary NameNode
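There is only one NameNode, so if it fails the whole cluster is affected. The Secondary NameNode helps protect the NameNode's metadata: it periodically fetches the NameNode's namespace image (fsimage) and edit log, merges them into a new fsimage (a checkpoint), and sends it back to the NameNode while keeping a copy itself.
Note that while the NameNode is running normally, the Secondary NameNode only performs these checkpoints; it does not serve client requests.
If the NameNode fails:
The Secondary NameNode does not automatically take over as the active NameNode. Its latest checkpoint can be used to restore the NameNode's metadata, but changes made after that checkpoint may be lost. Automatic failover requires an HDFS high-availability (HA) setup with a Standby NameNode, available since Hadoop 2.x.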
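The periodic metadata sync performed by the Secondary NameNode is called a checkpoint: the namespace is stored as an image file (fsimage) plus an edit log of later changes, and merging the two keeps the log from growing without bound. Below is a toy Python model, assuming the namespace is just a mapping from path to file size and that only create and delete edits exist; the Edit class and the functions are hypothetical, not Hadoop APIs, and real fsimage and edit-log files are binary and much richer. Edits recorded after the last checkpoint are not part of the merged image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    op: str        # "create" or "delete" (illustrative subset of operations)
    path: str
    size: int = 0


def apply_edits(fsimage: dict[str, int], edits: list[Edit]) -> dict[str, int]:
    """Replay edit-log entries on top of an fsimage snapshot, returning a new image."""
    image = dict(fsimage)  # copy: never mutate the input snapshot
    for e in edits:
        if e.op == "create":
            image[e.path] = e.size
        elif e.op == "delete":
            image.pop(e.path, None)
        else:
            raise ValueError(f"unknown op {e.op!r}")
    return image


def checkpoint(fsimage: dict[str, int], edit_log: list[Edit]) -> tuple[dict[str, int], list[Edit]]:
    """Merge the image with the log, then start a fresh (empty) edit log."""
    return apply_edits(fsimage, edit_log), []


if __name__ == "__main__":
    log = [Edit("create", "/a", 10), Edit("create", "/b", 5), Edit("delete", "/a")]
    image, log = checkpoint({}, log)
    print(image)  # {'/b': 5}
```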
Big Data Note 05: HDFS for Big Data Hadoop (data management strategy)