[Hadoop] HDFS data replication

Source: Internet
Author: User

To store files reliably, HDFS splits each file into a sequence of blocks and stores multiple replicas of each block. This is essential for fault tolerance: if a replica of a block on one node is corrupted or lost, the block can still be read from a replica on another node.
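The block-and-replica layout can be illustrated with a little arithmetic. The helper below is hypothetical (not a Hadoop API) and assumes the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`; 64 MB in Hadoop 1.x) and a replication factor of 3 (`dfs.replication`).

```python
MB = 1024 * 1024


def block_layout(file_size, block_size=128 * MB, replication=3):
    """Hypothetical helper: (blocks, replicas stored, raw bytes stored) for one file."""
    # Ceiling division: a partial last block still counts as one block.
    num_blocks = (file_size + block_size - 1) // block_size
    # Each block is stored `replication` times. The last block is not padded,
    # so raw storage is the file size times the replication factor.
    return num_blocks, num_blocks * replication, file_size * replication


print(block_layout(300 * MB))  # (3, 9, 943718400)
```

A 300 MB file therefore occupies three blocks (two full, one partial), nine block replicas in total, and about 900 MB of raw disk across the cluster.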

HDFS uses a "rack-aware" replica placement policy, because network bandwidth between nodes on the same rack is greater than bandwidth between racks. With the default replication factor of 3, HDFS stores one replica on the local node, a second on another node in the same rack, and the third on a node in a different rack. (The current HDFS Architecture guide describes a variant in which the second and third replicas go to two different nodes of the same remote rack.) Either way, each block's replicas span only two racks, so the data survives the loss of a whole rack while inter-rack write traffic stays low, which improves write performance. The maximum number of replicas of a block is the number of DataNodes, because a single DataNode can hold at most one replica of the same block. When the replication factor is greater than 3, the 4th and later replicas are placed while keeping the number of replicas per rack within an upper limit of (replicas - 1) / racks + 2, using integer division.
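The placement rules can be sketched as a small function. This is a simplified illustration, not Hadoop's `BlockPlacementPolicyDefault`: the `topology` dict format and function names are hypothetical, the writer is assumed to be a DataNode, it follows the current guide's variant (2nd and 3rd replicas on one remote rack), and it picks the first eligible node in sorted order where real HDFS chooses randomly. The per-rack limit is the (replicas - 1) / racks + 2 formula quoted in the HDFS Architecture guide; the real implementation adds further refinements.

```python
from collections import Counter


def max_replicas_per_rack(replicas, racks):
    # Per-rack upper limit quoted in the HDFS Architecture guide:
    # (replicas - 1) / racks + 2, using integer division.
    return (replicas - 1) // racks + 2


def place_replicas(topology, writer, replication=3):
    """Choose DataNodes for the replicas of one block.

    topology: hypothetical input, a dict mapping rack name -> list of node names.
    writer:   the node doing the write; assumed here to be one of the DataNodes.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = sorted(rack_of)
    # A node holds at most one replica of a block, so cap at the number of nodes.
    replication = min(replication, len(all_nodes))
    limit = max_replicas_per_rack(replication, len(topology))
    chosen, per_rack = [], Counter()

    def take(node):
        chosen.append(node)
        per_rack[rack_of[node]] += 1

    def eligible(prefer):
        # Unused nodes on racks still under the limit, filtered by a preference.
        return [n for n in all_nodes
                if n not in chosen and per_rack[rack_of[n]] < limit and prefer(n)]

    def preference(i):
        if i == 1:  # 2nd replica: prefer a rack other than the writer's
            return lambda n: rack_of[n] != rack_of[writer]
        if i == 2:  # 3rd replica: prefer the rack holding the 2nd replica
            return lambda n: rack_of[n] == rack_of[chosen[1]]
        return lambda n: True  # 4th and later: any node within the limits

    take(writer)  # 1st replica: the writer's own node
    while len(chosen) < replication:
        # Real HDFS picks randomly among candidates; taking the first sorted
        # candidate keeps this sketch deterministic.
        candidates = eligible(preference(len(chosen))) or eligible(lambda n: True)
        if not candidates:
            break  # no node satisfies the limits; return fewer replicas than requested
        take(candidates[0])
    return chosen


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
print(place_replicas(topology, writer="dn1"))  # ['dn1', 'dn3', 'dn4']
```

If only one rack exists, the "remote rack" preference finds nothing and the sketch falls back to any eligible node, so replicas still land on distinct nodes.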

The NameNode enters safe mode each time it starts, and while in safe mode it does not replicate data blocks. During this period the NameNode receives Heartbeat and Blockreport messages from the DataNodes; a Blockreport lists all the blocks that a DataNode holds. Each block has a specified minimum number of replicas, and the NameNode considers a block safely replicated once that minimum number of replicas has been reported. After a configurable percentage of blocks (set by the dfs.safemode.threshold.pct parameter, renamed dfs.namenode.safemode.threshold-pct in Hadoop 2.x and later) is safely replicated, the NameNode waits an additional 30 seconds by default and then exits safe mode. It then determines the list of blocks, if any, that still have fewer than the specified number of replicas, and replicates those blocks to other DataNodes.
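The safe-mode exit condition can be sketched as follows. The function names and the `reported` dict (block ID to replica count) are hypothetical, not NameNode internals; the defaults mirror the Hadoop 2.x+ settings `dfs.namenode.replication.min` = 1, `dfs.namenode.safemode.threshold-pct` = 0.999 and `dfs.namenode.safemode.extension` = 30000 ms, and a single uniform replication factor is assumed for every block.

```python
def safe_fraction(reported, min_replicas=1):
    """Fraction of blocks whose reported replica count meets the minimum."""
    if not reported:
        return 1.0  # assumption: an empty namespace counts as fully safe
    safe = sum(1 for count in reported.values() if count >= min_replicas)
    return safe / len(reported)


def may_leave_safe_mode(reported, threshold_met_at, now,
                        min_replicas=1, threshold=0.999, extension_s=30.0):
    """True once the safe fraction reaches `threshold` and `extension_s`
    seconds have passed since that happened (timestamps in seconds)."""
    if safe_fraction(reported, min_replicas) < threshold:
        return False
    return threshold_met_at is not None and now - threshold_met_at >= extension_s


def under_replicated(reported, replication=3):
    """Blocks that would be queued for re-replication after leaving safe mode."""
    return sorted(block for block, count in reported.items() if count < replication)


reported = {"blk_1": 3, "blk_2": 1, "blk_3": 0}  # hypothetical block IDs
print(safe_fraction(reported))     # 0.6666666666666666
print(under_replicated(reported))  # ['blk_2', 'blk_3']
```

Here `blk_3` has no reported replicas, so only two of three blocks are safe and the NameNode would remain in safe mode; `blk_2` is safe but still below the replication factor, so it would be re-replicated after exit.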

