Hadoop owes much of its wide adoption to HDFS, which works quietly behind it. As a file system that can run on hundreds of nodes, HDFS was designed with careful attention to reliability.
3.2.1 Design of multi-replica storage for HDFS data blocks
As a distributed file system, HDFS keeps multiple replicas of its data (referred to below simply as replicas): several copies of the same data block are stored on different nodes, as shown in Figure 3-2. Storing multiple replicas has the following advantages: (1) clients can read the same data from different replicas, which speeds up transfers; (2) because HDFS DataNodes exchange data over the network, having multiple replicas makes it possible to determine whether a transmission error has occurred; (3) multiple replicas ensure that no data is lost when a single DataNode fails.
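Advantage (3) can be illustrated with a small simulation. This is a minimal sketch, not HDFS code: the function names `place_replicas` and `readable_blocks`, the node names, and the block IDs are all hypothetical, and nodes are picked uniformly at random purely for illustration.

```python
import random

def place_replicas(block_ids, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes chosen at random.

    Hypothetical helper for illustration; not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in block_ids}

def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}

nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(["blk_1", "blk_2", "blk_3"], nodes)

# Losing any single node leaves every block readable from another replica.
for node in nodes:
    assert readable_blocks(placement, [node]) == set(placement)
```

Because each block's replicas sit on distinct nodes, a block becomes unreadable only if every node holding it fails at the same time.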
HDFS selects storage nodes at random on a per-block basis. So that errors in a file can be detected, the number of replicas defaults to 3 (note: with only 1 or 2 replicas, it is not possible to determine which copy of the data is correct). Because of the cost of data transmission and error recovery, replicas are not distributed evenly across the cluster. For more detail on replica distribution and DataNode load balancing, see the introduction to the balancer in Section 3.4.4 below.
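The reasoning in the note above can be sketched as a majority vote over replica contents. This only illustrates the argument in the text; it is not a description of how HDFS itself verifies data, and the function name `agreed_value` is hypothetical.

```python
from collections import Counter

def agreed_value(replicas):
    """Return the content confirmed by a strict majority of at least two
    matching replicas, or None if no such agreement exists."""
    if not replicas:
        return None
    value, count = Counter(replicas).most_common(1)[0]
    if count >= 2 and count > len(replicas) / 2:
        return value
    return None

# Three replicas, one corrupted: the two matching copies outvote the bad one.
print(agreed_value([b"data", b"data", b"dxta"]))

# Two replicas that disagree: there is no tie-breaker.
print(agreed_value([b"data", b"dxta"]))

# One replica: nothing to compare it against.
print(agreed_value([b"data"]))
```

With three replicas a single corrupted copy is outvoted, whereas one or two replicas cannot settle a disagreement. That is the motivation the text gives for the default of 3.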