Hadoop dfs.replication

Source: Internet
Author: User

First, dfs.replication is a client-side parameter, that is, a node-level setting: the value that takes effect is the one configured on the node where the client runs, so it must be set on every node that writes to HDFS.
In practice, the default of three replicas is enough; more replicas rarely add value.

A file's replication factor is fixed when it is uploaded to HDFS: changing dfs.replication afterwards does not affect files that are already stored. You can, however, specify the replication factor at upload time:
hadoop dfs -D dfs.replication=1 -put 70M logs/2

To change the replication factor of files that have already been uploaded, run:
hadoop fs -setrep -R 3 /
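The behavior above (a file keeps the replication factor in effect when it was written, and only -setrep or a re-upload changes it) can be sketched with a toy model. This is illustrative Python, not Hadoop code; the class and method names are invented for the sketch.

```python
# Toy model (not Hadoop code): the namenode records a replication factor
# per file at write time. Changing the client's default later does not
# alter existing files; only setrep (or re-uploading) does.

class MiniNamenode:
    def __init__(self, default_replication=3):
        self.default_replication = default_replication
        self.files = {}  # path -> replication factor recorded at creation

    def create(self, path, replication=None):
        # A file keeps the replication in effect when it was written.
        self.files[path] = replication or self.default_replication

    def setrep(self, path, replication):
        # Equivalent in spirit to `hadoop fs -setrep`.
        self.files[path] = replication

nn = MiniNamenode(default_replication=3)
nn.create("/logs/a.log")                 # uses the default: 3
nn.create("/logs/b.log", replication=1)  # like -D dfs.replication=1 -put
nn.default_replication = 2               # changing the default later...
print(nn.files["/logs/a.log"])           # ...leaves existing files at 3
nn.setrep("/logs/a.log", 2)              # only an explicit setrep changes it
print(nn.files["/logs/a.log"])
```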

To view the current replication status of HDFS, run fsck:
hadoop fsck / -locations
FSCK started by hadoop from /172.18.6.112 for path / at Thu Oct 27 13:24:25 CST 2011
...... Status: Healthy
Total size: 4834251860 B
Total dirs: 21
Total files: 20
Total blocks (validated): 82 (avg. block size 58954290 B)
Minimally replicated blocks: 82 (100.0%)
Over-replicated blocks: 0 (0.0%)
Under-replicated blocks: 0 (0.0%)
Mis-replicated blocks: 0 (0.0%)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0%)
Number of data-nodes: 3
Number of racks: 1
Fsck ended at Thu Oct 27 13:24:25 CST 2011 in 10 milliseconds
The filesystem under path '/' is healthy

The replication factor of an individual file appears in the second column of an ls listing:
hadoop dfs -ls
-rw-r--r--   3 hadoop supergroup  153748148 /user/hadoop/logs/201108/impression_witspixel2011080100.thin.log.gz

If you have only three datanodes but set the replication factor to 4, the fourth replica is never created, because each datanode can store at most one replica of a given block.
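The constraint just described means the replication actually achieved is capped by the number of live datanodes. A minimal sketch (the function name is invented for illustration):

```python
# Sketch of the constraint above: a datanode holds at most one replica of
# any given block, so achieved replication = min(target, live datanodes).

def achieved_replication(target: int, live_datanodes: int) -> int:
    return min(target, live_datanodes)

print(achieved_replication(4, 3))  # target 4 on a 3-node cluster -> 3
print(achieved_replication(3, 3))  # target 3 is satisfiable -> 3
```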
hadoop fsck / -locations then shows a warning for each affected file, and the summary reports a missing-replica rate of 33.33%:
/user/hadoop/logs/test.log:  Under replicated blk_-45151128047308146_1147. Target Replicas is 4 but found 3 replica(s).
Status: Healthy
Total size: 4834251860 B
Total dirs: 21
Total files: 20
Total blocks (validated): 82 (avg. block size 58954290 B)
Minimally replicated blocks: 82 (100.0%)
Over-replicated blocks: 0 (0.0%)
Under-replicated blocks: 82 (100.0%)
Mis-replicated blocks: 0 (0.0%)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 82 (33.333332%)
Number of data-nodes: 3
Number of racks: 1
Fsck ended at Thu Oct 27 13:22:14 CST 2011 in 12 milliseconds
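The 33.33% figure in the summary above follows from simple arithmetic, assuming fsck reports missing replicas relative to the replicas it actually found (fsck itself printed 33.333332, presumably single-precision rounding):

```python
# Reconstruction of fsck's missing-replica percentage for the run above.
blocks = 82
target = 4           # requested replication factor
found_per_block = 3  # only 3 datanodes, so at most 3 replicas per block

expected = blocks * target        # 328 replicas wanted
found = blocks * found_per_block  # 246 replicas present
missing = expected - found        # 82 replicas missing

print(missing)                          # 82, matching "Missing replicas: 82"
print(round(100 * missing / found, 4))  # ~33.3333, matching the percentage
```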
Reference: hdfs_design
http://hadoop.apache.org/common/docs/r0.20.204.0/hdfs_design.pdf
http://hadoop.apache.org/common/docs/r0.20.204.0/hdfs_design.html
When a file is uploaded, the client does not contact the namenode immediately; instead it caches the data in a local temporary file. Once the cached data reaches one HDFS block size, the client contacts the namenode. The namenode inserts the file name into the filesystem namespace, allocates a data block for it, and replies to the client with the datanode hostnames and the block location. The client then flushes the data from the local temporary file to the specified datanode. When the file is closed, any remaining unflushed data in the temporary file is transmitted to the datanode, and the client notifies the namenode that the file is closed. At that point the namenode commits the file-creation operation to persistent storage. If the namenode dies before the file is closed, the file is lost.
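The client-side buffering just described can be modeled in a few lines. This is a toy model with invented names and a tiny block size, not Hadoop's actual client classes:

```python
# Toy model of the client write path: data is buffered locally, and the
# namenode is contacted only once a full block's worth has accumulated.

BLOCK_SIZE = 8  # tiny block size for demonstration only

class TinyClient:
    def __init__(self):
        self.buffer = b""
        self.namenode_calls = 0  # block allocations requested from namenode

    def write(self, data: bytes):
        self.buffer += data
        while len(self.buffer) >= BLOCK_SIZE:
            self.namenode_calls += 1              # allocate block, get datanodes
            self.buffer = self.buffer[BLOCK_SIZE:]  # flush block to datanodes

    def close(self):
        if self.buffer:          # leftover data is flushed on close
            self.namenode_calls += 1
            self.buffer = b""

c = TinyClient()
c.write(b"0123456789")   # 10 bytes: one full block flushed, 2 bytes buffered
print(c.namenode_calls)  # 1
c.close()                # remaining bytes flushed on close
print(c.namenode_calls)  # 2
```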

Creating replicas
As mentioned above, when the client writes a file to HDFS, it first writes the data to a local temporary file. Assume the HDFS replication factor is set to 3. When the cached file reaches one HDFS block size, the client retrieves a list of datanodes from the namenode; this list names the datanodes that will host the replicas of the block. The client flushes the data to the first datanode in the list. That datanode receives the data in small portions (4 KB), writes each portion to its local disk, and forwards it to the second datanode in the list, which performs the same operation. Each datanode in the pipeline can thus receive data from the previous node and forward it to the next at the same time.
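The pipeline above can be sketched as follows. This is a simplified simulation, not Hadoop code: datanodes are represented by in-memory buffers, and acknowledgements and failure handling are omitted:

```python
# Minimal sketch of the replication pipeline: a block is streamed in 4 KB
# packets; each datanode writes a packet and forwards it down the chain.

PACKET = 4096  # 4 KB, as described in the text

def pipeline_write(block: bytes, datanodes: list) -> None:
    # Each datanode is a bytearray standing in for its local disk.
    for offset in range(0, len(block), PACKET):
        packet = block[offset:offset + PACKET]
        for disk in datanodes:   # receive, persist, forward to the next node
            disk.extend(packet)

nodes = [bytearray(), bytearray(), bytearray()]  # replication factor 3
data = bytes(10000)                              # a bit under 2.5 packets
pipeline_write(data, nodes)
print(all(bytes(d) == data for d in nodes))      # True: 3 identical replicas
```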
