Hadoop Replica number configuration

Source: Internet
Author: User
Tags hadoop fs

When a file is uploaded to HDFS, the number of replicas it gets is fixed at upload time. Changing the replica count setting (dfs.replication) afterwards has no effect on files that have already been uploaded.
You can also specify the replica count explicitly while uploading a file:
hadoop dfs -D dfs.replication=2 -put abc.txt /tmp
To change the replica count of files that have already been uploaded, use the command:
hadoop fs -setrep -R 2 /

View the replica count for the current HDFS:
hadoop fsck -locations
The replica count of a file also appears in the file listing printed by ls:
hadoop dfs -ls
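
As a rough illustration of where the replica count sits in a listing, the sketch below parses one line in the usual ls layout, where the second column holds the replication factor and directories show "-" instead. The sample lines and the helper name replication_from_ls_line are made up for illustration and were not captured from a real cluster.

```python
def replication_from_ls_line(line):
    """Return the replication factor from one `hadoop dfs -ls` line, or None.

    Assumes the usual column layout:
    permissions, replication, owner, group, size, date, time, path.
    """
    fields = line.split()
    if len(fields) < 8:
        return None          # header line such as "Found 2 items"
    if fields[1] == "-":
        return None          # directories carry no replication factor
    return int(fields[1])


# Hypothetical sample lines, not captured from a real cluster.
sample_file = "-rw-r--r--   2 hadoop supergroup       1366 2014-01-01 12:00 /tmp/abc.txt"
sample_dir = "drwxr-xr-x   - hadoop supergroup          0 2014-01-01 12:00 /tmp/logs"

file_replicas = replication_from_ls_line(sample_file)
dir_replicas = replication_from_ls_line(sample_dir)
print(file_replicas, dir_replicas)
```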

If you have only 3 DataNodes but specify a replica count of 4, the fourth replica will not be created, because each DataNode can store only one copy of a given block.
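
The limit can be expressed as a one-line rule. The sketch below is a toy calculation, not HDFS code; the function name effective_replicas is hypothetical.

```python
def effective_replicas(requested, live_datanodes):
    """Number of copies of a block that can actually be stored.

    Each DataNode holds at most one replica of a given block, so the
    cluster cannot store more copies than it has DataNodes.
    """
    if requested < 1 or live_datanodes < 0:
        raise ValueError("requested must be >= 1 and live_datanodes >= 0")
    return min(requested, live_datanodes)


stored = effective_replicas(requested=4, live_datanodes=3)
print(stored)  # 3
```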

When a file is uploaded, the client does not contact the NameNode immediately. It first caches the data in a local temporary file. Once the cached data reaches the HDFS block size, the client contacts the NameNode, which inserts the file name into the filesystem hierarchy and allocates a data block for it.
The NameNode responds to the client's request with the identity of the DataNode and the destination data block. The client then flushes the data from the local temporary file to the specified DataNode.
When the file is closed, any data remaining in the temporary file that has not yet been flushed is transferred to the DataNode, and the client notifies the NameNode that the file is closed. At this point, the NameNode commits the file creation operation to persistent storage.
If the NameNode dies before the file is closed, the file is lost.
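
The staging behavior described above can be modeled in a few lines. This is a toy simulation under simplifying assumptions (an in-memory buffer stands in for the temporary file, and a counter stands in for NameNode requests); the class StagingClient is hypothetical and is not part of the HDFS client API.

```python
class StagingClient:
    """Toy model of the client-side staging described above."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.buffer = bytearray()   # stands in for the local temporary file
        self.flushed_blocks = []    # blocks handed to a DataNode
        self.namenode_calls = 0     # times the NameNode was asked for a block
        self.closed = False

    def _flush_block(self, data):
        self.namenode_calls += 1    # NameNode allocates a block, names a DataNode
        self.flushed_blocks.append(bytes(data))

    def write(self, data):
        self.buffer.extend(data)
        # Contact the NameNode only once a full block has accumulated.
        while len(self.buffer) >= self.block_size:
            self._flush_block(self.buffer[:self.block_size])
            del self.buffer[:self.block_size]

    def close(self):
        if self.buffer:             # remaining unflushed data
            self._flush_block(self.buffer)
            self.buffer.clear()
        self.closed = True          # NameNode is told the file is closed


client = StagingClient(block_size=8)
client.write(b"abc")                # below one block: nothing sent yet
calls_before_full_block = client.namenode_calls
client.write(b"defghij")            # 10 bytes total: one full block flushed
client.close()                      # last 2 bytes flushed on close
print(calls_before_full_block, [len(b) for b in client.flushed_blocks])
```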

Create a replica
When the client writes a file to HDFS, it first writes the data to a local temporary file, as mentioned earlier. Assume the HDFS replication factor is set to 3. When the cached data reaches the HDFS block size, the client retrieves a list of DataNodes from the NameNode. This list contains the DataNodes that will host the replicas of that block.
The client flushes the data to the first DataNode in the list.
The first DataNode receives the data in 4 KB portions, writes each portion locally, and transfers it to the second DataNode in the list. The second DataNode does the same.
A DataNode can therefore receive data from the previous node in the pipeline and send data to the next node in the pipeline at the same time.
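
A minimal sketch of the packet-by-packet forwarding follows. It runs sequentially, so it does not model the overlap in time between receiving and forwarding; it only shows that every node in the chain ends up with the full block. The function name and the sample data are assumptions made for illustration.

```python
PACKET_SIZE = 4 * 1024  # 4 KB portions, as described above


def replicate_through_pipeline(block, pipeline_length):
    """Send a block through a chain of DataNodes packet by packet.

    Returns one bytes object per DataNode holding what that node stored.
    """
    stores = [bytearray() for _ in range(pipeline_length)]
    for offset in range(0, len(block), PACKET_SIZE):
        packet = block[offset:offset + PACKET_SIZE]
        # Each node writes the packet locally, then passes it to the next node.
        for store in stores:
            store.extend(packet)
    return [bytes(s) for s in stores]


block = bytes(range(256)) * 40      # 10,240 bytes of sample data, i.e. 3 packets
replicas = replicate_through_pipeline(block, pipeline_length=3)
print(len(replicas), [len(r) for r in replicas])
```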

Configuration parameters can be specified more than once
The value with the highest priority takes precedence.
Order of precedence (low to high):
- *-site.xml on the slave node
- *-site.xml on the client machine
- Values set explicitly in the JobConf object for a MapReduce job
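
The precedence order above can be sketched as a lookup over layers ordered from low to high priority. This is an illustrative model rather than Hadoop's Configuration class; the function resolve and the sample values are hypothetical.

```python
def resolve(name, layers):
    """Return the value of `name` from the highest-priority layer that sets it.

    `layers` is ordered from lowest to highest priority.
    """
    value = None
    for layer in layers:
        if name in layer:
            value = layer[name]     # a later (higher-priority) layer wins
    return value


# Hypothetical values, ordered low to high as in the list above.
slave_site = {"dfs.replication": "3"}
client_site = {"dfs.replication": "2"}
job_conf = {"dfs.replication": "1"}

resolved = resolve("dfs.replication", [slave_site, client_site, job_conf])
print(resolved)
```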

If a value in a configuration file is marked as final, it overrides all other values:
<property>
<name>some.property.name</name>
<value>somevalue</value>
<final>true</final>
</property>
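
The effect of the final flag can be sketched as a lock on the lookup: once a layer marks a value final, higher-priority layers can no longer replace it. This is a simplified model, not Hadoop's own implementation; the function resolve_with_final and the sample values are hypothetical.

```python
def resolve_with_final(name, layers):
    """Resolve `name` across layers ordered low to high priority.

    Each layer maps a name to (value, is_final). Once a final value is
    seen, later layers can no longer override it.
    """
    value, locked = None, False
    for layer in layers:
        if name in layer and not locked:
            value, locked = layer[name]
    return value


# Hypothetical values: the node's site file marks the property final.
node_site = {"dfs.replication": ("3", True)}
client_site = {"dfs.replication": ("2", False)}
job_conf = {"dfs.replication": ("1", False)}

locked_value = resolve_with_final("dfs.replication", [node_site, client_site, job_conf])
print(locked_value)
```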

For parameters such as the replica count, data.dir, and fs-related settings, it is recommended to set final=true on the DataNode nodes.

Q: How should dfs.replication be set on the preprocessing host?
A: Set the dfs.replication parameter in the hdfs-site.xml configuration file on the preprocessing host, then restart the preprocessing service. This resolves the replica count problem.
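
For reference, the shape of such an entry can be generated with a short script. This is a sketch only: the helper property_element is hypothetical, and the value 2 is an example rather than a recommendation from the article.

```python
import xml.etree.ElementTree as ET


def property_element(name, value, final=False):
    """Build one <property> element in the layout used by *-site.xml files."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    if final:
        ET.SubElement(prop, "final").text = "true"
    return prop


# The value 2 is only an example.
snippet = ET.tostring(property_element("dfs.replication", 2), encoding="unicode")
print(snippet)
```

The printed element would be placed inside the <configuration> root of hdfs-site.xml on the host that writes the data.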

The number of replicas of each block is determined by the configuration of the client that writes the data, so this type of problem is typically caused by the client's configuration.

"Reference" http://blog.sina.com.cn/s/blog_edd9ac0e0101it34.html
