Hadoop Manual Move Block

Source: Internet
Author: User

1. For disk usage policy, refer to http://www.it165.net/admin/html/201410/3860.html

In Hadoop 2.0, the DataNode has two disk selection policies for deciding which disk a block replica is written to:

The first follows the Hadoop 1.0 approach of polling (round-robin) over the data directories; the implementing class is RoundRobinVolumeChoosingPolicy.java.

The second selects the disk for the replica based on how much space is still available; the implementing class is AvailableSpaceVolumeChoosingPolicy.java.

The configuration item that selects the policy is:

  <property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
  </property>

If nothing is configured, the default is the first policy: round-robin selection of the disk that stores the replica. Round-robin guarantees that every disk gets used, but in practice the amount of data on individual disks often ends up unbalanced; some disks fill up completely while others still have a lot of unused space. In a Hadoop 2.0 cluster it is therefore better to configure the second policy, which picks the disk for a replica based on the remaining space, so that every disk is used and the usage across disks stays balanced.

Two additional parameters apply when the second policy is used:

dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold

The default value is 10737418240, i.e. 10 GB, and is generally left at the default. The official explanation of this option is:

This setting controls how much DN volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced. If the free space of all the volumes is within this range of each other, the volumes will be considered balanced and block assignments will be done on a pure round robin basis.

In other words, two values are computed first: the largest free space among all disks and the smallest free space among all disks. If the difference between these two values is less than the threshold specified by this configuration item, the round-robin policy is used to choose the disk that stores the replica. The source code is as follows:

  public boolean areAllVolumesWithinFreeSpaceThreshold() {
    long leastAvailable = Long.MAX_VALUE;
    long mostAvailable = 0;
    for (AvailableSpaceVolumePair volume : volumes) {
      leastAvailable = Math.min(leastAvailable, volume.getAvailable());
      mostAvailable = Math.max(mostAvailable, volume.getAvailable());
    }
    return (mostAvailable - leastAvailable) < balancedSpaceThreshold;
  }

dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction

The default value is 0.75f and is generally left at the default. The official explanation of this option is:

This setting controls what percentage of new block allocations will be sent to volumes with more available disk space than others. This setting should be in the range 0.0 - 1.0, though in practice 0.5 - 1.0, since there should be no reason to prefer that volumes with less available disk space receive more block allocations.

This means the fraction of new block replicas that should be sent to the disks with more free space. The value must be in the range 0.0 to 1.0 and is normally set between 0.5 and 1.0. If it is set too low, the disks with plenty of free space do not actually receive enough replicas, while the disks that are short on space still have to store many replicas, which again results in unbalanced disk usage.

This configuration requires a DataNode restart to take effect, because the disk selection policy is loaded after the local disk information is loaded during DataNode startup.
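For reference, a minimal hdfs-site.xml fragment that enables the space-based policy and sets both tuning parameters explicitly might look like the following (the two values shown are simply the defaults discussed above):

  <property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
  </property>
  <property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
    <value>10737418240</value>
  </property>
  <property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
    <value>0.75</value>
  </property>

After editing hdfs-site.xml, restart the DataNodes for the policy to take effect, as noted above.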

2. If this policy is not configured, disk usage inside a node easily becomes uneven. Once some disks go above 90% usage, manual intervention is needed: manually moving blocks.
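A quick way to spot a node that has reached this point is to check how full each configured data directory's volume is; the mount points below are only an example:

  # Check usage of each dfs.datanode.data.dir volume (illustrative mount points).
  df -h /data1 /data2 /data8

Any volume above roughly 90% Use% is a candidate for having blocks moved off it manually.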

The point to pay attention to when manually moving blocks: the DataNode must be stopped first, and restarted only after the move is complete.

Reason: if the DataNode is not stopped, the block storage paths held in the DataNode's memory become inconsistent with the actual storage paths. When the DataNode's block scanner has twice scanned the moved block as a bad block, it reports it to the NameNode, and after receiving the report the NameNode sends a command to delete the "bad" block.

Notes: ① DirectoryScanner cannot detect the health of blocks that have been moved between disks;

② The role of the DataBlockScanner is to periodically verify blocks and detect corruption. Internally, the DataNode's DataBlockScanner uses a log to persist the last scan time of each block, so that after a restart the DataNode can restore the verification times of all blocks from before the restart. To conserve system resources, block verification does not rely only on the DataBlockScanner running in the background (VERIFICATION_SCAN mode); a block is also scanned when it is served to a client (REMOTE_READ mode), because the DataNode must verify the block before sending it. The interval of the background scan is controlled by dfs.datanode.scan.period.hours, which defaults to 504 hours (three weeks) in Hadoop 2.x.

③ In my own tests, moving a block and then restarting the DataNode did not produce a bad block, but doing the same thing in production did. Thinking about it afterwards, my test did not fully reflect the production situation: in production the move takes much longer than in the test (there are far more blocks to move), and during that time applications keep running and new blocks keep being written. I do not think that is what caused the false "bad blocks", though. What actually skewed the test result is that the test block was not set up properly: I uploaded a new file and then manually moved one of its blocks, so the block was still inside its first scan period, and restarting immediately afterwards did not produce a bad block (marked in yellow, still to be confirmed... this really is not something you can figure out in one or two experiments, or maybe I am just slow...). In production, by contrast, the blocks being moved may span a large range of ages and their scan periods do not line up, so even if the DataNode is restarted promptly, some moved block may already be due for a check. The explanation should be: although the scan period is a fixed cycle, the checkpoint of each block falls at a different time, so at any moment the DataNode can be at the checkpoint of a block that is being moved, and it is then very easy for that block to be reported as a bad block (the block has been moved, not lost, but hdfs fsck will show it as corrupt). In production, stick to the most cautious and safe method.
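One way to check, once the DataNode is back up, whether any moved replica has been flagged is to run hdfs fsck against the affected path; the path below is only an example:

  # Health report for a path; the summary shows CORRUPT if any block replica is bad.
  hdfs fsck /user/hadoop/testfile -files -blocks -locations

  # Or list only the files that currently have corrupt blocks.
  hdfs fsck / -list-corruptfileblocks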

The process of manually moving a block:

① Stop the DataNode;

② mv the block files (execute as the hadoop user, and remember to confirm that the owner of the files moved to the new directory is correct; I ran the script as root, and as a result the DataNode shut itself down and reported a permissions problem);

③ Start the DataNode;
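Putting the three steps together, here is a minimal sketch of the procedure on one DataNode, assuming a Hadoop 2.x on-disk layout; the volume paths, block pool ID, block/genstamp numbers and the hadoop user/group are all illustrative and must be replaced with the real values on your node:

  # Run as the user that owns the DataNode process (e.g. hadoop), not as root.

  # 1) Stop the DataNode on this host.
  $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode

  # 2) Move the block file AND its .meta file together, keeping the same
  #    relative subdir path under the target volume's block-pool directory.
  BP=BP-1234567890-10.0.0.1-1400000000000
  SRC=/data1/hdfs/current/$BP/current/finalized/subdir0/subdir12
  DST=/data8/hdfs/current/$BP/current/finalized/subdir0/subdir12
  mkdir -p "$DST"
  mv "$SRC"/blk_1073741900 "$SRC"/blk_1073741900_1076.meta "$DST"/

  # 3) Confirm the moved files are still owned by the DataNode user; wrong
  #    ownership makes the DataNode shut itself down with a permissions error.
  chown -R hadoop:hadoop /data8/hdfs

  # 4) Start the DataNode again.
  $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode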
