Hadoop HDFS Data Block exploration

Source: Internet
Author: User

1. Location of file storage

Example View

./bin/hadoop fsck /data/bb/bb.txt -files -blocks -racks -locations

blk_1076386829_2649976 is the name of the block's meta file. To locate the meta file on disk, you can use the find command. The output shows that the file is stored on two machines, 117 and 229; as an example, we log on to machine 117.

First, go to the path configured in dfs.datanode.data.dir (if you have forgotten it, you can look it up in $HADOOP_HOME/etc/hadoop/hdfs-site.xml).
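If you would rather not read the XML by eye, the lookup can be scripted. The following is a minimal sketch, not a real XML parser: the helper names get_hdfs_prop and list_data_dirs are made up for this example, and it assumes well-formed <property> elements without comments. On a node with Hadoop installed, `hdfs getconf -confKey dfs.datanode.data.dir` should give the same answer.

```shell
# get_hdfs_prop FILE NAME  (hypothetical helper)
# Print the <value> of the <property> in FILE whose <name> equals NAME.
# Text-based sketch: assumes well-formed <property> elements, no XML comments.
get_hdfs_prop() {
  awk -v name="$2" '
    { doc = doc $0 }                        # join the file into one string
    END {
      n = split(doc, props, "</property>")  # one chunk per property
      for (i = 1; i <= n; i++) {
        p = props[i]
        if (index(p, "<name>" name "</name>") == 0) continue
        if (p !~ /<value>/) continue
        sub(/.*<value>[ \t]*/, "", p)       # drop everything up to the value
        sub(/[ \t]*<\/value>.*/, "", p)     # drop everything after the value
        print p
      }
    }' "$1"
}

# list_data_dirs FILE  (hypothetical helper)
# Print each directory listed in dfs.datanode.data.dir on its own line.
list_data_dirs() {
  get_hdfs_prop "$1" dfs.datanode.data.dir | tr ',' '\n'
}
```

Typical use would be `list_data_dirs "$HADOOP_HOME/etc/hadoop/hdfs-site.xml"`, which prints one data directory per line, ready to feed into find.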

My machine is configured as follows:

Run find in each of the 3 directories, as in this example command:

find /data1/hdfs1/data/current/BP-236683338-10.207.0.217-1403487328282/current -name blk_1076386829_2649976.meta
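Since the same search has to be repeated in every data directory, it can be wrapped in a small loop. This is only a sketch: find_block_meta is a hypothetical helper name, and the directories you pass in are whatever your own dfs.datanode.data.dir lists.

```shell
# find_block_meta BLOCK_NAME DIR...  (hypothetical helper)
# Search each given DataNode data directory for BLOCK_NAME.meta.
find_block_meta() {
  local block=$1
  shift
  local dir
  for dir in "$@"; do
    # Skip directories that do not exist on this host.
    [ -d "$dir" ] || continue
    find "$dir" -type f -name "${block}.meta" 2>/dev/null
  done
}
```

For example, `find_block_meta blk_1076386829_2649976 /data1/hdfs1/data` searches the one directory named in this article; add your other data directories as further arguments.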

The meta file is eventually found, as follows:

The block file itself sits in the same directory, so this also locates your data; you can view it with cat blk_1076386829.
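The relationship between the two names is mechanical: the meta file is named blk_<block id>_<generation stamp>.meta and the block file is blk_<block id>. A small sketch of that mapping, using a hypothetical helper name block_file_for_meta:

```shell
# block_file_for_meta META_PATH  (hypothetical helper)
# Turn .../blk_<id>_<genstamp>.meta into .../blk_<id>.
block_file_for_meta() {
  local stem=${1%.meta}       # strip the .meta suffix
  printf '%s\n' "${stem%_*}"  # strip the trailing _<genstamp>
}
```

With this in place, something like `cat "$(block_file_for_meta /path/to/blk_1076386829_2649976.meta)"` would print the block's contents, assuming the block file is still next to its meta file.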

Next, a simple simulation of one data block being corrupted. After the block is damaged, the corruption is not discovered until the DataNode runs its directory scan (the interval is set by dfs.datanode.directoryscan.interval). The block is also not recovered until the DataNode reports its block information to the NameNode (the interval is set by dfs.blockreport.intervalMsec). Only once the NameNode has received the block report does it take recovery measures.
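The article does not say exactly how the corruption was produced. One way to try this on a disposable test cluster is to move a block file and its meta file out of the DataNode's data directory, so the next directory scan sees them as missing. The sketch below does that; hide_block_files is a hypothetical helper, the backup directory must be outside the data directory, and this should never be run against data you care about.

```shell
# hide_block_files DATA_DIR BLOCK_ID BACKUP_DIR  (hypothetical helper)
# Move blk_<BLOCK_ID> and blk_<BLOCK_ID>_*.meta from DATA_DIR into BACKUP_DIR.
# Assumption: BACKUP_DIR is not inside DATA_DIR. Test clusters only.
hide_block_files() {
  local data_dir=$1 block_id=$2 backup_dir=$3
  mkdir -p "$backup_dir"
  find "$data_dir" -type f \
    \( -name "blk_${block_id}" -o -name "blk_${block_id}_*.meta" \) \
    -exec mv {} "$backup_dir"/ \;
}
```

Moving the files rather than deleting them means they can be inspected afterwards in the backup directory.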

Real situations are certainly more complicated, but this simple process is enough to show what the two parameters mentioned above do.

Parameter configuration

The two main parameters are configured in hdfs-site.xml as follows:

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
</property>
<property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>600000</value>
    <description>Determines block reporting interval in milliseconds.</description>
</property>
<property>
    <name>dfs.datanode.directoryscan.interval</name>
    <value>600</value>
</property>

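Both intervals are set to 10 minutes: dfs.blockreport.intervalMsec is given in milliseconds, and dfs.datanode.directoryscan.interval is given in seconds.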
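Because the two settings use different units, it is easy to misread them. The sketch below converts both to minutes and adds them up as a rough upper bound on how long a damaged block can go unrepaired. That bound assumes the simple model described in this article (one full scan interval, then one full block report interval); the variable and function names are illustrative.

```shell
blockreport_ms=600000   # dfs.blockreport.intervalMsec, in milliseconds
dirscan_s=600           # dfs.datanode.directoryscan.interval, in seconds

# Integer conversions to whole minutes (fractions are truncated).
ms_to_minutes() { echo $(( $1 / 60000 )); }
s_to_minutes()  { echo $(( $1 / 60 )); }

# worst_case_delay_s DIRSCAN_S BLOCKREPORT_MS
# Rough upper bound in seconds: wait up to one directory scan interval for the
# DataNode to notice, then up to one block report interval for the NameNode.
worst_case_delay_s() { echo $(( $1 + $2 / 1000 )); }

echo "block report every $(ms_to_minutes "$blockreport_ms") min"
echo "directory scan every $(s_to_minutes "$dirscan_s") min"
echo "worst-case delay: $(worst_case_delay_s "$dirscan_s" "$blockreport_ms") s"
```

With the values from this article, both intervals come out as 10 minutes and the rough bound is 1200 seconds.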

Log details

2016-06-14 21:48:51,083 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-660628275-192.168.1.100-1464787466998 total blocks:1, missing metadata files:1, missing block files:1, missing blocks in memory:0, mismatched blocks:0
2016-06-14 21:48:51,084 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removed block 1073741825 from memory with missing block file on the disk
2016-06-14 21:49:17,168 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Blockreport of 1 blocks took 0 msec to generate and 1 msecs for RPC and NN processing
2016-06-14 21:49:17,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:[email protected]
2016-06-14 21:49:20,977 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-660628275-192.168.1.100-1464787466998:blk_1073741825_1001 src:/192.168.1.101:53718 dest:/192.168.1.102:50010
2016-06-14 21:49:20,984 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-660628275-192.168.1.100-1464787466998:blk_1073741825_1001 src:/192.168.1.101:53718 dest:/192.168.1.102:50010 of size 1366
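To read the timeline off these lines, it helps to compute the gaps between timestamps. The sketch below does that for the HH:MM:SS,mmm part of each line; the helper names are made up for this example, and it assumes both timestamps fall on the same day.

```shell
# log_time_to_ms "HH:MM:SS,mmm"  (hypothetical helper)
# Convert the time-of-day part of a log timestamp to milliseconds since midnight.
log_time_to_ms() {
  echo "$1" | awk -F'[:,]' '{ print (($1 * 60 + $2) * 60 + $3) * 1000 + $4 }'
}

# elapsed_ms START END  (hypothetical helper)
# Milliseconds from START to END, assuming both are on the same day.
elapsed_ms() {
  echo $(( $(log_time_to_ms "$2") - $(log_time_to_ms "$1") ))
}

# Directory scan notices the missing block -> block report is sent
elapsed_ms "21:48:51,083" "21:49:17,168"
# Directory scan notices the missing block -> replacement replica starts arriving
elapsed_ms "21:48:51,083" "21:49:20,977"
```

In this run the block report went out about 26 seconds after the scan noticed the missing block, and the replacement replica started arriving a few seconds after that.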
