Scenario: A Spark program failed with the error below.
1. Error message:
17/05/09 14:30:58 WARN scheduler.TaskSetManager: Lost task 28162.1 in stage 0.0 (TID 30490, 127.0.0.1): java.io.IOException: Cannot obtain block length for LocatedBlock{BP-203532773-dfsfdf-1476004795661:blk_1080431162_6762963; getBlockSize()=411; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:1004,DS-e9905a06-4607-4113-b717-709a087b8b96,DISK], DatanodeInfoWithStorage[127.0.0.1:1004,DS-a5046b43-4416-45d9-8ff6-44891bcdf3b8,DISK], DatanodeInfoWithStorage[127.0.0.1:1004,DS-f6b04bbe-9555-4ac8-b06a-3317eb229511,DISK]]}
2. Reference for the solution:
https://community.hortonworks.com/questions/37412/cannot-obtain-block-length-for-locatedblock.html
3. Start checking files
Run fsck on the directory and check the result (note the open-file counts in the output):
hdfs fsck /user/admin/data/cdn/20170509 -locations -blocks -files
Status: HEALTHY
Total size: 2115443944 B (Total open files size: 7684855 B)
Total dirs: 1
Total files: 67353
Total symlinks: 0 (Files currently being written: 367)
Total blocks (validated): ... (Total open file blocks (not validated): 357)
Minimally replicated blocks: 67339 (100.0%)
Over-replicated blocks: 0 (0.0%)
Under-replicated blocks: 0 (0.0%)
Mis-replicated blocks: 0 (0.0%)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0%)
Number of data-nodes: 6
Number of racks: 1
Found: 357 files are open
4. List the problem files again, this time with -openforwrite:
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite
Total size: 2123128799 B
Total dirs: 1
Total files: 67720
Total symlinks: 0
Total blocks (validated): 67696 (avg. block size 31362 B)
********************************
CORRUPT FILES: 253
MISSING BLOCKS: 253
MISSING SIZE: 7473074 B
********************************
Minimally replicated blocks: 67443 (99.626274%)
Over-replicated blocks: 0 (0.0%)
Under-replicated blocks: 0 (0.0%)
Mis-replicated blocks: 0 (0.0%)
Default replication factor: 3
Average block replication: 2.9887881
Corrupt blocks: 0
Missing replicas: 0 (0.0%)
Number of data-nodes: 6
Number of racks: 1
FSCK ended at Wed May 10:01:56 CST in 1357 milliseconds
The filesystem under path '/user/admin/data/cdn/20170509' is CORRUPT
(1) Find the problem files (tmp.txt is assumed here to hold the saved fsck output):
cat tmp.txt | tr '/' '\n' | grep ngaahcs-acc | tr ':' ' ' | awk '{print $1}' | sort | uniq | grep -v '2017112318'
(2) Quickest workaround: delete the .tmp files
hdfs dfs -rmr /user/admin/data/cdn/20170509/*.tmp
However, this did not solve the problem.
(3) After deleting the .tmp files, run fsck again:
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite
Or list the affected files directly:
[root@eeeee spark]# hdfs fsck /user/admin/data/cdn/20170509 -openforwrite | grep "/user/admin/data/cdn//20170509"
Connecting to Namenode via http://rrrrrr:50070
/user/admin/data/cdn//20170509/ngaahcs-access.log. 201705090002.1494259322790.gz bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log. 201705090002.1494259322790.gz: MISSING 1 blocks of total size B ....
/user/admin/data/cdn//20170509/ngaahcs-access.log.705090000.1494259200039.gz 1222 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.l4.201705090000.1494259200039.gz: MISSING 1 blocks of total size 1222
/user/admin/data/cdn//20170509/ngaahcs-access.log.c2-3l4.201705090245.1494269103909.gz 211 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.ctsx2-3l4.201705090750.1494287404133.gz 1504 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.ct-3l4.201705090820.1494289204450.gz 308 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.c2-3l4.201705091545.1494315903839.gz 437 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.sx3-3l3.201705090002.1494259321230.gz 1075 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.cx3-3l4.201705090001.1494259260581.gz 521 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.ct-x3-3l4.201705090001.1494259260581.gz: MISSING 1 blocks of total size
/user/admin/data/cdn//20170509/ngaahcs-access.log.ct-sx3-3l4.201705090002.1494259320807.gz 729 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.ct-gx-gd-sx4-3l4.201705090001.1494259260236.gz 1138 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.ct-3l4.201705090001.1494259260236.gz: MISSING 1 blocks of total size 1138 B ........ ...........
/user/admin/data/cdn//20170509/ngaahcs-access.log.ctx9-3n3.201705090001.1494259260495.gz 2379 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.cxq-3k1.201705090002.1494259320204.gz: MISSING 1 blocks of total size 10153
/user/admin/data/cdn//20170509/ngaahcs-access.log.ctxq-3k2.201705090001.1494259260772.gz 539 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.ct-gxq-3n1.201705090002.1494259320328.gz 1278 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.ct-g-3n2.201705090001.1494259260696.gz 2183 bytes, 1 block(s), OPENFORWRITE:
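A listing like the one above can be reduced to bare file paths, which is handy when there are hundreds of open files. The following is only a minimal sketch: list_open_files is a hypothetical helper name, and it assumes that paths contain no spaces and that each open file appears on a line containing OPENFORWRITE, as in the fsck output shown here.

```shell
#!/usr/bin/env bash
# Sketch: reduce `hdfs fsck <dir> -openforwrite` output to the paths of the
# files that are still open for write.
# list_open_files is a hypothetical helper name, not part of Hadoop.
# Assumption: paths contain no spaces, so the path is the first field.
list_open_files() {
  awk '/OPENFORWRITE/ { print $1 }' | sort -u
}

# Usage (needs an HDFS client on PATH):
#   hdfs fsck /user/admin/data/cdn/20170509 -openforwrite | list_open_files > open_files.txt
```

The "MISSING 1 blocks" lines are skipped on purpose, since they repeat paths that already appear on the OPENFORWRITE lines.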
If the files are not important, delete them:
hdfs dfs -rmr /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.ct.201705090002.1494259322790.gz
hdfs dfs -rmr /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.c.201705090002.1494259322790.gz
hdfs dfs -rmr /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.ct-.201705090000.1494259200039.gz
hdfs dfs -rmr /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.ct-.201705090000.1494259200039.gz
hdfs dfs -rmr /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.ct-.201705090245.1494269103909.gz
hdfs dfs -rmr /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.ct-gl4.201705090750.1494287404133.gz
hdfs dfs -rmr /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.ct-g3l4.201705090820.1494289204450.gz
Re-check:
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite
Total size: 2115004402 B
Total dirs: 1
Total files: 67337
Total symlinks: 0
Total blocks (validated): 67337 (avg. block size 31409 B)
Minimally replicated blocks: 67337 (100.0%)
Over-replicated blocks: 0 (0.0%)
Under-replicated blocks: 0 (0.0%)
Mis-replicated blocks: 0 (0.0%)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0%)
Number of data-nodes: 6
Number of racks: 1
FSCK ended at Wed-10:16:52 CST in 1329 milliseconds
The filesystem under path '/user/admin/data/cdn//20170509' is HEALTHY
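The re-check can also be scripted so that a job is only started once fsck reports a healthy directory. This is a rough sketch, not a definitive check: fsck_is_healthy is a hypothetical helper name, and it relies only on the closing "is HEALTHY" summary line seen in the output above.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit status 0) only if fsck output read from stdin ends
# with the "... is HEALTHY" summary line.
# fsck_is_healthy is a hypothetical helper name, not part of Hadoop.
fsck_is_healthy() {
  # "Status: HEALTHY" does not match; only the closing summary line does.
  grep -q "is HEALTHY"
}

# Usage (needs an HDFS client on PATH):
#   hdfs fsck /user/admin/data/cdn/20170509 -openforwrite | fsck_is_healthy \
#     && echo "directory is healthy, safe to rerun the job"
```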
Then run the Spark program again.
Note: this is not a final solution; you still need to find the root cause.
If a file is important, repair it instead of deleting it.
Check the status of each file and recover them one by one.
Take this file as an example: /user/admin/data/cdn//20170508/ngaahcs-access.log.3k3.201705081700.1494234003128.gz
Run the recovery command:
hdfs debug recoverLease -path <path-of-the-file> -retries <retry times>
hdfs debug recoverLease -path /user/admin/data/cdn//20170508/ngaahcs-access.log.c00.1494234003128.gz -retries 10
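With many affected files, the recovery command can be run in a loop over a list of paths. The sketch below assumes one path per line on stdin; recover_leases and the HDFS_CMD variable are hypothetical names introduced for illustration. HDFS_CMD defaults to the real hdfs client and can be set to echo for a dry run.

```shell
#!/usr/bin/env bash
# Sketch: run `hdfs debug recoverLease` for every path read from stdin.
# recover_leases and HDFS_CMD are hypothetical names, not part of Hadoop.
# Argument 1: number of retries (defaults to 10, as in the example above).
recover_leases() {
  retries="${1:-10}"
  while IFS= read -r path; do
    [ -n "$path" ] || continue   # skip blank lines
    # </dev/null keeps the command from consuming the list on stdin
    "${HDFS_CMD:-hdfs}" debug recoverLease -path "$path" -retries "$retries" </dev/null
  done
}

# Dry run, printing the commands instead of executing them:
#   HDFS_CMD=echo recover_leases 10 < open_files.txt
# Real run (needs an HDFS client on PATH):
#   recover_leases 10 < open_files.txt
```

Running fsck with -openforwrite again afterwards shows whether any files are still open.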
HDFS command reference:
https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#fsck