[Hadoop shell commands] Handling and repairing bad blocks on HDFS


Scenario: a Spark program fails with an error at run time

1. Error message:
17/05/09 14:30:58 WARN scheduler.TaskSetManager: Lost task 28162.1 in stage 0.0 (TID 30490, 127.0.0.1): java.io.IOException: Cannot obtain block length for LocatedBlock{BP-203532773-dfsfdf-1476004795661:blk_1080431162_6762963; getBlockSize()=411; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:1004,DS-e9905a06-4607-4113-b717-709a087b8b96,DISK], DatanodeInfoWithStorage[127.0.0.1:1004,DS-a5046b43-4416-45d9-8ff6-44891bcdf3b8,DISK], DatanodeInfoWithStorage[127.0.0.1:1004,DS-f6b04bbe-9555-4ac8-b06a-3317eb229511,DISK]]}

2. Reference for the fix:
https://community.hortonworks.com/questions/37412/cannot-obtain-block-length-for-locatedblock.html
3. Check the files

Result of the check command (note the open-file figures in parentheses):

hdfs fsck /user/admin/data/cdn/20170509 -locations -blocks -files
Status: HEALTHY
Total size: 2115443944 B (Total open files size: 7684855 B)
Total dirs: 1
Total files: 67353
Total symlinks: 0 (Files currently being written: 367)
Total blocks (validated): 67339 (avg. block size 31414 B) (Total open file blocks (not validated): 357)
Minimally replicated blocks: 67339 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 6
Number of racks: 1

Finding: 367 files are still open for writing, accounting for 357 open file blocks that fsck did not validate.
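To pull that figure out of a long fsck report without reading it by eye, a small filter can help. The following is a minimal sketch, assuming the summary contains a "(Files currently being written: N)" line as in the output above; the function name open_file_count is hypothetical.

```shell
#!/usr/bin/env bash
# Print the number of files currently being written, taken from the
# "(Files currently being written: N)" part of an fsck summary.
# Reads fsck output on stdin; prints nothing if no such line exists.
open_file_count() {
  sed -n 's/.*Files currently being written: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (needs a running cluster, so it is left commented out):
#   hdfs fsck /user/admin/data/cdn/20170509 -files -blocks -locations | open_file_count
```

An empty result means fsck reported no open files under the path.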


4. List the problem files
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite

Total size: 2123128799 B
Total dirs: 1
Total files: 67720
Total symlinks: 0
Total blocks (validated): 67696 (avg. block size 31362 B)
************************
CORRUPT FILES: 253
MISSING BLOCKS: 253
MISSING SIZE: 7473074 B
************************
Minimally replicated blocks: 67443 (99.626274 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9887881
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 6
Number of racks: 1
FSCK ended at Wed May 10 10:01:56 CST 2017 in 1357 milliseconds

The filesystem under path '/user/admin/data/cdn/20170509' is CORRUPT


(1) Find the problem files (tmp.txt here presumably holds the saved fsck output)

cat tmp.txt |tr '/' '\n' |grep ngaahcs-acc |tr ':' ' '|awk '{print $1}' |sort |uniq |grep -v "2017112318"


(2) Preferred fix: delete the .tmp files
hdfs dfs -rm -r /user/admin/data/cdn/20170509/*.tmp

However, this did not solve the problem.
(3) After deleting the .tmp files, run the check again
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite


Alternatively, find those files this way:
[root@eeeee spark]# hdfs fsck /user/admin/data/cdn/20170509 -openforwrite |grep "/user/admin/data/cdn//20170509" 
Connecting to namenode via http://rrrrrr:50070

/user/admin/data/cdn//20170509/ngaahcs-access.log..201705090002.1494259322790.gz 250 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log..201705090002.1494259322790.gz: MISSING 1 blocks of total size 250 B.......
/user/admin/data/cdn//20170509/ngaahcs-access.log.705090000.1494259200039.gz 1222 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.l4.201705090000.1494259200039.gz: MISSING 1 blocks of total size 1222
/user/admin/data/cdn//20170509/ngaahcs-access.log.C2-3l4.201705090245.1494269103909.gz 211 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CTSX2-3l4.201705090750.1494287404133.gz 1504 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-3l4.201705090820.1494289204450.gz 308 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.C2-3l4.201705091545.1494315903839.gz 437 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.SX3-3l3.201705090002.1494259321230.gz 1075 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CX3-3l4.201705090001.1494259260581.gz 521 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-X3-3l4.201705090001.1494259260581.gz: MISSING 1 blocks of total size 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-SX3-3l4.201705090002.1494259320807.gz 729 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-GX-GD-SX4-3l4.201705090001.1494259260236.gz 1138 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-3l4.201705090001.1494259260236.gz: MISSING 1 blocks of total size 1138 B.........................
/user/admin/data/cdn//20170509/ngaahcs-access.log.CTX9-3n3.201705090001.1494259260495.gz 2379 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CXq-3k1.201705090002.1494259320204.gz: MISSING 1 blocks of total size 10153
/user/admin/data/cdn//20170509/ngaahcs-access.log.CTXq-3k2.201705090001.1494259260772.gz 539 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-GXq-3n1.201705090002.1494259320328.gz 1278 bytes, 1 block(s), OPENFORWRITE: 
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-G-3n2.201705090001.1494259260696.gz 2183 bytes, 1 block(s), OPENFORWRITE: 
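The listing above mixes OPENFORWRITE lines, MISSING lines, and progress dots. As a rough sketch, the open files can be reduced to a clean, de-duplicated list of paths. This assumes each OPENFORWRITE line starts with the path (possibly preceded by progress dots) and that paths contain no whitespace; the function name list_open_files is hypothetical.

```shell
#!/usr/bin/env bash
# Read `hdfs fsck <path> -openforwrite` output on stdin and print the
# unique paths of files flagged OPENFORWRITE, one per line.
# Assumption: the path is the first whitespace-separated field.
list_open_files() {
  awk '/OPENFORWRITE/ {
         sub(/^\.+/, "", $1)   # drop fsck progress dots, if any
         print $1
       }' | sort -u
}

# Example (needs a running cluster, so it is left commented out):
#   hdfs fsck /user/admin/data/cdn/20170509 -openforwrite | list_open_files > open_files.txt
```

The resulting file list can then be reviewed before deciding whether to delete or recover each file.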

If the files are not important, delete them:

hdfs dfs -rm -r /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.CT.201705090002.1494259322790.gz
hdfs dfs -rm -r /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.C.201705090002.1494259322790.gz
hdfs dfs -rm -r /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.CT-.201705090000.1494259200039.gz
hdfs dfs -rm -r /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.CT-.201705090245.1494269103909.gz
hdfs dfs -rm -r /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.CT-Gl4.201705090750.1494287404133.gz
hdfs dfs -rm -r /user/admin/data/cdn/meitu/20170509/ngaahcs-access.log.CT-G3l4.201705090820.1494289204450.gz

Check again:
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite
Total size: 2115004402 B
Total dirs: 1
Total files: 67337
Total symlinks: 0
Total blocks (validated): 67337 (avg. block size 31409 B)
Minimally replicated blocks: 67337 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 6
Number of racks: 1
FSCK ended at Wed May 10 10:16:52 CST 2017 in 1329 milliseconds

The filesystem under path '/user/admin/data/cdn//20170509' is HEALTHY
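The re-check can also be scripted so that the Spark job is only re-run once the path is clean. This is a minimal sketch, assuming fsck ends its report with a "The filesystem under path ... is HEALTHY" line as shown above; the function name fsck_is_healthy is hypothetical. Note that the check should use -openforwrite: in step 3 the plain fsck run reported HEALTHY even though files were still open.

```shell
#!/usr/bin/env bash
# Read fsck output on stdin; succeed (exit status 0) only if the
# report says the filesystem under the checked path is HEALTHY.
fsck_is_healthy() {
  grep -q "The filesystem under path .* is HEALTHY"
}

# Example (needs a running cluster, so it is left commented out):
#   if hdfs fsck /user/admin/data/cdn/20170509 -openforwrite | fsck_is_healthy; then
#     echo "path is healthy, safe to re-run the job"
#   fi
```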

Then run the Spark program again.

Note: this is not a final solution, so the root cause still needs to be investigated.


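If the files are important, they need to be repaired.
Check the state of each file one by one and recover it.
Take this file as an example: /user/admin/data/cdn//20170508/ngaahcs-access.log.3k3.201705081700.1494234003128.gz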


Run the recovery command:

hdfs debug recoverLease -path <path-of-the-file> -retries <retry times>
hdfs debug recoverLease -path /user/admin/data/cdn//20170508/ngaahcs-access.log.C00.1494234003128.gz -retries 10
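With hundreds of open files, running the command by hand for each one is tedious. The loop below is a minimal sketch that applies the same recoverLease command to every path read from stdin. The function name recover_leases and the HDFS_CMD override (useful for a dry run with echo) are hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# Run `hdfs debug recoverLease` for each path read from stdin.
# $1: number of retries (defaults to 10, as in the example above).
# HDFS_CMD: hypothetical override for the hdfs binary, e.g. HDFS_CMD=echo
#           to print the commands instead of running them.
recover_leases() {
  local retries="${1:-10}"
  local hdfs_cmd="${HDFS_CMD:-hdfs}"
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue   # skip blank lines
    # </dev/null keeps the command from consuming the path list on stdin
    "$hdfs_cmd" debug recoverLease -path "$path" -retries "$retries" </dev/null
  done
}

# Dry run, printing the commands that would be executed:
#   HDFS_CMD=echo recover_leases 10 < open_files.txt
```

Here open_files.txt stands for any file with one HDFS path per line, for example a list extracted from the fsck -openforwrite output.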


Hadoop command reference:

https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#fsck




