Fsck commands in Hadoop
The fsck command in Hadoop checks the files in HDFS, detects block corruption and data loss, and generates an overall health report of the HDFS file system. The report includes:
Total blocks, Average block replication, number of corrupt blocks, number of missing blocks, and so on.
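The summary section of a report looks roughly like the excerpt below. The field names are the ones fsck prints; the numbers are illustrative only:

```
Status: HEALTHY
 Total size:    1977089 B
 Total files:   34
 Total blocks (validated):      34 (avg. block size 58149 B)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          4
 Number of racks:               1
```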
-------------------------
The command is as follows:
Old version:
hadoop fsck /
New version:
hdfs fsck /
hdfs fsck --help
hdfs fsck / -files -blocks -locations
(This can be used to identify the files with missing blocks and the DataNodes on which each block is stored.)
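To locate damaged files in practice, a couple of additional fsck invocations are useful. The fsck options below are real; the sample report line, paths, and block IDs are invented for illustration:

```shell
# Commands to run against a live cluster (shown as comments because they
# need a NameNode to talk to):
#   hdfs fsck / -list-corruptfileblocks        # print only files with corrupt blocks
#   hdfs fsck /path -files -blocks -locations  # per-file block IDs and their DataNodes
#
# A canned sample of -files output (paths and block IDs are made up),
# filtered the way one might scan a saved report:
sample='/data/file1 134217728 bytes, 1 block(s):  OK
/data/file2 1048576 bytes, 1 block(s):
/data/file2: CORRUPT blockpool BP-1 block blk_1073741825'
printf '%s\n' "$sample" | grep 'CORRUPT'
```

Once the affected files are known, they can be restored from a backup, or handled with fsck's own -move (relocate corrupt files to /lost+found) or -delete options.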
Figure 1
-------------------------
If the file system is healthy, the report ends with:
Status: HEALTHY
If any corruption is found, it shows instead:
Status: CORRUPT
-------------------------
Hadoop fsck is not like Linux fsck: it is non-destructive by default and does not repair the errors it detects, so it can safely be run daily as a routine inspection. fsck is a metadata-only operation; all the information it needs is available from the NameNode, so it does not have to communicate with the DataNodes of the cluster. However, it can generate a large number of RPCs (Remote Procedure Calls) against the NameNode, so it is recommended to run it during off-peak hours.
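Since fsck is safe to run daily, it is natural to script the check. A minimal sketch, where a canned report string stands in for output that would really come from `hdfs fsck /` (the capture step and alerting hook are assumptions):

```shell
# On a real cluster you would capture the report first, e.g.:
#   hdfs fsck / > /tmp/fsck_report.txt 2>&1
# A canned report stands in here so the logic can run end to end.
report='Status: HEALTHY
 Total blocks (validated): 120
 Corrupt blocks: 0'

# Pull the value after "Status:" from the report.
status=$(printf '%s\n' "$report" | awk -F': *' '/^Status:/ {print $2}')

if [ "$status" = "HEALTHY" ]; then
  echo "fsck OK"
else
  echo "fsck reported: $status" >&2   # hook an alert or email here
fi
```

Scheduling this via cron during off-peak hours gives a daily, non-destructive health check without touching the DataNodes.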
The CCAH exam (CCA-500) tests the purpose of this command but not its detailed options and parameters; knowing the general function of fsck is sufficient.