Hadoop cluster daily operation and maintenance



(i) Back up the NameNode metadata
The metadata on the NameNode is critical: if it is lost or corrupted, the entire file system becomes unusable. The metadata should therefore be backed up regularly, preferably to an offsite location.
1. Copy metadata to the remote site
(1) The following script copies the metadata under the SecondaryNameNode's current directory into a directory named after the current time, then sends it to another machine with the scp command:
#!/bin/bash
# Copy the SecondaryNameNode checkpoint into a time-stamped directory,
# ship it to a remote machine, then clean up the local copy.
export dirname=/mnt/tmphadoop/dfs/namesecondary/current/`date +%y%m%d%H`
if [ ! -d ${dirname} ]
then
    mkdir ${dirname}
    cp /mnt/tmphadoop/dfs/namesecondary/current/* ${dirname}
fi
scp -r ${dirname} slave1:/mnt/namenode_backup/
rm -r ${dirname}
(2) Configure crontab to run this script periodically (here at 00:00, 08:00, 14:00 and 20:00 every day):
0 0,8,14,20 * * * bash /mnt/scripts/namenode_backup_script.sh

2. At the remote site, start a local NameNode daemon and attempt to load the backup files, to verify that the backup is actually usable.
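A minimal sketch of such a check, assuming Hadoop 1.x on the backup machine, an empty dfs.name.dir, and fs.checkpoint.dir in its hdfs-site.xml pointing at the directory that holds the copied files (this setup is illustrative, not part of the original article):

# Rebuild the local NameNode's metadata from the backed-up checkpoint:
# the NameNode reads fs.checkpoint.dir and saves a fresh image into dfs.name.dir.
hadoop namenode -importCheckpoint

If the NameNode starts cleanly from the checkpoint, the backup is loadable; corruption in the backed-up files will surface as errors at this point.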

(ii) Data backup
For important data, you cannot rely entirely on HDFS; you also need a separate backup. Note the following points:
(1) Back up as completely and as regularly as possible.
(2) If you use distcp to back up to another HDFS cluster, do not run the same version of Hadoop on both clusters, so that a bug in Hadoop itself cannot corrupt both copies (see the sketch below).
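A minimal sketch of such a distcp backup; host names, ports, and paths below are illustrative. When the two clusters run different Hadoop versions, reading from the source over the version-independent HFTP interface, with the command run on the destination cluster, is the usual approach:

# Same-version clusters: straight HDFS-to-HDFS copy
hadoop distcp hdfs://namenode1:8020/important-data hdfs://namenode2:8020/backup/important-data

# Different versions: read via HFTP (run this on the destination cluster)
hadoop distcp hftp://namenode1:50070/important-data hdfs://namenode2:8020/backup/important-data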

(iii) File system check
Periodically run HDFS's fsck tool over the entire file system to proactively find missing or corrupt blocks.
Running it once a day is recommended.
[jediael@master ~]$ hadoop fsck /
...output omitted (errors are printed as they are found; otherwise only dots appear, one dot per file checked)...
Status: HEALTHY
 Total size:    14466494870 B
 Total dirs:    502
 Total files:   1592 (Files currently being written: 2)
 Total blocks (validated):      1725 (avg. block size 8386373 B)
 Minimally replicated blocks:   1725 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       648 (37.565216 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              760 (22.028986 %)
 Number of data-nodes:          2
 Number of racks:               1
FSCK ended at Sun Mar 01 20:17:57 CST 2015 in 608 milliseconds

The filesystem under path '/' is HEALTHY

(1) If dfs.replication in hdfs-site.xml is set to 3 but there are only 2 DataNodes, fsck reports errors such as:
/hbase/mar0109_webpage/59ad1be6884739c29d0624d1d31a56d9/il/43e6cd4dc61b49e2a57adf0c63921c09:  Under replicated blk_-4711857142889323098_6221. Target Replicas is 3 but found 2 replica(s).

Note: dfs.replication was originally 3; later a DataNode was removed and dfs.replication was changed to 2, but files created earlier still carry a recorded replication factor of 3. This produces the error above, and it is why the summary shows "Under-replicated blocks: 648 (37.565216 %)".
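One way to clear these warnings, sketched under the assumption that a replication factor of 2 is what you actually want (the path is illustrative): hadoop fs -setrep rewrites the replication factor recorded on existing files.

# Recursively reset existing files to replication factor 2;
# -w waits until each block actually reaches the target count.
hadoop fs -setrep -R -w 2 /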

(2) The fsck tool can also be used to check which blocks a file consists of, and where each block is located:
[jediael@master conf]$ hadoop fsck /hbase/feb2621_webpage/c23aa183c7cb86af27f15d4c2aee2795/s/30bee5fb620b4cd184412c69f70d24a7 -files -blocks -racks
FSCK started by jediael from /10.171.29.191 for path /hbase/feb2621_webpage/c23aa183c7cb86af27f15d4c2aee2795/s/30bee5fb620b4cd184412c69f70d24a7 at Sun Mar 01 20:39:35 CST 2015
/hbase/feb2621_webpage/c23aa183c7cb86af27f15d4c2aee2795/s/30bee5fb620b4cd184412c69f70d24a7 21507169 bytes, 1 block(s):  Under replicated blk_7117944555454804881_3655. Target Replicas is 3 but found 2 replica(s).
0. blk_7117944555454804881_3655 len=21507169 repl=2 [/default-rack/10.171.94.155:50010, /default-rack/10.251.0.197:50010]

Status: HEALTHY
 Total size:    21507169 B
 Total dirs:    0
 Total files:   1
 Total blocks (validated):      1 (avg. block size 21507169 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (100.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              1 (50.0 %)
 Number of data-nodes:          2
 Number of racks:               1
FSCK ended at Sun Mar 01 20:39:35 CST 2015 in 0 milliseconds

The filesystem under path '/hbase/feb2621_webpage/c23aa183c7cb86af27f15d4c2aee2795/s/30bee5fb620b4cd184412c69f70d24a7' is HEALTHY

The use of this command is as follows:
[jediael@master ~]$ hadoop fsck -files
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
        <path>          start checking from this path
        -move           move corrupted files to /lost+found
        -delete         delete corrupted files
        -files          print out files being checked
        -openforwrite   print out files opened for write
        -blocks         print out block report
        -locations      print out locations for every block
        -racks          print out network topology for data-node locations
                By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
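As the usage text shows, fsck only reports problems; repairing them is a separate, deliberate step. A brief sketch of the two built-in remedies (apply with care, and only after understanding what is corrupt):

# Move corrupt files to /lost+found, preserving their remaining healthy blocks:
hadoop fsck / -move

# Or delete corrupt files outright, if they can be restored from backup:
hadoop fsck / -delete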

For a detailed explanation, see "Hadoop: The Definitive Guide", p. 376.

(iv) Balancer
Over time, the distribution of blocks across DataNodes becomes increasingly unbalanced, which reduces MapReduce data locality and leaves some DataNodes relatively busier than others.

The balancer is a Hadoop daemon that moves blocks from busy DataNodes to relatively idle ones, while still honoring the block replica placement policy of spreading replicas across different machines and racks.

It is recommended to run the balancer regularly, for example daily or weekly.

(1) Run the balancer with the following command:
[jediael@master log]$ start-balancer.sh
starting balancer, logging to /var/log/hadoop/hadoop-jediael-balancer-master.out
View the log as follows:
[jediael@master hadoop]$ pwd
/var/log/hadoop
[jediael@master hadoop]$ ls
hadoop-jediael-balancer-master.log  hadoop-jediael-balancer-master.out
[jediael@master hadoop]$ cat hadoop-jediael-balancer-master.log
2015-03-01 21:08:08,027 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.251.0.197:50010
2015-03-01 21:08:08,028 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.171.94.155:50010
2015-03-01 21:08:08,028 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 over utilized nodes:
2015-03-01 21:08:08,028 INFO org.apache.hadoop.hdfs.server.balancer.Balancer: 0 under utilized nodes:


(2) The balancer moves blocks until each DataNode's utilization (used space as a proportion of its capacity) is close to the utilization of the cluster as a whole; how "close" is specified by the -threshold parameter, which defaults to 10% (see the sketch after the next point).
(3) The bandwidth used to copy data between nodes is limited, by default to 1 MB/s; it can be changed with the dfs.balance.bandwidthPerSec property in hdfs-site.xml (specified in bytes per second).
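A minimal sketch of both settings; the 5% threshold and 4 MB/s bandwidth below are illustrative values, not defaults:

# Rebalance until every DataNode's utilization is within
# 5 percentage points of the cluster average:
start-balancer.sh -threshold 5

And in hdfs-site.xml, raising the copy bandwidth from the 1 MB/s default:

<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <!-- 4 MB/s, specified in bytes per second -->
  <value>4194304</value>
</property>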



