Shell script to check the validity of files that Flume writes to HDFS

Source: Internet
Author: User
Tags: hadoop fs

While using Flume, we found that network problems, HDFS problems, and other causes sometimes leave the logs that Flume collects into HDFS in an abnormal state. The symptoms are:

1. Files that were never closed: files ending in .tmp (the default suffix). Files added to HDFS should be gz-compressed files, and a file that still ends in .tmp cannot be used.

2. Files of size 0, for example a gz-compressed file whose size is 0. When we took such a file out and decompressed it on its own, the decompression appeared to loop forever. Such a file cannot be used directly as MapReduce input.
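The two symptoms above can be recognised from a directory listing alone. The following is a minimal sketch, assuming the usual eight-column `hadoop fs -ls` output (size in column 5, path in column 8) and paths without spaces. The function name `classify_listing` and the sample paths are hypothetical; the function only prints what it would do and changes nothing in HDFS.

```shell
#!/bin/sh
# classify_listing (hypothetical helper): reads "hadoop fs -ls" style lines
# on stdin and prints one suggested action per problem file.
classify_listing() {
    awk '
        NF < 8   { next }                     # skip the "Found N items" header
        /^d/     { next }                     # skip directories (they also show size 0)
        $5 == 0  { print "DELETE", $8; next } # symptom 2: empty file
        $8 ~ /\.tmp$/ {                       # symptom 1: file never closed
            renamed = $8
            sub(/\.tmp$/, "", renamed)
            print "RENAME", $8, renamed
        }
    '
}

# Sample listing (made-up paths) to show the shape of the output.
printf '%s\n' \
    'Found 3 items' \
    '-rw-r--r--   3 flume hadoop          0 2014-12-08 10:00 /data/a/2014/12/08/x.gz' \
    '-rw-r--r--   3 flume hadoop       1024 2014-12-08 10:05 /data/a/2014/12/08/y.gz.tmp' \
    '-rw-r--r--   3 flume hadoop       2048 2014-12-08 10:06 /data/a/2014/12/08/z.gz' \
    | classify_listing
```

Against a real cluster, the listing would come from piping `hadoop fs -ls` on the directory of interest into the function instead of the sample lines.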

So far these are the only two problems we have found. We do not know why they occur, but both interfere with Hive and MapReduce jobs: case 2 makes the job fail outright, and case 1 may lose the corresponding data.

For case 2 we simply delete the file. For case 1 we found that removing the .tmp suffix is enough. So we wrote a shell script that checks the HDFS files on a schedule, removes the .tmp suffix when it finds case 1, and deletes the file when it finds case 2. The script is as follows:

    #!/bin/bash
    # bash is required for the $'\n' syntax used in the loop below
    cd "$(dirname "$0")"
    # Yesterday's date in the directory layout, e.g. 2014/12/08
    DATE=$(date -d "1 day ago" +%Y/%m/%d)
    echo "date is ${DATE}"
    HADOOP_HOME=/usr/lib/hadoop-0.20-mapreduce/
    DATADIR=/data/*/
    echo "dir is ${DATADIR}"
    echo "Check HDFS file is correct?"

    # Split the listing on newlines only, so each ls line is one loop item.
    # The path is quoted so that HDFS, not the local shell, expands the *.
    IFS=$'\n'; for name in $(${HADOOP_HOME}/bin/hadoop fs -ls "${DATADIR}${DATE}")
    do
        size=$(echo "${name}" | awk '{print $5}')          # column 5: size in bytes
        fileallname=$(echo "${name}" | awk '{print $8}')   # column 8: full path
        filenamenotmp=${fileallname%.tmp}                  # path without the .tmp suffix
        tmp=${fileallname#*.gz}                            # whatever follows ".gz"
        if [ "${size}" = "0" ]; then
            echo "${fileallname}'s size is ${size} ..... delete it!"
            ${HADOOP_HOME}/bin/hadoop fs -rmr ${fileallname}
        elif [ "${tmp}" = ".tmp" ]; then
            ${HADOOP_HOME}/bin/hadoop fs -mv ${fileallname} ${filenamenotmp}
            echo "${fileallname} has changed to ${filenamenotmp} ..."
        fi

    done

Note: HDFS paths support wildcards (glob patterns), as used in the DATADIR setting on line 8 of the script. With that setting, the HDFS directories checked look like /data/*/2014/12/08. Change the path to suit your own layout.
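As a small illustration of how that path is put together, here is a sketch that assumes GNU `date` (the `-d "1 day ago"` option is a GNU extension). The function name `yesterday_path` is hypothetical. The base directory is passed in single quotes so the local shell leaves the `*` alone and HDFS can expand it later.

```shell
#!/bin/sh
# yesterday_path (hypothetical helper): append yesterday's date, formatted as
# YYYY/MM/DD, to a base directory that may contain an HDFS wildcard.
yesterday_path() {
    base_dir=$1
    printf '%s%s\n' "${base_dir}" "$(date -d "1 day ago" +%Y/%m/%d)"
}

# Quote the result wherever it is used, so the * reaches HDFS unexpanded.
target=$(yesterday_path '/data/*/')
echo "would check: ${target}"
```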

You can use crontab to run the check regularly.
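A possible schedule is sketched below. The script location, log file, and run time (01:30 every day) are assumptions chosen for illustration, not values from the original setup. The snippet only prints the crontab line; the install command is left as a comment.

```shell
#!/bin/sh
# Hypothetical paths: adjust to wherever the check script and its log live.
SCRIPT=/opt/scripts/check_flume_hdfs.sh
LOG=/var/log/check_flume_hdfs.log

# Five time fields (minute hour day-of-month month day-of-week), then the command.
CRON_LINE="30 1 * * * ${SCRIPT} >> ${LOG} 2>&1"
echo "${CRON_LINE}"

# To append it to the current user's crontab, one common idiom is:
#   ( crontab -l 2>/dev/null; echo "${CRON_LINE}" ) | crontab -
```

Running shortly after midnight fits the script's use of yesterday's date, since the previous day's directory should no longer be receiving new files by then.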
