Collecting data to HDFS at timed intervals with a shell script


Online websites generate log data every day. Suppose there is a requirement: starting at 24:00 (midnight) each day, the log files generated the previous day must be uploaded to the HDFS cluster.

How can this be done? Once the upload is implemented, how can it be made to recur on a schedule?

Linux crontab:
crontab -e
0 0 * * * /shell/uploadfile2hdfs.sh    # runs daily at 00:00 (midnight)

Implementation process

How log files are generated and rolled is generally determined by the business system: for example, rolling once per hour, or rolling once a file reaches a certain size, so that no single log file grows too large to handle conveniently.
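For illustration, a minimal size-based rotation sketch in shell; the 64 MB threshold and the paths are assumptions for this example, not part of the original setup:

# roll access.log to the next free numeric suffix once it exceeds a size limit,
# producing the access.log.1, access.log.2, ... files the upload script expects
MAX_BYTES=67108864   # 64 MB, an assumed threshold
LOG=/root/logs/log/access.log
if [ -f "$LOG" ] && [ "$(stat -c%s "$LOG")" -ge "$MAX_BYTES" ]; then
    i=1
    while [ -f "$LOG.$i" ]; do i=$((i+1)); done   # find the next free numeric suffix
    mv "$LOG" "$LOG.$i"                           # roll the current file to access.log.N
    : > "$LOG"                                    # start a fresh, empty access.log (assumes the writer appends)
fi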

For example, suppose rolled files are named access.log.x, where x is a number, and the file currently being written is called access.log. In that case, any file whose suffix is a number such as 1, 2, 3 meets the requirement and can be uploaded; such files are first moved to a working directory of files awaiting upload. Once files appear in that workspace, they can be uploaded with the hadoop fs -put command.
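The selection rule reduces to a single glob test; a small sketch of the check the script below relies on:

# files named access.log.<number> are rolled and safe to move; access.log itself is skipped
for f in access.log access.log.1 access.log.2; do
    if [[ "$f" == access.log.* ]]; then
        echo "$f -> eligible, move to /root/logs/toupload/"
    else
        echo "$f -> skip (still being written)"
    fi
done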

Create the directories on the server:
# directory where log files are stored
mkdir -p /root/logs/log/
# directory where files awaiting upload are stored
mkdir -p /root/logs/toupload/

Write the shell script
vi uploadfile2hdfs.sh
#!/bin/bash

# set java env
export JAVA_HOME=/export/servers/jdk1.8.0_65
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

# set hadoop env
export HADOOP_HOME=/export/servers/hadoop-2.7.4
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH

# directory where log files are stored
log_src_dir=/root/logs/log/

# directory where files awaiting upload are stored
log_toupload_dir=/root/logs/toupload/

# root path on HDFS that the log files are uploaded to
date1=`date -d last-day +%Y_%m_%d`
hdfs_root_dir=/data/clicklog/$date1/

# print environment variable info
echo "envs: hadoop_home: $HADOOP_HOME"

# read the log file directory and determine whether there are files to upload
echo "log_src_dir: $log_src_dir"
ls $log_src_dir | while read fileName
do
    if [[ "$fileName" == access.log.* ]]; then
        # if [ "access.log" = "$fileName" ]; then
        date=`date +%Y_%m_%d_%H_%M_%S`
        # move the file to the to-upload directory and rename it
        # print info
        echo "moving $log_src_dir$fileName to $log_toupload_dir"xxxxx_click_log_$fileName"$date"
        mv $log_src_dir$fileName $log_toupload_dir"xxxxx_click_log_$fileName"$date
        # append the path of the file awaiting upload to a list file willDoing
        echo $log_toupload_dir"xxxxx_click_log_$fileName"$date >> $log_toupload_dir"willDoing."$date
    fi
done

# find the willDoing list files
ls $log_toupload_dir | grep will | grep -v "_COPY_" | grep -v "_DONE_" | while read line
do
    # print info
    echo "toupload is in file: "$line
    # rename the list file willDoing to willDoing_COPY_
    mv $log_toupload_dir$line $log_toupload_dir$line"_COPY_"
    # read the contents of willDoing_COPY_ (one path per line); here line is the path of a file awaiting upload
    cat $log_toupload_dir$line"_COPY_" | while read line
    do
        # print info
        echo "puting...$line to hdfs path.....$hdfs_root_dir"
        hadoop fs -mkdir -p $hdfs_root_dir
        hadoop fs -put $line $hdfs_root_dir
    done
    # rename willDoing_COPY_ to willDoing_DONE_
    mv $log_toupload_dir$line"_COPY_" $log_toupload_dir$line"_DONE_"
done
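Note the renaming protocol used for the list files: a list starts as willDoing.<date>, is renamed with a _COPY_ suffix while its entries are being uploaded, and with a _DONE_ suffix once everything is on HDFS. Because the second loop filters with grep -v "_COPY_" and grep -v "_DONE_", a re-run of the script (for example, the next cron cycle) will not pick up lists that are in flight or already finished, so the same files are not uploaded twice.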

Set execution permissions
chmod 777 uploadfile2hdfs.sh

Add a few test files in /root/logs/log/, then execute the script.
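A minimal sketch for generating matching test files (the names follow the access.log.* pattern the script looks for; the contents are arbitrary placeholders):

echo "test log line 1" > /root/logs/log/access.log.1
echo "test log line 2" > /root/logs/log/access.log.2
# access.log itself will be skipped by the script, since it is treated as still being written
echo "current log line" > /root/logs/log/access.log

Then run the script: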
./uploadfile2hdfs.sh

Observe the results in /root/logs/toupload/ and in the HDFS web UI.
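The same check can be made from the command line; a short sketch, assuming the HDFS root path built by the script:

# list what was uploaded (the dated directory comes from the script's $hdfs_root_dir)
hadoop fs -ls -R /data/clicklog/
# the list files in the workspace should now carry the _DONE_ suffix
ls /root/logs/toupload/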

