Flume Monitoring Directory Operations

Source: Internet
Author: User

One: Monitoring a directory with Flume
Requirements:
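Monitor a directory: whenever a file that meets the criteria appears in it, Flume extracts that file to HDFS. The directory may hold several kinds of files. A file ending in .log.tmp is still being written; once it reaches the configured size, it is renamed to end in .log, which means it is complete (this state is often short-lived) and Flume can extract its data. A file ending in .log.COMPLETED has already been extracted by Flume and can be deleted.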
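The naming convention above can be sketched as a small shell function that reports which lifecycle stage a file name is in. This is only an illustration: the function name `file_stage` is hypothetical, and the `.COMPLETED` suffix assumes Flume's default `fileSuffix` for the spooling directory source.

```shell
#!/bin/sh
# Hypothetical helper: map a file name in the spooling directory to its stage.
# Assumes the writer uses .log.tmp / .log and Flume's default .COMPLETED suffix.
file_stage() {
  case "$1" in
    *.log.tmp)       echo "writing" ;;   # still being written; Flume must skip it
    *.log.COMPLETED) echo "ingested" ;;  # already shipped to HDFS; safe to delete
    *.log)           echo "ready" ;;     # complete; Flume may pick it up
    *)               echo "unknown" ;;   # not part of this naming scheme
  esac
}

for f in access.log.tmp access.log access.log.COMPLETED notes.txt; do
  printf '%s -> %s\n' "$f" "$(file_stage "$f")"
done
```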
1.1 Creating the local directories used for extraction
mkdir /home/hadoop/datas/spooling
mkdir /home/hadoop/datas/checkpoint
mkdir /home/hadoop/datas/data
1.2 Creating the HDFS directory that holds the extracted data
hdfs dfs -mkdir /spool
1.3 Preparing the data: three or more files, of two types
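One way to produce such test data is sketched below: it writes three complete `.log` files and one in-progress `.log.tmp` file. Treating `.log` and `.log.tmp` as the "two types", the file names and contents, and the `/tmp` default for `SPOOL_DIR` are all assumptions; set `SPOOL_DIR=/home/hadoop/datas/spooling` to use the directory created in 1.1.

```shell
#!/bin/sh
set -eu
# Assumption: defaults to a scratch directory; set SPOOL_DIR=/home/hadoop/datas/spooling for real use.
SPOOL_DIR="${SPOOL_DIR:-/tmp/flume-demo/spooling}"
mkdir -p "$SPOOL_DIR"

# Type 1: complete .log files. Each is written under a .tmp name first and then
# renamed, so the spooling source never sees a half-written .log file.
for i in 1 2 3; do
  printf 'line 1 of file %s\nline 2 of file %s\n' "$i" "$i" > "$SPOOL_DIR/app$i.log.tmp"
  mv "$SPOOL_DIR/app$i.log.tmp" "$SPOOL_DIR/app$i.log"
done

# Type 2: a file that is still "being written"; ignorePattern should make Flume skip it.
printf 'partial data\n' > "$SPOOL_DIR/app4.log.tmp"

ls -l "$SPOOL_DIR"
```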
1.4 Preparing the Agent configuration file
cp -p hive-conf.properties test-dir.properties
vim test-dir.properties
# example.conf: A single-node Flume configuration

# Name the components in this agent
a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /home/hadoop/datas/spooling
a3.sources.r3.ignorePattern = ^(.)*\\.tmp$

# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://namenode01.hadoop.com:8020/spool/%y%m/%d
a3.sinks.k3.hdfs.fileType = DataStream
a3.sinks.k3.hdfs.writeFormat = Text
a3.sinks.k3.hdfs.batchSize = 10
# Round timestamps down to the hour when resolving the time escapes in hdfs.path
a3.sinks.k3.hdfs.round = true
a3.sinks.k3.hdfs.roundValue = 1
a3.sinks.k3.hdfs.roundUnit = hour
# Set the file roll conditions (rollCount = 0 disables rolling by event count)
a3.sinks.k3.hdfs.rollInterval = 60
a3.sinks.k3.hdfs.rollSize = 128000000
a3.sinks.k3.hdfs.rollCount = 0
a3.sinks.k3.hdfs.useLocalTimeStamp = true
a3.sinks.k3.hdfs.minBlockReplicas =

# Use a file channel, which persists buffered events to disk
a3.channels.c3.type = file
a3.channels.c3.checkpointDir = /home/hadoop/datas/checkpoint
a3.channels.c3.dataDirs = /home/hadoop/datas/data

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
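A quick way to sanity-check the `ignorePattern` before starting the agent is to run the same regular expression through `grep -E`. In the properties file the backslash is doubled (`\\.`) because Java properties parsing turns `\\` into a single `\`, so the expression Flume actually receives is `^(.)*\.tmp$`. The sample file names are hypothetical, and `grep -E` (POSIX extended regex) only approximates Java regex semantics, which is adequate for a simple pattern like this one.

```shell
#!/bin/sh
# The ignorePattern value as Flume sees it after properties parsing (\\. becomes \.).
IGNORE_PATTERN='^(.)*\.tmp$'

# Hypothetical file names: report whether the spooling source would skip each one.
for name in app1.log app4.log.tmp tmp.log report.tmp.log; do
  if printf '%s\n' "$name" | grep -Eq "$IGNORE_PATTERN"; then
    echo "$name: ignored"
  else
    echo "$name: collected"
  fi
done
```

Only names that end in `.tmp` are ignored; a name that merely contains `tmp` elsewhere is still collected.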
1.5 Running the collection command:
Create a test file.
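A test file can be dropped into the spooling directory with the write-then-rename pattern from the requirements: the content goes into a `.log.tmp` file, which `ignorePattern` hides from Flume, and a rename within the same file system then exposes the finished `.log` file in a single step. This sketch assumes the agent `a3` from 1.4 is already running (for example started with `bin/flume-ng agent --conf conf --conf-file conf/test-dir.properties --name a3`, with paths depending on your installation); the file name `test.log` and the `/tmp` default for `SPOOL_DIR` are illustrative.

```shell
#!/bin/sh
set -eu
# Assumption: scratch default; use SPOOL_DIR=/home/hadoop/datas/spooling against the real agent.
SPOOL_DIR="${SPOOL_DIR:-/tmp/flume-demo/spooling}"
mkdir -p "$SPOOL_DIR"

# 1. Write the whole file under a name that ignorePattern makes Flume skip.
printf 'hello flume\nsecond line\n' > "$SPOOL_DIR/test.log.tmp"

# 2. Rename it in one step; only now can the spooling source pick it up.
mv "$SPOOL_DIR/test.log.tmp" "$SPOOL_DIR/test.log"

# 3. Once Flume has ingested it, the file should be renamed to test.log.COMPLETED
#    (default fileSuffix) and the data should appear under /spool on HDFS.
ls "$SPOOL_DIR"
```

With the agent running, `ls` should eventually show `test.log.COMPLETED`, and `hdfs dfs -ls -R /spool` should list the new output under a date-based subdirectory; while the HDFS sink still has a file open it carries a `.tmp` suffix until `rollInterval` closes it.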
