- Flume Monitoring Directory Operations
One: Flume monitoring-directory file extraction. Requirements:
Monitor a directory: whenever a file matching the configured conditions appears in it, Flume extracts its contents to HDFS. The directory may hold several kinds of files. For example, a name ending in .log.tmp means the file is still being written; once the .tmp file reaches the configured size threshold it is renamed to end in .log, marking it as a complete file (this state is usually short-lived) whose data Flume can extract; a name ending in .log.COMPLETED means Flume has finished extracting it and the file can be deleted.
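The three filename states can be simulated locally; the file name and temp directory below are illustrative, not part of the original setup:

```shell
# Simulate the three filename states Flume distinguishes.
dir=$(mktemp -d)            # stand-in for the monitored directory

# 1. Still being written: the .tmp suffix tells Flume to ignore it.
echo "event 1" > "$dir/app.log.tmp"

# 2. Complete: renamed to .log, so Flume may now extract it.
mv "$dir/app.log.tmp" "$dir/app.log"

# 3. Extracted: Flume's spooldir source renames it with the completed suffix.
mv "$dir/app.log" "$dir/app.log.COMPLETED"

ls "$dir"
# → app.log.COMPLETED
```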
1.1 Creating the local directories used for extraction
mkdir /home/hadoop/datas/spooling
mkdir /home/hadoop/datas/checkpoint
mkdir /home/hadoop/datas/data
1.2 Creating the HDFS directory that will hold the extracted data
hdfs dfs -mkdir /spool
1.3 Preparing test data: more than three files, of two types
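For instance (the file names and contents below are made up for illustration), the sample set could look like this:

```shell
spool=$(mktemp -d)          # stand-in for /home/hadoop/datas/spooling

# Two finished files of the first type: Flume should extract these.
echo "2024-01-01 INFO started" > "$spool/web1.log"
echo "2024-01-01 INFO stopped" > "$spool/web2.log"

# A finished file of a second type.
echo "user=alice action=login" > "$spool/access1.log"

# One file still being written: the .tmp suffix keeps Flume away from it.
echo "half a record" > "$spool/web3.log.tmp"
```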
1.4 Preparing the Agent configuration file
cp -p hive-conf.properties test-dir.properties
vim test-dir.properties
# example.conf: a single-node Flume configuration

# Name the components on this agent
a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /home/hadoop/datas/spooling
a3.sources.r3.ignorePattern = ^(.)*\\.tmp$

# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://namenode01.hadoop.com:8020/spool/%y%m/%d
a3.sinks.k3.hdfs.fileType = DataStream
a3.sinks.k3.hdfs.writeFormat = Text
a3.sinks.k3.hdfs.batchSize = 10

# Cut the second-level directory by the hour
a3.sinks.k3.hdfs.round = true
a3.sinks.k3.hdfs.roundValue = 1
a3.sinks.k3.hdfs.roundUnit = hour

# Set the file roll conditions
a3.sinks.k3.hdfs.rollInterval = 60
a3.sinks.k3.hdfs.rollSize = 128000000
a3.sinks.k3.hdfs.rollCount = 0
a3.sinks.k3.hdfs.useLocalTimeStamp = true
a3.sinks.k3.hdfs.minBlockReplicas =

# Use a channel which buffers events on local disk
a3.channels.c3.type = file
a3.channels.c3.checkpointDir = /home/hadoop/datas/checkpoint
a3.channels.c3.dataDirs = /home/hadoop/datas/data

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
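The ignorePattern is a Java regular expression (the doubled backslash is properties-file escaping); its effect can be sanity-checked with grep -E, which accepts an equivalent pattern:

```shell
# Only names ending in .tmp match the ignore pattern.
printf 'web1.log\nweb3.log.tmp\n' | grep -E '^(.)*\.tmp$'
# → web3.log.tmp
```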
1.5 Executing the collection command
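The section heading omits the command itself; a sketch, assuming Flume's standard flume-ng launcher, that test-dir.properties was saved into Flume's conf directory, and the agent name a3 from the configuration above (adjust paths to your install):

```shell
# Start agent a3 (the name used throughout test-dir.properties),
# logging to the console so the extraction can be watched.
bin/flume-ng agent \
  --conf conf \
  --conf-file conf/test-dir.properties \
  --name a3 \
  -Dflume.root.logger=INFO,console
```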
Create a test file in the monitored directory to verify the flow.
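One safe way to create that test file (the directories below are stand-ins for the real paths): write it outside the monitored directory first, then mv it in, since mv within one filesystem is atomic and Flume never sees a partially written file.

```shell
spool=$(mktemp -d)           # stand-in for /home/hadoop/datas/spooling
staging=$(mktemp -d)         # write the file outside the monitored dir first

echo "hello flume" > "$staging/test.log"
mv "$staging/test.log" "$spool/test.log"   # atomic hand-off to Flume
```

Once the agent has processed the file, it is renamed to end in .COMPLETED and its records appear under /spool on HDFS (check with hdfs dfs -ls -R /spool).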