Flume Introduction
Flume installation
1. Unzip the Flume installation package into the /itcast/ directory:
tar -zxvf <flume installation package> -C /itcast/
2. Modify the Flume configuration files
2.1 flume-env.sh
Rename the template file:
mv flume-env.sh.template flume-env.sh
Add JAVA_HOME, to make sure the JDK Flume uses is the same one HDFS uses (you can run echo $JAVA_HOME to view the JDK path used by the current machine).
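A minimal sketch of this step (the JDK path below is hypothetical; substitute whatever echo $JAVA_HOME prints on your machine):
echo $JAVA_HOME                              # view the JDK path used by the current machine
vi flume-env.sh                              # then add that path on a line like:
export JAVA_HOME=/usr/local/jdk1.7.0_79      # hypothetical path; use your own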
2.2 Write the agent configuration file a4.conf
Define the names of the agent, source, channel, and sink
a4.sources = r1
a4.channels = c1
a4.sinks = k1
Define the source
a4.sources.r1.type = spooldir    # the concrete implementation class is loaded via reflection
a4.sources.r1.spoolDir = /root/logs    # the directory to monitor
Define the channel
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000    # maximum number of events held in the channel
a4.channels.c1.transactionCapacity =    # transaction capacity
Define an interceptor to add a timestamp to each message
a4.sources.r1.interceptors = i1
a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
Define the sink
a4.sinks.k1.type = hdfs
a4.sinks.k1.hdfs.path = hdfs://ns1/flume/%Y%m%d    # directory generated dynamically from the event timestamp
a4.sinks.k1.hdfs.filePrefix = events-    # prefix of the generated files
a4.sinks.k1.hdfs.fileType = DataStream    # receive as plain text
Do not roll files based on the number of events
a4.sinks.k1.hdfs.rollCount = 0    # number of events per file; 0 disables count-based rolling
Roll to a new file when the file on HDFS reaches 128 MB
a4.sinks.k1.hdfs.rollSize = 134217728    # file size in bytes that triggers a roll
Roll to a new file on HDFS every 60 seconds
a4.sinks.k1.hdfs.rollInterval = 60    # time interval in seconds that triggers a roll
Assemble the source, channel, and sink
a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1
3. Start Flume
First switch to the /itcast/apache-flume-1.5.0-bin/ directory, then enter the command:
bin/flume-ng agent -n a4 -c conf -f conf/a4.conf -Dflume.root.logger=INFO,console
Command explanation: agent runs a Flume agent; -n a4 is the agent name and must match the property prefix in the configuration file; -c conf points to Flume's configuration directory; -f conf/a4.conf is the agent configuration file; -Dflume.root.logger=INFO,console prints INFO-level logs to the console.
After startup you may encounter the following errors; they are listed here for anyone who runs into them.
Error 1:
Workaround: a jar package is missing; copy /itcast/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar to the /itcast/apache-flume-1.5.0-bin/lib/ folder, using the scp command:
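A sketch of the copy, assuming the Flume machine is 192.168.1.204 (the host used in the scp command under Error 5; substitute your own):
scp /itcast/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar 192.168.1.204:/itcast/apache-flume-1.5.0-bin/lib/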
Error 2:
Workaround: a jar package is missing; copy /itcast/hadoop-2.6.0/share/hadoop/common/lib/commons-configuration-.jar to the /itcast/apache-flume-1.5.0-bin/lib/ folder.
Error 3:
Workaround: copy /itcast/hadoop-2.6.0/share/hadoop/common/lib/hadoop-auth-2.6.0.jar to the Flume lib directory.
Error 4:
Fix: create the logs directory under /root: mkdir /root/logs
Error 5:
Resolution: give Flume the configuration information for the nameservice ns1.
1) Copy core-site.xml and hdfs-site.xml to Flume's conf directory:
scp /itcast/hadoop-2.6.0/etc/hadoop/{core-site.xml,hdfs-site.xml} 192.168.1.204:/itcast/apache-flume-1.5.0-bin/conf
2) Modify the /etc/hosts file so the host knows the IP addresses of itcast01 and itcast02:
Add mappings between the IP addresses and host names of itcast01 and itcast02, as in the example below.
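A sketch of the /etc/hosts entries (the IP addresses are assumptions; use your cluster's actual ones):
192.168.1.201  itcast01
192.168.1.202  itcast02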
3) Copy hadoop-hdfs-2.6.0.jar to the Flume lib directory.
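A sketch of this copy, assuming the jar sits in the standard Hadoop 2.6.0 layout:
cp /itcast/hadoop-2.6.0/share/hadoop/hdfs/hadoop-hdfs-2.6.0.jar /itcast/apache-flume-1.5.0-bin/lib/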
If the log output keeps scrolling continuously, there is no problem: Flume started successfully.
3.1 Write test
Now, if you drop a file into the /root/logs directory, Flume writes the contents of the file to HDFS.
First execute the command:
bin/flume-ng agent -n a4 -c conf -f conf/a4.conf -Dflume.root.logger=INFO,console
After starting Flume, place the log file Access_2013_05_30.log under the /root/logs folder:
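For example (assuming the log file sits in the current directory):
cp Access_2013_05_30.log /root/logs/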
Viewing HDFS through the web page, a new directory /flume appears; inside it is a directory named by date, 20160618, and the files sit under this /flume/20160618 folder.
Question: why were 3 files generated when only 1 was written? And the sizes of these 3 files add up exactly to the size of the log file Access_2013_05_30.log.
Cause: the sink is configured to roll a new file every 60 seconds or when the buffered file reaches 134217728 bytes (128 MB). Since writing the whole file takes a few minutes, the size condition is never reached, so a portion is read every 60 s and written to HDFS as a separate file; an ingest of roughly three minutes therefore produces 3 files.
4. Another type of Flume configuration
Source: exec
Channel: memory
Sink: logger
It is started the same way as before, except that a different configuration file is read in:
bin/flume-ng agent -n a2 -f /home/hadoop/a2.conf -c conf -Dflume.root.logger=INFO,console
The a2.conf configuration file:
Define the names of the agent, source, channel, and sink
a2.sources = r1
a2.channels = c1
a2.sinks = k1
Define the source
a2.sources.r1.type = exec
a2.sources.r1.command = tail -f /home/hadoop/a.log
Define the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
Define the sink
a2.sinks.k1.type = logger
Assemble the source, channel, and sink
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
When data is written to this log file, Flume collects it and prints it on the console.
a) This works just like the tail -f file command:
echo 111111 >> log
echo 222222 >> log
echo 333333 >> log
tail -f blocks and prints each record as it is appended to the log file.
b) Run Flume with the a2.conf configuration file.
Copy the a2.conf file to the flume/conf directory.
Create a file named log under the /root directory.
Run the command:
bin/flume-ng agent -n a2 -f /itcast/apache-flume-1.5.0-bin/conf/a2.conf -c conf -Dflume.root.logger=INFO,console
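To verify, append a line in another terminal to the file the exec source tails (/home/hadoop/a.log in the a2.conf above); each appended record should then show up as an event on Flume's console:
echo 111111 >> /home/hadoop/a.log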