[Flume] Example: using Flume to move web logs to HDFS
Create the HDFS directory where the logs will be stored:
$ hdfs dfs -mkdir -p /test001/weblogsflume
Create the local log input (spooling) directory:
$ sudo mkdir -p /flume/weblogsmiddle
Make the directory writable by any user so the logs can be dropped in:
$ sudo chmod -R a+w /flume
Set the configuration file contents:
$ cat /mytraining/exercises/flume/spooldir.conf
# Name the components (Flume property names are case-sensitive)
agent1.sources = webserver-log-source
agent1.sinks = hdfs-sink
agent1.channels = memory-channel
# Configure the source
agent1.sources.webserver-log-source.type = spooldir
agent1.sources.webserver-log-source.spoolDir = /flume/weblogsmiddle
agent1.sources.webserver-log-source.channels = memory-channel
# Configure the sink
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /test001/weblogsflume/
agent1.sinks.hdfs-sink.channel = memory-channel
agent1.sinks.hdfs-sink.hdfs.rollInterval = 0
agent1.sinks.hdfs-sink.hdfs.rollSize = 524288
agent1.sinks.hdfs-sink.hdfs.rollCount = 0
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
# Configure the channel
agent1.channels.memory-channel.type = memory
agent1.channels.memory-channel.capacity = 100000
agent1.channels.memory-channel.transactionCapacity = 1000
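The sink settings above roll a new HDFS file every 512 KB (hdfs.rollSize = 524288) and disable the time- and event-count-based triggers by setting them to 0. If you preferred time-based rolling instead, a sketch of the alternative (property names as documented for the Flume 1.x HDFS sink):

```properties
# Roll a new HDFS file every 5 minutes, regardless of size or event count
agent1.sinks.hdfs-sink.hdfs.rollInterval = 300
agent1.sinks.hdfs-sink.hdfs.rollSize = 0
agent1.sinks.hdfs-sink.hdfs.rollCount = 0
```

Only one non-zero trigger fires first; leaving all three at 0 would never roll the open file.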
$ cd /mytraining/exercises/flume
Start Flume:
$ flume-ng agent --conf /etc/flume-ng/conf \
> --conf-file spooldir.conf \
> --name agent1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /etc/flume-ng/conf/flume-env.sh
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Including HBase libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12.jar from classpath
Info: Including Hive libraries found via () for Hive access
...
-Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hbase/bin/../lib/native/linux-amd64-64 org.apache.flume.node.Application --conf-file spooldir.conf --name agent1
2017-10-20 21:07:08,929 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2017-10-20 21:07:09,057 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file: spooldir.conf
2017-10-20 21:07:09,300 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing: hdfs-sink
2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing: hdfs-sink
2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: hdfs-sink Agent: agent1
...
2017-10-20 21:07:09,304 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing: hdfs-sink
2017-10-20 21:07:09,306 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing: hdfs-sink
2017-10-20 21:07:09,310 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing: hdfs-sink
...
2017-10-20 21:07:10,398 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{webserver-log-source=EventDrivenSourceRunner: { source:Spool Directory source webserver-log-source: { spoolDir: /flume/weblogsmiddle } }} sinkRunners:{hdfs-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@... counterGroup:{ name:null counters:{} } }} channels:{memory-channel=org.apache.flume.channel.MemoryChannel{name: memory-channel}} }
...
2017-10-20 21:10:25,268 (pool-6-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:238)] Last read was never committed - resetting mark position.
Feed incoming logs into /flume/weblogsmiddle:
$ cp -r /mytest/weblogs /tmp/tmpweblogs
$ mv /tmp/tmpweblogs/* /flume/weblogsmiddle
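The copy-then-move above matters: the spooldir source treats any file that appears in the spooling directory as complete and immutable, so files must be staged elsewhere and moved in atomically. A minimal sketch of that hand-off as a reusable function; `stage_logs` is a hypothetical helper name, and the paths in the usage line are the ones from this exercise:

```shell
# stage_logs SRC SPOOL: copy finished log files into a temporary staging
# directory first, then mv them into the Flume spooling directory so the
# spooldir source never sees a half-written file.
stage_logs() {
  src=$1
  spool=$2
  stage=$(mktemp -d /tmp/tmpweblogs.XXXXXX)   # staging area outside the spool dir
  cp -r "$src"/. "$stage"/                    # copy the completed logs
  mv "$stage"/* "$spool"/                     # rename is atomic on one filesystem
  rmdir "$stage"
}
```

For this exercise the call would be `stage_logs /mytest/weblogs /flume/weblogsmiddle`.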
After waiting a few minutes, review the results on HDFS:
$ hdfs dfs -ls /test001/weblogsflume
-rw-rw-rw-   1 training supergroup     527909 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917884
-rw-rw-rw-   1 training supergroup     527776 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917885
...
In the flume-ng startup window, press Ctrl+C and then Ctrl+Z to stop Flume:
^C
^Z
[1]+  Stopped
flume-ng agent --conf /etc/flume-ng/conf --conf-file spooldir.conf --name agent1 -Dflume.root.logger=INFO,console
[training@... flume]$