Flume and Kafka example (Kafka as the Flume sink, output to a Kafka topic)
Preparation:
$ sudo mkdir -p /flume/web_spooldir
$ sudo chmod a+w -R /flume
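The spooling-directory source reads every file dropped into /flume/web_spooldir and renames it (by default with a .COMPLETED suffix) once it has been fully consumed, so the user running the Flume agent needs write access to the directory; that is what the chmod above provides. A quick sanity check of the permissions (a suggested extra step, not part of the original walkthrough):
$ ls -ld /flume /flume/web_spooldir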
Create the Flume configuration file:
$ cat /home/tester/flafka/spooldir_kafka.conf
# Name the components in this agent
agent1.sources = weblogsrc
agent1.sinks = kafka-sink
agent1.channels = memchannel
# Configure the source
agent1.sources.weblogsrc.type = spooldir
agent1.sources.weblogsrc.spoolDir = /flume/web_spooldir
agent1.sources.weblogsrc.channels = memchannel
# Configure the sink
agent1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka-sink.topic = weblogs
agent1.sinks.kafka-sink.brokerList = localhost:9092
agent1.sinks.kafka-sink.batchSize = 20
agent1.sinks.kafka-sink.channel = memchannel
# Use a channel which buffers events in memory
agent1.channels.memchannel.type = memory
agent1.channels.memchannel.capacity = 100000
agent1.channels.memchannel.transactionCapacity = 1000
$
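Before starting the agent, it is worth making sure the weblogs topic exists, unless the broker is set to auto-create topics. A minimal sketch using the standard kafka-topics tool, assuming a single local broker and ZooKeeper on localhost:2181 as in the rest of this example:
$ kafka-topics --create --zookeeper localhost:2181 \
> --replication-factor 1 --partitions 1 --topic weblogs
$ kafka-topics --list --zookeeper localhost:2181
The second command should now list weblogs.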
Run flume-ng:
$ flume-ng agent --conf /etc/flume-ng/conf \
> --conf-file spooldir_kafka.conf \
> --name agent1 -Dflume.root.logger=INFO,console
The output looks like this:
Info: Sourcing environment configuration script /etc/flume-ng/conf/flume-env.sh
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12.jar from classpath
Info: Including Hive libraries found via () for Hive access
+ exec /usr/java/default/bin/java -Xmx500m -Dflume.root.logger=INFO,console -cp '/etc/flume-ng/conf:/usr/lib/flume-
ng/lib/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/apacheds-i18n-2.0.0-M15.jar
...
-Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hbase/bin/../lib/native/linux-amd64-64
org.apache.flume.node.Application --conf-file spooldir_kafka.conf --name agent1
2017-10-23 01:15:11,209 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start
(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2017-10-23 01:15:11,223 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider
$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file: spooldir_kafka.conf
2017-10-23 01:15:11,256 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty
(FlumeConfiguration.java:1017)] Processing: kafka-sink
...
2017-10-23 01:15:11,933 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start
(MonitoredCounterGroup.java:96)] Component type: SOURCE, name: weblogsrc started
2017-10-23 01:15:13,003 (lifecycleSupervisor-1-1) [INFO - kafka.utils.Logging$class.info(Logging.scala:68)] Verifying properties
2017-10-23 01:15:13,271 (lifecycleSupervisor-1-1) [INFO - kafka.utils.Logging$class.info(Logging.scala:68)] Property
key.serializer.class is overridden to kafka.serializer.StringEncoder
2017-10-23 01:15:13,271 (lifecycleSupervisor-1-1) [INFO - kafka.utils.Logging$class.info(Logging.scala:68)] Property
metadata.broker.list is overridden to localhost:9092
2017-10-23 01:15:13,277 (lifecycleSupervisor-1-1) [INFO - kafka.utils.Logging$class.info(Logging.scala:68)] Property
request.required.acks is overridden to 1
2017-10-23 01:15:13,277 (lifecycleSupervisor-1-1) [INFO - kafka.utils.Logging$class.info(Logging.scala:68)] Property serializer.class
is overridden to kafka.serializer.DefaultEncoder
2017-10-23 01:15:13,718 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register
(MonitoredCounterGroup.java:120)] Monitored counter group for type: SINK, name: kafka-sink: Successfully registered new MBean.
2017-10-23 01:15:13,719 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start
(MonitoredCounterGroup.java:96)] Component type: SINK, name: kafka-sink started
...
2017-10-23 01:15:13,720 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents
(ReliableSpoolingFileEventReader.java:258)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2017-10-23 01:15:13,720 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile
(ReliableSpoolingFileEventReader.java:348)] Preparing to move file /flume/web_spooldir/2014-01-13.log to
/flume/web_spooldir/2014-01-13.log.COMPLETED
...
2017-10-23 01:16:11,441 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents
(ReliableSpoolingFileEventReader.java:258)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2017-10-23 01:16:11,451 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile
(ReliableSpoolingFileEventReader.java:348)] Preparing to move file /flume/web_spooldir/2014-01-24.log to
/flume/web_spooldir/2014-01-24.log.COMPLETED
2017-10-23 01:16:11,818 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents
(ReliableSpoolingFileEventReader.java:258)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2017-10-23 01:16:11,819 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile
(ReliableSpoolingFileEventReader.java:348)] Preparing to move file /flume/web_spooldir/2014-02-15.log to
/flume/web_spooldir/2014-02-15.log.COMPLETED
Run the Kafka console consumer:
$ kafka-console-consumer --zookeeper localhost:2181 --topic weblogs
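Started like this, the console consumer only prints messages that arrive after it connects, which is enough here because the log files are copied in afterwards. To also replay everything already stored in the topic, the same command can be run with the --from-beginning flag:
$ kafka-console-consumer --zookeeper localhost:2181 --topic weblogs --from-beginning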
In a different terminal window, move the web logs into the /flume/web_spooldir directory:
$ cp -rf /home/tester/weblogs /tmp/tmp_weblogs
$ mv /tmp/tmp_weblogs/* /flume/web_spooldir
$ rm -rf /tmp/tmp_weblogs
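The detour through /tmp/tmp_weblogs is deliberate: the spooling-directory source expects each file in /flume/web_spooldir to be complete and immutable when it appears, so the data is first copied elsewhere and then moved into place rather than copied directly into the spool directory. If the /home/tester/weblogs sample data is not available, a single stand-in log file can be fed in the same way (the file name and log line below are made up purely for illustration):
$ echo '10.0.0.1 - 1001 [01/Jan/2014:00:00:00 +0100] "GET /index.html HTTP/1.0" 200 1024' > /tmp/2014-01-01.log
$ mv /tmp/2014-01-01.log /flume/web_spooldir/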
The flume-ng window then shows the following (transferring the log files to the Kafka topic weblogs):
2017-10-23 01:36:28,436 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents
(ReliableSpoolingFileEventReader.java:258)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2017-10-23 01:36:28,449 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile
(ReliableSpoolingFileEventReader.java:348)] Preparing to move file /flume/web_spooldir/2013-09-22.log to
/flume/web_spooldir/2013-09-22.log.COMPLETED
2017-10-23 01:36:28,971 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents
(ReliableSpoolingFileEventReader.java:258)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
...
2017-10-23 01:37:39,011 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile
(ReliableSpoolingFileEventReader.java:348)] Preparing to move file /flume/web_spooldir/2014-02-19.log to
/flume/web_spooldir/2014-02-19.log.COMPLETED
2017-10-23 01:37:39,386 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents
(ReliableSpoolingFileEventReader.java:258)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2017-10-23 01:37:39,386 (pool-4-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile
(ReliableSpoolingFileEventReader.java:348)] Preparing to move file /flume/web_spooldir/2014-03-09.log to
/flume/web_spooldir/2014-03-09.log.COMPLETED
The consumer window prints the contents of all the web log files (receiving topic weblogs, i.e. all of the web log content):
...
213.125.211.10 - 66543 [09/Mar/2014:00:00:14 +0100] "GET /KBDOC-00131.html HTTP/1.0" 9807 "http://www.tester.com" "Tester Test 001"
213.125.211.10 - 66543 [09/Mar/2014:00:00:14 +0100] "GET /theme.css HTTP/1.0" 6448 "http://www.tester.com" "Tester Test 002"