[Flume] Example: using Flume to deliver web logs to HDFS

Source: Internet
Author: User
Tags zookeeper hdfs dfs


This example uses Flume to deliver web server logs to HDFS.

Create the HDFS directory where the logs will be stored:
$ hdfs dfs -mkdir -p /test001/weblogsflume

Create the local directory that Flume will read logs from:
$ sudo mkdir -p /flume/weblogsmiddle

Make the directory writable by all users:
$ sudo chmod a+w -R /flume

The configuration file contains the following:

$ cat /mytraining/exercises/flume/spooldir.conf

# Name the components of this agent (names are case-sensitive)
agent1.sources = webserver-log-source
agent1.sinks = hdfs-sink
agent1.channels = memory-channel

# Configure the source: watch a spooling directory for new files
agent1.sources.webserver-log-source.type = spooldir
agent1.sources.webserver-log-source.spoolDir = /flume/weblogsmiddle
agent1.sources.webserver-log-source.channels = memory-channel

# Configure the sink: write events to HDFS
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /test001/weblogsflume/
agent1.sinks.hdfs-sink.channel = memory-channel
# Roll files by size only (524288 bytes); 0 disables time- and count-based rolling
agent1.sinks.hdfs-sink.hdfs.rollInterval = 0
agent1.sinks.hdfs-sink.hdfs.rollSize = 524288
agent1.sinks.hdfs-sink.hdfs.rollCount = 0
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream

# Configure the channel: buffer events in memory
agent1.channels.memory-channel.type = memory
agent1.channels.memory-channel.capacity = 100000
agent1.channels.memory-channel.transactionCapacity = 1000
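
Component names and property keys in a Flume configuration are case-sensitive, so a channel declared as one spelling and referenced as another will not be wired up. The following is a minimal sketch of a consistency check, not part of Flume itself: the helper name check_channel_refs is hypothetical, and it assumes the simple "key = value" layout used in the file above.

```shell
#!/bin/sh
# Hypothetical helper (not a Flume tool): report channels that a source or
# sink references but that the agent-level "channels" line never declares.
# $1 = path to the properties file, $2 = agent name (for example agent1)
check_channel_refs() {
    awk -v agent="$2" '
        # Skip comments and blank lines.
        /^[ \t]*(#|$)/ { next }
        {
            # Split "key = value" at the first "=" and trim whitespace.
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            n = split(key, part, ".")
            if (part[1] != agent) next
            if (n == 2 && part[2] == "channels") {
                # Declaration: <agent>.channels = c1 c2 ...
                m = split(val, names, /[ \t]+/)
                for (i = 1; i <= m; i++) declared[names[i]] = 1
            } else if (n == 4 && ((part[2] == "sources" && part[4] == "channels") ||
                                  (part[2] == "sinks" && part[4] == "channel"))) {
                # Reference: <agent>.sources.X.channels or <agent>.sinks.X.channel
                m = split(val, names, /[ \t]+/)
                for (i = 1; i <= m; i++) used[names[i]] = key
            }
        }
        END {
            bad = 0
            for (c in used) if (!(c in declared)) {
                printf "undeclared channel \"%s\" referenced by %s\n", c, used[c]
                bad = 1
            }
            exit bad
        }
    ' "$1"
}

# Usage: check_channel_refs spooldir.conf agent1
```

The function prints one line per mismatch and returns a non-zero status if any is found, so it can be run before starting the agent.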

$ cd /mytraining/exercises/flume

Start Flume:

$ flume-ng agent --conf /etc/flume-ng/conf \
> --conf-file spooldir.conf \
> --name agent1 -Dflume.root.logger=INFO,console

Info: Sourcing environment configuration script /etc/flume-ng/conf/flume-env.sh
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hbase/bin/../lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/hadoop/lib/slf4j-log4j12.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /usr/lib/zookeeper/lib/slf4j-log4j12.jar from classpath
Info: Including Hive libraries found via () for Hive access

...

-Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hbase/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application --conf-file spooldir.conf --name agent1
2017-10-20 21:07:08,929 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2017-10-20 21:07:09,057 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:spooldir.conf
2017-10-20 21:07:09,300 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,302 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: hdfs-sink Agent: agent1

...

2017-10-20 21:07:09,304 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,306 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
2017-10-20 21:07:09,310 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:hdfs-sink
...

2017-10-20 21:07:10,398 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{webserver-log-source=EventDrivenSourceRunner: { source:Spool Directory source webserver-log-source: { spoolDir: /flume/weblogsmiddle } }} sinkRunners:{hdfs-sink=SinkRunner: { policy:[email protected] counterGroup:{ name:null counters:{} } }} channels:{memory-channel=org.apache.flume.channel.MemoryChannel{name: memory-channel}} }

...

2017-10-20 21:10:25,268 (pool-6-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:238)] Last read was never committed - resetting mark position.

Copy the incoming logs to a temporary directory, then move them into /flume/weblogsmiddle:

$ cp -r /mytest/weblogs /tmp/tmpweblogs
$ mv /tmp/tmpweblogs/* /flume/weblogsmiddle
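
The copy-then-move sequence above matters because the spooling directory source expects each file to be complete and unchanged once it appears in the directory, and it expects file names not to be reused. The following is a minimal sketch of the same idea as a reusable function. The helper name stage_logs and its three arguments are hypothetical, and it assumes the staging and spool directories are on the same filesystem so that mv is a rename rather than a copy.

```shell
#!/bin/sh
# Hypothetical helper: copy log files into a staging directory first, then
# move them into the spool directory so Flume never sees a half-written file.
# $1 = source directory, $2 = staging directory, $3 = spool directory
# Assumption: staging and spool directories share a filesystem, so each
# mv is a rename rather than a copy.
stage_logs() {
    src=$1; staging=$2; spool=$3
    mkdir -p "$staging" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue        # skip subdirectories and empty globs
        name=$(basename "$f")
        # Do not reuse a name that is pending or already processed
        # (.COMPLETED is the default suffix Flume adds to finished files).
        if [ -e "$spool/$name" ] || [ -e "$spool/$name.COMPLETED" ]; then
            echo "skipping $name: name already used in $spool" >&2
            continue
        fi
        cp "$f" "$staging/$name" && mv "$staging/$name" "$spool/$name"
    done
}

# Usage: stage_logs /mytest/weblogs /flume/.staging /flume/weblogsmiddle
```

The staging path in the usage line is only an illustration; any directory on the same filesystem as the spool directory would serve.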


After waiting a few minutes, check the result on HDFS:

$ hdfs dfs -ls /test001/weblogsflume

-rw-rw-rw-   1 training supergroup     527909 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917884
-rw-rw-rw-   1 training supergroup     527776 2017-10-20 21:10 /test001/weblogsflume/FlumeData.1508558917885
...
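
Because hdfs.rollInterval and hdfs.rollCount are both 0, files roll on size alone, which is why the sizes in the listing sit just above the configured 524288 bytes. The following is a rough sketch for summarizing such a listing. The helper name summarize_listing is hypothetical, and it assumes the usual `hdfs dfs -ls` column layout with the size in column 5 and the path in column 8.

```shell
#!/bin/sh
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and report the
# file count, total bytes, and any file smaller than the roll size.
# $1 = hdfs.rollSize in bytes
summarize_listing() {
    awk -v roll="$1" '
        # File entries start with "-" in the permissions column; this
        # skips the "Found N items" header and directory entries.
        /^-/ {
            count++
            total += $5
            if ($5 < roll) printf "below roll size: %s (%d bytes)\n", $8, $5
        }
        END { printf "%d files, %d bytes\n", count, total }
    '
}

# Usage: hdfs dfs -ls /test001/weblogsflume | summarize_listing 524288
```

A file reported as below the roll size is not necessarily a problem: the most recent file may still be open or may hold the final partial batch of events.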

In the terminal where flume-ng was started, press Ctrl+C to stop Flume (Ctrl+Z only suspends the job, which the shell reports as "Stopped"):

^C
^Z
[1]+  Stopped                 flume-ng agent --conf /etc/flume-ng/conf --conf-file spooldir.conf --name agent1 -Dflume.root.logger=INFO,console
[Email protected] flume]$
