Flume: one data source corresponding to multiple channels and multiple sinks


Original link: http://www.tuicool.com/articles/Z73UZf6


Logs collected on HADOOP2 and HADOOP3 are sent to HADOOP1 for aggregation, and HADOOP1 then delivers the data to several different destinations.



I. Overview

1. There are three machines: HADOOP1, HADOOP2, and HADOOP3. HADOOP1 is used for log aggregation.

2. HADOOP1 aggregates the logs and outputs them to multiple destinations simultaneously.

3. In Flume, one data source corresponds to multiple channels and multiple sinks; here this is configured in the consolidation-accepter.conf file, as sketched below.
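The fan-out in point 3 relies on Flume's replicating channel selector: each event read by the source is copied into every channel it is wired to. The essential lines, taken from the full consolidation-accepter.conf shown further down, are:

    agent1.sources.source1.selector.type = replicating
    agent1.sources.source1.channels = ch1 ch2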

II. Deploying Flume to collect and aggregate logs

1. Run the following on HADOOP1:

flume-ng agent --conf ./ -f consolidation-accepter.conf -n agent1 -Dflume.root.logger=INFO,console

The contents of its configuration file (consolidation-accepter.conf) are as follows:

# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1 ch2
agent1.sources = source1
agent1.sinks = hdfssink1 sink2
agent1.sources.source1.selector.type = replicating

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000000
agent1.channels.ch1.transactionCapacity = 1000000
agent1.channels.ch1.keep-alive = 10

# Define a memory channel called ch2 on agent1
agent1.channels.ch2.type = memory
agent1.channels.ch2.capacity = 1000000
agent1.channels.ch2.transactionCapacity = 100000
# the keep-alive value was garbled in the original; 10 (matching ch1) is assumed
agent1.channels.ch2.keep-alive = 10

# Define an Avro source called source1 on agent1, bind it to
# 0.0.0.0:44444, and connect it to channels ch1 and ch2.
# (The bind address was garbled in the original; 0.0.0.0 is assumed.)
agent1.sources.source1.channels = ch1 ch2
agent1.sources.source1.type = avro
agent1.sources.source1.bind = 0.0.0.0
agent1.sources.source1.port = 44444
agent1.sources.source1.threads = 5

# Define an HDFS sink and connect it to the other end of channel ch1.
agent1.sinks.hdfssink1.channel = ch1
agent1.sinks.hdfssink1.type = hdfs
agent1.sinks.hdfssink1.hdfs.path = hdfs://mycluster/flume/%Y-%m-%d/%H%M
agent1.sinks.hdfssink1.hdfs.filePrefix = s1pa124-consolidation-accesslog-%H-%M-%S
agent1.sinks.hdfssink1.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfssink1.hdfs.writeFormat = Text
agent1.sinks.hdfssink1.hdfs.fileType = DataStream
agent1.sinks.hdfssink1.hdfs.rollInterval = 1800
agent1.sinks.hdfssink1.hdfs.rollSize = 5073741824
agent1.sinks.hdfssink1.hdfs.batchSize = 10000
agent1.sinks.hdfssink1.hdfs.rollCount = 0
agent1.sinks.hdfssink1.hdfs.round = true
# the roundValue was garbled in the original; 10 is assumed
agent1.sinks.hdfssink1.hdfs.roundValue = 10
agent1.sinks.hdfssink1.hdfs.roundUnit = minute

# Define a file-rolling sink on channel ch2 that writes events to local disk.
# (The original shows type "logger", but the sink.directory/sink.filename
# properties below belong to the file_roll sink, so file_roll is assumed.)
agent1.sinks.sink2.type = file_roll
agent1.sinks.sink2.sink.batchsize = 10000
agent1.sinks.sink2.sink.batchtimeout = 600000
agent1.sinks.sink2.sink.rollInterval = 1000
agent1.sinks.sink2.sink.directory = /root/data/flume-logs/
agent1.sinks.sink2.sink.filename = accesslog
agent1.sinks.sink2.channel = ch2
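Once agent1 is running, both sinks can be sanity-checked from HADOOP1, for example (assuming the HDFS client is on the PATH and the paths from the configuration above are in use):

    hdfs dfs -ls hdfs://mycluster/flume/
    ls -l /root/data/flume-logs/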

2. On HADOOP2 and HADOOP3, run the following command:

flume-ng agent --conf ./ --conf-file collect-send.conf --name agent2

The contents of the sender's configuration file (collect-send.conf) are as follows:

agent2.sources = source2
agent2.sinks = sink1
agent2.channels = ch2

# source configuration
agent2.sources.source2.type = exec
agent2.sources.source2.command = tail -f /root/data/flume.log
agent2.sources.source2.channels = ch2

# channel configuration
agent2.channels.ch2.type = memory
agent2.channels.ch2.capacity = 10000
agent2.channels.ch2.transactionCapacity = 10000
agent2.channels.ch2.keep-alive = 3

# sink configuration
# (ConsolidationIPAddress is a placeholder; replace it with HADOOP1's address)
agent2.sinks.sink1.type = avro
agent2.sinks.sink1.hostname = ConsolidationIPAddress
agent2.sinks.sink1.port = 44444
agent2.sinks.sink1.channel = ch2
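To push a few test events through the pipeline, append lines to the file that the exec source tails (a quick end-to-end check; it assumes /root/data/flume.log exists on HADOOP2 or HADOOP3):

    echo "test event from $(hostname) at $(date)" >> /root/data/flume.log

Each appended line should then appear both in HDFS under hdfs://mycluster/flume/ and in /root/data/flume-logs/ on HADOOP1.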
III. Starting the processes and configuration notes

1. Start the Flume aggregation process:

  flume-ng agent --conf ./ -f consolidation-accepter.conf -n agent1 -Dflume.root.logger=INFO,console

2. Start the Flume collection process:

  flume-ng agent --conf ./ --conf-file collect-send.conf --name agent2

3. Configuration parameter notes (the following two conditions are ORed: a new file is rolled as soon as either one is satisfied):

(1) Every half hour, the data in the channel is flushed to the sink and a new file is started:

    agent1.sinks.hdfssink1.hdfs.rollInterval = 1800

(2) When the current file reaches 5073741824 bytes (about 4.7 GB), a new file is started:

    agent1.sinks.hdfssink1.hdfs.rollSize = 5073741824
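Since hdfs.rollCount = 0 in the configuration above, count-based rolling is disabled, so only the two conditions listed here apply. The time trigger can be switched off the same way if, say, rolling purely by size is wanted (a sketch using the standard HDFS sink roll parameters):

    # disable time-based rolling
    agent1.sinks.hdfssink1.hdfs.rollInterval = 0
    agent1.sinks.hdfssink1.hdfs.rollSize = 5073741824
    # disable count-based rolling
    agent1.sinks.hdfssink1.hdfs.rollCount = 0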

Installation reference: http://blog.csdn.net/panguoyuan/article/details/39555239

User Manual reference: http://flume.apache.org/FlumeUserGuide.html
