I. Overview
1. There are three machines: HADOOP1, HADOOP2, and HADOOP3. Logs are aggregated on HADOOP1.
2. HADOOP1 simultaneously writes the aggregated logs to multiple targets.
3. In Flume, one data source corresponds to multiple channels and multiple sinks; this is configured in the consolidation-accepter.conf file.
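The fan-out in point 3 comes down to Flume's replicating channel selector. As a minimal sketch (the full file appears in Section II), the key lines are:

agent1.sources.source1.channels = ch1 ch2
agent1.sources.source1.selector.type = replicating
agent1.sinks.hdfsSink1.channel = ch1
agent1.sinks.sink2.channel = ch2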
II. Deploy Flume to collect and aggregate logs
1. Run the following command on HADOOP1:
flume-ng agent --conf ./ -f consolidation-accepter.conf -n agent1 -Dflume.root.logger=INFO,console
Its configuration file (consolidation-accepter.conf) reads as follows:
# Finally, now that we've defined all the components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1 ch2
agent1.sources = source1
agent1.sinks = hdfsSink1 sink2

agent1.sources.source1.selector.type = replicating
agent1.sources.source1.selector.optional = ch1

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000000
agent1.channels.ch1.transactionCapacity = 1000000
agent1.channels.ch1.keep-alive = 10

agent1.channels.ch2.type = memory
agent1.channels.ch2.capacity = 1000000
agent1.channels.ch2.transactionCapacity = 100000
agent1.channels.ch2.keep-alive = 10

# Define an Avro source called source1 on agent1 and tell it
# to bind to 0.0.0.0:44444. Connect it to channels ch1 and ch2.
agent1.sources.source1.channels = ch1 ch2
agent1.sources.source1.type = avro
agent1.sources.source1.bind = 0.0.0.0
agent1.sources.source1.port = 44444
# worker-thread count (the original value is garbled; 5 is an assumed example)
agent1.sources.source1.threads = 5

# Define an HDFS sink and connect it to the other end of channel ch1.
agent1.sinks.hdfsSink1.channel = ch1
agent1.sinks.hdfsSink1.type = hdfs
agent1.sinks.hdfsSink1.hdfs.path = hdfs://mycluster/flume/%Y-%m-%d/%H%M
agent1.sinks.hdfsSink1.hdfs.filePrefix = S1pa124-consolidation-accesslog-%H-%M-%S
agent1.sinks.hdfsSink1.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsSink1.hdfs.writeFormat = Text
agent1.sinks.hdfsSink1.hdfs.fileType = DataStream
agent1.sinks.hdfsSink1.hdfs.rollInterval = 1800
agent1.sinks.hdfsSink1.hdfs.rollSize = 5073741824
agent1.sinks.hdfsSink1.hdfs.batchSize = 10000
agent1.sinks.hdfsSink1.hdfs.rollCount = 0
agent1.sinks.hdfsSink1.hdfs.round = true
agent1.sinks.hdfsSink1.hdfs.roundValue = 60
agent1.sinks.hdfsSink1.hdfs.roundUnit = minute

# Define a local-file sink on ch2 that keeps a copy of the aggregated log
# (file_roll assumed here; a logger-type sink would ignore the directory
# and fileName settings below)
agent1.sinks.sink2.type = file_roll
agent1.sinks.sink2.sink.batchSize = 10000
agent1.sinks.sink2.sink.batchTimeOut = 600000
agent1.sinks.sink2.sink.rollInterval = 1000
agent1.sinks.sink2.sink.directory = /root/data/flume-logs/
agent1.sinks.sink2.sink.fileName = accesslog
agent1.sinks.sink2.channel = ch2
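Once the aggregator is running, you can confirm that the HDFS sink is writing by listing the output directory (the path configured above; hdfs://mycluster is assumed to be an HA nameservice):

hdfs dfs -ls -R /flume/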
2. Run the following command on both HADOOP2 and HADOOP3:
flume-ng agent --conf ./ --conf-file collect-send.conf --name agent1
The content of the Flume collector/sender configuration file, collect-send.conf, is as follows:
agent2.sources = source2
agent2.sinks = sink1
agent2.channels = ch2

agent2.sources.source2.type = exec
agent2.sources.source2.command = tail -f /root/data/flume.log
agent2.sources.source2.channels = ch2

# channel configuration
agent2.channels.ch2.type = memory
agent2.channels.ch2.capacity = 10000
agent2.channels.ch2.transactionCapacity = 10000
agent2.channels.ch2.keep-alive = 3

# sink configuration: forward events over Avro to the consolidation host
# (replace ConsolidationIpAddress with HADOOP1's address)
agent2.sinks.sink1.type = avro
agent2.sinks.sink1.hostname = ConsolidationIpAddress
agent2.sinks.sink1.port = 44444
agent2.sinks.sink1.channel = ch2
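To verify the pipeline end to end, append a line to the tailed file on HADOOP2 or HADOOP3 and check that it appears on HADOOP1 (in the local file sink, and in HDFS after a roll), for example:

echo "flume test event $(date)" >> /root/data/flume.log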
III. Summary
1. Start the Flume consolidation (aggregator) process:
flume-ng agent --conf ./ -f consolidation-accepter.conf -n agent1 -Dflume.root.logger=INFO,console
2. Start the Flume collection process:
flume-ng agent --conf ./ --conf-file collect-send.conf --name agent1
3. Notes on the roll parameters (the two conditions below are in an OR relationship: a roll is triggered as soon as either one is met):
(1) Every half hour the data in the channel is flushed to the sink and a new file is started:
agent1.sinks.hdfsSink1.hdfs.rollInterval = 1800
(2) When the current file reaches 5073741824 bytes (about 4.7 GiB), a new file is started:
agent1.sinks.hdfsSink1.hdfs.rollSize = 5073741824
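For reference, Flume treats a value of 0 for these roll settings as "trigger disabled", which is why rollCount is set to 0 in the configuration above. A purely time-based policy (a sketch, not from the original configuration) would zero out the size trigger as well:

agent1.sinks.hdfsSink1.hdfs.rollInterval = 1800
agent1.sinks.hdfsSink1.hdfs.rollSize = 0
agent1.sinks.hdfsSink1.hdfs.rollCount = 0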