Data Collection with Apache Flume (iii)

Finally, let's cover the remaining examples. The first passes an event from one agent to another over the network using an Avro sink and an Avro source, and then writes it to a specified directory.

First, look at the configuration of the upstream agent, Agent6.

# agent6: receives events over Avro and forwards them to agent3 through an Avro sink

agent6.sources = avrosource
agent6.sinks = avrosink
agent6.channels = memorychannel

# Avro source: an Avro client can transmit data to it over the network
agent6.sources.avrosource.type = avro
agent6.sources.avrosource.bind = localhost
agent6.sources.avrosource.port = 2000
agent6.sources.avrosource.threads = 5

# Avro sink: port 4000 corresponds to the Avro source of agent3 below
agent6.sinks.avrosink.type = avro
agent6.sinks.avrosink.hostname = localhost
agent6.sinks.avrosink.port = 4000

agent6.channels.memorychannel.type = memory
agent6.channels.memorychannel.capacity = 1000
agent6.channels.memorychannel.transactionCapacity = 100

agent6.sources.avrosource.channels = memorychannel
agent6.sinks.avrosink.channel = memorychannel

The configuration for the other agent, Agent3, is as follows.

# agent3: receives events from agent6 over Avro and rolls them into local files

agent3.sources = avrosource
agent3.sinks = filesink
agent3.channels = jdbcchannel

# Avro source, listening on the port that agent6's Avro sink points at
agent3.sources.avrosource.type = avro
agent3.sources.avrosource.bind = localhost
agent3.sources.avrosource.port = 4000
agent3.sources.avrosource.threads = 5

# file_roll sink: writes events into the target directory; rollInterval = 0 disables time-based rolling
agent3.sinks.filesink.type = file_roll
agent3.sinks.filesink.sink.directory = /home/leung/flume/files
agent3.sinks.filesink.sink.rollInterval = 0

# this agent uses a JDBC channel
agent3.channels.jdbcchannel.type = jdbc

agent3.sources.avrosource.channels = jdbcchannel
agent3.sinks.filesink.channel = jdbcchannel

OK, as you can see from the two configuration files, events are passed from Agent6 to Agent3 and finally written to the files directory. First, start the agents one by one.
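Here is a minimal sketch of how the two agents might be started with the flume-ng launcher; the configuration file names agent6.conf and agent3.conf and the conf directory are assumptions, not taken from the original post:

# start the downstream agent first, so its Avro source is listening on port 4000
# (agent3.conf and agent6.conf are assumed file names for the configurations above)
flume-ng agent --conf conf --conf-file agent3.conf --name agent3 -Dflume.root.logger=INFO,console

# then start the upstream agent in a second terminal
flume-ng agent --conf conf --conf-file agent6.conf --name agent6 -Dflume.root.logger=INFO,console

Starting the downstream agent first ensures port 4000 is already listening by the time Agent6's Avro sink tries to connect to it.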

Since both agents use an Avro source, we can submit data to either of them with the avro-client tool. First, submit a message to Agent3; its content is "today is a good day".
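For example, a message can be sent straight to Agent3's Avro source on port 4000 using the avro-client mode of flume-ng; message1 is an assumed name for a file holding the text:

# message1 is an assumed file containing "today is a good day"
flume-ng avro-client --conf conf --host localhost --port 4000 --filename message1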

The data is then submitted to Agent6. The content of message2 is "Hadoop is a good project!"
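Sending to Agent6's port 2000 instead exercises the whole chain; message2 is again an assumed file name:

# message2 is an assumed file containing "Hadoop is a good project!"
flume-ng avro-client --conf conf --host localhost --port 2000 --filename message2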

Finally, look at the target file. You can see both sentences, in the order they were sent. Because both were delivered by the same agent, they are written to the same file.

This completes a chain: data submitted to port 2000 flows through Agent6 to Agent3 and finally arrives at the destination file. You can follow this example to build longer chains. Communication between nodes over the network mainly uses Avro, so each intermediate node needs two components, an Avro source and an Avro sink, as in the sketch below.
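As an illustration of such an intermediate hop, the following hypothetical relay agent (its name, ports, and channel are illustrative assumptions) receives events on one port and forwards them to the next node:

# relay: a hypothetical middle hop that receives on port 4000 and forwards to port 5000
relay.sources = avrosource
relay.sinks = avrosink
relay.channels = memorychannel

relay.sources.avrosource.type = avro
relay.sources.avrosource.bind = localhost
relay.sources.avrosource.port = 4000

relay.sinks.avrosink.type = avro
relay.sinks.avrosink.hostname = localhost
relay.sinks.avrosink.port = 5000

relay.channels.memorychannel.type = memory
relay.sources.avrosource.channels = memorychannel
relay.sinks.avrosink.channel = memorychannel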

Now for the second example: a single agent, Agent7, whose one Netcat source fans out to two sinks, an HDFS sink and a local file sink, through two separate channels. Here is its configuration.

# agent7: one Netcat source fanned out to two sinks (HDFS and local file) via two channels
agent7.sources = netsource
agent7.sinks = hdfssink filesink
agent7.channels = memorychannel1 memorychannel2

# Netcat source with a timestamp interceptor (required by the %y-%m-%d escapes in the HDFS path)
agent7.sources.netsource.type = netcat
agent7.sources.netsource.bind = localhost
agent7.sources.netsource.port = 3000
agent7.sources.netsource.interceptors = ts
agent7.sources.netsource.interceptors.ts.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# HDFS sink: roll a new file every 3 events, write plain text
agent7.sinks.hdfssink.type = hdfs
agent7.sinks.hdfssink.hdfs.path = /flume-%y-%m-%d
agent7.sinks.hdfssink.hdfs.filePrefix = log
agent7.sinks.hdfssink.hdfs.rollInterval = 0
agent7.sinks.hdfssink.hdfs.rollCount = 3
agent7.sinks.hdfssink.hdfs.fileType = DataStream

# file_roll sink: write events to a local directory
agent7.sinks.filesink.type = file_roll
agent7.sinks.filesink.sink.directory = /home/leung/flume/files
agent7.sinks.filesink.sink.rollInterval = 0

agent7.channels.memorychannel1.type = memory
agent7.channels.memorychannel1.capacity = 1000
agent7.channels.memorychannel1.transactionCapacity = 100
agent7.channels.memorychannel2.type = memory
agent7.channels.memorychannel2.capacity = 1000
agent7.channels.memorychannel2.transactionCapacity = 100

# wire the source to both channels; memorychannel1 feeds the HDFS sink, memorychannel2 the file sink
agent7.sources.netsource.channels = memorychannel1 memorychannel2
agent7.sinks.hdfssink.channel = memorychannel1
agent7.sinks.filesink.channel = memorychannel2

# replicating selector: the source copies every event to all of its channels
agent7.sources.netsource.selector.type = replicating

Now start Agent7.
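A sketch of the launch command and a quick test through the Netcat source; agent7.conf is an assumed file name for the configuration above:

flume-ng agent --conf conf --conf-file agent7.conf --name agent7 -Dflume.root.logger=INFO,console

# in a second terminal, send a test line to the Netcat source on port 3000
echo "hello flume" | nc localhost 3000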

Here the source selector is set to replicating, so the source delivers every event to all of its channels. If multiplexing were selected instead, events would be routed to a specific channel based on a designated header field; a sketch of that follows below. OK, now look at the results: check the data written to HDFS and to the files directory, respectively.
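For comparison, this is a minimal sketch of what a multiplexing selector could look like; the header name "type" and its values "hdfs" and "file" are illustrative assumptions:

# route by the value of a hypothetical "type" event header instead of replicating
agent7.sources.netsource.selector.type = multiplexing
agent7.sources.netsource.selector.header = type
agent7.sources.netsource.selector.mapping.hdfs = memorychannel1
agent7.sources.netsource.selector.mapping.file = memorychannel2
agent7.sources.netsource.selector.default = memorychannel1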

At this point you can imagine many combinations: collect data from a web server, pass it through one agent to a Hadoop cluster, forward it to the next agent to save a local copy, or send it anywhere else you like ... feel free to experiment! As you can see, Flume is a very flexible and convenient tool!

Thank you all! My knowledge is limited, so please don't hesitate to point out any mistakes!
