Flume 1.5 log collection and storage in MongoDB: a minimal installation and setup

Tags: mongodb, driver, log4j

I won't cover the basic Flume introduction here; you can easily search for that yourself.

Most of the material online still covers Flume 1.4, and Flume 1.5 brought some big changes. If you are ready to try it, this post walks through a minimal setup that uses MongoSink to store data in MongoDB. Every agent runs completely independently: there is no master and no collector (a collector is really just an agent that aggregates data from several other agents); here there is only a single agent. Once you understand this setup, you are free to build on it however you like.


Flume requires a Java runtime environment. JDK installation is not covered here; installing it via yum is recommended.

After installing the JDK you do not need to set any environment variables; Flume can find Java on its own and will only print a warning, which you can ignore.


First, download the Flume 1.5 installation package:

http://flume.apache.org/download.html

You only need to download apache-flume-1.5.0.1-bin.tar.gz; the same archive works on both 32-bit and 64-bit systems.


Here we put the installation package in the /home directory and extract it:

tar zxvf apache-flume-1.5.0.1-bin.tar.gz

Rename the extracted apache-flume-1.5.0.1-bin directory to flume, so our Flume path becomes /home/flume.


Many guides online say you must set a FLUME_HOME environment variable; in fact, you don't need to.


Now add a configuration file at /home/flume/conf/netcat.conf (agent2, source2, sink2 and channel2 are just custom names; change them as you like):

# define component names
agent2.sources = source2
agent2.sinks = sink2
agent2.channels = channel2

# define data entry (source)
agent2.sources.source2.type = netcat
agent2.sources.source2.bind = 192.168.6.198
agent2.sources.source2.port = 44444
agent2.sources.source2.channels = channel2

# define data exit (sink)
agent2.sinks.sink2.type = org.riderzen.flume.sink.MongoSink
agent2.sinks.sink2.host = 192.168.6.222
agent2.sinks.sink2.port = 27017
agent2.sinks.sink2.model = single
agent2.sinks.sink2.collection = events
agent2.sinks.sink2.batch = 100
agent2.sinks.sink2.channel = channel2

# use a memory channel
agent2.channels.channel2.type = memory
agent2.channels.channel2.capacity = 1000
agent2.channels.channel2.transactionCapacity = 100


The configuration above should be fairly self-explanatory, but here is a brief walkthrough:

The data source is source2. Its definition says: accept data sent to 192.168.6.198:44444 and store it in the channel2 buffer.
What is channel2? Look at its definition below: it is an in-memory buffer queue with a capacity of 1000 events, handing them to the sink in transactions of up to 100 events at a time.

And the sink? Here we use a MongoSink written by Leon Lee (Li Long? probably a fellow developer from China; thanks!). Its job is to take data from channel2 and write it into MongoDB, committing in batches of 100 events.

You can download MongoSink here: https://github.com/leonlee/flume-ng-mongodb-sink


About MongoSink, briefly: just package it as a jar and drop it into /home/flume/lib, and don't forget to drop the MongoDB Java driver jar in there as well. Later, if you develop other extensions, you can simply drop them into lib too.


Good. So Flume's role is to take data from a source, push it into a channel buffer queue, and finally have a sink write it to permanent storage.


Execute the following command to start Flume:

/home/flume/bin/flume-ng agent --conf /home/flume/conf --conf-file /home/flume/conf/netcat.conf --name agent2 -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

A general explanation:

--name agent2 specifies the name of the agent to run (it must match the agent name used in the configuration file)

--conf /home/flume/conf is best given as an absolute path; it points to the configuration directory, which holds not only the agent's config but also the log4j configuration. Without it, Flume cannot write its logs.

--conf-file /home/flume/conf/netcat.conf specifies the configuration file used by the agent being started

-Dflume.monitoring.type=http enables HTTP monitoring, so you can check the agent's running status by pointing a browser at the agent machine's HTTP address

-Dflume.monitoring.port=34545 specifies the port for HTTP monitoring; see the example after this list for how to read the metrics
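
With monitoring enabled, the agent serves its counters as JSON at http://<agent-host>:34545/metrics, which you can open in a browser or fetch programmatically. Below is a minimal sketch of fetching the metrics from Java; the host 192.168.6.198 is an assumption (whichever machine runs the agent).

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal sketch: fetch the JSON counters exposed by Flume's HTTP monitoring.
// Assumes the agent was started with -Dflume.monitoring.type=http and
// -Dflume.monitoring.port=34545, and that 192.168.6.198 is the agent host.
public class FlumeMetricsCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://192.168.6.198:34545/metrics");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // raw JSON with per-source/channel/sink counters
        }
        in.close();
        conn.disconnect();
    }
}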


If you want to print debugging information to the console, adjust /home/flume/conf/log4j.properties yourself.

OK, wait for it to start up successfully.
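
Once the agent is running and some data has been sent in, you can check that events are landing in MongoDB. The sketch below uses the MongoDB Java driver (the same 2.x jar dropped into lib, assuming it is recent enough to provide MongoClient); the database name is an assumption here, since our netcat.conf only sets the collection to events, so check the db option in the flume-ng-mongodb-sink README for the actual default.

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

// Sanity check that events reached MongoDB.
// 192.168.6.222:27017 and the "events" collection come from netcat.conf;
// the database name "events" is an assumption (see the sink's db option).
public class MongoEventCheck {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("192.168.6.222", 27017);
        DB db = mongo.getDB("events");                     // assumed database name
        DBCollection events = db.getCollection("events");  // collection from netcat.conf
        System.out.println("documents stored: " + events.count());
        DBCursor cursor = events.find().limit(5);
        while (cursor.hasNext()) {
            System.out.println(cursor.next());             // print a few sample documents
        }
        cursor.close();
        mongo.close();
    }
}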


The main point now is that projects producing log data no longer have to write to the Mongo database directly: we just give the IP and port to the other project teams, and they can send their data straight here.
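
A producer only needs a plain TCP connection: with the netcat source, every newline-terminated line becomes one Flume event. The quickest manual test is simply echo "hello" | nc 192.168.6.198 44444; here is a minimal Java sketch of the same thing.

import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;

// Minimal producer for the netcat source defined in netcat.conf:
// every newline-terminated line sent to 192.168.6.198:44444 becomes one event.
public class NetcatEventSender {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket("192.168.6.198", 44444);
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), "UTF-8"), true);
        for (int i = 0; i < 10; i++) {
            out.println("test log line " + i);  // one line = one event
        }
        out.close();
        socket.close();
    }
}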


Later, if MongoDB turns out to have limitations, we can rewrite the sink to store the data in HDFS instead and get up close and personal with big Hadoop; the basic shape of such a sink is sketched below.
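
A custom sink boils down to one pattern: open a transaction on the channel, take up to a batch of events, write them wherever you want, then commit. Below is a minimal sketch of that skeleton against the Flume SDK; it is not the actual MongoSink code, and the storage call is left as a placeholder you would replace with HDFS writes (or anything else). The "batch" property name simply mirrors the key used in our config.

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

// Skeleton of a custom sink: drain the channel in batches inside a transaction
// and hand each batch to your own storage (HDFS, another database, ...).
public class MyCustomSink extends AbstractSink implements Configurable {
    private int batchSize;

    @Override
    public void configure(Context context) {
        // read the batch size from the agent's .conf file, mirroring the "batch" key above
        batchSize = context.getInteger("batch", 100);
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            int count = 0;
            for (; count < batchSize; count++) {
                Event event = channel.take();
                if (event == null) {
                    break;                      // channel is empty for now
                }
                // placeholder: write event.getBody() to your storage of choice
            }
            tx.commit();
            return count == 0 ? Status.BACKOFF : Status.READY;
        } catch (Throwable t) {
            tx.rollback();                      // events stay in the channel and are retried
            throw new EventDeliveryException(t);
        } finally {
            tx.close();
        }
    }
}

Package such a sink as a jar, drop it into /home/flume/lib just like MongoSink, and reference its fully qualified class name as the sink type in the configuration file.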


Copyright notice: this is an original blog post; please do not reproduce it without the author's consent.
