There is not much to say by way of introducing Flume; you can look that up yourself. Most of the material on the internet, however, covers Flume 1.4 or earlier, and Flume 1.5 feels like a big change. If you are ready to try it, here is a minimal setup that uses MongoSink to write data into MongoDB. It runs entirely on a single machine: no master, no collector (a collector is really just an agent that aggregates data from a number of other agents), only one agent. Once you understand this setup, you are free to build on it.
Flume requires a Java runtime. Installing the JDK is not covered here; a yum install is recommended.
Note that after installing the JDK you do not need to set any environment variables: Flume can actually find Java on its own, though it will print a warning, which you can ignore.
First, download the Flume 1.5 installation package:
http://flume.apache.org/download.html
You only need apache-flume-1.5.0.1-bin.tar.gz; the same archive works on both 32-bit and 64-bit systems.
Here we put the archive in the /home directory and extract it:
tar zxvf apache-flume-1.5.0.1-bin.tar.gz
Rename the extracted apache-flume-1.5.0.1-bin folder to flume, so our Flume path is /home/flume.
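The steps above can be collected into one shell session. The mirror URL below is an assumption; copy the actual link from the download page:

```shell
cd /home
# URL assumed -- take the real link from http://flume.apache.org/download.html
wget http://archive.apache.org/dist/flume/1.5.0.1/apache-flume-1.5.0.1-bin.tar.gz
tar zxvf apache-flume-1.5.0.1-bin.tar.gz
# Rename so that our Flume path is /home/flume
mv apache-flume-1.5.0.1-bin flume
```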
Many posts online say you must set a FLUME_HOME environment variable; in fact you do not need to.
Now add a configuration file at /home/flume/conf/netcat.conf (the names agent2/source2/sink2/channel2 are our own; change them as you like):
# define component names
agent2.sources = source2
agent2.sinks = sink2
agent2.channels = channel2

# define the data entry point
agent2.sources.source2.type = netcat
agent2.sources.source2.bind = 192.168.6.198
agent2.sources.source2.port = 44444
agent2.sources.source2.channels = channel2

# define the data exit point
agent2.sinks.sink2.type = org.riderzen.flume.sink.MongoSink
agent2.sinks.sink2.host = 192.168.6.222
agent2.sinks.sink2.port = 27017
agent2.sinks.sink2.model = single
agent2.sinks.sink2.collection = events
agent2.sinks.sink2.batch = 100
agent2.sinks.sink2.channel = channel2

# use an in-memory channel
agent2.channels.channel2.type = memory
agent2.channels.channel2.capacity = 1000
agent2.channels.channel2.transactionCapacity = 100
The configuration above should be fairly self-explanatory, but briefly:
The data source is source2, defined to receive data on port 44444 of the local address 192.168.6.198 and store it in the channel2 buffer.
What is channel2? See its definition at the bottom: it is an in-memory buffer queue with a capacity of 1000 events, handed to the sink 100 events per transaction.
And the sink? Here we use the MongoSink written by Leon Lee (thank you!), which takes data from channel2 and saves it into MongoDB, committing once 100 events have accumulated.
Download MongoSink here: https://github.com/leonlee/flume-ng-mongodb-sink
About MongoSink, in short: just package it into a jar and drop it into /home/flume/lib, and don't forget to put the MongoDB Java driver in there too. If you develop other extensions later, they likewise just go into lib.
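A sketch of building the jar from source, assuming git and Maven are installed; the exact jar file names and driver version are assumptions, so adjust to what your build actually produces:

```shell
git clone https://github.com/leonlee/flume-ng-mongodb-sink.git
cd flume-ng-mongodb-sink
mvn package -DskipTests
# Copy the sink jar into Flume's lib directory
cp target/flume-ng-mongodb-sink-*.jar /home/flume/lib/
# Don't forget the MongoDB Java driver (local repository path/version assumed)
cp ~/.m2/repository/org/mongodb/mongo-java-driver/*/mongo-java-driver-*.jar /home/flume/lib/
```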
So now we know Flume's role: take data in from the source, store it in the channel's buffer queue, and finally have the sink move it into permanent storage.
Run the following command to start flume
/home/flume/bin/flume-ng agent --conf /home/flume/conf --conf-file /home/flume/conf/netcat.conf --name agent2 -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
A general explanation:
--name agent2 specifies the name of the agent to run (it must match the prefix used in the configuration file)
--conf /home/flume/conf is best given as an absolute path. It points to your configuration directory, which holds not only the agent configuration but also the log4j configuration; without it, Flume cannot log
--conf-file /home/flume/conf/netcat.conf is the configuration file used by the agent being run
-Dflume.monitoring.type=http turns on HTTP monitoring; you can then view the agent's runtime status directly from a browser
-Dflume.monitoring.port=34545 specifies the port for HTTP monitoring
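Once the agent is running, you can poll that monitoring endpoint; Flume's HTTP monitoring serves JSON metrics at /metrics:

```shell
# Returns JSON counters for our components, e.g. a "CHANNEL.channel2" key
# with fields such as ChannelCapacity and EventPutSuccessCount
curl http://192.168.6.198:34545/metrics
```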
If you need debugging output on the console, configure /home/flume/conf/log4j.properties yourself.
OK, start it up and wait for it to come online.
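A quick smoke test, assuming nc (netcat) and the mongo shell are installed; the database name MongoSink writes to is configurable, and "events" is an assumption here:

```shell
# Send one test event; the netcat source replies "OK" for each accepted line
echo "hello flume" | nc 192.168.6.198 44444
# Then check whether it arrived in MongoDB
mongo 192.168.6.222/events --eval 'printjson(db.events.findOne())'
```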
This mainly solves our original problem: log data no longer has to be written into the Mongo database directly by each application. Just give the IP and port to the other project teams, and they can send their data straight here.
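For those teams, sending data is just a matter of writing newline-delimited lines to that TCP port; the netcat source treats each line as one Flume event. For example, streaming an existing log file (the file path here is hypothetical):

```shell
# Follow the application log and forward each new line as a Flume event
tail -F /var/log/myapp/app.log | nc 192.168.6.198 44444
```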
And if MongoDB turns out to have limitations for us, we can very flexibly swap in a different sink, for example one that puts the data into HDFS, and get in closer contact with the high-end Hadoop world.
That concludes this Flume 1.5 setup for collecting logs and storing them in MongoDB.