Flume 1.5 installation: capturing logs and storing them in MongoDB

Source: Internet
Author: User
Tags: mongodb, driver, log4j

There is not much to say about Flume itself; you can look up introductions on your own. However, most material on the internet covers Flume 1.4 or earlier, and Flume 1.5 feels quite different. If you are ready to try it, I will introduce a minimal setup here, using a MongoSink to write the data into MongoDB. It runs entirely on a single machine: no master, no collector (a collector is really just an agent that receives data from several other agents), only one agent. Once you understand this setup, you are free to build on it.


Flume requires a Java runtime environment. I will not explain JDK installation here; installing via yum is recommended.

Also, after installing the JDK you do not need to set any environment variables; Flume can actually find it by itself, although it will print a warning, which you can ignore.
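A minimal sketch of that step on a CentOS-style system (the package name is an assumption and varies by distribution; any JDK 1.6 or later works for Flume 1.5):

yum install -y java-1.7.0-openjdk    # assumed package name; adjust for your distro
java -version                        # confirm a 1.6+ runtime is on the PATH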


First, download the Flume 1.5 installation package:

http://flume.apache.org/download.html

You only need to download apache-flume-1.5.0.1-bin.tar.gz; the same package works on both 32-bit and 64-bit systems.


Here we put the installation package in the /home directory and then unpack it:

tar zxvf apache-flume-1.5.0.1-bin.tar.gz

Rename the extracted apache-flume-1.5.0.1-bin folder to flume, so our Flume path is /home/flume.
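For reference, the rename is a single command (assuming you unpacked the tarball in /home as described above):

mv /home/apache-flume-1.5.0.1-bin /home/flume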


A lot of guides online say you need to set a FLUME_HOME environment variable; in fact you do not.


Now we add a configuration file at /home/flume/conf/netcat.conf (the names agent2 / source2 / sink2 / channel2 are our own choices and can be changed freely):

# define component names
agent2.sources = source2
agent2.sinks = sink2
agent2.channels = channel2

# define the data entry (source)
agent2.sources.source2.type = netcat
agent2.sources.source2.bind = 192.168.6.198
agent2.sources.source2.port = 44444
agent2.sources.source2.channels = channel2

# define the data exit (sink)
agent2.sinks.sink2.type = org.riderzen.flume.sink.MongoSink
agent2.sinks.sink2.host = 192.168.6.222
agent2.sinks.sink2.port = 27017
agent2.sinks.sink2.model = single
agent2.sinks.sink2.collection = events
agent2.sinks.sink2.batch = 100
agent2.sinks.sink2.channel = channel2

# use a memory channel
agent2.channels.channel2.type = memory
agent2.channels.channel2.capacity = 1000
agent2.channels.channel2.transactionCapacity = 100


I believe the configuration file above is easy to read, but let me briefly walk through it:

The data source is source2, which is defined to receive data on the local address 192.168.6.198, port 44444, and store it in the channel2 buffer.
What is channel2? Look at its definition at the bottom: it is an in-memory buffer queue with a capacity of 1000 events, from which the sink takes data in batches of 100.

What about the sink definition? Here we use a MongoSink written by Leon Lee (Li Long? probably one of the big names in China, thanks to him), which takes data from channel2 and saves it into MongoDB, committing every 100 events.

Please download the MongoSink here: https://github.com/leonlee/flume-ng-mongodb-sink


About the MongoSink, briefly: just build it into a jar and drop it into /home/flume/lib, and of course don't forget to drop the MongoDB Java driver jar in there as well. In the future, if you develop other extensions, you can simply throw them into lib too.
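As a rough sketch of that step, assuming the sink project builds with Maven (the exact jar file names will differ by version):

cd /home
git clone https://github.com/leonlee/flume-ng-mongodb-sink.git
cd flume-ng-mongodb-sink
mvn package                                        # assumes a standard Maven build
cp target/flume-ng-mongodb-sink-*.jar /home/flume/lib/
# the MongoDB Java driver must be downloaded separately and dropped in as well
cp /path/to/mongo-java-driver-*.jar /home/flume/lib/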


Well, now we know Flume's role: take data from a source, put it into a channel buffer queue, and finally let a sink move it into permanent storage.


Run the following command to start Flume:

/home/flume/bin/flume-ng agent --conf /home/flume/conf --conf-file /home/flume/conf/netcat.conf --name agent2 -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

A quick explanation:

--name agent2 specifies the name of the agent to run (it must match the agent name used in the configuration file).

--conf /home/flume/conf is best given as an absolute path. It points to your configuration directory, which contains not only the agent configurations but also the log4j configuration; without it, Flume cannot write its logs.

--conf-file /home/flume/conf/netcat.conf is the configuration file used by the agent being started.

-Dflume.monitoring.type=http turns on HTTP monitoring; you can then check Flume's running status by opening the machine's HTTP address directly in a browser.

-Dflume.monitoring.port=34545 specifies the port for HTTP monitoring.


If you need to see debugging information in the console, configure /home/flume/conf/log4j.properties yourself.
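As a sketch, assuming the stock log4j.properties that ships with Flume (it already defines a console appender), you can change the root logger in the file:

# in /home/flume/conf/log4j.properties
flume.root.logger=DEBUG,console

or override it just for one run by adding -Dflume.root.logger=DEBUG,console to the flume-ng command above.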

OK, start it and wait for it to come up.
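A quick way to check that everything is wired up (a sketch; the database name in the last command is an assumption, so check the MongoSink README for the db setting and its default):

echo "hello flume" | nc 192.168.6.198 44444                          # the netcat source should answer OK
curl http://192.168.6.198:34545/metrics                              # HTTP monitoring returns JSON counters
mongo 192.168.6.222/events --eval "printjson(db.events.findOne())"   # assumes the sink writes to a db named events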


Now the main problem is solved: our log data no longer has to be written directly into the MongoDB database. Just give the other project teams the IP and port, and they can send their data straight here.


If MongoDB turns out to have some limitations, we can very flexibly swap out the sink, write the data into HDFS instead, and get close to the high-end Hadoop crowd.
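For reference, switching to Flume's built-in HDFS sink only means replacing the sink block of netcat.conf with something like this (a sketch; the namenode address and path are placeholders for your own cluster, and the Hadoop client jars must be on Flume's classpath):

agent2.sinks.sink2.type = hdfs
agent2.sinks.sink2.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent2.sinks.sink2.hdfs.fileType = DataStream
agent2.sinks.sink2.hdfs.writeFormat = Text
agent2.sinks.sink2.hdfs.useLocalTimeStamp = true
agent2.sinks.sink2.channel = channel2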

