Flume 1.5 log collection and storage in MongoDB: a minimal installation and setup

Tags: mongodb, driver, log4j

I won't cover the basic Flume introduction here; you can easily search for that yourself.

Most of the material online still covers Flume 1.4, and Flume 1.5 brought some big changes. If you are ready to try it, this post walks through a minimal setup that uses MongoSink to store data in MongoDB. Every agent runs completely independently: there is no master and no collector (a collector is really just an agent that aggregates data from several other agents); here there is only a single agent. Once you understand this setup, you are free to build on it however you like.


Flume requires a Java runtime environment. JDK installation is not covered here; installing it via yum is recommended.

After installing the JDK you do not need to set any environment variables; Flume can find Java on its own and will only print a warning, which you can ignore.


First, download the Flume 1.5 installation package:

http://flume.apache.org/download.html

You only need to download apache-flume-1.5.0.1-bin.tar.gz; the same archive works on both 32-bit and 64-bit systems.


Here we put the installation package in the /home directory and extract it:

tar zxvf apache-flume-1.5.0.1-bin.tar.gz

Rename the extracted apache-flume-1.5.0.1-bin directory to flume, so our Flume path becomes /home/flume.


Many guides online say you must set a FLUME_HOME environment variable; in fact, you don't need to.


Now add a configuration file at /home/flume/conf/netcat.conf (agent2, source2, sink2 and channel2 are just custom names; change them as you like):

# define component names
agent2.sources = source2
agent2.sinks = sink2
agent2.channels = channel2

# define data entry (source)
agent2.sources.source2.type = netcat
agent2.sources.source2.bind = 192.168.6.198
agent2.sources.source2.port = 44444
agent2.sources.source2.channels = channel2

# define data exit (sink)
agent2.sinks.sink2.type = org.riderzen.flume.sink.MongoSink
agent2.sinks.sink2.host = 192.168.6.222
agent2.sinks.sink2.port = 27017
agent2.sinks.sink2.model = single
agent2.sinks.sink2.collection = events
agent2.sinks.sink2.batch = 100
agent2.sinks.sink2.channel = channel2

# use a memory channel
agent2.channels.channel2.type = memory
agent2.channels.channel2.capacity = 1000
agent2.channels.channel2.transactionCapacity = 100


The configuration above should be fairly self-explanatory, but here is a brief walkthrough:

The data source is source2. Its definition says: accept data sent to 192.168.6.198:44444 and store it in the channel2 buffer.
What is channel2? Look at its definition below: it is an in-memory buffer queue with a capacity of 1000 events, handing them to the sink in transactions of up to 100 events at a time.

And the sink? Here we use a MongoSink written by Leon Lee (Li Long? probably a fellow developer from China; thanks!). Its job is to take data from channel2 and write it into MongoDB, committing in batches of 100 events.

You can download MongoSink here: https://github.com/leonlee/flume-ng-mongodb-sink


About MongoSink, briefly: just package it as a jar and drop it into /home/flume/lib, and don't forget to drop the MongoDB Java driver jar in there as well. Later, if you develop other extensions, you can simply drop them into lib too.


Good. So Flume's role is to take data from a source, push it into a channel buffer queue, and finally have a sink write it to permanent storage.


Execute the following command to start Flume:

/home/flume/bin/flume-ng agent --conf /home/flume/conf --conf-file /home/flume/conf/netcat.conf --name agent2 -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

A general explanation:

--name agent2 specifies the name of the agent to run (it must match the agent name used in the configuration file)

--conf /home/flume/conf is best given as an absolute path; it points to the configuration directory, which holds not only the agent's config but also the log4j configuration. Without it, Flume cannot write its logs.

--conf-file /home/flume/conf/netcat.conf specifies the configuration file used by the agent being started

-Dflume.monitoring.type=http enables HTTP monitoring, so you can check the agent's running status by pointing a browser at the agent machine's HTTP address

-Dflume.monitoring.port=34545 specifies the port for HTTP monitoring; see the example after this list for how to read the metrics
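
With monitoring enabled, the agent serves its counters as JSON at http://<agent-host>:34545/metrics, which you can open in a browser or fetch programmatically. Below is a minimal sketch of fetching the metrics from Java; the host 192.168.6.198 is an assumption (whichever machine runs the agent).

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal sketch: fetch the JSON counters exposed by Flume's HTTP monitoring.
// Assumes the agent was started with -Dflume.monitoring.type=http and
// -Dflume.monitoring.port=34545, and that 192.168.6.198 is the agent host.
public class FlumeMetricsCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://192.168.6.198:34545/metrics");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // raw JSON with per-source/channel/sink counters
        }
        in.close();
        conn.disconnect();
    }
}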


If you want to print debugging information to the console, adjust /home/flume/conf/log4j.properties yourself.

OK, wait for it to start up successfully.
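
Once the agent is running and some data has been sent in, you can check that events are landing in MongoDB. The sketch below uses the MongoDB Java driver (the same 2.x jar dropped into lib, assuming it is recent enough to provide MongoClient); the database name is an assumption here, since our netcat.conf only sets the collection to events, so check the db option in the flume-ng-mongodb-sink README for the actual default.

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

// Sanity check that events reached MongoDB.
// 192.168.6.222:27017 and the "events" collection come from netcat.conf;
// the database name "events" is an assumption (see the sink's db option).
public class MongoEventCheck {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("192.168.6.222", 27017);
        DB db = mongo.getDB("events");                     // assumed database name
        DBCollection events = db.getCollection("events");  // collection from netcat.conf
        System.out.println("documents stored: " + events.count());
        DBCursor cursor = events.find().limit(5);
        while (cursor.hasNext()) {
            System.out.println(cursor.next());             // print a few sample documents
        }
        cursor.close();
        mongo.close();
    }
}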


The main point now is that projects producing log data no longer have to write to the Mongo database directly: we just give the IP and port to the other project teams, and they can send their data straight here.
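
A producer only needs a plain TCP connection: with the netcat source, every newline-terminated line becomes one Flume event. The quickest manual test is simply echo "hello" | nc 192.168.6.198 44444; here is a minimal Java sketch of the same thing.

import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;

// Minimal producer for the netcat source defined in netcat.conf:
// every newline-terminated line sent to 192.168.6.198:44444 becomes one event.
public class NetcatEventSender {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket("192.168.6.198", 44444);
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), "UTF-8"), true);
        for (int i = 0; i < 10; i++) {
            out.println("test log line " + i);  // one line = one event
        }
        out.close();
        socket.close();
    }
}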


Later, if MongoDB turns out to have limitations, we can rewrite the sink to store the data in HDFS instead and get up close and personal with big Hadoop; the basic shape of such a sink is sketched below.
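
A custom sink boils down to one pattern: open a transaction on the channel, take up to a batch of events, write them wherever you want, then commit. Below is a minimal sketch of that skeleton against the Flume SDK; it is not the actual MongoSink code, and the storage call is left as a placeholder you would replace with HDFS writes (or anything else). The "batch" property name simply mirrors the key used in our config.

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

// Skeleton of a custom sink: drain the channel in batches inside a transaction
// and hand each batch to your own storage (HDFS, another database, ...).
public class MyCustomSink extends AbstractSink implements Configurable {
    private int batchSize;

    @Override
    public void configure(Context context) {
        // read the batch size from the agent's .conf file, mirroring the "batch" key above
        batchSize = context.getInteger("batch", 100);
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            int count = 0;
            for (; count < batchSize; count++) {
                Event event = channel.take();
                if (event == null) {
                    break;                      // channel is empty for now
                }
                // placeholder: write event.getBody() to your storage of choice
            }
            tx.commit();
            return count == 0 ? Status.BACKOFF : Status.READY;
        } catch (Throwable t) {
            tx.rollback();                      // events stay in the channel and are retried
            throw new EventDeliveryException(t);
        } finally {
            tx.close();
        }
    }
}

Package such a sink as a jar, drop it into /home/flume/lib just like MongoSink, and reference its fully qualified class name as the sink type in the configuration file.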


Copyright notice: this is an original blog post; please do not reproduce it without the author's consent.
