There is not much to say by way of introducing Flume; you can look that up yourself. Most of the material on the internet, however, covers Flume 1.4 or earlier, and Flume 1.5 feels like a big change. If you are ready to try it, here is a minimal setup that uses MongoSink to write data into MongoDB. It runs entirely on a single machine: no master, no collector (a collector is really just an agent that aggregates data from a number of other agents), only one agent. Once you understand this setup, you are free to build on it.
Flume requires a Java runtime. Installing the JDK is not covered here; a yum install is recommended.
Note that after installing the JDK you do not need to set any environment variables: Flume can actually find Java on its own, though it will print a warning, which you can ignore.
First, download the Flume 1.5 installation package:
http://flume.apache.org/download.html
You only need apache-flume-1.5.0.1-bin.tar.gz; the same archive works on both 32-bit and 64-bit systems.
Here we put the archive in the /home directory and extract it:
tar zxvf apache-flume-1.5.0.1-bin.tar.gz
Rename the extracted apache-flume-1.5.0.1-bin folder to flume, so our Flume path is /home/flume.
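The steps above can be collected into one shell session. The mirror URL below is an assumption; copy the actual link from the download page:

```shell
cd /home
# URL assumed -- take the real link from http://flume.apache.org/download.html
wget http://archive.apache.org/dist/flume/1.5.0.1/apache-flume-1.5.0.1-bin.tar.gz
tar zxvf apache-flume-1.5.0.1-bin.tar.gz
# Rename so that our Flume path is /home/flume
mv apache-flume-1.5.0.1-bin flume
```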
Many posts online say you must set a FLUME_HOME environment variable; in fact you do not need to.
Now add a configuration file at /home/flume/conf/netcat.conf (the names agent2/source2/sink2/channel2 are our own; change them as you like):
# define component names
agent2.sources = source2
agent2.sinks = sink2
agent2.channels = channel2

# define the data entry point
agent2.sources.source2.type = netcat
agent2.sources.source2.bind = 192.168.6.198
agent2.sources.source2.port = 44444
agent2.sources.source2.channels = channel2

# define the data exit point
agent2.sinks.sink2.type = org.riderzen.flume.sink.MongoSink
agent2.sinks.sink2.host = 192.168.6.222
agent2.sinks.sink2.port = 27017
agent2.sinks.sink2.model = single
agent2.sinks.sink2.collection = events
agent2.sinks.sink2.batch = 100
agent2.sinks.sink2.channel = channel2

# use an in-memory channel
agent2.channels.channel2.type = memory
agent2.channels.channel2.capacity = 1000
agent2.channels.channel2.transactionCapacity = 100
The configuration above should be fairly self-explanatory, but briefly:
The data source is source2, defined to receive data on port 44444 of the local address 192.168.6.198 and store it in the channel2 buffer.
What is channel2? See its definition at the bottom: it is an in-memory buffer queue with a capacity of 1000 events, handed to the sink 100 events per transaction.
And the sink? Here we use the MongoSink written by Leon Lee (thank you!), which takes data from channel2 and saves it into MongoDB, committing once 100 events have accumulated.
Download MongoSink here: https://github.com/leonlee/flume-ng-mongodb-sink
About MongoSink, in short: just package it into a jar and drop it into /home/flume/lib, and don't forget to put the MongoDB Java driver in there too. If you develop other extensions later, they likewise just go into lib.
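A sketch of building the jar from source, assuming git and Maven are installed; the exact jar file names and driver version are assumptions, so adjust to what your build actually produces:

```shell
git clone https://github.com/leonlee/flume-ng-mongodb-sink.git
cd flume-ng-mongodb-sink
mvn package -DskipTests
# Copy the sink jar into Flume's lib directory
cp target/flume-ng-mongodb-sink-*.jar /home/flume/lib/
# Don't forget the MongoDB Java driver (local repository path/version assumed)
cp ~/.m2/repository/org/mongodb/mongo-java-driver/*/mongo-java-driver-*.jar /home/flume/lib/
```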
So now we know Flume's role: take data in from the source, store it in the channel's buffer queue, and finally have the sink move it into permanent storage.
Run the following command to start flume
/home/flume/bin/flume-ng agent --conf /home/flume/conf --conf-file /home/flume/conf/netcat.conf --name agent2 -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
A general explanation:
--name agent2 specifies the name of the agent to run (it must match the prefix used in the configuration file)
--conf /home/flume/conf is best given as an absolute path. It points to your configuration directory, which holds not only the agent configuration but also the log4j configuration; without it, Flume cannot log
--conf-file /home/flume/conf/netcat.conf is the configuration file used by the agent being run
-Dflume.monitoring.type=http turns on HTTP monitoring; you can then view the agent's runtime status directly from a browser
-Dflume.monitoring.port=34545 specifies the port for HTTP monitoring
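Once the agent is running, you can poll that monitoring endpoint; Flume's HTTP monitoring serves JSON metrics at /metrics:

```shell
# Returns JSON counters for our components, e.g. a "CHANNEL.channel2" key
# with fields such as ChannelCapacity and EventPutSuccessCount
curl http://192.168.6.198:34545/metrics
```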
If you need debugging output on the console, configure /home/flume/conf/log4j.properties yourself.
OK, start it up and wait for it to come online.
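A quick smoke test, assuming nc (netcat) and the mongo shell are installed; the database name MongoSink writes to is configurable, and "events" is an assumption here:

```shell
# Send one test event; the netcat source replies "OK" for each accepted line
echo "hello flume" | nc 192.168.6.198 44444
# Then check whether it arrived in MongoDB
mongo 192.168.6.222/events --eval 'printjson(db.events.findOne())'
```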
This mainly solves our original problem: log data no longer has to be written into the Mongo database directly by each application. Just give the IP and port to the other project teams, and they can send their data straight here.
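For those teams, sending data is just a matter of writing newline-delimited lines to that TCP port; the netcat source treats each line as one Flume event. For example, streaming an existing log file (the file path here is hypothetical):

```shell
# Follow the application log and forward each new line as a Flume event
tail -F /var/log/myapp/app.log | nc 192.168.6.198 44444
```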
And if MongoDB turns out to have limitations for us, we can very flexibly swap in a different sink, for example one that puts the data into HDFS, and get in closer contact with the high-end Hadoop world.
That concludes this Flume 1.5 setup for collecting logs and storing them in MongoDB.