This article describes how to use Flume to transfer data to MongoDB, covering environment deployment and points to watch out for.
First, environment setup
1. flume-ng: http://www.apache.org/dyn/closer.cgi/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz
2. MongoDB Java driver jar: https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar
3. flume-ng-mongodb-sink source: https://github.com/leonlee/flume-ng-mongodb-sink
You have to build the flume-ng-mongodb-sink jar yourself: download the code from GitHub, unzip it, and run mvn package. Maven must be installed first.
Second, Flume configuration
1. Environment variable configuration
Put the mongo-java-driver and flume-ng-mongodb-sink jars into Flume's lib directory and add that path to the FLUME_CLASSPATH variable in flume-env.sh.
JAVA_OPTS variable: add -Dflume.monitoring.type=http -Dflume.monitoring.port=xxxx to see monitoring information at [hostname:xxxx]/metrics; -Xms sets the JVM's initial memory and -Xmx sets its maximum memory.
FLUME_HOME variable: set to the Flume root directory.
JAVA_HOME variable: set to the Java root directory.
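Once HTTP monitoring is enabled, the /metrics page returns JSON that a small script can read. The following is a minimal sketch, not part of the original setup: it assumes the usual Flume layout in which each channel appears under a "CHANNEL.<name>" key with string-valued ChannelSize and ChannelCapacity counters. The helper names and the sample payload are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_metrics(host, port, timeout=5):
    """GET http://host:port/metrics and decode the JSON body."""
    with urlopen("http://%s:%d/metrics" % (host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def channel_fill(metrics):
    """Return {channel_name: fill_ratio} for every CHANNEL.* entry."""
    result = {}
    for name, counters in metrics.items():
        if not name.startswith("CHANNEL."):
            continue
        # Counter values are reported as strings, so convert before dividing.
        size = float(counters["ChannelSize"])
        capacity = float(counters["ChannelCapacity"])
        result[name[len("CHANNEL."):]] = size / capacity if capacity else 0.0
    return result


# Illustrative payload only; real values would come from fetch_metrics(host, port).
sample = {
    "CHANNEL.my_channel_1": {"Type": "CHANNEL", "ChannelSize": "2500", "ChannelCapacity": "100000"},
    "SOURCE.my_source_1": {"Type": "SOURCE", "EventAcceptedCount": "2500"},
}
print(channel_fill(sample))
```

A fill ratio that keeps climbing suggests the sink is draining the channel more slowly than the source fills it.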
2. Log Configuration
When debugging, set the log level to DEBUG and write the log to a file: flume.root.logger=DEBUG,LOGFILE
3. Transmission Configuration
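This setup uses an exec source, a file channel, and flume-ng-mongodb-sink. The agent must first declare its components:
my_agent.sources = my_source_1
my_agent.channels = my_channel_1
my_agent.sinks = my_mongo_1
Source configuration:
my_agent.sources.my_source_1.channels = my_channel_1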
my_agent.sources.my_source_1.type = exec
my_agent.sources.my_source_1.command = python xxx.py
my_agent.sources.my_source_1.shell = /bin/bash -c
my_agent.sources.my_source_1.restartThrottle = 10000
my_agent.sources.my_source_1.restart = true
my_agent.sources.my_source_1.logStdErr = true
my_agent.sources.my_source_1.batchSize = 1000
my_agent.sources.my_source_1.interceptors = i1 i2 i3
my_agent.sources.my_source_1.interceptors.i1.type = static
my_agent.sources.my_source_1.interceptors.i1.key = db
my_agent.sources.my_source_1.interceptors.i1.value = cswuyg_test
my_agent.sources.my_source_1.interceptors.i2.type = static
my_agent.sources.my_source_1.interceptors.i2.key = collection
my_agent.sources.my_source_1.interceptors.i2.value = cswuyg_test
my_agent.sources.my_source_1.interceptors.i3.type = static
my_agent.sources.my_source_1.interceptors.i3.key = op
my_agent.sources.my_source_1.interceptors.i3.value = upsert
Field description: the exec source runs the command python xxx.py. The xxx.py code processes the log and prints JSON-formatted data following the convention expected by flume-ng-mongodb-sink; for update-type operations the data must include the _id field. Each printed line becomes the body of an event, and I add custom event headers to it via interceptors.
The static interceptors add information to the event header. Here I add db=cswuyg_test, collection=cswuyg_test, and op=upsert. These three keys are the ones agreed with flume-ng-mongodb-sink: they specify the MongoDB database, the collection name, and the operation type (here upsert, that is, an update).
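As a rough illustration of the convention described above, a script in the role of xxx.py might look like the sketch below. This is not the author's actual script: the tab-separated input format, the message field, the sample lines, and the function names are all hypothetical. The only points taken from the text are that each printed line is one JSON document and that it carries an _id field so update-type operations can work.

```python
import json
import sys


def to_event_line(record_id, fields):
    """Build one JSON line; _id is included because update-type operations need it."""
    doc = {"_id": record_id}
    doc.update(fields)
    return json.dumps(doc, ensure_ascii=False)


def main(lines, out=sys.stdout):
    """Write one JSON document per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Hypothetical input format: "<id>\t<message>"
        record_id, _, message = line.partition("\t")
        # Each stdout line becomes the body of one Flume event.
        out.write(to_event_line(record_id, {"message": message}) + "\n")
        out.flush()  # flush so the exec source sees each line promptly


if __name__ == "__main__":
    # Hard-coded sample input; a real script would read and process its own log.
    main(["1001\tuser login", "1002\tuser logout"])
```

Flushing after every line is a deliberate choice here: when stdout is a pipe, Python buffers output, and events could otherwise reach Flume only in large delayed chunks.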
Channel configuration:
my_agent.channels.my_channel_1.type = file
my_agent.channels.my_channel_1.checkpointDir = /home/work/flume/file-channel/my_channel_1/checkPoint
my_agent.channels.my_channel_1.useDualCheckpoints = true
my_agent.channels.my_channel_1.backupCheckpointDir = /home/work/flume/file-channel/my_channel_1/checkPoint2
my_agent.channels.my_channel_1.dataDirs = /home/work/flume/file-channel/my_channel_1/data
my_agent.channels.my_channel_1.transactionCapacity = 10000
my_agent.channels.my_channel_1.checkpointInterval = 30000
my_agent.channels.my_channel_1.maxFileSize = 4292870142
my_agent.channels.my_channel_1.minimumRequiredSpace = 524288000
my_agent.channels.my_channel_1.capacity = 100000
Sink configuration:
my_agent.sinks.my_mongo_1.type = org.riderzen.flume.sink.MongoSink
my_agent.sinks.my_mongo_1.host = xxxhost
my_agent.sinks.my_mongo_1.port = yyyport
# model: per the source code, only DYNAMIC and SINGLE are supported, and the value must be uppercase
my_agent.sinks.my_mongo_1.model = DYNAMIC/SINGLE
# db: MongoDB database name; the default name is events
my_agent.sinks.my_mongo_1.db = XXX
# username: MongoDB username
my_agent.sinks.my_mongo_1.username = XXX
# password: MongoDB password
my_agent.sinks.my_mongo_1.password = YYY
my_agent.sinks.my_mongo_1.collection = log
my_agent.sinks.my_mongo_1.batch = 10
my_agent.sinks.my_mongo_1.channel = my_channel_1
my_agent.sinks.my_mongo_1.timestampField = _S
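The batch-related numbers in the source, channel, and sink settings depend on one another. The sketch below checks them, assuming the usual Flume rule that a single transaction cannot hold more events than the channel's transactionCapacity, which in turn cannot exceed the channel's capacity. The parser and the function names are illustrative helpers, not part of Flume.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_batch_sizes(props, agent, source, channel, sink):
    """Return a list of human-readable problems (empty if none were found)."""
    def get(path):
        return int(props["%s.%s" % (agent, path)])

    problems = []
    capacity = get("channels.%s.capacity" % channel)
    tx = get("channels.%s.transactionCapacity" % channel)
    if tx > capacity:
        problems.append("transactionCapacity exceeds capacity")
    if get("sources.%s.batchSize" % source) > tx:
        problems.append("source batchSize exceeds transactionCapacity")
    if get("sinks.%s.batch" % sink) > tx:
        problems.append("sink batch exceeds transactionCapacity")
    return problems


# The four values below are copied from the configuration in this article.
config = """
my_agent.sources.my_source_1.batchSize = 1000
my_agent.channels.my_channel_1.transactionCapacity = 10000
my_agent.channels.my_channel_1.capacity = 100000
my_agent.sinks.my_mongo_1.batch = 10
"""
print(check_batch_sizes(parse_properties(config),
                        "my_agent", "my_source_1", "my_channel_1", "my_mongo_1"))
```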
See also: http://www.cnblogs.com/cswuyg/p/4498804.html