Flume-ng-mongodb-sink

Source: Internet
Author: User



This article describes how to use Flume to transfer data to MongoDB, covering environment setup and points to watch out for.



First, environment setup



1. flume-ng: http://www.apache.org/dyn/closer.cgi/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz
2. MongoDB Java driver jar: https://oss.sonatype.org/content/repositories/releases/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar
3. flume-ng-mongodb-sink source: https://github.com/leonlee/flume-ng-mongodb-sink
You have to build the flume-ng-mongodb-sink jar yourself: install Maven first, then download the code from GitHub, unzip it, and run mvn package in the project directory.



Second, Flume configuration



1. ENV Configuration



Put the mongo-java-driver and flume-ng-mongodb-sink jars into Flume's lib directory and add their path to the FLUME_CLASSPATH variable in the flume-env.sh file.
JAVA_OPTS variable: add -Dflume.monitoring.type=http -Dflume.monitoring.port=xxxx to see the monitoring information at [Hostname:xxxx]/metrics; -Xms sets the initial JVM memory and -Xmx sets the maximum JVM memory.
FLUME_HOME variable: the Flume root directory.
JAVA_HOME variable: the Java root directory.
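
Once monitoring is enabled, the /metrics page can also be polled from a script. The following is a minimal sketch, assuming the endpoint returns one JSON object per component, keyed like CHANNEL.my_channel_1, with counters given as strings (for example ChannelFillPercentage). The function names fetch_metrics and channel_fill are hypothetical helpers, not part of Flume.

```python
import json
from urllib.request import urlopen


def fetch_metrics(url, timeout=5):
    """Fetch the JSON document served by Flume's HTTP monitoring endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def channel_fill(metrics):
    """Return {channel_name: fill_percentage} for every CHANNEL component.

    Assumes counters arrive as strings, e.g. "ChannelFillPercentage": "12.5".
    """
    result = {}
    for component, counters in metrics.items():
        if component.startswith("CHANNEL.") and "ChannelFillPercentage" in counters:
            name = component.split(".", 1)[1]
            result[name] = float(counters["ChannelFillPercentage"])
    return result
```

A call such as channel_fill(fetch_metrics("http://[Hostname]:xxxx/metrics")), with the host and port filled in, would then show how full each channel is, which is a quick way to notice a sink that has stopped draining.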



2. Log Configuration



When debugging, set the log level to DEBUG and write the log to a file: flume.root.logger=DEBUG,LOGFILE



3. Transmission Configuration



This setup uses an exec source, a file channel, and flume-ng-mongodb-sink.


my_agent.sources.my_source_1.channels = my_channel_1
my_agent.sources.my_source_1.type = exec
my_agent.sources.my_source_1.command = python  xxx.py
my_agent.sources.my_source_1.shell = /bin/bash -c
my_agent.sources.my_source_1.restartThrottle = 10000
my_agent.sources.my_source_1.restart = true
my_agent.sources.my_source_1.logStdErr = true
my_agent.sources.my_source_1.batchSize = 1000
my_agent.sources.my_source_1.interceptors = i1 i2 i3
my_agent.sources.my_source_1.interceptors.i1.type = static
my_agent.sources.my_source_1.interceptors.i1.key = db
my_agent.sources.my_source_1.interceptors.i1.value = cswuyg_test
my_agent.sources.my_source_1.interceptors.i2.type = static
my_agent.sources.my_source_1.interceptors.i2.key = collection
my_agent.sources.my_source_1.interceptors.i2.value = cswuyg_test
my_agent.sources.my_source_1.interceptors.i3.type = static
my_agent.sources.my_source_1.interceptors.i3.key = op
my_agent.sources.my_source_1.interceptors.i3.value = upsert


Field description: the exec source runs the command python xxx.py. The code in xxx.py processes the log and prints JSON-formatted data in the form agreed with flume-ng-mongodb-sink; for update-type operations, each document must carry the _id field. Each printed line is treated as the body of an event, and I add custom event headers to it via interceptors.
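
What such a script might look like is sketched below. It is only an illustration: the tab-separated input format (user id, page, hit count), the field names, and the function names line_to_json and main are hypothetical. The only parts taken from the description above are that each output line is a JSON document and that it carries an _id field for update-type operations.

```python
import json
import sys


def line_to_json(line):
    """Convert one raw log line into a JSON string, or None if malformed.

    Hypothetical input format: three tab-separated fields (user id, page, hits).
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    user_id, page, hits = parts
    doc = {
        "_id": user_id + ":" + page,  # required for update/upsert operations
        "user": user_id,
        "page": page,
        "hits": int(hits),
    }
    return json.dumps(doc, sort_keys=True)


def main(stream=sys.stdin, out=sys.stdout):
    """Print one JSON document per line; each printed line becomes an event body."""
    for line in stream:
        event = line_to_json(line)
        if event is not None:
            out.write(event + "\n")
            out.flush()  # flush so the exec source sees events promptly
```

In a real xxx.py, main() would be called at the bottom of the file, and the input stream would be whatever log the script is responsible for reading.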



The static interceptors add information to the event header. Here I add db=cswuyg_test, collection=cswuyg_test, and op=upsert. These three keys are agreed with flume-ng-mongodb-sink: they specify the database and collection name in MongoDB and set the operation type to upsert.
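
To make the effect of these three headers concrete, the sketch below simulates in plain Python how an event carrying them could be applied. It is not the sink's actual code: apply_event and the in-memory store are hypothetical, and the upsert is modeled simply as "update the document with the same _id, or insert it if it does not exist".

```python
import json


def apply_event(store, headers, body):
    """Apply one event to an in-memory stand-in for MongoDB.

    store   : dict mapping (db, collection) -> {_id: document}
    headers : event headers, e.g. those added by the static interceptors
    body    : event body, a JSON string printed by the source script
    """
    doc = json.loads(body)
    target = store.setdefault((headers["db"], headers["collection"]), {})
    if headers.get("op") == "upsert":
        if "_id" not in doc:
            raise ValueError("upsert requires an _id field")
        # Update the existing document in place, or insert it if absent.
        target.setdefault(doc["_id"], {}).update(doc)
    else:
        # Plain insert; fall back to a positional key when no _id is present.
        target[doc.get("_id", len(target))] = doc
    return store
```

Applying two events with the same _id under op=upsert leaves a single document holding the latest values, which is why the script's output must include _id for this operation type.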


my_agent.channels.my_channel_1.type = file
my_agent.channels.my_channel_1.checkpointDir = /home/work/flume/file-channel/my_channel_1/checkPoint
my_agent.channels.my_channel_1.useDualCheckpoints = true
my_agent.channels.my_channel_1.backupCheckpointDir = /home/work/flume/file-channel/my_channel_1/checkPoint2
my_agent.channels.my_channel_1.dataDirs = /home/work/flume/file-channel/my_channel_1/data
my_agent.channels.my_channel_1.transactionCapacity = 10000
my_agent.channels.my_channel_1.checkpointInterval = 30000
my_agent.channels.my_channel_1.maxFileSize = 4292870142
my_agent.channels.my_channel_1.minimumRequiredSpace = 524288000
my_agent.channels.my_channel_1.capacity = 100000
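
The byte values and capacities above are easier to review when converted and cross-checked. The sketch below is a hypothetical helper, not part of Flume: it converts byte settings to MiB and checks the ordering source batchSize <= transactionCapacity <= capacity, on the assumption that a source batch has to fit into one channel transaction and a transaction into the channel.

```python
def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


def check_capacities(batch_size, transaction_capacity, capacity):
    """Return a list of problems with the batch/transaction/capacity ordering."""
    problems = []
    if batch_size > transaction_capacity:
        problems.append("source batchSize exceeds channel transactionCapacity")
    if transaction_capacity > capacity:
        problems.append("transactionCapacity exceeds channel capacity")
    return problems


# Values copied from the configuration in this article.
print(to_mib(524288000))                      # minimumRequiredSpace in MiB
print(round(to_mib(4292870142), 1))           # maxFileSize in MiB
print(check_capacities(1000, 10000, 100000))  # batchSize, transactionCapacity, capacity
```

With the values used here, the ordering check reports no problems, and minimumRequiredSpace works out to 500 MiB.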


Sink configuration:


my_agent.sinks.my_mongo_1.type = org.riderzen.flume.sink.MongoSink
my_agent.sinks.my_mongo_1.host = xxxhost
my_agent.sinks.my_mongo_1.port = yyyport
# model: set to either DYNAMIC or SINGLE. According to the source code, only these two values are supported, and they must be uppercase.
my_agent.sinks.my_mongo_1.model = DYNAMIC/SINGLE
# db: MongoDB database name; the default name is events
my_agent.sinks.my_mongo_1.db = XXX
# username: MongoDB username
my_agent.sinks.my_mongo_1.username = XXX
# password: MongoDB password
my_agent.sinks.my_mongo_1.password = YYY
my_agent.sinks.my_mongo_1.collection = log
my_agent.sinks.my_mongo_1.batch = 10
my_agent.sinks.my_mongo_1.channel = my_channel_1
my_agent.sinks.my_mongo_1.timestampField = _S
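
Because a misspelled key or a lowercase model value can easily go unnoticed in a properties file, it can help to lint the sink section before starting the agent. The sketch below is a hypothetical checker, not part of Flume or the sink. KNOWN_SINK_KEYS is an assumed list limited to the keys used in this article, so the sink may accept others; verify it against the sink's source code.

```python
KNOWN_SINK_KEYS = {  # assumed key set; verify against the sink's source code
    "type", "channel", "host", "port", "model", "db", "collection",
    "username", "password", "batch", "timestampField",
}


def parse_properties(text):
    """Parse simple key = value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_sink(props, prefix):
    """Return warnings for unknown keys or an invalid model under prefix."""
    warnings = []
    for key, value in props.items():
        if not key.startswith(prefix + "."):
            continue
        name = key[len(prefix) + 1:]
        if name not in KNOWN_SINK_KEYS:
            warnings.append("unknown key: " + name)
        if name == "model" and value not in ("DYNAMIC", "SINGLE"):
            warnings.append("model must be DYNAMIC or SINGLE (uppercase)")
    return warnings
```

Calling check_sink(parse_properties(text), "my_agent.sinks.my_mongo_1") on the agent's configuration text returns an empty list when every sink key is recognized and the model value is valid.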


See also: http://www.cnblogs.com/cswuyg/p/4498804.html


