Flume is an excellent, if somewhat heavyweight, data acquisition component. With the flume-ng-sql-source plugin, the results of a SQL query are assembled into OpenCSV-formatted records; the default field separator is a comma (,), and it can be changed by modifying the OpenCSV-related classes in the plugin source.
1. Download
[root@hadoop0 bigdata]# wget http://apache.fayea.com/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
2. Extract
[root@hadoop0 bigdata]# tar -zxvf apache-flume-1.6.0-bin.tar.gz
[root@hadoop0 bigdata]# ls
apache-flume-1.6.0-bin         apache-hive-2.0.1-bin.tar.gz  hadoop272    hbase-1.1.5-bin.tar.gz  kafka        sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz  taokeeper-monitor.tar.gz  zookeeper
apache-flume-1.6.0-bin.tar.gz  apache-tomcat-7.0.69.zip      hbase-1.1.5  hive2.0                 sqoop-1.4.6  storm096                                    tomcat7                   zookeeper.out
3. Build flume-ng-sql-source.jar
flume-ng-sql-source (develop, 1.2.1), author: Luis Lázaro <lalazaro@keedio.com>
<groupId>org.keedio.flume.flume-ng-sources</groupId>
<artifactId>flume-ng-sql-source</artifactId>
<version>1.2.1-SNAPSHOT</version>
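The jar can be built with Maven and dropped into Flume's lib directory. A minimal sketch, assuming the source is cloned from the Keedio repository on GitHub and Maven is installed:
[root@hadoop0 bigdata]# git clone https://github.com/keedio/flume-ng-sql-source.git
[root@hadoop0 bigdata]# cd flume-ng-sql-source
[root@hadoop0 flume-ng-sql-source]# mvn package -DskipTests
[root@hadoop0 flume-ng-sql-source]# cp target/flume-ng-sql-source-1.2.1-SNAPSHOT.jar /opt/bigdata/apache-flume-1.6.0-bin/lib/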
4. Configure the data source (Kafka sinks from two different authors are shown below; the first is commented out)
[root@hadoop0 apache-flume-1.6.0-bin]# vi conf/agent.conf
agent.sources = sql-source
agent.channels = c1
agent.sinks = r
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource
# URL to connect to the database (currently only MySQL is supported)
agent.sources.sql-source.connection.url = jdbc:mysql://192.168.1.100:3306/test
# Database connection properties
agent.sources.sql-source.user = root
agent.sources.sql-source.password = 123
agent.sources.sql-source.table = sdfs
agent.sources.sql-source.database = database
# Columns to import to Kafka (the default * imports the entire row)
agent.sources.sql-source.columns.to.select = *
# Incremental column properties
agent.sources.sql-source.incremental.column.name = id
# Incremental value from which to start pulling data from the table (0 imports the entire table)
agent.sources.sql-source.incremental.value = 0
# Query delay: a query is issued every configured number of milliseconds
agent.sources.sql-source.run.query.delay = 10000
# The status file is used to save the last row read
agent.sources.sql-source.status.file.path = /tmp
agent.sources.sql-source.status.file.name = sql-source.status
# Custom query
agent.sources.sql-source.custom.query = SELECT * FROM users WHERE 1=1 AND @
agent.sources.sql-source.batch.size = 1000
agent.sources.sql-source.max.rows = 10000
agent.channels.c1.type = memory
agent.channels.c1.capacity = 100
agent.channels.c1.transactionCapacity = 100
agent.channels.c1.byteCapacityBufferPercentage = 20
agent.channels.c1.byteCapacity = 800
# flume-ng-kafka-sink-1.6.0.jar (Apache Flume's built-in Kafka sink)
#agent.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink
#agent.sinks.r.brokerList = localhost:9092
#agent.sinks.r.batchSize = 1
#agent.sinks.r.partitioner.class = org.apache.flume.plugins.SinglePartition
#agent.sinks.r.serializer.class = kafka.serializer.StringEncoder
#agent.sinks.r.requiredAcks = 0
#agent.sinks.r.topic = test
# GitHub beyondj2ee flumeng-kafka-plugin.jar
agent.sinks.r.type = org.apache.flume.plugins.KafkaSink
agent.sinks.r.metadata.broker.list = localhost:9092
agent.sinks.r.partition.key = 0
agent.sinks.r.partitioner.class = org.apache.flume.plugins.SinglePartition
agent.sinks.r.serializer.class = kafka.serializer.StringEncoder
agent.sinks.r.request.required.acks = 0
agent.sinks.r.max.message.size = 1000000
agent.sinks.r.producer.type = sync
agent.sinks.r.custom.encoding = UTF-8
agent.sinks.r.custom.topic.name = test
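The SQL source also needs the MySQL JDBC driver on Flume's classpath. A sketch; the connector version and source path are assumptions:
[root@hadoop0 apache-flume-1.6.0-bin]# cp /path/to/mysql-connector-java-5.1.38-bin.jar lib/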
5. Prepare the database
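A minimal sketch of seed data; the users table layout here is an assumption inferred from the custom query above and the consumer output below:
[root@hadoop0 ~]# mysql -h 192.168.1.100 -u root -p123 test <<'EOF'
CREATE TABLE IF NOT EXISTS users (
  id INT PRIMARY KEY AUTO_INCREMENT,  -- matches incremental.column.name
  name VARCHAR(64),
  passwd VARCHAR(64),
  created DATETIME
);
INSERT INTO users (name, passwd, created) VALUES ('zhangsan', 'a', NOW());
EOF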
6, Start zookeeper
[Root@hadoop0 ~]# cd/opt/bigdata/
[Root@hadoop0 bigdata]# ls
Apache-flume-1.6.0-bin apache-hive-2.0.1-bin.tar.gz hadoop272 hbase-1.1.5-bin.tar.gz Kafka sqoop-1.4.6 . bin__hadoop-2.0.4-alpha.tar.gz taokeeper-monitor.tar.gz Zookeeper
apache-flume-1.6.0-bin.tar.gz apache-tomcat-7.0.69.zip hbase-1.1.5 hive2.0 sqoop-1.4.6 stomr096 TOMCAT7 Zookeeper.out
[Root@hadoop0 bigdata]# CD zookeeper/bin/
[Root@hadoop0 bin]#./zkserver.sh start
JMX enabled by default
Using config:/opt/bigdata/zookeeper/bin/. /conf/zoo.cfg
Starting zookeeper ... Started
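An optional sanity check that ZooKeeper is answering, assuming the default client port 2181:
[root@hadoop0 bin]# echo ruok | nc localhost 2181
imok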
7. Start Kafka
[root@hadoop0 bin]# cd ../../kafka/bin/
[root@hadoop0 bin]# ./kafka-server-start.sh ../config/server.properties &
[1] 32613
[root@hadoop0 bin]# [1999-05-25 12:34:44,651] INFO KafkaConfig values:
request.timeout.ms = 30000
log.roll.hours = 168
inter.broker.protocol.version = 0.9.0.X
log.preallocate = false
security.inter.broker.protocol = PLAINTEXT
controller.socket.timeout.ms = 30000
broker.id.generation.enable = true
ssl.keymanager.algorithm = SunX509
ssl.key.password = null
log.cleaner.enable = true
ssl.provider = null
[root@hadoop0 bin]# ./kafka-topics.sh --zookeeper localhost --list
test
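If the topic were missing, it could be created first; the replication factor and partition count below are assumptions for a single-node setup:
[root@hadoop0 bin]# ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test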
8. Start Flume (remove any old status file first so the import starts from the configured incremental value)
[root@hadoop0 apache-flume-1.6.0-bin]# rm -rf /tmp/sql-source.status
[root@hadoop0 apache-flume-1.6.0-bin]# ./bin/flume-ng agent -n agent -c conf -f conf/agent.conf -Dflume.root.logger=INFO,console
Info: Including Hadoop libraries found via (/opt/bigdata/hadoop272/bin/hadoop) for HDFS access
Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Including HBase libraries found via (/opt/bigdata/hbase-1.1.5/bin/hbase) for HBase access
Info: Excluding /opt/bigdata/hbase-1.1.5/lib/slf4j-api-1.7.7.jar from classpath
Info: Excluding /opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Including Hive libraries found via (/opt/bigdata/hive2.0) for Hive access
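Once the first query has run, the source records its position in the status file configured above, which can be inspected directly:
[root@hadoop0 apache-flume-1.6.0-bin]# cat /tmp/sql-source.status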
9. Consume the data
[root@hadoop0 bin]# ./kafka-console-consumer.sh --zookeeper localhost --topic test --from-beginning
Test-message
Gaojs
Qi Huan Ms. Clever donated
1
2
Gaojs
Nihao
Tesdhdhsdhgf
Vdxgdgsdg
Dfhfdhd
Gaojs
Gaojingsong
2015-09-02342
535435353
"1", "Zhangsan", "a", "17-may-2016 20:06:38"
"3", "444", "17-may-2016", "20:06:38"
"4", "Wan-flume", "17-may-2016", "20:06:38"
"5", "Gaojs-flume", "17-may-2016", "20:06:38"
"1", "Zhangsan", "a", "17-may-2016 20:06:38"
"3", "444", "17-may-2016", "20:06:38"
"4", "Wan-flume", "17-may-2016", "20:06:38"
"5", "Gaojs-flume", "17-may-2016", "20:06:38"
10. Verify the results
With the console consumer running alongside Flume, the rows returned by the SQL query appear on the topic as comma-separated records, confirming that the MySQL -> Flume -> Kafka pipeline works.