Hands-On: Collecting DB Data into Kafka with Apache Flume


Flume is an excellent, if somewhat heavyweight, data-collection component. The SQL source plugin essentially runs a SQL query and assembles the result rows into OpenCSV-formatted records; the default field separator is a comma (,), and you can override a few of the OpenCSV classes to change it.

1. Download

[root@hadoop0 bigdata]# wget http://apache.fayea.com/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz

2. Unpack

[root@hadoop0 bigdata]# tar -zxvf apache-flume-1.6.0-bin.tar.gz

[root@hadoop0 bigdata]# ls

[Root@hadoop0 bigdata]# ls

Apache-flume-1.6.0-bin apache-hive-2.0.1-bin.tar.gz hadoop272 hbase-1.1.5-bin.tar.gz Kafka sqoop-1.4.6 . bin__hadoop-2.0.4-alpha.tar.gz taokeeper-monitor.tar.gz Zookeeper

apache-flume-1.6.0-bin.tar.gz apache-tomcat-7.0.69.zip hbase-1.1.5 hive2.0 sqoop-1.4.6 stomr096 TOMCAT7 Zookeeper.out

3. Build flume-ng-sql-source.jar

The source is flume-ng-sql-source (develop branch, 1.2.1), by Luis Lázaro <lalazaro@keedio.com> of Keedio. Its Maven coordinates:

<groupId>org.keedio.flume.flume-ng-sources</groupId>

<artifactId>flume-ng-sql-source</artifactId>

<version>1.2.1-SNAPSHOT</version>
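
The plugin is not shipped as a prebuilt binary, so build the jar with Maven. A minimal sketch, assuming the source lives at github.com/keedio/flume-ng-sql-source and that Git and Maven are installed (paths and the jar version are illustrative):

# Fetch and build flume-ng-sql-source
git clone https://github.com/keedio/flume-ng-sql-source.git
cd flume-ng-sql-source
mvn clean package -DskipTests
# The jar lands under target/, e.g. target/flume-ng-sql-source-1.2.1-SNAPSHOT.jar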



4. Configure the data source (Kafka sinks from two different authors are shown)

[root@hadoop0 apache-flume-1.6.0-bin]# vi conf/agent.conf

agent.sources = sql-source
agent.channels = c1
agent.sinks = r

agent.sources.sql-source.type = org.keedio.flume.source.SQLSource

# URL to the database (currently only MySQL is supported)
agent.sources.sql-source.connection.url = jdbc:mysql://192.168.1.100:3306/test

# Database connection properties
agent.sources.sql-source.user = root
agent.sources.sql-source.password = 123
agent.sources.sql-source.table = sdfs
agent.sources.sql-source.database = database

# Columns to import to Kafka (default * imports the entire row)
agent.sources.sql-source.columns.to.select = *

# Incremental column properties
agent.sources.sql-source.incremental.column.name = id

# Incremental value from which to start taking data (0 imports the entire table)
agent.sources.sql-source.incremental.value = 0

# Query delay: a query is issued every this many milliseconds
agent.sources.sql-source.run.query.delay = 10000

# The status file saves the last row read
agent.sources.sql-source.status.file.path = /tmp
agent.sources.sql-source.status.file.name = sql-source.status

# Custom query
agent.sources.sql-source.custom.query = SELECT * FROM users WHERE 1=1 AND @

agent.sources.sql-source.batch.size = 1000
agent.sources.sql-source.max.rows = 10000

agent.channels.c1.type = memory
agent.channels.c1.capacity = 100
agent.channels.c1.transactionCapacity = 100
agent.channels.c1.byteCapacityBufferPercentage = 20
agent.channels.c1.byteCapacity = 800

# Option 1: the Kafka sink shipped with Flume (flume-ng-kafka-sink-1.6.0.jar)
#agent.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink
#agent.sinks.r.brokerList = localhost:9092
#agent.sinks.r.batchSize = 1
#agent.sinks.r.partitioner.class = org.apache.flume.plugins.SinglePartition
#agent.sinks.r.serializer.class = kafka.serializer.StringEncoder
#agent.sinks.r.requiredAcks = 0
#agent.sinks.r.topic = test

# Option 2: the BEYONDJ2EE plugin from GitHub (Flumeng-kafka-plugin.jar)
agent.sinks.r.type = org.apache.flume.plugins.KafkaSink
agent.sinks.r.metadata.broker.list = localhost:9092
agent.sinks.r.partition.key = 0
agent.sinks.r.partitioner.class = org.apache.flume.plugins.SinglePartition
agent.sinks.r.serializer.class = kafka.serializer.StringEncoder
agent.sinks.r.request.required.acks = 0
agent.sinks.r.max.message.size = 1000000
agent.sinks.r.producer.type = sync
agent.sinks.r.custom.encoding = UTF-8
agent.sinks.r.custom.topic.name = test

# Bind the source and the sink to the channel (required for the agent to start)
agent.sources.sql-source.channels = c1
agent.sinks.r.channel = c1
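
For the classes referenced above to resolve at runtime, the plugin jars and the MySQL JDBC driver must be on Flume's classpath. On 1.6.0 the simplest approach is to drop them into Flume's lib/ directory; a sketch, with the jar names and versions as assumptions:

# Deploy third-party jars (names/versions are illustrative)
cp flume-ng-sql-source/target/flume-ng-sql-source-1.2.1-SNAPSHOT.jar /opt/bigdata/apache-flume-1.6.0-bin/lib/
cp mysql-connector-java-5.1.38.jar /opt/bigdata/apache-flume-1.6.0-bin/lib/   # MySQL JDBC driver
cp flumeng-kafka-plugin.jar /opt/bigdata/apache-flume-1.6.0-bin/lib/          # BEYONDJ2EE Kafka sink, plus its Kafka/Scala dependencies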

5. Prepare the database
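
No schema is shown for this step; below is a minimal sketch consistent with the configuration in step 4 (database test on 192.168.1.100, a users table whose auto-increment id serves as the incremental column). The note column and the sample values are hypothetical, chosen to resemble the four-field rows that show up in step 9:

mysql -h 192.168.1.100 -u root -p123 <<'EOF'
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE TABLE IF NOT EXISTS users (
  id INT AUTO_INCREMENT PRIMARY KEY,  -- incremental column tracked by the source
  name VARCHAR(64),
  note VARCHAR(64),                   -- hypothetical extra column
  created_at DATETIME
);
INSERT INTO users (name, note, created_at) VALUES
  ('zhangsan', 'a', NOW()),
  ('gaojs-flume', 'b', NOW());
EOF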



6. Start ZooKeeper

[root@hadoop0 ~]# cd /opt/bigdata/

[root@hadoop0 bigdata]# ls

apache-flume-1.6.0-bin  apache-hive-2.0.1-bin.tar.gz  hadoop272  hbase-1.1.5-bin.tar.gz  kafka  sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz  taokeeper-monitor.tar.gz  zookeeper

apache-flume-1.6.0-bin.tar.gz  apache-tomcat-7.0.69.zip  hbase-1.1.5  hive2.0  sqoop-1.4.6  stomr096  tomcat7  zookeeper.out

[root@hadoop0 bigdata]# cd zookeeper/bin/

[root@hadoop0 bin]# ./zkServer.sh start

JMX enabled by default

Using config: /opt/bigdata/zookeeper/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED
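
To confirm the server is actually answering requests, you can use ZooKeeper's standard ruok four-letter command (this assumes nc/netcat is installed):

echo ruok | nc localhost 2181   # a healthy server answers "imok"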

7. Start Kafka

[root@hadoop0 bin]# cd ../../kafka/bin/

[root@hadoop0 bin]# ./kafka-server-start.sh ../config/server.properties &

[1] 32613

[root@hadoop0 bin]# [1999-05-25 12:34:44,651] INFO KafkaConfig values:

request.timeout.ms = 30000
log.roll.hours = 168
inter.broker.protocol.version = 0.9.0.X
log.preallocate = false
security.inter.broker.protocol = PLAINTEXT
controller.socket.timeout.ms = 30000
broker.id.generation.enable = true
ssl.keymanager.algorithm = SunX509
ssl.key.password = null
log.cleaner.enable = true
ssl.provider = null

[root@hadoop0 bin]# ./kafka-topics.sh --zookeeper localhost --list

test
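
If test does not appear in the list, create it first; a sketch using the stock topic script with single-node settings:

./kafka-topics.sh --zookeeper localhost --create --topic test --replication-factor 1 --partitions 1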

8. Start Flume

[root@hadoop0 apache-flume-1.6.0-bin]# rm -rf /tmp/sql-source.status

[root@hadoop0 apache-flume-1.6.0-bin]# ./bin/flume-ng agent -n agent -c conf -f conf/agent.conf -Dflume.root.logger=INFO,console

Info: Including Hadoop libraries found via (/opt/bigdata/hadoop272/bin/hadoop) for HDFS access

Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath

Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath

Info: Including HBASE libraries found via (/opt/bigdata/hbase-1.1.5/bin/hbase) for HBASE access

Info: Excluding /opt/bigdata/hbase-1.1.5/lib/slf4j-api-1.7.7.jar from classpath

Info: Excluding /opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar from classpath

Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath

Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath

Info: Including Hive libraries found via (/opt/bigdata/hive2.0) for Hive access



9. Consume the data

[root@hadoop0 bin]# ./kafka-console-consumer.sh --zookeeper localhost --topic test --from-beginning

Test-message

Gaojs

Qi  Huan Ms. Clever donated 

1

2

Gaojs

Nihao

Tesdhdhsdhgf

Vdxgdgsdg

Dfhfdhd

Gaojs

Gaojingsong

2015-09-02342

535435353

"1", "Zhangsan", "a", "17-may-2016 20:06:38"

"3", "444", "17-may-2016", "20:06:38"

"4", "Wan-flume", "17-may-2016", "20:06:38"

"5", "Gaojs-flume", "17-may-2016", "20:06:38"

"1", "Zhangsan", "a", "17-may-2016 20:06:38"

"3", "444", "17-may-2016", "20:06:38"

"4", "Wan-flume", "17-may-2016", "20:06:38"

"5", "Gaojs-flume", "17-may-2016", "20:06:38"

10. Result verification



[Screenshots: starting Flume, and the consumer process log]



