Flume + Kafka Integration
I. Preparation
Prepare five intranet servers for the ZooKeeper and Kafka clusters.
Server addresses:
192.168.2.240
192.168.2.241
192.168.2.242
192.168.2.243
192.168.2.244
Server OS: CentOS 6.5
Download the installation packages:
ZooKeeper: http://apache.fayea.com/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
Flume: http://apache.fayea.com/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
Kafka: http://apache.fayea.com/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
ZooKeeper, Flume, and Kafka all require a Java environment, so install the JDK first:
yum install java-1.7.0-openjdk-devel
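A quick sanity check that the JDK is available on each server (this verification step is an addition, not part of the original instructions):
java -version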
II. Installing and Configuring ZooKeeper
Select three of the servers for the ZooKeeper cluster; their IPs are:
192.168.2.240
192.168.2.241
192.168.2.242
Note: perform steps (1)-(3) on the first server, 192.168.2.240.
(1) Extract: place zookeeper-3.4.6.tar.gz in the /opt directory and extract it (the extracted directory is renamed to /opt/zookeeper, which the later steps assume):
cd /opt
tar zxf zookeeper-3.4.6.tar.gz
mv zookeeper-3.4.6 zookeeper
(2) Create the configuration file: copy conf/zoo_sample.cfg to zoo.cfg in the same conf directory, then set the following values:
tickTime=2000
dataDir=/opt/zookeeper/data
initLimit=5
syncLimit=2
clientPort=2181
server.1=192.168.2.240:2888:3888
server.2=192.168.2.241:2888:3888
server.3=192.168.2.242:2888:3888
The meaning of each parameter:
tickTime: heartbeat interval in milliseconds; default: 2000
clientPort: the port on which client applications (such as Solr) connect to ZooKeeper; default: 2181
initLimit: time allowed (in ticks) for the initial synchronization phase, when followers connect to the leader; default: 10
syncLimit: time allowed (in ticks) for a follower to sync with the leader; default: 5
dataDir: the storage path for data (such as the managed configuration files)
server.X: X is the ID of a server in the cluster and must match the ID in that server's myid file. Two ports follow the address: the first is used for data synchronization and other communication between followers and the leader, and the second is used for voting during leader election.
(3) Create the /opt/zookeeper/data snapshot directory and create the myid file, which contains 1:
mkdir /opt/zookeeper/data
vi /opt/zookeeper/data/myid     # write a single line containing 1
(4) Copy the already-configured /opt/zookeeper/ directory from 192.168.2.240 to 192.168.2.241 and 192.168.2.242, then change the contents of the corresponding myid files to 2 and 3, as shown in the sketch below.
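A minimal sketch of this copy step using scp and ssh, assuming password-less SSH between the nodes (these exact commands are an addition, not from the original instructions):
scp -r /opt/zookeeper 192.168.2.241:/opt/
scp -r /opt/zookeeper 192.168.2.242:/opt/
ssh 192.168.2.241 'echo 2 > /opt/zookeeper/data/myid'
ssh 192.168.2.242 'echo 3 > /opt/zookeeper/data/myid'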
(5) Start the ZooKeeper cluster
Execute the start command on each of the three servers:
/opt/zookeeper/bin/zkServer.sh start
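To confirm the cluster formed correctly, each node can report its role; one node should show "leader" and the other two "follower" (this check is an addition, not part of the original instructions):
/opt/zookeeper/bin/zkServer.sh status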
III. Installing and Configuring the Kafka Cluster
All five servers are used; their IP addresses and node names are:
192.168.2.240 Node1
192.168.2.241 Node2
192.168.2.242 Node3
192.168.2.243 Node4
192.168.2.244 Node5
1. Extract the installation file to the /opt directory
cd /opt
tar zxvf kafka_2.10-0.10.0.0.tgz
mv kafka_2.10-0.10.0.0 kafka
2. Modify the server.properties file
# node1 configuration
broker.id=0
port=9092
advertised.listeners=PLAINTEXT://58.246.xx.xx:9092
advertised.host.name=58.246.xx.xx
# Pitfall: because the nginx logs are pulled from an online (external) server back to the company's local machines, these two options must be set to the router's public (extranet) IP address; otherwise the online Flume agent cannot connect to the Kafka nodes and reports that it cannot deliver log messages.
advertised.port=9092
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node2 configuration
broker.id=1
port=9093
advertised.listeners=PLAINTEXT://58.246.xx.xx:9093
advertised.host.name=58.246.xx.xx
advertised.port=9093
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node3 configuration
broker.id=2
port=9094
advertised.listeners=PLAINTEXT://58.246.xx.xx:9094
advertised.host.name=58.246.xx.xx
advertised.port=9094
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node4 configuration
broker.id=3
port=9095
advertised.listeners=PLAINTEXT://58.246.xx.xx:9095
advertised.host.name=58.246.xx.xx
advertised.port=9095
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node5 configuration
broker.id=4
port=9096
advertised.listeners=PLAINTEXT://58.246.xx.xx:9096
advertised.host.name=58.246.xx.xx
advertised.port=9096
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
3. Start the Kafka cluster
Execute the following command on each node to start the service:
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties &
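Before wiring up Flume, the cluster can be sanity-checked by creating and listing the topic that the Flume sink below writes to (an added check; the replication factor of 3 is an assumption, not from the original configuration):
/opt/kafka/bin/kafka-topics.sh --create --zookeeper 192.168.2.240:2181 --replication-factor 3 --partitions 5 --topic unilife_nginx_production
/opt/kafka/bin/kafka-topics.sh --list --zookeeper 192.168.2.240:2181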
IV. Installing and Configuring Flume
Two Flume agents are installed: one on the online (external) server, which ships the nginx logs back to the local Kafka cluster, and one locally, which dumps the data from the Kafka cluster to HDFS.
4.1 Installing Flume on the online server
This agent collects the nginx logs and passes them to the internal Kafka cluster.
1. Unpack the installation package
cd /opt
tar zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume
2. Create a configuration file
vi /opt/flume/conf/flume-conf.properties and add the following content:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /unilifedata/logs/nginx/access.log
a1.sources.r1.channels = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 100000
# Sinks
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = unilife_nginx_production
a1.sinks.k1.kafka.bootstrap.servers = 58.246.xx.xx:9092,58.246.xx.xx:9093,58.246.xx.xx:9094
a1.sinks.k1.brokerList = 58.246.xx.xx:9092,58.246.xx.xx:9093,58.246.xx.xx:9094
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.flumeBatchSize = 2000
a1.sinks.k1.channel = c1
3. Start the Flume service
/opt/flume/bin/flume-ng agent --conf /opt/flume/conf/ --conf-file /opt/flume/conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,LOGFILE &
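Once the agent is running, you can verify that log lines are reaching Kafka with the console consumer (an added verification step, not part of the original instructions):
/opt/kafka/bin/kafka-console-consumer.sh --zookeeper 192.168.2.240:2181 --topic unilife_nginx_production --from-beginning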
4.2 Installing Flume locally
This agent dumps the logs from Kafka to HDFS.
1. Unpack the installation package
cd /opt
tar zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume
2. Create a configuration file: vi /opt/flume/conf/flume-nginx-log.properties and add the following content:
nginx.sources = source1
nginx.channels = channel1
nginx.sinks = sink1
nginx.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
nginx.sources.source1.zookeeperConnect = master:2181,slave1:2181,slave2:2181
nginx.sources.source1.topic = unilife_nginx_production
nginx.sources.source1.groupId = flume_unilife_nginx_production
nginx.sources.source1.channels = channel1
nginx.sources.source1.interceptors = i1
nginx.sources.source1.interceptors.i1.type = timestamp
nginx.sources.source1.kafka.consumer.timeout.ms = 100
nginx.channels.channel1.type = memory
nginx.channels.channel1.capacity = 10000000
nginx.channels.channel1.transactionCapacity = 1000
nginx.sinks.sink1.type = hdfs
nginx.sinks.sink1.hdfs.path = hdfs://192.168.2.240:8020/user/hive/warehouse/nginx_log
nginx.sinks.sink1.hdfs.writeFormat = Text
nginx.sinks.sink1.hdfs.inUsePrefix = _
nginx.sinks.sink1.hdfs.rollInterval = 3600
nginx.sinks.sink1.hdfs.rollSize = 0
nginx.sinks.sink1.hdfs.rollCount = 0
nginx.sinks.sink1.hdfs.fileType = DataStream
nginx.sinks.sink1.hdfs.minBlockReplicas = 1
nginx.sinks.sink1.channel = channel1
3. Start the service
/opt/flume/bin/flume-ng agent --conf /opt/flume/conf/ --conf-file /opt/flume/conf/flume-nginx-log.properties --name nginx -Dflume.root.logger=INFO,LOGFILE &
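After the agent has run for a while, the files written by the HDFS sink can be listed directly, assuming a Hadoop client is available on the machine (an added verification step, not part of the original instructions):
hdfs dfs -ls hdfs://192.168.2.240:8020/user/hive/warehouse/nginx_log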