Flume + Kafka Integration

I. Preparation

Prepare 5 intranet servers to build the Zookeeper and Kafka clusters.

Server addresses:

192.168.2.240

192.168.2.241

192.168.2.242

192.168.2.243

192.168.2.244

Server OS: CentOS 6.5

Download the installation packages:

Zookeeper:http://apache.fayea.com/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

Flume:http://apache.fayea.com/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz

Kafka:http://apache.fayea.com/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz

Zookeeper, Flume, and Kafka all require a Java environment, so install the JDK first:

yum install java-1.7.0-openjdk-devel
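Before continuing, a quick sanity check that the JDK is installed and on the PATH (exact version output will vary):

java -version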



II. Install and Configure Zookeeper

Select 3 servers for the Zookeeper cluster; their IPs are:

192.168.2.240

192.168.2.241

192.168.2.242

Note: Perform steps (1)-(3) on the first server, 192.168.2.240.

(1) Extract: place zookeeper-3.4.6.tar.gz in the /opt directory and unpack it (later steps assume the result is renamed to /opt/zookeeper):

cd /opt
tar zxf zookeeper-3.4.6.tar.gz
mv zookeeper-3.4.6 zookeeper

(2) Create the configuration file: copy conf/zoo_sample.cfg to zoo.cfg in the same conf directory, then set the following values:

tickTime=2000
dataDir=/opt/zookeeper/data
initLimit=5
syncLimit=2
clientPort=2181
server.1=192.168.2.240:2888:3888
server.2=192.168.2.241:2888:3888
server.3=192.168.2.242:2888:3888

The meaning of each parameter:

tickTime: heartbeat interval in milliseconds (default: 2000)

clientPort: the port other applications (such as Solr) use to access ZooKeeper (default: 2181)

initLimit: maximum time, in ticks, allowed for the initial synchronization phase, when followers connect to the leader (default: 10)

syncLimit: maximum time, in ticks, allowed for followers to sync with the leader (default: 5)

dataDir: the storage path for data, such as snapshots and managed configuration

server.X: X is the ID of a server in the cluster and must match the ID in that server's myid file. Two ports follow the address: the first is used for data synchronization and other communication between followers and the leader, and the second is used for voting during leader election.

(3) Create the /opt/zookeeper/data snapshot directory and create a myid file in it containing 1:

mkdir /opt/zookeeper/data
vi /opt/zookeeper/data/myid

(4) Copy the configured /opt/zookeeper/ directory from 192.168.2.240 to 192.168.2.241 and 192.168.2.242, then change the contents of the corresponding myid files to 2 and 3, as in the sketch below.
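For example, the copy and the myid changes can be done from the first node roughly like this (a sketch, assuming root SSH access between the servers):

scp -r /opt/zookeeper 192.168.2.241:/opt/
scp -r /opt/zookeeper 192.168.2.242:/opt/
ssh 192.168.2.241 "echo 2 > /opt/zookeeper/data/myid"
ssh 192.168.2.242 "echo 3 > /opt/zookeeper/data/myid"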

(5) Start the Zookeeper cluster

Execute the start command on each of the 3 servers:

/opt/zookeeper/bin/zkServer.sh start
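To confirm the ensemble has formed, check each node's role; one server should report "leader" and the other two "follower" (same install path assumed):

/opt/zookeeper/bin/zkServer.sh status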

III. Install and Configure the Kafka Cluster

There are 5 servers in total; their IP addresses are:

192.168.2.240 node1
192.168.2.241 node2
192.168.2.242 node3
192.168.2.243 node4
192.168.2.244 node5

1. Extract the installation package into the /opt/ directory:

cd /opt
tar -zxvf kafka_2.10-0.10.0.0.tgz
mv kafka_2.10-0.10.0.0 kafka

2. Modify the server.properties file on each node

# node1 configuration
broker.id=0
port=9092
advertised.listeners=PLAINTEXT://58.246.xx.xx:9092
advertised.host.name=58.246.xx.xx
# Pitfall: the nginx logs are pulled from an online (public) server back to the company's local servers, so the advertised.* options must be set to the router's external IP address; otherwise the online Flume agent cannot connect to the Kafka brokers and fails to deliver log messages. A quick reachability check follows this block.
advertised.port=9092
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
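To catch the advertised-address pitfall early, check from the online server that the router's external IP and the broker port are actually reachable before starting Flume. A minimal check (assuming telnet is available on the online host; the masked IP is the one configured above):

telnet 58.246.xx.xx 9092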


# node2 configuration
broker.id=1
port=9093
advertised.listeners=PLAINTEXT://58.246.xx.xx:9093
advertised.host.name=58.246.xx.xx
advertised.port=9093
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181


# node3 configuration
broker.id=2
port=9094
advertised.listeners=PLAINTEXT://58.246.xx.xx:9094
advertised.host.name=58.246.xx.xx
advertised.port=9094
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181



# node4 configuration
broker.id=3
port=9095
advertised.listeners=PLAINTEXT://58.246.xx.xx:9095
advertised.host.name=58.246.xx.xx
advertised.port=9095
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181


# node5 configuration
broker.id=4
port=9096
advertised.listeners=PLAINTEXT://58.246.xx.xx:9096
advertised.host.name=58.246.xx.xx
advertised.port=9096
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181

3. Start the Kafka cluster

Execute the following command on each node to start the service:

/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties &
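Before wiring up Flume, it can also help to create the topic used in the Flume configurations below and confirm the cluster sees it. This is only a sketch: the replication factor of 2 is an assumed choice, while the partition count matches num.partitions above and the topic name matches the Flume configuration in the next section:

/opt/kafka/bin/kafka-topics.sh --create --zookeeper 192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181 --replication-factor 2 --partitions 5 --topic unilife_nginx_production
/opt/kafka/bin/kafka-topics.sh --list --zookeeper 192.168.2.240:2181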


IV. Install and Configure Flume

Two Flume agents are installed: one on the online (public) server, which ships the online nginx logs back to the local Kafka cluster, and one on the local side, which dumps the log data from the Kafka cluster to HDFS.

4.1 Install Flume on the online server

Collect the nginx logs and pass them to the internal Kafka cluster.

1. Unpack the installation package

cd /opt
tar -zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume

2. Create a configuration file

vi /opt/flume/conf/flume-conf.properties and add the following content:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /unilifedata/logs/nginx/access.log
a1.sources.r1.channels = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 100000

# Sinks
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = unilife_nginx_production
a1.sinks.k1.kafka.bootstrap.servers = 58.246.xx.xx:9092,58.246.xx.xx:9093,58.246.xx.xx:9094
a1.sinks.k1.brokerList = 58.246.xx.xx:9092,58.246.xx.xx:9093,58.246.xx.xx:9094
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.flumeBatchSize = 2000
a1.sinks.k1.channel = c1

Start the Flume service:

/opt/flume/bin/flume-ng agent --conf /opt/flume/conf/ --conf-file /opt/flume/conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,LOGFILE &
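A quick end-to-end check (not part of the original setup): append a test line to the nginx access log on the online server, then read it back from the topic with the console consumer on one of the Kafka nodes:

echo "flume kafka test $(date)" >> /unilifedata/logs/nginx/access.log
/opt/kafka/bin/kafka-console-consumer.sh --zookeeper 192.168.2.240:2181 --topic unilife_nginx_production --from-beginning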


4.2 Install Flume locally

Dump the logs from the Kafka cluster to HDFS.

1. Unpack the installation package

cd /opt
tar -zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume

2. Create a configuration file, /opt/flume/conf/flume-nginx-log.properties, with the following content:

nginx.sources = source1
nginx.channels = channel1
nginx.sinks = sink1

nginx.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
nginx.sources.source1.zookeeperConnect = master:2181,slave1:2181,slave2:2181
nginx.sources.source1.topic = unilife_nginx_production
nginx.sources.source1.groupId = flume_unilife_nginx_production
nginx.sources.source1.channels = channel1
nginx.sources.source1.interceptors = i1
nginx.sources.source1.interceptors.i1.type = timestamp
nginx.sources.source1.kafka.consumer.timeout.ms = 100

nginx.channels.channel1.type = memory
nginx.channels.channel1.capacity = 10000000
nginx.channels.channel1.transactionCapacity = 1000

nginx.sinks.sink1.type = hdfs
nginx.sinks.sink1.hdfs.path = hdfs://192.168.2.240:8020/user/hive/warehouse/nginx_log
nginx.sinks.sink1.hdfs.writeFormat = Text
nginx.sinks.sink1.hdfs.inUsePrefix = _
nginx.sinks.sink1.hdfs.rollInterval = 3600
nginx.sinks.sink1.hdfs.rollSize = 0
nginx.sinks.sink1.hdfs.rollCount = 0
nginx.sinks.sink1.hdfs.fileType = DataStream
nginx.sinks.sink1.hdfs.minBlockReplicas = 1
nginx.sinks.sink1.channel = channel1


Start the service:

/opt/flume/bin/flume-ng agent --conf /opt/flume/conf/ --conf-file /opt/flume/conf/flume-nginx-log.properties --name nginx -Dflume.root.logger=INFO,LOGFILE &
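Once events start flowing, files should appear under the HDFS path configured above. A simple check, assuming an HDFS client is available on the local machine:

hdfs dfs -ls /user/hive/warehouse/nginx_log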


