Flume + Kafka Integration
I. Preparation
Prepare five intranet servers for the ZooKeeper and Kafka clusters.
Server addresses:
192.168.2.240
192.168.2.241
192.168.2.242
192.168.2.243
192.168.2.244
Server OS: CentOS 6.5
Download the installation packages:
ZooKeeper: http://apache.fayea.com/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
Flume: http://apache.fayea.com/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
Kafka: http://apache.fayea.com/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
ZooKeeper, Flume, and Kafka all require a Java environment, so install the JDK first:
yum install java-1.7.0-openjdk-devel
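A quick sanity check that the JDK is available on each server (this verification step is an addition, not part of the original instructions):
java -version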
II. Installing and Configuring ZooKeeper
Select three of the servers for the ZooKeeper cluster; their IPs are:
192.168.2.240
192.168.2.241
192.168.2.242
Note: perform steps (1)-(3) on the first server, 192.168.2.240.
(1) Extract: place zookeeper-3.4.6.tar.gz in the /opt directory and extract it (the extracted directory is renamed to /opt/zookeeper, which the later steps assume):
cd /opt
tar zxf zookeeper-3.4.6.tar.gz
mv zookeeper-3.4.6 zookeeper
(2) Create the configuration file: copy conf/zoo_sample.cfg to zoo.cfg in the same conf directory, then set the following values:
tickTime=2000
dataDir=/opt/zookeeper/data
initLimit=5
syncLimit=2
clientPort=2181
server.1=192.168.2.240:2888:3888
server.2=192.168.2.241:2888:3888
server.3=192.168.2.242:2888:3888
The meaning of each parameter:
tickTime: heartbeat interval in milliseconds; default: 2000
clientPort: the port on which client applications (such as Solr) connect to ZooKeeper; default: 2181
initLimit: time allowed (in ticks) for the initial synchronization phase, when followers connect to the leader; default: 10
syncLimit: time allowed (in ticks) for a follower to sync with the leader; default: 5
dataDir: the storage path for data (such as the managed configuration files)
server.X: X is the ID of a server in the cluster and must match the ID in that server's myid file. Two ports follow the address: the first is used for data synchronization and other communication between followers and the leader, and the second is used for voting during leader election.
(3) Create the /opt/zookeeper/data snapshot directory and create the myid file, which contains 1:
mkdir /opt/zookeeper/data
vi /opt/zookeeper/data/myid     # write a single line containing 1
(4) Copy the already-configured /opt/zookeeper/ directory from 192.168.2.240 to 192.168.2.241 and 192.168.2.242, then change the contents of the corresponding myid files to 2 and 3, as shown in the sketch below.
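A minimal sketch of this copy step using scp and ssh, assuming password-less SSH between the nodes (these exact commands are an addition, not from the original instructions):
scp -r /opt/zookeeper 192.168.2.241:/opt/
scp -r /opt/zookeeper 192.168.2.242:/opt/
ssh 192.168.2.241 'echo 2 > /opt/zookeeper/data/myid'
ssh 192.168.2.242 'echo 3 > /opt/zookeeper/data/myid'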
(5) Start the ZooKeeper cluster
Execute the start command on each of the three servers:
/opt/zookeeper/bin/zkServer.sh start
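To confirm the cluster formed correctly, each node can report its role; one node should show "leader" and the other two "follower" (this check is an addition, not part of the original instructions):
/opt/zookeeper/bin/zkServer.sh status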
III. Installing and Configuring the Kafka Cluster
All five servers are used; their IP addresses and node names are:
192.168.2.240 Node1
192.168.2.241 Node2
192.168.2.242 Node3
192.168.2.243 Node4
192.168.2.244 Node5
1. Extract the installation file to the /opt directory
cd /opt
tar zxvf kafka_2.10-0.10.0.0.tgz
mv kafka_2.10-0.10.0.0 kafka
2. Modify the server.properties file
# node1 configuration
broker.id=0
port=9092
advertised.listeners=PLAINTEXT://58.246.xx.xx:9092
advertised.host.name=58.246.xx.xx
# Pitfall: because the nginx logs are pulled from an online (external) server back to the company's local machines, these two options must be set to the router's public (extranet) IP address; otherwise the online Flume agent cannot connect to the Kafka nodes and reports that it cannot deliver log messages.
advertised.port=9092
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node2 configuration
broker.id=1
port=9093
advertised.listeners=PLAINTEXT://58.246.xx.xx:9093
advertised.host.name=58.246.xx.xx
advertised.port=9093
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node3 configuration
broker.id=2
port=9094
advertised.listeners=PLAINTEXT://58.246.xx.xx:9094
advertised.host.name=58.246.xx.xx
advertised.port=9094
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node4 configuration
broker.id=3
port=9095
advertised.listeners=PLAINTEXT://58.246.xx.xx:9095
advertised.host.name=58.246.xx.xx
advertised.port=9095
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node5 configuration
broker.id=4
port=9096
advertised.listeners=PLAINTEXT://58.246.xx.xx:9096
advertised.host.name=58.246.xx.xx
advertised.port=9096
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
3. Start the Kafka cluster
Execute the following command on each node to start the service:
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties &
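Before wiring up Flume, the cluster can be sanity-checked by creating and listing the topic that the Flume sink below writes to (an added check; the replication factor of 3 is an assumption, not from the original configuration):
/opt/kafka/bin/kafka-topics.sh --create --zookeeper 192.168.2.240:2181 --replication-factor 3 --partitions 5 --topic unilife_nginx_production
/opt/kafka/bin/kafka-topics.sh --list --zookeeper 192.168.2.240:2181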
IV. Installing and Configuring Flume
Two Flume agents are installed: one on the online (external) server, which ships the nginx logs back to the local Kafka cluster, and one locally, which dumps the data from the Kafka cluster to HDFS.
4.1 Installing Flume on the online server
This agent collects the nginx logs and passes them to the internal Kafka cluster.
1. Unpack the installation package
cd /opt
tar zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume
2. Create a configuration file
vi /opt/flume/conf/flume-conf.properties and add the following content:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /unilifedata/logs/nginx/access.log
a1.sources.r1.channels = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 100000
# Sinks
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = unilife_nginx_production
a1.sinks.k1.kafka.bootstrap.servers = 58.246.xx.xx:9092,58.246.xx.xx:9093,58.246.xx.xx:9094
a1.sinks.k1.brokerList = 58.246.xx.xx:9092,58.246.xx.xx:9093,58.246.xx.xx:9094
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.flumeBatchSize = 2000
a1.sinks.k1.channel = c1
3. Start the Flume service
/opt/flume/bin/flume-ng agent --conf /opt/flume/conf/ --conf-file /opt/flume/conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,LOGFILE &
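Once the agent is running, you can verify that log lines are reaching Kafka with the console consumer (an added verification step, not part of the original instructions):
/opt/kafka/bin/kafka-console-consumer.sh --zookeeper 192.168.2.240:2181 --topic unilife_nginx_production --from-beginning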
4.2 Installing Flume locally
This agent dumps the logs from Kafka to HDFS.
1. Unpack the installation package
cd /opt
tar zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume
2. Create a configuration file: vi /opt/flume/conf/flume-nginx-log.properties and add the following content:
nginx.sources = source1
nginx.channels = channel1
nginx.sinks = sink1
nginx.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
nginx.sources.source1.zookeeperConnect = master:2181,slave1:2181,slave2:2181
nginx.sources.source1.topic = unilife_nginx_production
nginx.sources.source1.groupId = flume_unilife_nginx_production
nginx.sources.source1.channels = channel1
nginx.sources.source1.interceptors = i1
nginx.sources.source1.interceptors.i1.type = timestamp
nginx.sources.source1.kafka.consumer.timeout.ms = 100
nginx.channels.channel1.type = memory
nginx.channels.channel1.capacity = 10000000
nginx.channels.channel1.transactionCapacity = 1000
nginx.sinks.sink1.type = hdfs
nginx.sinks.sink1.hdfs.path = hdfs://192.168.2.240:8020/user/hive/warehouse/nginx_log
nginx.sinks.sink1.hdfs.writeFormat = Text
nginx.sinks.sink1.hdfs.inUsePrefix = _
nginx.sinks.sink1.hdfs.rollInterval = 3600
nginx.sinks.sink1.hdfs.rollSize = 0
nginx.sinks.sink1.hdfs.rollCount = 0
nginx.sinks.sink1.hdfs.fileType = DataStream
nginx.sinks.sink1.hdfs.minBlockReplicas = 1
nginx.sinks.sink1.channel = channel1
3. Start the service
/opt/flume/bin/flume-ng agent --conf /opt/flume/conf/ --conf-file /opt/flume/conf/flume-nginx-log.properties --name nginx -Dflume.root.logger=INFO,LOGFILE &
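After the agent has run for a while, the files written by the HDFS sink can be listed directly, assuming a Hadoop client is available on the machine (an added verification step, not part of the original instructions):
hdfs dfs -ls hdfs://192.168.2.240:8020/user/hive/warehouse/nginx_log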