Kafka: Building the Cluster

Source: Internet
Author: User
Tags: zookeeper, server, port, log4j

When reprinting, please cite: http://blog.csdn.net/l1028386804/article/details/78374836
Part 1: Building the ZooKeeper Cluster

A Kafka cluster stores its state in ZooKeeper, so build the ZooKeeper cluster first.
1. Software Environment

(3 servers were used in my tests)
192.168.7.100 Server1
192.168.7.101 Server2
192.168.7.107 Server3
1-1. Linux servers: one, three, five, ... (2n+1). A ZooKeeper cluster can serve requests only while more than half of its nodes are alive; with 3 nodes, 2 form a majority, so 1 node may fail. An even number of nodes works but gains nothing: with 4 nodes the cluster survives one failure (3 remain, still a majority) but not two, which is the same tolerance as 3 nodes. The key point to remember is that a strict majority must survive.
1-2. Java JDK 1.7. ZooKeeper is written in Java, so it needs a Java environment and runs on the Java virtual machine.
1-3. ZooKeeper stable release: version 3.4.9.
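The majority rule from 1-1 can be sketched as a one-line calculation; `tolerated_failures` is a hypothetical helper written for illustration, not part of ZooKeeper:

```shell
# A cluster of n servers survives as long as a strict majority is alive,
# so it tolerates floor((n - 1) / 2) failures.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

tolerated_failures 3   # -> 1
tolerated_failures 4   # -> 1 (the fourth node buys nothing)
tolerated_failures 5   # -> 2
```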
2. Configure and Install ZooKeeper

Perform the following steps identically on all 3 servers.

1. Configure Zookeeper Directory

The first thing to note is that in a production environment the directory structure should be standardized, so that projects stay easy to locate as their number grows.

# My directories all live under /opt
# First create the ZooKeeper project directory
mkdir zookeeper   # project directory
mkdir zkdata      # stores snapshots
mkdir zkdatalog   # stores transaction logs
2. Download ZooKeeper
# Download the software
cd /opt/zookeeper/

wget http://mirrors.cnnic.cn/apache/zookeeper/zookeeper-3.4.9/zookeeper-3.4.9.tar.gz

# Unpack the software
tar -zxvf zookeeper-3.4.9.tar.gz

3. Modify the configuration file

Go into the conf directory of the unpacked distribution and look around:

# enter the conf directory
cd /opt/zookeeper/zookeeper-3.4.9/conf
# list the files
[root@192.168.7.107]$ ll
-rw-rw-r--. 1 1000 1000  535 Feb 2014 configuration.xsl
-rw-rw-r--. 1 1000 1000 2161 Feb 2014 log4j.properties
-rw-rw-r--. 1 1000 1000  922 Feb 2014 zoo_sample.cfg
# zoo_sample.cfg is the sample file ZooKeeper ships with. Make a copy of it named
# zoo.cfg -- zoo.cfg is the file name the official documentation specifies:
cp zoo_sample.cfg zoo.cfg

The zoo.cfg for all 3 servers:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/zkdata
dataLogDir=/opt/zookeeper/zkdatalog
clientPort=12181
server.1=192.168.7.100:12888:13888
server.2=192.168.7.101:12888:13888
server.3=192.168.7.107:12888:13888
# The "1" in server.1 is this server's identifier. It can be any number, as long as it
# uniquely identifies the server; the same number must be written to the myid file in
# the snapshot directory (see below).
# 192.168.7.100 etc. are the cluster IPs. The first port (default 2888) is the
# communication port between leader and followers. The second port (default 3888) is
# the election port, used when the cluster first starts up or when the leader dies and
# a new one must be elected.
Configuration file explained:
#tickTime: the heartbeat interval, in milliseconds, between ZooKeeper servers or between a client and a server; a heartbeat is sent every tickTime.
#initLimit: the maximum number of heartbeat intervals a connection may take to initialize. The "client" here is not an application connecting to a ZooKeeper server, but a follower in the cluster connecting to the leader. If the leader has not received the follower's response after initLimit heartbeats (each of length tickTime), the connection attempt fails. With the values above the window is 10*2000 = 20 seconds.
#syncLimit: the maximum number of tickTime intervals a request/reply exchange between leader and follower may take; here 5*2000 = 10 seconds.
#dataDir: the storage path for snapshots.
#dataLogDir: the storage path for transaction logs. If not configured, transaction logs are stored in dataDir by default, which can seriously hurt ZK performance: under heavy throughput, ZK produces many transaction logs and snapshots that then contend for the same disk.
#clientPort: the port clients use to connect to the ZooKeeper server; ZooKeeper listens on this port for client requests. We changed it from the default here.
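As a sanity check on the arithmetic above, here is a sketch that computes the two timeout windows implied by the zoo.cfg values (tickTime=2000, initLimit=10, syncLimit=5):

```shell
tickTime=2000   # ms per heartbeat
initLimit=10    # heartbeats allowed for follower initialization
syncLimit=5     # heartbeats allowed per leader/follower exchange

echo "init window: $(( tickTime * initLimit / 1000 ))s"   # 20s
echo "sync window: $(( tickTime * syncLimit / 1000 ))s"   # 10s
```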
Create the myid file
#server1
echo "1" >/opt/zookeeper/zkdata/myid
#server2
echo "2" >/opt/zookeeper/zkdata/myid
#server3
echo "3" >/opt/zookeeper/zkdata/myid
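To avoid running the wrong echo on the wrong box, the myid value can be derived from the server's IP; `myid_for_ip` is a hypothetical helper encoding the mapping from the zoo.cfg above:

```shell
# map each cluster IP to its server.N identifier from zoo.cfg
myid_for_ip() {
  case "$1" in
    192.168.7.100) echo 1 ;;
    192.168.7.101) echo 2 ;;
    192.168.7.107) echo 3 ;;
    *) echo "unknown ip: $1" >&2; return 1 ;;
  esac
}

# on each server (uncomment to use):
# myid_for_ip "$(hostname -I | awk '{print $1}')" > /opt/zookeeper/zkdata/myid
```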

4. Important Configuration Instructions

1. The myid file, in the snapshot directory (dataDir), holds the number from this server's server.N line and identifies the server; it is an important identifier that the members of the ZK cluster use to recognize each other.
2. zoo.cfg, in the conf directory, is ZooKeeper's configuration file.
3. log4j.properties, also in the conf directory, controls ZK's log output; like most Java programs, ZK uses log4j to manage its logging:

# Define some default values that can be overridden by system properties
zookeeper.root.logger=INFO, CONSOLE          # log level
zookeeper.console.threshold=INFO             # print logs with the CONSOLE appender below
zookeeper.log.dir=.                          # where logs go: the directory zookeeper was started from (a unified log directory is recommended)
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=DEBUG
zookeeper.tracelog.dir=.
zookeeper.tracelog.file=zookeeper_trace.log

# ZooKeeper Logging Configuration
# Format is "<default threshold> (, <appender>)+"
# DEFAULT: console appender only
log4j.rootLogger=${zookeeper.root.logger}
# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE

# Log INFO level and above messages to the console
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n

# Add ROLLINGFILE to rootLogger to get log file output
#   Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}
# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n

# Add TRACEFILE to rootLogger to get log file output
#   Log TRACE level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=${zookeeper.tracelog.dir}/${zookeeper.tracelog.file}
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L][%x] - %m%n
4. The zkEnv.sh and zkServer.sh files
zkServer.sh is the main management script for the ZooKeeper server.
zkEnv.sh is the primary configuration script; it sets the environment variables used when the cluster starts.
5. One more thing to note. The official documentation says:

"The ZooKeeper server will not remove old snapshots and log files when using the default configuration (see autopurge below); this is the responsibility of the operator."

That is, ZooKeeper does not purge old snapshots and transaction logs on its own; the operator must do it.

You can, however, clean up periodically with a script; for example, the following one does the job.

#!/bin/bash

# snapshot file dir
dataDir=/opt/zookeeper/zkdata/version-2
# tran log dir
dataLogDir=/opt/zookeeper/zkdatalog/version-2

# leave the newest 66 files
count=66
count=$[$count+1]
ls -t $dataLogDir/log.* | tail -n +$count | xargs rm -f
ls -t $dataDir/snapshot.* | tail -n +$count | xargs rm -f

# The script above deletes older files from the two directories, keeping the newest 66. It can be added to crontab, for example to run once a day at 2 a.m.


# zk log dir: also delete the zookeeper logs (uncomment and set logDir to use)
#logDir=
#ls -t $logDir/zookeeper.log.* | tail -n +$count | xargs rm -f

Other methods:

Second: use ZK's tool class PurgeTxnLog. It implements a simple cleanup strategy for history files; see how to use it at http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html

Third: ZK already wraps that tool in a script, bin/zkCleanup.sh, so running that script directly also performs the cleanup.

Fourth: starting with 3.4.0, ZooKeeper can clean up snapshots and transaction logs itself. Configuring the two parameters autopurge.snapRetainCount and autopurge.purgeInterval in zoo.cfg enables scheduled cleanup. autopurge.purgeInterval sets the cleanup frequency in hours; use an integer of 1 or greater (the default is 0, which disables self-cleanup). autopurge.snapRetainCount is used together with it and sets the number of files to retain (default 3).

The first method is recommended: for operators, keeping log cleanup as an independent job is easier to manage centrally and more controllable; after all, ZK's built-in tools are not that capable.

5. Start the Service and Check It

1. Start the service

# enter ZooKeeper's bin directory
cd /opt/zookeeper/zookeeper-3.4.9/bin
# start the service (run on all 3 servers)
./zkServer.sh start
2. Check the service status

# check the server status
./zkServer.sh status

You can see the node's role in the status output:

./zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/zookeeper-3.4.9/bin/../conf/zoo.cfg   # configuration file in use
Mode: follower   # whether this node is the leader or a follower

A ZK cluster generally has exactly one leader and several followers. Followers handle clients' read requests and replicate data from the leader; when the leader dies, the followers elect a new leader from among themselves.
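When checking all three servers, it helps to scrape the role out of the status output; `zk_mode` is a hypothetical helper, not part of ZooKeeper's tooling:

```shell
# extract the role ("leader" or "follower") from `zkServer.sh status` output on stdin
zk_mode() {
  grep -i '^mode:' | awk -F': *' '{print $2}'
}

# usage on each node:
#   ./zkServer.sh status | zk_mode
```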

You can also check the ZK process with jps; QuorumPeerMain is the main class of the whole ZK service.

# run jps
20348 Jps
4233 QuorumPeerMain
Part 2: Building the Kafka Cluster

1. Software Environment

1. One or more Linux servers (2 or more for a cluster).
2. A working ZooKeeper cluster (built above).
3. Software version: kafka_2.11-0.9.0.1.tgz
2. Create the directory and download the software

# create the directories
cd /opt/
mkdir kafka       # project directory
cd kafka
mkdir kafkalogs   # kafka message directory, where messages are stored

# download the software
wget http://apache.opencas.org/kafka/0.9.0.1/kafka_2.11-0.9.0.1.tgz

# unpack the software
tar -zxvf kafka_2.11-0.9.0.1.tgz

3. Modify the configuration file

Go to config directory

cd/opt/kafka/kafka_2.11-0.9.0.1/config/

The file we mainly care about is server.properties. Listing the directory shows many files, among them zookeeper.properties: Kafka can be started with its built-in ZooKeeper, but a standalone ZK cluster is recommended.

-rw-r--r--. 1 root root 5699 Feb 09:41 192.168.7.101
-rw-r--r--. 1 root root  906 Feb 08:37 connect-console-sink.properties
-rw-r--r--. 1 root root  909 Feb 08:37 connect-console-source.properties
-rw-r--r--. 1 root root 2110 Feb 08:37 connect-distributed.properties
-rw-r--r--. 1 root root  922 Feb 08:38 connect-file-sink.properties
-rw-r--r--. 1 root root  920 Feb 08:38 connect-file-source.properties
-rw-r--r--. 1 root root 1074 Feb 08:37 connect-log4j.properties
-rw-r--r--. 1 root root 2055 Feb 08:37 connect-standalone.properties
-rw-r--r--. 1 root root 1199 Feb 08:37 consumer.properties
-rw-r--r--. 1 root root 4369 Feb 08:37 log4j.properties
-rw-r--r--. 1 root root 2228 Feb 08:38 producer.properties
-rw-r--r--. 1 root root 5699 Feb 18:10 server.properties
-rw-r--r--. 1 root root 3325 Feb 08:37 test-log4j.properties
-rw-r--r--. 1 root root 1032 Feb 08:37 tools-log4j.properties
-rw-r--r--. 1 root root 1023 Feb 08:37 zookeeper.properties
The parameters in server.properties, annotated:

broker.id=0  # the unique ID of this machine in the cluster, analogous to zookeeper's myid
port=19092  # the port Kafka serves on; the default is 9092
host.name=192.168.7.100  # commented out by default; in 0.8.1 there is a bug where DNS resolution problems cause failures
num.network.threads=3  # number of threads the broker uses for network processing
num.io.threads=8  # number of threads the broker uses for disk I/O
log.dirs=/opt/kafka/kafkalogs/  # directory where messages are stored; may be a comma-separated list of paths. num.io.threads should not be smaller than the number of directories. With several directories, a newly created topic persists its messages to whichever directory currently holds the fewest partitions
socket.send.buffer.bytes=102400  # send buffer size: data is buffered until it reaches this size before being sent, which improves performance
socket.receive.buffer.bytes=102400  # Kafka receive buffer size: when data reaches this size it is serialized to disk
socket.request.max.bytes=104857600  # maximum size of a request sent to or received from Kafka; must not exceed the Java heap size
num.partitions=1  # default number of partitions; a topic gets 1 partition by default
log.retention.hours=168  # default maximum message retention: 168 hours = 7 days
message.max.byte=5242880  # maximum saved message size, 5MB
default.replication.factor=2  # number of replicas Kafka keeps of each message; if one replica fails, the others continue serving
replica.fetch.max.bytes=5242880  # maximum bytes fetched per replica request
log.segment.bytes=1073741824  # Kafka appends messages to segment files; when a segment reaches this size a new file is started
log.retention.check.interval.ms=300000  # every 300000 ms, check whether segments have outlived log.retention.hours=168 and delete any that have
log.cleaner.enable=false  # whether to enable log compaction; usually left off, which improves performance
zookeeper.connect=192.168.7.100:12181,192.168.7.101:12181,192.168.7.107:12181  # zookeeper connection string
Those are the parameters explained; the actual modifications to make are:

# broker.id=0   broker.id must be different on every server

# hostname
host.name=192.168.7.100

# add the following three lines after log.retention.hours=168
message.max.byte=5242880
default.replication.factor=2
replica.fetch.max.bytes=5242880

# set the zookeeper connection ports
zookeeper.connect=192.168.7.100:12181,192.168.7.101:12181,192.168.7.107:12181
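The per-broker edits can also be scripted. This is a minimal sketch using GNU sed (the -i flag is GNU-specific) against a throwaway copy of the file; BROKER_ID and HOST_IP are placeholders to set per server, and CFG should point at the real config/server.properties on each broker:

```shell
BROKER_ID=1
HOST_IP=192.168.7.100

# work on a temporary copy here; point CFG at the real
# config/server.properties on each broker
CFG=$(mktemp)
printf 'broker.id=0\nhost.name=localhost\n' > "$CFG"

# rewrite the two per-broker settings in place
sed -i "s/^broker\.id=.*/broker.id=${BROKER_ID}/" "$CFG"
sed -i "s/^host\.name=.*/host.name=${HOST_IP}/" "$CFG"

cat "$CFG"
```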
4. Start the Kafka Cluster and Test It

1. Start the service

# start the Kafka cluster in the background (all 3 servers need to start)
cd /opt/kafka/kafka_2.11-0.9.0.1/bin   # enter kafka's bin directory
./kafka-server-start.sh -daemon ../config/server.properties
2. Check whether the service started

# run jps
20348 Jps
4233 QuorumPeerMain
18991 Kafka

3. Create a topic and verify it was created successfully

For more, see official documentation: http://kafka.apache.org/documentation.html

# create a topic
./kafka-topics.sh --create --zookeeper 192.168.7.100:12181 --replication-factor 2 --partitions 1 --topic shuaige
# explanation
--replication-factor 2   # keep 2 replicas
--partitions 1           # create 1 partition
--topic shuaige          # the topic is named shuaige
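To verify creation from a script, the --list output can be checked for the new topic; `topic_exists` is a hypothetical helper around `kafka-topics.sh --list`:

```shell
# succeeds if the topic name appears (as a whole line) in
# `kafka-topics.sh --list` output read from stdin
topic_exists() {
  grep -qx "$1"
}

# usage:
#   ./kafka-topics.sh --list --zookeeper localhost:12181 | topic_exists shuaige && echo "created"
```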

Create a publisher (producer) on one server:

./kafka-console-producer.sh --broker-list 192.168.7.100:19092 --topic shuaige

Create a subscriber (consumer) on another server:

./kafka-console-consumer.sh --zookeeper localhost:12181 --topic shuaige --from-beginning
Test: post a message from the publisher and check whether the subscriber receives it.
4. Other Commands
Most commands are covered in the official documentation.
4.1. List topics
./kafka-topics.sh --list --zookeeper localhost:12181
# this lists all the topics we have created
4.2. View topic status
./kafka-topics.sh --describe --zookeeper localhost:12181 --topic shuaige
# the output looks like this:
Topic:shuaige    PartitionCount:1    ReplicationFactor:2    Configs:
    Topic:shuaige    Partition:0    Leader:1    Replicas:0,1    Isr:1
# the partition count is 1, the replication factor is 2, and shuaige's partition is 0
# Replicas: 0,1 means the partition is replicated on brokers 0 and 1

At this point the Kafka cluster is set up.

5. Other Notes

5.1. Notes on logs

By default Kafka's logs are saved in the /opt/kafka/kafka_2.11-0.9.0.1/logs directory; a few of them deserve attention.

server.log          # kafka's runtime log
state-change.log    # kafka keeps its state in zookeeper, so leader switches can happen; switch events are logged here
controller.log      # kafka elects one node as the "controller", which is responsible for electing new partition leaders among the surviving nodes when a node goes down; this lets kafka manage the leader/follower relationships of all partitions efficiently and in batches. If the controller itself goes down, one of the surviving nodes becomes the new controller.
5.2. Log in to ZK and inspect its directory tree

# enter zk with the client
./zkCli.sh -server 127.0.0.1:12181
# normally the "-server" parameter is not needed, but we changed the default port

# view the directory tree: execute "ls /"
[zk: 127.0.0.1:12181(CONNECTED) 0] ls /
# result:
[consumers, config, controller, isr_change_notification, admin, brokers, zookeeper, controller_epoch]
# of the nodes above, only "zookeeper" is native to zookeeper; the rest were created by Kafka

# one important node: the broker registration
[zk: 127.0.0.1:12181(CONNECTED) 1] get /brokers/ids/0
{"jmx_port":-1,"timestamp":"1456125963355","endpoints":["PLAINTEXT://192.168.7.100:19092"],"host":"192.168.7.100","version":2,"port":19092}
cZxid = 0x1000001c1
ctime = Mon Feb 15:26:03 CST 2016
mZxid = 0x1000001c1
mtime = Mon Feb 15:26:03 CST 2016
pZxid = 0x1000001c1
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x152e40aead20016
dataLength = 139
numChildren = 0

# another useful one: view a partition
[zk: 127.0.0.1:12181(CONNECTED) 7] get /brokers/topics/shuaige/partitions/0
null
cZxid = 0x100000029
ctime = Mon Feb 10:05:11 CST 2016
mZxid = 0x100000029
mtime = Mon Feb 10:05:11 CST 2016
pZxid = 0x10000002a
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 1

