I. Background of Kafka usage
A number of scenarios come up repeatedly when working with distributed databases and distributed computing clusters:

User behavior (pageviews) needs to be analyzed;
Users' search keywords need to be counted to analyze current trends;
Some data would be wasteful to keep in a database, while writing it straight to disk is inefficient.

These scenarios have one thing in common: data is produced by an upstream module and consumed by a downstream module for computation, statistics, and analysis. This is exactly where a message system, especially a distributed message system, comes in.
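The pattern above can be sketched with a toy in-process queue standing in for a message system such as Kafka (the page names and module roles here are made up for illustration):

```python
# Toy sketch of producer/consumer decoupling: an upstream module produces
# records, a downstream module consumes them through a queue.
import queue

q = queue.Queue()

def upstream():
    # e.g. a web server recording pageviews
    for page in ["/home", "/search", "/home"]:
        q.put(page)

def downstream():
    # e.g. an analytics job counting views per page
    counts = {}
    while not q.empty():
        page = q.get()
        counts[page] = counts.get(page, 0) + 1
    return counts

upstream()
counts = downstream()
print(counts)  # {'/home': 2, '/search': 1}
```

In a real deployment the queue is replaced by Kafka topics, so the upstream and downstream modules can run on different machines and at different speeds.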
II. Definition of Kafka
What is Kafka: a distributed messaging system, written in Scala by LinkedIn, which serves as the basis for LinkedIn's activity stream and operational data processing pipeline. It provides high scalability and high throughput.
III. Kafka cluster configuration
A Kafka cluster keeps its state in ZooKeeper, so build the ZooKeeper cluster first.
1. ZooKeeper cluster installation
a) Software environment
(3 test servers)
192.168.1.110 server1
192.168.1.111 server2
192.168.1.112 server3
b) Java JDK installation. ZooKeeper is written in Java and runs on the Java virtual machine, so it needs a Java environment: install the JDK and configure the Java environment variables, then create the data directories.
Run the same steps on all 3 servers:
mkdir zkdata     # stores snapshot logs
mkdir zkdatalog  # stores transaction logs
for example:
/usr/software/zookeeper/zkdata
/usr/software/zookeeper/zkdatalog
c) Modify the configuration file
$ tar -zxvf zookeeper.tar.gz
Go into the conf directory under the extracted ZooKeeper directory:
# enter zookeeper's conf directory
$ cd /usr/software/zookeeper/conf
# zoo_sample.cfg is the official ZooKeeper template file; make a copy of it named zoo.cfg, which is the file name ZooKeeper expects.
$ cp zoo_sample.cfg zoo.cfg
Configuration file for all 3 servers:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/software/zookeeper/zkdata
dataLogDir=/usr/software/zookeeper/zkdatalog
clientPort=2181
server.1=192.168.1.110:2888:3888
server.2=192.168.1.111:2888:3888
server.3=192.168.1.112:2888:3888
# In server.1, the 1 is the server ID (any number works); it identifies this server in the cluster and must be written into the myid file under the snapshot directory.
# 192.168.1.110 is the server's IP address in the cluster.
Port description:
The first port (default 2888) is for communication between the leader and followers.
The second port (default 3888) is for leader election: it is used when the cluster first starts up, or when a new election is held because the leader has gone down.
Explanation of the configuration file:
#tickTime:
The basic time unit (in milliseconds) between ZooKeeper servers, or between a client and a server; a heartbeat is sent every tickTime.
#initLimit:
The maximum number of heartbeat intervals (tickTime) that may pass while a connection is initialized. The "client" here is not an ordinary ZooKeeper client, but a follower connecting to the leader within the server cluster. If the leader has not received the follower's response after initLimit heartbeats, the connection attempt fails. With the values above, the total length is 10*2000 ms = 20 seconds.
#syncLimit:
The maximum number of tickTime intervals allowed for a request and its response between the leader and a follower. With the values above, the total length is 5*2000 ms = 10 seconds.
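The timeout arithmetic can be worked out directly from the zoo.cfg values above:

```python
# Worked arithmetic for the ZooKeeper timeouts, using the zoo.cfg values.
tick_time = 2000   # ms per tick (heartbeat interval)
init_limit = 10    # ticks a follower may take to connect and sync with the leader
sync_limit = 5     # ticks allowed for a leader/follower request-response

init_timeout_ms = init_limit * tick_time
sync_timeout_ms = sync_limit * tick_time
print(init_timeout_ms, sync_timeout_ms)  # 20000 10000
```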
#dataDir:
Storage path for snapshot logs.
#dataLogDir:
Storage path for transaction logs. If this is not configured, transaction logs are stored in the dataDir directory by default, which seriously hurts ZooKeeper's performance: under high throughput the transaction logs and snapshot logs grow large.
#clientPort:
The port clients use to connect to the ZooKeeper server; ZooKeeper listens on this port and accepts client access requests. You can change it to a different port if needed.
Create a myid file; configure one myid on each of the machines server1 to server3:
#server1
echo "1" >/usr/software/zookeeper/zkdata/myid
#server2
echo "2" >/usr/software/zookeeper/zkdata/myid
#server3
echo "3" >/usr/software/zookeeper/zkdata/myid
d) Important configuration notes
1. The myid file under the snapshot directory identifies this server; the ID in it must match the server.N entry in zoo.cfg. It is an important identifier that the whole ZooKeeper cluster uses to discover its members.
2. zoo.cfg, in the conf directory, is the ZooKeeper configuration file.
3. log4j.properties, in the conf directory, configures ZooKeeper's log output; like most programs written in Java, ZooKeeper uses log4j to manage logging.
4. zkEnv.sh and zkServer.sh, in the bin directory:
zkServer.sh is the management script for the service;
zkEnv.sh configures the environment variables used when the cluster starts.
e) Start the service and check it
1. Start the service
# enter zookeeper's bin directory
cd /usr/software/zookeeper/bin
# start the service (run on all 3 machines)
./zkServer.sh start
f) Check the service status
# check the server status
# ./zkServer.sh status
JMX enabled by default
Using config: /usr/software/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181
Mode: leader
Check on another ZooKeeper machine:
# ./zkServer.sh status
JMX enabled by default
Using config: /usr/software/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181
Mode: follower
You can see that the ZooKeeper cluster has elected a leader and followers.
You can also check the startup log:
# cat /usr/software/zookeeper/bin/zookeeper.out
Test:
Shut down and restart the three machines in turn; you will see the nodes switch between the leader and follower roles.
IV. Kafka cluster construction
1. Software environment
(1) Two or more machines
(2) A working ZooKeeper cluster (built above)
(3) The Kafka installation package
2. Create a directory and unpack the software
# create the Kafka message directory, which mainly stores Kafka messages
# mkdir kafkalogs
# unpack the software
# tar -zxvf kafka_2.11-1.0.0.tgz
3. Modify the configuration file
Go to the config directory:
# cd /usr/kafka/config/
The main file of interest is server.properties; listing the directory shows:
[root@centos1 config]# ll
total 64
-rw-r--r--. 1 root root  906 Oct 23:56 connect-console-sink.properties
-rw-r--r--. 1 root root  909 Oct 23:56 connect-console-source.properties
-rw-r--r--. 1 root root 5807 Oct 23:56 connect-distributed.properties
-rw-r--r--. 1 root root  883 Oct 23:56 connect-file-sink.properties
-rw-r--r--. 1 root root  881 Oct 23:56 connect-file-source.properties
-rw-r--r--. 1 root root 1111 Oct 23:56 connect-log4j.properties
-rw-r--r--. 1 root root 2730 Oct 23:56 connect-standalone.properties
-rw-r--r--. 1 root root 1221 Oct 23:56 consumer.properties
-rw-r--r--. 1 root root 4727 Oct 23:56 log4j.properties
-rw-r--r--. 1 root root 1919 Oct 23:56 producer.properties
-rw-r--r--. 1 root root 6852 Oct 23:56 server.properties
-rw-r--r--. 1 root root 1032 Oct 23:56 tools-log4j.properties
-rw-r--r--. 1 root root 1023 Oct 23:56 zookeeper.properties
Modify the server.properties configuration file:
# add the following lines below log.retention.hours=168
# hostname (use each server's own IP on its own machine)
host.name=192.168.1.110
message.max.bytes=5242880
default.replication.factor=2
replica.fetch.max.bytes=5242880
At the same time, set the ZooKeeper connection parameters:
# set the zookeeper connection addresses
zookeeper.connect=192.168.1.110:2181,192.168.1.111:2181,192.168.1.112:2181
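Putting the pieces together, the broker configuration for the first server might look like the fragment below. This is a sketch: broker.id comes from the stock server.properties template and must be unique on each broker, and host.name must be set to each server's own address.

```properties
broker.id=0
host.name=192.168.1.110
message.max.bytes=5242880
default.replication.factor=2
replica.fetch.max.bytes=5242880
log.retention.hours=168
zookeeper.connect=192.168.1.110:2181,192.168.1.111:2181,192.168.1.112:2181
```

On the second and third servers, bump broker.id (e.g. 1 and 2) and change host.name accordingly.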
4. Start the Kafka cluster and test it
(1) Start the service
# start the Kafka cluster in the background (all 3 machines need to be started)
# enter kafka's bin directory
# ./kafka-server-start.sh -daemon ../config/server.properties
(2) Check whether the service has started
[root@centos1 config]# jps
1800 Kafka
1873 Jps
1515 QuorumPeerMain
(3) Create a topic to verify that creation succeeds
# create a topic
# ./kafka-topics.sh --create --zookeeper 192.168.1.110:2181 --replication-factor 2 --partitions 1 --topic test
# explanation
--replication-factor 2   # keep two replicas
--partitions 1           # create 1 partition
--topic test             # the topic name is test
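To see what these flags mean for data placement, the sketch below spreads partition replicas over the brokers round-robin. This is a simplified illustration only, not Kafka's real assignment algorithm (which also randomizes the starting broker):

```python
# Sketch: spread each partition's replicas over the brokers round-robin.
def assign_replicas(brokers, partitions, replication_factor):
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(partitions)
    }

# --partitions 1 --replication-factor 2 on a 3-broker cluster:
layout = assign_replicas(["110", "111", "112"], partitions=1, replication_factor=2)
print(layout)  # {0: ['110', '111']} -- partition 0 lives on two of the three brokers
```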
# create a producer (publisher) on one server
./kafka-console-producer.sh --broker-list 192.168.1.111:9092 --topic test
>
> messages typed here are sent
# create a consumer (subscriber) on another server
# ./kafka-console-consumer.sh --zookeeper 192.168.1.112:2181 --topic test --from-beginning
> messages sent by the producer are received here
The cluster configuration is now complete.
V. Other notes
1. Log description
By default, Kafka's logs are saved in the /usr/software/kafka/logs directory. A few of them deserve attention:
server.log        # Kafka's runtime log
state-change.log  # Kafka uses ZooKeeper to store state, so leaders may switch; the switch log is saved here
controller.log    # Kafka elects one node as the "controller", which is responsible for electing a new leader among a partition's replicas when the node holding that partition's leader goes down. This lets Kafka efficiently manage the master-slave relationships of all partitions in batches. If the controller itself goes down, one of the surviving nodes becomes the new controller.
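The controller behavior described above can be modeled with a toy simulation. This mimics the effect of the ephemeral controller registration in ZooKeeper (first broker to register wins; the registration vanishes when that broker's session dies); it is an illustration, not the real ZooKeeper API:

```python
# Toy model of Kafka controller election and failover.
class Cluster:
    def __init__(self):
        self.controller = None

    def try_become_controller(self, broker_id):
        # Succeeds only when no controller is registered, like creating
        # an ephemeral znode that does not yet exist.
        if self.controller is None:
            self.controller = broker_id
            return True
        return False

    def broker_down(self, broker_id):
        # The ephemeral registration disappears with the broker's session.
        if self.controller == broker_id:
            self.controller = None

c = Cluster()
assert c.try_become_controller(1)       # broker 1 wins the race
assert not c.try_become_controller(2)   # broker 2 loses, stays a plain broker
c.broker_down(1)                        # the controller dies
assert c.try_become_controller(3)       # a surviving broker takes over
print(c.controller)  # 3
```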