Kafka 0.9 made major changes to the Java client API. This article summarizes cluster setup, high availability, and the new API workflows and details for Kafka 0.9, along with the various pits I stepped into during installation and testing.
Kafka's architecture, features, characteristics, and application scenarios are covered all over the Internet, so I won't repeat them here; let's get straight into it.
Kafka 0.9 Cluster Installation and Configuration
Operating system: CentOS 6.5
1. Installing the Java Environment
Both ZooKeeper and Kafka require a Java environment, so install a JRE first. Kafka defaults to the G1 garbage collector; if you do not change the collector, it is recommended to use JRE 7u51 or later. If you must use an older JRE, you need to change the Kafka startup script to specify a garbage collector other than G1.
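If you do run an older JRE, one way to swap collectors (a sketch, assuming the stock 0.9 startup scripts, which only set KAFKA_JVM_PERFORMANCE_OPTS when it is not already defined, so exporting the variable overrides the G1 default without editing the script itself) is:

export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true"
bin/kafka-server-start.sh config/server.properties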
As for installing the Java environment itself, I won't dwell on it here.
2. ZooKeeper Cluster Setup
Kafka relies on ZooKeeper to manage its cluster state (brokers, offsets, producers, consumers, etc.), so install ZooKeeper first. Naturally, for high availability, ZooKeeper itself cannot be a single point of failure, so the next step is to build a minimal ZooKeeper cluster (3 ZK nodes).
The ZooKeeper version chosen here is 3.4.6, which is the version recommended by Kafka 0.9.
1) First, unpack the archive:
tar -xzvf zookeeper-3.4.6.tar.gz
2) Enter ZooKeeper's conf directory, copy zoo_sample.cfg, and name the copy zoo.cfg; this is the ZooKeeper configuration file:
cp zoo_sample.cfg zoo.cfg
3) Edit zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored
dataDir=/data/zk/zk0/data
dataLogDir=/data/zk/zk0/logs
# The port at which the clients will connect
clientPort=2181
server.0=10.0.0.100:4001:4002
server.1=10.0.0.101:4001:4002
server.2=10.0.0.102:4001:4002
The dataDir and dataLogDir paths must be created before ZooKeeper is started.
clientPort is the port on which ZooKeeper serves clients.
The server.0/1/2 entries describe the three nodes of the ZK cluster, in the form hostname:port1:port2, where port1 is the port used for communication between nodes and port2 is the port used for leader election. Make sure these two ports are mutually reachable on all three hosts.
4) Perform the same steps on the other two hosts to install and configure ZooKeeper.
5) Under the dataDir path on each of the three hosts, create a file named myid whose content is that ZK node's number. For example, the myid file created on the first host contains 0, the one on the second host contains 1, and so on.
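On each host this is just one command (a sketch; adjust the path if your dataDir differs from the zoo.cfg above):

echo 0 > /data/zk/zk0/data/myid    # on 10.0.0.100
echo 1 > /data/zk/zk0/data/myid    # on 10.0.0.101
echo 2 > /data/zk/zk0/data/myid    # on 10.0.0.102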
6) Start the ZooKeeper service on the three hosts:
bin/zkServer.sh start
After the 3 nodes are started, you can view the cluster status by executing the following command:
bin/zkServer.sh status
The command output is as follows:
Mode: leader   or   Mode: follower
Of the 3 nodes, there should be exactly 1 leader and 2 followers.
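To check all three nodes in one go, a small loop like the following works (a sketch; it assumes passwordless SSH between the hosts and that ZooKeeper is unpacked at the same path, here hypothetically /opt/zookeeper-3.4.6, on every node):

for h in 10.0.0.100 10.0.0.101 10.0.0.102; do
    echo -n "$h: "
    ssh $h '/opt/zookeeper-3.4.6/bin/zkServer.sh status 2>&1 | grep Mode'
done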
7) Verify the ZooKeeper cluster's high availability:
Assume that among the current 3 ZK nodes, server0 is the leader and server1 and server2 are followers.
Stop the ZooKeeper service on server0:
bin/zkServer.sh stop
Then check the cluster status on server1 and server2: you will find that one of them, server1 (or possibly server2), has become the leader, and the other is a follower.
Start the ZooKeeper service on server0 again and run zkServer.sh status; you will find that the freshly restarted server0 is now a follower as well.
At this point, installation and high-availability verification of the ZooKeeper cluster is complete.
Note: by default, ZooKeeper writes its console output to zookeeper.out in the directory it was started from. Obviously we cannot let ZooKeeper do that in production; you can make it roll its log files by size in the following way:
Modify conf/log4j.properties: change zookeeper.root.logger=INFO, CONSOLE to zookeeper.root.logger=INFO, ROLLINGFILE
Modify bin/zkEnv.sh: change ZOO_LOG4J_PROP="INFO,CONSOLE" to ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
Then restart ZooKeeper and you're done.
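For reference, the ROLLINGFILE appender enabled above is defined in the same conf/log4j.properties; in the 3.4.6 distribution the relevant block looks roughly like this (uncomment or adjust MaxFileSize and MaxBackupIndex to control how large each file grows and how many rolled files are kept):

log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout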
3. Kafka Cluster Setup
In this example, we will install a Kafka cluster with two brokers and create a two-partition topic on it.
This example uses Kafka 0.9.0.1, the latest version at the time of writing.
1) First, unpack the archive:
tar -xzvf kafka_2.11-0.9.0.1.tgz
2) Edit the config/server.properties file; the key parameters are listed below:
# The ID of this broker; every broker in the cluster must have a distinct ID
broker.id=0
# Listener; the port number here should match "port" below
listeners=PLAINTEXT://:9092
# Port the broker listens on
port=9092
# Hostname of the broker; filling in the host IP is fine
host.name=10.0.0.100
# Hostname and port advertised to producers and consumers (there is a pit here, see below)
advertised.host.name=10.0.0.100
advertised.port=9092
# Number of I/O threads; should be no less than the number of disks in the host
num.io.threads=8
# Path where the message files are stored
log.dirs=/data/kafka-logs
# Message retention period, i.e. records older than x hours are cleaned up
log.retention.hours=168
# Default number of partitions per topic; the partition count is usually specified
# when creating a topic, so 1 is fine here
num.partitions=1
# ZooKeeper connection string: the IPs and ports of the three ZK nodes installed in the previous section
zookeeper.connect=10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181
For detailed descriptions of the configuration items, see the official documentation:
http://kafka.apache.org/documentation.html#brokerconfigs
Here's the pit: according to the official documentation, the advertised.host.name and advertised.port parameters define the host and port that the broker advertises to producers and consumers; if they are not defined, the values of host.name and port are used by default. But in practice I found that if advertised.host.name is not defined, connecting to the cluster from a remote Java client times out, throwing the exception: org.apache.kafka.common.errors.TimeoutException: Batch Expired
Debugging showed that the connection to the cluster succeeded, but the cluster metadata fetched after connecting was wrong:
[Screenshot: cluster metadata listing the broker nodes]
As you can see, in the metadata the node hostname is a string like iz25wuzqk91z rather than the actual IP addresses 10.0.0.100 and 10.0.0.101. iz25wuzqk91z is in fact the operating-system hostname of the broker machine. This means that when advertised.host.name is not configured, Kafka does not advertise the host.name we configured, as the official documentation claims, but instead advertises the machine's own hostname. The remote client has no hosts entry for that name, so naturally it cannot connect. The fix is to set both host.name and advertised.host.name to the explicit IP address.
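A quick way to see what a broker actually registered (a sketch using the ZooKeeper CLI shipped in the ZooKeeper installation; Kafka writes each broker's advertised endpoint into the /brokers/ids/<broker.id> znode) is:

bin/zkCli.sh -server 10.0.0.100:2181
# then, at the CLI prompt:
get /brokers/ids/0

If the JSON printed there shows the machine's hostname instead of the IP you configured, remote clients will try to resolve that hostname and fail.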
3) Install and configure Kafka on the other host in the same way, then start Kafka on the two hosts separately:
bin/kafka-server-start.sh -daemon config/server.properties
Here's the pit: the officially documented way to start Kafka in the background is:
bin/kafka-server-start.sh config/server.properties &
But after starting this way, Kafka shuts down as soon as the shell is disconnected or logged out. I don't know whether the problem lies with the OS, with SSH, or with Kafka itself; in any case, after switching to the -daemon option, Kafka no longer shuts down when the shell disconnects.
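For what it's worth, the shutdown-on-logout behavior is consistent with the process receiving SIGHUP when the terminal goes away, so if you prefer the "&" style, the usual nohup pattern (an untested alternative here; -daemon is what worked for me) should also survive the disconnect:

nohup bin/kafka-server-start.sh config/server.properties > kafka-console.log 2>&1 &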
4) Create a topic named test with two partitions and two replicas:
bin/kafka-topics.sh --create --zookeeper 10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181 --replication-factor 2 --partitions 2 --topic test
Once created, use the following command to view the topic status:
bin/kafka-topics.sh --describe --zookeeper 10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181 --topic test
Output:
Topic:test    PartitionCount:2    ReplicationFactor:2    Configs:
    Topic: test    Partition: 0    Leader: 1    Replicas: 1,0    Isr: 0,1
    Topic: test    Partition: 1    Leader: 0    Replicas: 0,1    Isr: 0,1
Interpretation: the topic test currently has 2 partitions, numbered 0 and 1. The leader of partition 0 is broker 1 (the 1 is a broker.id); partition 0 has two replicas, on brokers 1 and 0, and both replicas are in the ISR (in-sync replica) set. The leader of partition 1 is broker 0; it likewise has two replicas, both of which are in sync.
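Before turning to the Java client, a quick smoke test with the console tools bundled in the distribution confirms the cluster can accept and serve messages (note that in 0.9 the console consumer still tracks offsets through ZooKeeper by default):

# Terminal 1: type a few messages, one per line
bin/kafka-console-producer.sh --broker-list 10.0.0.100:9092,10.0.0.101:9092 --topic test

# Terminal 2: read them back from the beginning
bin/kafka-console-consumer.sh --zookeeper 10.0.0.100:2181,10.0.0.101:2181,10.0.0.102:2181 --topic test --from-beginning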
At this point, the construction of the Kafka 0.9 cluster is complete, and in the next section we will cover the use of the new Java API and the validation tests for cluster high availability.
This article is from the "severe OCD patient" blog; please contact the author before reproducing it!