Summary of daily Kafka cluster work experience from Mission 800 operations and maintenance


    1. Some important principles

      I won't repeat the basics of what a Broker, Partition, or Consumer Group (CG) is; instead, here are some principles I have summed up myself:



      1. Kafka has the concept of replicas. Each topic is split into partitions, and each partition's replicas are divided into a leader and followers.

      2. The number of consumers on the consuming side must be consistent with the number of partitions and cannot be larger; otherwise some consumers will get no data.

      3. Producer principle

      The producer learns through ZooKeeper which partitions the connected topic has and which broker is the leader of each partition, and keeps this information up to date through ZooKeeper's watch mechanism. To save network IO, the producer also buffers messages locally and sends them to the broker in bulk.
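The buffer-locally-then-send-in-bulk behavior described above can be sketched with a toy producer. This is an in-memory illustration only, not the Kafka client API; `send_batch` and `flush_size` are made-up names standing in for the network layer:

```python
class BufferingProducer:
    """Toy illustration of client-side batching: messages accumulate
    locally and are handed to the network layer in bulk."""

    def __init__(self, send_batch, flush_size=3):
        self.send_batch = send_batch  # callback standing in for the network send
        self.flush_size = flush_size
        self.buffer = []

    def send(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) >= self.flush_size:  # flush only when the buffer fills
            self.flush()

    def flush(self):
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []

batches = []
p = BufferingProducer(batches.append, flush_size=3)
for m in ['m1', 'm2', 'm3', 'm4']:
    p.send(m)
p.flush()
# 4 messages went out as 2 network calls instead of 4
```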

      4. Consumer principle

      The consumer sends a FETCH request to the broker, telling it the offset to fetch from. Kafka uses a pull model: consumers actively pull messages. The advantage is that each consumer can control its own consumption rate.
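The pull model can be illustrated with a toy in-memory log: the consumer owns its offset and decides how much to fetch per request. This is a sketch of the idea only, not the Kafka wire protocol:

```python
log = ['msg-%d' % i for i in range(10)]  # stand-in for a partition's log

def fetch(offset, max_messages):
    """Broker side of a FETCH: return messages starting at offset."""
    return log[offset:offset + max_messages]

# Consumer side: it tracks the offset and chooses the batch size,
# so a slow consumer simply pulls less often or in smaller batches.
offset = 0
consumed = []
while offset < len(log):
    batch = fetch(offset, max_messages=4)  # consumer controls the fetch size
    consumed.extend(batch)
    offset += len(batch)
```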



2. Summary of common commands in a Kafka production environment

1. Simulate the producer side and push data:

./bin/kafka-console-producer.sh --broker-list 172.16.10.130:9092 --topic deal_exposure_origin


2. Simulate the consumer side and consume data:

./bin/kafka-console-consumer.sh --zookeeper 172.16.10.140:2181 --topic deal_exposure_origin


3. Create a topic, specifying the topic name, partition count, replica count, and data expiration time:

./kafka-topics.sh --zookeeper spark:2181 --create --topic deal_task_log --partitions 1 --replication-factor <N> --config retention.ms=1296000000
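The retention value above is in milliseconds; a quick check shows that 1296000000 ms is 15 days:

```python
retention_ms = 1296000000
days = retention_ms / (1000 * 60 * 60 * 24)  # ms -> days
print(days)  # 15.0
```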


3. How to add replicas dynamically

1. Replicas: Kafka must be set up with replicas. If replicas are added after the fact, the data synchronization involved will drive cluster IO up.


2. Record each topic's information (topic name, partitions, and the replicas in each partition) into a JSON file named after the topic, and modify the JSON data to increase the number of replicas:

#!/usr/bin/python
# Generate a reassignment JSON file per topic, growing every
# single-replica partition to 3 replicas on random brokers (ids 0-4).
from kazoo.client import KazooClient
import random
import json

zk = KazooClient(hosts='172.16.11.73:2181')
zk.start()

for topic in zk.get_children('/brokers/topics'):
    data = json.loads(zk.get('/brokers/topics/' + topic)[0])
    partitions = data['partitions']
    reassignment = []
    for key, value in partitions.items():
        if len(value) == 1:  # only partitions that currently have one replica
            entry = {'topic': topic, 'partition': int(key)}
            replicas = list(value)       # keep the existing replica first
            while len(replicas) < 3:     # grow the replica list to 3 brokers
                num = random.randint(0, 4)  # random broker id, brokers are 0-4
                if num not in replicas:
                    replicas.append(num)
            entry['replicas'] = replicas
            reassignment.append(entry)
    out = {'version': data['version'], 'partitions': reassignment}
    with open('/opt/json/' + topic + '.json', 'w') as f:
        json.dump(out, f)
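The files written per topic follow the format that kafka-reassign-partitions.sh expects. A minimal hand-built example of that format (the topic name and broker ids here are made up for illustration):

```python
import json

reassignment = {
    "version": 1,
    "partitions": [
        # each entry: which partition of which topic, plus the full
        # target replica list (existing replica first, new ones after)
        {"topic": "deal_task_log", "partition": 0, "replicas": [0, 2, 4]},
        {"topic": "deal_task_log", "partition": 1, "replicas": [1, 3, 0]},
    ],
}
print(json.dumps(reassignment))
```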

3. Load the JSON file:

/usr/local/kafka_2.9.2-0.8.1.1/bin/kafka-reassign-partitions.sh --zookeeper 192.168.5.159:2181 --reassignment-json-file /opt/test.json --execute


4. Check whether the replicas have been added:

/usr/local/kafka_2.9.2-0.8.1.1/bin/kafka-topics.sh --describe --zookeeper 192.168.5.159:2181 --topic testtest

Topic:testtest  PartitionCount:15  ReplicationFactor:2  Configs:
Topic:testtest  Partition: 0   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 1   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 2   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 3   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 4   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 5   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 6   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 7   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 8   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 9   Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 10  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 11  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 12  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 13  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic:testtest  Partition: 14  Leader: 0  Replicas: 0,1  Isr: 0,1


4. Data synchronization between Kafka clusters

Pick a broker node to run the synchronization.

1. Create a configuration file mirror_consumer.config

Write the local (source) Kafka cluster's ZooKeeper addresses into this file, and define a consumer group that consumes all topics to be synchronized:

zookeeper.connect=172.16.11.43:2181,172.16.11.46:2181,172.16.11.60:2181,172.16.11.67:2181,172.16.11.73:2181

group.id=backup-mirror-consumer-group


2. Create a configuration file mirror_producer.config

Write the target cluster's ZooKeeper and Kafka broker IPs into this file:

zookeeper.connect=172.17.1.159:2181,172.17.1.160:2181

metadata.broker.list=172.17.1.159:9092,172.17.1.160:9092


3. Run the synchronization command:

$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config mirror_consumer.config --num.streams 2 --producer.config mirror_producer.config --whitelist=".*"


Detailed parameters

1. Whitelist (whitelist) and blacklist (blacklist)

MirrorMaker accepts a whitelist and a blacklist to specify exactly which topics to sync. Both use Java standard regular expressions; for convenience, commas (',') are compiled into the Java regex alternation ('|').
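So a whitelist like "topicA,topicB" effectively becomes the regex "topicA|topicB". The same rewrite, illustrated in Python (the topic names are just examples from this article):

```python
import re

whitelist = "deal_exposure_origin,deal_task_log"
pattern = whitelist.replace(',', '|')  # commas become regex alternation
matches = bool(re.fullmatch(pattern, "deal_task_log"))
```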

2. Producer timeout

To support high throughput, you will probably want to use the asynchronous built-in producer and set it to blocking mode (queue.enqueueTimeout.ms=-1). This guarantees that messages will not be lost. Otherwise, the asynchronous producer's default enqueue timeout is 0: if the producer's internal queue is full, messages are discarded and a QueueFullException is thrown. With a blocking-mode producer, a full internal queue makes the producer wait, which effectively throttles the internal consumer's consumption rate. You can turn on the producer's trace logging to watch the remaining capacity of the internal queue at any time. If the producer's internal queue stays full for long periods, it means that for MirrorMaker, pushing messages to the target Kafka cluster or writing them to disk is the bottleneck.

For detailed configuration of Kafka producer sync/async behavior, refer to the $KAFKA_HOME/config/producer.properties file; focus on the producer.type and queue.enqueueTimeout.ms fields.

3. Producer retry attempts (retries)

If you use broker.list in the producer configuration, you can set the number of retries when publishing data fails. The retry parameter is only available with broker.list, because a new broker is selected on retry.

4. Number of producers

Setting the --num.producers parameter lets MirrorMaker use a producer pool to increase throughput, because the broker receiving the data handles each producer's requests with only a single thread. Even with multiple consumption streams, throughput will be capped if the producers cannot keep up with the requests.

5. Number of consumption streams (consumption streams)

Use --num.streams to specify the number of consumer threads. Note that if you start multiple MirrorMaker processes, you may need to look at how partitions are distributed across them in the source Kafka cluster. If the number of consumption streams per MirrorMaker process is too large, some consumer threads will end up idle, owning no partition at all, because of the consumer load-balancing algorithm.
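The idle-stream effect is easy to see with a simplified assignment model: with more streams than partitions, some streams get nothing. (This toy code round-robins partitions over streams; Kafka's actual assignment algorithm differs, and the counts here are made up.)

```python
def assign(partitions, streams):
    """Simplified model of consumer rebalance: round-robin
    partitions across consumer streams."""
    owned = {s: [] for s in range(streams)}
    for p in range(partitions):
        owned[p % streams].append(p)
    return owned

owned = assign(partitions=5, streams=8)
idle = [s for s, parts in owned.items() if not parts]
# streams 5, 6 and 7 own no partition and sit idle
```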

6. Shallow iteration (shallow iteration) and producer compression

We recommend turning on shallow iteration in MirrorMaker's consumer. This means MirrorMaker's consumer does not decompress compressed message sets (message-sets), but passes the fetched message-set data straight to the producer.

If you turn on shallow iteration, you must turn off producer compression in MirrorMaker, otherwise the message sets will be compressed twice.

7. Socket buffer sizes for the consumer and the source Kafka cluster (source cluster)

Mirroring is usually used across clusters, so you may want to use some configuration options to optimize for inter-cluster communication latency and specific hardware bottlenecks. In general, you should set a high value for socket.buffersize in MirrorMaker's consumer and for socket.send.buffer on the source cluster's brokers. In addition, the consumer's fetch.size in MirrorMaker should be set higher than socket.buffersize. Note that the socket buffer sizes are operating-system network-layer parameters; if you enable trace-level logging, you can check the actual receive buffer sizes to determine whether the OS network layer has been tuned as intended.



4. How to check MirrorMaker health

The ConsumerOffsetChecker tool can check how far the mirror has consumed from the source cluster. For example:

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group KafkaMirror --zkconnect localhost:2181 --topic test-topic

KafkaMirror,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
Owner = KafkaMirror_jkoshy-ld-1320972386342-beb4bfc9-0
Consumer offset = 561154288 = 561,154,288 (0.52G)
Log size = 2231392259 = 2,231,392,259 (2.08G)
Consumer lag = 1670237971 = 1,670,237,971 (1.56G)
BROKER INFO
0 -> 127.0.0.1:9092

Note that the --zkconnect parameter must point to the source cluster's ZooKeeper. Also, if no topic is specified, information for all topics under the current consumer group is printed.
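As a sanity check on the sample output above, consumer lag is simply the log size minus the consumer offset:

```python
log_size = 2231392259
consumer_offset = 561154288
lag = log_size - consumer_offset        # how many bytes remain unconsumed
print(lag)                   # 1670237971
print(round(lag / 2**30, 2))  # 1.56 (GiB), matching the tool's output
```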



5. How to resolve high Kafka disk IO

Problem: Kafka disk IO is too high.

We have 5 Kafka machines on the production platform, with 2 disks per machine used for partitions.

We recently found that Kafka's disk IO was very high, hurting the performance of producers pushing data.

At first we thought it was caused by a push-log topic, since it pushed about 20,000 messages per second, but after migrating that topic to another Kafka cluster the high IO was still there.

iotop finally revealed that it was actually caused by ZooKeeper persistence: when ZooKeeper persists its data, it writes to the same disks Kafka uses.


This issue illustrates a few points:

1. Kafka's use of ZooKeeper is not quite like other applications we are familiar with, such as SolrCloud, Codis, and Otter. Those generally use ZooKeeper to manage cluster nodes, whereas for Kafka ZooKeeper is core: both producing and consuming connect to ZooKeeper for information.

The producer connects to ZooKeeper to learn which partitions a topic uses and which broker is the leader of each partition; the consumer connects to ZooKeeper to get its offset, and consumption modifies data in ZooKeeper, so IO operations are very frequent.


Workaround:

Prohibit ZooKeeper from forcing a sync to disk on every write by adding one line to its configuration file:

forceSync=no

This solved the problem.


