1. Some important principles
I won't explain the basic concepts here (what a broker, a partition, or a consumer group is); instead, here are some principles I have summed up in practice.
1. Kafka has the concept of replicas: each topic is divided into partitions, and the replicas of each partition are split into one leader and several followers.
2. The number of consumers on the consuming side must be consistent with the number of partitions and cannot be greater; with more consumers than partitions, some consumers will get no data.
3. Producer principle
The producer learns from zookeeper which partitions the connected topic has and which replica is each partition's leader, keeping this information up to date through the zookeeper watch mechanism. To save network IO, the producer also buffers messages locally and sends them to the broker in batches.
4. Consumer principle
The consumer sends a FETCH request to the broker, telling it the offset to read from. Kafka uses a pull model: the consumer actively pulls messages. The advantage is that consumers can control how much they consume. A minimal sketch of both principles follows this list.
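To make principles 3 and 4 concrete, here is a minimal sketch using the third-party kafka-python client (not mentioned in the original post; the broker address and topic name are reused from the commands below purely for illustration). The producer half shows local buffering with batched sends; the consumer half shows the pull model with explicit offset control:

#!/usr/bin/python
# Producer: messages are buffered locally and flushed to the broker in batches.
from kafka import KafkaProducer, KafkaConsumer, TopicPartition

producer = KafkaProducer(
    bootstrap_servers='172.16.10.130:9092',  # illustrative broker address
    batch_size=16384,  # buffer up to 16 KB per partition before sending
    linger_ms=50)      # wait up to 50 ms to fill a batch, saving network IO
for n in range(100):
    producer.send('deal_exposure_origin', ('msg-%d' % n).encode('utf-8'))
producer.flush()  # force out anything still sitting in the local buffer

# Consumer: pull model -- the client decides when, from where, and how much to fetch.
consumer = KafkaConsumer(bootstrap_servers='172.16.10.130:9092',
                         enable_auto_commit=False)
tp = TopicPartition('deal_exposure_origin', 0)
consumer.assign([tp])
consumer.seek(tp, 0)  # tell the broker which offset to start pulling from
records = consumer.poll(timeout_ms=1000, max_records=10)  # cap the amount consumed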
2. Summary of common commands in a Kafka production environment
1. Simulate the producer side and push data
./bin/kafka-console-producer.sh --broker-list 172.16.10.130:9092 --topic deal_exposure_origin
2. Simulate the consumer side and consume data
./bin/kafka-console-consumer.sh --zookeeper 172.16.10.140:2181 --topic deal_exposure_origin
3. Create a topic, specifying the partition count, replica count, and data expiration time
./kafka-topics.sh --zookeeper spark:2181 --create --topic deal_task_log --partitions 1 --replication-factor 2 --config retention.ms=1296000000
3. How to dynamically add replicas in Kafka
1. Kafka must be configured with replicas. Note that if replicas are added after the fact, the data synchronization this triggers will drive cluster IO up.
2. Record every topic's information into a JSON file named after the topic: which partitions it has and which replicas each partition currently holds, then modify the JSON data to increase the number of replicas. The script below does this; a sample of the file it emits is shown after the script.
#!/usr/bin/python
# For every topic registered in zookeeper, find the partitions that have only
# one replica and write a reassignment JSON file (named after the topic) that
# expands each such partition to three replicas.
from kazoo.client import KazooClient
import random
import json

zk = KazooClient(hosts='172.16.11.73:2181')
zk.start()

for topic in zk.get_children('/brokers/topics'):
    state = json.loads(zk.get('/brokers/topics/' + topic)[0])
    partitions = state['partitions']  # {"partition id": [replica broker ids]}
    reassign = []
    for key, value in partitions.items():
        if len(value) == 1:  # partition currently has a single replica
            entry = {'topic': topic.encode('utf-8'), 'partition': int(key)}
            replicas = list(value)  # keep the existing replica
            while len(replicas) < 3:
                num = random.randint(0, 4)  # broker ids 0-4 (5-node cluster)
                if num not in replicas:
                    replicas.append(num)
            entry['replicas'] = replicas
            reassign.append(entry)
    out = {'version': state['version'], 'partitions': reassign}
    json.dump(out, open('/opt/json/' + topic + '.json', 'w'))
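A file emitted by this script looks like the following (topic name and broker ids illustrative); this is the format kafka-reassign-partitions.sh expects:

{"version": 1, "partitions": [{"topic": "testtest", "partition": 0, "replicas": [0, 1, 2]}]}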
3. Load the JSON file
/usr/local/kafka_2.9.2-0.8.1.1/bin/kafka-reassign-partitions.sh --zookeeper 192.168.5.159:2181 --reassignment-json-file /opt/test.json --execute
4. Check whether the replicas have been added
/usr/local/kafka_2.9.2-0.8.1.1/bin/kafka-topics.sh --describe --zookeeper 192.168.5.159:2181 --topic testtest
Topic:testtest  PartitionCount:15  ReplicationFactor:2  Configs:
Topic: testtest  Partition: 0  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 1  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 2  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 3  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 4  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 5  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 6  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 7  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 8  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 9  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 10  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 11  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 12  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 13  Leader: 0  Replicas: 0,1  Isr: 0,1
Topic: testtest  Partition: 14  Leader: 0  Replicas: 0,1  Isr: 0,1
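Besides --describe, the reassignment tool itself can report completion; a hedged sketch, assuming the --verify option of the 0.8-era tool and the same JSON file as above:

/usr/local/kafka_2.9.2-0.8.1.1/bin/kafka-reassign-partitions.sh --zookeeper 192.168.5.159:2181 --reassignment-json-file /opt/test.json --verify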
4. Data synchronization between Kafka clusters
Pick a broker node to run the synchronization.
1. Create a configuration file mirror_consumer.config
Write the source (local) Kafka cluster's zookeeper addresses into the file
and define a consumer group that consumes all the topics to be synchronized:
zookeeper.connect=172.16.11.43:2181,172.16.11.46:2181,172.16.11.60:2181,172.16.11.67:2181,172.16.11.73:2181
group.id=backup-mirror-consumer-group
2. Create a configuration file mirror_producer.config
Write the target cluster's zookeeper and Kafka broker addresses:
zookeeper.connect=172.17.1.159:2181,172.17.1.160:2181
metadata.broker.list=172.17.1.159:9092,172.17.1.160:9092
3. Synchronization command
$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config sourceClusterConsumer.config --num.streams 2 --producer.config targetClusterProducer.config --whitelist=".*"
Detailed parameters
1. Whitelist and blacklist
Mirror-maker accepts a whitelist and a blacklist to specify exactly which topics to sync. Both use standard Java regular expressions; for convenience, a comma (',') is compiled into the Java regex alternation ('|').
2. Producer timeout
To support high throughput, you will probably want to use the asynchronous built-in producer and set it to blocking mode (queue.enqueueTimeout.ms=-1). This guarantees that data (messages) will not be lost. Otherwise, the asynchronous producer's enqueue timeout defaults to 0: if the producer's internal queue is full, the data (messages) is discarded and a QueueFullException is thrown. For a producer in blocking mode, a full internal queue makes it wait, which effectively throttles the internal consumer's consumption speed. You can enable the producer's trace logging to watch the remaining capacity of the internal queue at any time. If the producer's internal queue stays full for long periods, it means that for mirror-maker, pushing messages to the target Kafka cluster or writing messages to disk is the bottleneck.
For detailed configuration of Kafka producer sync/async modes, refer to the $KAFKA_HOME/config/producer.properties file. Pay attention to the producer.type and queue.enqueueTimeout.ms fields (a sketch of these lines appears after this parameter list).
3. Producer retry attempts (retries)
If you use broker.list in the producer configuration, you can set the number of retries when publishing data fails. The retry parameter applies only when broker.list is used, because a new broker is selected on retry.
4. Number of producers
By setting the --num.producers parameter, you can use a producer pool to increase mirror maker's throughput. On the broker that accepts the data (messages), each producer's requests are handled by only a single thread, so even with multiple consumption streams, throughput can be limited while the producers are handling requests.
5. Number of consumption streams
Use --num.streams to specify the number of consumer threads. Note that if you start multiple mirror maker processes, you may need to look at how the source Kafka cluster's partitions are distributed among them. If the number of consumption streams per mirror maker process is too large, some consumer threads will be left idle without owning any partition, an effect of the consumer load-balancing algorithm.
6. Shallow iteration and producer compression
We recommend enabling shallow iteration in mirror maker's consumer. This means mirror maker's consumer does not decompress compressed message sets (message-sets) but forwards the fetched message-set data directly to the producer.
If you enable shallow iteration, you must disable producer compression in mirror maker, otherwise the message sets will be compressed twice.
7. Socket buffer sizes for the consumer and the source Kafka cluster
Mirroring is often used across clusters, so you may want to use certain configuration options to optimize for inter-cluster communication latency and specific hardware bottlenecks. In general, you should set a high value for socket.buffersize in mirror-maker's consumer and for socket.send.buffer on the source cluster's brokers. In addition, the consumer's fetch.size in mirror-maker should be set higher than socket.buffersize. Note that socket buffer sizes are operating-system network-layer parameters. If you enable trace-level logging, you can check the actual receive buffer sizes to determine whether the OS network layer is properly tuned. Sketches of these settings follow.
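Two minimal sketches of the settings from items 2 and 7 above, using the property names the post cites (0.7/0.8-era names; all values are illustrative, not recommendations):

# producer.properties -- async producer in blocking mode (item 2)
producer.type=async
queue.enqueueTimeout.ms=-1    # block instead of dropping when the queue is full

# mirror-maker consumer config (item 7)
socket.buffersize=2097152     # large socket receive buffer
fetch.size=4194304            # should exceed socket.buffersize

# source cluster broker config (item 7)
socket.send.buffer=2097152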
4. How to check mirror maker health
The consumer offset checker tool can be used to check the mirror's consumption progress against the source cluster. For example:
bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group kafkamirror --zkconnect localhost:2181 --topic test-topic
kafkamirror,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
            Owner = kafkamirror_jkoshy-ld-1320972386342-beb4bfc9-0
  Consumer offset = 561154288
                  = 561,154,288 (0.52G)
         Log size = 2231392259
                  = 2,231,392,259 (2.08G)
     Consumer lag = 1670237971
                  = 1,670,237,971 (1.56G)
BROKER INFO
0 -> 127.0.0.1:9092
Note that the --zkconnect parameter must point to the source cluster's zookeeper. Also, if no topic is specified, information for all topics under the current consumer group is printed.
5. How we resolved high Kafka disk IO
Problem: Kafka disk IO was too high.
We have 5 Kafka machines on the production platform, each with 2 disks used for partitions.
We recently found that the disk IO used by Kafka was very high, hurting push performance on the production side.
At first we blamed a push-log topic, since it was pushed about 20,000 messages per second,
but after migrating that topic to another Kafka cluster the IO did not go down.
Finally iotop revealed that the IO was actually caused by zookeeper persistence:
when zookeeper persists its state, it writes to the same disks Kafka uses.
This issue illustrates a few points.
1. Kafka's use of zookeeper is not quite like other applications we are familiar with, such as SolrCloud, Codis, or Otter.
Those generally use zookeeper to manage cluster nodes, whereas for Kafka zookeeper is at the core: both the production side and the consumption side
connect to zookeeper to get the information they need.
The production side connects to zookeeper to learn which partitions a topic uses and which replica is each partition's leader.
The consumption side connects to zookeeper to get offsets, and consuming modifies data in zookeeper, so this IO
happens very frequently.
Workaround:
Stop zookeeper from forcing every write to disk
by adding one line to its configuration file:
forceSync=no
This solved the problem. A sketch of the configuration file follows.
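For context, a minimal zoo.cfg sketch showing where the line goes (path and port are illustrative; note that forceSync=no trades crash durability for lower disk IO):

# zoo.cfg
dataDir=/data/zookeeper    # illustrative path; on our machines this shared Kafka's disks
clientPort=2181
forceSync=no               # do not fsync the transaction log on every write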