Kafka cross-cluster synchronization with Kafka's built-in MirrorMaker tool
This solution addresses Kafka cross-cluster synchronization and the creation of Kafka cluster mirrors. It is implemented with Kafka's built-in MirrorMaker tool.
A Kafka mirror is a copy of an existing Kafka cluster. This article shows how to use the MirrorMaker tool to mirror a source Kafka cluster to a target Kafka cluster. The tool uses a Kafka consumer to consume data from the source cluster and then pushes that data to the target cluster through a built-in Kafka producer.
I. How to create a mirror
Creating a mirror with MirrorMaker is relatively easy. After setting up the target Kafka cluster, you only need to start the MirrorMaker program. It requires one or more consumer configuration files and one producer configuration file; a whitelist or blacklist is optional. In the consumer configuration, point to the zookeeper of the source Kafka cluster; in the producer configuration, point to the zookeeper (or broker.list) of the target cluster.
    kafka-run-class.sh kafka.tools.MirrorMaker \
        --consumer.config sourceCluster1Consumer.config \
        --consumer.config sourceCluster2Consumer.config \
        --num.streams 2 \
        --producer.config targetClusterProducer.config \
        --whitelist=".*"
For example, suppose you need to mirror source cluster S and target cluster T has already been set up. The steps are as follows:
1. Create a consumer configuration file: sourceClusterConsumer.config
    zk.connect=szk0:2181,szk1:2181,szk2:2181
    groupid=test-mirror-consumer-group
2. Create the producer configuration file: targetClusterProducer.config
    zk.connect=tzk0:2181,tzk1:2181
3. Create a startup script: start.sh
    $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.MirrorMaker \
        --consumer.config sourceClusterConsumer.config \
        --num.streams 2 \
        --producer.config targetClusterProducer.config \
        --whitelist=".*"
4. Run the script
Run start.sh and use the log output to check that it is running. The synchronized data will appear under the log.dir of the target Kafka cluster.
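As a quick sanity check, you can also read a mirrored topic directly from the target cluster with the console consumer. The sketch below assumes the target zookeeper addresses from step 2 and a hypothetical topic name topic1:
    # Consume a mirrored topic from the TARGET cluster (old zookeeper-based consumer).
    # "topic1" is a placeholder - use a topic that actually exists on the source cluster.
    $KAFKA_HOME/bin/kafka-console-consumer.sh \
        --zookeeper tzk0:2181,tzk1:2181 \
        --topic topic1 \
        --from-beginning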
II. MirrorMaker parameter description
    $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.MirrorMaker --help
Run the preceding command to view the description of each parameter:
1. Whitelist and blacklist
MirrorMaker lets you precisely specify which topics to synchronize through a whitelist and a blacklist, both expressed as standard Java regular expressions. For convenience, a comma (',') is compiled into ('|') in the Java regular expression.
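For example, the following invocation (a sketch with hypothetical topic names orders and payments, reusing the configuration files from section I) mirrors exactly those two topics; the comma-separated whitelist is equivalent to the regular expression "orders|payments":
    $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.MirrorMaker \
        --consumer.config sourceClusterConsumer.config \
        --producer.config targetClusterProducer.config \
        --whitelist="orders,payments"   # same effect as --whitelist="orders|payments"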
2. Producer timeout
To support high throughput, you should use the asynchronous built-in producer and set it to blocking mode (queue.enqueueTimeout.ms=-1). This ensures that data (messages) will not be lost. Otherwise, the asynchronous producer's default enqueueTimeout is 0: if the producer's internal queue is full, data (messages) are discarded and a QueueFullException is thrown. With a blocking producer, a full internal queue instead causes the producer to wait, which effectively throttles the consumption speed of the built-in consumer. You can enable the producer's trace logging to watch the remaining capacity of the internal queue at any time. If the producer's internal queue stays full for long periods, it indicates that, for MirrorMaker, pushing messages to the target Kafka cluster or writing them to disk is the bottleneck.
For detailed configuration of Kafka's synchronous and asynchronous producers, see the $KAFKA_HOME/config/producer.properties file. Pay attention to two fields: producer.type and queue.enqueueTimeout.ms.
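A minimal sketch of that advice for the producer configuration, using the property names as they appear in this article (some later releases rename queue.enqueueTimeout.ms to queue.enqueue.timeout.ms, so verify against your release's producer configuration reference):
    # targetClusterProducer.config - asynchronous producer that blocks rather than
    # dropping messages when its internal queue fills up.
    zk.connect=tzk0:2181,tzk1:2181
    producer.type=async
    # -1 = block when the queue is full; the default of 0 drops messages and raises QueueFullException
    queue.enqueueTimeout.ms=-1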
3. Producer retries
If you use broker.list in the producer configuration, you can set the number of retries for failed sends. The retry parameter only applies when broker.list is used, because a different broker can be selected during a retry.
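A sketch of a broker-list based producer configuration with retries. The property names below are the 0.8.x ones (metadata.broker.list, message.send.max.retries); older releases use broker.list as mentioned above, and the broker host names tbk0/tbk1 are hypothetical:
    # targetClusterProducer.config - connect to explicitly listed brokers instead of zookeeper.
    metadata.broker.list=tbk0:9092,tbk1:9092
    # Retry failed sends up to 3 times; a different broker can be chosen on each retry.
    message.send.max.retries=3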
4. Number of producer instances
By setting the --num.producers option, you can use a producer pool to increase MirrorMaker's throughput. On the broker that receives the data (messages), requests from a single producer are handled by only one thread, so even with multiple consumption streams, throughput can be capped by how fast the producer's requests are handled.
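For example (a sketch reusing the configuration files from section I), a pool of four producers can be requested with --num.producers:
    $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.MirrorMaker \
        --consumer.config sourceClusterConsumer.config \
        --producer.config targetClusterProducer.config \
        --num.producers 4 \
        --whitelist=".*"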
5. Number of consumer streams
Use --num.streams to specify the number of consumer threads. Note that if you start multiple MirrorMaker processes, you may need to look at how partitions are distributed in the source Kafka cluster: if there are too many consumption streams per MirrorMaker process, some of them will sit idle because the consumer's load-balancing algorithm does not assign them any partition.
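To see how many partitions are actually available to those streams, you can describe the source cluster's topics. The sketch below uses kafka-topics.sh, which ships with 0.8.1 and later (older releases provide kafka-list-topic.sh instead):
    # Describe topics on the SOURCE cluster; the total partition count is the upper bound
    # on the number of consumption streams that can do useful work at the same time.
    $KAFKA_HOME/bin/kafka-topics.sh --describe --zookeeper szk0:2181,szk1:2181,szk2:2181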
6. Shallow iteration and producer Compression
We recommend enabling shallow iteration in MirrorMaker's consumer. This means the MirrorMaker consumer does not decompress compressed message sets but forwards the fetched message sets directly to the producer.
If shallow iteration is enabled, you must disable compression on the MirrorMaker producer; otherwise, message sets will be compressed twice.
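A sketch of the two settings involved; the property names are assumptions to verify against your release (shallowiterator.enable is the 0.7.x-era consumer name, and 0.7.x producers expect the numeric compression code 0 instead of "none"):
    # sourceClusterConsumer.config - hand compressed message sets through without decompressing
    shallowiterator.enable=true

    # targetClusterProducer.config - do not compress again, or message sets get double-compressed
    compression.codec=none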
7. Socket buffer sizes of the consumer and the source Kafka cluster
Mirroring is often used across clusters in different locations, so you may want to tune a few configuration options for inter-cluster communication latency and for specific hardware bottlenecks. In general, you should set a high value for the socket.buffersize of the MirrorMaker consumer and for socket.send.buffer on the source cluster's brokers. In addition, the MirrorMaker consumer's fetch.size should be set higher than its socket.buffersize. Note that the socket buffer size settings are parameters of the operating system's network layer; if you enable trace-level logging, you can check the buffer sizes actually in use and decide whether the OS network layer also needs tuning.
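A sketch of these settings, with illustrative values only (2 MB buffers, a 4 MB fetch); the right numbers depend on your network latency and hardware:
    # sourceClusterConsumer.config (MirrorMaker consumer)
    socket.buffersize=2097152
    fetch.size=4194304

    # server.properties on the SOURCE cluster brokers
    socket.send.buffer=2097152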
III. How to check the running status of MirrorMaker
The ConsumerOffsetChecker tool can be used to check how far the mirror lags behind the source cluster. For example:
    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group KafkaMirror --zkconnect localhost:2181 --topic test-topic
    KafkaMirror,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
               Owner = KafkaMirror_jkoshy-ld-1320972386342-beb4bfc9-0
     Consumer offset = 561154288 = 561,154,288 (0.52G)
            Log size = 2231392259 = 2,231,392,259 (2.08G)
        Consumer lag = 1670237971 = 1,670,237,971 (1.56G)
    BROKER INFO
    0 -> 127.0.0.1:9092
Note: the --zkconnect parameter must point to the source cluster's zookeeper. Also, if --topic is not specified, the offsets of all topics in the consumer group are printed.
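For example, a simple loop (a sketch; substitute your own group name and the source cluster's zookeeper address) for watching the mirror's lag over time across all topics:
    # Print the offsets and lag of the KafkaMirror group every 30 seconds.
    # --zkconnect points at the SOURCE cluster's zookeeper; --topic is omitted on purpose
    # so that all topics in the group are reported.
    while true; do
        $KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
            --group KafkaMirror --zkconnect szk0:2181,szk1:2181,szk2:2181
        sleep 30
    done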
References
- http://kafka.apache.org/documentation.html#configuration
- https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring+(MirrorMaker)