Kafka cross-cluster synchronization scheme

Source: Internet
Author: User

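This scheme addresses Kafka cross-cluster synchronization and the creation of Kafka cluster mirrors, using the MirrorMaker tool that ships with Kafka. A Kafka mirror is a replica of an existing Kafka cluster. The following shows how to use MirrorMaker to create a mirror from a source Kafka cluster (source cluster) to a target Kafka cluster (target cluster). The tool consumes data from the source cluster with a Kafka consumer and then republishes it to the target cluster through an embedded Kafka producer.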


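I. How to create a mirror

Creating a mirror with MirrorMaker is simple: once the target Kafka cluster is set up, you only need to start the MirrorMaker program. One or more consumer config files and one producer config file are required; a whitelist or blacklist is optional. In the consumer config, specify the source cluster's ZooKeeper; in the producer config, specify the target cluster's ZooKeeper (or broker.list).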

kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config sourceCluster1Consumer.config --consumer.config sourceCluster2Consumer.config --num.streams 2 --producer.config targetClusterProducer.config --whitelist=".*"
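To make the whitelist/blacklist semantics concrete, here is a minimal Python sketch of how a filter string such as ".*" or "orders,payments" could be turned into a regular expression and matched against topic names. It assumes the rule described in the parameter notes (a comma is shorthand for '|') and full-string matching as in Java's String.matches(). The space and quote stripping, and the helper names to_topic_regex and should_mirror, are assumptions for illustration, not MirrorMaker's API. Python's re module and Java's regex engine agree on simple patterns like these, but not in every corner case.

```python
import re
from typing import Optional


def to_topic_regex(raw: str) -> str:
    """Normalize a whitelist/blacklist string into a regex (illustrative sketch).

    Commas become '|' as MirrorMaker documents; removing spaces and surrounding
    quotes is an assumption about how values like ".*" or "a, b" are cleaned up.
    """
    regex = raw.strip().replace(",", "|").replace(" ", "")
    regex = regex.strip("\"'")  # quotes left over from the shell or a properties file
    re.compile(regex)  # fail fast on an invalid pattern
    return regex


def should_mirror(topic: str, whitelist: Optional[str] = None,
                  blacklist: Optional[str] = None) -> bool:
    """Hypothetical helper: whitelist wins if given, else blacklist, else mirror all."""
    if whitelist is not None:
        # Full match, like Java's String.matches()
        return re.fullmatch(to_topic_regex(whitelist), topic) is not None
    if blacklist is not None:
        return re.fullmatch(to_topic_regex(blacklist), topic) is None
    return True


print(should_mirror("orders", whitelist="orders, payments"))  # True
print(should_mirror("audit-log", blacklist="audit.*"))        # False
```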
For example, to create a mirror of cluster S when the target cluster T is already set up:

1. Create the consumer config file sourceClusterConsumer.config:

zk.connect=szk0:2181,szk1:2181,szk2:2181
groupid=test-mirror-consumer-group

2. Create the producer config file targetClusterProducer.config:

zk.connect=tzk0:2181,tzk1:2181

3. Create a startup script, start.sh:

$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config sourceClusterConsumer.config --num.streams 2 --producer.config targetClusterProducer.config --whitelist=".*"

4. Run the script

Run start.sh, check the log output to confirm that MirrorMaker is healthy, and look in the target cluster's log.dir to see the synchronized data.

II. MirrorMaker parameters

$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.MirrorMaker --help
Run the command above to see a description of each parameter:

1. Whitelist and blacklist. MirrorMaker accepts a whitelist or a blacklist that specifies exactly which topics to mirror. Both are Java regular expressions; for convenience, a comma (',') is converted into the regex alternation operator ('|').

2. Producer timeout. To sustain high throughput, use the asynchronous embedded producer and put it in blocking mode (queue.enqueueTimeout.ms=-1). This ensures that messages are not lost. Otherwise, the asynchronous producer's default enqueueTimeout is 0: if the producer's internal queue is full, messages are dropped and a QueueFullException is thrown. In blocking mode, a full internal queue makes the producer wait, which in effect throttles MirrorMaker's embedded consumer. You can enable the producer's trace logging to watch the remaining capacity of the internal queue at any time. If the producer's internal queue stays full for long periods, MirrorMaker is bottlenecked on republishing messages to the target cluster and/or on those messages being written to disk. For details on synchronous and asynchronous producer configuration, see $KAFKA_HOME/config/producer.properties, in particular the producer.type and queue.enqueueTimeout.ms fields.

3. Producer retries. If the producer config uses broker.list, you can set the number of retries when publishing fails. The retry setting applies only with broker.list, because a broker is re-selected on each retry.

4. Number of producers. Setting --num.producers lets MirrorMaker use a pool of producers to increase throughput. On the broker that receives the data, the requests of a single producer are handled by a single thread, so even with multiple consumption streams, throughput is limited by how fast those producer requests are processed.

5. Number of consumption streams. Use --num.streams to set the number of consumer threads. If you run several MirrorMaker processes, check how the source cluster's partitions are distributed among them. If each MirrorMaker process has too many consumption streams, some streams will own no partition and sit idle; this is a consequence of the consumer's load-balancing algorithm.

6. Shallow iteration and producer compression. We recommend enabling shallow iteration in MirrorMaker's consumer. The consumer then does not decompress compressed message sets, but passes the fetched message-set data directly to the producer. If you enable shallow iteration, you must disable compression on MirrorMaker's producer; otherwise message sets are compressed twice.

7. Socket buffer sizes. Mirroring is often used in cross-cluster scenarios, and you may want to tune a few settings to cope with inter-cluster communication latency and performance bottlenecks on your specific hardware. In general, set a high value for socket.buffersize on MirrorMaker's consumer and for socket.send.buffer on the source cluster's brokers. In addition, MirrorMaker's consumer fetch.size should be set higher than socket.buffersize. Note that socket buffer sizes are ultimately governed by the operating system's network layer: if you enable trace-level logging, you can check the buffer size actually received and decide whether the OS network layer also needs tuning.
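The difference between blocking mode and the default non-blocking enqueue described under "Producer timeout" can be sketched with a bounded Python queue standing in for the async producer's internal queue. This is a model, not Kafka code: enqueue and simulate are hypothetical names, the capacity and message counts are arbitrary, and the sender thread stands in for the producer pushing messages to the target cluster. In the non-blocking run nothing drains the queue (a stalled target), so overflow is dropped; in the blocking run the caller waits for space instead. If the target never recovered, a blocking caller would wait indefinitely, which is exactly the back-pressure on the consumer described above.

```python
import queue
import threading
from typing import Tuple


def enqueue(q: queue.Queue, msg: str, enqueue_timeout_ms: int) -> bool:
    """Model of the async producer's enqueue step (names are illustrative).

    -1 blocks until there is room (back-pressure); 0 fails immediately when the
    queue is full; a positive value waits at most that many milliseconds.
    """
    if enqueue_timeout_ms < 0:
        q.put(msg)  # blocks; the caller (the mirror's consumer) simply slows down
        return True
    try:
        if enqueue_timeout_ms == 0:
            q.put_nowait(msg)
        else:
            q.put(msg, timeout=enqueue_timeout_ms / 1000)
        return True
    except queue.Full:
        return False  # the real producer throws QueueFullException; the message is lost


def simulate(enqueue_timeout_ms: int, n_messages: int = 5, capacity: int = 2) -> Tuple[int, int]:
    """Return (accepted, dropped) for a bounded queue of the given capacity."""
    q: queue.Queue = queue.Queue(maxsize=capacity)
    if enqueue_timeout_ms < 0:
        # Blocking run: a sender thread drains the queue, standing in for the
        # producer pushing messages to the target cluster.
        def sender() -> None:
            while True:
                q.get()
                q.task_done()

        threading.Thread(target=sender, daemon=True).start()
    # Otherwise nothing drains the queue, modelling a stalled target cluster.
    accepted = dropped = 0
    for i in range(n_messages):
        if enqueue(q, f"msg-{i}", enqueue_timeout_ms):
            accepted += 1
        else:
            dropped += 1
    if enqueue_timeout_ms < 0:
        q.join()  # wait until every accepted message has been "sent"
    return accepted, dropped


print(simulate(0))   # (2, 3): overflow is dropped
print(simulate(-1))  # (5, 0): the producer waits instead of dropping
```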
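A sketch of what retrying against a static broker.list amounts to, as described under "Producer retries". The helper name, the rotation strategy (a random first broker, then the next broker in the list on each retry) and the use of ConnectionError as the failure signal are all assumptions; the real producer's broker selection may differ. The point is that a fixed list gives the producer another broker to try on a retry, which is why the retry setting is tied to broker.list.

```python
import random
from typing import Callable, Optional, Sequence


def send_with_retries(message: bytes, brokers: Sequence[str],
                      send: Callable[[str, bytes], None], retries: int,
                      rng: Optional[random.Random] = None) -> str:
    """Hypothetical retry loop over a static broker.list.

    The first broker is chosen at random; each retry moves on to the next broker
    in the list, so consecutive attempts never reuse the broker that just failed
    (unless the list has only one entry). Returns the broker that accepted the message.
    """
    rng = rng or random.Random()
    start = rng.randrange(len(brokers))
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        broker = brokers[(start + attempt) % len(brokers)]
        try:
            send(broker, message)
            return broker
        except ConnectionError as err:
            last_error = err  # remember the failure and re-select a broker
    raise RuntimeError(f"failed after {retries + 1} attempts") from last_error
```

For example, with brokers ["b1:9092", "b2:9092"], a send function that fails only for b1, and retries=1, the call always ends up on b2:9092, whichever broker is picked first.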
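Why extra consumption streams go idle can be seen from a range-style assignment, which this sketch models on the old ZooKeeper-based high-level consumer: partitions and consumer threads are sorted, and each thread receives a contiguous chunk, so when there are more threads than partitions the trailing threads receive nothing. The Python below is a simplified single-topic model; the partition and thread names are made up.

```python
from typing import Dict, List


def range_assign(partitions: List[str], streams: List[str]) -> Dict[str, List[str]]:
    """Simplified range assignment: sorted partitions split into contiguous
    chunks, one chunk per sorted consumer stream (thread)."""
    parts = sorted(partitions)
    threads = sorted(streams)
    per_thread, extra = divmod(len(parts), len(threads))
    assignment: Dict[str, List[str]] = {}
    for i, thread in enumerate(threads):
        # The first `extra` threads each take one additional partition.
        start = per_thread * i + min(i, extra)
        count = per_thread + (1 if i < extra else 0)
        assignment[thread] = parts[start:start + count]
    return assignment


# 4 partitions, 3 MirrorMaker processes with --num.streams 2 each = 6 streams.
partitions = [f"test-topic-{p}" for p in range(4)]
streams = [f"mm{m}-{s}" for m in range(1, 4) for s in range(2)]
for stream, owned in range_assign(partitions, streams).items():
    print(stream, owned or "idle")
```

In this setup the two streams of the third process own no partition, so that whole process sits idle; lowering --num.streams or running fewer processes would avoid the waste.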
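The socket buffer settings are requests to the operating system, which may grant a different size. As an OS-level cross-check, independent of Kafka's trace logs, a few lines of Python show what the kernel actually reports for a given SO_RCVBUF request. On Linux, for example, the reported value is double the request (to cover bookkeeping overhead) and the request is capped by net.core.rmem_max, so a large socket.buffersize can be silently reduced unless the OS limits are raised as well. The helper name is hypothetical and the printed numbers depend on the host.

```python
import socket


def granted_rcvbuf(requested: int) -> int:
    """Request a TCP receive buffer and return the size the kernel actually reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


for size in (64 * 1024, 1024 * 1024):
    print(f"requested {size:>8} bytes, kernel reports {granted_rcvbuf(size):>8} bytes")
```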
III. How to verify MirrorMaker health

The consumer offset checker tool can be used to check how far the mirror's consumer has progressed on the source cluster. For example:

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group KafkaMirror --zkconnect localhost:2181 --topic test-topic
KafkaMirror,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
Owner = KafkaMirror_jkoshy-ld-1320972386342-beb4bfc9-0
Consumer offset = 561154288
= 561,154,288 (0.52G)
Log size = 2231392259
= 2,231,392,259 (2.08G)
Consumer lag = 1670237971
= 1,670,237,971 (1.56G)
BROKER INFO
0 -> 127.0.0.1:9092
Note that --zkconnect must point to the source cluster's ZooKeeper. If --topic is omitted, information for all topics in the given consumer group is printed.
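The lag line in the sample output is plain arithmetic: consumer lag = log size - consumer offset, here 2,231,392,259 - 561,154,288 = 1,670,237,971, and the G figures are consistent with dividing by 1024^3 (the offsets are byte positions, as those figures suggest). A small Python sketch that reproduces the sample numbers; the function names are illustrative and not part of the checker tool.

```python
def human(n: int) -> str:
    """Format a byte count the way the checker output above does: 1,234 (0.00G)."""
    return f"{n:,} ({n / 1024 ** 3:.2f}G)"


def consumer_lag(consumer_offset: int, log_size: int) -> int:
    """Bytes the mirror still has to consume: log end minus consumed position."""
    return log_size - consumer_offset


offset, log_size = 561154288, 2231392259  # values from the sample output
lag = consumer_lag(offset, log_size)
print(human(offset))    # 561,154,288 (0.52G)
print(human(log_size))  # 2,231,392,259 (2.08G)
print(human(lag))       # 1,670,237,971 (1.56G)
```

A lag that keeps growing between runs means the mirror is falling behind the source cluster; a roughly stable or shrinking lag means it is keeping up.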