Relationship between Kafka partitions and consumers


1. Preface

We know that producers send messages to a topic and consumers subscribe to that topic (in the name of a consumer group). A topic is divided into partitions, and messages are stored in partitions, so in reality the producer writes messages to a partition and the consumer reads messages from a partition. Two questions follow: which partition does the producer deliver a message to, and how are partitions distributed among the consumer instances in a group? The rest of this article looks at these two questions.

2. Number of partitions in a topic

You can specify a global number of partitions in the server.properties configuration file (the num.partitions property). This is the default number of partitions for each topic; the default value is 1.
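For reference, the relevant broker-side property looks like this:

# server.properties: default number of partitions per topic
num.partitions=1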

Of course, you can also set the number of partitions for each individual topic. If the number of partitions is not specified when the topic is created, the value from server.properties is used.

bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 2 --replication-factor 1

When creating a topic, the --partitions option specifies the number of partitions of the topic.

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic abc
Topic: abc  PartitionCount: 2  ReplicationFactor: 1  Configs:
    Topic: abc  Partition: 0  Leader: 0  Replicas: 0  Isr: 0
    Topic: abc  Partition: 1  Leader: 0  Replicas: 0  Isr: 0
3. Producer and partition

First, a question: is there a rule that determines which partition the producer delivers a message to? If so, how is that partition chosen?

3.1. Default partition policy

The default partitioning strategy, as described in the Kafka javadoc:

If a partition is specified in the record, use it.
If no partition is specified but a key is present, choose a partition based on a hash of the key.
If no partition or key is present, choose a partition in a round-robin fashion.

The implementing class is org.apache.kafka.clients.producer.internals.DefaultPartitioner.

In other words, the default partition policy is:

If a partition is specified when the message is sent, the message is delivered to that partition.
If no partition is specified but the message key is not empty, a partition is selected based on the hash value of the key.
If no partition is specified and the message key is empty, a partition is selected by round-robin polling.

/**
 * Compute the partition for the given record.
 *
 * @param topic The topic name
 * @param key The key to partition on (or null if no key)
 * @param keyBytes serialized key to partition on (or null if no key)
 * @param value The value to partition on or null
 * @param valueBytes serialized value to partition on or null
 * @param cluster The current cluster metadata
 */
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    if (keyBytes == null) {
        int nextValue = nextValue(topic);
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0) {
            int part = Utils.toPositive(nextValue) % availablePartitions.size();
            return availablePartitions.get(part).partition();
        } else {
            // no partitions are available, give a non-available partition
            return Utils.toPositive(nextValue) % numPartitions;
        }
    } else {
        // hash the keyBytes to choose a partition
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}

The source code confirms the policy described above.
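As a quick illustration, the three routing rules correspond to the three ProducerRecord constructors (the topic name "abc" here is just a placeholder matching the test topic used later):

import org.apache.kafka.clients.producer.ProducerRecord;

// Rule 1: partition 1 is given explicitly, so it is always used
ProducerRecord<String, String> r1 = new ProducerRecord<>("abc", 1, "some-key", "v1");

// Rule 2: no partition, but a key -> murmur2(serialized key) % numPartitions
ProducerRecord<String, String> r2 = new ProducerRecord<>("abc", "some-key", "v2");

// Rule 3: no partition and no key -> round-robin over available partitions
ProducerRecord<String, String> r3 = new ProducerRecord<>("abc", "v3");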

4. Partition and consumer

A consumer subscribes to a topic in the name of a group. The topic has multiple partitions, and the consumer group has multiple consumer instances. So what is the relationship between consumer instances and partitions?

In other words, how is the assignment determined, that is, which partitions each consumer in the group is responsible for?

Keep in mind that, within a group, a message can only be consumed by one consumer instance.

When a consumer group subscribes to a topic, all partitions under the topic are consumed by consumers in that group. In terms of ownership, each partition of the topic belongs to exactly one consumer in the group; two consumers in the same group never take charge of the same partition.

Here the problem arises. If the number of partitions is greater than or equal to the number of consumer instances in the group, there is no issue: each consumer is responsible for one or more partitions (the ideal situation is that the two numbers are equal, so each consumer handles exactly one partition). If, however, there are more consumer instances than partitions, then under the default policy (default is emphasized because you can also customize the policy) some consumers are left over and simply sit idle.
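Incidentally, you can watch the group's assignment yourself with a ConsumerRebalanceListener. Below is a small sketch (the broker address, group id, and topic name "abc" are placeholders); run several copies with the same group.id and compare what each instance prints:

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

public class AssignmentWatcher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "watcher");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("abc"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("revoked: " + partitions);
            }
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("assigned: " + partitions);
            }
        });
        while (true)
            consumer.poll(100); // the rebalance callbacks fire inside poll()
    }
}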

Then again, what problem would arise if multiple consumers were responsible for the same partition?

We know that Kafka guarantees message ordering within a partition. So what order does the consumer see during consumption? To achieve this ordering, first, messages are actively pulled by the consumer (the pull model); second, only one consumer is responsible for a given partition. If two consumers were responsible for the same partition, they would read messages from the partition at the same time. Since each consumer controls its own read offset, C1 might have read message 2 before finishing message 1 while C2 has already read message 3. That amounts to multiple threads reading the same data: messages are processed repeatedly, a lot of work is wasted, and the ordering of messages can no longer be guaranteed, the same problems a push model would have.

4.1. Consumer partition allocation policy

org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor

If you want a custom allocation policy, you can extend the AbstractPartitionAssignor class. Kafka ships with three implementations of it by default.
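Here is a minimal custom-assignor sketch, assuming the Kafka 2.0 client API (the class name and strategy name are made up for illustration, and it deliberately dumps every partition on one member, which you would never do in production):

import java.util.*;
import org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor;
import org.apache.kafka.common.TopicPartition;

public class FirstMemberAssignor extends AbstractPartitionAssignor {

    // Name reported in the group protocol
    @Override
    public String name() {
        return "first-member";
    }

    @Override
    public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
                                                    Map<String, Subscription> subscriptions) {
        Map<String, List<TopicPartition>> assignment = new HashMap<>();
        for (String memberId : subscriptions.keySet())
            assignment.put(memberId, new ArrayList<TopicPartition>());

        // Give every partition of every topic the first member subscribes to
        // to that member; partitions of other topics stay unassigned here.
        String first = Collections.min(subscriptions.keySet());
        for (String topic : subscriptions.get(first).topics()) {
            Integer numPartitions = partitionsPerTopic.get(topic);
            if (numPartitions == null)
                continue;
            for (int p = 0; p < numPartitions; p++)
                assignment.get(first).add(new TopicPartition(topic, p));
        }
        return assignment;
    }
}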

4.1.1. Range

The implementation class corresponding to the range policy is org.apache.kafka.clients.consumer.RangeAssignor.

This is the default allocation policy.

You can specify the allocation policy through the partition.assignment.strategy parameter in the consumer configuration. Its value is a list of fully qualified class names.
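For example, a consumer that prefers the round-robin assignor would be configured like this (a minimal sketch):

Properties props = new Properties();
// The value is one or more assignor class names, in order of preference
props.put("partition.assignment.strategy",
        "org.apache.kafka.clients.consumer.RoundRobinAssignor");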

/**
 * The range assignor works on a per-topic basis. For each topic, we lay out the available partitions in numeric order
 * and the consumers in lexicographic order. We then divide the number of partitions by the total number of
 * consumers to determine the number of partitions to assign to each consumer. If it does not evenly
 * divide, then the first few consumers will have one extra partition.
 *
 * For example, suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions,
 * resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2.
 *
 * The assignment will be:
 * C0: [t0p0, t0p1, t1p0, t1p1]
 * C1: [t0p2, t1p2]
 */

The range policy is applied on a per-topic basis.

For each topic, the available partitions are arranged in numeric order and the consumers in lexicographic (dictionary) order. The number of partitions is then divided by the total number of consumers to determine how many partitions each consumer is assigned. If the division is not even, the first few consumers each get one extra partition.

In short:

1. The range allocation policy is applied per topic (that is, "partitions" means the partitions of one topic, and "consumers" means the consumer instances in the group that subscribe to that topic).

2. First, the partitions are sorted in numeric order, and the consumers in the lexicographic order of their names.

3. The total number of partitions is divided by the total number of consumers. If it divides evenly, everyone happily gets the same number; if not, the consumers earlier in the sort order are each responsible for one extra partition.

For example, assume there are two consumers C0 and C1, two topics t0 and t1, and each topic has three partitions: t0p0, t0p1, t0p2, t1p0, t1p1, t1p2.

Then, based on the above information, the final partition assignment is:

C0: [t0p0, t0p1, t1p0, t1p1]

C1: [t0p2, t1p2]

Why is this?

For topic t0, 3 partitions divided by 2 consumers gives 1 each with a remainder of 1, so C0 (first in sort order) is responsible for p0 and p1 while C1 takes p2. Topic t1 is assigned in exactly the same way.


Read the code for better understanding:

public Map<String, List<TopicPartition>> assign(Map<String, Integer> partitionsPerTopic,
                                                Map<String, Subscription> subscriptions) {
    // Map each topic to the list of consumers that subscribe to it
    Map<String, List<String>> consumersPerTopic = consumersPerTopic(subscriptions);
    Map<String, List<TopicPartition>> assignment = new HashMap<>();
    for (String memberId : subscriptions.keySet())
        assignment.put(memberId, new ArrayList<TopicPartition>());

    for (Map.Entry<String, List<String>> topicEntry : consumersPerTopic.entrySet()) {
        String topic = topicEntry.getKey();                     // the topic
        List<String> consumersForTopic = topicEntry.getValue(); // its consumers

        // partitionsPerTopic maps each topic to its partition count;
        // fetch the number of partitions under this topic
        Integer numPartitionsForTopic = partitionsPerTopic.get(topic);
        if (numPartitionsForTopic == null)
            continue;

        // Sort the consumers lexicographically
        Collections.sort(consumersForTopic);

        // Number of partitions divided by the number of consumers
        int numPartitionsPerConsumer = numPartitionsForTopic / consumersForTopic.size();
        // The remainder: that many consumers take one extra partition
        int consumersWithExtraPartition = numPartitionsForTopic % consumersForTopic.size();

        List<TopicPartition> partitions = AbstractPartitionAssignor.partitions(topic, numPartitionsForTopic);
        for (int i = 0, n = consumersForTopic.size(); i < n; i++) {
            int start = numPartitionsPerConsumer * i + Math.min(i, consumersWithExtraPartition);
            int length = numPartitionsPerConsumer + (i + 1 > consumersWithExtraPartition ? 0 : 1);
            // Assign the partition slice [start, start + length) to consumer i
            assignment.get(consumersForTopic.get(i)).addAll(partitions.subList(start, start + length));
        }
    }
    return assignment;
}

4.1.2. RoundRobin (polling)

The implementation of the round-robin allocation policy is org.apache.kafka.clients.consumer.RoundRobinAssignor.

/**
 * The round robin assignor lays out all the available partitions and all the available consumers. It
 * then proceeds to do a round robin assignment from partition to consumer. If the subscriptions of all consumer
 * instances are identical, then the partitions will be uniformly distributed. (i.e., the partition ownership counts
 * will be within a delta of exactly one across all consumers.)
 *
 * For example, suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions,
 * resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2.
 *
 * The assignment will be:
 * C0: [t0p0, t0p2, t1p1]
 * C1: [t0p1, t1p0, t1p2]
 *
 * When subscriptions differ across consumer instances, the assignment process still considers each
 * consumer instance in round robin fashion but skips over an instance if it is not subscribed to
 * the topic. Unlike the case when subscriptions are identical, this can result in imbalanced
 * assignments. For example, we have three consumers C0, C1, C2, and three topics t0, t1, t2,
 * with 1, 2, and 3 partitions, respectively. Therefore, the partitions are t0p0, t1p0, t1p1, t2p0,
 * t2p1, t2p2. C0 is subscribed to t0; C1 is subscribed to t0, t1; and C2 is subscribed to t0, t1, t2.
 *
 * The assignment will be:
 * C0: [t0p0]
 * C1: [t1p0]
 * C2: [t1p1, t2p0, t2p1, t2p2]
 */

The round-robin allocation policy lays out all the available partitions and all the available consumers.

The biggest difference from the range policy above is that the assignment is no longer made per topic.

If all consumer instances have identical subscriptions, the partitions are uniformly distributed: the partition counts of any two consumers differ by at most one.

For example, again assume there are two consumers C0 and C1, two topics t0 and t1, and each topic has three partitions: t0p0, t0p1, t0p2, t1p0, t1p1, t1p2.

The final allocation result is as follows:

C0: [t0p0, t0p2, t1p1]

C1: [t0p1, t1p0, t1p2]


If the consumers in the group subscribe to different topics, the allocation process still considers each consumer instance in round-robin order, but an instance is skipped for topics it does not subscribe to. In that case the allocation can become unbalanced.

What does that mean? A consumer group is a logical concept. Being in the same group means a partition can only be consumed by one consumer instance in the group at a time; it does not require every member to subscribe to the same topics. Consumers in the same group can, in fact, subscribe to different topics.

For example, suppose there are three topics t0, t1, and t2, with 1, 2, and 3 partitions respectively: t0 has partition p0; t1 has partitions p0 and p1; t2 has partitions p0, p1, and p2. There are three consumers: C0 subscribes to t0; C1 subscribes to t0 and t1; C2 subscribes to t0, t1, and t2. How does the round-robin assignment work out in this case?


Why is the final result the following?

C0: [t0p0]

C1: [t1p0]

C2: [t1p1, t2p0, t2p1, t2p2]

This is because, following the round-robin order, C0 takes t0p0 and C1 takes t1p0; t1p1 falls to C2, and since only C2 subscribes to t2, C2 takes all the partitions of t2 as well. Hence the result above.

Looking a little closer, you may notice that in this particular case the result is the same as what the range policy would produce.
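To make the skip-and-advance behavior concrete, here is a small self-contained sketch that mimics (but does not copy) the round-robin algorithm; the topic and consumer names match the example above:

import java.util.*;

public class RoundRobinSketch {
    public static void main(String[] args) {
        List<String> consumers = Arrays.asList("C0", "C1", "C2"); // sorted lexicographically
        Map<String, List<String>> subs = new HashMap<>();
        subs.put("C0", Arrays.asList("t0"));
        subs.put("C1", Arrays.asList("t0", "t1"));
        subs.put("C2", Arrays.asList("t0", "t1", "t2"));
        // All partitions in sorted order; the topic is the first two characters
        List<String> partitions = Arrays.asList("t0p0", "t1p0", "t1p1", "t2p0", "t2p1", "t2p2");

        Map<String, List<String>> assignment = new TreeMap<>();
        for (String c : consumers)
            assignment.put(c, new ArrayList<String>());

        int i = 0; // circular index over the consumer list
        for (String tp : partitions) {
            String topic = tp.substring(0, 2);
            // Skip consumers that do not subscribe to this partition's topic
            while (!subs.get(consumers.get(i % consumers.size())).contains(topic))
                i++;
            assignment.get(consumers.get(i % consumers.size())).add(tp);
            i++; // the next partition starts from the next consumer
        }
        // Prints {C0=[t0p0], C1=[t1p0], C2=[t1p1, t2p0, t2p1, t2p2]}
        System.out.println(assignment);
    }
}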

5. Test code

pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.cjs.example</groupId>
    <artifactId>kafka-demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>kafka-demo</name>
    <description></description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.5.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.kafka</groupId>
            <artifactId>spring-kafka</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

HelloProducer.java:

package com.cjs.kafka.producer;

import org.apache.kafka.clients.producer.*;

import java.util.Properties;

public class HelloProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.1.133:9092");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        for (int i = 0; i < 100; i++) {
            producer.send(new ProducerRecord<String, String>("abc", Integer.toString(i), Integer.toString(i)), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (null != e) {
                        e.printStackTrace();
                    } else {
                        System.out.println("callback: " + recordMetadata.topic() + " " + recordMetadata.partition() + " " + recordMetadata.offset());
                    }
                }
            });
        }
        producer.close();
    }
}
HelloConsumer.java:

package com.cjs.kafka.consumer;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Arrays;
import java.util.Properties;

public class HelloConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.1.133:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
//        props.put("partition.assignment.strategy", "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("foo", "bar", "abc"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition = %s, offset = %d, key = %s, value = %s%n", record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}
