Before I start, let me take a moment to clarify some concepts and terminology; they are a good foundation for the discussion below. Also, please forgive the length of this article: there is a lot to cover, even after I cut a great deal of overly detailed material.

I. Clearing up misconceptions and pinning down concepts
1 Kafka versions
Questions in the Kafka Chinese community (a quick plug for the group, QQ number: 162272557) often begin like this: "The Kafka version I'm using is 2.10/2.11, and I've run into a strange problem..." No offense, but 2.10/2.11 is not a Kafka version; it is the version of Scala used to compile Kafka. Kafka's server-side code is written in Scala, and the three mainstream Scala versions at the moment are 2.10, 2.11, and 2.12. In fact, every Kafka pull request is now automatically checked against all three versions; one of my own pull requests, for example, was compiled and checked against all three Scala versions (the screenshot is omitted here).
The Kafka versions in wide use today fall into three major lines: 0.8.x, 0.9.x, and 0.10.x. All three changed a great deal around consumers and consumer groups, and we will come back to those changes later.
2 New version vs. old version
"Why is my kafkaoffsetmonitor unable to monitor the offset?" "-This is the most problem I have seen in the Kafka Chinese community, no one! In fact, Kafka 0.9 began to provide a new version of the consumer and consumer group, the displacement of the management and preservation mechanism has changed a lot-the new version consumer default will no longer save displacement to zookeeper, At present, Kafkaoffsetmonitor has not responded to this change (although many people are asking them to change, see HTTPS://GITHUB.COM/QUANTIFIND/KAFKAOFFSETMONITOR/ISSUES/79), So it's probably because you're using a new version of consumer that you can't see. As for the old and new versions, here is a unified explanation: kafka0.9 before the consumer was written in Scala, the package name structure is kafka.consumer.*, divided into high-level consumer and low-level consumer two kinds. Our well-known consumerconnector, Zookeeperconsumerconnector and Simpleconsumer are available in this version. Starting with version 0.9, Kafka provides a Java version of consumer, the package name structure is o.a.k.clients.consumer.*, and the familiar classes include Kafkaconsumer and Consumerrecord. The new version of consumer can be deployed separately, eliminating the need to rely on server-side code.
II. Consumer groups (consumer group)
1 What is a consumer group
There is already plenty of material online covering these basic concepts, and I would not normally go over them again, but for the completeness of this article I will spend a little space on the consumer group, or at least on my understanding of it. Note that since today we are discussing the consumer group almost exclusively, individual consumers will not get much attention.
What is a consumer group? In one sentence: the consumer group is the scalable, fault-tolerant consumption mechanism that Kafka provides. Being a group, it contains one or more consumers, or consumer instances, that share a common ID: the group ID. All consumers in the group coordinate to consume all partitions of the subscribed topics, and of course each partition can only be consumed by one consumer within the same group. (Articles online usually throw a pile of dazzling, colorful diagrams at you at this point; I will spare you mine.) Personally, I find it enough to remember these three features:
- a consumer group can contain one or more consumer instances, all sharing one group.id;
- together, the instances in the group consume all partitions of the topics the group subscribes to;
- each partition is consumed by exactly one consumer instance within the group.
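The three features above can be seen in a toy sketch (this is not Kafka's real assignment code, just an illustration of "each partition goes to exactly one consumer in the group"):

```python
# Toy illustration: spread the 6 partitions of a subscribed topic across
# the consumer instances of one group, each partition to exactly one member.
def toy_assign(consumers, partitions):
    """Give every partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

group = ["consumer-1", "consumer-2", "consumer-3"]  # all share one group.id
partitions = list(range(6))                         # a topic with 6 partitions
print(toy_assign(group, partitions))
```

Every partition appears under exactly one consumer, and together the members cover the whole topic.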
2 Consumer position (consumer position)
While consuming, a consumer needs to record how much it has consumed, i.e. its position. In Kafka this position information has a dedicated term: the offset. Many messaging engines keep this information on the server (broker) side. The benefit is simplicity of implementation, but there are three main problems: 1. the broker becomes stateful, which hurts scalability; 2. an acknowledgement mechanism must be introduced to confirm that consumption succeeded; 3. saving the offsets of many consumers requires complex data structures and wastes resources. Kafka chose a different route: each consumer group saves its own offsets, so a simple integer is enough to represent the position, and a checkpoint mechanism can be introduced to persist it periodically, which also simplifies the acknowledgement mechanism.
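To make the "simple integer plus checkpoint" idea concrete, here is a minimal sketch (my own illustration, not Kafka source; class and method names are invented):

```python
# Sketch: a consumer's position is just an integer per (topic, partition);
# checkpointing that integer periodically is enough to resume after a crash.
class OffsetTracker:
    def __init__(self):
        self.positions = {}    # (topic, partition) -> next offset to read
        self.checkpoint = {}   # last persisted snapshot

    def record(self, topic, partition, offset):
        # the position moves forward as messages are consumed
        self.positions[(topic, partition)] = offset + 1

    def do_checkpoint(self):
        # periodic persistence; on restart we would resume from here
        self.checkpoint = dict(self.positions)

t = OffsetTracker()
t.record("test", 0, 41)   # consumed the message at offset 41
t.do_checkpoint()
t.record("test", 0, 42)   # consumed 42, but not yet checkpointed
# after a crash, consumption would resume from the checkpointed offset 42
```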
3 Offset management
3.1 Automatic vs. manual
By default, Kafka commits offsets for you automatically (enable.auto.commit=true); you can of course choose to commit manually and take control yourself. In addition, Kafka periodically saves the group's consumption progress as an offset map. The figure (omitted here) shows the current offsets of the test-group group.
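For reference, these are the two consumer properties involved; the names come from the consumer configuration, and the interval shown is, to my knowledge, the default:

```properties
# auto-commit is on by default; offsets are committed in the background
enable.auto.commit=true
# how often the background auto-commit fires (5 seconds by default)
auto.commit.interval.ms=5000
```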
3.2 Offset commits
The old consumer committed offsets to ZooKeeper; without drawing the figure, the directory structure is: /consumers/<group.id>/offsets/<topic>/<partitionId>. But ZooKeeper is not really suited to high-volume reads and writes, writes in particular. So Kafka introduced another solution: a __consumer_offsets topic to which offset information is written, removing the dependency on ZooKeeper for offset storage. Each message in __consumer_offsets holds the offsets a consumer group committed at a given time. For a consumer group, the format (shown in a figure omitted here) is roughly keyed by group, topic, and partition, with the offset as the value.
The __consumer_offsets topic uses the compact retention policy, which lets it always keep the latest offset information while keeping the topic's overall log size under control, serving both purposes at once. The specifics of compaction are described in: Log compaction
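Why compaction fits this topic can be shown in a few lines (a minimal sketch of the idea, not Kafka's actual cleaner): for each key, only the most recent value, the latest committed offset, matters.

```python
# Minimal sketch of log compaction: replay the log and keep only the
# last record per key, the way __consumer_offsets keeps latest offsets.
def compact(log):
    """Keep only the last value per key (dicts preserve key arrival order)."""
    latest = {}
    for key, value in log:   # replay front to back
        latest[key] = value  # later records overwrite earlier ones
    return list(latest.items())

log = [
    (("test-group", "test", 0), 10),
    (("test-group", "test", 1), 7),
    (("test-group", "test", 0), 25),  # a newer commit for partition 0
]
print(compact(log))
```

After compaction, partition 0's old commit of 10 is gone and only the latest offset, 25, remains.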
For the question of which partition of __consumer_offsets each group's offsets land in, see this article: Kafka How to read offset topic content (__consumer_offsets)
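In short, the group ID is hashed onto one of the offsets-topic partitions, 50 by default (offsets.topic.num.partitions). A sketch of the computation, reimplementing Java's String.hashCode in Python (a simplification; the broker's actual abs handling differs slightly at integer edge cases):

```python
# Which __consumer_offsets partition does a group map to?
# Kafka uses roughly abs(groupId.hashCode) % numPartitions (default 50).
def java_string_hashcode(s):
    """Java's String.hashCode, reimplemented with 32-bit wraparound."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # reinterpret as a signed 32-bit int, as Java does
    return h - 0x100000000 if h >= 0x80000000 else h

def offsets_partition_for(group_id, num_partitions=50):
    return abs(java_string_hashcode(group_id)) % num_partitions

print(offsets_partition_for("test-group"))
```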
4 Rebalance

4.1 What is rebalance?
Rebalance is essentially a protocol that stipulates how all the consumers under a consumer group come to an agreement on allocating the partitions of the subscribed topics. For example, suppose a group has 20 consumers subscribed to a topic with 100 partitions. Normally, Kafka will assign 5 partitions to each consumer on average. This allocation process is called rebalance.
4.2 When does rebalance happen?
This is also a frequently asked question. There are three conditions that trigger a rebalance:
- the group membership changes: a new consumer joins, an existing one leaves voluntarily, or one crashes;
- the set of subscribed topics changes, for example when a regex-based subscription matches a newly created topic;
- the number of partitions of a subscribed topic changes.
4.3 How are partitions assigned within a group?
As mentioned above, all the consumers in the group coordinate and take part in the assignment together. How is this done? The new-version consumer provides two assignment strategies by default: range and round-robin. Assignment is pluggable, so you can also write your own assignor to implement a different strategy. In fact, because the current range and round-robin assignors both have certain drawbacks, the Kafka community has proposed a third assignor with a fairer distribution, but it is still under development. For now it is enough to know that, by default, the consumer group takes care of assigning the subscribed topics' partitions for us.
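Hedged sketches of the two default strategies follow, simplified to a single topic (the real assignors also handle multi-topic subscriptions and sort by member ID):

```python
# Simplified range assignor: split the partition list into contiguous
# chunks, one chunk per consumer; the first few consumers absorb the remainder.
def range_assign(consumers, num_partitions):
    consumers = sorted(consumers)
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        n = per + (1 if i < extra else 0)  # first `extra` consumers get one more
        assignment[c] = list(range(start, start + n))
        start += n
    return assignment

# Simplified round-robin assignor: deal partitions out like cards.
def round_robin_assign(consumers, num_partitions):
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# the 20-consumer / 100-partition example from the text: 5 partitions each
sizes = {c: len(ps) for c, ps in
         range_assign([f"c{i:02}" for i in range(20)], 100).items()}
print(set(sizes.values()))
```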
For example, suppose a consumer group currently has two consumers, A and B. When a third member, C, joins, Kafka triggers a rebalance and reassigns the partitions to A, B, and C according to the default assignment strategy (the original figure is omitted here).
4.4 Who performs rebalance and consumer group management?
Kafka provides a role for this: the coordinator, which performs management for the consumer group. Frankly, the design and rework of the coordinator is a long story in Kafka, and the latest version differs greatly from the original design. I will only mention the two biggest changes here.
First there was the 0.8 coordinator, which relied on ZooKeeper to manage consumer groups. It watched for child-node changes under /consumers/<group>/ids and data changes under /brokers/topics/<topic> in ZooKeeper to decide whether a rebalance was needed. Each consumer in the group decided on its own which partitions to consume and registered its decision under /consumers/<group>/owners/<topic>/<partition> in ZooKeeper. Clearly this scheme leaned on ZooKeeper's help, and every consumer decided alone; there was none of that spirit of "we all belong to one group, let's work things out together".
Because of these potential drawbacks, the 0.9 release of Kafka reworked the coordinator design, introducing the group coordinator: each consumer group is assigned one such coordinator for group management and offset management. This group coordinator carries more responsibility than before, such as group membership management and an offset-commit fencing mechanism. When the first consumer of a new-version consumer group starts, it contacts the Kafka server to determine which broker is the coordinator for its group. From then on, all members of the group communicate and coordinate with that coordinator. Clearly this design no longer needs ZooKeeper, and performance can improve greatly. All the sections that follow discuss this latest coordinator design.
4.5 How is the coordinator determined?
Having briefly introduced the new coordinator design above: how does a consumer group determine who its coordinator is? In simple terms, there are two steps:
- compute which partition of __consumer_offsets the group's offsets are saved to, i.e. abs(groupId.hashCode) % offsets.topic.num.partitions (50 by default);
- the broker that is the leader of that partition is the group's coordinator.
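The two steps can be sketched as follows. This is a hedged illustration: the cluster metadata map here is entirely made up, and the hash is my Python reimplementation of Java's String.hashCode.

```python
# Sketch of coordinator lookup: hash the group onto an offsets partition,
# then the leader broker of that partition is the coordinator.
def jhash(s):
    """Java String.hashCode, reimplemented with 32-bit wraparound."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def find_coordinator(group_id, partition_leaders):
    """partition_leaders: __consumer_offsets partition -> leader broker id."""
    # step 1: which offsets partition does this group hash to?
    partition = abs(jhash(group_id)) % len(partition_leaders)
    # step 2: that partition's leader broker is the group's coordinator
    return partition_leaders[partition]

# toy cluster: 4 offsets partitions, led by brokers 100..103
leaders = {0: 100, 1: 101, 2: 102, 3: 103}
print(find_coordinator("test-group", leaders))
```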
4.6 Rebalance Generation
The word is the same "generation" as in the JVM's generational GC (strictly speaking, generational there too); I translate it here as "term". It denotes one era of the group's membership after a rebalance, and exists mainly to protect the consumer group by fencing off invalid offset commits: a consumer from a previous term, for example, cannot commit offsets to the new term of the group. The ILLEGAL_GENERATION errors we sometimes see are Kafka complaining about exactly this. Each time the group rebalances, the generation is incremented by 1, marking the group's entry into a new term. As the (omitted) figure shows: at generation 1 the group has 3 members; member 2 then exits, the coordinator triggers a rebalance, and the consumer group enters generation 2; after that, member 4 joins, triggering another rebalance, and the group enters generation 3.
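The fencing behavior can be modeled in a few lines (a toy of my own, not Kafka code; real commits carry the generation in the OffsetCommit request):

```python
# Toy model of generation fencing: commits tagged with an old generation
# are rejected, which is what the ILLEGAL_GENERATION error signals.
class Group:
    def __init__(self):
        self.generation = 1
        self.committed = {}

    def rebalance(self):
        self.generation += 1  # every rebalance bumps the generation

    def commit(self, member, generation, offsets):
        if generation != self.generation:
            return "ILLEGAL_GENERATION"  # fenced off: stale member
        self.committed[member] = offsets
        return "OK"

g = Group()
print(g.commit("m1", 1, {("t", 0): 5}))  # generation matches -> accepted
g.rebalance()                            # group moves to generation 2
print(g.commit("m1", 1, {("t", 0): 9}))  # stale generation -> rejected
```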
4.7 Protocols (protocol)
As I said earlier, rebalance is essentially a set of protocols that the group and the coordinator use together to complete the group's rebalance. Kafka currently provides 5 protocols to handle matters related to consumer group coordination:
- Heartbeat: a consumer reports to the coordinator that it is still alive;
- LeaveGroup: a consumer proactively tells the coordinator that it is leaving the group;
- SyncGroup: the group leader sends the assignment plan to the coordinator for distribution;
- JoinGroup: a consumer requests to join the group;
- DescribeGroup: shows information about the group, used mainly for administration.
The coordinator mainly uses the first four of these requests during a rebalance.
4.8 Liveness
How does a consumer prove to the coordinator that it is still alive? By sending heartbeat requests to the coordinator on a timer. If the configured session timeout elapses without one, the coordinator considers that consumer dead. Once the coordinator decides a consumer has died, it opens a new round of rebalance and puts a REBALANCE_IN_PROGRESS code in its responses to the other consumers' current heartbeats, in effect telling them: sorry everyone, please re-apply to join the group!
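A toy liveness check makes the timeout rule concrete (the function and variable names are mine, not Kafka identifiers; only the session.timeout idea comes from the text):

```python
# A member is considered dead once its last heartbeat is older than
# the session timeout (10 seconds here, purely as an example value).
SESSION_TIMEOUT_MS = 10_000

def dead_members(last_heartbeat_ms, now_ms):
    """Return the members whose heartbeat is older than the session timeout."""
    return [m for m, t in last_heartbeat_ms.items()
            if now_ms - t > SESSION_TIMEOUT_MS]

beats = {"c1": 95_000, "c2": 88_000}  # last heartbeat timestamps, ms
print(dead_members(beats, now_ms=100_000))  # c2 is 12s stale -> considered dead
```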
4.9 Rebalance Process
Finally we come to the concrete process by which a consumer group carries out a rebalance. I expect many users are also curious about the inner workings of the consumer, so let's walk through it together. One thing must be clear up front: the precondition for a rebalance is that the coordinator has already been determined.
Overall, a rebalance is divided into 2 steps: join and sync.
1 Join: as the name implies, joining the group. In this step, all members send a JoinGroup request to the coordinator, asking to join. Once every member has sent its JoinGroup request, the coordinator picks one consumer to act as the leader and sends it the group membership and subscription information. Note that the leader and the coordinator are not the same concept: the leader is responsible for drawing up the consumption assignment plan.
2 Sync: in this step the leader works out the assignment plan, that is, which consumer is responsible for consuming which partitions of which topics. Once the assignment is done, the leader wraps the plan in a SyncGroup request to the coordinator; the non-leaders also send SyncGroup requests, just with empty content. After receiving the plan, the coordinator stuffs it into the SyncGroup responses it sends to each consumer. At that point every member of the group knows which partitions it should consume.
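The two steps above can be sketched as a toy model. This is greatly simplified and entirely illustrative: the real exchange happens over JoinGroup/SyncGroup requests, leader selection is up to the coordinator, and the assignor is pluggable.

```python
# Join phase: the coordinator learns the membership and picks a leader
# (here, simply the first member to have joined).
def join_phase(members):
    leader = members[0]
    return leader, list(members)  # the leader receives the membership info

# Sync phase: the leader computes the plan client-side; the coordinator
# merely relays it back to every member in the SyncGroup responses.
def sync_phase(leader, members, num_partitions):
    plan = {m: [] for m in members}
    for p in range(num_partitions):
        plan[members[p % len(members)]].append(p)
    return plan

leader, members = join_phase(["a", "b", "c"])
print(leader, sync_phase(leader, members, 6))
```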
Let me illustrate with a few figures (the originals are omitted here). First, the process of joining the group:
It is worth noting that before the coordinator has collected requests from all members, it parks the received requests in a structure called purgatory. (I remember a Chinese article arguing that this proves Kafka's developers have quite a literary streak; it is a fun read if you care to search for it.)
Then comes the process of distributing the assignment plan, i.e. the SyncGroup request:
Note!! The partition assignment for a consumer group is performed on the client side! Kafka delegates this power to the client mainly because it offers greater flexibility. For example, you could implement a rack-aware assignment scheme similar to Hadoop's, having a consumer prefer partitions whose data sits in the same rack and thereby reducing network transfer overhead. Kafka gives you two strategies by default, range and round-robin; since they are not the focus of this article I won't detail them here. Just remember that you can override the consumer parameter partition.assignment.strategy to implement your own assignment strategy.
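To show the kind of flexibility client-side assignment buys, here is a toy rack-aware assignor of the sort the text alludes to. Everything here, the function name, the metadata layout, the tie-breaking rule, is invented for illustration; it is not Kafka's API.

```python
# Toy rack-aware assignor: prefer giving a consumer the partitions whose
# leader replica sits in the same rack; fall back to least-loaded otherwise.
def rack_aware_assign(consumer_racks, partition_racks):
    assignment = {c: [] for c in consumer_racks}
    leftovers = []
    for p, rack in partition_racks.items():
        same_rack = [c for c, r in consumer_racks.items() if r == rack]
        if same_rack:
            # pick the least-loaded co-located consumer
            target = min(same_rack, key=lambda c: len(assignment[c]))
            assignment[target].append(p)
        else:
            leftovers.append(p)
    # partitions with no co-located consumer go to the least-loaded member
    for p in leftovers:
        target = min(assignment, key=lambda c: len(assignment[c]))
        assignment[target].append(p)
    return assignment

consumers = {"c1": "rack-a", "c2": "rack-b"}
partitions = {0: "rack-a", 1: "rack-b", 2: "rack-a", 3: "rack-c"}
print(rack_aware_assign(consumers, partitions))
```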
4.10 Consumer group state machine
Like many Kafka components, the group also has a state machine describing how group states flow, and the coordinator drives the group through these states. The (omitted) figure was hand-drawn from the code comments, so bear with me.
A brief description of each state:
- Dead: the group has no members and its metadata has been removed;
- Empty: the group has no members, but its offset information has not yet expired;
- PreparingRebalance: the group is preparing to rebalance and is waiting for members to join;
- AwaitingSync: all members have joined and the group is waiting for the leader's assignment plan;
- Stable: the rebalance is complete and the group can consume normally.
As for the transition conditions and actions between these states, I will not expand on them here.
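Still, the rough shape of the flow can be written down as a transition table. This is a simplification drawn from my understanding of the code comments; the event names are mine, not Kafka identifiers.

```python
# Sketch of the group state machine as a transition table.
TRANSITIONS = {
    ("Empty",              "member joins"):        "PreparingRebalance",
    ("Stable",             "member joins/leaves"): "PreparingRebalance",
    ("PreparingRebalance", "all members joined"):  "AwaitingSync",
    ("AwaitingSync",       "leader sent plan"):    "Stable",
    ("Stable",             "last member left"):    "Empty",
    ("Empty",              "offsets expired"):     "Dead",
}

def step(state, event):
    return TRANSITIONS.get((state, event), state)  # unknown events: stay put

# a freshly formed group walks Empty -> PreparingRebalance -> AwaitingSync -> Stable
s = "Empty"
for e in ["member joins", "all members joined", "leader sent plan"]:
    s = step(s, e)
print(s)
```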
III. Rebalance scenario analysis
Having explained in detail how a consumer group performs a rebalance, you may still find parts of it a bit foggy. This part expands the detailed timing of three important scenarios to further deepen understanding of the consumer group's internals. Since diagrams are more intuitive, the descriptions were all given as figures (omitted here).
1 A new member joins the group (member join)
2 Group member crashes (member failure)
As mentioned earlier, a group member crashing and a member leaving voluntarily are two different scenarios. A crashing member does not notify the coordinator, so the coordinator may need a full session.timeout period to detect the crash, which inevitably causes consumer lag. In other words, leaving the group triggers a rebalance actively, while a crash triggers one passively. Okay, straight to the figure (omitted):
3 A group member leaves voluntarily (member leave group)
4 Committing offsets (member commit offset)
To summarize, this article focused on the internal design principles of the new-version consumer group, especially the interaction between the consumer group and the coordinator. I hope it helps.