Before I start, let me take a moment to clarify some concepts and terminology; they are a good foundation for the discussion below. Also, please forgive the length of this article: there is a lot to cover, even after I cut a great deal of overly detailed material.

I. Clearing up misconceptions and pinning down concepts
1 Kafka versions
Questions in the Kafka Chinese community (a quick plug for the group, QQ number: 162272557) often begin like this: "The Kafka version I'm using is 2.10/2.11, and I've run into a strange problem..." No offense, but 2.10/2.11 is not a Kafka version; it is the version of Scala used to compile Kafka. Kafka's server-side code is written in Scala, and the three mainstream Scala versions at the moment are 2.10, 2.11, and 2.12. In fact, every Kafka pull request is now automatically checked against all three versions; one of my own pull requests, for example, was compiled and checked against all three Scala versions (the screenshot is omitted here).
The Kafka versions in wide use today fall into three major lines: 0.8.x, 0.9.x, and 0.10.x. All three changed a great deal around consumers and consumer groups, and we will come back to those changes later.
2 New version vs. old version
"Why is my kafkaoffsetmonitor unable to monitor the offset?" "-This is the most problem I have seen in the Kafka Chinese community, no one! In fact, Kafka 0.9 began to provide a new version of the consumer and consumer group, the displacement of the management and preservation mechanism has changed a lot-the new version consumer default will no longer save displacement to zookeeper, At present, Kafkaoffsetmonitor has not responded to this change (although many people are asking them to change, see HTTPS://GITHUB.COM/QUANTIFIND/KAFKAOFFSETMONITOR/ISSUES/79), So it's probably because you're using a new version of consumer that you can't see. As for the old and new versions, here is a unified explanation: kafka0.9 before the consumer was written in Scala, the package name structure is kafka.consumer.*, divided into high-level consumer and low-level consumer two kinds. Our well-known consumerconnector, Zookeeperconsumerconnector and Simpleconsumer are available in this version. Starting with version 0.9, Kafka provides a Java version of consumer, the package name structure is o.a.k.clients.consumer.*, and the familiar classes include Kafkaconsumer and Consumerrecord. The new version of consumer can be deployed separately, eliminating the need to rely on server-side code.
II. Consumer groups (consumer group)
1 What is a consumer group
There is already plenty of material online covering these basic concepts, and I would not normally go over them again, but for the completeness of this article I will spend a little space on the consumer group, or at least on my understanding of it. Note that since today we are discussing the consumer group almost exclusively, individual consumers will not get much attention.
What is a consumer group? In one sentence: the consumer group is the scalable, fault-tolerant consumption mechanism that Kafka provides. Being a group, it contains one or more consumers, or consumer instances, that share a common ID: the group ID. All consumers in the group coordinate to consume all partitions of the subscribed topics, and of course each partition can only be consumed by one consumer within the same group. (Articles online usually throw a pile of dazzling, colorful diagrams at you at this point; I will spare you mine.) Personally, I find it enough to remember these three features:
- a consumer group can contain one or more consumer instances, all sharing one group.id;
- together, the instances in the group consume all partitions of the topics the group subscribes to;
- each partition is consumed by exactly one consumer instance within the group.
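The three features above can be seen in a toy sketch (this is not Kafka's real assignment code, just an illustration of "each partition goes to exactly one consumer in the group"):

```python
# Toy illustration: spread the 6 partitions of a subscribed topic across
# the consumer instances of one group, each partition to exactly one member.
def toy_assign(consumers, partitions):
    """Give every partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

group = ["consumer-1", "consumer-2", "consumer-3"]  # all share one group.id
partitions = list(range(6))                         # a topic with 6 partitions
print(toy_assign(group, partitions))
```

Every partition appears under exactly one consumer, and together the members cover the whole topic.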
2 Consumer position (consumer position)
While consuming, a consumer needs to record how much it has consumed, i.e. its position. In Kafka this position information has a dedicated term: the offset. Many messaging engines keep this information on the server (broker) side. The benefit is simplicity of implementation, but there are three main problems: 1. the broker becomes stateful, which hurts scalability; 2. an acknowledgement mechanism must be introduced to confirm that consumption succeeded; 3. saving the offsets of many consumers requires complex data structures and wastes resources. Kafka chose a different route: each consumer group saves its own offsets, so a simple integer is enough to represent the position, and a checkpoint mechanism can be introduced to persist it periodically, which also simplifies the acknowledgement mechanism.
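To make the "simple integer plus checkpoint" idea concrete, here is a minimal sketch (my own illustration, not Kafka source; class and method names are invented):

```python
# Sketch: a consumer's position is just an integer per (topic, partition);
# checkpointing that integer periodically is enough to resume after a crash.
class OffsetTracker:
    def __init__(self):
        self.positions = {}    # (topic, partition) -> next offset to read
        self.checkpoint = {}   # last persisted snapshot

    def record(self, topic, partition, offset):
        # the position moves forward as messages are consumed
        self.positions[(topic, partition)] = offset + 1

    def do_checkpoint(self):
        # periodic persistence; on restart we would resume from here
        self.checkpoint = dict(self.positions)

t = OffsetTracker()
t.record("test", 0, 41)   # consumed the message at offset 41
t.do_checkpoint()
t.record("test", 0, 42)   # consumed 42, but not yet checkpointed
# after a crash, consumption would resume from the checkpointed offset 42
```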
3 Offset management
3.1 Automatic vs. manual
By default, Kafka commits offsets for you automatically (enable.auto.commit=true); you can of course choose to commit manually and take control yourself. In addition, Kafka periodically saves the group's consumption progress as an offset map. The figure (omitted here) shows the current offsets of the test-group group.
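For reference, these are the two consumer properties involved; the names come from the consumer configuration, and the interval shown is, to my knowledge, the default:

```properties
# auto-commit is on by default; offsets are committed in the background
enable.auto.commit=true
# how often the background auto-commit fires (5 seconds by default)
auto.commit.interval.ms=5000
```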
3.2 Offset commits
The old consumer committed offsets to ZooKeeper; without drawing the figure, the directory structure is: /consumers/<group.id>/offsets/<topic>/<partitionId>. But ZooKeeper is not really suited to high-volume reads and writes, writes in particular. So Kafka introduced another solution: a __consumer_offsets topic to which offset information is written, removing the dependency on ZooKeeper for offset storage. Each message in __consumer_offsets holds the offsets a consumer group committed at a given time. For a consumer group, the format (shown in a figure omitted here) is roughly keyed by group, topic, and partition, with the offset as the value.
The __consumer_offsets topic uses the compact retention policy, which lets it always keep the latest offset information while keeping the topic's overall log size under control, serving both purposes at once. The specifics of compaction are described in: Log compaction
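Why compaction fits this topic can be shown in a few lines (a minimal sketch of the idea, not Kafka's actual cleaner): for each key, only the most recent value, the latest committed offset, matters.

```python
# Minimal sketch of log compaction: replay the log and keep only the
# last record per key, the way __consumer_offsets keeps latest offsets.
def compact(log):
    """Keep only the last value per key (dicts preserve key arrival order)."""
    latest = {}
    for key, value in log:   # replay front to back
        latest[key] = value  # later records overwrite earlier ones
    return list(latest.items())

log = [
    (("test-group", "test", 0), 10),
    (("test-group", "test", 1), 7),
    (("test-group", "test", 0), 25),  # a newer commit for partition 0
]
print(compact(log))
```

After compaction, partition 0's old commit of 10 is gone and only the latest offset, 25, remains.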
For the question of which partition of __consumer_offsets each group's offsets land in, see this article: Kafka How to read offset topic content (__consumer_offsets)
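In short, the group ID is hashed onto one of the offsets-topic partitions, 50 by default (offsets.topic.num.partitions). A sketch of the computation, reimplementing Java's String.hashCode in Python (a simplification; the broker's actual abs handling differs slightly at integer edge cases):

```python
# Which __consumer_offsets partition does a group map to?
# Kafka uses roughly abs(groupId.hashCode) % numPartitions (default 50).
def java_string_hashcode(s):
    """Java's String.hashCode, reimplemented with 32-bit wraparound."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # reinterpret as a signed 32-bit int, as Java does
    return h - 0x100000000 if h >= 0x80000000 else h

def offsets_partition_for(group_id, num_partitions=50):
    return abs(java_string_hashcode(group_id)) % num_partitions

print(offsets_partition_for("test-group"))
```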
4 Rebalance

4.1 What is rebalance?
Rebalance is essentially a protocol that stipulates how all the consumers under a consumer group come to an agreement on allocating the partitions of the subscribed topics. For example, suppose a group has 20 consumers subscribed to a topic with 100 partitions. Normally, Kafka will assign 5 partitions to each consumer on average. This allocation process is called rebalance.
4.2 When does rebalance happen?
This is also a frequently asked question. There are three conditions that trigger a rebalance:
- the group membership changes: a new consumer joins, an existing one leaves voluntarily, or one crashes;
- the set of subscribed topics changes, for example when a regex-based subscription matches a newly created topic;
- the number of partitions of a subscribed topic changes.
4.3 How are partitions assigned within a group?
As mentioned above, all the consumers in the group coordinate and take part in the assignment together. How is this done? The new-version consumer provides two assignment strategies by default: range and round-robin. Assignment is pluggable, so you can also write your own assignor to implement a different strategy. In fact, because the current range and round-robin assignors both have certain drawbacks, the Kafka community has proposed a third assignor with a fairer distribution, but it is still under development. For now it is enough to know that, by default, the consumer group takes care of assigning the subscribed topics' partitions for us.
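Hedged sketches of the two default strategies follow, simplified to a single topic (the real assignors also handle multi-topic subscriptions and sort by member ID):

```python
# Simplified range assignor: split the partition list into contiguous
# chunks, one chunk per consumer; the first few consumers absorb the remainder.
def range_assign(consumers, num_partitions):
    consumers = sorted(consumers)
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        n = per + (1 if i < extra else 0)  # first `extra` consumers get one more
        assignment[c] = list(range(start, start + n))
        start += n
    return assignment

# Simplified round-robin assignor: deal partitions out like cards.
def round_robin_assign(consumers, num_partitions):
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# the 20-consumer / 100-partition example from the text: 5 partitions each
sizes = {c: len(ps) for c, ps in
         range_assign([f"c{i:02}" for i in range(20)], 100).items()}
print(set(sizes.values()))
```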
For example, suppose a consumer group currently has two consumers, A and B. When a third member, C, joins, Kafka triggers a rebalance and reassigns the partitions to A, B, and C according to the default assignment strategy (the original figure is omitted here).
4.4 Who performs rebalance and consumer group management?
Kafka provides a role for this: the coordinator, which performs management for the consumer group. Frankly, the design and rework of the coordinator is a long story in Kafka, and the latest version differs greatly from the original design. I will only mention the two biggest changes here.
First there was the 0.8 coordinator, which relied on ZooKeeper to manage consumer groups. It watched for child-node changes under /consumers/<group>/ids and data changes under /brokers/topics/<topic> in ZooKeeper to decide whether a rebalance was needed. Each consumer in the group decided on its own which partitions to consume and registered its decision under /consumers/<group>/owners/<topic>/<partition> in ZooKeeper. Clearly this scheme leaned on ZooKeeper's help, and every consumer decided alone; there was none of that spirit of "we all belong to one group, let's work things out together".
Because of these potential drawbacks, the 0.9 release of Kafka reworked the coordinator design, introducing the group coordinator: each consumer group is assigned one such coordinator for group management and offset management. This group coordinator carries more responsibility than before, such as group membership management and an offset-commit fencing mechanism. When the first consumer of a new-version consumer group starts, it contacts the Kafka server to determine which broker is the coordinator for its group. From then on, all members of the group communicate and coordinate with that coordinator. Clearly this design no longer needs ZooKeeper, and performance can improve greatly. All the sections that follow discuss this latest coordinator design.
4.5 How is the coordinator determined?
Having briefly introduced the new coordinator design above: how does a consumer group determine who its coordinator is? In simple terms, there are two steps:
- compute which partition of __consumer_offsets the group's offsets are saved to, i.e. abs(groupId.hashCode) % offsets.topic.num.partitions (50 by default);
- the broker that is the leader of that partition is the group's coordinator.
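The two steps can be sketched as follows. This is a hedged illustration: the cluster metadata map here is entirely made up, and the hash is my Python reimplementation of Java's String.hashCode.

```python
# Sketch of coordinator lookup: hash the group onto an offsets partition,
# then the leader broker of that partition is the coordinator.
def jhash(s):
    """Java String.hashCode, reimplemented with 32-bit wraparound."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def find_coordinator(group_id, partition_leaders):
    """partition_leaders: __consumer_offsets partition -> leader broker id."""
    # step 1: which offsets partition does this group hash to?
    partition = abs(jhash(group_id)) % len(partition_leaders)
    # step 2: that partition's leader broker is the group's coordinator
    return partition_leaders[partition]

# toy cluster: 4 offsets partitions, led by brokers 100..103
leaders = {0: 100, 1: 101, 2: 102, 3: 103}
print(find_coordinator("test-group", leaders))
```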
4.6 Rebalance Generation
The word is the same "generation" as in the JVM's generational GC (strictly speaking, generational there too); I translate it here as "term". It denotes one era of the group's membership after a rebalance, and exists mainly to protect the consumer group by fencing off invalid offset commits: a consumer from a previous term, for example, cannot commit offsets to the new term of the group. The ILLEGAL_GENERATION errors we sometimes see are Kafka complaining about exactly this. Each time the group rebalances, the generation is incremented by 1, marking the group's entry into a new term. As the (omitted) figure shows: at generation 1 the group has 3 members; member 2 then exits, the coordinator triggers a rebalance, and the consumer group enters generation 2; after that, member 4 joins, triggering another rebalance, and the group enters generation 3.
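The fencing behavior can be modeled in a few lines (a toy of my own, not Kafka code; real commits carry the generation in the OffsetCommit request):

```python
# Toy model of generation fencing: commits tagged with an old generation
# are rejected, which is what the ILLEGAL_GENERATION error signals.
class Group:
    def __init__(self):
        self.generation = 1
        self.committed = {}

    def rebalance(self):
        self.generation += 1  # every rebalance bumps the generation

    def commit(self, member, generation, offsets):
        if generation != self.generation:
            return "ILLEGAL_GENERATION"  # fenced off: stale member
        self.committed[member] = offsets
        return "OK"

g = Group()
print(g.commit("m1", 1, {("t", 0): 5}))  # generation matches -> accepted
g.rebalance()                            # group moves to generation 2
print(g.commit("m1", 1, {("t", 0): 9}))  # stale generation -> rejected
```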
4.7 Protocols (protocol)
As I said earlier, rebalance is essentially a set of protocols that the group and the coordinator use together to complete the group's rebalance. Kafka currently provides 5 protocols to handle matters related to consumer group coordination:
- Heartbeat: a consumer reports to the coordinator that it is still alive;
- LeaveGroup: a consumer proactively tells the coordinator that it is leaving the group;
- SyncGroup: the group leader sends the assignment plan to the coordinator for distribution;
- JoinGroup: a consumer requests to join the group;
- DescribeGroup: shows information about the group, used mainly for administration.
The coordinator mainly uses the first four of these requests during a rebalance.
4.8 Liveness
How does a consumer prove to the coordinator that it is still alive? By sending heartbeat requests to the coordinator on a timer. If the configured session timeout elapses without one, the coordinator considers that consumer dead. Once the coordinator decides a consumer has died, it opens a new round of rebalance and puts a REBALANCE_IN_PROGRESS code in its responses to the other consumers' current heartbeats, in effect telling them: sorry everyone, please re-apply to join the group!
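A toy liveness check makes the timeout rule concrete (the function and variable names are mine, not Kafka identifiers; only the session.timeout idea comes from the text):

```python
# A member is considered dead once its last heartbeat is older than
# the session timeout (10 seconds here, purely as an example value).
SESSION_TIMEOUT_MS = 10_000

def dead_members(last_heartbeat_ms, now_ms):
    """Return the members whose heartbeat is older than the session timeout."""
    return [m for m, t in last_heartbeat_ms.items()
            if now_ms - t > SESSION_TIMEOUT_MS]

beats = {"c1": 95_000, "c2": 88_000}  # last heartbeat timestamps, ms
print(dead_members(beats, now_ms=100_000))  # c2 is 12s stale -> considered dead
```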
4.9 Rebalance Process
Finally we come to the concrete process by which a consumer group carries out a rebalance. I expect many users are also curious about the inner workings of the consumer, so let's walk through it together. One thing must be clear up front: the precondition for a rebalance is that the coordinator has already been determined.
Overall, a rebalance is divided into 2 steps: join and sync.
1 Join: as the name implies, joining the group. In this step, all members send a JoinGroup request to the coordinator, asking to join. Once every member has sent its JoinGroup request, the coordinator picks one consumer to act as the leader and sends it the group membership and subscription information. Note that the leader and the coordinator are not the same concept: the leader is responsible for drawing up the consumption assignment plan.
2 Sync: in this step the leader works out the assignment plan, that is, which consumer is responsible for consuming which partitions of which topics. Once the assignment is done, the leader wraps the plan in a SyncGroup request to the coordinator; the non-leaders also send SyncGroup requests, just with empty content. After receiving the plan, the coordinator stuffs it into the SyncGroup responses it sends to each consumer. At that point every member of the group knows which partitions it should consume.
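The two steps above can be sketched as a toy model. This is greatly simplified and entirely illustrative: the real exchange happens over JoinGroup/SyncGroup requests, leader selection is up to the coordinator, and the assignor is pluggable.

```python
# Join phase: the coordinator learns the membership and picks a leader
# (here, simply the first member to have joined).
def join_phase(members):
    leader = members[0]
    return leader, list(members)  # the leader receives the membership info

# Sync phase: the leader computes the plan client-side; the coordinator
# merely relays it back to every member in the SyncGroup responses.
def sync_phase(leader, members, num_partitions):
    plan = {m: [] for m in members}
    for p in range(num_partitions):
        plan[members[p % len(members)]].append(p)
    return plan

leader, members = join_phase(["a", "b", "c"])
print(leader, sync_phase(leader, members, 6))
```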
Let me illustrate with a few figures (the originals are omitted here). First, the process of joining the group:
It is worth noting that before the coordinator has collected requests from all members, it parks the received requests in a structure called purgatory. (I remember a Chinese article arguing that this proves Kafka's developers have quite a literary streak; it is a fun read if you care to search for it.)
Then comes the process of distributing the assignment plan, i.e. the SyncGroup request:
Note!! The partition assignment for a consumer group is performed on the client side! Kafka delegates this power to the client mainly because it offers greater flexibility. For example, you could implement a rack-aware assignment scheme similar to Hadoop's, having a consumer prefer partitions whose data sits in the same rack and thereby reducing network transfer overhead. Kafka gives you two strategies by default, range and round-robin; since they are not the focus of this article I won't detail them here. Just remember that you can override the consumer parameter partition.assignment.strategy to implement your own assignment strategy.
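To show the kind of flexibility client-side assignment buys, here is a toy rack-aware assignor of the sort the text alludes to. Everything here, the function name, the metadata layout, the tie-breaking rule, is invented for illustration; it is not Kafka's API.

```python
# Toy rack-aware assignor: prefer giving a consumer the partitions whose
# leader replica sits in the same rack; fall back to least-loaded otherwise.
def rack_aware_assign(consumer_racks, partition_racks):
    assignment = {c: [] for c in consumer_racks}
    leftovers = []
    for p, rack in partition_racks.items():
        same_rack = [c for c, r in consumer_racks.items() if r == rack]
        if same_rack:
            # pick the least-loaded co-located consumer
            target = min(same_rack, key=lambda c: len(assignment[c]))
            assignment[target].append(p)
        else:
            leftovers.append(p)
    # partitions with no co-located consumer go to the least-loaded member
    for p in leftovers:
        target = min(assignment, key=lambda c: len(assignment[c]))
        assignment[target].append(p)
    return assignment

consumers = {"c1": "rack-a", "c2": "rack-b"}
partitions = {0: "rack-a", 1: "rack-b", 2: "rack-a", 3: "rack-c"}
print(rack_aware_assign(consumers, partitions))
```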
4.10 Consumer group state machine
Like many Kafka components, the group also has a state machine describing how group states flow, and the coordinator drives the group through these states. The (omitted) figure was hand-drawn from the code comments, so bear with me.
A brief description of each state:
- Dead: the group has no members and its metadata has been removed;
- Empty: the group has no members, but its offset information has not yet expired;
- PreparingRebalance: the group is preparing to rebalance and is waiting for members to join;
- AwaitingSync: all members have joined and the group is waiting for the leader's assignment plan;
- Stable: the rebalance is complete and the group can consume normally.
As for the transition conditions and actions between these states, I will not expand on them here.
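Still, the rough shape of the flow can be written down as a transition table. This is a simplification drawn from my understanding of the code comments; the event names are mine, not Kafka identifiers.

```python
# Sketch of the group state machine as a transition table.
TRANSITIONS = {
    ("Empty",              "member joins"):        "PreparingRebalance",
    ("Stable",             "member joins/leaves"): "PreparingRebalance",
    ("PreparingRebalance", "all members joined"):  "AwaitingSync",
    ("AwaitingSync",       "leader sent plan"):    "Stable",
    ("Stable",             "last member left"):    "Empty",
    ("Empty",              "offsets expired"):     "Dead",
}

def step(state, event):
    return TRANSITIONS.get((state, event), state)  # unknown events: stay put

# a freshly formed group walks Empty -> PreparingRebalance -> AwaitingSync -> Stable
s = "Empty"
for e in ["member joins", "all members joined", "leader sent plan"]:
    s = step(s, e)
print(s)
```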
III. Rebalance scenario analysis
Having explained in detail how a consumer group performs a rebalance, you may still find parts of it a bit foggy. This part expands the detailed timing of three important scenarios to further deepen understanding of the consumer group's internals. Since diagrams are more intuitive, the descriptions were all given as figures (omitted here).
1 A new member joins the group (member join)
2 Group member crashes (member failure)
As mentioned earlier, a group member crashing and a member leaving voluntarily are two different scenarios. A crashing member does not notify the coordinator, so the coordinator may need a full session.timeout period to detect the crash, which inevitably causes consumer lag. In other words, leaving the group triggers a rebalance actively, while a crash triggers one passively. Okay, straight to the figure (omitted):
3 A group member leaves voluntarily (member leave group)
4 Committing offsets (member commit offset)
To summarize, this article focused on the internal design principles of the new-version consumer group, especially the interaction between the consumer group and the coordinator. I hope it helps.