"Original" Kafka Consumer source Code Analysis

Last Update:2015-06-09 Source: Internet

Author: User

Tags zookeeper client

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

As the name implies, it is Kafka's consumer API package.

First, Consumerconfig.scalaKafka consumer Configuration class, in addition to some of the default value constants and validation parameters of the method, is consumer configuration parameters, such as group.id, consumer.id, etc., detailed list see official website. Second, Consumeriterator.scalaThe Kafkastream iterator class that iterator is placed in a blocking state when the blocked queue at the bottom of the stream is empty. This iterator also provides a Shutdowncommand object that can be added to the queue as an identity bit to trigger a close operation.since it is an iterator, the most important next method must be provided. Here we analyze the method of its definition in turn:1. Next: Gets the next element. The specific logic isGet the next messageandmetadata with the next method of the parent class, and then update the consumer's metric statistics 2. Makenext: The core method, the specific logic is as follows:

Gets the current iterator, and if it is empty, gets one. The practice is to read a block of data from the underlying channel in different ways based on the timeout configuration
If the data block is a close command, return directly
Otherwise, gets the current topic information. If the displacement value to be requested is greater than the current consumption, then consumer may lose data.
Then get a iterator and call the next method to get the next element and construct a new Messageandmetadata instance to return

3. Clearcurrentchunk: Clears the current data block, that is, the current iterator reference is emptied Third, Kafkastream.scalaDefines a Kafka consumer stream. Each stream supports iterating through its messageandmetadata elements. An iterator Consumeriterator is maintained internally. Kafkastream defines the method as follows: 1. Iterator: Returns an internally maintained iterator 2. Clear: Clears the iterated queue when the consumer is re-distributed. Mainly to reduce the consumer received duplicate messages Iv. Consumerconnector.scalaThe main interface of the consumer. Defines a trait and an object. Consumerconnector trait defines some abstract methods: 1. Createmessagestreams: Create a set of KAFKASTREAM2 for each topic. Createmessagestreams (supports specifying Keydecoder and Valuedecoder) 3. Createmessagestreamsbyfilter: Also creates a set of Kafkastream for all topic given, except that this method allows a filter to be passed, allowing the black-and-white list to be filtered by 4. Commitoffsets: The Commit displacement operation is performed on all broker partitions connected to this consumer connector 5. Shutdown: Close connector and Consumer object defines two methods: 1. Create: Creates a ConsumerConnector2. Createjavaconsumerconnector: Create a consumer that is used by the Java Client Connector Wu, Fetcheddatachunk.scalaRepresents a block of acquired data that encapsulates a set of messages saved in a byte buffer, partition topic information, and the obtained displacement value Liu, Partitionassignor.scalaPartitioned for consumer in a consumer group. Partitionassignor trait defines the Assign method, which returns the mapping record for the partition to the consumer thread. Where the assigned thread must belong to a consumer in the given partition context (Assignmentcontext).when it comes to assigning the context class--assignmentcontext, it needs to receive a consumer group, a consumer ID, and a zkclient, And internally maintains a map record topic corresponding consumer thread collection (mainly provided by the methods in the TopicCount Class). The methods that are defined also include:1. Partitionsfortopic: Returns the topic corresponding to the partition collection2. Consumersfortopic: Returns the consumers thread corresponding to the topic3. Consumers: Returns a collection of consumers IDsPartitionassignor object defines a factory method for creating a partition allocator for different policies, and currently Kafka supports two rebalancing strategies (that is, partition allocation policy): Round robin and range. It is important to note that the partitioning strategy referred to here refers to how partitions are assigned to different consumer instances within the consumer group.Suppose we have a topic:t1,t1 with 10 partitions, respectively [P0, P9], and then we have 2 consumer,c1 and C2. C1 has one thread, and C2 has two threads. Let's take a look at how the default range policy allocates partitions:
1. Range Policyfor each Topic,range policy, all available partitions are sorted first in numerical order, and all consumer threads are listed in dictionary order. In conjunction with our example above, the partitioning order is 0,1,2,3,4,5,6,7,8,9, and the order of consumer threads is c1-0, c2-0, c2-1. It then divides the number of partitions by the number of threads to determine at least the number of partitions that each thread obtains. In our case, 10/3 cannot be divisible, and the remainder is 1, so c1-0 will be allocated an extra partition. The final partition allocation is as follows:
C1-0 Get partition 0 1 2 3c2-0 get partition 4 5 6c2-1 get partition 7 8 9 If the topic is a 11 partition, the partition is assigned as follows: C1-0 get partition 0 1 2 3c2-0 get partition 4 5 6 7c2-1 get partition 8 9 102. Roundrobin Strategy-Polling policyIf it is a polling policy, the example we assume above does not apply, because the policy requires that all consumer that subscribe to a topic must have the same number of threads, so we modify the example above, assuming that each consumer has 2 threads. One of the main differences between the round robin strategy and range is that you can't predict the result of the assignment until redistribution-because it uses hash modulo to randomize the sort order. if you want to adopt a Roundrobin strategy, you must first meet two conditions:

The consumer of the subscription topic must have the same number of threads
Each consumer instance within the consumer group must have the same subscribed topic collection

when these two conditions are met, Kafka will randomly sort the topic-partition against hashcode to prevent all partitions of a topic from being assigned to a consumer. All topic-partition are then assigned to the available consumer threads in the form of polling. In our improved example, suppose the topic-partition after sorting is this:t1-5, T1-3, t1-0, T1-8, T1-2, T1-1 ,T1-4, T1-7, T1-6 and t1-9, while consumer threads are c1-0, C1-1, c2-0, c2-1. Then the final partition result is as follows: T1-5 go to C1-0t1-3 to c1-1t1-0 to C2-0t1-8 c2-1 at this time all the cons The Umer thread has been allocated, but there are still unallocated partitions, and then the thread is allocated from the beginning: T1-2 go to C1-0t1-1 to c1-1t1-4 c2-0t1-7 go c2-1 again from the beginning, t1-6 to C1-0t1-9 to c1-1 at this time all the partitions have been allocated Over, each consumer thread can be assigned to almost the same number of partitions-that's how round robin is. Seven, Topiccount.scalaThe Scala defines a number of classes, and we analyze each: 1. Consumerthreadid: Encapsulates the consumer ID and thread ID. Because the ordered interface is extended, it is supported in dictionary order. Used primarily for partitioning policies. 2. TopicCount trait: Provides the main interface for topic grouping statistics and defines three methods:

getconsumerthreadidspertopic--returns a mapping of topic and its consumer thread ID collection
gettopiccountmap--returns the mapping of topic corresponding to consumer stream number
Pattern: There are currently three species of pattern:static, White_list and Black_list. Allows consumer to subscribe to multiple topic with support for black-and-white lists

3. TopicCount object: Defines a number of common methods, such as:

The naming rule for Makethreadid:consumer thread is [consumer id]-thread ID
Makeconsumerthreadidspertopic: Creates a set of Consumerthreadid for a given set of topic to
Constructtopiccount: Creates a topiccount based on the given consumer group and consumer ID. The specific logic is as follows:

Read the data under the/consumers/[group_id]/ids/[consumer_id] node (JSON)
Parse the JSON string to extract the values of each field
If pattern is a static type, create a statictopiccount return; otherwise create a wildcardtopiccount to return

Constructtopiccount also has two additional overloaded methods, creating Statictopiccount and WildcardTopicCount4, respectively. Statictopiccount class: Implements the TopicCount interface. The pattern type is static5. Wildcardtopiccount class: Implements the TopicCount interface. According to the given topicfilter to determine whether the pattern is white_list or black_list Eight, Topicfilter.scalaTopicfilter abstract class, used to parse the regular expression of topic, and provides a istopicallowed method for filtering topic. It has two sub-categories: Whitelist and blacklist respectively implement whitelist filtering and blacklist filtering. Nine, Partitiontopicinfo.scalaThe partition information of the topic is encapsulated, including the data block queue of the partition, the consumed displacements, the acquired displacements, and the fetch size. In addition, some setter and getter methods are available to get and set this information 10, Zookeeperconsumerconnector.scalaThis class is primarily responsible for handling the interaction between consumer and zookeeper. Zookeeper directory structure associated with consumer: 1. Consumer ID Registration node:/consumers/[group_id]/ids/[consumer_id] Each consumer has a unique ID number within the consumer group. It registers the ID number as a temporary node in the corresponding directory of the zookeeper and encapsulates all topic it subscribes to the subscription child JSON element. Because it is a temporary node, consumer will delete the node as soon as it ends zookeeper. It is important to note that the consumer ID is not named in sequential nodes, but is selected from the configuration-mainly because sequential generation nodes are not conducive to error recovery 2. Broker node Registration:/brokers/ids/[brokerid]. Each broker node is assigned a logical node number, starting with 0. When the broker starts, it registers itself in the zookeeper-that is, under/brokers/ids, creates a child node named after the logical node number. The value of this znode is a JSON string containing the following information:

Version: Revision number, fixed to 1
Host:broker IP address or host name
Port:broker Port
JMX: If JMX is enabled, this is the port number of JMX, otherwise-1
Timestamp:broker timestamp at time of creation

3. Partition registration information:/consumers/[group_id]/owners/[topic]/[partitionid]. 4. Consumer displacement information:/consumers/[group_id]/offsets/[topic]/[partitionid]-Displacement This Scala defines a group of associated objects, Where there is only one variable in object Shutdowncommand is used to identify the closing identity. When you see this identity in the queue, you need to end the iterative process. The Zookeeperconsumerconnector class is the core of this file. It implements the Consumerconnector trait, so it is also necessary to implement those abstract methods defined by the trait.Let's analyze some important fields of the class definition:1. Isshuttingdown: Used to identify whether the state of the connector is processing off state2. Fetcher:consumerfetcher Manager for managing Fetcher threads3. Zkclient: A client for connecting to zookeeper4. Topicregistry: Save the partition information under topic5. Checkpointedzkoffsets: Save the topic partition corresponding to the displacement 6. Topicthreadidandqueues: Holds the blocking queue 7 that corresponds to the topic's consumer thread. Scheduler: the Scheduler submits consumer displacement to zookeeper every auto.commit.interval.ms time 8. Messagestreamcreated: Identifies whether Kafkastream has created 9. Sessionexpirationlistener/topicpartitionchangelistener/loadbalancerlistener: Three ZK monitors, implemented by three nested classes, are mentioned in the following 10. Offsetschannel: Channel 11 for sending Offsetfetchrequst. The Wildcardtopicwatcher:zookeepertopiceventwatcher class implements the topic event Listener Class 12. Consumeridstring: Defines the rules for how to name consumer IDs. If Consumer.id is not specified, it is set to consumer GROUP_ host name-timestamp-(part of the UUID) in the constructor, the class will first connect zookeeper, then create the Fetcher Manager and confirm the connection to the replica manager in a blocking manner , and finally, if Auto commit (auto.commit.enable) is turned on, use the scheduler to create a timed task.Here are some of the ways that it provides:1. Connectzk: Connect the zookeeper specified in Zookeeper.connect to create Zkclient2. Createfetcher: Create Consumerfetchermanager3. Ensureoffsetmanagerconnected: This method will be blocked until you are sure that the available replica manager is found, and that the underlying IO channel is also created. This method is only for the use of Kafka to save consumer displacement-that is, set Offsets.storage=kafka4. Shutdown: Close the connector, mainly related to shutting down Wildcardtopicwatcher, scheduler, Fetcher Manager, clearing all queues, submitting displacements, and shutting down zookeeper clients and displacement channels, etc.5. REGISTERCONSUMERINZK: Register a given consumer--in zookeeper to create a temporary node under zookeeper/consumers/[groupid]/ids6. Sendshutdowntoallqueues: Clears the queue in the Topicthreadidandqueues and sends the close command to all queues7. Autocommit: Automatic submission of displacements, mainly implemented by method Commitoffsets8. Commitoffsettozookeeper: Submitting a displacement to the zookeeper is to update the data for the specified node and save the offset in the checkpointedzkoffsets cache9. Commitoffsets: Submit displacement. Before you specifically parse the code, first analyze the number of times that the property offsets.commit.retries--retry displacement. It is only valid for displacement commits when the connector is closed, and does not count autocommit-initiated commits. It also does not consider the displacement of the query before committing. For example, if a consumer metadata request fails for some reason, it will be retried but not counted in this statistic. Commitoffsets seems to be the parameter meaning of the reverse, it now the parameter name is Isautocommit, but the actual actual call process, if it is autocommit need to specify false instead. The specific logic is as follows:

Sets the number of retries based on whether it is auto-commit-if it is 1 times without retrying; otherwise offsets.commit.retries + 1
To build a set of displacements to commit from topicregistry
If the collection is empty, you do not need to commit anything, otherwise decide what kind of storage to use to save consumer displacements
If it is zookeeper save (by default), traverse the set of pending displacements to update the displacements for each topic partition to the corresponding node zookeeper
If it is Kafka to save the displacement,

To create a offsetcommitrequest request first
Then make sure that you can connect to the replica manager
Send the offsetcommitrequest request and get the corresponding response
Find the error code contained in the response if there is an error marking the commit offset failure

Fetchoffsetfromzookeeper: Gets the displacement of a given partition from the Zookeeper 11. Fetchoffsets: Gets the consumer displacement of a set of partitions, if it is saved in zookeeper directly called Fetchoffsetfromzookeeper gets, otherwise the specific logic is as follows:

Create Offsetfetchrequest
Make sure to connect to the replica manager and send offsetfetchrequest requests to get the corresponding response
If the leader has changed or the displacement cache is loading, the returned response is empty--for later retry
See if two-way displacement commit (dual.commit.enable) is enabled-for example, a consumer group is moving from migration zookeeper to Kafka, and if not, return to response directly, Otherwise, choose the big one from zookeeper and Kafka and return it to response.

There are some important ways to do this, but let's take a look at the 4 classes of nested definitions in the Scala file: 1. zksessionexpirelistener--listens for listeners that zookeeper session expires. Because of the Izkstatelistener interface in advance, it is also necessary to implement the handlestatechanged and Handlenewsession two methods.

Handlestatechanged: Don't do anything because the zookeeper client will re-connect
This method is called after the Handlenewsession:zookeeper session expires to create a new session. This means rebuilding the temporary node and re-registering the consumer. The main logic is

First empty the Topicregistry partition information cache
Re-registering consumer (REGISTERCONSUMERINZK) in zookeeper
Re-initiate load balancing operations on consumer-the Syncrebalance method of the Load Balancer listener. In addition, the Listener for child node changes and state changes is re-registered during load balancing, so the Handlenewsession method does not re-subscribe to them.

2. Zktopicpartitionchangelistener: Also a listener that listens for changes to zookeeper node data. Two methods:

Handledatachange:topic The method is called when the data is changed, the method is to call relabalanceeventtriggered to notify all listener execution threads to continue execution
Handledatadeleted: Throwing a warning indicates that the topic data was accidentally deleted

3. Zkrebalancerlistener: Monitors zookeeper child node changes to trigger load balancing of consumer. Inside the class, it creates a monitoring execution thread to monitor the given consumer, and once it is monitored to trigger rebalance, call syncedrebalance to start executing rebalance. Because it is a zookeeper listener class, it must also implement Handlechildchange, which is used to trigger the Rebalacen event. Here's a way to analyze its definition:

rebalanceeventtriggered--set Iswatchertriggered to True and wake up the monitoring thread to start the rebalance operation
deletepartitionownershipfromzk--Delete the partition corresponding to the given topic from the Zookeeper Znode:/consumers/[groupid]/owners/[topic]/[partition ] is to delete this consumer's registration information
releasepartitionownership--cancels the consumer registration information for all partitions of all topic by looping through the Deletepartitionownershipfromzk method. and delete the corresponding statistics and clear the corresponding counters
resetstate--empties all topic information registered on the consumer connector
clearfetcherqueues--empties all fetcher-related queues and data chunk that are currently traversing the consumer thread
closefetchersforqueues--Stop all Fetcher threads and empty all queues to avoid duplication of data. Stop the leader Discovery thread before emptying the fetcher. Then, if auto-commit displacement is enabled or the displacement is required to prevent consumer from returning messages from the current block. Because the partition registration information is still not released in the zookeeper, this commit displacement ensures that the currently committed displacement will be used by the next consumer thread that has the partition of the current chunk. Because Fetcher is always closed and this is the last chunk of consumer traversal, the iterator will no longer return any new messages until rebalance completes successfully and gets more chunks after fetcher restart
closefetchers--emptying the Fetcher queue of topic partitions that consumer "may" no longer consume
updatefetcher--Update the Fetcher partition
reflectpartitionownershipdecision--Determines whether consumer is the owner of a given topic partition, that is, creating/consumers/[groupid]/owners/[on Zookeeper Topic]/[partition], if can be created is owner
addpartitiontopicinfo--adds the given topic partition information to this connector cache
reinitializeconsumer--reinitialization of consumer, mainly to create a variety of listeners, update various caches, etc.
rebalance--the corresponding record of the Consumer-topic partition according to the available broker
syncedrebalance--the corresponding record of the Consumer-topic partition for re-balancing allocation

4. Wildcardstreamshandler class: Used to do topic for the use of wildcard filter 11, Consumerfetchermanager.scalaConsumer Fetcher's management class, its defined startconnections and Stopconnections methods are called repeatedly. The class primarily defines a nested class:
leaderfinderthread--, as the name implies, is the leader Discovery thread that adds fetcher to the corresponding broker when leader is available 12, Consumerfetcherthread.scalaConsumer get thread, three methods: 1. Processpartitiondata: Processing the obtained data, mainly is the message collection into the queue waiting for processing 2. Handleoffsetoutofrange: Handles the displacement of a partition out of bounds, mainly according to the value set by the Auto.offset.reset property to specify 3. Handlepartitionswitherrors: Dealing with partitions that do not leader require leader elections 13, Consumertopicstats.scalaConsumer the statistical information class, it is not in detail to say 14, Fetchrequestandresponsestats.scalaStatistics of all Fetchrequest request statistics and corresponding response statistics submitted by a given consumer client to all brokers 15, Topiceventhandler.scalaA trait that handles the topic event defines only one method: Handletopicevent 16, Zookeepertopiceventwatcher.scalaMonitoring the changes of each topic child node under the/brokers/topics node 17, Simpleconsumer.scalaKafka the consumer of the message. It maintains a blockingchannel for sending and receiving requests/responses, so the connect and disconnect methods are also available to enable and disable the underlying blockingchannel. The core approach to defining this class includes: 1. Send, that is, sending Topicmetadatarequest and ConsumerMetadataRequest2. Getoffsetsbefore: Gets a set of valid displacements before a given time of 3. Commitoffsets: Submits a topic displacement. If the version is 0 in the request, submit the displacement to zookeeper, otherwise commit the displacement to Kafka4. Fetchoffsets: Gets the displacement of a topic. Version 0 is obtained from zookeeper, otherwise 5 is obtained from Kafka. Earliestorlatestoffset: Gets the earliest or most recent displacement 6 for a given topic partition. Fetch: Get a topic set of messages from Fetchrequest

"Original" Kafka Consumer source Code Analysis

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More