The consumer pulls and processes messages in four main steps:
- Get the fetch position of the partitions assigned to the consumer (via an OffsetFetchRequest; new messages are read starting from this offset)
- Create FetchRequests and build a Map<Node, FetchRequest>: the TopicPartitions the consumer reads are grouped by the node they are fetched from, with the node as the key, and the requests are placed in the unsent queue
- Call the poll method to actually send the requests to the corresponding nodes; on success, the onSuccess callback saves the returned messages into completedFetches
- Extract the data from completedFetches, convert it into ConsumerRecords, drain the buffer, and update the local fetch position
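The four steps above can be sketched with plain Java collections standing in for the real Fetcher internals. The type names (CompletedFetch, the position map) are simplified assumptions for illustration, not the actual Kafka client classes:

```java
import java.util.*;

// Simplified stand-ins for the real client types (assumption: not actual Kafka classes)
public class FetchPipelineSketch {
    record TopicPartition(String topic, int partition) {}
    record Record(long offset, String value) {}
    record CompletedFetch(TopicPartition tp, List<Record> records) {}

    // Step 1: local fetch positions (normally initialized via OffsetFetchRequest)
    static Map<TopicPartition, Long> positions = new HashMap<>();
    // Buffer filled in step 3: fetched data awaiting conversion to consumer records
    static Queue<CompletedFetch> completedFetches = new ArrayDeque<>();

    public static void main(String[] args) {
        TopicPartition tp = new TopicPartition("demo", 0);
        positions.put(tp, 0L);

        // Steps 2-3: pretend poll() sent the grouped requests and the broker replied;
        // the response handler saved the data into completedFetches
        completedFetches.add(new CompletedFetch(tp,
                List.of(new Record(0, "a"), new Record(1, "b"))));

        // Step 4: drain completedFetches into records grouped by partition,
        // then advance the local fetch position past the last record
        Map<TopicPartition, List<Record>> drained = new HashMap<>();
        CompletedFetch cf;
        while ((cf = completedFetches.poll()) != null) {
            drained.computeIfAbsent(cf.tp(), k -> new ArrayList<>()).addAll(cf.records());
            long nextOffset = cf.records().get(cf.records().size() - 1).offset() + 1;
            positions.put(cf.tp(), nextOffset);
        }
        System.out.println(drained.get(tp).size() + " " + positions.get(tp));
    }
}
```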
Offset management: updating the fetch position (updateFetchPositions) and sending OffsetFetchRequests
After a consumer starts, it needs to obtain the last committed offset of the partitions it consumes. A consumer commits a consumption offset (committed offset) after processing messages. When a rebalance occurs, a partition may be reassigned to a different consumer, and the new consumer needs to know where the previous one left off. The new consumer therefore sends a request to the coordinator to fetch the committed offset (the previous consumer's last commit) and updates its local fetch position. When committing offsets, the consumer can choose between two strategies: auto commit and manual commit.
Auto Commit:
Typically, auto commit is used to improve performance. The auto-commit interval (auto.commit.interval.ms) defaults to 5000 milliseconds and is implemented as a task in a delayed queue. After each pull-and-consume cycle, if the auto-commit task in the delayed queue has reached the commit interval, the consumer commits and updates the committed offset; otherwise the deferred task does not run and the consumer keeps pulling messages. However, the consumer may crash after actually processing a message but before its offset is committed, which leads to duplicate consumption.
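The interval check can be sketched as follows. This is a simplified stand-in for the real delayed-queue task, with a fake clock injected so the timing logic is testable:

```java
import java.util.function.LongSupplier;

// Sketch of the auto-commit interval check (assumption: simplified from the
// real delayed-queue task; an injectable clock replaces System.currentTimeMillis)
public class AutoCommitSketch {
    private final long intervalMs;
    private final LongSupplier clock;
    private long nextDeadline;
    long committedOffset = -1;

    AutoCommitSketch(long intervalMs, LongSupplier clock) {
        this.intervalMs = intervalMs;
        this.clock = clock;
        this.nextDeadline = clock.getAsLong() + intervalMs;
    }

    // Called after each poll: commit only if the interval has elapsed
    boolean maybeAutoCommit(long consumedOffset) {
        long now = clock.getAsLong();
        if (now >= nextDeadline) {
            committedOffset = consumedOffset; // the real client sends an OffsetCommitRequest here
            nextDeadline = now + intervalMs;
            return true;
        }
        return false; // deadline not reached: keep pulling, commit later
    }

    public static void main(String[] args) {
        long[] now = {0};
        AutoCommitSketch ac = new AutoCommitSketch(5000, () -> now[0]);
        System.out.println(ac.maybeAutoCommit(10)); // too early, nothing committed
        now[0] = 5000;
        System.out.println(ac.maybeAutoCommit(20)); // interval elapsed, commits offset 20
    }
}
```

The gap between `committedOffset` and the offsets already processed is exactly the window in which a crash causes duplicate consumption.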
Manual commit:
In some scenarios, to control the consumption offset more precisely and guarantee that messages are neither consumed twice nor lost, the consumer client controls manually whether to commit the offset.
Offsets and consumption semantics
Duplicate consumption (at-least-once semantics):
Suppose the consumption flow is: pull messages, process them, then commit the consumption offset. Each poll pulls n messages; after processing them, the consumer thread records the offset to be committed and relies on the deferred auto-commit task, but that task may not have run yet because the commit interval has not elapsed. If the consumer crashes at this point and triggers a rebalance, the processed offsets were never committed, so the committed offset is smaller than the real consumption offset. When the new consumer starts pulling from the fetched committed offset, messages are consumed again, as shown in the figure.
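This failure mode can be simulated in a few lines. The names below are illustrative, not Kafka client classes; the point is that processing runs ahead of the committed offset:

```java
import java.util.*;

// Simulation of at-least-once: process first, commit later; a crash before the
// commit makes the new consumer re-read already-processed messages
public class AtLeastOnceSketch {
    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2", "m3");
        long committedOffset = 0;              // last committed fetch position
        List<String> processed = new ArrayList<>();

        // First consumer: polls two messages and processes them...
        for (long off = committedOffset; off < 2; off++) processed.add(log.get((int) off));
        // ...but crashes before the deferred auto-commit runs: committedOffset stays 0

        // After the rebalance, the new consumer resumes from the committed offset
        for (long off = committedOffset; off < log.size(); off++) processed.add(log.get((int) off));

        // m0 and m1 were processed twice: duplicate consumption
        System.out.println(processed);
    }
}
```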
Missed consumption (at-most-once semantics):
Suppose the consumption flow is: pull messages, commit the consumption offset, then process the messages. If the consumer crashes after the offset is committed but before processing finishes, a rebalance is triggered; when the new consumer resumes from the committed offset, the unprocessed messages are lost, as shown in the figure.
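The mirror-image simulation, again with illustrative names rather than real client classes, shows how commit-before-process drops messages on a crash:

```java
import java.util.*;

// Simulation of at-most-once: commit first, process later; a crash after the
// commit but before processing loses the uncommitted-but-unprocessed messages
public class AtMostOnceSketch {
    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2", "m3");
        long committedOffset = 0;
        List<String> processed = new ArrayList<>();

        // First consumer: pulls m0 and m1 and commits the offset immediately...
        committedOffset = 2;
        // ...then crashes before actually processing them

        // The new consumer resumes from the committed offset: m0 and m1 are lost
        for (long off = committedOffset; off < log.size(); off++) processed.add(log.get((int) off));
        System.out.println(processed);
    }
}
```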
Pulling and consuming messages
When the consumer polls, if messages from the previous fetch are still buffered, they are returned directly; otherwise a new fetch request is sent starting from the previously updated fetch position.
The fetch requests are grouped into a Map<Node, FetchRequest> by the node serving each TopicPartition the consumer reads, and poll then sends them to the corresponding nodes to obtain the partition messages. Once messages are successfully fetched, they are saved in completedFetches and, on return, converted into records grouped by TopicPartition:
Map<TopicPartition, List<ConsumerRecord<K, V>>> drained
In addition, the consumer locally records the offset of the last consumed message plus one; on the next consumption, an offset check verifies that the first record's offset equals this value, otherwise the records are ignored.
```java
private int append(Map<TopicPartition, List<ConsumerRecord<K, V>>> drained,
                   PartitionRecords<K, V> partitionRecords,
                   int maxRecords) {
    ......
    List<ConsumerRecord<K, V>> partRecords = partitionRecords.take(maxRecords);
    long nextOffset = partRecords.get(partRecords.size() - 1).offset() + 1; // next consumption position
    ......
    subscriptions.position(partitionRecords.partition, nextOffset); // record that the consumed position of this TopicPartition is nextOffset
    return partRecords.size();
    ......
}
```
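A self-contained simplification of this append logic, including the offset continuity check described above (the types and the static position field are stand-ins for PartitionRecords and the subscription state, not the real classes):

```java
import java.util.*;

// Runnable simplification of the append logic: records are accepted only if the
// first record's offset matches the locally expected next consumption position
public class AppendSketch {
    record Record(long offset, String value) {}

    static long position = 2; // expected offset of the next record to consume

    static int append(List<Record> drained, List<Record> partitionRecords) {
        if (partitionRecords.isEmpty()) return 0;
        // Offset check: stale or duplicate batches are ignored
        if (partitionRecords.get(0).offset() != position) return 0;
        drained.addAll(partitionRecords);
        long nextOffset = partitionRecords.get(partitionRecords.size() - 1).offset() + 1;
        position = nextOffset; // record the next consumption position
        return partitionRecords.size();
    }

    public static void main(String[] args) {
        List<Record> drained = new ArrayList<>();
        // A stale batch starting before the expected position is ignored
        System.out.println(append(drained, List.of(new Record(0, "a"), new Record(1, "b"))));
        // A batch starting exactly at the expected position is accepted
        System.out.println(append(drained, List.of(new Record(2, "c"), new Record(3, "d"))));
        System.out.println(position);
    }
}
```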
Kafka and migration management of consumer messages