Kafka consumer message pulling and offset management


The consumer pulls messages in four main steps (a sketch of the resulting poll loop follows the list):

    • Get the committed offset of the partitions it consumes by sending an OffsetFetchRequest; new fetches start from that offset position
    • Create the FetchRequests and build a Map<Node, FetchRequest>: requests are grouped with the node that leads each consumed TopicPartition as the key, and placed in the unsent queue
    • Call the poll method to actually send the requests to the corresponding nodes; if a request succeeds, the onSuccess callback saves the returned messages in completedFetches
    • Extract the data from completedFetches, convert it to ConsumerRecords, drain the buffer, and update the local consumption (fetch) position
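All four steps are driven by a single call to KafkaConsumer.poll. A minimal poll loop, as a sketch (the broker address, group id, and topic name are placeholders):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PollLoop {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "demo-group");              // placeholder group id
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic")); // placeholder topic
                while (true) {
                    // One poll() drives all four steps: position lookup, FetchRequest
                    // creation, sending to nodes, and draining completedFetches.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                    for (ConsumerRecord<String, String> record : records)
                        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }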

Offset management: updating the fetch position (updateFetchPositions) by sending an OffsetFetchRequest

After a consumer starts, it needs to obtain the last committed offset of each partition it consumes. A consumer must commit its consumption offset (committed offset) after consuming messages, because when a rebalance occurs a partition may be reassigned to a different consumer, and the new consumer has to know from which offset the previous one last consumed. The new consumer therefore sends a request to the coordinator to obtain the committed offset (the previous consumer's last commit) and updates its local fetch position with it. For committing offsets there are two strategies to choose from: automatic commit (auto commit) and manual commit.
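This lookup normally happens implicitly inside poll, but the client API also exposes it directly. A sketch, assuming an already-constructed consumer and a placeholder topic:

    import java.util.Collections;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    TopicPartition tp = new TopicPartition("demo-topic", 0); // placeholder partition
    consumer.assign(Collections.singletonList(tp));

    // Ask the coordinator for the last committed offset of this partition.
    OffsetAndMetadata committed = consumer.committed(tp);
    if (committed != null)
        consumer.seek(tp, committed.offset()); // resume right after the last commit

    long fetchPosition = consumer.position(tp); // the position the next poll() will fetch from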

Auto Commit:

Typically, automatic commit is used to improve performance. The auto-commit interval (auto.commit.interval.ms) defaults to 5000 milliseconds and is implemented as a delayed-queue task: after each pull-and-consume cycle, if the delayed auto-commit task has reached the commit interval, it runs and updates the committed offset; if it has not yet timed out, it does not run and the consumer simply continues pulling messages. However, the consumer may crash after actually processing a message but before its offset is committed, which is why repeated consumption can occur.
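A minimal configuration sketch for auto commit (the values shown are the defaults):

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
    props.put("group.id", "demo-group");              // placeholder group id
    props.put("enable.auto.commit", "true");          // commit offsets automatically
    props.put("auto.commit.interval.ms", "5000");     // default: commit every 5000 ms
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);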

Manual Commit:

In some scenarios, to control the consumption offset more precisely and guarantee that messages are neither consumed repeatedly nor lost, the consumer client controls manually whether and when to commit the offset.
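With enable.auto.commit set to "false", the application commits explicitly. A sketch using the synchronous variant, built on the consumer from the earlier sketches (process is a hypothetical handler):

    // Assumes props.put("enable.auto.commit", "false") was set at construction.
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records)
            process(record); // hypothetical business logic
        consumer.commitSync(); // blocks until this batch's offsets are committed
        // consumer.commitAsync() is the non-blocking alternative.
    }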

Offset and consumption semantics

Repeated consumption (at-least-once semantics):

If the consumption flow is: pull messages, process messages, commit the consumption offset. Each poll pulls n messages and processes them; the consumer thread then records the offset to commit and relies on the delayed task, but the task may not run because the commit interval has not yet elapsed. If the consumer crashes at that moment, a rebalance is triggered while the consumed offsets are still uncommitted, so the actual committed offset is smaller than the real consumption offset. When the new consumer starts pulling from the committed offset it obtains, the messages in between are consumed again.
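The process-then-commit ordering, sketched with manual commit for clarity (process is a hypothetical handler):

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records)
            process(record); // side effects happen first
        // A crash before this line means the batch is redelivered after the
        // rebalance: duplicates are possible, loss is not (at least once).
        consumer.commitSync();
    }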

Missed consumption (at-most-once semantics):

If the consumption flow is: pull messages, commit the consumption offset, then process the messages. If the consumer crashes after the offset has been committed but before processing finishes, the rebalance hands the partition to a new consumer that resumes from the committed offset, so the unprocessed messages are lost.
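The mirror-image ordering, commit-then-process, sketched the same way:

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        consumer.commitSync(); // offsets are committed before any processing
        for (ConsumerRecord<String, String> record : records)
            // A crash here leaves the offset already committed: the batch is
            // never redelivered, so messages can be lost (at most once).
            process(record); // hypothetical handler
    }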

Pulling and consuming messages

When the consumer polls, if messages from the previous pull are still buffered they are returned directly; otherwise a new pull request is sent starting from the previously updated fetch position.

The pull requests are grouped into a Map<Node, FetchRequest> keyed by the nodes that lead the TopicPartitions the consumer consumes, and poll then sends them to the corresponding nodes to obtain the partition messages. Once messages are obtained successfully they are saved in completedFetches and, on return, converted into records grouped by TopicPartition:

Map<TopicPartition, List<ConsumerRecord<K, V>>> drained

In addition, the client locally records the offset of the last consumed message plus 1; at the next consumption, an offset check requires that the first record's offset equal this value, otherwise the records are ignored.

    private int append(Map<TopicPartition, List<ConsumerRecord<K, V>>> drained,
                       PartitionRecords<K, V> partitionRecords,
                       int maxRecords) {
        ......
        List<ConsumerRecord<K, V>> partRecords = partitionRecords.take(maxRecords);
        long nextOffset = partRecords.get(partRecords.size() - 1).offset() + 1; // next consumption position
        ......
        // record that the consumed position of this TopicPartition is nextOffset
        subscriptions.position(partitionRecords.partition, nextOffset);
        return partRecords.size();
        ......
    }
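The matching check on the next round can be sketched as follows (a simplification of the client's logic, not the exact source):

    long expected = subscriptions.position(partitionRecords.partition);
    if (partRecords.get(0).offset() != expected) {
        // Stale fetch data whose first offset does not match the recorded
        // position: ignore these records instead of delivering them.
        return 0;
    }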
