Kafka 0.9.0.0: Solving the Repeated Consumption Problem

Background: we had previously been using Kafka client version 0.8. Recently we upgraded the Kafka client and wrote new consumer and producer code. In local testing there were no problems; messages could be consumed and produced normally. However, once a project started using the new code, repeated consumption appeared whenever the data volume was large. The troubleshooting and resolution process is recorded here to avoid stepping into the same pit again.

Finding the problem: since a ConsumerRecord object exposes the partition and offset of the current message, we also record the partition and offset of each message in the log. While monitoring the logs we noticed that offsets of the same partition showed up in multiple threads, and the offset values did not line up, which pointed to repeated consumption. Querying the total number of records consumed by the program and comparing it with the number of message records in Kafka confirmed that the two differed considerably.
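For reference, a minimal sketch of such a logging consumer loop; the broker address, group id, and topic name are illustrative placeholders, not values from the original setup:

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OffsetLoggingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("group.id", "demo-group");              // placeholder group id
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("demo-topic"));  // placeholder topic name

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    // Logging partition and offset makes duplicate consumption visible in the logs
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }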

Solving the problem: searching the Internet for how to fix Kafka repeated consumption, the common answer was that Kafka had not committed the offset within the session time. Following that idea, we changed the consumer poll interval to 100 ms, i.e. poll data from Kafka every 100 ms, and set props.put("auto.commit.interval.ms", "1000") and props.put("session.timeout.ms", "30000"), which are Kafka's auto-commit interval and session timeout.
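A sketch of that configuration, plugging into the polling loop shown above; the values for auto.commit.interval.ms and session.timeout.ms come from the text, while the remaining settings are assumed defaults for an auto-committing consumer:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
    props.put("group.id", "demo-group");                // assumed group id
    props.put("enable.auto.commit", "true");            // rely on automatic offset commits
    props.put("auto.commit.interval.ms", "1000");       // auto-commit interval from the text
    props.put("session.timeout.ms", "30000");           // session timeout from the text
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    // poll every 100 ms, as described above
    ConsumerRecords<String, String> records = consumer.poll(100);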

After testing, we found that repeated consumption still occurred when the Kafka data volume was large. We then printed the number of records returned by each poll and found it was very large, and after a while (usually about 30 s) the following error was reported:

    org.apache.kafka.clients.consumer.CommitFailedException:
        Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member.
        This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms,
        which typically implies that the poll loop is spending too much time message processing.
        You can address this either by increasing the session timeout or by reducing the maximum size of batches
        returned in poll() with max.poll.records. [com.bonc.framework.server.kafka.consumer.ConsumerLoop]

In other words, the commit could not be completed because the group had already rebalanced and the partitions had been assigned to another member. This means the time between successive calls to poll() was longer than the configured session.timeout.ms, which usually indicates that the poll loop is spending too much time processing messages. The issue can be addressed either by increasing the session timeout or by using max.poll.records to reduce the maximum size of the batches returned by poll().

We then set the following parameter:

    // the number of records returned by one poll from Kafka;
    // max.poll.records records must be processed within session.timeout.ms
    props.put("max.poll.records", "100");

The size of this value needs to be weighed against session.timeout.ms, i.e. whether 100 records can actually be processed within the session.timeout.ms window.
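One rough way to make that evaluation is to time the processing of each polled batch and compare it with the session timeout; handleRecord() below is a hypothetical placeholder for the application's own processing logic:

    long sessionTimeoutMs = 30000;                       // must match session.timeout.ms
    ConsumerRecords<String, String> records = consumer.poll(100);

    long start = System.currentTimeMillis();
    for (ConsumerRecord<String, String> record : records) {
        handleRecord(record);                            // hypothetical application-specific processing
    }
    long elapsed = System.currentTimeMillis() - start;

    // With max.poll.records=100, the whole batch must be processed well within session.timeout.ms,
    // otherwise the coordinator assumes the consumer is dead and triggers another rebalance.
    if (elapsed > sessionTimeoutMs) {
        System.err.println("Batch of " + records.count() + " records took " + elapsed
                + " ms; lower max.poll.records or raise session.timeout.ms");
    }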

Note:

    props.put("session.timeout.ms", "30000");

    // maximum amount of time the client waits for the response to a request;
    // it must be greater than session.timeout.ms
    props.put("request.timeout.ms", "40000");

It is also worth paying attention to the fetch.min.bytes parameter, which controls the minimum amount of data returned from Kafka per fetch. It is best to set this parameter explicitly; otherwise problems may occur. The recommended setting is:

    // the minimum amount of data the server returns to the consumer; if this much data is not available,
    // the request waits until it is. The default of 1 means data is returned as soon as any is available.
    props.put("fetch.min.bytes", "1");

Summary:

In general, repeated consumption in Kafka is caused by offsets not being committed normally, so adjusting the configuration so that offsets are committed as expected resolves the problem. The main configuration settings mentioned above are collected below.
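A sketch of the full consumer configuration discussed in this article; the broker address, group id, and deserializers are assumed placeholders rather than values from the original code:

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
    props.put("group.id", "demo-group");                // placeholder group id
    props.put("enable.auto.commit", "true");            // commit offsets automatically
    props.put("auto.commit.interval.ms", "1000");       // auto-commit interval
    props.put("session.timeout.ms", "30000");           // session timeout
    props.put("request.timeout.ms", "40000");           // must be greater than session.timeout.ms
    props.put("max.poll.records", "100");               // records per poll, sized to fit session.timeout.ms
    props.put("fetch.min.bytes", "1");                  // minimum data per fetch; 1 = return as soon as available
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");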
