Apache Kafka series: producer processing logic

I have recently been looking into producer load-balancing strategies. In code that uses librdkafka, I implemented round-robin selection of the partition value, but in field verification the load balancing did not work, so I went looking for the reason. The following article describes Kafka's producer processing logic; it is reproduced here for study.

Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.

Kafka producer processing logic

A Kafka producer sends the data it generates to the Kafka servers (brokers). Both the distribution logic and the load-balancing logic are maintained by the producer.

Kafka structure diagram

Kafka producer default call logic

Default partition logic

1. Distribution logic without a key

Every topic.metadata.refresh.interval.ms, the producer selects a partition at random. All records within that time window are sent to this partition.

A partition is also re-selected after a send error.
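The keyless behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the producer's actual code: the class StickyRandomPartitioner, its methods, and the explicit nowMs parameter are all hypothetical names introduced here, and refreshIntervalMs simply stands in for topic.metadata.refresh.interval.ms.

```java
import java.util.Random;

/**
 * Hypothetical sketch of keyless partition selection: pick a partition at
 * random and keep it until the refresh window expires or a send fails.
 */
final class StickyRandomPartitioner {
    private final int numPartitions;
    private final long refreshIntervalMs; // stands in for topic.metadata.refresh.interval.ms
    private final Random random;
    private int current = -1;             // -1 means "no partition chosen yet"
    private long chosenAtMs;

    StickyRandomPartitioner(int numPartitions, long refreshIntervalMs, Random random) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
        this.refreshIntervalMs = refreshIntervalMs;
        this.random = random;
    }

    /** Returns the partition for a keyless record sent at time nowMs. */
    int partition(long nowMs) {
        if (current < 0 || nowMs - chosenAtMs >= refreshIntervalMs) {
            current = random.nextInt(numPartitions); // new random choice for this window
            chosenAtMs = nowMs;
        }
        return current;
    }

    /** Called after a send error so that the next record triggers a fresh choice. */
    void onSendError() {
        current = -1;
    }
}
```

In this sketch, every keyless record from one producer lands on a single partition for the whole window, so traffic observed over a short period can look unbalanced even though the choice is random.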

2. Distribution by key

Hash the key, then take the result modulo the number of partitions:

Utils.abs(key.hashCode) % numPartitions
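As a worked illustration of the formula above, here is a small sketch. The class KeyHashPartitioner and the helper nonNegative are hypothetical names; in particular, how nonNegative treats Integer.MIN_VALUE is an assumption made here so the result can never be negative, since the source only gives the formula and not the body of Utils.abs.

```java
/** Hypothetical sketch of key-based partition selection. */
final class KeyHashPartitioner {
    private KeyHashPartitioner() {}

    /**
     * Maps an int to a non-negative int. Math.abs(Integer.MIN_VALUE) is still
     * negative, so that one value is mapped to 0 (an assumption of this sketch).
     */
    static int nonNegative(int n) {
        return n == Integer.MIN_VALUE ? 0 : Math.abs(n);
    }

    /** Same key and same partition count always give the same partition. */
    static int partition(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return nonNegative(key.hashCode()) % numPartitions;
    }
}
```

For example, "abc".hashCode() is 96354, so with 4 partitions the record goes to partition 96354 % 4 = 2. Note that the mapping depends on numPartitions: if the number of partitions changes, the same key may map to a different partition.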

How to obtain partition leader information (metadata)

After deciding which partition to send to, the producer needs to find out which broker is the leader of that partition in order to know where to send the data.

Specific implementation location


Implementation approach

1. Get the partition metadata from a broker. Because every Kafka broker holds all of the metadata, any broker can return all of it.

2. Broker selection strategy: shuffle the broker list randomly, start with the first broker, and if there is an error, move on to the next.

3. Error handling: after an error, request the metadata from the next broker.


    • The producer gets its metadata from the brokers and does not depend on ZooKeeper.
    • When the set of brokers changes, the list of brokers the producer uses to fetch metadata does not change dynamically.
    • The list of brokers used to fetch metadata is set by metadata.broker.list in the producer configuration. As long as at least one machine in this list is serving normally, the producer can get the metadata.
    • After getting the metadata, the producer can write data to brokers that are not in metadata.broker.list.
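The broker selection and failover steps above can be sketched as below. This is only an illustration under stated assumptions: MetadataBootstrap, fetchMetadata, and the fetchFrom callback are hypothetical names, the metadata type M is left generic, and the real producer's request and response handling is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

/** Hypothetical sketch: fetch metadata from a shuffled bootstrap broker list. */
final class MetadataBootstrap {
    private MetadataBootstrap() {}

    /**
     * Tries the brokers in random order and returns the first successful
     * result. brokerList plays the role of metadata.broker.list; fetchFrom is
     * a caller-supplied function that asks one broker for metadata and throws
     * a RuntimeException on failure.
     */
    static <M> M fetchMetadata(List<String> brokerList,
                               Function<String, M> fetchFrom,
                               Random random) {
        if (brokerList.isEmpty()) {
            throw new IllegalArgumentException("broker list is empty");
        }
        List<String> shuffled = new ArrayList<>(brokerList); // leave the caller's list untouched
        Collections.shuffle(shuffled, random);
        RuntimeException lastError = null;
        for (String broker : shuffled) {
            try {
                return fetchFrom.apply(broker);
            } catch (RuntimeException e) {
                lastError = e; // this broker failed, move on to the next one
            }
        }
        throw new IllegalStateException(
                "could not fetch metadata from any of " + shuffled, lastError);
    }
}
```

Shuffling spreads metadata requests across the listed brokers, and a single reachable broker in the list is enough for the call to succeed.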

Error handling

The send function of the producer does not return a value by default. Error handling is implemented by the EventHandler.

DefaultEventHandler handles errors as follows:

    • Collect the data that failed to send
    • Wait for an interval whose length is set by retry.backoff.ms
    • Re-fetch the metadata
    • Re-send the data

The number of retries is set by message.send.max.retries.
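The retry steps above can be sketched roughly as follows. This is a simplified illustration rather than the DefaultEventHandler code: RetryingSender, sendOnce, and refreshMetadata are hypothetical names, and treating the first attempt as separate from the maxRetries retries is an assumption of this sketch.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the retry loop: back off, refresh metadata, re-send. */
final class RetryingSender {
    private RetryingSender() {}

    /**
     * sendOnce tries to send a batch and returns the messages that failed.
     * maxRetries plays the role of message.send.max.retries and backoffMs
     * the role of retry.backoff.ms.
     */
    static <T> void send(List<T> messages,
                         UnaryOperator<List<T>> sendOnce,
                         Runnable refreshMetadata,
                         int maxRetries,
                         long backoffMs) {
        List<T> outstanding = sendOnce.apply(messages); // first attempt
        int retriesLeft = maxRetries;
        while (!outstanding.isEmpty() && retriesLeft > 0) {
            try {
                Thread.sleep(backoffMs);                // wait before retrying
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while backing off", e);
            }
            refreshMetadata.run();                      // leaders may have changed
            outstanding = sendOnce.apply(outstanding);  // re-send only what failed
            retriesLeft--;
        }
        if (!outstanding.isEmpty()) {
            throw new IllegalStateException(
                    "Failed to send messages after " + maxRetries + " tries.");
        }
    }
}
```

Refreshing the metadata before each retry matters because a common cause of a failed send is that the partition leader has moved to another broker.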

DefaultEventHandler throws an exception when all retries fail. The code is as follows:

if (outstandingProduceRequests.size > 0) {
  producerStats.failedSendRate.mark()
  val correlationIdEnd = correlationId.get()
  error("Failed to send requests for topics %s with correlation ids in [%d,%d]"
    .format(outstandingProduceRequests.map(_.topic).toSet.mkString(","),
    correlationIdStart, correlationIdEnd - 1))
  throw new FailedToSendMessageException("Failed to send messages after " + config.messageSendMaxRetries + " tries.", null)
}


