A Simple Analysis of the New Producer Source Code in Kafka 0.8.1

Source: Internet
Author: User

1. Background

Recently, a project required me to use Kafka's producer. However, Kafka does not officially provide a C++ client.

The official Kafka website lists the clients available for 0.8.x. One of them is a C client. Although it is still actively maintained, its code has many problems and its support for C++ is not very good.

There is also a C++ client. Although it follows a C++ design, it was last updated on December 19, 2013 and has not been updated for a long time.

I learned from the official website that the Kafka authors are not satisfied with the existing producer and consumer designs and intend to release a new producer and consumer in Kafka 0.9.

The new producer is already included in the Kafka 0.8.1 source code. The official description is as follows.

3.4 New Producer Configs

We are working on a replacement for our existing producer. The code is available in trunk now and can be considered beta quality. Below is the configuration for the new producer.

The new producer is still in beta. In Kafka 0.9, however, both the new producer and the new consumer are planned to become stable and to provide more features. The old producer is implemented in Scala and exposes an API for Java callers, whereas the new producer is implemented directly in Java.

2. Introduction to the Basic Producer Classes

The source code tree is as follows:

ProducerPerformance.java in the org.apache.kafka.clients.tools package shows the most basic usage of the producer.

The program originally takes three command-line arguments. With those three arguments replaced by hard-coded values, the code is as follows:

    public static void main(String[] args) throws Exception {
        String url = "";
        int numRecords = 100;
        int recordSize = 100;
        Properties props = new Properties();
        props.setProperty(ProducerConfig.REQUIRED_ACKS_CONFIG, "1");
        props.setProperty(ProducerConfig.BROKER_LIST_CONFIG, url);
        props.setProperty(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG, Integer.toString(5 * 1000));
        props.setProperty(ProducerConfig.REQUEST_TIMEOUT_CONFIG, Integer.toString(Integer.MAX_VALUE));

        KafkaProducer producer = new KafkaProducer(props);
        Callback callback = new Callback() {
            public void onCompletion(RecordMetadata metadata, Exception e) {
                if (e != null)
                    e.printStackTrace();
            }
        };
        byte[] payload = new byte[recordSize];
        Arrays.fill(payload, (byte) 1);
        ProducerRecord record = new ProducerRecord("test6", payload);
        long start = System.currentTimeMillis();
        long maxLatency = -1L;
        long totalLatency = 0;
        int reportingInterval = 1;
        for (int i = 0; i < numRecords; i++) {
            long sendStart = System.currentTimeMillis();
            producer.send(record, callback);
            long sendEllapsed = System.currentTimeMillis() - sendStart;
            maxLatency = Math.max(maxLatency, sendEllapsed);
            totalLatency += sendEllapsed;
            if (i % reportingInterval == 0) {
                System.out.printf("%d  max latency = %d ms, avg latency = %.5f\n",
                                  i,
                                  maxLatency,
                                  (totalLatency / (double) reportingInterval));
                totalLatency = 0L;
                maxLatency = -1L;
            }
        }
        long ellapsed = System.currentTimeMillis() - start;
        double msgsSec = 1000.0 * numRecords / (double) ellapsed;
        double mbSec = msgsSec * (recordSize + Records.LOG_OVERHEAD) / (1024.0 * 1024.0);
        System.out.printf("%d records sent in %d ms ms. %.2f records per second (%.2f mb/sec).", numRecords, ellapsed, msgsSec, mbSec);
        producer.close();
    }

As you can see, running the producer requires three basic classes, ProducerConfig, KafkaProducer, and ProducerRecord, plus the callback interface Callback.

The ProducerConfig class holds the producer's configuration options and provides default values.

The ProducerRecord class is the carrier for a message sent to the broker. It contains the topic, partition, key, and value attributes.

These two classes are very simple.

All producer operations are implemented in the KafkaProducer class.

This class is composed of a Partitioner, Metadata, a RecordAccumulator, a Sender, and Metrics.

Partitioner is the class that computes which partition a message belongs to.
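As a rough illustration of what such a partitioner does, the sketch below maps a keyed message to a partition by hashing the key and spreads keyless messages round-robin. The class name SimplePartitioner, its method signature, and the use of Arrays.hashCode are assumptions made for this example. The real Partitioner works on the Cluster metadata and may use a different hash function.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the producer's Partitioner; not the Kafka class.
class SimplePartitioner {
    private final AtomicInteger counter = new AtomicInteger(0);

    // Returns a partition index in [0, numPartitions).
    int partition(byte[] key, int numPartitions) {
        if (numPartitions <= 0)
            throw new IllegalArgumentException("numPartitions must be positive");
        if (key == null) {
            // No key: rotate through the partitions.
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        }
        // Keyed message: the same key always maps to the same partition.
        // Masking with 0x7fffffff keeps the hash non-negative.
        return (Arrays.hashCode(key) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        SimplePartitioner p = new SimplePartitioner();
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(p.partition(key, 4) == p.partition(key, 4)); // prints true
        System.out.println(p.partition(null, 4)); // prints 0
        System.out.println(p.partition(null, 4)); // prints 1
    }
}
```

Hashing the key keeps all messages with the same key in one partition, which preserves their relative order. Keyless messages have no such constraint, so they can simply be spread evenly.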

As the name suggests, Metadata stores the metadata of the Kafka cluster. Metadata updates are tied to topics.

The RecordAccumulator is similar to a queue. Every message the producer sends is first placed in this queue for processing.
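The queue idea can be sketched in a few lines, assuming one deque of batches per partition and a fixed batch size. SimpleAccumulator, Batch, and the string partition key are hypothetical names for this example only. The real RecordAccumulator also manages buffer memory and compression, which this sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified accumulator: one deque of batches per partition.
class SimpleAccumulator {
    // A batch collects records until adding one more would exceed the size limit.
    static class Batch {
        final List<byte[]> records = new ArrayList<>();
        int bytes = 0;
    }

    private final int batchSize;
    private final Map<String, Deque<Batch>> queues = new HashMap<>();

    SimpleAccumulator(int batchSize) {
        this.batchSize = batchSize;
    }

    // Appends to the newest batch of the partition, or starts a new batch if it is full.
    synchronized void append(String topicPartition, byte[] value) {
        Deque<Batch> dq = queues.computeIfAbsent(topicPartition, k -> new ArrayDeque<>());
        Batch last = dq.peekLast();
        if (last == null || last.bytes + value.length > batchSize) {
            last = new Batch();
            dq.addLast(last);
        }
        last.records.add(value);
        last.bytes += value.length;
    }

    // Removes and returns the oldest batch of the partition, or null if there is none.
    synchronized Batch drain(String topicPartition) {
        Deque<Batch> dq = queues.get(topicPartition);
        return dq == null ? null : dq.pollFirst();
    }

    public static void main(String[] args) {
        SimpleAccumulator acc = new SimpleAccumulator(10);
        acc.append("test6-0", new byte[6]);
        acc.append("test6-0", new byte[4]); // fits: 6 + 4 <= 10
        acc.append("test6-0", new byte[1]); // does not fit, starts a second batch
        System.out.println(acc.drain("test6-0").records.size()); // prints 2
        System.out.println(acc.drain("test6-0").records.size()); // prints 1
    }
}
```

Producers append at the tail and the sending side drains from the head, so batches leave each partition's queue in the order they were created.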

The Sender class uses NIO to send and receive the producer's messages. The Sender runs in a daemon thread that listens for read and write events.

Metrics: Kafka was originally used for distributed log collection and monitoring. The Metrics class lets the producer register items of interest for monitoring.

3. Source Code Analysis

We analyze the working process of the producer by sending a message.

Sending a message can be divided into two asynchronous processes.

Enqueue process

    @Override
    public Future<RecordMetadata> send(ProducerRecord record, Callback callback) {
        try {
            Cluster cluster = metadata.fetch(record.topic(), this.metadataFetchTimeoutMs);
            int partition = partitioner.partition(record, cluster);
            ensureValidSize(record.key(), record.value());
            TopicPartition tp = new TopicPartition(record.topic(), partition);
            FutureRecordMetadata future = accumulator.append(tp, record.key(), record.value(), CompressionType.NONE, callback);
            this.sender.wakeup();
            return future;
        } catch (Exception e) {
            if (callback != null)
                callback.onCompletion(null, e);
            return new FutureFailure(e);
        }
    }

The send function first fetches the cluster information based on the topic. If the topic is not yet known, the function blocks while the metadata is updated.

Next, it computes the partition and writes the data to the queue for that TopicPartition.

    public FutureRecordMetadata append(TopicPartition tp, byte[] key, byte[] value, CompressionType compression, Callback callback) throws InterruptedException {
        if (closed)
            throw new IllegalStateException("Cannot send after the producer is closed.");
        // check if we have an in-progress batch
        Deque<RecordBatch> dq = dequeFor(tp);
        synchronized (dq) {
            RecordBatch batch = dq.peekLast();
            if (batch != null) {
                FutureRecordMetadata future = batch.tryAppend(key, value, compression, callback);
                if (future != null)
                    return future;
            }
        }

        // we don't have an in-progress record batch try to allocate a new batch
        int size = Math.max(this.batchSize, Records.LOG_OVERHEAD + Record.recordSize(key, value));
        ByteBuffer buffer = free.allocate(size);
        synchronized (dq) {
            RecordBatch first = dq.peekLast();
            if (first != null) {
                FutureRecordMetadata future = first.tryAppend(key, value, compression, callback);
                if (future != null) {
                    // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen
                    // often...
                    free.deallocate(buffer);
                    return future;
                }
            }
            RecordBatch batch = new RecordBatch(tp, new MemoryRecords(buffer), time.milliseconds());
            FutureRecordMetadata future = Utils.notNull(batch.tryAppend(key, value, compression, callback));
            dq.addLast(batch);
            return future;
        }
    }

The source contains a long usage comment on the send function. In short, send supports simple blocking sends (by calling future.get() on the returned Future) and non-blocking sends (by passing a callback).
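The two calling styles can be illustrated without a broker. The sketch below is only a model of the shape of send(record, callback): SendStyles and its send method are hypothetical, the "offset" is a fake value (the message length), and a CompletableFuture stands in for the producer's own future type.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a producer; it only mimics the shape of send(record, callback).
class SendStyles {
    // Pretends to send a message and reports a fake offset asynchronously.
    static Future<Long> send(String message, BiConsumer<Long, Exception> callback) {
        CompletableFuture<Long> future =
            CompletableFuture.supplyAsync(() -> (long) message.length());
        if (callback != null)
            future.whenComplete((offset, error) ->
                callback.accept(offset, error == null ? null : new Exception(error)));
        return future;
    }

    public static void main(String[] args) throws Exception {
        // Blocking style: wait for the result before continuing.
        long offset = send("hello", null).get();
        System.out.println("blocking send done, fake offset " + offset);

        // Non-blocking style: continue immediately, the callback runs later.
        CountDownLatch done = new CountDownLatch(1);
        send("world!", (off, e) -> {
            if (e != null)
                e.printStackTrace();
            else
                System.out.println("callback invoked, fake offset " + off);
            done.countDown();
        });
        done.await(); // only so the demo does not exit before the callback runs
    }
}
```

Blocking on get() after every send gives up the benefit of batching, so the callback style is usually the better fit when throughput matters.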

Because the data must ultimately be written to a socket, send calls the wakeup function immediately after adding the record to the queue. This wakes the Sender if it is blocked waiting to read data, so that it can send the new data.
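The wakeup mechanism can be tried in isolation, assuming the Sender blocks in a java.nio Selector, as its use of NIO suggests. WakeupDemo and blockAndWake are hypothetical names. The sketch only shows that Selector.wakeup() makes a blocked (or the next) select call return early, not how the Sender itself is written.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical demo of waking a thread that is blocked in Selector.select().
class WakeupDemo {
    // Starts a thread that blocks in select() and returns how long it waited, in ms.
    static long blockAndWake() throws IOException, InterruptedException {
        Selector selector = Selector.open();
        long[] elapsed = new long[1];
        Thread sender = new Thread(() -> {
            long start = System.nanoTime();
            try {
                selector.select(10_000); // would block for up to 10 s without a wakeup
            } catch (IOException e) {
                e.printStackTrace();
            }
            elapsed[0] = (System.nanoTime() - start) / 1_000_000;
        });
        sender.setDaemon(true);
        sender.start();
        // Plays the role of the wakeup call made right after a record is queued.
        // If the thread has not reached select() yet, the next select() returns at once.
        selector.wakeup();
        sender.join();
        selector.close();
        return elapsed[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("select() returned after " + blockAndWake() + " ms");
    }
}
```

Without the wakeup, a newly queued record would have to wait until the current poll timed out or some other I/O event arrived.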

Dequeue process

This process is carried out by the daemon thread, which continuously loops on the run function.

    public int run(long now) {
        Cluster cluster = metadata.fetch();
        // get the list of partitions with data ready to send
        List<TopicPartition> ready = this.accumulator.ready(now);

        // prune the list of ready topics to eliminate any that we aren't ready to send yet
        List<TopicPartition> sendable = processReadyPartitions(cluster, ready, now);

        // should we update our metadata?
        List<NetworkSend> sends = new ArrayList<NetworkSend>(sendable.size());
        InFlightRequest metadataReq = maybeMetadataRequest(cluster, now);
        if (metadataReq != null) {
            sends.add(metadataReq.request);
            this.inFlightRequests.add(metadataReq);
        }

        // create produce requests
        List<RecordBatch> batches = this.accumulator.drain(sendable, this.maxRequestSize);
        List<InFlightRequest> requests = collate(cluster, batches);
        for (int i = 0; i < requests.size(); i++) {
            InFlightRequest request = requests.get(i);
            this.inFlightRequests.add(request);
            sends.add(request.request);
        }

        // do the I/O
        try {
            this.selector.poll(5L, sends);
        } catch (IOException e) {
            e.printStackTrace();
        }

        // handle responses, connections, and disconnections
        handleSends(this.selector.completedSends());
        handleResponses(this.selector.completedReceives(), now);
        handleDisconnects(this.selector.disconnected());
        handleConnects(this.selector.connected());
        return ready.size();
    }

The code comments are self-explanatory.

handleSends completes the futures and invokes the callbacks that were created when the messages were added to the queue.

The subsequent network protocol encoding is not described in detail here. Next, I will analyze librdkafka, the C client for the Kafka producer.


This is my first blog post, so it may not be clearly written. Comments are welcome. Thank you.
