Introduction to librdkafka, the Apache Kafka C/C++ client library (translation)


Article source address: https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md


librdkafka is a high-performance C implementation of the Apache Kafka client, providing a reliable and well-behaved client; it also provides a comparatively rudimentary C++ interface.



Contents


This article covers the following sections:


I. Performance

- Performance metrics

- High throughput

- Low latency

- Compression


II. Message Reliability


III. Usage

- Documentation

- Initialization

- Configuration

- Threads and callbacks

- Brokers

- Producer API

- Consumer API

- Offset management

- Consumer groups

- Topics


IV. Other

- Test details




I. Performance


The librdkafka library is multithreaded, designed for modern hardware, and strives for minimal memory copying. The payloads of produced or consumed messages are not copied during transmission, and there is no limit on message size.


"You may need high throughput or low latency, but you can have these two performance".


Librdkafka allows you to achieve high throughput or low latency, which is achieved through configurable property settings.


The two most important configuration properties for performance tuning are listed below (a short tuning sketch follows the list):


- batch.num.messages: the minimum number of messages to accumulate in the local queue before sending a message set.

- queue.buffering.max.ms: how long to wait for batch.num.messages to accumulate in the local queue.
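
A minimal sketch of tuning these two properties with the standard rd_kafka_conf_set() call; the values here are illustrative assumptions, not recommendations:

#include <librdkafka/rdkafka.h>

static rd_kafka_conf_t *make_tuned_conf(void) {
    char errstr[512];
    rd_kafka_conf_t *conf = rd_kafka_conf_new();

    /* Throughput-oriented: accumulate up to 10000 messages locally... */
    rd_kafka_conf_set(conf, "batch.num.messages", "10000",
                      errstr, sizeof(errstr));
    /* ...but wait at most 500 ms for the batch to fill.
     * For the lowest latency, set this to "0" instead (see Low Latency). */
    rd_kafka_conf_set(conf, "queue.buffering.max.ms", "500",
                      errstr, sizeof(errstr));
    return conf;   /* error checking omitted for brevity */
}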



1. Performance metrics


The following setup was used for all of the tests below:


- Intel quad-core i7 at 3.4 GHz, 8 GB of memory

- Disk performance was taken out of the equation by relaxing the brokers' log flush settings:

  log.flush.interval.messages=10000000

  log.flush.interval.ms=100000

- Two brokers and librdkafka running on the same machine

- Two partitions per topic

- Each broker is the leader for one of the partitions

- Tests were run with rdkafka_performance from the examples subdirectory


Test results (note: only producer tests were available in the original document)

Test 1: 2 brokers, 2 partitions, required.acks=2, 100-byte messages: 850000 messages/second, 85 MB/second


Test 2: 1 broker, 1 partition, required.acks=0, 100-byte messages: 710000 messages/second, 71 MB/second


Test 3: 2 brokers, 2 partitions, required.acks=2, 100-byte messages, snappy compression: 300000 messages/second, 30 MB/second


Test 4: 2 brokers, 2 partitions, required.acks=2, 100-byte messages, gzip compression: 230000 messages/second, 23 MB/second


Note: detailed test parameters are given at the end of this article.

Note: consumer test results will be added later.


2. High throughput


The key to high throughput is message batching: accumulate a certain number of messages in the local message queue before sending them out in one batch. This amortizes the per-message overhead and avoids the adverse effect of excessive round-trip requests.


The default settings, batch.num.messages=1000 and queue.buffering.max.ms=1000, favor high throughput. They allow librdkafka to wait up to 1000 ms for messages to accumulate in the local queue, up to at most 1000 messages. Whichever of the two limits is reached first triggers sending of the batch, regardless of the other.


Although these settings are global (set on the rd_kafka_conf_t structure), they are applied on a per topic+partition basis.


3. Low Latency


When low-latency sending is required, "queue.buffering.max.ms" should be set to the maximum producer-side latency the application can allow. Setting queue.buffering.max.ms to 0 causes messages to be sent as soon as possible.


4. Compression


Producer message compression is enabled through the "compression.codec" configuration property.


Compression operates on the batched messages in the local queue: the larger the batch, the higher the likely compression ratio. The size of the local batch is controlled by the "batch.num.messages" and "queue.buffering.max.ms" configuration properties discussed in the High throughput section above.
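
A one-line sketch, reusing the conf and errstr variables from the tuning sketch above ("snappy" is illustrative; "gzip" is set the same way):

rd_kafka_conf_set(conf, "compression.codec", "snappy", errstr, sizeof(errstr));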




II. Message Reliability


Message reliability is an important feature of librdkafka: applications can rely on it given two specific property settings ("request.required.acks" and "message.send.max.retries").


If the topic configuration property "request.required.acks" is set to wait for acknowledgement from brokers (any value except 0; see CONFIGURATION.md for specifics), librdkafka will hold on to a message until all expected ACKs have been received, gracefully handling the following events:

- Broker connection failures

- Topic leader changes

- Produce errors signalled by the broker


All of this is handled automatically by librdkafka; the application does not need to do anything to handle these events. A message is resent up to "message.send.max.retries" times before its failure is reported.
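
A minimal sketch of these two settings with illustrative values, assuming (per CONFIGURATION.md) that request.required.acks is a topic-level property while message.send.max.retries belongs to the global configuration:

#include <librdkafka/rdkafka.h>

static void configure_reliability(rd_kafka_conf_t *conf,
                                  rd_kafka_topic_conf_t *topic_conf) {
    char errstr[512];

    /* Resend an unacknowledged message up to 3 times before
     * reporting it as failed (illustrative value). */
    rd_kafka_conf_set(conf, "message.send.max.retries", "3",
                      errstr, sizeof(errstr));

    /* Wait for ACKs from two brokers before considering a
     * message delivered (illustrative value). */
    rd_kafka_topic_conf_set(topic_conf, "request.required.acks", "2",
                            errstr, sizeof(errstr));
}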


librdkafka reports the outcome of each message send through a delivery report callback: the callback is invoked once for every message as soon as its final delivery state is known.

- If error_code is non-zero, the message failed to be sent; error_code indicates the cause (an rd_kafka_resp_err_t enum value)

- If error_code is 0, the message was sent successfully


See the Producer API section for more details.


The delivery report callback is optional.
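
A sketch of such a callback, registered through the C API's rd_kafka_conf_set_dr_cb(); the function name and error handling are illustrative:

#include <stdio.h>
#include <librdkafka/rdkafka.h>

/* Delivery report callback: invoked from rd_kafka_poll() once the
 * delivery status of a message is known. */
static void msg_delivered(rd_kafka_t *rk, void *payload, size_t len,
                          rd_kafka_resp_err_t error_code,
                          void *opaque, void *msg_opaque) {
    if (error_code)
        fprintf(stderr, "Message delivery failed: %s\n",
                rd_kafka_err2str(error_code));
    /* error_code == 0: delivered successfully. */
}

/* Registered on the configuration object before rd_kafka_new():
 *   rd_kafka_conf_set_dr_cb(conf, msg_delivered);
 */

The callback fires from within rd_kafka_poll(), so the application must poll regularly (see Threads and callbacks below).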




III. Usage


1. Documentation

The librdkafka API is documented in rdkafka.h, and the configuration properties are documented in CONFIGURATION.md.


2. Initialization


In an application, you first create a top-level rd_kafka_t object. It is the base container, providing global configuration properties and shared state, and is created with the rd_kafka_new() function.


You also need to create one or more topic objects (rd_kafka_topic_t) for producing and consuming. The topic object holds topic-specific configuration properties as well as a mapping of all available partitions and their leader brokers. It is created with the rd_kafka_topic_new() function.


Both objects expose a configuration API. If no configuration is supplied, default values are used; the default for each property is documented in CONFIGURATION.md.


Note: An application may create multiple rd_kafka_t objects; they share no state.

Note: An rd_kafka_topic_t object may only be used with the rd_kafka_t object that created it.
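
A minimal initialization sketch following these rules; the topic name is illustrative and error handling is omitted:

#include <librdkafka/rdkafka.h>

static void init_example(void) {
    char errstr[512];

    /* Top-level container for global configuration and shared state.
     * Passing NULL for the configuration uses the default values. */
    rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, NULL,
                                  errstr, sizeof(errstr));

    /* Topic object; usable only with the rd_kafka_t that created it. */
    rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, "my_topic", NULL);

    /* ... produce messages using rkt ... */
}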


3. Configuration


To simplify integration with Kafka and flatten the learning curve, librdkafka implements the same configuration properties as those found in the official Kafka client.


Configuration is applied before object creation, using the rd_kafka_conf_set() and rd_kafka_topic_conf_set() functions.


Note: An rd_kafka_conf_t or rd_kafka_topic_conf_t object may not be reused after being passed to rd_kafka_new() or rd_kafka_topic_new(), and the application does not need to free any configuration resources after that call.


Example:


rd_kafka_conf_t *conf;
char errstr[512];

conf = rd_kafka_conf_new();
rd_kafka_conf_set(conf, "compression.codec", "snappy", errstr, sizeof(errstr));
rd_kafka_conf_set(conf, "batch.num.messages", "100", errstr, sizeof(errstr));

rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));


4. Threads and callbacks


librdkafka uses multiple threads internally to take full advantage of the hardware. The API is completely thread-safe, and the application may call any API function from any thread at any time without worrying about thread safety.


A poll-based API is used to feed signals back to the application: the application should call rd_kafka_poll() at regular intervals. The poll API invokes the following callbacks (all of them optional):


- Delivery report callback: reports messages that failed to be sent, allowing the application to react to the failure and to release any resources held while the message was in flight.


- Error callback: reports an error. Errors are generally informational (e.g., failure to connect to a broker), and the application usually does not need to act on them. The error is passed as an rd_kafka_resp_err_t enum value, covering both local and broker-reported errors.


The following optional callbacks are not triggered by the poll function and may be invoked from any thread:


- Logging callback: lets the application emit the log messages generated by librdkafka.

- Partitioner callback: the application-provided partitioner for messages. The partitioner may be called from any thread at any time, and it may be called repeatedly for the same key. It has the following constraints (a sketch follows the list):

  It must not call rd_kafka_*() or any other librdkafka functions

  It must not block or take excessively long to execute

  It must return a value between 0 and partition_cnt-1, or the special RD_KAFKA_PARTITION_UA value when partitioning cannot be performed
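
A sketch of a partitioner obeying these constraints; the byte-summing scheme is purely illustrative, not librdkafka's built-in partitioner:

#include <librdkafka/rdkafka.h>

/* Map a key onto [0, partition_cnt-1] by summing its bytes.
 * Does not block and calls no rd_kafka_*() functions. */
static int32_t my_partitioner(const rd_kafka_topic_t *rkt,
                              const void *keydata, size_t keylen,
                              int32_t partition_cnt,
                              void *rkt_opaque, void *msg_opaque) {
    const unsigned char *key = keydata;
    unsigned long sum = 0;
    size_t i;

    if (!keydata)  /* No key to partition on. */
        return RD_KAFKA_PARTITION_UA;

    for (i = 0; i < keylen; i++)
        sum += key[i];

    return (int32_t)(sum % (unsigned long)partition_cnt);
}

/* Registered on the topic configuration before rd_kafka_topic_new():
 *   rd_kafka_topic_conf_set_partitioner_cb(topic_conf, my_partitioner);
 */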


5. Brokers


librdkafka needs only an initial list of brokers (containing at least one broker). It connects to all brokers listed in "metadata.broker.list" or added with rd_kafka_brokers_add(), and then queries each of them for metadata: the full list of brokers, topics, partitions and their leader brokers in the Kafka cluster.


A broker name is given as "host:port", where the port is optional (default 9092) and the host is any resolvable hostname, IPv4 or IPv6 address. If the host resolves to multiple addresses, librdkafka will round-robin the connection attempts across them. A DNS record containing all broker addresses can therefore be used to provide a reliable bootstrap broker.
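
In practice, both bootstrap methods look like the following fragment, which reuses the conf, errstr and rk variables from the earlier sketches (host names are placeholders):

/* Option 1: via the configuration, before rd_kafka_new(): */
rd_kafka_conf_set(conf, "metadata.broker.list",
                  "broker1.example.com:9092,broker2.example.com",
                  errstr, sizeof(errstr));

/* Option 2: on a live handle; returns the number of brokers added.
 * The port may be omitted and defaults to 9092. */
if (rd_kafka_brokers_add(rk, "broker1.example.com:9092,broker2.example.com") == 0)
    fprintf(stderr, "No valid brokers specified\n");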



6. Producer API


After the rd_kafka_t object has been set up with type RD_KAFKA_PRODUCER, one or more rd_kafka_topic_t objects can be created, and the producer is ready to accept messages for sending to brokers.


The rd_kafka_produce() function takes the following arguments:

- rkt: the topic to produce to, created with the rd_kafka_topic_new() function described above

- partition: the partition to produce to; if set to RD_KAFKA_PARTITION_UA (unassigned), the configured partitioner function is used to select a target partition

- payload, len: the message payload

- msgflags: 0, or one of the following values:

  RD_KAFKA_MSG_F_COPY: librdkafka copies the message payload before sending, for when the payload lives in non-durable memory such as the stack.

  RD_KAFKA_MSG_F_FREE: librdkafka frees the message payload once it is done with it.

  These two flags are mutually exclusive; at most one should be set to indicate whether the payload should be copied or freed.

If RD_KAFKA_MSG_F_COPY is not set, no copy is made and librdkafka holds on to the payload pointer until the message has been delivered or has failed. The delivery report callback is invoked when librdkafka is done with the message, at which point the application regains ownership of the payload buffer. If RD_KAFKA_MSG_F_FREE was set, the application must not free the payload in the delivery report callback.

- key, keylen: an optional message key, which can be used for partitioning. It is passed to the topic's partitioner callback and, if present, is attached to the message sent to the broker.

- msg_opaque: an optional per-message opaque pointer, passed back to the delivery report callback, which lets the application reference a specific message.


rd_kafka_produce() is a non-blocking API: it enqueues the message in an internal queue and returns immediately. If the number of queued messages would exceed the "queue.buffering.max.messages" configuration property, rd_kafka_produce() returns -1 and sets errno to ENOBUFS, thereby providing a backpressure mechanism.
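
A minimal produce loop handling the ENOBUFS backpressure case; the payload and the poll timeout are illustrative:

#include <errno.h>
#include <string.h>
#include <librdkafka/rdkafka.h>

static void produce_one(rd_kafka_t *rk, rd_kafka_topic_t *rkt) {
    char buf[] = "hello kafka";

    /* RD_KAFKA_MSG_F_COPY: librdkafka copies the payload, so this
     * stack buffer may go out of scope right after the call. */
    while (rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA,
                            RD_KAFKA_MSG_F_COPY,
                            buf, strlen(buf),
                            NULL, 0,   /* no key */
                            NULL) == -1) {
        if (errno != ENOBUFS)
            break;              /* a real error, not just a full queue */
        /* Local queue is full: poll to serve delivery reports and drain it. */
        rd_kafka_poll(rk, 100);
    }
}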


Note: examples/rdkafka_performance.c provides a producer implementation.



7. Consumer API


The consumer API is more stateful than the producer API. After creating an rd_kafka_t object of type RD_KAFKA_CONSUMER and an rd_kafka_topic_t object, the application must call the rd_kafka_consume_start() function to start the consumer for a given partition.


The rd_kafka_consume_start() function takes the following arguments (an example follows the list):

- rkt: the topic to consume from, created with the rd_kafka_topic_new() function described above

- partition: the partition to consume from

- offset: the message offset at which to start consuming. This may be an absolute offset, RD_KAFKA_OFFSET_STORED to use the stored offset, or one of two special offsets: RD_KAFKA_OFFSET_BEGINNING, to consume from the beginning of the partition's message queue; or RD_KAFKA_OFFSET_END, to start with the next message produced to the partition (ignoring all currently existing messages).
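
For example, to start consuming partition 0 from the oldest available message, reusing the rkt variable from above (the partition number is illustrative):

if (rd_kafka_consume_start(rkt, 0, RD_KAFKA_OFFSET_BEGINNING) == -1)
    fprintf(stderr, "Failed to start consumer\n");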


Once the consumer for a topic+partition has been started, librdkafka will continually fetch messages from the broker, attempting to keep the local message queue filled to queued.min.messages.


The local message queue is served to the application through one of three consume APIs:

- rd_kafka_consume(): consumes a single message at a time

- rd_kafka_consume_batch(): batch-oriented, consumes one or more messages per call

- rd_kafka_consume_callback(): consumes all messages in the local queue, invoking a callback for each one


These are listed in ascending order of performance: rd_kafka_consume() is the slowest and rd_kafka_consume_callback() the fastest. Choose whichever variant fits your application's needs.


A consumed message, as provided or returned by each of the consume functions, is represented by an rd_kafka_message_t object.


rd_kafka_message_t members:

- err: error code. A non-zero value indicates that an error occurred; err is an rd_kafka_resp_err_t value. If 0, the message was properly fetched and payload contains the message data.

- rkt, partition: the topic and partition the message belongs to

- payload, len: the message payload, or the error message if err != 0

- key, key_len: an optional message key, as specified when the message was produced

- offset: the message's offset


payload and key, like the message itself, belong to librdkafka and must no longer be used after rd_kafka_message_destroy() has been called on the message. To avoid excessive copying, librdkafka keeps all payloads of the same message set in a shared receive buffer; this means that if the application holds on to a single rd_kafka_message_t object, it prevents that backing memory from being released.
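
A consume loop sketch tying these pieces together; the partition number and timeout are illustrative:

#include <stdio.h>
#include <librdkafka/rdkafka.h>

static void consume_loop(rd_kafka_topic_t *rkt) {
    rd_kafka_message_t *msg;

    /* Wait up to 1000 ms for a message on partition 0;
     * NULL is returned on timeout or error. */
    while ((msg = rd_kafka_consume(rkt, 0, 1000))) {
        if (msg->err)
            fprintf(stderr, "Consume error: %s\n",
                    rd_kafka_err2str(msg->err));
        else
            printf("Message at offset %lld, %zu bytes\n",
                   (long long)msg->offset, msg->len);

        /* Hand the message, and its slice of the shared receive
         * buffer, back to librdkafka. */
        rd_kafka_message_destroy(msg);
    }
}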


When the application is done consuming, it should call the rd_kafka_consume_stop() function to stop the consumer. This also purges any messages currently in the local queue.


Note: examples/rdkafka_performance.c implements a consumer.



8. Offset management


Offset management is available through a local offset file: the offset of each topic+partition is written to disk periodically, as controlled by the following topic configuration properties:

- auto.commit.enable

- auto.commit.interval.ms

- offset.store.path

- offset.store.sync.interval.ms


Offset management via ZooKeeper is not yet supported.
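
A sketch of enabling the file-based offset store, reusing the topic_conf and errstr variables from the reliability sketch above (the values and path are illustrative):

/* Commit the last-consumed offset to a local file every 60 seconds. */
rd_kafka_topic_conf_set(topic_conf, "auto.commit.enable", "true",
                        errstr, sizeof(errstr));
rd_kafka_topic_conf_set(topic_conf, "auto.commit.interval.ms", "60000",
                        errstr, sizeof(errstr));
rd_kafka_topic_conf_set(topic_conf, "offset.store.path",
                        "/var/lib/myapp/offsets", errstr, sizeof(errstr));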


9. Consumer groups


Consumer groups are not currently supported: the librdkafka consumer API mimics only the official Scala simple consumer. As long as the application provides the coordination itself, it can implement its own consumer group on top of librdkafka.



10. Topics

Topic auto-creation

Topic auto-creation is supported. The brokers must be configured with "auto.create.topics.enable=true".



IV. Other


Test Details:

Test 1: produce to 2 brokers, 2 partitions, required.acks=2, 100-byte messages

Each broker is the leader for one of the two partitions. The random partitioner is used (default), and each broker and partition is assigned approximately 250000 messages each.

Command:

# examples/rdkafka_performance -P -t test2 -s 100 -c 500000 -m "_____________Test1:TwoBrokers:500kmsgs:100bytes" -S 1 -a 2
....
% 500000 messages and 50000000 bytes sent in 587ms: 851531 msgs/s and 85.15 Mb/s, 0 messages failed, no compression

Result:

The message transfer rate is approximately 850000 messages per second, 85 megabytes per second.

Test 2: produce to one broker, one partition, required.acks=0, 100-byte messages

Command:

# examples/rdkafka_performance -P -t test2 -s 100 -c 500000 -m "_____________Test2:OneBrokers:500kmsgs:100bytes" -S 1 -a 0 -p 1
....
% 500000 messages and 50000000 bytes sent in 698ms: 715994 msgs/s and 71.60 Mb/s, 0 messages failed, no compression

Result:

The message transfer rate is approximately 710000 messages per second, 71 megabytes per second.

Test 3: produce to 2 brokers, 2 partitions, required.acks=2, 100-byte messages, snappy compression

Command:

# examples/rdkafka_performance -P -t test2 -s 100 -c 500000 -m "_____________Test3:TwoBrokers:500kmsgs:100bytes:snappy" -S 1 -a 2 -z snappy
....
% 500000 messages and 50000000 bytes sent in 1672ms: 298915 msgs/s and 29.89 Mb/s, 0 messages failed, snappy compression

Result:

The message transfer rate is approximately 300000 messages per second, 30 megabytes per second.

Test 4: produce to 2 brokers, 2 partitions, required.acks=2, 100-byte messages, gzip compression

Command:

# examples/rdkafka_performance -P -t test2 -s 100 -c 500000 -m "_____________Test3:TwoBrokers:500kmsgs:100bytes:gzip" -S 1 -a 2 -z gzip
....
% 500000 messages and 50000000 bytes sent in 2111ms: 236812 msgs/s and 23.68 Mb/s, 0 messages failed, gzip compression

Result:

The message transfer rate is approximately 230000 messages per second, 23 megabytes per second.

