Installation:
Download https://github.com/edenhill/librdkafka
Environment preparation:
The GNU Toolchain
GNU make
pthreads
zlib (optional, for gzip compression support)
libssl-dev (optional, for SSL and SASL SCRAM support)
libsasl2-dev (optional, for SASL GSSAPI support)
Compile and install:
./configure
make
sudo make install
Starting the server side
Download: https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz Pay attention to the version selection; this article uses the release version.
The release version is described at http://kafka.apache.org/quickstart.
Start the ZooKeeper server:
bin/zookeeper-server-start.sh config/zookeeper.properties &
Check the use of port 2181 (with netstat, for example):
We can see that ZooKeeper is now started.
Start the Kafka server:
bin/kafka-server-start.sh config/server.properties &
From the configuration file we can see that the service occupies port 9092:
The operations below could of course also be done with the various scripts in the bin directory, but this article is mainly about using the library directly. Usage description: How to use the producer:
Create a Kafka client configuration placeholder:
conf = rd_kafka_conf_new(); creates a configuration object (rd_kafka_conf_t). The brokers are then configured with rd_kafka_conf_set().
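A minimal sketch of this step (C API; the broker address is an assumption matching the server started above):

char errstr[512];
rd_kafka_conf_t *conf = rd_kafka_conf_new();
/* configure the initial broker list; "localhost:9092" is an assumption */
if (rd_kafka_conf_set(conf, "metadata.broker.list", "localhost:9092",
                      errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
    fprintf(stderr, "%% %s\n", errstr);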
Set the delivery report callback:
The callback reports the success or failure of message delivery. It is set with rd_kafka_conf_set_dr_msg_cb(conf, dr_msg_cb);
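For example, a delivery report callback might look like this (a sketch; the callback is invoked from rd_kafka_poll()):

static void dr_msg_cb (rd_kafka_t *rk,
                       const rd_kafka_message_t *rkmessage,
                       void *opaque) {
    if (rkmessage->err)
        fprintf(stderr, "%% Delivery failed: %s\n",
                rd_kafka_err2str(rkmessage->err));
    else
        fprintf(stderr, "%% Delivered %zu bytes to partition %d\n",
                rkmessage->len, (int)rkmessage->partition);
}

/* register it on the configuration object: */
rd_kafka_conf_set_dr_msg_cb(conf, dr_msg_cb);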
Create a producer instance:
1) Initialize:
The application needs to initialize a top-level object (rd_kafka_t), the underlying container, for global configuration and shared state.
It is created by calling rd_kafka_new(). Once created, the instance takes ownership of the conf object, so the conf object must not be reused after the rd_kafka_new() call, and the configuration resource does not need to be released after the call.
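A sketch, continuing from the conf object above:

char errstr[512];
/* rd_kafka_new() takes ownership of conf; do not reuse or free it */
rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                              errstr, sizeof(errstr));
if (!rk)
    fprintf(stderr, "%% Failed to create producer: %s\n", errstr);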
2) Create topic:
The topic object created is reusable (the producer instantiation object (rd_kafka_t) may also be reused, so there is no need to create them frequently).
Instantiate one or more topics (rd_kafka_topic_t) for production or consumption.
The topic object holds topic-level properties and maintains a map of all available partitions and their leader brokers.
It is created by calling rd_kafka_topic_new() (rd_kafka_topic_new(rk, topic, NULL);).
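For example (the topic name is an assumption; NULL selects the default topic configuration):

/* create a reusable topic handle; NULL = default topic configuration */
rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, "helloworld_kugou", NULL);
/* ... produce to it as shown below ... */
rd_kafka_topic_destroy(rkt);  /* release the handle when done */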
Note: Both rd_kafka_t and rd_kafka_topic_t come with an optional configuration API.
Not using that API causes librdkafka to use the default configuration listed in the CONFIGURATION.md document.
3) Producer API:
With the producer handle and one or more rd_kafka_topic_t objects set up, you are ready to accept messages, assemble them, and send them to the broker.
The rd_kafka_produce() function accepts the following parameters:
rkt: the topic to produce to, previously created with rd_kafka_topic_new()
partition: the partition to produce to. If set to RD_KAFKA_PARTITION_UA (unassigned), the builtin partitioner is used to determine the partition. Kafka calls the partitioner callback for a balanced selection; the partitioner method can be implemented by yourself, e.g. round-robin or hashing a passed-in key. If none is implemented, the default random partitioner rd_kafka_msg_partitioner_random is used.
You can design the partition value yourself through a partitioner.
msgflags: 0 or one of the following values:
RD_KAFKA_MSG_F_COPY means librdkafka makes a copy of the payload immediately, before the message is sent. Use this flag if the payload lives in unstable storage such as the stack; copying the data up front avoids having to keep the message body's buffer alive for a long time.
RD_KAFKA_MSG_F_FREE means librdkafka releases the payload with free(3) when it is done with it, i.e. the message buffer is freed after the message has been used.
The two flags are mutually exclusive; if neither is set, the payload is neither copied nor freed by librdkafka.
If the RD_KAFKA_MSG_F_COPY flag is not set, no copy of the data is made and librdkafka holds the payload pointer (the message body) until the message has been sent or has failed. When librdkafka has finished processing the message, it invokes the delivery report callback, letting the application regain ownership of the payload.
If RD_KAFKA_MSG_F_FREE is set, the application must not free the payload in the delivery report callback.
payload, len: the message payload (value) and the message length
key, keylen: an optional message key and its length, used for partitioning. It will be passed to the topic partitioner callback, if any, and will be attached to the message sent to the broker.
msg_opaque: an optional untyped pointer the application provides for each message; it is passed to the delivery report callback for application reference.
rd_kafka_produce() is a non-blocking API: it enqueues the message on an internal queue and returns immediately.
If the number of queued messages exceeds the value of the queue.buffering.max.messages property, rd_kafka_produce() signals an error by returning -1 and setting errno to the ENOBUFS error code.
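Putting these parameters together, a sketch of a single produce call (the payload contents are an assumption; the usual string.h/errno.h headers are assumed):

const char *payload = "hello kafka";
if (rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA,   /* let the partitioner pick */
                     RD_KAFKA_MSG_F_COPY,          /* copy the stack payload */
                     (void *)payload, strlen(payload),
                     NULL, 0,                      /* no key */
                     NULL) == -1) {                /* no msg_opaque */
    if (errno == ENOBUFS) {
        /* queue full: serve delivery reports, then the caller may retry */
        rd_kafka_poll(rk, 100);
    }
}
rd_kafka_poll(rk, 0); /* serve queued delivery report callbacks */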
Hint: see examples/rdkafka_performance.c for producer usage.
How to use the consumer:
The consumer API is more stateful than the producer API. The rd_kafka_t object is created with the RD_KAFKA_CONSUMER type (the function parameter set when rd_kafka_new() is called), and brokers are added to this new Kafka handle (rk) by calling rd_kafka_brokers_add(rk, brokers).
After the rd_kafka_topic_t object has been created, rd_kafka_query_watermark_offsets() can be used to query a partition's low and high watermark offsets.
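A sketch of this consumer-side initialization (conf created as before; the broker address and topic name are assumptions):

char errstr[512];
rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_CONSUMER, conf,
                              errstr, sizeof(errstr));
rd_kafka_brokers_add(rk, "localhost:9092");

/* optionally query a partition's low/high watermark offsets */
int64_t low, high;
rd_kafka_query_watermark_offsets(rk, "helloworld_kugou", 0,
                                 &low, &high, 1000 /* timeout ms */);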
Create topic:
rkt = rd_kafka_topic_new(rk, topic, topic_conf)
Start consuming:
Call the rd_kafka_consume_start() function (rd_kafka_consume_start(rkt, partition, start_offset)) to start consuming the given partition; a sketch follows the parameter list below.
The parameters of rd_kafka_consume_start() are as follows:
rkt: the topic to consume, previously created with rd_kafka_topic_new().
partition: the partition to consume.
offset: the message offset at which consumption begins. This can be an absolute value or one of the following special offsets:
RD_KAFKA_OFFSET_BEGINNING consumes from the beginning of the partition's queue (the oldest message).
RD_KAFKA_OFFSET_END starts consuming from the next message produced to the partition.
RD_KAFKA_OFFSET_STORED uses the offset store.
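The sketch mentioned above, starting partition 0 at the oldest message:

if (rd_kafka_consume_start(rkt, 0, RD_KAFKA_OFFSET_BEGINNING) == -1)
    fprintf(stderr, "%% Failed to start consuming: %s\n",
            rd_kafka_err2str(rd_kafka_last_error()));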
When consumption of a topic+partition has been started, librdkafka continuously fetches messages from the broker in batches, trying to keep queued.min.messages messages in the local queue.
The local message queue serves the application through three different consumer APIs:
rd_kafka_consume() - consumes a single message
rd_kafka_consume_batch() - consumes one or more messages
rd_kafka_consume_callback() - consumes all messages in the local queue and invokes a callback for each one
These three APIs are listed in order of increasing performance: rd_kafka_consume() is the slowest and rd_kafka_consume_callback() the fastest. The different types meet the needs of different applications.
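For instance, a sketch of the batch variant (handle_msg is a hypothetical per-message handler, not a librdkafka function):

rd_kafka_message_t *msgs[100];
ssize_t n = rd_kafka_consume_batch(rkt, partition, 1000 /* timeout ms */,
                                   msgs, 100);
for (ssize_t i = 0; i < n; i++) {
    handle_msg(msgs[i]);               /* hypothetical handler */
    rd_kafka_message_destroy(msgs[i]); /* each message must be destroyed */
}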
A message consumed through these functions is returned as the rd_kafka_message_t type.
Members of rd_kafka_message_t:
* err - error signal back to the application. If non-zero, the payload field should be regarded as an error message, and err is an error code (rd_kafka_resp_err_t).
* rkt, partition - the topic and partition of the message, or of the error.
* payload, len - the message data, or the error message when err != 0.
* key, key_len - optional message key, as specified by the producer.
* offset - the message offset.
The memory for payload and key, and in fact the entire message, is owned by librdkafka and must not be used after rd_kafka_message_destroy() has been called.
To avoid redundant copies of message sets, librdkafka shares the same message-set memory buffer among all messages received from it, which means that if the application retains a single rd_kafka_message_t, it blocks that memory from being freed and reused for the other messages of the same message set.
When the application is done consuming messages from a topic+partition, it should call rd_kafka_consume_stop() to stop the consumer. This also purges all messages currently in the local queue.
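A sketch of a complete poll loop with rd_kafka_consume(), tying these rules together (run is an assumed loop flag):

while (run) {
    rd_kafka_message_t *msg = rd_kafka_consume(rkt, partition, 1000);
    if (!msg)
        continue;                /* timed out, no message */
    if (msg->err)
        fprintf(stderr, "%% %s\n", rd_kafka_message_errstr(msg));
    else
        printf("offset %lld: %.*s\n", (long long)msg->offset,
               (int)msg->len, (const char *)msg->payload);
    rd_kafka_message_destroy(msg);   /* release the message-set memory */
}
rd_kafka_consume_stop(rkt, partition); /* also purges the local queue */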
Hint: see examples/rdkafka_performance.c for consumer usage.
The server.properties configuration of the Kafka broker (parameter log.dirs=/data2/logs/kafka/) makes topics written to the message queue be stored in that directory in the form of partitions. Each partition is composed of segment files, and each segment consists of two parts: an index file and a data file. The two files correspond one-to-one and come in pairs; the suffixes ".index" and ".log" denote the segment index file and the segment data file respectively.
Segment file naming rule: a partition's first segment starts from 0, and each subsequent segment file is named after the offset of the last message in the previous segment file. The value is at most 64 bits long, 19 digits in character length, with unused digits padded with zeros. A specific example:
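For example, a partition directory might contain segment files such as (the offset values here are illustrative):
00000000000000000000.index
00000000000000000000.log
00000000000000368769.index
00000000000000368769.log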
This article uses the C++ API; compared with the description above, only the functions used differ, the business logic is the same.
In the producer, PARTITION_UA can be used directly, but at consume time the partition cannot be PARTITION_UA, because that value is actually -1, which is meaningless on the consumer side. From the source code we can see that when no partitioner is specified there is in fact a default partitioner, the consistent-random partitioner: a consistent hash that maps the keyword to a specific partition.
Function Prototypes:
rd_kafka_msg_partitioner_consistent_random (
        const rd_kafka_topic_t *rkt,
        const void *key, size_t keylen,
        int32_t partition_cnt,
        void *opaque, void *msg_opaque);
PARTITION_UA actually means "unassigned partition", i.e. a partition that has not been assigned. RD_KAFKA_PARTITION_UA (unassigned) makes the topic's partitioner function choose the partition automatically; of course, a fixed partition value can also be given directly.
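A C-API sketch of such a custom partitioner, using the djb hash that appears later in this article (a production partitioner should also check availability with rd_kafka_topic_partition_available()):

static int32_t my_partitioner (const rd_kafka_topic_t *rkt,
                               const void *key, size_t keylen,
                               int32_t partition_cnt,
                               void *rkt_opaque, void *msg_opaque) {
    const char *s = (const char *)key;
    unsigned int hash = 5381;               /* djb hash */
    for (size_t i = 0; i < keylen; i++)
        hash = ((hash << 5) + hash) + s[i];
    return (int32_t)(hash % (unsigned int)partition_cnt);
}

/* register it on a topic configuration before rd_kafka_topic_new(): */
rd_kafka_topic_conf_t *tconf = rd_kafka_topic_conf_new();
rd_kafka_topic_conf_set_partitioner_cb(tconf, my_partitioner);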
In the configuration file config/server.properties you can set the number of partitions, num.partitions.
Assigning partitions
Be careful when assigning partitions. For a topic whose partition has already been created and specified, modifying the partitioner section of the producer code afterwards and introducing a key value to redistribute across partitions does not work: messages keep being appended to the previous partition (the previous partition being partition 0, the only one). If you inspect partition_cnt in the program at this point, you can see that its value has not changed as a result of modifying config/server.properties, because that partition_cnt still refers to the topic as it was originally created.
If the partition_cnt in the code is used to calculate the partition value, djb_hash(key->c_str(), key->size()) % 5 gives the following result: a hint that the partition does not exist.
We can check the number of partitions of a topic with rdkafka_example:
./rdkafka_example -L -t helloworld_kugou -b localhost:9092
From the output we can see that the helloworld_kugou topic has only one partition, while the helloworld_kugou1 topic has 5 partitions, which matches what we expected.
For a topic that has already been created, we can modify its partition count:
./bin/kafka-topics.sh --zookeeper 127.0.0.1:2181 --alter --partitions 5 --topic helloworld_kugou
After the modification, we can see that helloworld_kugou now has 5 partitions.
A specific example:
Create a topic helloworld_kugou_test with 5 partitions. As we can see, there are already 5 partition directories in the preconfigured log directory before anything is produced on the producer side:
Producer-side code:
/* Assumes the globals of examples/rdkafka_example.cpp, e.g.:
 * static bool run = true; */
class ExampleDeliveryReportCb : public RdKafka::DeliveryReportCb {
public:
  void dr_cb (RdKafka::Message &message) {
    std::cout << "Message delivery for (" << message.len() << " bytes): "
              << message.errstr() << std::endl;
    if (message.key())
      std::cout << "Key: " << *(message.key()) << ";" << std::endl;
  }
};

class ExampleEventCb : public RdKafka::EventCb {
public:
  void event_cb (RdKafka::Event &event) {
    switch (event.type()) {
      case RdKafka::Event::EVENT_ERROR:
        std::cerr << "ERROR (" << RdKafka::err2str(event.err()) << "): "
                  << event.str() << std::endl;
        if (event.err() == RdKafka::ERR__ALL_BROKERS_DOWN)
          run = false;
        break;

      case RdKafka::Event::EVENT_STATS:
        std::cerr << "\"STATS\": " << event.str() << std::endl;
        break;

      case RdKafka::Event::EVENT_LOG:
        fprintf(stderr, "LOG-%i-%s: %s\n",
                event.severity(), event.fac().c_str(), event.str().c_str());
        break;

      default:
        std::cerr << "EVENT " << event.type()
                  << " (" << RdKafka::err2str(event.err()) << "): "
                  << event.str() << std::endl;
        break;
    }
  }
};

/* Use of this partitioner is pretty pointless since no key is provided
 * in the produce() call, so when you need it, input your key. */
class MyHashPartitionerCb : public RdKafka::PartitionerCb {
public:
  int32_t partitioner_cb (const RdKafka::Topic *topic, const std::string *key,
                          int32_t partition_cnt, void *msg_opaque) {
    std::cout << "partition_cnt=" << partition_cnt << std::endl;
    return djb_hash(key->c_str(), key->size()) % partition_cnt;
  }

private:
  static inline unsigned int djb_hash (const char *str, size_t len) {
    unsigned int hash = 5381;
    for (size_t i = 0; i < len; i++)
      hash = ((hash << 5) + hash) + str[i];
    std::cout << "hash1=" << hash << std::endl;
    return hash;
  }
};
When validating on the consumer side, we find that different partitions do receive different data. The result is as follows:
Consumer-side code:
void msg_consume (RdKafka::Message *message, void *opaque) {
  switch (message->err()) {
    case RdKafka::ERR__TIMED_OUT:
      break;

    case RdKafka::ERR_NO_ERROR:
      /* Real message */
      std::cout << "Read msg at offset " << message->offset() << std::endl;
      if (message->key()) {
        std::cout << "Key: " << *message->key() << std::endl;
      }
      printf("%.*s\n", static_cast<int>(message->len()),
             static_cast<const char *>(message->payload()));
      break;

    case RdKafka::ERR__PARTITION_EOF:
      /* Last message */
      if (exit_eof) {
        run = false;
      }
      break;

    case RdKafka::ERR__UNKNOWN_TOPIC:
    case RdKafka::ERR__UNKNOWN_PARTITION:
      std::cerr << "Consume failed: " << message->errstr() << std::endl;
      run = false;
      break;

    default:
      /* Errors */
      std::cerr << "Consume failed: " << message->errstr() << std::endl;
      run = false;
  }
}

class ExampleConsumeCb : public RdKafka::ConsumeCb {
public:
  void consume_cb (RdKafka::Message &msg, void *opaque) {
    msg_consume(&msg, opaque);
  }
};

void TestConsumer () {
  std::string brokers = "localhost";
  std::string errstr;
  std::string topic_str = "helloworld_kugou_test"; // helloworld_kugou
  MyHashPartitionerCb hash_partitioner;
  // Why can't RdKafka::Topic::PARTITION_UA be used here? On the consumer
  // side only a concrete partition can be given; it is not assigned
  // automatically...
  int32_t partition = RdKafka::Topic::PARTITION_UA;
  partition = 3;
  int64_t start_offset = RdKafka::Topic::OFFSET_BEGINNING;
  bool do_conf_dump = false;
  int opt;
  int use_ccb = 0;

  /* Create configuration objects */
  RdKafka::Conf *conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);
  RdKafka::Conf *tconf = RdKafka::Conf::create(RdKafka::Conf::CONF_TOPIC);

  if (tconf->set("partitioner_cb", &hash_partitioner, errstr)
      != RdKafka::Conf::CONF_OK) {
    std::cerr << errstr << std::endl;
    exit(1);
  }

  /*
   * Set configuration properties
   */
  conf->set("metadata.broker.list", brokers, errstr);

  ExampleEventCb ex_event_cb;
  conf->set("event_cb", &ex_event_cb, errstr);

  ExampleDeliveryReportCb ex_dr_cb;
  /* Set delivery report callback */
  conf->set("dr_cb", &ex_dr_cb, errstr);

  /*
   * Create consumer using accumulated global configuration.
   */
  RdKafka::Consumer *consumer = RdKafka::Consumer::create(conf, errstr);
  if (!consumer) {
    std::cerr << "Failed to create consumer: " << errstr << std::endl;
    exit(1);
  }
  std::cout << "% Created consumer " << consumer->name() << std::endl;

  /*
   * Create topic handle.
   */
  RdKafka::Topic *topic = RdKafka::Topic::create(consumer, topic_str,
                                                 tconf, errstr);
  if (!topic) {
    std::cerr << "Failed to create topic: " << errstr << std::endl;
    exit(1);
  }

  /*
   * Start consumer for topic+partition at start offset.
   */
  RdKafka::ErrorCode resp = consumer->start(topic, partition, start_offset);
  if (resp != RdKafka