Kafka Offset Storage

Source: Internet
Author: User

1. Overview

As of the latest release on the official Kafka website (0.10.1.1), consumer offsets are stored by default in a Kafka topic named __consumer_offsets. Storing offsets in a topic has in fact been supported since version 0.8.2.2, but at that time the default was to store them in the Zookeeper cluster. Today the official default is to store offsets in the Kafka topic, while the Zookeeper storage interface is retained; the choice is made with the offsets.storage property.
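As a rough illustration of the offsets.storage switch, here is a minimal sketch of the properties one might pass to the old Zookeeper-based high-level consumer. The property names are the ones I understand that consumer to accept; the class name, method name, host, and group ID are placeholders invented for this example.

```java
import java.util.Properties;

/** Sketch only: builds settings that select where the old high-level consumer commits offsets. */
class OffsetStorageConfig {

    /** Returns consumer properties that commit offsets to Kafka instead of Zookeeper. */
    static Properties kafkaOffsetStorage(String zkConnect, String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zkConnect);
        props.put("group.id", groupId);
        // "kafka" commits to the __consumer_offsets topic; "zookeeper" keeps the older behaviour.
        props.put("offsets.storage", "kafka");
        // While migrating an existing group, commit to both stores; set to "false" afterwards.
        props.put("dual.commit.enabled", "true");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address and group ID, for illustration only.
        Properties props = kafkaOffsetStorage("localhost:2181", "demo-group");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The sketch only assembles the settings; creating a consumer from them is left out so that the example runs without a Kafka dependency.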

2. Content

There is a reason behind the official recommendation. Earlier versions of Kafka had a fairly serious weakness: Zookeeper was used to record the consumption progress of every consumer/group. Although the JVM performs some optimizations for us at runtime, consumers still have to interact with Zookeeper frequently, and frequent writes to Zookeeper through the ZkClient API are inherently inefficient and make later horizontal scaling painful. If the Zookeeper cluster changes during this time, the throughput of the Kafka cluster suffers as well.

For this reason, the idea of moving offsets into Kafka was introduced officially quite early. Previously, however, offsets were stored in the Zookeeper cluster by default and the switch had to be made manually, so users who were not very familiar with Kafka generally accepted the default (storage in ZK). In newer Kafka versions, consumer offsets are stored by default in a topic called __consumer_offsets in the Kafka cluster.

The implementation will look familiar: it uses one of Kafka's own topics, with the combination of consumer group, topic, and partition as the message key. All consumer offsets are committed to this topic. Because this data is so important that losing it cannot be tolerated, the acks level for these messages is set to -1, so the producer receives an ACK only after all replicas in the ISR have received the message (data safety is excellent, at some cost in speed). Kafka also keeps a (group, topic, partition) triple in memory to track the latest offset information, so when a consumer asks for the latest offset it is served directly from memory.
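The idea of an in-memory view keyed by the (group, topic, partition) triple can be sketched as follows. This is not Kafka's internal code; the class and method names are hypothetical, and the cache is simply a concurrent map that each commit overwrites.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only: latest committed offset per (group, topic, partition), held in memory. */
class OffsetCacheSketch {

    /** Hypothetical key type combining group, topic, and partition. */
    static final class Key {
        final String group;
        final String topic;
        final int partition;

        Key(String group, String topic, int partition) {
            this.group = group;
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return partition == k.partition && group.equals(k.group) && topic.equals(k.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(group, topic, partition);
        }
    }

    private final Map<Key, Long> latest = new ConcurrentHashMap<>();

    /** Records a commit; a later commit for the same key replaces the earlier one. */
    void onCommit(String group, String topic, int partition, long offset) {
        latest.put(new Key(group, topic, partition), offset);
    }

    /** Returns the latest committed offset, or -1 if nothing has been committed for the key. */
    long latestOffset(String group, String topic, int partition) {
        return latest.getOrDefault(new Key(group, topic, partition), -1L);
    }

    public static void main(String[] args) {
        OffsetCacheSketch cache = new OffsetCacheSketch();
        cache.onCommit("demo-group", "demo-topic", 0, 42L);
        System.out.println(cache.latestOffset("demo-group", "demo-topic", 0));
    }
}
```

Reads never touch the log in this sketch, which is the point of keeping the triple in memory: a fetch of the latest offset is a single map lookup.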

3. Implementation

So how do we obtain these consumer offsets? We can define a map in memory to hold the offsets captured as they are consumed, as follows:

protected static Map<GroupTopicPartition, OffsetAndMetadata> offsetMap = new ConcurrentHashMap<>();

We then update the in-memory map with a listener thread, as shown in the following code:

private static synchronized void startOffsetListener(ConsumerConnector consumerConnector) {
    // Subscribe to the offsets topic with a single stream.
    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put(consumerOffsetTopic, new Integer(1));
    KafkaStream<byte[], byte[]> offsetMsgStream = consumerConnector.createMessageStreams(topicCountMap).get(consumerOffsetTopic).get(0);
    ConsumerIterator<byte[], byte[]> it = offsetMsgStream.iterator();
    while (true) {
        MessageAndMetadata<byte[], byte[]> offsetMsg = it.next();
        // The first two bytes of the key hold its version; only versions below 2 are handled here.
        if (ByteBuffer.wrap(offsetMsg.key()).getShort() < 2) {
            try {
                GroupTopicPartition commitKey = readMessageKey(ByteBuffer.wrap(offsetMsg.key()));
                // Skip messages that carry no value.
                if (offsetMsg.message() == null) {
                    continue;
                }
                OffsetAndMetadata commitValue = readMessageValue(ByteBuffer.wrap(offsetMsg.message()));
                offsetMap.put(commitKey, commitValue);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
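The readMessageKey helper used by the listener is not shown in the post. The sketch below is a hypothetical stand-in, assuming the key is laid out as a 2-byte version, then the group and topic each as a 2-byte length followed by UTF-8 bytes, then a 4-byte partition number. The CommitKey type and the encode helper are invented here so the parsing can be tried without a broker.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch only: decodes an offset-commit key under the layout assumed in the text above. */
class OffsetKeySketch {

    /** Hypothetical holder for the decoded (group, topic, partition) key. */
    static final class CommitKey {
        final String group;
        final String topic;
        final int partition;

        CommitKey(String group, String topic, int partition) {
            this.group = group;
            this.topic = topic;
            this.partition = partition;
        }
    }

    /** Reads a string stored as a 2-byte length followed by that many UTF-8 bytes. */
    static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getShort()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns the decoded key, or null when the version is 2 or higher. */
    static CommitKey readMessageKey(ByteBuffer buf) {
        short version = buf.getShort();
        if (version >= 2) {
            return null; // mirrors the "< 2" version check in the listener
        }
        String group = readString(buf);
        String topic = readString(buf);
        int partition = buf.getInt();
        return new CommitKey(group, topic, partition);
    }

    /** Builds a key in the assumed layout, only so the sketch can be exercised locally. */
    static byte[] encode(short version, String group, String topic, int partition) {
        byte[] g = group.getBytes(StandardCharsets.UTF_8);
        byte[] t = topic.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 + 2 + g.length + 2 + t.length + 4);
        buf.putShort(version).putShort((short) g.length).put(g)
           .putShort((short) t.length).put(t).putInt(partition);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] raw = encode((short) 1, "demo-group", "demo-topic", 3);
        CommitKey key = readMessageKey(ByteBuffer.wrap(raw));
        System.out.println(key.group + " / " + key.topic + " / " + key.partition);
    }
}
```

If the real key format differs from this assumption, only readString and readMessageKey would need to change; the listener loop itself stays the same.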

Once we have the updated offset data, we can expose it over RPC so that a client can fetch and visualize it. The RPC interface looks like this:

namespace java org.smartloli.kafka.eagle.ipc

service KafkaOffsetServer {
    string query(1:string group, 2:string topic, 3:i32 partition),
    string getOffset(),
    string sql(1:string sql),
    string getConsumer(),
    string getActiverConsumer()
}

If we do not want to write a dedicated interface for working with offsets, we can instead query the consumer offset data set with SQL, as follows:

    • Add the dependency jar
<dependency>
    <groupId>org.smartloli</groupId>
    <artifactId>jsql-client</artifactId>
    <version>1.0.0</version>
</dependency>
    • Call the interface
JSqlUtils.query(tabSchema, tableName, dataSets, sql);

tabSchema: table structure; tableName: table name; dataSets: data set; sql: the SQL statement to run.

4. Preview

[Figure: consumer overview]

[Figure: topics currently being consumed]

[Figure: detailed consumer offsets]

[Figure: consumption and production rate chart]

5. Summary

Note that the consumer thread ID is not recorded when an offset is written to the Kafka topic. However, having read the rules by which Kafka composes consumer thread IDs, we can generate the ID ourselves as: group + consumerLocalAddress + timestamp + UUID (first 8 characters). Finally, you are welcome to try the Kafka cluster monitoring tool [Kafka Eagle]; see its [operation manual].
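The composition rule for the consumer thread ID described above can be sketched as follows. The four parts come from the rule in the text; the "_" and "-" separators, the class and method names, and taking the first 8 hex characters of the UUID string are assumptions made for this example.

```java
import java.util.UUID;

/** Sketch only: assembles an ID from group, local address, timestamp, and a short UUID. */
class ConsumerIdSketch {

    /** Joins the four parts; the separators used here are an assumption, not taken from the post. */
    static String consumerId(String group, String localAddress, long timestampMs, UUID uuid) {
        // Keep only the first 8 hex characters of the UUID.
        String shortUuid = uuid.toString().replace("-", "").substring(0, 8);
        return group + "_" + localAddress + "-" + timestampMs + "-" + shortUuid;
    }

    public static void main(String[] args) {
        // Placeholder group and host name, for illustration only.
        String id = consumerId("demo-group", "demo-host", System.currentTimeMillis(), UUID.randomUUID());
        System.out.println(id);
    }
}
```

Passing the timestamp and UUID in as arguments keeps the method deterministic, which makes it easy to check against IDs observed in a running cluster.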

6. Concluding remarks

That is all for this post. If you run into problems while studying this material, you can join the discussion group or send me an e-mail, and I will do my best to answer. Let us encourage each other!
