Kafka Offset Storage

Last Update: 2017-01-09 Source: Internet

Author: User

1. Overview

At the time of writing, the latest Kafka release on the official website (0.10.1.1) stores consumer offsets by default in an internal Kafka topic named __consumer_offsets. Storing offsets in a topic was in fact already supported in version 0.8.2.2, but the default then was to store consumer offsets in the ZooKeeper cluster. Kafka now stores them in a Kafka topic by default, while still keeping the option of storing them in ZooKeeper; for the older high-level consumer, the choice is made with the offsets.storage property.
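As a minimal sketch, assuming the older high-level consumer, the properties below switch its offset store to Kafka. The ZooKeeper address and group name are placeholders, `OffsetStorageConfig` is a hypothetical helper, and `dual.commit.enabled` (which additionally commits offsets to ZooKeeper, useful while migrating) is included only to show the migration option. The resulting `Properties` would then be passed to `kafka.consumer.ConsumerConfig`, which is not shown here.

```java
import java.util.Properties;

public class OffsetStorageConfig {

    /**
     * Builds properties for the old high-level consumer that commit offsets
     * to the __consumer_offsets topic instead of ZooKeeper.
     */
    public static Properties kafkaOffsetStorage(String zkConnect, String groupId, boolean migrating) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zkConnect);   // the old consumer still uses ZooKeeper for group membership
        props.put("group.id", groupId);
        props.put("offsets.storage", "kafka");       // the other accepted value is "zookeeper"
        // While migrating away from ZooKeeper-based offsets, also commit to ZooKeeper
        // so that consumers still reading from ZooKeeper see the same progress.
        props.put("dual.commit.enabled", Boolean.toString(migrating));
        return props;
    }

    public static void main(String[] args) {
        Properties props = kafkaOffsetStorage("localhost:2181", "demo-group", true);
        System.out.println(props.getProperty("offsets.storage")); // prints "kafka"
    }
}
```

Once every consumer in the group reads offsets from Kafka, `dual.commit.enabled` can be turned off.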

2. Content

There is a good reason for this recommendation. Earlier Kafka versions had a fairly serious hidden weakness: ZooKeeper was used to record the consumption progress of every consumer and consumer group. Although the JVM performs some optimizations for us at runtime, consumers must interact with ZooKeeper frequently, and frequent writes to ZooKeeper through the ZkClient API are inherently inefficient. This also makes horizontal scaling a headache. And if the ZooKeeper cluster changes in the meantime, the throughput of the Kafka cluster suffers as well.

In fact, Kafka introduced offset storage in its own topics quite early; it simply was not the default. Offsets were stored in the ZooKeeper cluster unless you configured otherwise, and users who were not very familiar with Kafka generally accepted the default (ZooKeeper). In newer Kafka versions, consumer offsets are stored by default in a topic called __consumer_offsets inside the Kafka cluster.

The underlying mechanism will feel familiar: Kafka uses one of its own topics, with the combination of group, topic, and partition as the message key, and all consumer offsets are committed to that topic. Because these messages are too important to lose, the acks level for them is set to -1 (all): the producer receives an ACK only after every replica in the ISR has received the message. Data safety is excellent, at some cost in speed. Kafka also keeps the latest offset for each (group, topic, partition) triple in memory, so when a consumer asks for its latest offset, the value is served directly from memory.
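Conceptually, that in-memory view behaves like a map from the (group, topic, partition) triple to the most recent commit: each new commit for a key overwrites the previous one, and a commit with a null value (a tombstone) removes the entry. The sketch below models this with plain Java. `LatestOffsetCache` and `GroupTopicPartitionKey` are hypothetical names, not Kafka's internal classes, and each commit is reduced to a single `long` offset.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class LatestOffsetCache {

    /** Hypothetical key type: the (group, topic, partition) triple. */
    public static final class GroupTopicPartitionKey {
        final String group;
        final String topic;
        final int partition;

        public GroupTopicPartitionKey(String group, String topic, int partition) {
            this.group = group;
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof GroupTopicPartitionKey)) return false;
            GroupTopicPartitionKey k = (GroupTopicPartitionKey) o;
            return partition == k.partition && group.equals(k.group) && topic.equals(k.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(group, topic, partition);
        }
    }

    // Safe for one writer thread (replaying commits) and many reader threads.
    private final Map<GroupTopicPartitionKey, Long> latest = new ConcurrentHashMap<>();

    /** Applies one commit record; a null offset models a tombstone (deleted offset). */
    public void apply(String group, String topic, int partition, Long offset) {
        GroupTopicPartitionKey key = new GroupTopicPartitionKey(group, topic, partition);
        if (offset == null) {
            latest.remove(key);
        } else {
            latest.put(key, offset); // a later commit overwrites an earlier one
        }
    }

    /** Returns the latest committed offset, or -1 if none is known. */
    public long latestOffset(String group, String topic, int partition) {
        Long offset = latest.get(new GroupTopicPartitionKey(group, topic, partition));
        return offset == null ? -1L : offset;
    }
}
```

Replaying commits of 10 and then 42 for the same triple leaves 42 as the answer, which is the same "last write wins" behavior a compacted offsets topic relies on.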

3. Implementation

So how do we obtain these consumer offsets? We can define a map in memory to hold the offsets read from the offsets topic, as follows:

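// Latest committed offset for each (group, topic, partition), shared between threads
protected static final Map<GroupTopicPartition, OffsetAndMetadata> offsetMap = new ConcurrentHashMap<>();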

We then update the in-memory map with a listener thread, as shown in the following code:

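import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

// consumerOffsetTopic ("__consumer_offsets"), offsetMap, readMessageKey and readMessageValue
// are fields and helpers of the enclosing class; they are not shown here.
private static synchronized void startOffsetListener(ConsumerConnector consumerConnector) {
    // Consume the offsets topic with a single stream
    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put(consumerOffsetTopic, 1);
    KafkaStream<byte[], byte[]> offsetMsgStream =
            consumerConnector.createMessageStreams(topicCountMap).get(consumerOffsetTopic).get(0);

    ConsumerIterator<byte[], byte[]> it = offsetMsgStream.iterator();
    while (true) {
        MessageAndMetadata<byte[], byte[]> offsetMsg = it.next();
        // Key schema versions 0 and 1 are offset commits; higher versions carry group metadata
        if (ByteBuffer.wrap(offsetMsg.key()).getShort() < 2) {
            try {
                GroupTopicPartition commitKey = readMessageKey(ByteBuffer.wrap(offsetMsg.key()));
                // A null value is a tombstone for a deleted offset; skip it
                if (offsetMsg.message() == null) {
                    continue;
                }
                OffsetAndMetadata commitValue = readMessageValue(ByteBuffer.wrap(offsetMsg.message()));
                offsetMap.put(commitKey, commitValue);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}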
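The listener relies on readMessageKey and readMessageValue helpers that the post does not show. As an illustration only, the sketch below decodes the byte layout assumed here for offset-commit messages in Kafka 0.10: a key made of an int16 schema version, the group and topic as length-prefixed (int16) UTF-8 strings, and an int32 partition; and a value that starts with an int16 version followed by the int64 offset. `OffsetMessageDecoder`, `OffsetKey`, and `readCommittedOffset` are hypothetical names, and the value decoder deliberately stops after the offset instead of reading the metadata and timestamp fields that follow it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffsetMessageDecoder {

    /** Decoded offset-commit key (hypothetical type for this sketch). */
    public static final class OffsetKey {
        public final String group;
        public final String topic;
        public final int partition;

        OffsetKey(String group, String topic, int partition) {
            this.group = group;
            this.topic = topic;
            this.partition = partition;
        }
    }

    /** Reads a protocol STRING: int16 length followed by UTF-8 bytes (-1 means null). */
    static String readString(ByteBuffer buf) {
        short len = buf.getShort();
        if (len < 0) {
            return null;
        }
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes a key with schema version 0 or 1; returns null for other versions. */
    public static OffsetKey readMessageKey(ByteBuffer buf) {
        short version = buf.getShort();
        if (version > 1) {
            return null; // not an offset-commit key (e.g. group metadata)
        }
        String group = readString(buf);
        String topic = readString(buf);
        int partition = buf.getInt();
        return new OffsetKey(group, topic, partition);
    }

    /** Decodes only the committed offset from a version 0 or 1 value. */
    public static long readCommittedOffset(ByteBuffer buf) {
        short version = buf.getShort();
        if (version > 1) {
            throw new IllegalArgumentException("unsupported value version " + version);
        }
        return buf.getLong(); // the offset is the first field after the version
    }
}
```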

Once the map holds up-to-date offsets, we can expose this data over RPC so that a client can fetch it and visualize it. The RPC interface (Thrift IDL) looks like this:

namespace java org.smartloli.kafka.eagle.ipc

service KafkaOffsetServer {
    string query(1: string group, 2: string topic, 3: i32 partition),
    string getOffset(),
    string sql(1: string sql),
    string getConsumer(),
    string getActiverConsumer()
}
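The Thrift-generated code is not shown in the post. As a rough, stdlib-only sketch of what the server side of query and getOffset might do, the class below answers both calls from an in-memory map. The class name `OffsetQueryHandler`, the "group/topic/partition" string key, the JSON shape, and -1 for an unknown offset are all assumptions; no JSON escaping is done, and the remaining three methods are omitted.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class OffsetQueryHandler {

    // Offsets keyed by "group/topic/partition"; in practice filled by the listener thread.
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    public void update(String group, String topic, int partition, long offset) {
        offsets.put(key(group, topic, partition), offset);
    }

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    /** Mirrors query(group, topic, partition) from the IDL; returns a small JSON object. */
    public String query(String group, String topic, int partition) {
        Long offset = offsets.get(key(group, topic, partition));
        return "{\"group\":\"" + group + "\",\"topic\":\"" + topic + "\",\"partition\":"
                + partition + ",\"offset\":" + (offset == null ? -1L : offset) + "}";
    }

    /** Mirrors getOffset(); returns all known offsets as one JSON object, sorted by key. */
    public String getOffset() {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Long> e : new TreeMap<>(offsets).entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }
}
```

Returning strings keeps the IDL simple: the client parses the JSON and renders whatever charts it needs.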

If we do not want to write a dedicated interface for every offset operation, we can query the offset data with SQL instead, as follows:

Add the dependency:

<dependency>
    <groupId>org.smartloli</groupId>
    <artifactId>jsql-client</artifactId>
    <version>1.0.0</version>
</dependency>

Call the interface:

JSqlUtils.query(tabSchema, tableName, dataSets, sql);

tabSchema: the table schema; tableName: the table name; dataSets: the data set to query; sql: the SQL statement to execute.
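The jsql-client API is only summarized above, so the following does not use it. To show the kind of query this approach enables, here is the same operation written with plain Java streams instead of SQL: select one group's offsets for one topic, ordered by partition. `OffsetFilterExample` and the `OffsetRow` fields are hypothetical stand-ins for the rows in dataSets.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class OffsetFilterExample {

    /** One row of offset data (hypothetical shape of a dataSets entry). */
    public static final class OffsetRow {
        public final String group;
        public final String topic;
        public final int partition;
        public final long offset;

        public OffsetRow(String group, String topic, int partition, long offset) {
            this.group = group;
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    /**
     * Stream equivalent of:
     * SELECT * FROM offsets WHERE group = ? AND topic = ? ORDER BY partition
     */
    public static List<OffsetRow> offsetsFor(List<OffsetRow> rows, String group, String topic) {
        return rows.stream()
                .filter(r -> r.group.equals(group) && r.topic.equals(topic))
                .sorted(Comparator.comparingInt((OffsetRow r) -> r.partition))
                .collect(Collectors.toList());
    }
}
```

The SQL route trades this compile-time checking for flexibility: a new query needs only a new statement string rather than new code.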

4. Preview

Figure: Consumer overview.

Figure: Topics currently being consumed.

Figure: Detailed offsets for a consumer.

Figure: Consumption and production rate graph.

5. Summary

Note that the consumer thread ID is not recorded when offsets are stored in the Kafka topic. However, once we know how Kafka composes consumer thread IDs, we can generate one manually from: group + consumer local address + timestamp + UUID (its first 8 characters). Finally, you are welcome to try the Kafka cluster monitoring tool [Kafka Eagle] ([operation manual]).
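A minimal sketch of that recipe, assuming the old consumer's layout of `group_host-timestamp-uuid8` for the consumer ID and `consumerId-threadNumber` for a thread ID, where uuid8 is the first 8 hex digits of a random UUID. `ConsumerIdGenerator` is a hypothetical helper; the hex string is zero-padded so that taking 8 characters is always safe, which could differ from Kafka's own formatting in rare cases.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;

public class ConsumerIdGenerator {

    /** Builds an ID shaped like group_host-timestamp-uuid8. */
    public static String consumerId(String group, String host, long timestampMs, UUID uuid) {
        // First 8 hex digits of the UUID's most significant bits (zero-padded to 16 digits first).
        String uuid8 = String.format("%016x", uuid.getMostSignificantBits()).substring(0, 8);
        return group + "_" + host + "-" + timestampMs + "-" + uuid8;
    }

    /** A consumer thread ID appends the thread number to the consumer ID. */
    public static String consumerThreadId(String consumerId, int threadNumber) {
        return consumerId + "-" + threadNumber;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = InetAddress.getLocalHost().getHostName();
        String id = consumerId("demo-group", host, System.currentTimeMillis(), UUID.randomUUID());
        System.out.println(consumerThreadId(id, 0));
    }
}
```

Because the timestamp and UUID are generated on our side, such an ID identifies a consumer for display purposes only; it will not match the ID the original consumer process used.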

6. Concluding remarks

That's all I have to share in this post. If you run into any problems while studying this topic, you can join the discussion group or send me an email, and I will do my best to answer. Let's keep encouraging each other!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.