Original: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example

Why use the high level Consumer
In some scenarios we want to consume messages with multiple threads, and we do not care about the order in which messages arrive from Kafka, only that every message gets consumed. The high level consumer abstracts exactly this kind of consumption.
Messages are consumed through consumer groups. A consumer group can contain multiple consumers, each consumer is a thread, and each partition of a topic can be read by only one consumer in the group. For every partition, the consumer group keeps an up-to-date offset, stored in ZooKeeper. Under normal operation, therefore, messages are not consumed twice.
- Because the consumer's offset is not committed to ZooKeeper in real time (the commit interval is configurable), a consumer that crashes suddenly may re-read some messages after it restarts.
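For reference, the ZooKeeper-based high level consumer stores each partition's committed offset under a well-known ZooKeeper path. The helper below just builds that path string so you can inspect an offset with a tool such as zkCli.sh; the group and topic names are placeholders, not values from this article:

```java
public class OffsetPathSketch {
    // The old high level consumer keeps committed offsets in ZooKeeper at
    // /consumers/<group>/offsets/<topic>/<partition>
    static String offsetPath(String group, String topic, int partition) {
        return "/consumers/" + group + "/offsets/" + topic + "/" + partition;
    }

    public static void main(String[] args) {
        // Inspect the node with e.g.: bin/zkCli.sh get <path>
        System.out.println(offsetPath("group1", "mytopic", 0));
        // → /consumers/group1/offsets/mytopic/0
    }
}
```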
Designing the High Level Consumer
The high level consumer can and should be used in multi-threaded environments. The threading model is driven by the number of threads (which is also the number of consumers in the group) and the number of partitions of the topic. Some rules:
- If more threads are provided than there are partitions, some threads will never receive a message;
- If fewer threads are provided than there are partitions, some threads will receive messages from multiple partitions;
- When a thread receives messages from more than one partition, no ordering is guaranteed across partitions: it may receive 5 messages from partition 3, then 6 messages from partition 4, then another 10 messages from partition 3;
- Adding more threads causes Kafka to rebalance, which may change the mapping between partitions and threads;
- To avoid re-reading messages after a consumer stops suddenly or a broker is shut down, sleep briefly (e.g. Thread.sleep(10000)) before shutting the consumer down, so it has time to commit its offset to ZooKeeper.
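The thread-to-partition rules above can be illustrated with a small sketch. The old high level consumer assigns partitions in contiguous ranges by default; the helper below is a simplified illustration of that idea, not Kafka's actual rebalance code:

```java
import java.util.ArrayList;
import java.util.List;

public class RangeAssignmentSketch {
    // Simplified range-style assignment: partitions are split into contiguous
    // chunks, and the first (numPartitions % numThreads) threads get one extra.
    static List<List<Integer>> assign(int numPartitions, int numThreads) {
        List<List<Integer>> result = new ArrayList<List<Integer>>();
        int perThread = numPartitions / numThreads;
        int extra = numPartitions % numThreads;
        int next = 0;
        for (int t = 0; t < numThreads; t++) {
            List<Integer> mine = new ArrayList<Integer>();
            int count = perThread + (t < extra ? 1 : 0);
            for (int i = 0; i < count; i++) {
                mine.add(next++);
            }
            result.add(mine);
        }
        return result;
    }

    public static void main(String[] args) {
        // 4 partitions, 6 threads: two threads receive nothing
        System.out.println(assign(4, 6)); // → [[0], [1], [2], [3], [], []]
        // 4 partitions, 2 threads: each thread reads two partitions
        System.out.println(assign(4, 2)); // → [[0, 1], [2, 3]]
    }
}
```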
Example

Maven Dependency
```xml
<!-- Kafka messages -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.8.2.0</version>
</dependency>
```
Consumer Threads
```java
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.message.MessageAndMetadata;

public class ConsumerThread implements Runnable {
    private KafkaStream<byte[], byte[]> kafkaStream;
    // thread number
    private int threadNumber;

    public ConsumerThread(KafkaStream<byte[], byte[]> kafkaStream, int threadNumber) {
        this.threadNumber = threadNumber;
        this.kafkaStream = kafkaStream;
    }

    public void run() {
        ConsumerIterator<byte[], byte[]> it = kafkaStream.iterator();
        // The loop keeps reading data from Kafka until the process is manually interrupted
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> metaData = it.next();
            StringBuilder sb = new StringBuilder();
            sb.append("Thread: ").append(threadNumber).append(" ");
            sb.append("part: ").append(metaData.partition()).append(" ");
            sb.append("key: ").append(metaData.key() == null ? "null" : new String(metaData.key())).append(" ");
            sb.append("message: ").append(new String(metaData.message()));
            System.out.println(sb.toString());
        }
        System.out.println("Shutting down Thread: " + threadNumber);
    }
}
```
The Remaining Program
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class ConsumerGroupExample {
    private final ConsumerConnector consumer;
    private final String topic;
    private ExecutorService executor;

    public ConsumerGroupExample(String a_zookeeper, String a_groupId, String a_topic) {
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(
                createConsumerConfig(a_zookeeper, a_groupId));
        this.topic = a_topic;
    }

    public void shutdown() {
        if (consumer != null) consumer.shutdown();
        if (executor != null) executor.shutdown();
    }

    public void run(int a_numThreads) {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, new Integer(a_numThreads));
        // The returned map contains every requested topic along with its KafkaStreams
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap =
                consumer.createMessageStreams(topicCountMap);
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

        // Create a Java thread pool
        executor = Executors.newFixedThreadPool(a_numThreads);

        // Create consumer threads to consume the messages
        int threadNumber = 0;
        for (final KafkaStream<byte[], byte[]> stream : streams) {
            executor.submit(new ConsumerThread(stream, threadNumber));
            threadNumber++;
        }
    }

    private static ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId) {
        Properties props = new Properties();
        // The ZooKeeper cluster to connect to, which stores the consumer offsets
        props.put("zookeeper.connect", a_zookeeper);
        // The consumer group's id
        props.put("group.id", a_groupId);
        // How long Kafka waits for a ZooKeeper response (ms)
        props.put("zookeeper.session.timeout.ms", "400");
        // How many milliseconds a ZooKeeper follower may lag behind the master
        props.put("zookeeper.sync.time.ms", "200");
        // How often the consumer commits its offsets to ZooKeeper (ms)
        props.put("auto.commit.interval.ms", "1000");
        return new ConsumerConfig(props);
    }

    public static void main(String[] args) {
        String zooKeeper = args[0];
        String groupId = args[1];
        String topic = args[2];
        int threads = Integer.parseInt(args[3]);

        ConsumerGroupExample example = new ConsumerGroupExample(zooKeeper, groupId, topic);
        example.run(threads);

        // Because the consumer's offset is not committed to ZooKeeper in real time
        // (the commit interval is configurable), shutting down immediately could
        // cause messages to be re-read later. Sleep long enough for the consumer
        // to commit its offset to ZooKeeper before shutting down.
        try {
            Thread.sleep(10000);
        } catch (InterruptedException ie) {
        }
        example.shutdown();
    }
}
```
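A sketch of how the program might be launched once compiled. The classpath, ZooKeeper address, group id, and topic name are placeholders for your own setup, not values from this article:

```shell
# Arguments: <zookeeper> <group id> <topic> <number of threads>
java -cp target/classes:libs/* ConsumerGroupExample localhost:2181 group1 mytopic 4
```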