Unlike KafkaProducer, KafkaConsumer is not thread-safe: it keeps its state inside the consumer object, so any multi-threaded use needs care. There are generally two ways to use it:
1: One consumer per thread. Each thread owns a consumer, pulls the data, and processes it in that same thread. This is the simpler approach: it is easy to implement, and messages are easy to process in order within each partition.
2: Consumer plus processors. The consumer pulls the data and hands it to a thread pool for processing, which decouples pulling from processing. The drawback is that parallel processing can break the message order within a partition.
The KafkaConsumer documentation (see Reference) also describes how multi-threading can be handled.
Project Practice:
The following is how the consumer is actually used in the project. A thread pool is involved, but this is really the first approach: the pool is only used to start the consumers.
Description:
ConsumerGroup class: corresponds to a consumer group. The listener object created in step 1 above is passed in when the ConsumerGroup object is created. The ConsumerGroup then creates a list of RunnableConsumer objects (step 3); the number of consumer objects in the list matches the desired number of consumers in the group. It also creates a thread pool object, executor, whose number of threads equals the number of consumers.
RunnableConsumer class: a thread class that implements the Runnable interface. It creates a KafkaConsumer object internally; when the thread starts, it subscribes to the topics and pulls messages in a loop.
// Inner class of ConsumerGroup (excerpt; topics, logger, the deserializer
// classes and processCommit are defined elsewhere in the project).
public class RunnableConsumer<K, V> implements Runnable {
    private Consumer<K, V> consumer;
    private final IConsumerListener<ConsumerRecords<K, V>> listener;

    private RunnableConsumer(final IConsumerListener<ConsumerRecords<K, V>> listener, Properties... props) {
        this.consumer = new KafkaConsumer<>(props, keyDeserClass, valueDeserClass);
        this.listener = listener;
    }

    public void run() {
        try {
            consumer.subscribe(topics);
            while (true) {
                ConsumerRecords<K, V> records = null;
                try {
                    // now handle any new record(s)
                    records = consumer.poll(1000);
                    if (records != null && records.count() > 0) {
                        listener.notify(records);
                    }
                } catch (WakeupException wex) {
                    logger.trace("Got a WakeupException. Doing nothing. Exception details: ", wex);
                }
            }
        } catch (Throwable e) {
            // ignore as we waking up from the poll, we need to cleanly shutdown our consumer
            logger.error("Getting non-recoverable throwable: ", e);
            throw e;
        } finally {
            // TODO: need to check on consumer closing, that any outstanding offset commits are done.
            // Otherwise we need to manually do it here.
            processCommit(SyncMode.SYNC);
            logger.info("Trying to close Kafka consumer, ConsumerGroup.isRunning: {}", ConsumerGroup.this.isRunning);
            consumer.close();
        }
    }
}
Listener class: the class that actually processes a topic's messages. A listener object is created first and registered with the RunnableConsumer class when the ConsumerGroup is created. When the consumer pulls messages, it notifies the listener, which does the specific processing. Different business requirements define their own listener classes.
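The wiring described above (one listener, a list of runnable consumers, and a thread pool sized to the number of consumers) can be sketched roughly as follows. This is a minimal illustration, not the project's code: RecordListener, PollingWorker, SimpleConsumerGroup and ConsumerGroupDemo are hypothetical names, and an in-memory BlockingQueue stands in for KafkaConsumer.poll so the example runs without a broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ConsumerGroupDemo {
    /** Starts three workers, lets them drain their sources, returns what the listener saw. */
    static List<String> run() {
        List<BlockingQueue<String>> sources = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            BlockingQueue<String> q = new LinkedBlockingQueue<>();
            q.add("p" + i + "-a");
            q.add("p" + i + "-b");
            sources.add(q);
        }
        List<String> handled = new CopyOnWriteArrayList<>();
        SimpleConsumerGroup group = new SimpleConsumerGroup(sources, handled::add);
        group.start();
        group.stop();
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Hypothetical listener contract, modelled on the IConsumerListener used above. */
interface RecordListener<T> {
    void onRecords(T records);
}

/** Stand-in for RunnableConsumer: polls its own source and notifies the listener in the same thread. */
final class PollingWorker implements Runnable {
    private final BlockingQueue<String> source; // assumption: replaces KafkaConsumer.poll(...)
    private final RecordListener<String> listener;
    private volatile boolean running = true;

    PollingWorker(BlockingQueue<String> source, RecordListener<String> listener) {
        this.source = source;
        this.listener = listener;
    }

    @Override
    public void run() {
        try {
            // Keep going until stop() was called and the source is drained.
            while (running || !source.isEmpty()) {
                String msg = source.poll(50, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    listener.onRecords(msg); // processing happens in the polling thread
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void stop() {
        running = false;
    }
}

/** Stand-in for ConsumerGroup: one worker per source, one pool thread per worker. */
final class SimpleConsumerGroup {
    private final List<PollingWorker> workers = new ArrayList<>();
    private final ExecutorService executor;

    SimpleConsumerGroup(List<BlockingQueue<String>> sources, RecordListener<String> listener) {
        for (BlockingQueue<String> s : sources) {
            workers.add(new PollingWorker(s, listener));
        }
        // The pool only launches the consumers; it does not process records itself.
        executor = Executors.newFixedThreadPool(workers.size());
    }

    void start() {
        for (PollingWorker w : workers) {
            executor.submit(w);
        }
    }

    void stop() {
        for (PollingWorker w : workers) {
            w.stop();
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because each worker both polls and processes, records from one source are handled in the order they were read, which mirrors the per-partition ordering of the first approach. The interleaving between different workers is not deterministic.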
Switching to the second approach
To use the second approach and separate data processing from the consumer, the listener above can be changed into a thread class (a Runnable): after the consumer pulls messages, it hands them to a thread from the thread pool for processing. The biggest problem is how to guarantee that messages are processed in order. For example, if a partition holds 2 messages and the consumer submits them to 2 different threads after polling, sequential processing is no longer guaranteed and an additional thread synchronization mechanism is needed. On the other hand, because the data is no longer processed in the consumer thread, the consumer can poll faster, and potential problems such as data-processing timeouts and consumer rebalances are avoided.
records = consumer.poll(1000);
if (records != null && records.count() > 0) {
    executor.submit(new Listener(records));
}
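One possible form of the "additional synchronization mechanism" mentioned above is to avoid sharing a partition between threads at all: every record of a given partition is routed to the same single-threaded worker, so processing stays sequential per partition while different partitions run in parallel. The sketch below only illustrates that idea and is not taken from the project or from the Kafka client: Rec, PartitionOrderedDispatcher and OrderedDispatchDemo are hypothetical names, and Rec is a plain stand-in for a Kafka record.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class OrderedDispatchDemo {
    /** Dispatches 100 records over 4 partitions and returns the values seen per partition. */
    static Map<Integer, List<String>> run() {
        Map<Integer, List<String>> seen = new ConcurrentHashMap<>();
        PartitionOrderedDispatcher dispatcher = new PartitionOrderedDispatcher(2);
        List<Rec> batch = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            batch.add(new Rec(i % 4, "m" + i));
        }
        dispatcher.dispatch(batch, r -> seen
                .computeIfAbsent(r.partition, p -> Collections.synchronizedList(new ArrayList<>()))
                .add(r.value));
        dispatcher.shutdown();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

/** Hypothetical stand-in for a Kafka record: only a partition number and a value. */
final class Rec {
    final int partition;
    final String value;

    Rec(int partition, String value) {
        this.partition = partition;
        this.value = value;
    }
}

/** Routes every record of a partition to the same single-threaded worker. */
final class PartitionOrderedDispatcher {
    private final ExecutorService[] workers;

    PartitionOrderedDispatcher(int workerCount) {
        workers = new ExecutorService[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Called by the polling thread for each polled batch. */
    void dispatch(List<Rec> batch, Consumer<Rec> handler) {
        for (Rec r : batch) {
            // Same partition -> same worker -> FIFO processing within that partition.
            int index = Math.floorMod(r.partition, workers.length);
            workers[index].submit(() -> handler.accept(r));
        }
    }

    /** Stops accepting work and waits for the queued records to finish. */
    void shutdown() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                w.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a real consumer, the polling thread would call dispatch after each poll. Offset handling is left out here: when processing is handed off like this, offsets should only be committed once the corresponding records have actually been processed, otherwise a crash can lose messages.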
Reference:
http://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#multithreaded
https://howtoprogram.xyz/2016/05/29/create-multi-threaded-apache-kafka-consumer/
Kafka consumer Multi-threaded processing in the project