Kafka provides a convenient Java API for message processing. A brief summary follows.
Study reference: http://www.itnose.net/st/6095038.html
POM Configuration (see http://www.cnblogs.com/huayu0815/p/5341712.html for log4j configuration)
<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.10</artifactId>
        <version>0.8.2.0</version>
        <exclusions>
            <exclusion>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-core</artifactId>
        <version>1.1.2</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-access</artifactId>
        <version>1.1.2</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <version>1.1.2</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>log4j-over-slf4j</artifactId>
        <version>1.7.7</version>
    </dependency>
</dependencies>
PRODUCER
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;

public class KafkaProducer {

    Producer<String, String> producer;

    /*
     * # Broker list used to fetch metadata; it does not have to list every broker
     * # (at least one reachable broker is enough, but in practice all are usually listed).
     * metadata.broker.list=192.168.2.105:9092,192.168.2.106:9092
     *
     * # Partition handling class. Defaults to kafka.producer.DefaultPartitioner, which hashes the key to a partition.
     * # partitioner.class=com.meituan.mafka.client.producer.CustomizePartitioner
     *
     * # Compression: 0 means no compression (default), 1 means gzip, 2 means snappy. A compressed message carries
     * # a header indicating the compression type, so decompression is transparent to the consumer.
     * compression.codec=none
     *
     * # Serialization class (Mafka client API call description --> 3. serialization convention wiki);
     * # defaults to kafka.serializer.DefaultEncoder, i.e. byte[].
     * serializer.class=com.meituan.mafka.client.codec.MafkaMessageEncoder
     * # serializer.class=kafka.serializer.DefaultEncoder
     * # serializer.class=kafka.serializer.StringEncoder
     *
     * # Topics whose messages should be compressed; empty by default, i.e. no compression.
     * # compressed.topics=
     *
     * ########### Request ack ###############
     * # Ack level the producer expects; default 0.
     * #  0: the producer does not wait for an ack from the broker
     * #  1: the ack is sent after the leader has received the message
     * # -1: the ack is sent only after all followers have synchronized the message
     * request.required.acks=0
     *
     * # Maximum time the broker may wait before sending an ack to the producer. On timeout the broker returns an
     * # error ack, meaning the last message failed for some reason (for example, a follower failed to sync).
     * request.timeout.ms=10000
     * ########## End #####################
     *
     * # Send messages synchronously ("sync", default) or asynchronously ("async"). Async sending raises throughput,
     * # but messages sit in a local buffer before being sent in batches, so unsent messages may be lost.
     * producer.type=sync
     *
     * ############## Async sending (the following four parameters apply only to async mode) ####################
     * # In async mode, messages buffered longer than this value are sent to the broker in a batch; default 5000 ms.
     * # Works together with batch.num.messages.
     * queue.buffering.max.ms=5000
     *
     * # In async mode, the maximum number of messages allowed to accumulate in the producer-side buffer.
     * # If the producer cannot push messages to the broker fast enough they pile up on the producer side;
     * # once this threshold is reached the producer either blocks or drops messages. Default 10000.
     * queue.buffering.max.messages=20000
     *
     * # In async mode, the number of messages sent per batch.
     * batch.num.messages=500
     *
     * # When the number of buffered messages has reached queue.buffering.max.messages and, after blocking for a
     * # while, the queue still has no room (the producer has not managed to send anything), this timeout controls
     * # how long the producer keeps blocking:
     * # -1: block with no timeout, messages are never dropped
     * #  0: empty the queue immediately, messages are dropped
     * queue.enqueue.timeout.ms=-1
     * ################ End ###############
     *
     * # Number of times a message may be resent when the producer receives an error ack or no ack at all.
     * # The broker has no mechanism to deduplicate, so a network problem (such as a lost ack) can make the broker
     * # receive the same message twice. Default 3.
     * message.send.max.retries=3
     *
     * # Interval at which the producer refreshes topic metadata. The producer needs to know where the partition
     * # leaders are and the current state of the topic, so it refreshes metadata periodically and also immediately
     * # after specific errors (topic failure, partition loss, leader failure, ...). Default 600000.
     * topic.metadata.refresh.interval.ms=60000
     */

    public Producer<String, String> getClient() {
        if (producer == null) {
            Properties props = new Properties();
            // Kafka broker list (host:port)
            props.put("metadata.broker.list", "xxx.xxx.xxx.xxx:9092");
            // Serializer class for message values
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");
            // Serializer class for message keys
            props.put("key.serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", "0");
            ProducerConfig config = new ProducerConfig(props);
            producer = new Producer<>(config);
        }
        return producer;
    }

    public void shutdown() {
        if (producer != null) {
            producer.close();
        }
    }

    public static void main(String[] args) {
        KafkaProducer kafkaProducer = new KafkaProducer();
        for (int i = 0; i < 10; i++) { // the loop bound was lost in the original; 10 is used here as an example
            kafkaProducer.getClient().send(new KeyedMessage<String, String>("topic1", "topic1_" + i + "_test"));
            kafkaProducer.getClient().send(new KeyedMessage<String, String>("topic2", "topic2_" + i + "_test"));
        }
        kafkaProducer.shutdown();
    }
}
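The configuration comment above mentions partitioner.class; by default kafka.producer.DefaultPartitioner hashes the message key to a partition. As an illustration only, a custom partitioner for the 0.8.x producer API could look like the following sketch (the class name is made up; it would be enabled through the partitioner.class property):

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

// Hypothetical example of a custom partitioner for the old (0.8.x) producer API.
public class SimplePartitioner implements Partitioner {

    // The 0.8.x producer instantiates the partitioner reflectively with a VerifiableProperties argument.
    public SimplePartitioner(VerifiableProperties props) {
    }

    @Override
    public int partition(Object key, int numPartitions) {
        // Mask the sign bit so the result is always a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}

It would be registered next to the other producer properties with props.put("partitioner.class", "SimplePartitioner").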
Summary:
1. Every time a new Producer is created, a thread pool is created automatically.
2. The producer only establishes the actual socket connection when the send method is called.
The connection process is as follows:
1> The full broker metadata is obtained through metadata.broker.list (the list only has to contain at least one reachable broker IP and port, not all of them; in practice, however, all brokers are usually listed).
2> The topic's partition information is obtained from the registration information in ZooKeeper.
3> A socket connection is established between the client and the broker.
3. After the send finishes, the socket connection is closed directly.
4. Each send re-establishes the connection.
5. The client fetches the topic partition information automatically, so it is not affected when Kafka rebalances.
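Since every new Producer brings its own thread pool and connections (points 1 and 4), it is usually better to create one Producer, reuse it for all sends, and close it once at the end. A minimal sketch of that pattern, with placeholder broker address and topic:

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;

// Sketch: one shared Producer instance for the whole application instead of one per send.
public class SharedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "xxx.xxx.xxx.xxx:9092"); // placeholder broker address
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1"); // example: wait for the leader's ack

        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
        try {
            for (int i = 0; i < 10; i++) {
                // Every send goes through the same producer, so connections and the thread pool are reused.
                producer.send(new KeyedMessage<String, String>("topic1", "key_" + i, "value_" + i));
            }
        } finally {
            producer.close(); // releases sockets and the internal thread pool
        }
    }
}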
CONSUMER
There are two official consumer APIs, commonly known as the high-level consumer API and the SimpleConsumer API.
The first is a highly abstracted consumer API that is simple and convenient to use, but for some special needs we may have to use the second, lower-level API. So first, a brief look at what the second API lets us do:
- Read a message multiple times
- Consume only a subset of the partitions of a topic in a process
- Use transactions to make sure a message is processed once and only once
The disadvantages of using the second API:
- You must track the offsets in your application
- You must find the lead broker of the topic partition
- You must handle broker leader changes
I mainly tried the first kind, which is also the most widely used API.
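For reference, below is a minimal sketch of the low-level API based on kafka.javaapi.consumer.SimpleConsumer as it exists in 0.8.x. The host, port, topic, partition and offset are placeholders; a real client also has to look up the partition leader and handle error codes and leader changes, which this sketch omits.

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

import java.nio.ByteBuffer;

// Sketch only: a single fetch from one partition, starting at offset 0.
public class SimpleConsumerSketch {
    public static void main(String[] args) {
        String topic = "topic1";
        int partition = 0;
        SimpleConsumer consumer =
                new SimpleConsumer("xxx.xxx.xxx.xxx", 9092, 100000, 64 * 1024, "testClient");
        FetchRequest req = new FetchRequestBuilder()
                .clientId("testClient")
                .addFetch(topic, partition, 0L, 100000) // topic, partition, offset, maxBytes
                .build();
        FetchResponse resp = consumer.fetch(req);
        for (MessageAndOffset messageAndOffset : resp.messageSet(topic, partition)) {
            ByteBuffer payload = messageAndOffset.message().payload();
            byte[] bytes = new byte[payload.limit()];
            payload.get(bytes);
            System.out.println(messageAndOffset.offset() + ": " + new String(bytes));
        }
        consumer.close();
    }
}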
The high-level consumer API can be used in two ways: a single consumer and multiple consumers.
Single consumer:
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class KafkaSingleConsumer {

    /*
     * # ZooKeeper connection string; this is the offline test environment configuration
     * # (Kafka messaging service --> Kafka broker cluster on-line deployment environment wiki).
     * # Example: "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"
     * zookeeper.connect=192.168.2.225:2181,192.168.2.225:2182,192.168.2.225:2183/config/mobile/mq/mafka
     *
     * # ZooKeeper session expiration time, default 5000 ms. Used to detect whether a consumer has died;
     * # when one dies, the other consumers wait for this interval before a rebalance is triggered.
     * zookeeper.session.timeout.ms=5000
     * zookeeper.connection.timeout.ms=10000
     *
     * # How often the consumer writes offsets to ZooKeeper. Note that offsets are committed on a time basis,
     * # not per received message; if updating ZooKeeper fails and the consumer restarts, messages that were
     * # already received may be delivered again.
     * zookeeper.sync.time.ms=2000
     *
     * # Consumer group.
     * group.id=xxx
     *
     * # After consuming a certain number of messages, the consumer automatically commits offsets to ZooKeeper.
     * # Offsets are not committed to ZK after every message; they are kept locally (in memory) and committed
     * # periodically. Default true.
     * auto.commit.enable=true
     * # Auto-commit interval, default 60 * 1000 ms.
     * auto.commit.interval.ms=1000
     *
     * # Identifier of the current consumer; can be set manually or generated by the system; mainly used to
     * # track message consumption.
     * consumer.id=xxx
     *
     * # Consumer client id, used to distinguish different clients; generated automatically by default.
     * client.id=xxxx
     *
     * # Maximum number of message chunks buffered for the consumer.
     * queued.max.message.chunks=50
     *
     * # When a new consumer joins the group a rebalance happens, after which some partitions migrate to the new
     * # consumer. When a consumer obtains the right to consume a partition, it registers a node under the
     * # "Partition Owner registry" in ZooKeeper, but the old consumer may not have released that node yet;
     * # this value controls how many times the registration is retried.
     * rebalance.max.retries=5
     *
     * # Maximum size of a chunk of messages; the broker will not give the consumer a chunk larger than this.
     * # Each fetch obtains several messages and this value is their total size; raising it uses more
     * # consumer-side memory.
     * fetch.min.bytes=6553600
     *
     * # How long the server blocks when there is not enough data; on timeout the data is sent to the consumer
     * # immediately.
     * fetch.wait.max.ms=5000
     * socket.receive.buffer.bytes=655360
     *
     * # What to do when ZooKeeper has no offset or the offset is out of range: smallest, largest or anything,
     * # meaning start from the current smallest offset, start from the current largest offset, or throw an
     * # exception. Default largest.
     * auto.offset.reset=smallest
     *
     * # Deserialization class (Mafka client API call description --> 3. serialization convention wiki);
     * # defaults to kafka.serializer.DefaultDecoder, i.e. byte[].
     * deserializer.class=com.meituan.mafka.client.codec.MafkaMessageDecoder
     */

    public static void main(String[] args) {
        String topic = "topic1";

        Properties props = new Properties();
        props.put("zookeeper.connect", "xxx.xxx.xxx:2181");
        props.put("group.id", "testgroup");
        props.put("zookeeper.session.timeout.ms", "500");
        props.put("zookeeper.sync.time.ms", "250");
        props.put("auto.commit.interval.ms", "1000");
        ConsumerConfig config = new ConsumerConfig(props);
        ConsumerConnector consumer = kafka.consumer.Consumer.createJavaConsumerConnector(config);

        Map<String, Integer> topicMap = new HashMap<>();
        // Define a single thread for the topic
        topicMap.put(topic, 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerStreamsMap = consumer.createMessageStreams(topicMap);
        List<KafkaStream<byte[], byte[]>> streamList = consumerStreamsMap.get(topic);

        for (KafkaStream<byte[], byte[]> stream : streamList) {
            ConsumerIterator<byte[], byte[]> consumerIte = stream.iterator();
            while (consumerIte.hasNext()) {
                System.out.println("Message from single topic :: " + new String(consumerIte.next().message()));
            }
        }
        if (consumer != null) {
            consumer.shutdown();
        }
    }
}
Multiple consumers:
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class KafkaMultiConsumer {

    /* The configuration notes are the same as in KafkaSingleConsumer above. */

    public static void main(String[] args) {
        String topic = "topic1";
        int threadCount = 3;

        Properties props = new Properties();
        props.put("zookeeper.connect", "xxx.xxx.xxx.xxx:2181");
        props.put("group.id", "testgroup");
        props.put("zookeeper.session.timeout.ms", "500");
        props.put("zookeeper.sync.time.ms", "250");
        props.put("auto.commit.interval.ms", "1000");
        ConsumerConfig config = new ConsumerConfig(props);
        ConsumerConnector consumer = kafka.consumer.Consumer.createJavaConsumerConnector(config);

        Map<String, Integer> topicMap = new HashMap<>();
        // Define three threads for the topic
        topicMap.put(topic, 3);
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerStreamsMap = consumer.createMessageStreams(topicMap);
        List<KafkaStream<byte[], byte[]>> streamList = consumerStreamsMap.get(topic);

        int count = 0;
        for (final KafkaStream<byte[], byte[]> stream : streamList) {
            final String threadNumber = "Thread " + count;
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    ConsumerIterator<byte[], byte[]> consumerIte = stream.iterator();
                    while (consumerIte.hasNext()) {
                        System.out.println("Thread number " + threadNumber + ": " + new String(consumerIte.next().message()));
                    }
                }
            });
            count++;
        }
    }
}
Summary:
1. Kafka allows multiple consumer groups, and each group allows multiple consumers. Different groups each receive all of the messages (like the publish-subscribe model), while the consumers inside one group consume each message only once (like the producer-consumer queue model).
2. When several Java consumers are started on the same topic, you can see multiple consumer ids registered in ZooKeeper:
[zk: xxx.xxx.xxx.xxx:2181(CONNECTED)] ls /consumers/testgroup/ids
[testgroup_xxx-1459926903849-fea50e90, testgroup_xxx-1459926619712-8d1caf90]
3. If the consumer is started in a multi-threaded way, you can see that the different consumer threads are bound to different topic partitions:
[zk: xxx.xxx.xxx.xxx:2181(CONNECTED) 121] get /consumers/testgroup/owners/topic1/1
testgroup_xxx-1459926619712-8d1caf90-1
cZxid = 0x2000006e2
ctime = Wed Apr 06 03:15:04 EDT 2016
mZxid = 0x2000006e2
mtime = Wed Apr 06 03:15:04 EDT 2016
pZxid = 0x2000006e2
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x153413bc26e007e
dataLength = 44
numChildren = 0
[zk: xxx.xxx.xxx.xxx:2181(CONNECTED) 122] get /consumers/testgroup/owners/topic1/0
testgroup_xxx-1459926619712-8d1caf90-0
cZxid = 0x2000006e3
ctime = Wed Apr 06 03:15:04 EDT 2016
mZxid = 0x2000006e3
mtime = Wed Apr 06 03:15:04 EDT 2016
pZxid = 0x2000006e3
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x153413bc26e007e
dataLength = 44
numChildren = 0
4. Whether you start multiple consumer processes or one consumer process with multiple threads, the only difference is whether the consumer information registered in ZooKeeper ("ls /consumers/testgroup/ids") contains several entries or just one. As far as consumption goes, both follow the rule that each message is consumed only once within the group, and each partition is bound to only one consumer.
5. If a consumer dies, the consumer-to-partition bindings are reassigned, keeping the load balanced as far as possible.
6. If the number of consumers is greater than the number of partitions, the extra threads cannot get any messages and just keep printing "Got ping response for sessionid: 0x153413bc26e0082 after 2ms", which is a waste of resources.
So if consumer processes are started on more than one server, it is best to allocate the number of consuming threads in each process according to the number of partitions, as in the sketch below.
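A small sketch of that sizing rule, assuming the total partition count and the number of consumer processes are known to the deployer (they could also be read from ZooKeeper or the broker):

// Sketch: split a topic's partitions evenly across consumer processes.
public class ConsumerThreadSizing {
    static int threadsForProcess(int numPartitions, int numProcesses, int processIndex) {
        // Distribute partitions as evenly as possible; the first (numPartitions % numProcesses)
        // processes get one extra thread. More total threads than partitions would sit idle.
        int base = numPartitions / numProcesses;
        int remainder = numPartitions % numProcesses;
        return base + (processIndex < remainder ? 1 : 0);
    }

    public static void main(String[] args) {
        int numPartitions = 10;
        int numProcesses = 3;
        for (int p = 0; p < numProcesses; p++) {
            System.out.println("process " + p + " -> " + threadsForProcess(numPartitions, numProcesses, p) + " threads");
        }
        // prints 4, 3, 3: the total equals the partition count, so no thread sits idle
    }
}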
I will look into more of the low-level details later as I run into them; for now the goal is to be able to use the API and understand the general principles.