Apache Kafka Series: Client Development (Java)


1. Dependency Packages

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1</version>
</dependency>

2. Producer Program Development Example

2.1 Producer parameter description

# Specify the list of Kafka brokers. Used to fetch metadata; the list does not need to contain every broker in the cluster
metadata.broker.list=192.168.2.105:9092,192.168.2.106:9092
# Specify the partitioning class. Defaults to kafka.producer.DefaultPartitioner, which hashes the key to the corresponding partition
#partitioner.class=com.meituan.mafka.client.producer.CustomizePartitioner

# Compression codec: default 0 means no compression, 1 means gzip compression, 2 means snappy compression.
# A compressed message carries a header indicating the compression type, so decompression on the consumer side is transparent and needs no configuration.
compression.codec=none

# Specify the serialization class (Mafka client API call description --> 3. Serialization Conventions wiki). Defaults to kafka.serializer.DefaultEncoder, i.e. raw byte[]
serializer.class=com.meituan.mafka.client.codec.MafkaMessageEncoder
# serializer.class=kafka.serializer.DefaultEncoder
# serializer.class=kafka.serializer.StringEncoder
# If messages should be compressed, this specifies which topics to compress. Default is empty, meaning no topic is compressed.
#compressed.topics=

########### Request ACK ###############
# When the producer considers a message acknowledged. Default is 0.
# 0: the producer does not wait for an ACK from the broker
# 1: an ACK is sent after the leader has received the message
# -1: an ACK is sent only after all followers have synchronized the message successfully
request.required.acks=0
# Maximum time the broker may wait before sending an ACK to the producer.
# On timeout the broker sends an error ACK to the producer, meaning the last message failed for some
# reason (for example, a follower failed to synchronize)
request.timeout.ms=10000
########## End #####################


# Send messages synchronously or asynchronously. Default "sync" means synchronous, "async" means asynchronous. Asynchronous sending can improve throughput,
# but it also means messages sit in a local buffer and are sent in batches, so messages that have not yet been sent may be lost
producer.type=sync
############## Asynchronous Send (the following four async parameters are optional) ####################
# In async mode, messages cached longer than this value are sent to the broker in bulk. Default 5000ms
# This value works together with batch.num.messages.
queue.buffering.max.ms=5000
# In async mode, the maximum number of messages the producer may buffer.
# If the producer cannot send messages to the broker fast enough, messages pile up on the producer side;
# once the number of buffered messages reaches this threshold, the producer either blocks or discards messages. Default 10000
queue.buffering.max.messages=20000
# In async mode, the number of messages sent in each batch. Default 200
batch.num.messages=500
# When the number of messages buffered on the producer side reaches "queue.buffering.max.messages"
# and, after some time, the queue still cannot accept new messages (the producer has not managed to send anything),
# the producer can either keep blocking or discard messages; this timeout controls how long the "blocking" lasts
# -1: block without a timeout limit, messages are never discarded
# 0: empty the queue immediately, messages are discarded
queue.enqueue.timeout.ms=-1
################ End ###############

# Number of times a message is resent when the producer receives an error ACK or no ACK at all.
# Because the broker has no complete mechanism for avoiding duplicate messages, network problems (for example a lost ACK)
# may cause the broker to receive duplicate messages. Default value is 3.
message.send.max.retries=3


# Interval at which the producer refreshes topic metadata. The producer needs to know the location of partition leaders and the current state of the topic,
# so it needs a mechanism for obtaining the latest metadata. When the producer hits certain errors it refreshes immediately
# (such as topic failure, partition loss, leader failure); this parameter configures an additional periodic refresh. Default value 600000
topic.metadata.refresh.interval.ms=60000


import java.util.*;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class TestProducer {
    public static void main(String[] args) {
        long events = Long.parseLong(args[0]);
        Random rnd = new Random();

        Properties props = new Properties();
        props.put("metadata.broker.list", "192.168.2.105:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");   // default string encoding of messages
        props.put("partitioner.class", "example.producer.SimplePartitioner");
        props.put("request.required.acks", "1");

        ProducerConfig config = new ProducerConfig(props);
        Producer<String, String> producer = new Producer<String, String>(config);

        for (long nEvents = 0; nEvents < events; nEvents++) {
            long runtime = new Date().getTime();
            String ip = "192.168.2." + rnd.nextInt(255);
            String msg = runtime + ",www.example.com," + ip;
            KeyedMessage<String, String> data = new KeyedMessage<String, String>("page_visits", ip, msg);
            producer.send(data);
        }
        producer.close();
    }
}
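
The example above sends synchronously (producer.type defaults to sync). A minimal sketch of an asynchronous configuration using the async properties described earlier; the broker address, topic name and batch values are placeholders, not recommendations:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class AsyncProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "192.168.2.105:9092");        // placeholder broker address
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");                // buffer locally and send in batches
        props.put("batch.num.messages", "500");             // messages per batch
        props.put("queue.buffering.max.ms", "5000");        // max time a message may sit in the buffer
        props.put("queue.buffering.max.messages", "20000"); // max buffered messages before blocking/dropping

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        for (int i = 0; i < 1000; i++) {
            producer.send(new KeyedMessage<String, String>("page_visits", "key-" + i, "message-" + i));
        }
        producer.close();   // close the producer; remaining buffered messages are sent before shutdown
    }
}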

2.2 Specifying a key and sending messages to a specific partition. Description: if you need to implement your own partitioning logic for message delivery, implement the Partitioner interface
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

public class CustomizePartitioner implements Partitioner {
    public CustomizePartitioner(VerifiableProperties props) {
    }

    /**
     * Returns the partition index number
     * @param key the partKey passed when the message was sent
     * @param numPartitions the total number of partitions of the topic
     * @return the partition index
     */
    @Override
    public int partition(Object key, int numPartitions) {
        System.out.println("key: " + key + "  numPartitions: " + numPartitions);
        String partKey = (String) key;
        if ("part2".equals(partKey))
            return 2;
        // System.out.println("partKey:" + key);
        ........
        ........
        return 0;
    }
}
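
A minimal sketch of wiring such a partitioner into a producer; the broker address, the topic "page_visits" and the "part2" key are placeholder values, and the partitioner.class property assumes CustomizePartitioner is on the producer's classpath:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class PartitionedSendExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "192.168.2.105:9092");            // placeholder broker address
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Register the custom partitioner; the class name is assumed to match your implementation
        props.put("partitioner.class", "com.meituan.mafka.client.producer.CustomizePartitioner");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        // The second argument ("part2") is the partition key handed to CustomizePartitioner.partition()
        producer.send(new KeyedMessage<String, String>("page_visits", "part2", "hello from part2"));
        producer.close();
    }
}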

3. Consumer Program Development Example

3.1 Consumer parameter description

# Zookeeper connection addresses; the values here are the offline test environment configuration (Kafka message service --> Kafka broker cluster online deployment environment wiki)
# Configuration sample: "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"
zookeeper.connect=192.168.2.225:2181,192.168.2.225:2182,192.168.2.225:2183/config/mobile/mq/mafka
# Zookeeper session expiration time, default 5000ms. Used to detect whether a consumer has died; when a consumer dies, the other consumers must wait this long before it is detected and a rebalance is triggered
zookeeper.session.timeout.ms=5000
zookeeper.connection.timeout.ms=10000
# Time interval between retries when a consumer rebalance fails.
zookeeper.sync.time.ms=2000

# Specify the consumer group
group.id=xxx
# Whether the consumer periodically commits the offsets of consumed messages to zookeeper.
# Note that offsets are not committed to ZK after every message; they are kept locally (in memory) and committed periodically. Default true
auto.commit.enable=true
# Interval at which offsets are committed automatically. Default 60 * 1000
auto.commit.interval.ms=1000

# Identifier of the current consumer. Can be set explicitly or generated by the system; mainly used to track message consumption and ease observation
consumer.id=xxx

# Consumer client id. Used to distinguish different clients; by default it is generated by the client program itself
client.id=xxxx
# Maximum number of message chunks cached on the consumer (default 10)
queued.max.message.chunks=50
# When a new consumer joins the group, a rebalance happens and some partitions migrate to the new
# consumer. When a consumer obtains consumption rights over a partition it registers a
# "Partition Owner Registry" node in ZK, but at that moment the old consumer may not yet have released the node;
# this value controls the number of retries for registering the node.
rebalance.max.retries=5
# Maximum size of a message fetch; the broker does not return message chunks larger than this value to the consumer.
# Each fetch returns multiple messages and this value is the total size; raising it consumes more memory on the consumer side
fetch.min.bytes=6553600
# When not enough message data has accumulated, the server blocks for up to this long; on timeout, whatever is available is sent to the consumer immediately
fetch.wait.max.ms=5000
socket.receive.buffer.bytes=655360

# If zookeeper has no offset value, or the offset is out of range, supply an initial offset. Options are smallest, largest
# and anything, meaning reset to the current smallest offset, reset to the current largest offset, or throw an exception. Default largest
auto.offset.reset=smallest
# Specify the deserialization class (Mafka client API call description --> 3. Serialization Conventions wiki). Defaults to kafka.serializer.DefaultDecoder, i.e. raw byte[]
deserializer.class=com.meituan.mafka.client.codec.MafkaMessageDecoder

3.2 Multi-threaded parallel consumption of a topic

ConsumerTest class

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;

public class ConsumerTest implements Runnable {
    private KafkaStream m_stream;
    private int m_threadNumber;

    public ConsumerTest(KafkaStream a_stream, int a_threadNumber) {
        m_threadNumber = a_threadNumber;
        m_stream = a_stream;
    }

    public void run() {
        ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
        while (it.hasNext())
            System.out.println("Thread " + m_threadNumber + ": " + new String(it.next().message()));
        System.out.println("Shutting down Thread: " + m_threadNumber);
    }
}

ConsumerGroupExample class

import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConsumerGroupExample {
    private final ConsumerConnector consumer;
    private final String topic;
    private ExecutorService executor;

    public ConsumerGroupExample(String a_zookeeper, String a_groupId, String a_topic) {
        consumer = kafka.consumer.Consumer.createJavaConsumerConnector(
                createConsumerConfig(a_zookeeper, a_groupId));
        this.topic = a_topic;
    }

    public void shutdown() {
        if (consumer != null) consumer.shutdown();
        if (executor != null) executor.shutdown();
    }

    public void run(int a_numThreads) {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, new Integer(a_numThreads));
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
        List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

        // Start all threads
        executor = Executors.newFixedThreadPool(a_numThreads);

        // Start consuming messages
        int threadNumber = 0;
        for (final KafkaStream stream : streams) {
            executor.submit(new ConsumerTest(stream, threadNumber));
            threadNumber++;
        }
    }

    private static ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId) {
        // Note: this example hardcodes the connection values; the a_zookeeper and a_groupId arguments are not used here
        Properties props = new Properties();
        props.put("zookeeper.connect", "192.168.2.225:2183/config/mobile/mq/mafka");
        props.put("group.id", "push-token");
        props.put("zookeeper.session.timeout.ms", "60000");
        props.put("zookeeper.sync.time.ms", "2000");
        props.put("auto.commit.interval.ms", "1000");
        return new ConsumerConfig(props);
    }

    public static void main(String[] args) {
        String zooKeeper = args[0];
        String groupId = args[1];
        String topic = args[2];
        int threads = Integer.parseInt(args[3]);

        ConsumerGroupExample example = new ConsumerGroupExample(zooKeeper, groupId, topic);
        example.run(threads);

        try {
            Thread.sleep(10000);
        } catch (InterruptedException ie) {
        }
        example.shutdown();
    }
}
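
The shutdown() method above stops the executor without waiting for the worker threads to finish. A minimal sketch of a safer variant, assuming it is added inside ConsumerGroupExample; the 5-second wait is an arbitrary illustrative value:

    // Hypothetical alternative to shutdown(): also waits briefly for the consumer threads to exit
    public void shutdownAndAwait() {
        if (consumer != null) consumer.shutdown();
        if (executor != null) {
            executor.shutdown();
            try {
                if (!executor.awaitTermination(5000, java.util.concurrent.TimeUnit.MILLISECONDS)) {
                    System.out.println("Timed out waiting for consumer threads to shut down, exiting uncleanly");
                }
            } catch (InterruptedException e) {
                System.out.println("Interrupted during shutdown, exiting uncleanly");
            }
        }
    }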

Summary:

The Kafka consumer API is divided into a high-level API and a low-level API; the demo above uses the high-level API. With the high-level API you do not need to maintain consumption state or perform load balancing yourself: depending on the configuration parameters, it periodically flushes offsets to ZK, and when there are multiple consumers (each possibly creating multiple threads), it performs load balancing automatically based on the consumer information registered in ZK.

Precautions:

1. The high-level API internally persists the offset of the last message read from each partition; the data is stored under the consumer group name in zookeeper (for example /consumers/push-token-group/offsets/push-token/2, where push-token-group is the consumer group, push-token is the topic, and the trailing 2 denotes the 3rd partition). The offset is flushed once per auto.commit.interval.ms (set to 1000ms in the example above).

It is therefore possible to receive duplicate messages when restarting a consumer. You may also receive duplicates when a partition leader changes. So when closing a consumer it is better to wait a little while (about 10s) before calling shutdown(); a sketch after this list shows one way to commit offsets explicitly before shutting down.

2. The consumer group name is global. Note that old consumers should be shut down before new consumers with the same group name start. If a new process starts with the same consumer group name, Kafka adds that process's consumer threads to the set of threads available to consume the topic and triggers load balancing again, so messages from the same partition may end up being delivered to a different process.

3. If the total number of consumer threads in a consumer group is greater than the number of partitions, some threads or some consumers may never receive messages and will sit idle.

4. If the number of partitions is greater than the number of threads (where, with multiple consumers in the group, the thread count is the sum of all consumer threads in the group), some threads will read messages from multiple partitions.

5. If one thread consumes messages from multiple partitions, there is no ordering guarantee across the messages it receives.
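
Related to precaution 1: a minimal sketch (not part of the original text) of reducing duplicates on restart by forcing an offset commit before shutting the consumer down; the 0.8 high-level ConsumerConnector exposes commitOffsets() for this:

    // Hypothetical tweak to ConsumerGroupExample.shutdown(): flush offsets to zookeeper before stopping,
    // so a restarted consumer resumes closer to where this one left off
    public void shutdown() {
        if (consumer != null) {
            consumer.commitOffsets();   // push the in-memory offsets to ZK immediately
            consumer.shutdown();
        }
        if (executor != null) executor.shutdown();
    }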

Note: a Zookeeper web UI tool can be used to browse the ZK tree, for example the node xxx/consumers/push-token-group/owners/push-token/2, where push-token-group is the consumer group, push-token is the topic, and 2 is the 3rd partition. Its content, for example push-token-group-mobile-platform03-1405157976163-7ab14bd1-0, indicates which consumer thread currently owns that partition.


Summary: Producer performance optimization: send asynchronously and send messages in batches; see the parameter descriptions above for details. Consumer performance optimization: for high-throughput data, fetch a larger amount each time (fetch.min.bytes) and allow the broker to wait longer to fill a fetch (fetch.wait.max.ms); for low-latency requirements, keep the wait interval small and fetch as little as possible from the Kafka broker each time.
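
A minimal sketch of the two consumer tuning directions using the fetch properties described above; the concrete numbers are illustrative assumptions, not recommendations:

import java.util.Properties;

public class FetchTuningExample {
    // Throughput-oriented consumer: large fetches, longer server-side waits
    static Properties highThroughputProps() {
        Properties props = new Properties();
        props.put("fetch.min.bytes", "1048576");   // wait for ~1MB of data per fetch (illustrative value)
        props.put("fetch.wait.max.ms", "5000");    // let the broker wait up to 5s to fill the fetch
        return props;
    }

    // Latency-oriented consumer: small fetches, short waits
    static Properties lowLatencyProps() {
        Properties props = new Properties();
        props.put("fetch.min.bytes", "1");         // return as soon as any data is available
        props.put("fetch.wait.max.ms", "100");     // wait at most 100ms
        return props;
    }
}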


Please credit the original source when reproducing: http://blog.csdn.net/lizhitao/article/details/37811291


