Kafka: Getting Started Tutorial and Using the Java Client

Tags: serialization, zookeeper, log4j
Contents

Kafka Introduction · Environment Introduction · Terminology Introduction · Consumption Modes · Download · Cluster Installation and Configuration · Command Usage · Java in Practice

Kafka Introduction

Written in Scala and Java, Kafka is a high-throughput distributed publish-subscribe messaging system.

Environment Introduction

Operating system: CentOS 6.5
Kafka: 1.0.1
ZooKeeper: 3.4.6

Terminology Introduction

Broker: a Kafka cluster contains one or more servers; each server is called a broker.
Topic: every message published to the Kafka cluster has a category, called a topic. (Physically, messages of different topics are stored separately; logically, a topic's messages may be stored on one or more brokers, but a user only needs to specify the topic to produce or consume data, without caring where the data is stored.)
Partition: a physical concept; each topic contains one or more partitions.
Producer: publishes messages to a Kafka broker.
Consumer: a message consumer; a client that reads messages from a Kafka broker.
Consumer Group: every consumer belongs to a specific consumer group (you can specify a group name for each consumer; a default group name is used if none is specified).

Consumption Modes

For readers not very familiar with message queues (MQ), here is the basic principle first. Typically, the MQ server maintains queues: producers push messages to the MQ server, and consumers pull messages from it. This solves the problem of tight coupling between producers and consumers, and it also buffers the gap between production speed and consumption speed, so consumers are not overwhelmed when they cannot keep up with producers.

A topic in Kafka is a named class of messages backed by a group of queues. ActiveMQ and RabbitMQ have this concept too. All messages of one class are thrown into one topic.

With topics covered, let's talk about partitions. The partition is specific to Kafka, and it is an important part of how Kafka achieves horizontal scaling and high concurrency. Imagine each topic had only one queue: as the business grows, the topic accumulates more and more messages, until a single server can no longer hold them. To solve this, Kafka introduces the partition. A partition is a physically existing queue; a topic is merely a group of partitions, that is, a purely logical concept. So when the messages in a topic keep growing, we can add new partitions on other servers: the partitions inside one topic can be spread across different machines. This is also how Kafka is typically deployed in production.
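To make the message-to-partition mapping concrete, here is a minimal illustrative sketch in Java. It is not Kafka's real partitioner (the client's DefaultPartitioner hashes the key with murmur2 and round-robins records that have no key); the class name and keys are invented for illustration:

public class PartitionSketch {
    // Map a message key to one of numPartitions partitions.
    static int choosePartition(String key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands in the same partition.
        System.out.println(choosePartition("order-42", 3));
        System.out.println(choosePartition("order-42", 3)); // same result
    }
}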

One special case: sometimes we create a topic without specifying the number of partitions, or specify a partition count of 1. The topic then still has one default partition (whose name I forget).

From the producer's perspective, a message is thrown into a topic and the job is done; the producer does not need to care which partition of the topic the message lands in. Kafka automatically load-balances this for us. Of course, if we want to specify a partition explicitly, that is also fine; see the official documentation for details.
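For example, with the Java client used later in this article, pinning a record to a partition is just a matter of which ProducerRecord constructor you call. A hedged fragment (it assumes a configured KafkaProducer<String, String> named producer, as in Producer.java below, and that topic "Milo2" has a partition 0):

// Let Kafka's partitioner pick the partition (load-balanced):
ProducerRecord<String, String> auto = new ProducerRecord<>("Milo2", "some value");

// Or pin the record to partition 0: (topic, partition, key, value)
ProducerRecord<String, String> pinned = new ProducerRecord<>("Milo2", 0, "key", "some value");
producer.send(pinned);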

Next, the consumer group. As the name implies, a consumer group is a collection of consumers, and the grouping determines how messages are delivered. If there is only one group with a single consumer in it, we get the traditional point-to-point model. If there are multiple groups, each with its own consumer, we get the publish-subscribe (pub-sub) model: every group receives the same messages.
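In the Java client this behavior is driven entirely by the group.id setting; a minimal sketch (imports omitted; broker address and group names are illustrative):

Properties p = new Properties();
p.put("bootstrap.servers", "118.212.149.51:9092");
p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

// Same group.id on every instance: the instances split the partitions
// between them, i.e. point-to-point (queue) semantics.
p.put("group.id", "group-1");
// A different group.id would give that consumer its own full copy of
// the stream, i.e. publish-subscribe semantics.

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(p);
consumer.subscribe(Collections.singletonList("Milo2"));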

Finally, the hardest part to understand and the most discussed: the relationship between partitions and consumers. First, a consumer thread receives data from only one partition at a time, and at any given moment a partition sends its messages to only one consumer. Let's walk through a few scenarios:

Scenario one: Topic-1 has partition-1 and partition-2.
Group-1 contains consumer-1, consumer-2, and consumer-3.
All consumers run a single thread, and all consume Topic-1's messages.
Result: consumer-1 consumes only partition-1's data;
consumer-2 consumes only partition-2's data;
consumer-3 consumes no data at all.
Reason: a partition delivers its data to only one consumer at a time, so with more consumers than partitions, one consumer is left with nothing to receive.

Scenario two: Topic-1 has partition-1 and partition-2.
Group-1 contains only consumer-1.
The consumer runs a single thread and consumes Topic-1's messages.
Result: consumer-1 first consumes partition-1's data;
once consumer-1 finishes partition-1's data, it starts consuming partition-2's data.
Reason: Kafka detects that consumer-1 is idle after finishing partition-1 and automatically rebalances the load onto it. Note how this fits the phrase "at any given moment" used above.
Special case: a consumer must specify a topic when consuming, but does not have to specify a partition. Scenario two only occurs when no partition is specified; if consumer-1 explicitly specifies partition-1, then after finishing partition-1 it stays idle and never consumes partition-2's messages.

From this we can derive a rule of thumb: the number of consumers in the same group (consuming single-threaded) should not exceed the number of partitions under the topic, otherwise some consumers sit idle, and the number of concurrent threads equals the number of partitions. Conversely, having fewer consumers than the topic has partitions is also not ideal, because then the number of concurrent threads equals the number of consumers and Kafka's concurrency cannot be fully exploited.
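The pattern that follows from this rule is one single-threaded consumer per partition. A sketch under the scenario above (a two-partition topic, so run exactly two instances of this program; KafkaConsumer is not thread-safe, so each thread or process gets its own instance; broker address is illustrative):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Collections;
import java.util.Properties;

public class GroupWorker {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "118.212.149.51:9092");
        p.put("group.id", "group-1");
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(p);
        consumer.subscribe(Collections.singletonList("Topic-1"));
        while (true) {
            // Each running instance is assigned one of the two partitions.
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value());
            }
        }
    }
}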

Finally, look at the figure referenced above (from the official documentation; not reproduced here): Consumer Group A consumes the messages of P0-P3 with two machines running two threads each, and Consumer Group B consumes them with four single-threaded machines. Every partition has exactly one consuming thread, so this arrangement is the most efficient.

Download

Download address: http://kafka.apache.org/downloads

Cluster Installation and Configuration

Here we install into the /usr/local directory.

Unpack: cd /usr/local && tar -xzvf kafka_2.11-1.0.1.tgz

Create the log directory: cd /usr/local/kafka_2.11-1.0.1 && mkdir kafkalogs

Configuration: vi /usr/local/kafka_2.11-1.0.1/config/server.properties. Five settings at the bottom need to change:

# The broker's id; each machine's id in the cluster must be unique. The other two brokers use 1 and 2.
broker.id=0
# The interface Kafka binds to; write the machine's intranet IP address here, or binding the port will fail.
# The other two machines use 192.168.1.5 and 192.168.1.9.
host.name=192.168.1.3
# The externally exposed ip and port registered with ZooKeeper; 118.212.149.51 is the extranet IP of 192.168.1.3.
# Without this setting, a Kafka deployed on an extranet server cannot be reached from outside.
advertised.listeners=PLAINTEXT://118.212.149.51:9092
# The ips and ports of the ZooKeeper cluster; see a ZK cluster tutorial for its setup.
zookeeper.connect=192.168.1.3:2181,192.168.1.5:2181,192.168.1.9:2181
# The log directory we created above.
log.dirs=/usr/local/kafka_2.11-1.0.1/kafkalogs

Start the cluster (execute on all three brokers). Enter the bin directory and run the startup script, specifying the configuration file:

cd /usr/local/kafka_2.11-1.0.1/bin/
./kafka-server-start.sh -daemon ../config/server.properties

To verify that the cluster started successfully:

[root@template ~]# cd /usr/local/zookeeper-3.4.6/bin/
[root@template bin]# ./zkCli.sh -server 127.0.0.1:2181
...
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /brokers/ids
[0, 1, 2]    # 0, 1 and 2 are the ids of the three brokers

View one broker's information. Note the ip:port in the endpoints field: this is the externally exposed service address. I access over the extranet here, so the extranet IP and port are exposed:

[zk: 127.0.0.1:2181(CONNECTED) 1] get /brokers/ids/0
{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},"endpoints":["PLAINTEXT://118.212.149.51:9092"],"jmx_port":-1,"host":"118.212.149.51","timestamp":"1521010377533","port":9092,"version":4}
cZxid = 0x700000626
ctime = Wed Mar 14 14:52:57 CST 2018
mZxid = 0x700000626
mtime = Wed Mar 14 14:52:57 CST 2018
pZxid = 0x700000626
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x3621e366ae20014
dataLength = 198
numChildren = 0
Command Usage

Create topic:

# --replication-factor: the number of replicas to create, used for backup; it cannot be greater than the number of brokers.
# --partitions: the number of partitions to create; choose according to the actual situation.
./kafka-topics.sh --create --zookeeper 192.168.1.3:2181 --replication-factor 1 --partitions 1 --topic Milo
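If you prefer creating topics from code, the kafka-clients 1.0.x dependency used later in this article also ships an AdminClient. A minimal sketch mirroring the shell command above (the class name is mine; the broker address matches the cluster configured earlier):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "118.212.149.51:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // (name, numPartitions, replicationFactor), as in the shell flags.
            NewTopic topic = new NewTopic("Milo", 1, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}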

View topic:

./kafka-topics.sh --list --zookeeper 192.168.1.3:2181

View topic Details:

./kafka-topics.sh --describe --zookeeper 192.168.1.3:2181

The results are as follows:
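(The original output screenshot is not reproduced here; for a topic named Milo with three partitions and replication factor 1, the output looks roughly like this, broker ids being illustrative:)

Topic:Milo    PartitionCount:3    ReplicationFactor:1    Configs:
    Topic: Milo    Partition: 0    Leader: 0    Replicas: 0    Isr: 0
    Topic: Milo    Partition: 1    Leader: 1    Replicas: 1    Isr: 1
    Topic: Milo    Partition: 2    Leader: 2    Replicas: 2    Isr: 2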

The first line is a summary of the topic: topic name (Topic), number of partitions (PartitionCount), number of replicas (ReplicationFactor), and configuration (Configs).
The second through fourth lines list each partition of the topic named Milo: topic name (Topic), partition id (Partition), the broker on which the partition's leader resides (Leader), the brokers holding its replicas (Replicas), and the ISR list (Isr).
PS: ISR is short for In-Sync Replicas, the set of replicas that are in sync with the leader. Think of it as the list of substitutes: not every broker qualifies. First, the broker must hold a replica of the partition; second, that replica must meet the in-sync condition. It is like our college football team: some players are substitutes, and some never even make the roster, because they cannot play. ^_^

Produce messages:

./kafka-console-producer.sh --broker-list 118.212.149.51:9092 --topic Milo
>Hello World

Consume messages:

./kafka-console-consumer.sh --zookeeper 118.212.149.51:2181 --topic Milo --from-beginning
Hello World

Java in Practice

pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.11</artifactId>
        <version>1.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>1.0.1</version>
    </dependency>
</dependencies>

Producer.java

package cn.milo.kafka;

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.log4j.Logger;

import java.util.Properties;

/******************************************************
 @ClassName : Producer.java
 @author    : Milo
 @date      : 2018-03-14 11:34
 @version   : v1.0.x
 ******************************************************/
public class Producer {

    static Logger log = Logger.getLogger(Producer.class);

    private static final String TOPIC = "Milo2";
    private static final String BROKER_LIST = "118.212.149.51:9092";

    private static KafkaProducer<String, String> producer = null;

    /* Initialize the producer */
    static {
        Properties configs = initConfig();
        producer = new KafkaProducer<String, String>(configs);
    }

    /* Initialize the configuration */
    private static Properties initConfig() {
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BROKER_LIST);
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return properties;
    }

    public static void main(String[] args) throws InterruptedException {
        // Message entity
        ProducerRecord<String, String> record = null;
        for (int i = 0; i < 1000; i++) {
            record = new ProducerRecord<String, String>(TOPIC, "value" + (int) (10 * Math.random()));
            // Send the message with an asynchronous callback
            producer.send(record, new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (null != e) {
                        log.info("send error " + e.getMessage());
                    } else {
                        System.out.println(String.format("offset:%s,partition:%s",
                                recordMetadata.offset(), recordMetadata.partition()));
                    }
                }
            });
        }
        producer.close();
    }
}
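Note that send() is asynchronous and the callback fires later. If you need to block until the broker acknowledges a record, you can wait on the Future that send() returns; a fragment using the same producer and record as above:

// send() returns a Future<RecordMetadata>; get() blocks until the broker
// acknowledges the record (or throws ExecutionException on failure).
RecordMetadata meta = producer.send(record).get();
System.out.println("offset:" + meta.offset() + ",partition:" + meta.partition());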

Consumer:

package cn.milo.kafka;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.log4j.Logger;

import java.util.Arrays;
import java.util.Properties;

/******************************************************
 @ClassName : Consumer.java
 @author    : Milo
 @date      : 2018-03-14 15:50
 @version   : v1.0.x
 ******************************************************/
public class Consumer {

    static Logger log = Logger.getLogger(Consumer.class);

    private static final String TOPIC = "Milo2";
    private static final String BROKER_LIST = "118.212.149.51:9092";

    private static KafkaConsumer<String, String> consumer = null;

    static {
        Properties configs = initConfig();
        consumer = new KafkaConsumer<String, String>(configs);
        // Subscribing to the topic is required before poll() may be called.
        consumer.subscribe(Arrays.asList(TOPIC));
    }

    private static Properties initConfig() {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", BROKER_LIST);
        properties.put("group.id", "0");
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("enable.auto.commit", "true");
        properties.setProperty("auto.offset.reset", "earliest");
        return properties;
    }

    public static void main(String[] args) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(10);
            for (ConsumerRecord<String, String> record : records) {
                log.info(record);
            }
        }
    }
}
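As noted in scenario two's special case, a consumer can also bypass group management and pin itself to one partition by calling assign() instead of subscribe(); a fragment using the same consumer and TOPIC as above:

import org.apache.kafka.common.TopicPartition;
import java.util.Collections;

// Bind this consumer to partition 0 only. It will never be rebalanced
// onto other partitions, even when it is idle.
consumer.assign(Collections.singletonList(new TopicPartition(TOPIC, 0)));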
