Objective
The previous article explained how to build a Kafka cluster; this article explains how to start using Kafka. Before using Kafka, however, it helps to understand the following Kafka basics.
Introduction to Kafka
Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the action-stream data of a consumer-scale website.
Kafka has the following characteristics:
- Provides message persistence with O(1) time complexity, guaranteeing constant-time access performance even for terabytes of data.
- High throughput. Even on very cheap commodity machines, a single machine can support the transmission of more than 100K messages per second.
- Supports message partitioning across Kafka brokers and distributed consumption, while guaranteeing message ordering within each partition.
- Supports both offline data processing and real-time data processing.
- Scale out: Supports online horizontal scaling.
Kafka's terminology
- Broker: A Kafka cluster contains one or more servers, which are called brokers.
- Topic: Every message published to a Kafka cluster has a category, which is called a topic. (Physically, messages of different topics are stored separately. Logically, a topic's messages are stored on one or more brokers, but users only need to specify the topic of a message to produce or consume data, without caring where the data is stored.)
- Partition: A partition is a physical concept; each topic contains one or more partitions.
- Producer: Responsible for publishing messages to Kafka brokers.
- Consumer: The message consumer, the client that reads messages from Kafka brokers.
- Consumer Group: Each consumer belongs to a specific consumer group (a group name can be specified for each consumer; consumers without a group name belong to the default group).
Kafka Core API
Kafka has four core APIs:
- The Producer API allows an application to publish messages to one or more topics.
- The Consumer API allows an application to subscribe to one or more topics and process the stream of messages produced to them.
- The Streams API allows an application to act as a stream processor, consuming input streams from one or more topics and producing output streams to one or more topics, effectively transforming input streams into output streams.
- The Connector API allows building and running reusable producers or consumers that connect topics to existing applications or data systems.
(Diagram from the official documentation: producers, consumers, stream processors, and connectors surrounding the Kafka cluster.)
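The Streams API is not used in the rest of this article, but since the kafka-streams dependency is included below, here is a minimal sketch of a stream processor for reference. The application id and the topic names are made up for illustration:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Unique id for this streams application (hypothetical name)
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-stream");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "master:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Consume from an input topic and forward every record to an output topic
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");
        source.to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}

This sketch simply forwards records from input-topic to output-topic; a real stream processor would apply transformations (map, filter, aggregate) in between.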
Kafka Application Scenarios
- Build real-time streaming data pipelines that reliably move data between systems or applications.
- Build real-time streaming applications that transform or react to streams of data.
Refer to the Kafka official documentation for more information.
Development preparation
What should we do if we are going to develop a Kafka program?
First of all, after setting up the Kafka environment, we should consider whether we are the producer or the consumer, that is, the sender of messages or the receiver.
In this article, however, both the producer and the consumer are developed and explained.
After a rough understanding of Kafka, let's develop the first program.
The development language used here is Java, and the build tool is Maven.
The Maven dependencies are as follows:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.12</artifactId>
    <version>1.0.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>1.0.0</version>
</dependency>
Kafka Producer
Before developing the producer, here is a brief introduction to the main Kafka producer configuration options:
- bootstrap.servers: The addresses of the Kafka brokers.
- acks: The acknowledgement mechanism for messages; the default value is 1.
acks=0: The producer does not wait for any acknowledgement from Kafka.
acks=1: Kafka writes the message to the leader's local log, but does not wait for acknowledgement from the other machines in the cluster.
acks=all: The leader waits for all in-sync followers to acknowledge the message. This guarantees that the message will not be lost unless every machine in the Kafka cluster goes down. It is the strongest durability guarantee.
- retries: When set to a value greater than 0, the client resends any message whose send failed.
- batch.size: When multiple messages need to be sent to the same partition, the producer tries to batch them into fewer network requests. This improves the efficiency of both the client and the server.
- key.serializer: The serializer class for keys, e.g. org.apache.kafka.common.serialization.StringSerializer.
- value.serializer: The serializer class for values, e.g. org.apache.kafka.common.serialization.StringSerializer.
...
More configuration options are available in the official documentation; they are not explained here.
Then our Kafka producer configuration is as follows:
Properties props = new Properties();
props.put("bootstrap.servers", "master:9092,slave1:9092,slave2:9092");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
After the Kafka configuration is set up, we can start producing data. The code for sending data is as follows:
producer.send(new ProducerRecord<String, String>(topic,key,value));
- topic: The name of the message queue, which can be created in advance in the Kafka service. If the topic does not exist in Kafka, it is created automatically!
- key: The key corresponding to the value, similar to a key in a map.
- value: The data to be sent, here of type String.
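Note that send() is asynchronous and returns immediately; the record is batched and sent in the background. If you want to confirm per message whether it reached the broker, you can pass a Callback as the second argument. This is a minimal sketch, not part of the original demo:

producer.send(new ProducerRecord<String, String>(topic, key, value), new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            // The send failed after any retries; log or handle it
            exception.printStackTrace();
        } else {
            // The broker acknowledged the record
            System.out.println("sent to partition " + metadata.partition()
                    + ", offset " + metadata.offset());
        }
    }
});

Callback and RecordMetadata both come from the org.apache.kafka.clients.producer package.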
With the producer program written, let's start producing!
The message I sent here is:
String messageStr="你好,这是第"+messageNo+"条数据";
The producer exits after sending 1,000 messages; the result is as follows:
You can see that the messages were printed successfully.
If you do not want to verify in the program whether the messages were sent successfully and accurately, you can check from the command line on the Kafka server.
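For example, with the console consumer that ships with Kafka (assuming the topic name KAFKA_TEST used in this article's demo, run from the Kafka installation directory):

bin/kafka-console-consumer.sh --bootstrap-server master:9092 --topic KAFKA_TEST --from-beginning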
Kafka Consumer
Consumption is arguably the most important part of Kafka; after all, most of the time we mainly use Kafka to consume data.
The Kafka consumer configuration is as follows:
- bootstrap.servers: The addresses of the Kafka brokers.
- group.id: The consumer group name. Different group names can consume the same data repeatedly. For example, if you consumed 1,000 records from Kafka with group name A but want to consume those 1,000 records again without re-producing them, you only need to change the group name.
- enable.auto.commit: Whether offsets are committed automatically; the default is true. (A manual-commit sketch is shown after this list.)
- auto.commit.interval.ms: The interval at which offsets are committed automatically.
- session.timeout.ms: The session timeout used to detect consumer failures.
- max.poll.records: The maximum number of records returned in a single poll.
- auto.offset.reset: Where to start consuming when there is no committed offset; the default is latest.
earliest: When a partition has a committed offset, consume from that offset; when there is no committed offset, consume from the beginning.
latest: When a partition has a committed offset, consume from that offset; when there is no committed offset, consume only the newly produced data.
none: When every partition of the topic has a committed offset, consumption begins after those offsets; if any partition has no committed offset, an exception is thrown.
- key.deserializer: The deserializer class for keys, e.g. org.apache.kafka.common.serialization.StringDeserializer.
- value.deserializer: The deserializer class for values, e.g. org.apache.kafka.common.serialization.StringDeserializer.
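As mentioned in the list above, this article's demo relies on auto commit. For completeness, here is a minimal sketch of manual offset committing, assuming enable.auto.commit is set to "false"; the group name GROUP_B is made up for illustration:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "master:9092,slave1:9092,slave2:9092");
        props.put("group.id", "GROUP_B"); // hypothetical group name
        props.put("enable.auto.commit", "false"); // we commit offsets ourselves
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("KAFKA_TEST"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                // Process the record here
                System.out.println(record.offset() + ": " + record.value());
            }
            // Only mark these records as consumed after they have been processed
            consumer.commitSync();
        }
    }
}

commitSync() blocks until the offsets of the last poll are committed, so a record is only marked consumed after your processing code has run.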
Then our Kafka consumer configuration is as follows:
Properties props = new Properties();
props.put("bootstrap.servers", "master:9092,slave1:9092,slave2:9092");
props.put("group.id", GROUPID);
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("max.poll.records", 1000);
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
Since auto commit is enabled, the consumption code is as follows:
We first need to subscribe to a topic, that is, specify which topic to consume:
consumer.subscribe(Arrays.asList(topic));
After subscribing, we pull data from Kafka (the argument to poll is the timeout in milliseconds):
ConsumerRecords<String, String> msgList=consumer.poll(1000);
In general, consumption runs in a listening loop; here we use for (;;) to poll continuously, and exit after consuming 1,000 records!
The results are as follows:
You can see that we have successfully consumed the produced data here.
Code
The full producer and consumer code is as follows:
Producer:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

/**
 * Title: KafkaProducerTest
 * Description: Kafka producer demo
 * Version: 1.0.0
 * @author pancm
 * @date January 26, 2018
 */
public class KafkaProducerTest implements Runnable {

    private final KafkaProducer<String, String> producer;
    private final String topic;

    public KafkaProducerTest(String topicName) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "master:9092,slave1:9092,slave2:9092");
        props.put("acks", "all");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<String, String>(props);
        this.topic = topicName;
    }

    @Override
    public void run() {
        int messageNo = 1;
        try {
            for (;;) {
                String messageStr = "Hello, this is message " + messageNo;
                producer.send(new ProducerRecord<String, String>(topic, "Message", messageStr));
                // Print every 100 messages produced
                if (messageNo % 100 == 0) {
                    System.out.println("message sent: " + messageStr);
                }
                // Exit after producing 1000 messages
                if (messageNo % 1000 == 0) {
                    System.out.println("successfully sent " + messageNo + " messages");
                    break;
                }
                messageNo++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }

    public static void main(String[] args) {
        KafkaProducerTest test = new KafkaProducerTest("KAFKA_TEST");
        Thread thread = new Thread(test);
        thread.start();
    }
}
Consumer:
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

/**
 * Title: KafkaConsumerTest
 * Description: Kafka consumer demo
 * Version: 1.0.0
 * @author pancm
 * @date January 26, 2018
 */
public class KafkaConsumerTest implements Runnable {

    private final KafkaConsumer<String, String> consumer;
    private ConsumerRecords<String, String> msgList;
    private final String topic;
    private static final String GROUPID = "groupA";

    public KafkaConsumerTest(String topicName) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "master:9092,slave1:9092,slave2:9092");
        props.put("group.id", GROUPID);
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("session.timeout.ms", "30000");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        this.consumer = new KafkaConsumer<String, String>(props);
        this.topic = topicName;
        this.consumer.subscribe(Arrays.asList(topic));
    }

    @Override
    public void run() {
        int messageNo = 1;
        System.out.println("---------Start consuming---------");
        try {
            outer:
            for (;;) {
                msgList = consumer.poll(1000);
                if (null != msgList && msgList.count() > 0) {
                    for (ConsumerRecord<String, String> record : msgList) {
                        // Print every 100 messages consumed (the printed records are not necessarily this regular)
                        if (messageNo % 100 == 0) {
                            System.out.println(messageNo + "=======receive: key = " + record.key()
                                    + ", value = " + record.value() + " offset===" + record.offset());
                        }
                        // Exit after consuming 1000 messages (labeled break to leave the outer loop too)
                        if (messageNo % 1000 == 0) {
                            break outer;
                        }
                        messageNo++;
                    }
                } else {
                    Thread.sleep(1000);
                }
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            consumer.close();
        }
    }

    public static void main(String[] args) {
        KafkaConsumerTest test1 = new KafkaConsumerTest("KAFKA_TEST");
        Thread thread1 = new Thread(test1);
        thread1.start();
    }
}
Note: master, slave1, and slave2 are hostname mappings in my environment; you can replace them with your servers' IP addresses.
I have also put the code on GitHub; take a look if you are interested: https://github.com/xuwujing/kafka
Summary
Developing a simple Kafka program requires the following steps:
- Set up the Kafka server and start it successfully!
- Obtain the Kafka service information, and then configure it accordingly in the code.
- After the configuration is complete, listen to the message queue in Kafka for incoming messages.
- Process the received data according to your business logic!
Official Kafka introduction documentation:
http://kafka.apache.org/intro
This is the end of this article, thank you for reading!