Kafka Personal Summary

Source: Internet
Author: User

Lately I have been using Kafka more and more. Starting with the FE project, where I first learned how to use KafkaProducer and KafkaConsumer, and continuing through later projects, I have built up a general understanding of Kafka. Since I have had some free time recently, I went looking for more material on it. Overall, Kafka is a relatively new open-source distributed message queue, so ready-made PDFs and books about it are scarce; the best source of information is the official Apache Kafka website.

First, Kafka's overall architecture (excerpted from the official website):

Kafka is a producer-consumer message subscription system whose main unit is the topic: a producer publishes messages to a specified topic, and the consumers subscribed to that topic receive them. Kafka is a distributed system, and for that reason each topic is divided into multiple partitions.
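To make the topic and partition model concrete, here is a minimal in-memory sketch. It is not Kafka's client API: the `Topic` class, the `page-views` name, and the CRC32-based partition choice are all assumptions made for illustration (real clients use their own partitioning logic).

```python
import zlib


class Topic:
    """Toy in-memory model of a topic split into append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Hypothetical rule: hash the key to choose a partition, so all
        # messages with the same key land in the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[index]
        log.append((key, value))
        # The offset is the message's position inside that partition.
        return index, len(log) - 1

    def read(self, partition, offset):
        # Return every message in the partition from the given offset on.
        return self.partitions[partition][offset:]


topic = Topic("page-views", num_partitions=3)
# Eight events spread over four keys: user-0 .. user-3.
placements = [topic.publish(f"user-{i % 4}", f"event-{i}") for i in range(8)]
```

Each call returns the partition index and the offset the message received. Messages that share a key always end up in the same partition, and inside a partition they keep the order in which they were appended.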

On the producer side, multiple producers can write to the same topic at once. Because they write concurrently, and because a topic is made up of partitions, the order of messages across the topic as a whole is not guaranteed; there is no ordering between partitions. The messages inside each partition, however, are ordered, and each producer can send messages to different partitions. A producer can publish messages synchronously or asynchronously, depending on its configuration.

The cluster is the intermediary between producers and consumers and contains multiple brokers. Producers publish messages to the cluster, and consumers consume messages from the cluster.

Data replication is a problem every distributed system has to consider. In Kafka, each topic has multiple partitions, and data is replicated per partition. Each partition has one broker as its leader, and the other brokers holding that partition act as its followers (slaves). The leader accepts data from producers, and the followers simply replicate what the leader does. In other words, for each partition only the leader communicates with producers and consumers. To make sure data is not lost when the leader goes down, the system picks one of the followers as the new leader to take over the leader's work. A single broker is usually the leader for some partitions and a follower for others.

The cluster's storage capacity is generally large, so producers push data to the cluster, while consumers pull data from it; the consumer side is described in more detail below.
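The leader and follower roles described above can be sketched with a small simulation. This is a simplification under stated assumptions: the `PartitionReplicas` class and the broker names are hypothetical, replication happens instantly, and the new leader is simply the follower with the longest log, which is not how a real cluster runs its election.

```python
class PartitionReplicas:
    """Toy model of one partition replicated across several brokers."""

    def __init__(self, broker_ids):
        # Each broker keeps its own copy of the partition log.
        self.logs = {broker_id: [] for broker_id in broker_ids}
        self.leader = broker_ids[0]

    def produce(self, message):
        # Only the leader accepts writes from producers.
        self.logs[self.leader].append(message)
        self.replicate()

    def replicate(self):
        # Followers copy whatever they are missing from the leader's log.
        leader_log = self.logs[self.leader]
        for broker_id, log in self.logs.items():
            if broker_id != self.leader:
                log.extend(leader_log[len(log):])

    def fail_leader(self):
        # Drop the failed leader and promote a surviving follower.
        # Simplification: pick the follower with the longest log.
        del self.logs[self.leader]
        self.leader = max(self.logs, key=lambda b: len(self.logs[b]))
        return self.leader


replicas = PartitionReplicas(["broker-1", "broker-2", "broker-3"])
for message in ["m0", "m1", "m2"]:
    replicas.produce(message)

new_leader = replicas.fail_leader()   # broker-1 goes down
replicas.produce("m3")                # writes continue on the new leader
```

Because the followers had already copied the leader's log, the messages written before the failure are still present on the promoted broker, and new writes are appended after them.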

On the consumer side, data is fetched from the cluster by pull rather than push. The difference is that with push, the sender actively delivers the data: a producer pushes data to the cluster as soon as it is generated, rather than storing it locally and waiting for the cluster to come and pull it. This works because the cluster is very busy and has plenty of storage, so it can simply accept producer data passively. For the same reason, a consumer that needs data takes the initiative and pulls it from the cluster. In addition, the cluster does not know how fast each consumer can process data. If producers generated far more data than a consumer could handle and the cluster kept pushing all of it, the consumer could be overwhelmed and go down. With pull, the consumer fetches data itself when it needs it.
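The push and pull directions described above can be illustrated with a toy broker. The `Broker` and `SlowConsumer` classes and the `batch_size` parameter are hypothetical names for this sketch, not part of any Kafka client; the point is only that the consumer decides how much to take per request.

```python
class Broker:
    """Toy broker that stores messages and hands them out only on request."""

    def __init__(self):
        self.log = []

    def push(self, message):
        # Producers push: the broker just appends whatever arrives.
        self.log.append(message)

    def fetch(self, offset, max_messages):
        # Consumers pull: they choose where to start and how much to take.
        return self.log[offset:offset + max_messages]


class SlowConsumer:
    """Consumer that can only handle a few messages per poll."""

    def __init__(self, broker, batch_size):
        self.broker = broker
        self.batch_size = batch_size
        self.position = 0
        self.processed = []

    def poll_once(self):
        batch = self.broker.fetch(self.position, self.batch_size)
        self.processed.extend(batch)
        self.position += len(batch)
        return len(batch)


broker = Broker()
for i in range(10):
    broker.push(f"msg-{i}")   # the producer is far ahead of the consumer

consumer = SlowConsumer(broker, batch_size=3)
batch_sizes = []
while True:
    received = consumer.poll_once()
    if received == 0:
        break
    batch_sizes.append(received)
```

The producer fills the broker with ten messages up front, yet the consumer never receives more than three at a time and still ends up with all of them, in order.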

In a distributed system where multiple consumers consume data at the same time, coordinating those consumers becomes important. In Kafka, each consumer carries a group ID (group_id) that identifies the consumer group it belongs to.

Every consumer belongs to a group. To keep consumption correct, each group acts as a complete unit: it consumes all of the data in the cluster for the topics it subscribes to, and different groups consume that data independently according to their own needs. Within a group, each partition is assigned to exactly one consumer and never to two or more. As a result, a group should not have more consumers than there are partitions; otherwise the extra consumers are assigned nothing and never do any work.

To keep consumption consistent and avoid re-consuming or skipping data, Kafka keeps an offset for each partition that records how far the consumer has read. This position is maintained by the consumer: each time it requests data from the cluster, it sends the offset along, telling the cluster where it left off and where the next read should start. A consumer can also reset the offset to any position to meet its own requirements.
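The group and offset rules above can be sketched as follows. This is an illustration only: `assign_partitions` uses a simple round-robin rule chosen for brevity, and `GroupOffsets`, the group names `billing` and `analytics`, and the consumer names are all hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


class GroupOffsets:
    """Tracks, per group and partition, the next offset to read."""

    def __init__(self):
        self.offsets = {}

    def position(self, group_id, partition):
        # A group that has never read the partition starts at offset 0.
        return self.offsets.get((group_id, partition), 0)

    def commit(self, group_id, partition, next_offset):
        self.offsets[(group_id, partition)] = next_offset

    def seek(self, group_id, partition, offset):
        # Resetting the offset lets a group re-read or skip messages.
        self.offsets[(group_id, partition)] = offset


partitions = [0, 1, 2]
balanced = assign_partitions(partitions, ["c1", "c2"])
crowded = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])

log = ["a", "b", "c", "d"]   # contents of partition 0
offsets = GroupOffsets()

start = offsets.position("billing", 0)
first_read = log[start:start + 2]
offsets.commit("billing", 0, start + len(first_read))
second_read = log[offsets.position("billing", 0):]

# A different group has its own offsets and sees everything.
analytics_read = log[offsets.position("analytics", 0):]

# Rewinding the billing group replays messages from offset 1.
offsets.seek("billing", 0, 1)
replayed = log[offsets.position("billing", 0):]
```

With three partitions and four consumers, `crowded` leaves `c4` with an empty list, which is the idle consumer the text warns about. The offset bookkeeping shows why two groups can read the same partition without interfering with each other.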

I went to eat and lost my train of thought, so I will stop here for now.
