A Tour of Kafka: Introduction

Source: Internet
Author: User
Address: http://blog.csdn.net/honglei915/article/details/37564521
Kafka is a distributed, partitioned, replicated messaging system. It provides the functionality of an ordinary messaging system, but with a design of its own. What does that design look like?
First, some basic messaging terminology:
  • Kafka organizes messages by topic.
  • A program that publishes messages to a Kafka topic is called a producer.
  • A program that subscribes to topics and consumes their messages is called a consumer.
  • Kafka runs as a cluster made up of one or more servers, each of which is called a broker.
Producers send messages to the Kafka cluster over the network, and the cluster in turn serves those messages to consumers.
Clients and servers communicate over TCP. Kafka provides a Java client, and clients are available for many other languages.

Topics and logs

Let's first look at a core abstraction that Kafka provides: the topic. A topic is a category of messages. Kafka partitions the log of each topic.
Each partition is an ordered, immutable sequence of messages to which new messages are continually appended. Each message in a partition is assigned a sequential number called the offset, which uniquely identifies the message within that partition.
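The append-only behaviour described above can be modeled in a few lines of Python. This is only a toy sketch: the class name `PartitionLog` and its methods are hypothetical and are not part of any Kafka client API.

```python
class PartitionLog:
    """Toy model of one partition: an append-only sequence of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return the offset assigned to it."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        if not 0 <= offset < len(self._messages):
            raise IndexError(f"offset {offset} is not in this partition")
        return self._messages[offset]


log = PartitionLog()
offsets = [log.append(m) for m in ("created", "paid", "shipped")]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # paid
```

Existing entries are never modified or reordered, so an offset keeps pointing at the same message for as long as that message is retained.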
The Kafka cluster retains all published messages for a configurable period, whether or not they have been consumed. For example, if the retention policy is set to two days, a message can be consumed for two days after it is published, after which it is discarded to free up space. Kafka's performance is effectively constant with respect to data size, so retaining a large amount of data is not a problem.
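As a rough illustration of time-based retention, the sketch below drops records once they are older than the retention window, regardless of whether anyone has read them. The function name `expire` and the record layout are assumptions made for this example; a real broker removes data in whole log segments rather than one record at a time.

```python
from collections import deque

RETENTION_SECONDS = 2 * 24 * 60 * 60  # the two-day policy from the example above


def expire(log, now):
    """Drop records older than the retention window, consumed or not."""
    while log and now - log[0][0] > RETENTION_SECONDS:
        log.popleft()


# Each record is (publish_time_in_seconds, payload), oldest first.
log = deque([(0, "a"), (100_000, "b"), (200_000, "c")])
expire(log, now=250_000)
remaining = [payload for _, payload in log]
print(remaining)  # ['b', 'c']
```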
In fact, the only metadata each consumer needs to maintain is its position in the log, that is, the offset. The offset is controlled by the consumer. Normally the offset advances as the consumer reads messages, but the consumer can read messages in any order; for example, it can reset the offset to an older value to re-read earlier messages.
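A minimal sketch of a consumer that owns its offset might look like the following. `ToyConsumer`, `poll`, and `seek` are illustrative names chosen for this example and should not be read as the signatures of a real Kafka client.

```python
class ToyConsumer:
    """Reads from a list-backed partition and tracks its own offset."""

    def __init__(self, partition):
        self.partition = partition
        self.offset = 0  # position of the next message to read

    def poll(self):
        """Return the next message, or None when caught up."""
        if self.offset >= len(self.partition):
            return None
        message = self.partition[self.offset]
        self.offset += 1
        return message

    def seek(self, offset):
        """Move the read position; an older value re-reads messages."""
        self.offset = offset


partition = ["m0", "m1", "m2"]
consumer = ToyConsumer(partition)
first_pass = [consumer.poll() for _ in range(3)]
consumer.seek(1)  # rewind to an older offset
replayed = [consumer.poll(), consumer.poll()]

other = ToyConsumer(partition)  # a second reader starts from its own offset
other_first = other.poll()

print(first_pass)   # ['m0', 'm1', 'm2']
print(replayed)     # ['m1', 'm2']
print(other_first)  # m0
```

Because the position lives in the consumer and not in the log, rewinding one consumer changes nothing for any other reader of the same partition.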
Together, these features make Kafka consumers very lightweight: they can read messages with little impact on the cluster or on other consumers. For example, you can use the command-line tools to "tail" the messages of a topic without affecting the consumers that are already consuming it.
Partitioning the log serves several purposes. First, it keeps any single log from growing too large to fit on one server. Second, each partition can be published to and consumed from independently, which makes concurrent operations on a topic possible.

Distributed

The partitions of a topic are distributed across the servers of the Kafka cluster, and each server handles data and requests for the partitions it holds. Each partition is replicated on a configurable number of servers, and this replication is what makes Kafka fault-tolerant. Each partition has one server that acts as the "leader" and zero or more servers that act as "followers". The leader handles all reads and writes for the partition, and the followers replicate the leader. If the leader goes down, one of the followers automatically becomes the new leader. Each server acts as the leader for some of its partitions and as a follower for others, so load is well balanced across the cluster.

Producers

A producer publishes messages to the topic of its choice and decides which partition each message goes to. Partitions can simply be chosen at random to balance load, or they can be chosen by a specific partition function. The second approach is the one usually used.

Consumers

Messaging traditionally follows one of two models: queuing and publish-subscribe. In the queuing model, a pool of consumers reads from the server concurrently and each message is read by only one of them. In the publish-subscribe model, each message is broadcast to all consumers. In Kafka, consumers join a consumer group to share the consumption of a topic, and each message in the topic is delivered to one member of the group. The consumers in a group can run in different processes or on different machines. If all the consumers are in the same group, this works like a traditional queue, with load balanced across the consumers.
If all the consumers are in different groups, this works like publish-subscribe, and every message is delivered to every consumer. More commonly, a topic has several consumer groups, each of which is a logical "subscriber". For fault tolerance and stability, each group is made up of several consumers. This is still publish-subscribe, except that the subscriber is a group of consumers rather than a single consumer.
For example, a cluster of two machines hosts four partitions (P0-P3) and serves two consumer groups; group A has two consumers and the other group has four.
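The scenario above (four partitions, one group of two consumers, one group of four) can be sketched as follows. The round-robin `assign` function, the consumer names, and the label "B" for the second group are all assumptions made for illustration; this is not Kafka's actual assignment strategy.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer of a group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Hypothetical consumer names; each group is assigned independently.
group_a = assign(partitions, ["A1", "A2"])
group_b = assign(partitions, ["B1", "B2", "B3", "B4"])

print(group_a)  # {'A1': ['P0', 'P2'], 'A2': ['P1', 'P3']}
print(group_b)  # {'B1': ['P0'], 'B2': ['P1'], 'B3': ['P2'], 'B4': ['P3']}
```

Every partition appears once in each group's assignment, so each group as a whole sees every message (publish-subscribe between groups), while inside a group the partitions are split among the members (queuing within a group).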
Compared with traditional messaging systems, Kafka offers stronger ordering guarantees. A traditional queue stores messages in order on the server, and when multiple consumers read from the queue at the same time, the server hands out messages in the order in which they are stored. However, although the server hands out messages in order, they are delivered to the consumers asynchronously, so they may arrive out of order. In other words, concurrent consumption can destroy the ordering. To work around this, such messaging systems usually have a notion of an "exclusive consumer" that allows only one consumer to read from a queue. Of course, this means giving up concurrency.
Kafka does better here. Through partitioning, Kafka can provide both ordering and load balancing when multiple consumers read concurrently. Each partition of a topic is assigned to exactly one consumer within a consumer group, so that consumer is the only reader of the partition in its group and consumes its messages in order. Because there are multiple partitions, load can still be balanced across the consumers. Note that the number of consumers in a group cannot usefully exceed the number of partitions: the partition count is the upper bound on how many consumers in a group can consume concurrently.

Kafka guarantees ordering only within a partition, not across the different partitions of a topic. Per-partition ordering is enough for most applications. If you need a total order over all the messages in a topic, the topic must have exactly one partition, which also means that only one consumer in each consumer group can consume it.
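One common way to benefit from per-partition ordering is to have the producer route all messages with the same key to the same partition. The sketch below shows the idea under stated assumptions: `choose_partition` and `events_for` are hypothetical helpers, and CRC32 stands in for whatever hash a real client uses.

```python
import zlib

NUM_PARTITIONS = 4


def choose_partition(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is modeled as a plain list that is only appended to.
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "purchase"),
    ("user-2", "logout"),
    ("user-1", "logout"),
]

for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))


def events_for(key):
    """Read back one key's events from the single partition that holds them."""
    partition = partitions[choose_partition(key, NUM_PARTITIONS)]
    return [value for k, value in partition if k == key]


print(events_for("user-1"))  # ['login', 'purchase', 'logout']
print(events_for("user-2"))  # ['login', 'logout']
```

Events for different keys may land in different partitions and carry no ordering relative to each other, but the events of any one key stay in the order in which they were published.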
