Roaming Kafka, Introductory Chapter: A Brief Introduction

Source: Internet
Author: User

Introduction

Kafka is a distributed, partitioned, replicated messaging system. It provides the functionality of an ordinary messaging system, but with a design of its own. What does that design look like? First, some basic messaging terminology:
    • Kafka organizes messages by topic.
    • A program that publishes messages to a Kafka topic is called a producer.
    • A program that subscribes to topics and consumes their messages is called a consumer.
    • Kafka runs as a cluster of one or more servers, each of which is called a broker.
Producers send messages to the Kafka cluster over the network, and the cluster in turn serves them to consumers.

[Figure: producers, Kafka cluster, and consumers]

Clients and servers communicate over TCP. Kafka provides a Java client, and clients are available for many other languages.

Topics and logs

Let's first look at the central abstraction Kafka provides: the topic. A topic is a category under which a set of messages is grouped. For each topic, Kafka maintains a partitioned log.

[Figure: anatomy of a partitioned topic log]

Each partition is an ordered, immutable sequence of messages that is continually appended to. Each message in a partition is assigned a sequential number called the offset, which uniquely identifies the message within that partition.

The Kafka cluster retains all published messages, whether or not they have been consumed, for a configurable period of time. For example, if the retention policy is set to two days, a message can be consumed for two days after it is published; after that it is discarded to free up space. Kafka's performance is effectively constant with respect to data size, so retaining a lot of data is not a problem.

In fact, the only data that must be kept per consumer is its position in the log, that is, the offset. The offset is maintained by the consumer. Normally the offset advances steadily as the consumer reads messages, but the consumer can actually read messages in any order; for example, it can reset the offset to an older value to reread earlier messages.

Together, these features make Kafka consumers very lightweight: they can read messages without affecting the cluster or other consumers. For example, you can use the command-line tools to "tail" the messages of a topic without disturbing consumers that are already reading it.

Partitioning the log serves several purposes. First, it keeps each individual log small enough to be stored on a single server. In addition, each partition can be published to and consumed from independently, which makes it possible to work on a topic in parallel.

Distributed

Each partition is replicated on several servers in the Kafka cluster, and the servers holding those replicas share the work of handling data and requests. The number of replicas is configurable. Replication is what makes Kafka fault tolerant.

Each partition has one server acting as the "leader" and zero or more servers acting as "followers". The leader handles all reads and writes for the partition, while the followers replicate the leader. If the leader fails, one of the followers automatically becomes the new leader. Each server in the cluster plays both roles at once: it is the leader for some of the partitions it holds and a follower for others, so load is well balanced across the cluster.

Producers

A producer publishes messages to the topic of its choice and is responsible for deciding which partition each message goes to. In the simplest case, partitions are picked at random purely to balance load, but they can also be chosen by a partitioning function, for example one based on a key in the message. The second approach is the more widely used.

Consumers

Messaging traditionally follows one of two models: queuing and publish-subscribe. In the queuing model, a pool of consumers reads from the server concurrently and each message goes to only one of them; in the publish-subscribe model, each message is broadcast to all consumers. Kafka covers both with a single abstraction, the consumer group: consumers join a consumer group, and each message published to a topic is delivered to exactly one member of each subscribing group. The consumers in a group can run in different processes or on different machines.

If all consumers are in the same group, this behaves like a traditional queue, with load balanced across the consumers. If every consumer is in its own group, this behaves like publish-subscribe, and every message is delivered to every consumer.
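
The ideas of a partitioned log, per-partition offsets, and key-based partition selection can be illustrated with a small toy model. This is only a sketch in plain Python, not the Kafka client API: the `Partition` and `Topic` classes are hypothetical, and hashing the key with CRC32 is an assumption made for illustration (Kafka's own partitioner is not claimed to work this way).

```python
import zlib


class Partition:
    """Toy append-only log: a message's offset is its index in the list."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset assigned to the new message

    def read(self, offset):
        """Return every message from `offset` onward; the log is not modified."""
        return self._messages[offset:]


class Topic:
    """Toy topic: a fixed list of partitions (hypothetical, for illustration)."""

    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def publish(self, key, message):
        # Assumed partitioning function: hash the key so that all messages
        # with the same key land in the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        offset = self.partitions[index].append(message)
        return index, offset


topic = Topic(num_partitions=4)
p1, first = topic.publish("user-1", "login")
p2, second = topic.publish("user-1", "logout")

# The consumer, not the cluster, remembers its position in the partition.
position = first
batch = topic.partitions[p1].read(position)
position += len(batch)

# Rewinding to an older offset replays earlier messages.
replayed = topic.partitions[p1].read(first)
```

Because reading never removes anything from the log, a second consumer could read the same partition from its own offset without affecting the first, which is the property that keeps consumers lightweight.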
More commonly, a topic has a small number of consumer groups, one for each logical "subscriber". For fault tolerance and better stability, each group consists of several consumers. This is still publish-subscribe; the subscriber is simply a group of consumers rather than a single one.

[Figure: a two-server Kafka cluster hosting four partitions (P0-P3) with two consumer groups; group A has two consumers and group B has four.]

Kafka also offers stronger ordering guarantees than a traditional messaging system. A traditional queue stores messages in order on the server, and when multiple consumers read from it concurrently, the server hands out messages in the order in which they were stored. But although the server hands them out in order, the messages are delivered to the consumers asynchronously, so they may arrive out of order. In other words, concurrent consumption scrambles the ordering. To work around this, such systems often rely on a "dedicated consumer", allowing only one consumer to read from a queue, which of course gives up concurrency.

Kafka does better here. Through partitioning, it provides both ordering and load balancing while several consumers read concurrently. Each partition of a topic is assigned to exactly one consumer within each consumer group, so that consumer is the only reader of the partition in its group and receives the partition's messages in order. Because a topic has many partitions, load is still balanced across the consumers. Note that a group cannot have more active consumers than the topic has partitions; the partition count caps the degree of concurrent consumption.

Kafka guarantees message order only within a partition, not across the partitions of a topic. This is enough for most applications. If a total order over all messages in a topic is required, the topic must have a single partition, which in turn means only one consumer per consumer group can read it.
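
The relationship between partitions and the consumers in a group can be sketched as follows. This is a simplified illustration, not Kafka's actual assignment logic: the `assign` function is hypothetical, and round-robin distribution is an assumption chosen only because it is easy to follow. The consumer names reuse the two-group layout with four partitions (P0-P3) described above.

```python
def assign(partitions, consumers):
    """Give each partition exactly one owner within a single consumer group."""
    owners = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        # Assumed strategy: hand partitions out to consumers in turn.
        owners[consumers[i % len(consumers)]].append(partition)
    return owners


partitions = ["P0", "P1", "P2", "P3"]

# Every group sees all partitions; inside a group each partition has one reader.
group_a = assign(partitions, ["A1", "A2"])
group_b = assign(partitions, ["B1", "B2", "B3", "B4"])

# A fifth consumer in a group has nothing to read from a four-partition topic.
crowded = assign(partitions, ["C1", "C2", "C3", "C4", "C5"])
```

In this sketch `group_a` gives each consumer two partitions, `group_b` gives each consumer one, and the extra consumer in `crowded` is left with an empty list, which is why the partition count bounds the useful size of a group.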
