Introduction to Kafka Basics


Kafka Analysis

Source: www.jasongj.com/2015/01/02/ (Kafka Depth Analysis)

Terminology:

Broker
A Kafka cluster contains one or more servers, each of which is called a broker.

Topic
Every message published to the Kafka cluster belongs to a category called a topic. (Physically, messages of different topics are stored separately; logically, a topic's messages may be stored on one or more brokers, but a user only needs to specify the topic of a message to produce or consume data, without caring where the data is stored.)

Partition
A partition is a physical concept: each topic contains one or more partitions, and the number of partitions can be specified when the topic is created. Each partition corresponds to a folder in which that partition's data and index files are stored.

Producer
Responsible for publishing messages to the Kafka brokers.

Consumer
Consumes messages. Each consumer belongs to a specific consumer group (a group name can be specified for each consumer; consumers that do not specify one belong to the default group). When using the high-level consumer API, a message on a topic is consumed by only one consumer within a given consumer group, but multiple consumer groups can each consume the same message.
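
As a concrete illustration of the producer, topic, and broker terms above, here is a minimal sketch using the Kafka Java client (the broker address localhost:9092 and the topic name "page-views" are assumed values for illustration, not taken from the article):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address is an assumed value for illustration.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // The producer only names the topic; it does not care which
            // broker or partition folder the message ends up in.
            producer.send(new ProducerRecord<>("page-views", "user-42", "clicked /home"));
        }
    }
}
```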

Architecture:

A typical Kafka cluster contains several producers (page views generated by a web front end, server logs, system CPU and memory metrics, and so on), several brokers (Kafka supports horizontal scaling; generally, the more brokers, the higher the cluster throughput), several consumer groups, and a ZooKeeper cluster. Kafka uses ZooKeeper to manage cluster configuration, elect leaders, and rebalance when consumer groups change. Producers publish messages to brokers using a push model, and consumers subscribe to and consume messages from brokers using a pull model.
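
To make the broker and ZooKeeper roles concrete, a minimal broker configuration might look like the following server.properties-style sketch (the keys are standard Kafka broker settings; every value shown is an assumption for illustration):

```properties
# Unique id of this broker within the cluster
broker.id=0
# Where this broker stores partition data and index files
log.dirs=/tmp/kafka-logs
# ZooKeeper ensemble used for cluster configuration management and leader election
zookeeper.connect=localhost:2181
# Default number of partitions for newly created topics
num.partitions=1
```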

Consumer Group
The high-level consumer stores the offset of the last message it has read from each partition in ZooKeeper (since version 0.8.2, Kafka also supports storing the offset in a dedicated Kafka topic instead of in ZooKeeper). The offset is recorded under a name the client program provides to Kafka, called the consumer group. The consumer group is global to the whole Kafka cluster, not tied to a particular topic. Each high-level consumer instance belongs to a consumer group; if none is specified, it belongs to the default group.
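
A hedged sketch of how a client names its consumer group and commits offsets, using the modern Kafka Java consumer (which by default keeps offsets in an internal Kafka topic rather than in ZooKeeper, i.e. the post-0.8.2 option mentioned above; the broker address, group name, and topic are assumptions):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GroupedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed address
        props.put("group.id", "page-view-analyzers");       // the consumer group name
        props.put("enable.auto.commit", "false");           // commit offsets explicitly
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("page-views"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
            // The committed offset is recorded per (consumer group, topic, partition).
            consumer.commitSync();
        }
    }
}
```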

Push vs. Pull

As a messaging system, Kafka follows the traditional design: producers push messages to brokers, and consumers pull messages from brokers. Some logging-centric systems, such as Facebook's Scribe and Cloudera's Flume, take a very different approach and push data all the way to the downstream consumer. In fact, both the push model and the pull model have their pros and cons.
A push model has difficulty adapting to consumers with different consumption rates, because the delivery rate is determined by the broker. The goal of a push model is to deliver messages as fast as possible, but this can easily leave a consumer unable to keep up with the messages, typically manifesting as denial of service and network congestion. With a pull model, the consumer can fetch messages at a rate suited to its own processing capacity.
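
The pull model is visible in the consumer's own poll loop: the broker only answers fetch requests, so a slow consumer simply asks for data less often instead of being flooded. A minimal sketch, with assumed broker address, group, and topic, and with fetch.min.bytes and fetch.max.wait.ms set so an idle consumer does not busy-poll:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class PullPacedConsumer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed address
        props.put("group.id", "slow-readers");              // assumed group name
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Ask the broker to wait until at least 64 KB are available (or 500 ms pass)
        // before answering a fetch, so an idle consumer does not busy-poll.
        props.put("fetch.min.bytes", "65536");
        props.put("fetch.max.wait.ms", "500");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("page-views"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    Thread.sleep(10);   // simulate slow processing; the broker never pushes ahead of us
                    System.out.println(record.value());
                }
                // The next fetch happens only when this loop comes back to poll(),
                // so the consumption rate is set entirely by the consumer.
            }
        }
    }
}
```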

Topic & Partition

A topic can be thought of logically as a queue. Every message must specify its topic, which can be understood simply as stating which queue the message should be put into. To let Kafka's throughput scale horizontally, a topic is physically divided into one or more partitions, and each partition corresponds to a folder on disk that stores all the messages and index files for that partition.
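
The number of partitions is fixed when the topic is created. A hedged sketch using the Kafka Java AdminClient (a newer API than the 0.8-era command-line tools; the topic name, partition count, and replication factor are assumptions):

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class CreateTopic {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            // A topic with 4 partitions and replication factor 1:
            // each partition becomes its own folder on the broker that hosts it.
            NewTopic topic = new NewTopic("page-views", 4, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```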

When Kafka guarantees that a message is consumed by only one consumer within a consumer group, what it actually guarantees is that, in the steady state, each consumer instance consumes the data of one or more specific partitions, and the data of a partition is consumed by only one particular consumer instance. In other words, Kafka allocates partitions, not individual messages, as the unit of assignment. The disadvantage of this design is that it cannot guarantee that the consumers in the same group consume data evenly; the advantages are that each consumer does not need to communicate with a large number of brokers, which reduces communication overhead and the difficulty of allocation, and keeps the implementation simpler. In addition, because the data within a partition is ordered, this design also ensures that the data in each partition is consumed in order.
The conclusion is that each consumer instance consumes the data of one or more specific partitions, and the data of a partition is consumed by only one particular consumer instance.
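
One practical consequence: if a producer keys its messages, the default partitioner sends every message with the same key to the same partition, so their relative order is preserved for whichever single consumer in the group owns that partition. A small sketch with assumed topic, key, and broker address:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class OrderedByKeyProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // All three records share the key "user-42", so the default partitioner
            // hashes them to the same partition and they are stored (and consumed)
            // in the order they were sent.
            producer.send(new ProducerRecord<>("page-views", "user-42", "login"));
            producer.send(new ProducerRecord<>("page-views", "user-42", "view /home"));
            producer.send(new ProducerRecord<>("page-views", "user-42", "logout"));
        }
    }
}
```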

If, within a consumer group, the number of consumers (with one message stream per consumer) is smaller than the number of partitions, at least one consumer consumes the data of more than one partition. If the number of consumers equals the number of partitions, each consumer consumes exactly one partition. If there are more consumers than partitions, some consumers receive no messages from the topic at all.

The consumer rebalance algorithm is as follows (a code sketch of the assignment arithmetic appears after this list):

1. Sort all the partitions of the target topic and store them in PT
2. Sort all the consumers in the consumer group and store them in CG; denote the i-th consumer as Ci
3. N = size(PT) / size(CG), rounded up
4. Revoke Ci's right to consume the partitions originally assigned to it (i starts from 0)
5. Assign partitions i*N through (i+1)*N−1 to Ci
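
A plain-Java sketch of the assignment arithmetic above (not Kafka's actual implementation; the partition and consumer identifiers are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignmentSketch {
    /** Assigns partitions to consumers using the range algorithm described above. */
    static Map<String, List<Integer>> assign(List<Integer> partitions, List<String> consumers) {
        List<Integer> pt = new ArrayList<>(partitions);
        List<String> cg = new ArrayList<>(consumers);
        Collections.sort(pt);   // step 1: sort partitions (PT)
        Collections.sort(cg);   // step 2: sort consumers (CG)
        int n = (int) Math.ceil((double) pt.size() / cg.size());   // step 3: N = ceil(|PT| / |CG|)

        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (int i = 0; i < cg.size(); i++) {   // step 4: previous assignments are discarded
            List<Integer> owned = new ArrayList<>();
            // step 5: consumer Ci gets partitions i*N .. (i+1)*N - 1
            for (int p = i * n; p < (i + 1) * n && p < pt.size(); p++) {
                owned.add(pt.get(p));
            }
            assignment.put(cg.get(i), owned);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 5 partitions, 2 consumers: C0 gets partitions 0-2, C1 gets partitions 3-4.
        System.out.println(assign(List.of(0, 1, 2, 3, 4), List.of("C0", "C1")));
    }
}
```

With 5 partitions and 2 consumers, N = 3, so C0 ends up with partitions 0, 1, 2 and C1 with partitions 3, 4, which also illustrates why the assignment is not always even.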
In the latest version at the time of writing (0.8.2.1), Kafka's consumer rebalance is controlled by each consumer registering watches on ZooKeeper. Each newly created consumer triggers a rebalance of its consumer group, and it starts up with the following process:

1. At startup, the high-level consumer registers its id under its consumer group; the ZooKeeper path is /consumers/[consumer group]/ids/[consumer id]
2. Registers a watch on /consumers/[consumer group]/ids
3. Registers a watch on /brokers/ids
4. If the consumer creates a message stream through a topic filter, it also registers a watch on /brokers/topics
5. Forces a rebalance within its own consumer group

Under this strategy, any increase or decrease in consumers or brokers triggers a consumer rebalance. Because each consumer adjusts only the partitions it consumes itself, when one consumer triggers a rebalance, all the other consumers in the same group should trigger a rebalance as well in order to keep the whole consumer group consistent.
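
A hedged sketch of what that watch registration might look like with the plain ZooKeeper Java client (the paths follow the layout listed above; the connection string, group name, and consumer id are assumptions, the parent znodes are assumed to already exist, and the real high-level consumer does considerably more than this):

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ConsumerRegistrationSketch {
    public static void main(String[] args) throws Exception {
        String group = "page-view-analyzers";   // assumed group name
        String consumerId = "consumer-1";       // assumed consumer id

        // Watcher that would kick off a rebalance whenever consumers or brokers change.
        Watcher rebalanceTrigger = (WatchedEvent event) ->
                System.out.println("change detected, rebalance needed: " + event.getPath());

        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, rebalanceTrigger);

        // 1. Register this consumer's id under its group (ephemeral, so it disappears on failure).
        zk.create("/consumers/" + group + "/ids/" + consumerId, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // 2. Watch the list of consumer ids in this group.
        zk.getChildren("/consumers/" + group + "/ids", rebalanceTrigger);

        // 3. Watch the list of brokers.
        zk.getChildren("/brokers/ids", rebalanceTrigger);

        // 4. (Only for topic-filter consumers) watch the list of topics.
        zk.getChildren("/brokers/topics", rebalanceTrigger);

        // 5. A real consumer would now run the rebalance algorithm for its group.
    }
}
```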

For more in-depth information, see http://www.infoq.com/cn/profile/%E9%83%AD%E4%BF%8A
