Kafka (II): Basic concepts and structure of Kafka

Source: Internet
Author: User

One. Core concepts in Kafka

Producer: the client that produces (publishes) messages.
Consumer: the client that consumes (reads) messages.
Consumer Group: a group of consumers whose members consume the partitions of a topic in parallel.
Broker: a caching proxy between producers and consumers. The one or more servers in a Kafka cluster are collectively referred to as brokers.
Topic: a named category, or feed, of the messages that Kafka handles.
Partition: a physical subdivision of a topic. A topic can be divided into multiple partitions, and each partition is an ordered queue. Each message in a partition is assigned a sequential ID called its offset.
Message: the basic unit of communication. A producer publishes messages to a topic.
Producing: the process of publishing messages to a Kafka topic; the clients that do this are the producers of messages and data.
Consuming: the process of subscribing to topics and processing the messages published to them; the clients that do this are the consumers of messages and data.

Two. Kafka's logical architecture

Note: when a topic holds too many messages, the topic is partitioned and its messages are spread across different partitions.

Why partition:
Partitioning divides and conquers a large volume of data. Different consumers can consume different partitions, so consumption proceeds in parallel and the data is processed faster.

The process of sending messages:
1. The producer publishes a message to a partition of the specified topic, chosen by the specified partitioning method (round-robin, hash, etc.).
2. When the Kafka cluster receives a message from a producer, it persists the message to disk and retains it for a specified, configurable length of time, regardless of whether the message has been consumed.
3. The consumer pulls data from the Kafka cluster and controls the offset from which it fetches messages.
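
The three steps above can be sketched with a toy in-memory model. This is not the Kafka client API: the class `MiniTopic`, its method names, and the use of CRC32 to pick a partition are assumptions made purely for illustration, and retention is left out to keep the sketch short.

```python
from zlib import crc32


class MiniTopic:
    """Toy in-memory stand-in for one topic (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions):
        # One append-only list per partition; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Step 1: the producer side picks a partition (here: hash of the key).
        index = crc32(key.encode("utf-8")) % len(self.partitions)
        # Step 2: the broker side appends the message and keeps it,
        # whether or not anyone has consumed it yet.
        self.partitions[index].append(value)
        return index

    def fetch(self, partition, offset, max_messages=10):
        # Step 3: the consumer pulls and supplies the offset it wants.
        log = self.partitions[partition]
        return log[offset:offset + max_messages]


topic = MiniTopic(num_partitions=3)
p = topic.publish("user-1", "login")
topic.publish("user-1", "logout")
print(topic.fetch(p, offset=0))  # both messages, in the order they were sent
print(topic.fetch(p, offset=1))  # only the second message
```

Reading does not remove anything from the log, so the same offset can be fetched again later.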

Three. Kafka's producers

1. Producers definition:
Producers are the producers of messages and data. The process of publishing messages to a Kafka topic is called producing.

2. You can specify the partition of the message:
The producer publishes a message to the specified topic, and it can also decide which partition the message goes to (for example, partition 1 or partition 2 of that topic). This mechanism can be understood as a form of load balancing: the partition can be chosen by round-robin or by another algorithm, such as a hash.
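
A minimal sketch of the two partitioning strategies named above, in plain Python rather than a real Kafka client. The function names and the choice of CRC32 as the hash are assumptions for illustration only; real clients ship their own partitioners and hash functions.

```python
from itertools import count
from zlib import crc32


def round_robin_partitioner(num_partitions):
    """Return a function that cycles through partitions 0..n-1 (hypothetical helper)."""
    counter = count()
    return lambda key=None: next(counter) % num_partitions


def hash_partitioner(num_partitions):
    """Return a function that sends equal keys to the same partition (hypothetical helper)."""
    return lambda key: crc32(key.encode("utf-8")) % num_partitions


pick = round_robin_partitioner(3)
print([pick() for _ in range(5)])  # cycles 0, 1, 2, 0, 1

pick_by_key = hash_partitioner(3)
print(pick_by_key("order-42") == pick_by_key("order-42"))  # same key, same partition
```

Round-robin spreads load evenly, while hashing keeps all messages with the same key in one partition, and therefore in one ordered queue.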

3. Send asynchronously:
Kafka supports sending messages asynchronously in batches, which improves delivery efficiency. In asynchronous mode, the producer first buffers messages in memory and then sends them together in a single batch.
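
The buffering idea can be sketched as follows. This is a simplified model, not the real producer: `BatchingSender`, `send_batch`, and `batch_size` are hypothetical names, and the sketch flushes only on a message count, whereas a real client would typically also flush after a time limit.

```python
class BatchingSender:
    """Toy model of batched sending (hypothetical, not a Kafka client)."""

    def __init__(self, send_batch, batch_size):
        self.send_batch = send_batch  # callable that performs one network send
        self.batch_size = batch_size
        self.buffer = []

    def send(self, message):
        # Returns immediately; the message only sits in memory for now.
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Ship everything buffered so far as one batch.
        if self.buffer:
            self.send_batch(list(self.buffer))
            self.buffer.clear()


batches = []
sender = BatchingSender(batches.append, batch_size=3)
for i in range(7):
    sender.send(i)
sender.flush()  # push out the final partial batch
print(batches)  # three batches: two full ones and one with the leftover message
```

Seven messages cost three sends instead of seven, which is where the efficiency gain comes from.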

Four. Kafka's broker

1. Broker: a broker can be understood as a Kafka server. It is a caching proxy; the one or more servers in the Kafka cluster are collectively referred to as brokers.
Note:
Kafka supports message persistence. When a producer produces a message, Kafka does not pass it directly to the consumer; it first stores the message on the broker, where it is persisted in Kafka's log files.

2. Messages are persisted on the broker by appending to a log (each new message is written at the end of the file, so the file is ordered), and the log is divided into partitions.

3. To reduce the number of disk writes, the broker temporarily buffers messages and flushes them to disk when the number (or size) of buffered messages reaches a certain threshold, which reduces the number of disk I/O calls.
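
A rough sketch of append-only persistence with a flush threshold, using an ordinary file. The class `BufferedLog` and its `flush_every` parameter are hypothetical, and the newline-delimited layout is a simplification chosen for the example, not Kafka's on-disk format.

```python
import os


class BufferedLog:
    """Toy append-only log that syncs to disk every N messages (hypothetical)."""

    def __init__(self, path, flush_every):
        self.file = open(path, "ab")  # append mode: new data goes at the end
        self.flush_every = flush_every
        self.pending = 0
        self.flush_count = 0

    def append(self, payload):
        self.file.write(payload + b"\n")  # lands in an in-memory buffer first
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        self.file.flush()             # hand the buffered bytes to the OS
        os.fsync(self.file.fileno())  # ask the OS to write them to disk
        self.pending = 0
        self.flush_count += 1

    def close(self):
        if self.pending:
            self.flush()
        self.file.close()
```

With `flush_every=3`, appending seven messages triggers two threshold flushes, and `close()` performs a final one for the remaining message.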

Five. Kafka's broker stateless mechanism

1. In the early Kafka versions that this description is based on (before 0.8), the broker has no replication mechanism, so once a broker goes down its messages are unavailable. Since version 0.8, Kafka replicates partitions across brokers.

Note: If the broker has no replica, how is broker downtime handled?
Although the broker has no replica, the messages themselves are persisted in the log and are not lost. After the outage, the broker simply reads the message log again.

2. The broker does not store the state of its subscribers; each subscriber keeps its own state.

3. Because the broker is stateless, deleting messages is a challenge (a message about to be deleted may still be needed by a subscriber). Kafka addresses this with a time-based SLA (service level agreement): a message is deleted after it has been retained for a certain amount of time (typically 7 days).

4. A subscriber can rewind to any position and re-consume from there. When a subscriber fails, it can restart from the smallest available offset and re-read the messages.
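
Points 2 to 4 can be illustrated with a small model in which the log deletes by age only and the subscriber alone tracks its position. `RetainedLog` and `Subscriber` are hypothetical classes written for this sketch; timestamps are passed in explicitly so the behavior is easy to follow.

```python
class RetainedLog:
    """Toy partition log with time-based retention (hypothetical)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0  # offset of the oldest message still retained
        self.entries = []     # (timestamp, value) pairs in offset order

    def append(self, timestamp, value):
        self.entries.append((timestamp, value))

    def expire(self, now):
        # Delete by age only; the log never checks who has read what.
        while self.entries and now - self.entries[0][0] > self.retention_seconds:
            self.entries.pop(0)
            self.base_offset += 1

    def read(self, offset):
        # Clamp to the smallest offset that still exists.
        start = max(offset, self.base_offset) - self.base_offset
        return [value for _, value in self.entries[start:]]


class Subscriber:
    """Keeps its own position; the log above stores nothing about it."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        messages = self.log.read(self.offset)
        self.offset = max(self.offset, self.log.base_offset) + len(messages)
        return messages

    def rewind(self, offset):
        self.offset = offset
```

After `rewind(0)`, the next `poll()` re-reads everything that has not yet expired, which mirrors restarting a failed subscriber from the smallest offset.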

Note: 1. How does the consumer determine which messages it has already consumed and which it still needs to consume?
ZooKeeper records which messages have been consumed and which have not. (Newer Kafka clients store consumed offsets in an internal Kafka topic, __consumer_offsets, instead of ZooKeeper.)

2. How does the consumer quickly find the messages it has not yet consumed?
This relies on Kafka's "sparse index".
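
The idea behind a sparse index is to record the file position of only some offsets, then binary-search for the nearest indexed offset at or below the target and scan forward from there. The sketch below indexes every N-th record, which is a simplification chosen for the example; the function names are hypothetical.

```python
from bisect import bisect_right


def build_sparse_index(records, every):
    """Index every `every`-th record as (offset, byte position)."""
    index, position = [], 0
    for i, (offset, payload) in enumerate(records):
        if i % every == 0:
            index.append((offset, position))
        position += len(payload)
    return index


def find_start(index, target_offset):
    """Return the (offset, position) entry closest to, but not after, the target."""
    offsets = [offset for offset, _ in index]
    slot = bisect_right(offsets, target_offset) - 1
    if slot < 0:
        raise KeyError("offset precedes this index")
    return index[slot]


# Ten records with offsets 100..109, each 10 bytes long.
records = [(100 + i, b"x" * 10) for i in range(10)]
index = build_sparse_index(records, every=4)
print(index)                   # entries for offsets 100, 104 and 108 only
print(find_start(index, 106))  # the entry for offset 104; scan forward from there
```

The index stays small because most offsets are absent from it, at the cost of a short sequential scan after the lookup.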

Six. Composition of the Kafka message

1. Message:
The message is the basic unit of communication. A producer publishes messages to a topic.

2. Messages in Kafka are organized by topic, and different topics are independent of each other. Each topic can be divided into several partitions (the number of partitions is specified when the topic is created), and each partition stores part of the topic's messages.

3. Each message in a partition has the following three properties:
Offset: uniquely identifies the message, and is used to locate it. Type: long.
MessageSize: the size of the message. Type: int32.
Data: the actual content of the message.

Note: 1. Messages are stateless, and consuming one message does not depend on the order in which other messages are consumed.
2. Within a consumer group, each partition can be consumed by only one consumer, but one consumer can consume multiple partitions: a one-to-many relationship.
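
The three message properties listed above (an offset of type long, a size of type int32, and the data) can be sketched as a simple binary layout. This models only those three fields and is an assumption for illustration, not Kafka's actual on-disk format; `encode_message` and `decode_message` are hypothetical helpers.

```python
import struct

# Big-endian header: 8-byte signed offset, then 4-byte signed message size.
HEADER = struct.Struct(">qi")


def encode_message(offset, data):
    """Pack one message as header + payload bytes."""
    return HEADER.pack(offset, len(data)) + data


def decode_message(buffer, position=0):
    """Read one message; return (offset, data, position of the next message)."""
    offset, size = HEADER.unpack_from(buffer, position)
    start = position + HEADER.size
    return offset, buffer[start:start + size], start + size


log = encode_message(0, b"hello") + encode_message(1, b"kafka")
offset, data, next_position = decode_message(log)
print(offset, data)                        # first message
print(decode_message(log, next_position))  # second message
```

Because each message carries its own size, a reader can walk the log from one message to the next without any separator.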

Seven. The purpose of Kafka's partitions

1. Kafka is based on file storage. Partitioning spreads the log content across multiple servers so that no file grows past the capacity of a single disk; each partition is stored by the server (Kafka instance) that hosts it.

2. A topic can be split into any number of partitions so that messages are stored and consumed efficiently.

3. More partitions can accommodate more consumers, which effectively increases the capacity for concurrent consumption.

Eight. Kafka consumers

1. Consumers definition:
Consumers are the consumers of messages and data. The process of subscribing to topics and processing the messages published to them is called consuming.

2. In Kafka, a consumer group can be regarded as one "subscriber". Each partition of a topic is consumed by only one consumer within that subscriber, but one consumer can consume messages from multiple partitions (when the number of consumers is less than the number of partitions).

Note: By Kafka's design, for a given topic, a group cannot have more consumers consuming concurrently than the topic has partitions; otherwise some consumers will not receive any messages.
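
The relationship between consumers and partitions inside one group can be sketched with a simple round-robin assignment. The function `assign_partitions` is a hypothetical helper; real Kafka clients use their own assignment strategies, so treat this only as an illustration of the one-owner-per-partition rule.

```python
def assign_partitions(partitions, consumers):
    """Give each partition exactly one owner within the group (hypothetical helper)."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: each consumer owns several partitions.
print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))

# More consumers than partitions: the extra consumer is left with nothing.
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))
```

In the second call, `c3` ends up with an empty list, which is the situation the note above warns about.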
