Kafka description 1. Brief Introduction to Kafka

Last Update:2014-09-23 Source: Internet

Author: User


Background: Today's application systems, such as e-commerce, social networking, search, and browsing, constantly produce information, like information factories. In the big data era, we face the following challenges:

How to collect this huge volume of information
How to analyze it
How to do both in a timely manner

These challenges form a business demand model: producers produce information, consumers consume it (processing and analysis), and a bridge is required between the two, namely a message system. At a micro level, the same requirement can be understood as how messages are passed between different systems.
Kafka was born: Open-sourced by LinkedIn
Kafka is a framework for solving such problems. It provides a seamless connection between producers and consumers. Kafka describes itself as a high-throughput distributed messaging system.
Kafka features: Kafka describes its own design as unique. Let's take a look at what sets it apart:

Fast: A single Kafka broker can handle several hundred MB of data per second from thousands of clients.
Scalability: A single cluster can serve as a central big data hub that handles many different kinds of business workloads.
Persistence: Messages are persisted to disk and replicated for fault tolerance. Terabytes of stored messages can be handled while throughput remains high.
Distributed: Kafka targets the big data field and supports distributed processing. A cluster can process millions of messages per second.
Real-time: Messages can be consumed as soon as they are produced.

Kafka components:

Topic: the category under which messages are stored; you can think of it as a folder.
Producer: a client that publishes messages to a topic.
Consumer: a client that subscribes to a topic and consumes its messages.
Broker: a single Kafka server instance.

Messages produced by a producer are sent over the network to the Kafka cluster, and consumers consume messages from the cluster.
Topic and partition:
A topic is essentially a folder to which messages are sent, and it is made up of partition logs.
The messages in each partition are ordered, and new messages are continually appended to the end of the partition log. Each message in a partition is assigned a unique offset. The Kafka cluster stores all messages, whether or not they have been consumed. We can set a message expiration time, and only expired data is automatically cleared to free up disk space. For example, if the expiration time is set to 2 days, a message is kept in the cluster for two days after it is published and is cleared only after that.
The only metadata Kafka needs to maintain for a consumer is its offset in the partition. Each time the consumer consumes a message, the offset increases by 1. In fact, this state is completely controlled by the consumer: the consumer can track and reset its offset, so it can read messages from any position.
There are two main reasons for storing message logs as partitions. First, it makes the cluster easy to scale. Each partition must fit on the machine that hosts it, but a topic can be composed of many partitions, so the cluster as a whole can handle an arbitrary amount of data. Second, it improves concurrency, because data can be read and written in units of partitions.
Distributed: Partitions are distributed across the servers in the cluster, and each partition can have multiple replicas. The number of replicas is configurable. Each partition has one leader server, and the other replica servers are called followers. The leader handles all read and write requests for the partition, while the followers passively replicate the leader's data. If the leader fails, one of the followers is automatically promoted to leader. In practice, every server in the cluster acts as the leader for some partitions and as a follower for others.
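The offset and expiration behaviour described above can be sketched with a small in-memory model. This is only an illustration, not Kafka's storage code or client API: the names `PartitionLog`, `Consumer`, `retention_seconds`, `poll`, and `seek` are hypothetical, and timestamps are passed in by hand so the result is deterministic.

```python
import time


class PartitionLog:
    """Toy append-only log for one partition (not Kafka's real storage)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0   # offset of the oldest message still retained
        self.entries = []      # list of (timestamp, payload)

    def append(self, payload, now=None):
        """Append a message and return the offset assigned to it."""
        ts = time.time() if now is None else now
        self.entries.append((ts, payload))
        return self.base_offset + len(self.entries) - 1

    def read(self, offset):
        """Return the payload at offset, or None if expired or not yet written."""
        index = offset - self.base_offset
        if 0 <= index < len(self.entries):
            return self.entries[index][1]
        return None

    def expire(self, now=None):
        """Drop messages older than the retention period, consumed or not."""
        now = time.time() if now is None else now
        while self.entries and now - self.entries[0][0] > self.retention_seconds:
            self.entries.pop(0)
            self.base_offset += 1


class Consumer:
    """The consumer, not the log, remembers its own position."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        """Read the next message and advance the offset by 1 if one was found."""
        payload = self.log.read(self.offset)
        if payload is not None:
            self.offset += 1
        return payload

    def seek(self, offset):
        """Reset the position so reading resumes from any chosen offset."""
        self.offset = offset


two_days = 2 * 24 * 60 * 60
log = PartitionLog(retention_seconds=two_days)
for i, text in enumerate(["a", "b", "c"]):
    log.append(text, now=i)                      # written at t = 0, 1, 2 seconds

consumer = Consumer(log)
first_pass = [consumer.poll(), consumer.poll()]  # consumes offsets 0 and 1
consumer.seek(0)                                 # rewind to the start
replayed = consumer.poll()                       # offset 0 is still there

log.expire(now=two_days + 0.5)                   # only the t = 0 message is too old
after_expiry = [log.read(0), log.read(1)]
```

Note that consuming a message never removes it from the log; only `expire` does, and offsets of the surviving messages do not change.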
Producers: A producer publishes messages to a topic of its choice and decides which partition of that topic each message goes to. You can use a simple partition selection algorithm provided by the API, or implement a partition selection algorithm yourself.
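As a rough illustration of what a partition selection algorithm might look like, here are two toy strategies: one hashes a message key, the other cycles through partitions. The function names are hypothetical, and CRC32 is used only because it is deterministic in Python; it is not claimed to be the hash Kafka's own partitioner uses.

```python
import itertools
import zlib


def hash_partitioner(key, num_partitions):
    """Map a key to a partition; the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def round_robin_partitioner(num_partitions):
    """Return a function that cycles through the partitions, ignoring any key."""
    counter = itertools.count()
    return lambda key=None: next(counter) % num_partitions


num_partitions = 4

next_partition = round_robin_partitioner(num_partitions)
spread = [next_partition() for _ in range(6)]    # 0, 1, 2, 3, 0, 1

# Hashing the same key repeatedly always picks one partition.
same_key = {hash_partitioner("user-42", num_partitions) for _ in range(3)}
```

Key-based selection keeps related messages together in one partition, while round-robin spreads load evenly when no such grouping is needed.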
Consumers: Messaging traditionally follows one of two models: queuing and publish-subscribe.

Queuing: each message is taken from the queue by exactly one consumer.
Publish-subscribe: each message is broadcast to every consumer.

Kafka supports both models through a single consumer abstraction: the consumer group. Each consumer instance must specify the name of the consumer group it belongs to. If all instances use the same consumer group name, they work in queuing mode; if every instance uses a different consumer group name, they work in publish-subscribe mode.
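The effect of the group name can be sketched with a toy dispatcher. This is an assumption-laden simplification: the function `deliver` is hypothetical, and it shares individual messages round-robin inside a group, whereas Kafka itself assigns whole partitions to the members of a group.

```python
from collections import defaultdict


def deliver(messages, consumers):
    """Deliver every message once per consumer group.

    consumers maps a consumer name to its group name. Inside a group the
    messages are shared out round-robin (a simplification of how Kafka
    divides partitions among group members).
    """
    groups = defaultdict(list)
    for name, group in consumers.items():
        groups[group].append(name)

    received = defaultdict(list)
    for members in groups.values():
        for i, message in enumerate(messages):
            received[members[i % len(members)]].append(message)
    return dict(received)


messages = ["m0", "m1", "m2", "m3"]

# Same group name everywhere: queuing, each message goes to one consumer.
queue_mode = deliver(messages, {"c1": "g", "c2": "g"})

# A different group per consumer: publish-subscribe, everyone sees everything.
pubsub_mode = deliver(messages, {"c1": "g1", "c2": "g2"})
```

Mixed setups fall out of the same rule: several groups each receive the full stream, and the members inside each group divide it among themselves.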
For example, consider a cluster of two servers that together host partitions P0 to P3, with two consumer groups. Within each group the partitions are consumed in queuing mode, while across groups messages are delivered in publish-subscribe mode.
Message order: How does Kafka ensure the order of message consumption? As mentioned above, messages within a partition are ordered, but Kafka guarantees order only within a partition. If you need all messages in a topic to be ordered, configure the topic with a single partition.
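The difference between per-partition order and topic-wide order can be shown with a small model. The helper names are hypothetical, and reading one partition after another is just one possible interleaving a consumer might observe.

```python
def produce(messages, num_partitions):
    """Spread messages over partitions round-robin, appending in arrival order."""
    partitions = [[] for _ in range(num_partitions)]
    for i, message in enumerate(messages):
        partitions[i % num_partitions].append(message)
    return partitions


def consume_partition_by_partition(partitions):
    """Read each partition in full, one after another."""
    return [message for partition in partitions for message in partition]


messages = [1, 2, 3, 4, 5, 6]

two = produce(messages, num_partitions=2)        # [[1, 3, 5], [2, 4, 6]]
seen_two = consume_partition_by_partition(two)   # [1, 3, 5, 2, 4, 6]

one = produce(messages, num_partitions=1)        # [[1, 2, 3, 4, 5, 6]]
seen_one = consume_partition_by_partition(one)   # [1, 2, 3, 4, 5, 6]
```

With two partitions each partition is still internally ordered, but the overall sequence seen by the consumer is not; with one partition the original order is preserved, at the cost of the concurrency that partitions provide.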


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
