Kafka description 1. Brief Introduction to Kafka

Source: Internet
Author: User
Background: Application systems of all kinds, such as business, social networking, search, and browsing systems, constantly produce information, like information factories. In the big data era, this raises the following challenges:
  1. How to collect this huge volume of information
  2. How to analyze it
  3. How to do both of the above in a timely manner
These challenges form a business demand model: producers produce information, and consumers consume it (that is, process and analyze it). A bridge between the two is required: the message system. At a micro level, the same requirement can be understood as how messages are passed between different systems.
Kafka was born: Kafka was open-sourced by LinkedIn.
Kafka is a framework for solving such problems. It connects producers and consumers seamlessly. Kafka describes itself as a high-throughput distributed messaging system.
Kafka features: Kafka describes its own design as unique. Its main features are:
  • Fast: a single Kafka service can process several hundred MB of data per second sent by thousands of clients.
  • Scalable: a single cluster can serve as a central big data hub that handles many types of business.
  • Persistent: messages are stored durably on disk (TB-level data can be handled while processing remains highly efficient), and a replication mechanism provides fault tolerance.
  • Distributed: designed for the big data field, with support for distributed processing. A cluster can process millions of messages per second.
  • Real-time: messages can be consumed as soon as they are produced.

Kafka components:
  • Topic: the named category that messages are published to; it can be thought of as the folder where those messages are stored.
  • Producer: the party that publishes messages to a topic.
  • Consumer: the party that subscribes to a topic and consumes its messages.
  • Broker: each Kafka service instance is a broker.
Messages produced by a producer are sent to the Kafka cluster over the network, and consumers consume messages from the cluster.
Topic and partition:
A topic, the destination that messages are sent to, is essentially a folder, and it is made up of partition logs.
The messages in each partition are ordered, and newly produced messages are continually appended to the end of the partition log. Each message in a partition is assigned an offset that is unique within that partition. The Kafka cluster stores all messages, whether or not they have been consumed. A retention time can be configured, and only expired data is automatically deleted to free up disk space. For example, if the retention time is set to 2 days, every message is kept in the cluster for two days after it is published and is deleted only after that.
The only metadata Kafka needs to maintain for a consumer is the offset of the consumed message in each partition. Normally the offset advances by 1 each time the consumer consumes a message, but the position is in fact entirely controlled by the consumer: the consumer can track and reset the offset, so it can read messages from any position.
There are several reasons for storing the message log as partitions. First, it makes scaling within the cluster easy: each partition must fit on the machine that hosts it, but a topic can consist of many partitions, so the cluster as a whole can handle an arbitrary amount of data. Second, it improves concurrency, because data can be read and written per partition.
Distributed: The partitions are distributed across the servers in the cluster, and each partition can be replicated on multiple servers. The number of replicas is configurable. Each partition has one leader server, and the other replica servers are called followers. The leader handles all read and write requests for the partition, while the followers passively replicate the leader's data. If the leader fails, one of the followers is automatically promoted to leader. As a result, each server in the cluster acts as the leader for some partitions and as a follower for others.
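The append-only log, the consumer-held offset, and time-based retention described above can be sketched with a small in-memory model. This is only an illustration, not Kafka's API: the class `PartitionLog` and its methods `append`, `read`, and `purge_expired` are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy in-memory stand-in for one partition log (hypothetical, not Kafka's API)."""
    retention_seconds: float
    base_offset: int = 0  # offset of the oldest record still retained
    records: list = field(default_factory=list)  # (timestamp, payload) pairs

    def append(self, payload, now=None):
        """Append a record to the end of the log and return its offset."""
        now = time.time() if now is None else now
        self.records.append((now, payload))
        return self.base_offset + len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return (offset, payload) pairs from `offset` on; reading removes nothing."""
        start = max(offset - self.base_offset, 0)
        chunk = self.records[start:start + max_records]
        return [(self.base_offset + start + i, p) for i, (_, p) in enumerate(chunk)]

    def purge_expired(self, now=None):
        """Drop records older than the retention period, consumed or not."""
        now = time.time() if now is None else now
        while self.records and now - self.records[0][0] > self.retention_seconds:
            self.records.pop(0)
            self.base_offset += 1


log = PartitionLog(retention_seconds=2 * 24 * 3600)  # two days, as in the text
for i, text in enumerate(["a", "b", "c"]):
    log.append(text, now=float(i))

position = 0                           # held by the consumer, not by the log
batch = log.read(position)
position = batch[-1][0] + 1            # advance past the last record read
replayed = log.read(0, max_records=1)  # rewinding is just choosing a smaller offset
```

Because `read` never deletes anything, a second consumer with its own `position` could read the same records independently; only `purge_expired` removes data.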
Producers: A producer publishes messages to a topic of its choice, and it can decide which partition of that topic each message is published to. You can use a simple partition selection algorithm provided by the API, or you can implement your own.
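As a rough illustration of a partition selection algorithm, the sketch below rotates through partitions for messages without a key and hashes the key otherwise. The class `SimplePartitioner` is hypothetical, and `zlib.crc32` is only a stand-in hash chosen for this example; Kafka's own client partitioners may behave differently.

```python
import itertools
import zlib


class SimplePartitioner:
    """Hypothetical partition chooser; not the partitioner shipped with Kafka."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key=None):
        if key is None:
            # No key: spread messages evenly by rotating through the partitions.
            return next(self._counter) % self.num_partitions
        # Keyed: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SimplePartitioner(num_partitions=4)
unkeyed = [partitioner.partition() for _ in range(5)]         # [0, 1, 2, 3, 0]
keyed = [partitioner.partition("user-42") for _ in range(3)]  # one partition, three times
```

Sending every message with the same key to the same partition is what lets a producer keep related messages in order.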
Consumers: Messaging traditionally follows one of two models: queuing and publish-subscribe.
  • Queuing: each message is taken from the queue by exactly one consumer.
  • Publish-subscribe: each message is broadcast to every consumer.
Kafka supports both models at the same time through a single consumer abstraction, the consumer group. Each consumer instance must specify a consumer group name. If all instances use the same consumer group name, they work in queuing mode; if every instance uses a different consumer group name, they work in publish-subscribe mode.
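The effect of the consumer group name can be sketched by assigning partitions to consumers. The function `assign_partitions` is hypothetical, and the round-robin assignment inside each group is an assumption made for this example; it is not presented as Kafka's actual assignment strategy.

```python
def assign_partitions(partitions, consumers):
    """Map each consumer to the partitions it reads.

    `consumers` maps consumer name -> consumer group name. Every group covers
    all partitions; inside a group each partition goes to exactly one member.
    """
    groups = {}
    for name, group in consumers.items():
        groups.setdefault(group, []).append(name)

    assignment = {name: [] for name in consumers}
    for members in groups.values():
        for i, partition in enumerate(partitions):
            assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Same group name: the partitions are split, so each message is handled once (queuing).
same_group = assign_partitions(partitions, {"c1": "g", "c2": "g"})

# Different group names: every consumer sees every partition (publish-subscribe).
different_groups = assign_partitions(partitions, {"c1": "g1", "c2": "g2"})
```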
For example, consider a cluster of two servers that together host partitions P0 to P3, with two consumer groups subscribed. Within each group, the partitions are consumed in queuing mode; across groups, messages are consumed in publish-subscribe mode.
Message sequence: How does Kafka ensure the order of message consumption? As mentioned above, messages within a partition are ordered, but Kafka guarantees order only within a partition. If you need all messages in a topic to be ordered, configure the topic with a single partition.
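The ordering guarantee can be seen in a toy model where a topic is a list of partitions and messages are spread across them in turn. The helper `produce_round_robin` is hypothetical and models no Kafka API; it only shows why order holds per partition but not across a multi-partition topic.

```python
def produce_round_robin(messages, num_partitions):
    """Append messages to partitions in rotation; each partition keeps arrival order."""
    partitions = [[] for _ in range(num_partitions)]
    for i, message in enumerate(messages):
        partitions[i % num_partitions].append(message)
    return partitions


messages = list(range(6))

# Two partitions: each partition is ordered, but reading one after the other is not.
two = produce_round_robin(messages, 2)
read_two = [m for partition in two for m in partition]

# One partition: the whole topic is read back in production order.
one = produce_round_robin(messages, 1)
read_one = [m for partition in one for m in partition]
```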