A Detailed Introduction to Kafka

Source: Internet
Author: User
Tags: message queue

Background:
In the era of big data, "information factories" such as business systems, social networks, search engines, and browsers are constantly producing all kinds of information. This confronts us with several challenges:

How to collect this huge volume of information
How to analyze it
How to do both of the above in a timely manner

These challenges form a business requirement model: producers produce information, consumers consume it (process and analyze it), and between the two sits a bridge, a messaging system, that connects them.
At a micro level, this requirement can also be understood as the question of how messages are delivered between different systems.

The birth of Kafka: open-sourced by LinkedIn

Kafka is a framework for solving this kind of problem: it provides a seamless connection between producers and consumers.
Kafka is a high-throughput distributed messaging system.

Kafka features:
Kafka describes its own design as unique. Its main advantages are:

Fast: A single Kafka broker can handle hundreds of megabytes of data per second from thousands of clients.
Scalable: A single cluster can serve as a large data processing hub that centralizes data from all types of business.
Durable: Messages are persisted to disk (terabytes of data can be handled while remaining efficient) and are backed up for fault tolerance.
Distributed: Kafka is designed for big data and distributed operation; a cluster can process millions of messages per second.
Real-time: Produced messages can be consumed by consumers immediately.

Components of Kafka:

Topic: The category (directory) under which messages are stored
Producer: The party that publishes messages to a topic
Consumer: The party that subscribes to a topic and consumes its messages
Broker: Each Kafka service instance is a broker

As shown in the following figure, messages produced by producers are sent over the network to the Kafka cluster, from which consumers consume them.

Topics and partitions:

A message is sent to a topic, which is essentially a directory. A topic consists of a number of partition logs, organized as shown in the following figure:

Messages within each partition are ordered. Produced messages are appended to the end of a partition log, and each message is assigned an offset that is unique within that partition.
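
The append-and-assign-offset behavior described above can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka's storage code; the class name PartitionLog and its methods are hypothetical and exist only for illustration.

```python
class PartitionLog:
    """Toy in-memory stand-in for one partition log (hypothetical, not Kafka code)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message to the end of the log and return its offset."""
        offset = len(self._messages)  # the next free position becomes the offset
        self._messages.append(message)
        return offset

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._messages[offset]


log = PartitionLog()
offsets = [log.append(m) for m in ["a", "b", "c"]]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # b
```

Because messages are only ever appended, an offset both identifies a message and fixes its position in the partition's order.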

The Kafka cluster stores all messages whether or not they have been consumed. We can configure a retention period for messages, and only data older than that period is automatically deleted to free disk space. For example, if the retention period is set to 2 days, every message is kept in the cluster for 2 days after it is published, and is purged only once it is more than 2 days old.
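
As a rough illustration of time-based retention, the sketch below keeps a message if and only if it is younger than the retention period, regardless of whether it has been consumed. The function name purge_expired and the (timestamp, message) layout are assumptions made for this example; it is a simplification of how a broker actually stores and deletes data.

```python
DAY = 24 * 60 * 60  # seconds in one day


def purge_expired(log, now, retention_seconds):
    """Return only the (timestamp, message) entries still within retention."""
    return [(ts, msg) for ts, msg in log if now - ts <= retention_seconds]


now = 10 * DAY  # an arbitrary "current time" for the example
log = [
    (now - 3 * DAY, "old, consumed"),
    (now - 1 * DAY, "recent, consumed"),
    (now - 60, "fresh, not yet consumed"),
]

kept = purge_expired(log, now, retention_seconds=2 * DAY)
print([msg for _, msg in kept])
# ['recent, consumed', 'fresh, not yet consumed']
```

Note that the decision depends only on message age: the 3-day-old message is dropped, while a consumed 1-day-old message is kept.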

The only metadata Kafka needs to maintain for a consumer is its offset in each partition, and that offset advances by 1 for every message the consumer consumes. In fact, this state is controlled entirely by the consumer: it can track and reset its offset, and can therefore read messages from any position in the log.
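
A minimal sketch of a consumer that owns its own offset might look like the following. The class SimpleConsumer and its poll and seek methods are hypothetical names for this toy model (a plain Python list stands in for the partition log); they are not the Kafka client API.

```python
class SimpleConsumer:
    """Toy consumer that tracks its own position in a partition log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # the only state the consumer needs

    def poll(self):
        """Return the next message and advance the offset by 1, or None at the end."""
        if self.offset >= len(self.log):
            return None
        message = self.log[self.offset]
        self.offset += 1
        return message

    def seek(self, offset):
        """Reset the position so that the next poll reads from this offset."""
        self.offset = offset


log = ["m0", "m1", "m2", "m3"]
consumer = SimpleConsumer(log)

first_two = [consumer.poll(), consumer.poll()]
print(first_two)       # ['m0', 'm1']

consumer.seek(0)       # rewind and re-read from the beginning
replayed = consumer.poll()
print(replayed)        # m0

consumer.seek(3)       # jump ahead
skipped_to = consumer.poll()
print(skipped_to)      # m3
```

Reading does not modify the log, which is why rewinding or skipping is just a matter of changing one integer.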

Storing the message log as partitions serves several purposes. First, it makes the log easy to scale across the cluster: each partition can be sized to fit the machine that hosts it, and a topic can consist of multiple partitions, so the cluster as a whole can accommodate data of any size. Second, it increases concurrency, because reads and writes can be performed partition by partition.

Distributed:
Partitions are distributed across the servers in the cluster. Each partition can have multiple replicas in the cluster, and the number of replicas is configurable.

Each partition has one leader server, and the servers holding the other replicas are called followers. The leader handles all read and write requests for the partition, while the followers passively replicate the leader's data. If the leader fails, one of the followers is automatically promoted to leader. In practice, therefore, each server in the cluster acts as the leader for some partitions and as a follower for others.
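
The leader and follower roles can be illustrated with a toy model. Everything here is an assumption made for the example: the class ReplicatedPartition is hypothetical, replication is modeled as an instant copy, and the promotion rule (pick the first surviving replica) is a placeholder rather than Kafka's actual election procedure.

```python
class ReplicatedPartition:
    """Toy model of one partition with a leader and followers."""

    def __init__(self, servers):
        self.replicas = list(servers)
        self.leader = self.replicas[0]
        self.logs = {server: [] for server in self.replicas}

    def followers(self):
        return [s for s in self.replicas if s != self.leader]

    def write(self, message):
        # All writes go to the leader...
        self.logs[self.leader].append(message)
        # ...and each follower copies the leader's log (instant in this sketch).
        for follower in self.followers():
            self.logs[follower] = list(self.logs[self.leader])

    def fail(self, server):
        """Remove a failed server; promote a follower if it was the leader."""
        self.replicas.remove(server)
        del self.logs[server]
        if not self.replicas:
            raise RuntimeError("no replica left for this partition")
        if server == self.leader:
            # Assumption: promote the first surviving follower.
            self.leader = self.replicas[0]


partition = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
partition.write("m0")
partition.write("m1")

partition.fail("broker-1")         # the leader goes down
print(partition.leader)            # broker-2
print(partition.logs["broker-2"])  # ['m0', 'm1']
```

Because the followers already hold a copy of the log, the promoted server can continue serving the partition without data having to be moved first.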

Producers:
A producer chooses which topic to publish a message to, and can also decide which partition of that topic the message goes to. We can use the simple partition selection algorithm provided by the API, or implement our own.
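
Two common partition selection strategies are sketched below: hashing a message key so that equal keys always land in the same partition, and round-robin to spread messages evenly. The function names are hypothetical, and CRC32 is used only because it is deterministic and available in the Python standard library; this is not a reproduction of the partitioner shipped with any Kafka client.

```python
import itertools
import zlib


def key_partitioner(key, num_partitions):
    """Map a string key to a partition; the same key always maps to the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def round_robin_partitioner(num_partitions):
    """Return an iterator that cycles through partition numbers 0..n-1."""
    return itertools.cycle(range(num_partitions))


NUM_PARTITIONS = 4

p1 = key_partitioner("user-42", NUM_PARTITIONS)
p2 = key_partitioner("user-42", NUM_PARTITIONS)
print(p1 == p2)     # True: equal keys go to the same partition

rr = round_robin_partitioner(NUM_PARTITIONS)
rr_choices = [next(rr) for _ in range(6)]
print(rr_choices)   # [0, 1, 2, 3, 0, 1]
```

Key-based selection is the usual choice when all messages for one entity must stay in order, since ordering holds only within a partition; round-robin favors even load.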

Consumers:
Messaging typically follows one of two models: queuing and publish-subscribe.

Queuing: Each message is taken from the queue by exactly one consumer
Publish-subscribe: Each message is broadcast to every consumer

Kafka supports both models through a single consumer abstraction: the consumer group. Each consumer instance labels itself with a consumer group name. If all instances use the same consumer group name, they work in queuing mode. If all instances use different consumer group names, they work in publish-subscribe mode.

As shown in the following illustration, a cluster of two servers hosts four partitions (P0 to P3) and serves two consumer groups. Within a group, partitions are consumed in queuing mode; across groups, they are consumed in publish-subscribe mode.
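
The way partitions are divided inside a group, while every group still sees every partition, can be sketched as follows. The function assign and the round-robin split are assumptions for illustration, as are the group names and member counts (two consumers in one group, four in the other); the source does not state how many consumers each group has.

```python
def assign(partitions, members):
    """Split partitions among the members of ONE group, round-robin."""
    assignment = {member: [] for member in members}
    for i, partition in enumerate(partitions):
        member = members[i % len(members)]
        assignment[member].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]
groups = {
    "group-a": ["c1", "c2"],
    "group-b": ["c3", "c4", "c5", "c6"],
}

# Each group gets its own independent assignment over ALL partitions.
assignments = {name: assign(partitions, members) for name, members in groups.items()}

print(assignments["group-a"])
# {'c1': ['P0', 'P2'], 'c2': ['P1', 'P3']}
print(assignments["group-b"])
# {'c3': ['P0'], 'c4': ['P1'], 'c5': ['P2'], 'c6': ['P3']}
```

Within a group each partition has exactly one consumer (queuing), while both groups independently cover all four partitions (publish-subscribe).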

Message order:
How does Kafka guarantee the order in which messages are consumed? As mentioned in the discussion of partitions, messages within a partition are ordered, but Kafka guarantees ordering only within a partition. If you need the messages of an entire topic to be ordered, the topic must be configured with a single partition.
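
The difference between per-partition and topic-wide ordering can be seen in a small simulation. The helper distribute and the read pattern (drain one partition, then the next) are assumptions chosen to make the effect visible; a real consumer may interleave partitions differently.

```python
def distribute(messages, num_partitions):
    """Spread messages over partitions round-robin, preserving arrival order in each."""
    partitions = [[] for _ in range(num_partitions)]
    for i, message in enumerate(messages):
        partitions[i % num_partitions].append(message)
    return partitions


messages = [1, 2, 3, 4, 5, 6]  # produced in this order

two_partitions = distribute(messages, 2)
print(two_partitions)          # [[1, 3, 5], [2, 4, 6]]

# Reading partition 0 fully, then partition 1:
read_order = [m for partition in two_partitions for m in partition]
print(read_order)              # [1, 3, 5, 2, 4, 6]

one_partition = distribute(messages, 1)
print(one_partition[0])        # [1, 2, 3, 4, 5, 6]
```

Each partition is internally ordered in both cases, but only the single-partition topic reproduces the original production order end to end.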
