Kafka (I): Getting to Know Kafka

Source: Internet
Author: User
Tags: message queue, RabbitMQ

I. Message queue concepts

JMS: a Java API

JMS, the Java Message Service, is a Java-platform API for message-oriented middleware (MOM). It is used to send messages and communicate asynchronously between two applications or within a distributed system. JMS is a platform-independent API, and most MOM providers support it.

In practice, JMS plays a role similar to JDBC: applications are written against the standard JMS interfaces, and a JMS provider supplies the implementation that carries out the operations. A common message queue that implements JMS is ActiveMQ.

JMS provides two message models: point-to-point (P2P) and publish-subscribe (pub/sub). In the point-to-point model, messages are sent to a queue, and each message can be consumed by only one consumer. In the publish-subscribe model, a message can be consumed by multiple consumers, and producers and consumers are completely decoupled: neither needs to know that the other exists.
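The difference between the two models is easiest to see in code. The following is a minimal in-memory sketch in Python, not the JMS API itself (JMS is a Java interface); the classes PointToPointQueue and PubSubTopic are hypothetical and model only who receives each message.

```python
from collections import deque


class PointToPointQueue:
    """Each message is delivered to exactly one of the competing consumers."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer can receive it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Every subscriber gets its own copy of each published message."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = deque()

    def publish(self, message):
        # The publisher never needs to know who (if anyone) is subscribed.
        for inbox in self._inboxes.values():
            inbox.append(message)

    def receive(self, name):
        inbox = self._inboxes[name]
        return inbox.popleft() if inbox else None
```

With the queue, a second `receive()` after a single `send()` returns `None`; with the topic, two subscribers each receive the same published message once.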

How a message gets from the producer to the consumer is determined by message routing. In JMS, message routing is very simple: the producer and the consumer connect to the same queue (point-to-point) or topic (pub/sub). A JMS consumer can also use a message selector, so that it consumes only the messages that pass the selector's filter. The message routing mechanism in the JMS model is illustrated as follows:
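As a rough illustration of what a selector does, here is a Python sketch. In JMS a selector is a string written in a subset of SQL-92 conditional-expression syntax (for example `priority > 5`) that the provider evaluates against message headers and properties; here a plain predicate function stands in for that string, and the Message class and consume_with_selector function are hypothetical. The sketch uses queue semantics, so messages that do not match stay in the queue for other consumers.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    properties: dict = field(default_factory=dict)


def consume_with_selector(queue, selector):
    """Remove and return only the messages whose properties pass the selector."""
    matched, remaining = [], []
    for message in queue:
        (matched if selector(message.properties) else remaining).append(message)
    queue[:] = remaining  # unmatched messages stay behind for other consumers
    return matched
```

For example, `consume_with_selector(queue, lambda p: p.get("priority", 0) > 5)` takes only high-priority messages and leaves the rest in `queue`.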

AMQP Protocol

AMQP, the Advanced Message Queuing Protocol, is an open application-layer standard designed for message-oriented middleware, providing unified messaging services. Clients and middleware that implement this protocol can exchange messages regardless of which client or middleware product they use or which programming language they are written in. Implementations include RabbitMQ, which is written in Erlang.

In this respect AMQP is analogous to HTTP: the implementation language does not matter, and as long as everyone sends message requests in the agreed data format, clients written in one language can talk to servers written in another.

Message routing in AMQP differs from JMS: AMQP adds the roles of exchange and binding. A producer sends a message to an exchange, the bindings decide which queue(s) the exchange delivers the message to, and consumers consume messages directly from a queue. Which queues are bound to which exchange is decided by the consumer. AMQP's routing process is illustrated as follows:
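A minimal Python sketch of this flow, using a direct exchange (exact routing-key match), may help. The Broker and DirectExchange classes are hypothetical stand-ins, not a RabbitMQ client API; acknowledgements, durability and other broker features are omitted, and this sketch simply drops messages that match no binding.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Routes a message to every queue bound with exactly its routing key."""

    def __init__(self):
        self._bindings = defaultdict(list)  # routing key -> bound queue names

    def bind(self, queue_name, routing_key):
        # In AMQP the consumer side declares which queue binds to which key.
        self._bindings[routing_key].append(queue_name)

    def route(self, routing_key):
        return list(self._bindings.get(routing_key, []))


class Broker:
    def __init__(self):
        self.exchange = DirectExchange()
        self.queues = defaultdict(deque)

    def publish(self, routing_key, body):
        # Producers only talk to the exchange, never to a queue directly.
        targets = self.exchange.route(routing_key)
        for name in targets:
            self.queues[name].append(body)
        return targets  # an empty list means the message was dropped

    def consume(self, queue_name):
        # Consumers read directly from their queue.
        q = self.queues[queue_name]
        return q.popleft() if q else None
```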

| Comparison | JMS | AMQP |
|---|---|---|
| Definition | Java API | Wire protocol |
| Cross-language | No | Yes |
| Cross-platform | No | Yes |
| Message models | Two: (1) point-to-point, (2) publish/subscribe | Five: (1) direct exchange, (2) fanout exchange, (3) topic exchange, (4) headers exchange, (5) system exchange |
| Supported message types | Several: TextMessage, MapMessage, BytesMessage, StreamMessage, ObjectMessage | byte[] only; in practice, complex messages are serialized before sending |
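The exchange types in the table differ mainly in how a binding key is compared with a message's routing key. The Python sketch below models three of them following RabbitMQ's documented semantics: direct requires an exact match, fanout ignores the routing key, and topic treats keys as dot-separated words where `*` matches exactly one word and `#` matches zero or more. Headers and system exchanges are left out, and all function names here are hypothetical.

```python
def direct_matches(binding_key: str, routing_key: str) -> bool:
    return binding_key == routing_key


def fanout_matches(binding_key: str, routing_key: str) -> bool:
    # Fanout copies every message to every bound queue.
    return True


def topic_matches(binding_key: str, routing_key: str) -> bool:
    """'*' matches exactly one word, '#' matches zero or more words."""

    def match(pattern, words):
        if not pattern:
            return not words
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # Let '#' absorb zero words, or one word and try again.
            return match(rest, words) or (bool(words) and match(pattern, words[1:]))
        if not words:
            return False
        return (head == "*" or head == words[0]) and match(rest, words[1:])

    return match(binding_key.split("."), routing_key.split("."))


MATCHERS = {"direct": direct_matches, "fanout": fanout_matches, "topic": topic_matches}


def route(exchange_type, bindings, routing_key):
    """Return the queues whose binding key matches, for one exchange type."""
    matches = MATCHERS[exchange_type]
    return [queue for queue, key in bindings if matches(key, routing_key)]
```

For instance, with a topic exchange a queue bound with `order.#` receives `order.created.eu`, while one bound with `order.*` does not.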
II. Comparison of common message queues

The biggest differences between Kafka and ActiveMQ/RabbitMQ:
- Kafka supports dynamic expansion of the cluster.
- ActiveMQ and RabbitMQ delete a message once it has been consumed, whereas Kafka keeps messages for a configurable retention period (for example, two days) whether or not they have been consumed.

III. What is Kafka

Http://kafka.apache.org/intro

Official introduction:
Apache Kafka® is a distributed streaming platform.

1.1 Three main capabilities

1. Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.

2. Store streams of records in a fault-tolerant, durable way.

3. Process streams of records as they occur.

1.2 Usage Scenarios

1. Building real-time streaming data pipelines that reliably get data between systems or applications.

2. Building real-time streaming applications that transform or react to the streams of data.

1.3 Why Kafka is fast
- Zero-copy transfer makes moving data to the network faster.
- Data is read in batches, which reduces disk I/O.
- Messages are read sequentially by offset, so historical messages can still be read from any offset.
- Data is compressed for network transfer, so transfers are faster and use less bandwidth.
- Kafka data can be replicated, so it remains available after a failure.

IV. Basic Kafka concepts

Broker: a message-middleware processing node. One Kafka node is one broker, and multiple brokers form a Kafka cluster.
Topic: a category of messages. A Kafka cluster can serve many topics at the same time.
Partition: a physical subdivision of a topic. A topic can be split into multiple partitions, and each partition is an ordered queue.
Segment: each partition is physically made up of multiple segments.
Offset: each partition is an ordered, immutable sequence of messages that are appended to it in order. Each message in a partition has a sequential number called the offset, which uniquely identifies the message within that partition. Because Kafka keeps messages after they are consumed (for a configurable retention period), it needs offsets to track read positions; brokers such as RabbitMQ delete messages automatically after consumption, so they do not need them.
Producer: publishes messages to Kafka brokers.
Consumer: a client that reads messages from Kafka brokers.
Consumer Group: each consumer belongs to a specific consumer group.
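The relationship between topic, partition, segment and offset can be sketched as a few Python classes. This is a simplified model under stated assumptions: the Partition and Topic classes are hypothetical, segments are rolled after a fixed number of messages (real Kafka rolls segment files by size or time), and offsets are assumed to be contiguous.

```python
class Partition:
    """An append-only log split into segments (segment size is illustrative)."""

    def __init__(self, segment_size=3):
        self.segment_size = segment_size
        self.segments = [[]]  # each segment holds a run of consecutive messages
        self.next_offset = 0  # offset the next appended message will receive

    def append(self, message):
        # Roll a new segment once the active (last) segment is full.
        if len(self.segments[-1]) == self.segment_size:
            self.segments.append([])
        self.segments[-1].append(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset):
        if not 0 <= offset < self.next_offset:
            raise IndexError(f"offset {offset} out of range")
        # With contiguous offsets and fixed-size segments, the segment index
        # and the position inside it follow from integer division.
        segment = self.segments[offset // self.segment_size]
        return segment[offset % self.segment_size]


class Topic:
    """A named topic made of independent, separately ordered partitions."""

    def __init__(self, name, num_partitions, segment_size=3):
        self.name = name
        self.partitions = [Partition(segment_size) for _ in range(num_partitions)]
```

Offsets are per partition: the first message appended to each partition of a topic gets offset 0.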

This diagram shows a Kafka cluster with two Kafka servers, Server1 and Server2, which both hold partitions of the same topic; Server1 also holds a topic with three partitions.

This figure shows the partitions of a topic: new records are appended to the end of each partition.

The index is the offset, and each consumer can be reading at a different offset: for example, consumer A may be reading offset 9 while consumer B reads offset 11.
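A small Python sketch of this behaviour, using the offsets from the example above. PartitionLog and Consumer are hypothetical names; the point is only that each consumer keeps its own position and that reading does not remove records from the log.

```python
class PartitionLog:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # the new record's offset


class Consumer:
    """Each consumer tracks its own position; reading never deletes records."""

    def __init__(self, name, start_offset=0):
        self.name = name
        self.position = start_offset

    def poll(self, log, max_records=2):
        batch = log.records[self.position:self.position + max_records]
        self.position += len(batch)  # advance only past what was actually read
        return batch
```

Starting consumer A at offset 9 and consumer B at offset 11 on the same 12-record log, A reads offsets 9 and 10 while B reads offset 11, and the log still contains all 12 records afterwards.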

When a producer sends a message, it can specify which partition to send it to, or it can rely on a partitioning strategy to choose one. When no partition is specified, the default behaviour is: if the message has a key, the key is hashed to pick the partition, so messages with the same key always land in the same partition; messages without a key are spread across the available partitions.
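The selection order described above can be sketched in Python as follows. SimplePartitioner is a hypothetical class, not the Kafka client API. It uses zlib.crc32 as a stand-in hash (Kafka's Java client hashes keys with murmur2) and a plain round-robin counter for keyless messages (newer Kafka clients instead use a "sticky" strategy that fills one batch at a time).

```python
import itertools
import zlib


class SimplePartitioner:
    """Chooses a partition: explicit partition, then key hash, then round-robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key=None, partition=None):
        if partition is not None:
            # The producer asked for a specific partition.
            return partition
        if key is not None:
            # Same key -> same partition, which preserves per-key ordering.
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        # No key: spread messages evenly across partitions.
        return next(self._counter) % self.num_partitions
```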

In the following figure, a two-server Kafka cluster hosts four partitions (P0 to P3) that are consumed by two consumer groups: consumer group A has two consumer instances, and consumer group B has four.

In Kafka's design, several different consumer groups can consume the same topic at the same time. In the figure, two groups consume simultaneously, and each group tracks its own consumption offsets without interfering with the other. Within a group, the number of consumers should not exceed the number of partitions: each partition is assigned to only one consumer in the group, although one consumer may consume several partitions. If a group has more consumers than partitions, the extra consumers receive no messages.
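A Python sketch of the one-partition-one-consumer rule, using a simple round-robin assignment in the spirit of Kafka's RoundRobinAssignor (but not its implementation). The assign_partitions function is hypothetical; each consumer group would call it independently, which is why groups A and B do not affect each other.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment within one consumer group.

    Each partition goes to exactly one consumer; a consumer may get several,
    and consumers beyond the number of partitions get none.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

With the four partitions from the figure, group A's two consumers get two partitions each, group B's four consumers get one each, and a fifth consumer in either group would receive an empty list.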

Kafka messages are stored in topics, and a topic has multiple partitions. For data safety, each partition has multiple replicas, and the messages stored in a partition are made up of multiple segments.

As shown in the figure, a topic has n partitions, and each partition has several replicas (their number is the replication factor, which is independent of n). One replica of each partition is elected leader: producers push messages to the leader, and the followers pull messages from the leader to stay in sync.

Note that, by default, producers and consumers interact only with the leader, and followers interact only with the leader to replicate from it. When the leader fails, one of the followers is elected as the new leader.
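The leader/follower flow can be sketched in Python. ReplicatedPartition is hypothetical and heavily simplified: followers copy whatever records the leader has that they are missing, and on failure the first fully caught-up follower becomes leader, loosely mirroring Kafka's in-sync replica (ISR) set. Real Kafka also tracks a high watermark, uses a controller to run the election, and handles many failure cases omitted here.

```python
class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []


class ReplicatedPartition:
    """Leader takes writes, followers pull from it, failover promotes a follower."""

    def __init__(self, broker_ids):
        self.replicas = {b: Replica(b) for b in broker_ids}
        self.leader_id = broker_ids[0]
        self.alive = set(broker_ids)

    def produce(self, record):
        # Producers write only to the leader.
        self.replicas[self.leader_id].log.append(record)

    def follower_fetch(self):
        # Each live follower pulls the records it is missing from the leader.
        leader_log = self.replicas[self.leader_id].log
        for b in self.alive - {self.leader_id}:
            follower = self.replicas[b]
            follower.log.extend(leader_log[len(follower.log):])

    def in_sync_followers(self):
        leader_len = len(self.replicas[self.leader_id].log)
        return sorted(b for b in self.alive - {self.leader_id}
                      if len(self.replicas[b].log) == leader_len)

    def fail_leader(self):
        # Only a follower that has caught up with the leader may take over.
        candidates = self.in_sync_followers()
        if not candidates:
            raise RuntimeError("no in-sync follower available")
        self.alive.discard(self.leader_id)
        self.leader_id = candidates[0]
        return self.leader_id
```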
