I'll take you to meet Kafka.


Kafka is a high-throughput, distributed publish-subscribe messaging system that can handle all the activity stream data of a consumer-scale website. You can also think of it as a distributed commit log exposed through a publish-subscribe interface; in fact, that is how the official Kafka website describes it.


  A few key terms you need to know about Kafka


Topic: the category to which messages are published in Kafka


Producer: a client that publishes messages to Kafka


Consumer: a subscriber that reads messages from Kafka


Broker: a server in a Kafka cluster; a cluster is made up of one or more brokers


The figure below shows an example of producers sending messages to consumers through a Kafka cluster.

[Figure: kafka-cluster.png]
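
To make these roles concrete, here is a minimal sketch of a producer built with the official Java client. The broker address localhost:9092 and the topic name my-topic are assumptions for illustration, not values from the original article.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Address of one or more brokers in the cluster (assumed to run locally).
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one message to the topic "my-topic".
            producer.send(new ProducerRecord<>("my-topic", "hello kafka"));
        }
    }
}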

Topics and logs


A topic is the category under which published messages are grouped. For each topic, the Kafka cluster maintains a partitioned log, as shown below.

[Figure: kafka-topic.png]

Each partition is an ordered, numbered commit log: messages are appended to it in the order they arrive, and each message in a partition is assigned a unique sequential number, called the offset, which identifies the message within that partition.
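
A minimal sketch of a consumer that prints the partition and offset of each record it reads, again assuming a local broker at localhost:9092 and a topic named my-topic:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetPrinter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "offset-printer"); // consumer group name (illustrative)
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // The offset uniquely identifies the record within its partition.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}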


The Kafka cluster retains all published messages for a configurable period of time, regardless of whether they have been consumed. For example, if the retention period is set to 2 days, a message can be consumed at any point within 2 days of being published; after that it is discarded to free up space. Kafka's performance is effectively constant with respect to the amount of data stored, so retaining large amounts of data is not a problem.
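
As a hedged sketch of how such a retention period might be configured, the following uses the Java AdminClient to create a topic whose per-topic retention.ms is set to two days (172,800,000 ms). The broker address, topic name, partition count, and replication factor are all assumptions for illustration.

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 1, messages kept for 2 days.
            NewTopic topic = new NewTopic("my-topic", 3, (short) 1)
                    .configs(Map.of("retention.ms", "172800000"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}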


In fact, the only metadata retained for each consumer is its position in the log, called the offset. The offset is controlled by the consumer: normally it advances linearly as the consumer reads messages, but in practice a consumer can read messages in any order it wants, because it can reset its own offset at will.
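
Because the offset is under the consumer's control, a consumer can rewind and re-read old data. A minimal sketch, assuming a topic my-topic with a partition 0 and an arbitrary illustrative offset of 42:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RewindConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rewind-example");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(partition));

            // Jump back to the very beginning of the partition...
            consumer.seekToBeginning(Collections.singletonList(partition));
            // ...or to any specific offset, e.g. offset 42.
            consumer.seek(partition, 42L);
        }
    }
}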


This design makes Kafka consumers very cheap: a consumer can come and go with little impact on the cluster or on other consumers. For example, you can use the command-line tools to "tail" the contents of a topic without affecting existing consumers.


Partitioning the log serves several purposes. First, it keeps any single log from growing beyond what one server can hold: each individual partition must fit on the server that hosts it, but a topic can have many partitions, so it can handle an arbitrarily large amount of data. Second, partitions act as the unit of parallelism.


  Distributed


The partitions of the log are distributed over the servers in the Kafka cluster, and each server handles data and requests for its share of the partitions. Each partition can be replicated across a configurable number of servers for fault tolerance.


Each partition has one server that acts as the "leader" and zero or more servers that act as "followers". The leader handles all reads and writes for the partition, while the followers only replicate messages from the leader. If the leader fails, one of the followers automatically becomes the new leader. Each server acts as the leader for some of its partitions and as a follower for others, so load stays balanced across the cluster.
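
To see this in practice, the sketch below uses the Java AdminClient to print the leader, replicas, and in-sync replicas of each partition of a topic; the broker address and topic name are assumptions.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class ShowPartitionLeaders {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin.describeTopics(Collections.singletonList("my-topic"))
                    .all().get().get("my-topic");
            for (TopicPartitionInfo partition : description.partitions()) {
                // Each partition has one leader broker and zero or more follower replicas.
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                        partition.partition(), partition.leader(),
                        partition.replicas(), partition.isr());
            }
        }
    }
}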


Producers


The producer decides which topic a message is sent to, and it can also choose which partition within that topic the message goes to. This can be done in a simple round-robin fashion to balance load, or according to some semantic partitioning function (for example, based on a key in the message).
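
A hedged sketch of both options with the Java producer; the topic, key, and value names are made up for illustration.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitionChoiceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Option 1: choose the partition explicitly (partition 0 of "my-topic").
            producer.send(new ProducerRecord<>("my-topic", 0, "user-42", "clicked-home-page"));

            // Option 2: supply only a key; the default partitioner hashes the key,
            // so every message with the key "user-42" lands in the same partition.
            producer.send(new ProducerRecord<>("my-topic", "user-42", "clicked-checkout"));
        }
    }
}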


Consumers


Traditional messaging has two models: queuing and publish-subscribe. In the queuing model, a pool of consumers reads from a queue and each message goes to exactly one of them; in the publish-subscribe model, a message is broadcast to all consumers. Kafka offers a single abstraction, the consumer group, that generalizes both.


Each consumer labels itself with a consumer group name, and a message published to a topic is delivered to one consumer instance within each subscribing group. Consumer instances can live in separate processes or on separate machines.


If all the consumer instances have the same group name, this works like a traditional queue, and messages are load-balanced across the consumers.


If all the consumer instances have different group names, this works like publish-subscribe, and every message is broadcast to every consumer.
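
The sketch below shows a consumer instance that joins a group and prints the partitions it is assigned. Run several copies with the same group.id and they divide the partitions between them like a queue; give each copy its own group.id and each receives every message, like publish-subscribe. The group name analytics-group and topic my-topic are assumptions.

import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "analytics-group"); // instances sharing this name form one group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // The rebalance listener shows which partitions this instance was given;
            // start a second copy of this program and the partitions are re-divided.
            consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    System.out.println("assigned: " + partitions);
                }
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    System.out.println("revoked: " + partitions);
                }
            });
            while (true) {
                consumer.poll(Duration.ofSeconds(1));
            }
        }
    }
}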


In practice, a topic usually has a small number of consumer groups, each consisting of multiple consumer instances for scalability and fault tolerance. The figure below shows a concrete example.

[Figure: kafka-cluster-server.png]

Kafka also has stronger ordering guarantees than a traditional messaging system.


A traditional queue stores messages in order on the server, and if multiple consumers read from the queue, the server hands out messages in that order. However, although the server hands out messages in order, they are delivered to the consumers asynchronously, so they may be processed out of order, and the consumers have no way of knowing. To avoid this, traditional messaging systems often allow only a single process to consume a queue, which means processing is serial rather than parallel.


Kafka does better: it uses the partitions within a topic as the unit of parallelism, which provides both ordering guarantees and load balancing over a pool of consumers. Each partition of a topic is assigned to exactly one consumer in a consumer group, so that consumer is the only reader of the partition and consumes its messages in order. Because load balancing happens at the partition level, the number of consumer instances in a group cannot exceed the number of partitions.


Kafka only guarantees the order of messages within a partition, not across different partitions of the same topic. For most applications, per-partition ordering combined with the ability to partition data by key is sufficient. If you need a total order over all messages, the topic can have only one partition, which also means there can be at most one consumer instance per consumer group; in that case processing is not parallel.


Reliability


Messages sent by a producer to a particular topic partition are appended in the order they are sent. For example, if a producer sends message M1 followed by message M2, then M1 will have a smaller offset than M2 and will appear earlier in the log.


Consumers see messages in the order in which they are stored in the log.


For a topic with replication factor N, up to N-1 server failures can be tolerated without losing any message committed to the log.


Usage Scenarios


Message Broker

Kafka works well as a replacement for a traditional message broker. Message brokers are used for many reasons, such as decoupling processing from data producers and buffering unprocessed messages. Compared with most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good fit for large-scale message processing applications.


Activity Tracking


Kafka was originally built to track website user activity in real time, for example to compute statistics such as page views (PV) and unique visitors (UV).


Monitoring statistics


Kafka is often used for operational monitoring data, such as aggregating statistics from distributed applications.


Log Collection


Services are usually deployed on many machines, so their run-time logs end up scattered across those machines. Kafka is often used to collect the logs from all servers and feed them into HDFS or another offline storage system, serving a role similar to log-aggregation tools such as Facebook's Scribe.


Stream processing


Many users process data in stages: raw data is consumed from Kafka topics, then periodically aggregated, enriched, or otherwise transformed, and the results are written to new Kafka topics for further processing. For example, an article-processing pipeline might first crawl the text of users' subscriptions from RSS feeds and publish it to an "articles" topic; the next processor normalizes the content under the "articles" topic and publishes it to a topic of formatted content; a final processor tries to recommend this formatted content to suitable users. Storm and Samza are popular frameworks for this kind of processing.


Event Sourcing


Event sourcing is a style of application design in which changes to application state are recorded as a time-ordered sequence of events. Kafka's support for very large amounts of stored log data makes it an excellent backend for applications built in this style.


This article is from the "This person's IT World" blog; please keep this source: http://favccxx.blog.51cto.com/2890523/1761228
