Apache Top-Level Project Introduction 2: Kafka


We start this series of introductions to top-level Apache projects with Kafka. Why? It is popular, and the name is cool.


The official Kafka website describes it simply and directly: "Kafka is a high-throughput distributed messaging system." Kafka originated at LinkedIn, where it served as the foundation for managing the activity stream (page views, user behavior analysis, search) and the operational data processing pipeline.

Because of its distributed design and high throughput, it is widely used, for example together with Cloudera, Hadoop, Storm, and Spark.

Kafka is first of all a messaging system, and it provides the basic functions of one, such as decoupling, ordering, and asynchronous delivery. Beyond that, its design supports high throughput: it offers persistence with O(1) time complexity, handles data volumes at the TB/PB level and above, supports both offline and real-time processing (that is, integration with Hadoop and Storm), and scales out horizontally.


Architecture diagram:


As you can see, Kafka has a distributed architecture (of course, in the DT (data technology) era, a system that cannot scale out horizontally will not survive). On the front end, producers concurrently push messages (batching is supported) to a specific topic on the brokers, the servers that make up the Kafka cluster. Each topic contains multiple partitions to make horizontal scaling easy, and consumers, organized into consumer groups, pull messages from the brokers. Kafka uses ZooKeeper (ZK) to manage cluster configuration, leader election, and rebalancing. The messaging model is push/pull: producers push, consumers pull.
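
To make the role of partitions and consumer groups more concrete, here is a minimal sketch in plain Java with no Kafka client library. The class name PartitionSketch, the hash-modulo rule, and the round-robin assignment are illustrative assumptions; they are not Kafka's actual partitioner or rebalance algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the Kafka API.
class PartitionSketch {

    // Map a message key to a partition. The same key always lands in the
    // same partition, so per-key ordering is kept within that partition.
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even if hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Divide partitions 0..numPartitions-1 among the consumers of one group,
    // round-robin, so each partition has exactly one owner in the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("user-42", 4));
        // Two consumers sharing four partitions: {c1=[0, 2], c2=[1, 3]}
        System.out.println(assign(Arrays.asList("c1", "c2"), 4));
    }
}
```

Adding partitions spreads a topic over more brokers, and adding consumers to a group spreads the partitions over more readers, which is the sense in which partitions make horizontal scaling easy.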


Let's set up a Kafka cluster:

Send messages, and consume them through ZK:
Produce and consume messages with Java:


This is fairly straightforward. Note that messages can be sent in batches. Not every message middleware supports batch sending, and batching is one of the reasons for Kafka's high throughput.
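
The effect of batching can be sketched without any Kafka dependency. In the following minimal example, BatchingSender is a hypothetical class and the transport callback stands in for one network request; it is not the Kafka producer API, only an illustration of why buffering reduces the number of requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of batching; not the Kafka producer API.
class BatchingSender {
    private final int batchSize;
    private final Consumer<List<String>> transport; // stands in for one network request
    private final List<String> buffer = new ArrayList<>();

    BatchingSender(int batchSize, Consumer<List<String>> transport) {
        this.batchSize = batchSize;
        this.transport = transport;
    }

    // Buffer the message and send only once the batch is full.
    void send(String message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered as a single request.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        transport.accept(new ArrayList<>(buffer)); // copy, then reuse the buffer
        buffer.clear();
    }
}
```

With a batch size of 4, sending 10 messages and then calling flush() results in 3 requests (4, 4, and 2 messages) instead of 10.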



The consumer side reads the payload from a message stream. The stream's iterator never finishes; it blocks waiting for new messages, much like a message listener.

Where Kafka is efficient and innovative:

1. Message deletion management
Typically, message middleware deletes a message as soon as it has been consumed, which makes consuming messages expensive. Kafka instead keeps the brokers stateless with respect to consumption: it introduces message offsets and applies an SLA-based retention policy, so messages are deleted after a set period regardless of whether they have been read. According to the official website, this makes consuming Kafka messages very lightweight: consumers can come and go. It sounds like takeout: take it and go. Moreover, because of offsets, consumers are free to read messages from any position, including re-reading messages that have already been consumed.
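
The offset and retention idea can be sketched as a small in-memory log. OffsetLog below is a hypothetical model of a single partition, written only to illustrate the behavior described above; the real Kafka log is stored in segment files on disk and is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one partition's log; not Kafka code.
class OffsetLog {
    private static final class Entry {
        final long timestampMs;
        final String value;

        Entry(long timestampMs, String value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private long baseOffset = 0; // offset of entries.get(0)

    // Append a message and return the offset assigned to it.
    long append(String value, long nowMs) {
        entries.add(new Entry(nowMs, value));
        return baseOffset + entries.size() - 1;
    }

    // Read up to max messages starting at the given offset. Reading never
    // deletes anything, so the log keeps no per-consumer state.
    List<String> read(long offset, int max) {
        List<String> out = new ArrayList<>();
        long start = Math.max(offset, baseOffset);
        long end = baseOffset + entries.size();
        for (long o = start; o < end && out.size() < max; o++) {
            out.add(entries.get((int) (o - baseOffset)).value);
        }
        return out;
    }

    // Time-based retention: drop messages older than retentionMs,
    // whether or not anyone has consumed them.
    void expire(long nowMs, long retentionMs) {
        while (!entries.isEmpty() && nowMs - entries.get(0).timestampMs > retentionMs) {
            entries.remove(0);
            baseOffset++;
        }
    }
}
```

Each consumer only has to remember one number, its next offset. Calling read(0, ...) a second time replays the same messages, and only expire(...) ever removes data.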


2. Kafka uses the Linux sendfile system call to move file data to the network inside the kernel, avoiding extra copies through user space.
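
In Java, this kind of transfer is exposed through FileChannel.transferTo, which lets the operating system move bytes from a file to a target channel without passing them through an application buffer. The following is a minimal sketch under stated assumptions: ZeroCopySketch and sendFile are hypothetical names, the target is assumed to be a blocking channel, and whether sendfile is actually used underneath depends on the operating system and on the type of the target channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating FileChannel.transferTo; not Kafka code.
class ZeroCopySketch {

    // Send the whole file to the target channel and return the bytes sent.
    // Assumes a blocking target; a non-blocking one could make no progress.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

In a broker-like setting the target would be a SocketChannel connected to a consumer, so log data on disk can be handed to the network without being copied into the application.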


3. Kafka uses ZK for distributed coordination, high availability (HA), and fault tolerance. ZK keeps track of the Kafka brokers, and when a broker is added or fails, the ZK service notifies producers and consumers.
4. Producer performance: a compact message structure and batch sending.



5. Consumer performance: a compact message structure, plus the offsets that come with the stateless design, so there is no need for a B+ tree index.

Generally speaking, Kafka's performance is outstanding. It is commonly used as a replacement for traditional message middleware, and it is a first choice when feeding Hadoop or stream processing. It is also the obvious choice for handling web logs, analyzing user behavior, or processing logs offline.


Well, that's all for now. I got up early to write this and time was tight, so it may not be entirely reliable; I hope you will excuse me. Some of the diagrams are borrowed from the web.


@erixhao
