Apache Top Project Introduction Series, part 1: we start with Kafka. Why? It is popular, and the name is cool.
Kafka's official website puts it simply: "Kafka is a high-throughput distributed messaging system." Kafka originated at LinkedIn, where it served as the foundation for managing the pipeline of activity streams (page views, user behavior analysis, search) and operational data processing.
Because it is distributed and offers high throughput, it is widely used and integrates with Cloudera, Hadoop, Storm, Spark, and others.
First, as a messaging system, Kafka provides the basic functions: decoupling, ordering, asynchrony, and so on. Beyond that, its design aims at higher throughput: it provides persistence with O(1) time complexity, handles data volumes at the TB/PB level, supports both offline and real-time processing (that is, it connects to Hadoop and Storm), and scales out horizontally.
[Figure: Kafka architecture diagram]
As the architecture diagram shows, Kafka has a distributed design (in the DT era, a system that cannot scale out horizontally cannot survive). On the front end, producers concurrently push messages (batching is supported) to a specific topic on the brokers of the Kafka cluster. Each topic contains multiple partitions to make horizontal scaling easier. On the back end, consumers, organized into consumer groups, pull messages from the brokers. Kafka uses ZooKeeper (ZK) to manage cluster configuration, elect leaders, and rebalance. The messaging pattern is push on the producer side and pull on the consumer side.
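To make the topic/partition idea concrete, here is a minimal in-memory sketch in Java. It is not Kafka's client API or its real partitioner: the class `TopicSketch`, its methods, and the hash used are all hypothetical, chosen only to show that a keyed message lands in one partition of a topic and that a consumer pulls from a partition starting at an offset it chooses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a topic split into partitions (hypothetical, not Kafka's real code). */
final class TopicSketch {
    private final List<List<String>> partitions = new ArrayList<>();

    TopicSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Maps a key to a partition index; the same key always maps to the same partition. */
    static int partitionFor(String key, int partitionCount) {
        // Assumption: a simple byte-array hash; Kafka's actual partitioner differs.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, partitionCount);
    }

    /** Producer side: append to the partition chosen by the key, return the new offset. */
    long send(String key, String value) {
        List<String> log = partitions.get(partitionFor(key, partitions.size()));
        log.add(value);
        return log.size() - 1;
    }

    /** Consumer side: pull every message in one partition from the given offset onward. */
    List<String> poll(int partition, int fromOffset) {
        List<String> log = partitions.get(partition);
        int start = Math.min(fromOffset, log.size());
        return new ArrayList<>(log.subList(start, log.size()));
    }

    public static void main(String[] args) {
        TopicSketch topic = new TopicSketch(3);
        topic.send("user-1", "page_view");
        topic.send("user-2", "search");
        topic.send("user-1", "click");
        int p = partitionFor("user-1", 3);
        System.out.println("user-1 -> partition " + p + ": " + topic.poll(p, 0));
    }
}
```

Because the partition is derived from the key, messages for one key keep their relative order inside a single partition, while different keys can spread across brokers.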
Next, we set up a Kafka cluster service:
[Screenshot: starting the Kafka cluster]
Send and consume messages via ZK:
[Screenshot: sending and consuming messages]
Use Java to produce and consume messages:
[Screenshot: Java producer and consumer code]
This is fairly straightforward. Note that messages can be sent in batches here. Not every message middleware supports batch sending, and batch sending is one of the reasons for Kafka's high throughput.
On the consumer side, a message stream is used to consume the payload; the stream's iterator never terminates, so it behaves like a message listener.
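Why batching helps throughput can be shown with a small sketch. The class below is hypothetical and uses only the Java standard library, not the Kafka producer API: it buffers messages and hands them to a `transport` callback, which stands in for one network request, only when the batch is full or `flush()` is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers messages and sends them in batches (illustrative sketch only). */
final class BatchingSenderSketch {
    private final int batchSize;
    private final Consumer<List<String>> transport; // stands in for one network request
    private final List<String> buffer = new ArrayList<>();

    BatchingSenderSketch(int batchSize, Consumer<List<String>> transport) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.transport = transport;
    }

    /** Queues a message; triggers a send only when the batch is full. */
    void send(String message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered as a single request; does nothing if the buffer is empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        transport.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> requests = new ArrayList<>();
        BatchingSenderSketch sender = new BatchingSenderSketch(3, requests::add);
        for (int i = 0; i < 7; i++) {
            sender.send("msg-" + i);
        }
        sender.flush(); // push out the partial last batch
        System.out.println(requests.size() + " requests for 7 messages"); // prints "3 requests for 7 messages"
    }
}
```

With a batch size of 3, seven messages cost three requests instead of seven; the per-request overhead is amortized across the batch.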
What makes Kafka efficient, and where it innovates:
1. Message deletion management. Typical message middleware deletes a message once it has been consumed, which makes consuming messages expensive. Kafka instead manages messages statelessly: it introduces message offsets and applies a time-based retention policy (an SLA), so messages are deleted after a set period. According to the official website, consuming Kafka messages is therefore very lightweight: come and go. It sounds like takeout: grab it and leave. Moreover, thanks to offsets, consumers are free to fetch messages from any position, including re-reading messages that have already been consumed.
2. Kafka uses the Linux sendfile system call to transfer file data directly within the kernel, avoiding extra copies through user space.
3. Kafka introduces ZK to manage distributed coordination, HA, and fault tolerance. ZK is used to manage the Kafka brokers: when a broker is added or fails, the ZK service notifies producers and consumers.
4. Producer performance: an optimized message structure and size, plus batch sending.
5. Consumer performance: an optimized message structure and stateless management keep consumption cheap, so no B+ tree index is needed.
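Points 1 and 5 above can be sketched together as an append-only log. The Java class below is a hypothetical illustration, not Kafka's storage code: reading never deletes anything, the caller supplies the offset (so the log keeps no per-consumer state), deletion is driven only by message age, and finding a message is plain arithmetic on the offset rather than an index lookup. Timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Append-only log with consumer-chosen offsets and time-based retention (sketch). */
final class RetentionLogSketch {
    private static final class Entry {
        final long timestampMs;
        final String value;

        Entry(long timestampMs, String value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final long retentionMs;
    private long baseOffset = 0; // offset of entries.get(0)

    RetentionLogSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    /** Appends a message and returns its offset. */
    long append(String value, long nowMs) {
        entries.add(new Entry(nowMs, value));
        return baseOffset + entries.size() - 1;
    }

    /** Reads without deleting; the same offset can be read again later. */
    List<String> read(long fromOffset, int maxMessages) {
        // An offset is a position, so the lookup is a subtraction, not a tree search.
        int start = (int) Math.max(0, fromOffset - baseOffset);
        List<String> out = new ArrayList<>();
        for (int i = start; i < entries.size() && out.size() < maxMessages; i++) {
            out.add(entries.get(i).value);
        }
        return out;
    }

    /** Drops messages older than the retention period, whether or not they were consumed. */
    void expire(long nowMs) {
        int expired = 0;
        while (expired < entries.size()
                && nowMs - entries.get(expired).timestampMs > retentionMs) {
            expired++;
        }
        entries.subList(0, expired).clear();
        baseOffset += expired;
    }

    long earliestOffset() {
        return baseOffset;
    }

    public static void main(String[] args) {
        RetentionLogSketch log = new RetentionLogSketch(1_000);
        log.append("a", 0);
        log.append("b", 500);
        log.append("c", 1_200);
        System.out.println(log.read(0, 10)); // first read
        System.out.println(log.read(0, 10)); // re-read of already consumed messages
        log.expire(1_400);                   // only "a" is older than 1000 ms
        System.out.println(log.read(0, 10));
    }
}
```

In this model a consumer that wants to replay history simply asks for an earlier offset, as long as the retention policy has not yet removed those messages.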
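The sendfile point (item 2 above) has a direct counterpart in the Java standard library: `FileChannel.transferTo`, which lets the JVM hand the copy to the operating system where the platform supports it, instead of reading into a user-space buffer and writing it back out. The sketch below is an illustration under assumptions, not Kafka's code: the class `ZeroCopySketch` is hypothetical, and a second file channel stands in for the network socket a broker would write to.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sends a file to a channel via FileChannel.transferTo (illustrative sketch). */
final class ZeroCopySketch {
    /** Transfers the whole file to the target and returns the number of bytes moved. */
    static long transfer(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop until done.
                // Assumes a blocking target; a non-blocking one could return 0 repeatedly.
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("segment", ".log");
        Path dst = Files.createTempFile("socket-stand-in", ".bin");
        Files.write(src, "hello kafka".getBytes(StandardCharsets.UTF_8));
        try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            System.out.println(transfer(src, out) + " bytes transferred");
        }
        Files.delete(src);
        Files.delete(dst);
    }
}
```

Whether the call actually maps to sendfile depends on the operating system and the type of the target channel; the application code stays the same either way.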
Overall, Kafka's performance is outstanding, and it is often used as a replacement for traditional message middleware, especially when feeding Hadoop or stream processing is the main concern. It is also an excellent choice for website log processing, user behavior analysis, and offline log processing.
Well, that's it for now. I got up early to write this, and sure enough, time was tight and the task heavy. Please bear with me; some pictures are borrowed from the web.
WeChat public account: Technical Geek (techbooster)
This article is from the "Erixhao" blog; please keep this source when reposting: http://erixhao.blog.51cto.com/10238307/1784007
Apache Top Project Introduction 2: Kafka