Kafka Fundamentals (I)

Source: Internet
Author: User

1. Overview
After a month of observation, our Kafka integration has remained stable in every respect, so I am taking some time to share experience from using Kafka in real scenarios. This post is an entry-level introduction meant to give you a preliminary understanding of what Kafka is and what it does. It covers the following:

Kafka background
Kafka application scenarios
Kafka architecture principles
Let's get started.

2. Kafka Background
Kafka is essentially a messaging system. It was originally developed at LinkedIn by a three-person team, who went on to offer real-time data processing solutions built around Apache Kafka to companies in many industries. At LinkedIn, Kafka acts as a central nervous system: data from the company's applications converges in it, is processed, and is then distributed elsewhere. Unlike traditional enterprise message queues, Kafka handles all the data flowing through a company in near real time, and companies such as LinkedIn, Netflix, Uber, and Verizon have built real-time processing platforms on it.

Activity data is one of the most common kinds of data a website uses to report on how the site is used, including page views (PV), information about the content viewed, and search records. This data is usually written to log files, which are then analyzed statistically at regular intervals to produce the required KPI metrics.
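As a rough illustration of the periodic log analysis described above, the following sketch counts page views per day from raw log lines. The log format and the sample lines are assumptions made up for this example, not a format Kafka or any particular site prescribes.

```python
from collections import Counter

# Hypothetical access-log format (an assumption for illustration):
# "<date> <user_id> <page_path>"
LOG_LINES = [
    "2016-05-01 u1 /index",
    "2016-05-01 u2 /index",
    "2016-05-01 u1 /search?q=kafka",
    "2016-05-02 u3 /index",
]


def page_views_per_day(lines):
    """Count page views (PV) per day from raw log lines."""
    counts = Counter()
    for line in lines:
        # Only the date field is needed for a per-day PV count.
        date, _user, _path = line.split(" ", 2)
        counts[date] += 1
    return dict(counts)


pv = page_views_per_day(LOG_LINES)
```

A batch job like this only sees the data after the log file has been written and collected, which is the delay a near-real-time pipeline aims to remove.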

3. Kafka Application Scenarios
When we encounter a new technology or language, we should first understand where it applies, that is, what it can do and whom it serves. The following figure illustrates this:

First, Kafka can serve as a messaging system. In the message-push systems that are popular today, for example, Kafka can act as the core that handles producing and consuming the messages being pushed. Second is website activity tracking: we can send information such as enterprise portal activity and user operation records to Kafka and, depending on business needs, monitor it in real time or process it offline. Finally, there is log collection, where Kafka plays a role similar to log collection systems such as Flume. Kafka's architecture is push/pull (producers push to brokers and consumers pull from them), which suits heterogeneous clusters. Because Kafka can submit messages in batches, producing adds almost no performance overhead, and on the consumer side the data can be stored in a distributed file system such as HDFS.
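The batching idea mentioned above can be sketched in a few lines. This is a toy model, not the Kafka producer API: `BatchingSender`, its `deliver` callback, and `batch_size` are hypothetical names chosen for the example, and a real producer would also flush on a timer.

```python
class BatchingSender:
    """Buffers messages and hands them to `deliver` one batch at a time."""

    def __init__(self, deliver, batch_size):
        self._deliver = deliver        # callable that receives a list of messages
        self._batch_size = batch_size
        self._buffer = []

    def send(self, message):
        # Cheap for the caller: usually just an in-memory append.
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        # Deliver whatever is buffered, even if the batch is not full.
        if self._buffer:
            self._deliver(list(self._buffer))
            self._buffer.clear()


delivered_batches = []
sender = BatchingSender(deliver=delivered_batches.append, batch_size=3)
for i in range(7):
    sender.send(f"log line {i}")
sender.flush()  # push out the final partial batch
```

Seven messages result in three deliveries instead of seven, which is why batching keeps the cost on the producer side low.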

4. Kafka Architecture Principles
Kafka is designed as a unified information collection platform that gathers feedback data in real time and has good fault tolerance. The most visible parts of Kafka are its producers and consumers, as shown in the following illustration:

4.1 Producer and Consumer

Kafka stores messages by category, and each category is called a topic. Messages are published by message producers (Producer) and read by message consumers (Consumer), and each server in the cluster is called a broker. In the Kafka version described here, producers and consumers both rely on ZooKeeper to keep a consistent view of the cluster; in newer versions, clients talk only to the brokers.
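To make the producer, consumer, broker, and topic roles concrete, here is a minimal in-memory sketch. It is a toy model under simplifying assumptions (one broker, no network, no partitions), and the class names `ToyBroker`, `ToyProducer`, and `ToyConsumer` are invented for this example rather than taken from any Kafka client library.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: stores messages grouped by topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def append(self, topic, message):
        self._topics[topic].append(message)

    def read(self, topic, start):
        """Return all messages of a topic from position `start` onward."""
        return self._topics[topic][start:]


class ToyProducer:
    """Publishes messages to a named topic on the broker."""

    def __init__(self, broker):
        self._broker = broker

    def send(self, topic, message):
        self._broker.append(topic, message)


class ToyConsumer:
    """Pulls messages of one topic and remembers how far it has read."""

    def __init__(self, broker, topic):
        self._broker = broker
        self._topic = topic
        self._position = 0

    def poll(self):
        messages = self._broker.read(self._topic, self._position)
        self._position += len(messages)
        return messages


broker = ToyBroker()
producer = ToyProducer(broker)
producer.send("page-views", "u1 /index")
producer.send("search", "u1 kafka")
producer.send("page-views", "u2 /index")

consumer = ToyConsumer(broker, "page-views")
first_poll = consumer.poll()    # both page-view messages, in send order
second_poll = consumer.poll()   # nothing new since the last poll
```

The consumer only sees messages of the topic it reads, and the producer never needs to know who consumes them; the broker is the only thing the two sides share.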

4.2 Topic
Every message delivered to the Kafka cluster belongs to a category, called a topic, and messages of different topics are stored separately, as shown in the following illustration:

A topic can be thought of as a category of messages. Each topic can be split into multiple partitions, and a message's position within its partition's log is called its offset, which uniquely identifies the message within that partition. Kafka does not delete a message when it is consumed. Instead, messages are deleted after a configured retention period: if retention is set to 7 days, for example, a message is deleted once it is 7 days old, whether or not it has been consumed, which frees disk space and reduces disk I/O.
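The offset and retention behavior described above can be modeled with a small sketch. It assumes a single partition held in memory and drops individual records by age, which is a simplification: a real broker deletes whole log segment files, with the retention window set by broker settings such as `log.retention.hours`. The class `ToyPartition` and its methods are hypothetical names for this example.

```python
class ToyPartition:
    """Append-only log in which each record gets the next offset."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []       # list of (offset, timestamp, value)
        self._next_offset = 0

    def append(self, value, now):
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def read_from(self, offset):
        """Reading never removes anything from the log."""
        return [(o, v) for (o, _t, v) in self._records if o >= offset]

    def enforce_retention(self, now):
        """Drop records older than the retention window, consumed or not."""
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]


DAY = 24 * 60 * 60
partition = ToyPartition(retention_seconds=7 * DAY)
partition.append("a", now=0)                # offset 0
partition.append("b", now=3 * DAY)          # offset 1

consumed = partition.read_from(0)           # a consumer reads everything
still_there = partition.read_from(0)        # the same records are still present

partition.enforce_retention(now=8 * DAY)    # "a" is now older than 7 days
after_cleanup = partition.read_from(0)
next_offset = partition.append("c", now=8 * DAY)
```

Offsets keep increasing after old records are removed, so an offset always refers to the same message for as long as that message exists.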

The partitions of a topic are distributed across multiple servers in the Kafka cluster, and each server handles reads and writes for the partitions it hosts. Kafka can also be configured with the number of replicas to keep for each partition, which improves availability. With ZooKeeper (ZK) coordinating, each partition has one server acting as leader, which serves external requests such as reads and writes. If the leader goes down, one of the followers is elected as the new leader, keeping the cluster highly available.
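The following sketch illustrates replica placement and leader failover in a simplified form. The round-robin placement and the "first live replica becomes leader" rule are assumptions chosen to keep the example short; they are not Kafka's actual assignment or election algorithm, and the function and broker names are hypothetical.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition on `replication_factor` consecutive brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [
            brokers[(p + i) % len(brokers)] for i in range(replication_factor)
        ]
    return assignment


def elect_leaders(assignment, live_brokers):
    """Pick the first live replica of each partition as leader (None if all are down)."""
    leaders = {}
    for p, replicas in assignment.items():
        leaders[p] = next((b for b in replicas if b in live_brokers), None)
    return leaders


brokers = ["broker-0", "broker-1", "broker-2"]
assignment = assign_replicas(num_partitions=3, brokers=brokers, replication_factor=2)

# All brokers up: each partition is led by its first replica.
leaders_before = elect_leaders(assignment, live_brokers=set(brokers))

# broker-0 goes down: partition 0 fails over to its follower on broker-1.
leaders_after = elect_leaders(assignment, live_brokers={"broker-1", "broker-2"})
```

With a replication factor of 2, every partition survives the loss of any single broker, because a follower on another broker can take over as leader.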

5. Summary
Overall, this post introduced Kafka's background, application scenarios, and architecture principles. The material is fairly theoretical and concept-heavy, so it deserves careful study, but for now a rough familiarity and a mental outline are enough. Later posts will cover practical Kafka usage, so that these principles can be understood through real business cases and code.

6. Concluding remarks
That is all for this post. If you run into problems while studying, you can join the group for discussion or email me, and I will do my best to answer. Let us encourage each other.
