Kafka Basics (I)

Source: Internet
Author: User

1. Overview
After a month of observation, our business's Kafka integration has remained stable in every respect, so I am taking some time to share experience from using Kafka in real scenarios. This post is an introduction: it aims to give you a preliminary understanding of Kafka and what it does. The contents of this post are:

Kafka Background
Kafka Application Scenarios
Kafka Architecture Principles
Let's begin.

2. Kafka Background
Kafka is essentially a messaging system. It was developed at LinkedIn by a three-person team, who created the Apache Kafka real-time message queue technology and went on to found a company dedicated to providing real-time data processing solutions for businesses in every industry. Kafka acts as LinkedIn's central nervous system: it aggregates the data flowing from various applications, which is then processed and distributed elsewhere. Unlike traditional enterprise message queue systems, Kafka processes all the data flowing through a company in near real time. Companies such as LinkedIn, Netflix, Uber and Verizon have used it to build real-time data processing platforms.

Activity stream data is the most common kind of data that websites use to report on site usage, including page views (PV), information about the content viewed, and search history. This data is usually first written to log files, which are then periodically processed for statistical analysis to obtain the required KPI results.
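
The traditional log-file approach described above can be illustrated with a minimal sketch. The log format (`<timestamp> <user_id> <url>`), the sample lines, and the function name `count_page_views` are all assumptions made for this example; they do not come from any real system.

```python
from collections import Counter

# Hypothetical access-log format: "<timestamp> <user_id> <url>"
SAMPLE_LOG = """\
2015-06-01T10:00:01 u1 /home
2015-06-01T10:00:05 u2 /home
2015-06-01T10:01:12 u1 /search?q=kafka
2015-06-01T10:02:30 u3 /home
"""

def count_page_views(log_text):
    """Return a Counter mapping URL path -> number of views (PV)."""
    views = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines
        _timestamp, _user, url = parts
        path = url.split("?", 1)[0]  # ignore the query string
        views[path] += 1
    return views

pv = count_page_views(SAMPLE_LOG)
print(pv["/home"], pv["/search"])  # prints: 3 1
```

A job like this only produces results when it is run over a finished log file, which is why the KPI figures lag behind the actual activity.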

3. Kafka Application Scenarios
When we encounter a new technology or language, we first need to understand where it is applied: what it can do and whom it serves. Kafka's main application scenarios are as follows:

First, Kafka can serve as a messaging system. Take the popular case of message push: Kafka can act as the core of the push system's message source, handling both the production and the consumption of messages. Second is website activity tracking: user operation records and similar information from an enterprise portal can be sent to Kafka and then, depending on business needs, either monitored in real time or processed offline. Finally, there is log collection, where Kafka plays a role similar to log collection systems such as Flume. Kafka's design, however, uses a push/pull model (producers push messages to the broker and consumers pull from it), which suits heterogeneous clusters. Kafka can also submit messages in batches, so the producer incurs almost no performance overhead. On the consumer side, the collected data can be stored in a distributed file system such as HDFS.
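
The idea of submitting messages in batches can be sketched as follows. This is a toy, in-memory illustration and not Kafka's actual client API: the class `BatchingProducer`, its `batch_size` parameter, and the list standing in for the broker are all hypothetical.

```python
class BatchingProducer:
    """Toy producer that buffers messages and sends them in batches."""

    def __init__(self, send_batch, batch_size=3):
        self._send_batch = send_batch   # callable that receives a list of messages
        self._batch_size = batch_size
        self._buffer = []

    def send(self, message):
        """Buffer one message; ship the buffer once it is full."""
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered as a single batch."""
        if self._buffer:
            self._send_batch(list(self._buffer))
            self._buffer.clear()

received_batches = []          # stands in for the broker
producer = BatchingProducer(received_batches.append, batch_size=3)
for i in range(7):
    producer.send(f"log line {i}")
producer.flush()               # push out the final partial batch

print([len(b) for b in received_batches])  # prints: [3, 3, 1]
```

Seven messages reach the stand-in broker in three calls instead of seven, which is the reason batching keeps the cost on the producer side low.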

4. Kafka Architecture Principles
Kafka is designed as a unified information collection platform that can collect feedback data in real time and has good fault tolerance. The most visible parts of Kafka are its producers and consumers.

4.1 Producer and Consumer

Kafka stores messages by category, called a topic. Messages are written by producers (Producer) and read by consumers (Consumer), and each server in the cluster is called a broker. In the Kafka clusters described here, producers and consumers both rely on ZooKeeper to ensure data consistency.
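
The relationship between producer, broker, topic and consumer can be sketched with a small in-memory model. It is a deliberately simplified, single-process illustration: the `Broker` and `Consumer` classes and their methods are hypothetical names, and networking, partitions and ZooKeeper are left out entirely.

```python
from collections import defaultdict

class Broker:
    """Toy single-node broker: stores messages per topic in arrival order."""

    def __init__(self):
        self._topics = defaultdict(list)

    def append(self, topic, message):
        self._topics[topic].append(message)

    def fetch(self, topic, start, max_messages=10):
        """Return messages of `topic` beginning at index `start`."""
        return self._topics[topic][start:start + max_messages]

class Consumer:
    """Pulls from one topic and remembers how far it has read."""

    def __init__(self, broker, topic):
        self._broker = broker
        self._topic = topic
        self._position = 0

    def poll(self):
        messages = self._broker.fetch(self._topic, self._position)
        self._position += len(messages)
        return messages

broker = Broker()
broker.append("page_views", "u1 /home")    # a producer pushes to a topic
broker.append("searches", "u1 kafka")
broker.append("page_views", "u2 /home")

consumer = Consumer(broker, "page_views")
first = consumer.poll()     # the consumer pulls what is available
second = consumer.poll()    # nothing new has arrived since
print(first, second)        # prints: ['u1 /home', 'u2 /home'] []
```

Note that the consumer, not the broker, tracks the read position; the broker simply keeps the messages of each topic separate and serves them on request.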

4.2 Topic
Every message delivered to the Kafka cluster belongs to a category called a topic, and messages of different topics are stored separately.

A topic is a category of messages, and each topic can be split into multiple partitions. Within a partition, the position of each message in the log file is called its offset, which uniquely identifies that message. In Kafka, a message is not deleted as soon as it is consumed; it is retained for a configured period. For example, if the configuration retains data for 7 days, then after 7 days the message is deleted whether or not it has been consumed, which frees disk space and reduces disk I/O.
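
Offsets and time-based retention can be sketched with a toy partition log. The `PartitionLog` class is hypothetical, timestamps are passed in explicitly to keep the example deterministic, and messages are expired one by one, whereas a real Kafka broker deletes whole log segment files.

```python
class PartitionLog:
    """Toy append-only log for one partition with time-based retention."""

    def __init__(self, retention_seconds):
        self._retention = retention_seconds
        self._entries = []        # list of (offset, timestamp, message)
        self._next_offset = 0

    def append(self, message, now):
        """Store a message and return the offset assigned to it."""
        offset = self._next_offset
        self._entries.append((offset, now, message))
        self._next_offset += 1
        return offset

    def read(self, offset):
        for entry_offset, _ts, message in self._entries:
            if entry_offset == offset:
                return message
        return None  # offset never existed or was already deleted

    def purge_expired(self, now):
        """Drop entries older than the retention period, consumed or not."""
        self._entries = [e for e in self._entries if now - e[1] < self._retention]

DAY = 24 * 60 * 60
log = PartitionLog(retention_seconds=7 * DAY)
log.append("a", now=0)            # offset 0
log.append("b", now=5 * DAY)      # offset 1
log.purge_expired(now=8 * DAY)    # "a" is 8 days old, "b" is 3 days old
new_offset = log.append("c", now=8 * DAY)
print(log.read(0), log.read(1), new_offset)  # prints: None b 2
```

Offsets keep increasing after old messages are removed, so an offset continues to identify a single message for the lifetime of the partition.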

In Kafka, the partitions of a topic are distributed across multiple servers in the cluster, and each server handles reads and writes for the partitions it hosts. In addition, the number of replicas kept for each partition is configurable, which increases availability. Through ZooKeeper (ZK) coordination, each partition has one server in the leader state that serves external requests (such as reads and writes). If the leader goes down, a new leader is elected from the remaining followers, ensuring high availability of the cluster.
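
Partition placement and leader failover can be sketched as below. This is a simplified illustration under stated assumptions: the round-robin placement, the "first live replica wins" election rule, and the names `assign_replicas` and `elect_leaders` are chosen for the example and are not Kafka's actual algorithm, which also takes into account which replicas are in sync.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Round-robin replica placement; the first replica is the initial leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment

def elect_leaders(assignment, live_brokers):
    """Pick, per partition, the first replica that is still alive (or None)."""
    leaders = {}
    for partition, replicas in assignment.items():
        leaders[partition] = next((b for b in replicas if b in live_brokers), None)
    return leaders

brokers = ["broker-0", "broker-1", "broker-2"]
assignment = assign_replicas(num_partitions=3, brokers=brokers, replication_factor=2)
before = elect_leaders(assignment, live_brokers=set(brokers))
after = elect_leaders(assignment, live_brokers={"broker-1", "broker-2"})
print(before)  # {0: 'broker-0', 1: 'broker-1', 2: 'broker-2'}
print(after)   # {0: 'broker-1', 1: 'broker-1', 2: 'broker-2'}
```

When `broker-0` fails, partition 0 moves its leadership to its follower on `broker-1`, while partition 2 only loses a follower and keeps its leader.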

5. Summary
This post has introduced Kafka's background, application scenarios and architecture principles. The material is fairly theoretical and conceptual and deserves careful study, but for now it is enough to be roughly familiar with it and have an outline in mind. Later posts will cover practical Kafka usage, so that you can understand these principles through real business cases and code.

6. Concluding remarks
That is all for this post. If you run into problems while studying, you can join the discussion group or send me an e-mail, and I will do my best to answer. Let us encourage one another.
