Kafka Learning, Part 1: What Is Kafka and What Are Its Main Application Scenarios?

Source: Internet
Author: User
Tags: garbage collection

1. What Is Kafka

Kafka is a distributed publish/subscribe messaging system developed by LinkedIn. It is written in Scala and is widely used for its horizontal scalability and high throughput.

2. Background

Kafka is the messaging system that underlies LinkedIn's activity stream and operational data processing pipeline. Activity stream data is the most common kind of data that almost every site uses to report on its usage. It includes page views, information about the content being viewed, and search queries. This data is typically handled by writing each activity to log files and then periodically running statistical analysis over those files. Operational data refers to server performance data (CPU and I/O utilization, request times, service logs, and so on), and it is analyzed with a wide variety of statistical methods.

3. Basic Architecture Diagram

4. Basic Concepts

1) Broker

A Kafka cluster contains one or more servers, which are called brokers. The broker does not track the consumption state of the data, which improves performance. Messages are stored directly on disk and are read and written linearly, which is fast: this avoids copying data between JVM memory and system memory, and it reduces the performance cost of object creation and garbage collection.
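The idea that the broker keeps no consumption state can be illustrated with a small sketch. This is not Kafka code: `PartitionLog` and `PullConsumer` are hypothetical names, and an in-memory list stands in for the on-disk log. It only shows that an append-only log can serve several readers while each reader remembers its own position.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Hypothetical in-memory stand-in for one append-only log on a broker."""
    records: list = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message to the end of the log and return its offset."""
        self.records.append(message)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int) -> list:
        """Return up to max_records messages starting at offset.

        The log keeps no record of who has read what.
        """
        return self.records[offset:offset + max_records]


@dataclass
class PullConsumer:
    """Hypothetical consumer that owns its own read position."""
    log: PartitionLog
    offset: int = 0

    def poll(self, max_records: int = 2) -> list:
        """Pull the next batch and advance the consumer-side offset."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = PartitionLog()
for text in (b"page_view", b"search", b"click"):
    log.append(text)

fast = PullConsumer(log)
slow = PullConsumer(log)
first = fast.poll()                    # first two messages
second = fast.poll()                   # the remaining message
slow_first = slow.poll(max_records=1)  # independent position, starts at 0
```

Because the position lives in each `PullConsumer`, the two readers progress at different speeds without the log doing any bookkeeping for them.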

2) Producer

The message producer, the client responsible for publishing messages to Kafka brokers.

3) Consumer

The message consumer, the client that reads messages from Kafka brokers. A consumer pulls data from the broker and processes it.

4) Topic

Each message published to a Kafka cluster has a category, which is called a topic. (Physically, messages of different topics are stored separately. Logically, the messages of one topic may be stored on one or more brokers, but users only need to specify the topic to produce or consume data, without worrying about where the data is stored.)

5) Partition

A partition is a physical concept, and each topic contains one or more partitions.

6) Consumer Group

Each consumer belongs to a specific consumer group. A group name can be specified for each consumer; if no group name is specified, the consumer belongs to the default group.

7) Topic & Partition

A topic can logically be thought of as a queue. Each message must specify its topic, which can be understood simply as indicating which queue the message is put into. So that Kafka's throughput can scale linearly, a topic is physically divided into one or more partitions. Each partition corresponds physically to a folder, and that folder stores all of the partition's messages and index files. If you create two topics, Topic1 and Topic2, with 13 and 19 partitions respectively, a total of 32 folders will be generated across the entire cluster (this article uses a total of 8 nodes, and the replication factor of both Topic1 and Topic2 is 1).
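The folder count above can be checked with a short sketch. It assumes the usual Kafka layout in which each partition's folder is named after the topic plus the partition index; the helper `partition_dirs` is a hypothetical name used only for illustration, and which of the 8 nodes holds which folder is not modeled.

```python
def partition_dirs(topic: str, partitions: int) -> list:
    """Folder names for one topic, one per partition, as '<topic>-<index>'."""
    return [f"{topic}-{i}" for i in range(partitions)]


# Topic names and partition counts taken from the example in the text.
topics = {"Topic1": 13, "Topic2": 19}
replication_factor = 1  # one copy of each partition

all_dirs = [d for name, count in topics.items() for d in partition_dirs(name, count)]

# Each replica of a partition is its own folder somewhere in the cluster.
total_dirs = len(all_dirs) * replication_factor
```

With a replication factor of 1 this gives 13 + 19 = 32 folders; a higher replication factor would multiply that total, because every replica is stored as a separate folder on some broker.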

5. Applicable Scenarios

1) Messaging

Kafka is a good choice for conventional messaging workloads: partitioning, replication, and fault tolerance give it good scalability and performance. So far, however, Kafka does not provide enterprise-class features such as "transactions", "message delivery guarantees (a message acknowledgement mechanism)", or "message grouping". It can therefore only be used as a "regular" messaging system, and to some extent it does not guarantee that messages are sent and received with absolute reliability (for example, messages may be resent or lost).

2) Website Activity Tracking

Kafka can be an excellent tool for site activity tracking: information such as page views and user actions can be sent to Kafka and then used for real-time monitoring or offline statistical analysis.

3) Metrics

Kafka is often used for operational monitoring data. This includes aggregating statistics from distributed applications to produce centralized feeds of operational data.

4) Log Aggregation

Kafka's characteristics make it well suited as a "log collection center". Applications can send their operation logs to the Kafka cluster in batches and asynchronously, instead of storing them locally or in a database. Because Kafka can submit messages in batches and compress them, the producer sees almost no performance cost. Consumers can then feed the logs into other storage and analysis systems such as Hadoop.
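To give a feel for why batching and compression keep the producer's cost low, here is a minimal sketch using only the Python standard library. It does not use a Kafka client: the log lines are made-up sample data, the helper names are hypothetical, and JSON plus gzip are assumptions chosen for illustration rather than Kafka's actual wire format. The asynchronous sending step is left out.

```python
import gzip
import json


def make_batches(lines, batch_size):
    """Group log lines into lists of at most batch_size entries."""
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


def encode_batch(batch):
    """Serialize one batch as JSON and gzip it into a single payload."""
    return gzip.compress(json.dumps(batch).encode("utf-8"))


def decode_batch(payload):
    """Reverse encode_batch, as a consumer-side system would."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Made-up sample log lines, for illustration only.
log_lines = [f"INFO request handled id={i} status=200" for i in range(250)]

batches = make_batches(log_lines, batch_size=100)
payloads = [encode_batch(b) for b in batches]

raw_bytes = sum(len(line.encode("utf-8")) for line in log_lines)
sent_bytes = sum(len(p) for p in payloads)
```

Here 250 log lines become 3 payloads instead of 250 separate sends, and because log lines are highly repetitive, `sent_bytes` comes out smaller than `raw_bytes`.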
