Big Data Architecture: Kafka

Source: Internet
Author: User

Kafka is a high-throughput, distributed, publish-subscribe messaging system that can be used to build large-scale messaging systems on inexpensive commodity servers. Its features include message persistence, high throughput, distributed operation, multi-client support, and real-time delivery, and it serves both offline and online message consumption.

Kafka Features:

    • Decoupling: The messaging system inserts an implicit, data-based interface layer between the processes that produce data and those that consume it.
    • Redundancy: Message queues persist messages to prevent data loss.
    • Scalability: Because the message queue decouples the processing stages, it is easy to scale out processing.
    • Recoverability: If a processing component fails, it can resume processing once it recovers.
    • Order guarantee: Message queues preserve order. Kafka guarantees message order within a partition.
    • Asynchronous communication: Message queues let messages be enqueued first and processed later, when needed.
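
The per-partition order guarantee can be illustrated with a small in-memory sketch. This is not Kafka's implementation: the partition count, the `partition_for` and `append_all` helpers, and the use of CRC32 as the key hash are all assumptions made for illustration. The point is only that messages sharing a key land in the same partition and keep their arrival order there, while no order is implied across partitions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for a single topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition. CRC32 is a stand-in hash, not Kafka's own."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(messages):
    """Append (key, value) pairs to per-partition logs in arrival order."""
    logs = defaultdict(list)
    for key, value in messages:
        logs[partition_for(key)].append((key, value))
    return logs


# Interleaved messages from two hypothetical keys.
messages = [("user-a", 1), ("user-b", 1), ("user-a", 2), ("user-b", 2), ("user-a", 3)]
logs = append_all(messages)

# Within each partition, the messages for a given key appear in send order.
for partition, log in sorted(logs.items()):
    print(partition, log)
```

Choosing the partition from the key is what makes the guarantee useful in practice: everything that must stay ordered relative to each other is given the same key.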

Kafka's terminology

Kafka Architecture

Typical Kafka architecture

A typical Kafka cluster contains several producers (for example, messages generated by a web front-end application, or events such as online logs collected by Flume), several brokers (Kafka supports horizontal scaling; in general, the more brokers, the higher the cluster throughput), several consumer groups, and one ZooKeeper cluster. Kafka manages cluster configuration and service coordination through ZooKeeper. Producers publish messages to brokers in push mode, and consumers subscribe to and consume messages from brokers in pull mode.
Multiple brokers work together, producers and consumers are deployed inside the business logic and invoked frequently, and ZooKeeper coordinates requests and forwarding among the three. Together they form a high-performance distributed publish-subscribe messaging system. One detail to note: the path from producer to broker is push, meaning the producer pushes data to the broker, while the path from broker to consumer is pull, meaning the consumer actively pulls the data rather than the broker sending it to the consumer.
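
The push and pull directions can be sketched with a toy in-memory broker. The `Broker` and `Consumer` classes below are hypothetical and are not the Kafka client API; they only show who initiates each transfer. The producer side calls `append` to push, and the consumer calls `poll` whenever it is ready, keeping track of its own offset.

```python
class Broker:
    """Toy in-memory broker: an append-only log that never pushes to consumers."""

    def __init__(self):
        self.log = []

    def append(self, message):
        """Called by the producer: the producer pushes data to the broker."""
        self.log.append(message)
        return len(self.log) - 1  # offset of the stored message

    def fetch(self, offset, max_messages=10):
        """Called by the consumer: return stored messages starting at `offset`."""
        return self.log[offset:offset + max_messages]


class Consumer:
    """Toy consumer that tracks its own position and pulls when it is ready."""

    def __init__(self, broker):
        self.broker = broker
        self.offset = 0  # next offset to read

    def poll(self, max_messages=10):
        batch = self.broker.fetch(self.offset, max_messages)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


broker = Broker()
for i in range(5):
    broker.append(f"event-{i}")  # push: producer -> broker

consumer = Consumer(broker)
first = consumer.poll(max_messages=2)  # pull: consumer asks for a small batch
rest = consumer.poll()                 # pull again, resuming from its offset
print(first, rest)
```

Because the consumer decides when and how much to fetch, a slow consumer simply falls behind in the log instead of being overwhelmed by the broker.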

Relationship of producer, consumer, broker and zookeeper

Looking at the figure above, reduce the number of brokers to just one. Now suppose the deployment is as follows:

Server-1 is the broker, which is the Kafka server itself, because producers and consumers both connect to it. The broker is mainly used for storage.

Server-2 is the ZooKeeper server. You can look up ZooKeeper's specific role online; for now, imagine that it maintains a table recording the IP addresses, ports, and other information of the various nodes (it also stores Kafka-related information).

Servers 3, 4, and 5 have one thing in common: each is configured with zkclient. More specifically, the ZooKeeper address must be configured before they run. The reason is simple: the connections between them are handed out by ZooKeeper. Note that this reflects older Kafka clients; newer clients are configured with broker addresses directly.

Server-1 and Server-2 can be placed on the same machine or on separate machines, and ZooKeeper can itself be deployed as a cluster. The aim is to keep the system available if one machine goes down.

Briefly, the whole system runs in this order:

1. Start the Zookeeper server

2. Start the Kafka server

3. When a producer has data to publish, it first finds the broker through ZooKeeper and then stores the data on that broker.

4. When a consumer wants to consume data, it first finds the corresponding broker through ZooKeeper and then consumes from it.
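
The four steps above can be walked through with a small simulation. Everything here is an assumption made for illustration: `Registry` stands in for ZooKeeper as a plain table of addresses, `network` is a dictionary that replaces real sockets, and `server-1:9092` is a made-up address. None of this is the real ZooKeeper or Kafka API; the sketch only shows why the start order matters and that both clients look up the broker before talking to it.

```python
class Registry:
    """Stand-in for ZooKeeper: a table of registered broker addresses."""

    def __init__(self):
        self.brokers = {}  # broker id -> address

    def register(self, broker_id, address):
        self.brokers[broker_id] = address

    def lookup(self):
        """Return the address of one registered broker (lowest id here)."""
        if not self.brokers:
            raise RuntimeError("no broker registered yet")
        return self.brokers[min(self.brokers)]


def start_broker(registry, network, broker_id, address):
    """Step 2: the broker starts, then announces itself in the registry."""
    network[address] = []  # the broker's message log
    registry.register(broker_id, address)


def produce(registry, network, message):
    """Step 3: find the broker through the registry, then store the data."""
    network[registry.lookup()].append(message)


def consume(registry, network, offset=0):
    """Step 4: find the broker through the registry, then read from it."""
    return network[registry.lookup()][offset:]


registry = Registry()  # step 1: the registry must exist first
network = {}           # hypothetical replacement for real network connections

start_broker(registry, network, broker_id=1, address="server-1:9092")
produce(registry, network, "hello")
produce(registry, network, "world")
received = consume(registry, network)
print(received)
```

If `produce` were called before `start_broker`, the lookup would raise, which mirrors why the registry and the broker are started before any client.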
