Kafka is a high-throughput, distributed publish-subscribe messaging system that can be used to build large-scale messaging systems on inexpensive commodity servers. Its features, such as message persistence, high throughput, distribution, multi-client support, and real-time delivery, make it suitable for both offline and online message consumption.
Kafka Features:
- Decoupling: The messaging system inserts an implicit, data-centric interface layer between producing and consuming processes, so either side can change independently.
- Redundancy: The message queue persists messages, preventing data loss.
- Extensibility: Because the message queue decouples the processing steps, it is easy to scale out the processing side.
- Recoverability: If a processing step fails, its messages remain in the queue and can be handled once the step recovers.
- Order Guarantee: The message queue preserves ordering; Kafka guarantees message order within a partition.
- Asynchronous communication: The message queue lets producers enqueue messages that are processed later, when needed (see the producer sketch after this list).
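To make a few of these properties concrete, here is a minimal sketch using Kafka's Java producer client. The broker address localhost:9092 and the topic name user-events are placeholder assumptions, not values from this article. send() is asynchronous and returns immediately with a callback, acks=all asks the broker to acknowledge only after the write is persisted and replicated, and records sharing a key go to the same partition, which is where the per-partition ordering guarantee applies.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; adjust to your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // acks=all: the broker acknowledges only after the write is replicated (redundancy).
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // Same key -> same partition -> ordering preserved for that key.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("user-events", "user-42", "event-" + i);
                // send() is asynchronous; the callback fires when the broker acknowledges.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    } else {
                        System.out.printf("stored in partition %d at offset %d%n",
                                metadata.partition(), metadata.offset());
                    }
                });
            }
            producer.flush();
        }
    }
}
```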
Kafka Terminology
Kafka Architecture
Typical Kafka architecture
A typical Kafka cluster contains several producers (for example, messages generated by web front-end applications, or events from online logs collected by tools such as Flume), several brokers (Kafka supports horizontal scaling; in general, the more brokers, the higher the cluster throughput), several consumer groups, and a ZooKeeper cluster. Kafka manages cluster configuration and coordinates services through ZooKeeper. Producers publish messages to brokers in push mode, while consumers subscribe to brokers and consume messages in pull mode.
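Horizontal scaling shows up concretely when a topic is created: partitions spread the topic across brokers for throughput, and the replication factor provides redundancy across brokers. Below is a minimal sketch using Kafka's Java AdminClient; the broker address and topic name are placeholder assumptions.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions spread load across brokers; replication factor 3 keeps
            // copies on 3 brokers (requires at least 3 brokers in the cluster).
            NewTopic topic = new NewTopic("user-events", 6, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```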
Multiple brokers work together; producers and consumers are embedded in each piece of business logic, and the three coordinate their requests and routing through ZooKeeper. Together they form a complete, high-performance distributed message publish-subscribe system. One detail worth noting: the path from producer to broker is push, meaning the producer pushes data to the broker, while the path from consumer to broker is pull, meaning the consumer actively pulls data rather than the broker pushing data to the consumer (see the consumer sketch below).
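The pull side of that detail looks roughly like the following with Kafka's Java consumer client: the application repeatedly calls poll() to fetch whatever the broker currently has, rather than the broker pushing records to it. The broker address, group id, and topic name are placeholder assumptions.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");     // placeholder
        props.put("group.id", "demo-consumer-group");          // placeholder
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("user-events"));
            while (true) {
                // The consumer pulls: poll() asks the broker for records;
                // the broker never pushes records to the consumer on its own.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```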
Relationship of producer, consumer, broker, and ZooKeeper
Looking at the figure above, let's reduce the number of brokers to just one. Now suppose we deploy as follows:
Server-1 runs the broker, which is the Kafka server itself, since producers and consumers connect to it. The broker is mainly responsible for storage.
Server-2 runs the ZooKeeper server. You can look up ZooKeeper's exact role online; for now, imagine it as maintaining a table that records the IP addresses, ports, and other information of the various nodes (it also stores Kafka-related metadata).
Servers 3, 4, and 5 all have a zkclient configured; more specifically, the ZooKeeper address must be configured before they run, for the simple reason that the connections between them are coordinated through ZooKeeper.
As for Server-1 and Server-2, they can be placed on the same machine or on separate machines, and ZooKeeper itself can also be deployed as a cluster. The goal is to avoid everything going down because a single machine fails.
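To make the configuration relationship concrete: the broker (Server-1) is pointed at ZooKeeper (Server-2) through zookeeper.connect in its server.properties, while modern producer and consumer clients (Servers 3 to 5) are pointed at the brokers directly through bootstrap.servers; older clients, such as the 0.8-era high-level consumer that the zkclient configuration refers to, located brokers through ZooKeeper instead. The host names below are placeholder assumptions, and this is only a sketch of which setting belongs where, expressed as Java properties.

```java
import java.util.Properties;

public class ConfigSketch {
    public static void main(String[] args) {
        // Broker-side settings, normally kept in config/server.properties (Server-1).
        Properties brokerProps = new Properties();
        brokerProps.put("broker.id", "0");
        brokerProps.put("listeners", "PLAINTEXT://server-1:9092");   // placeholder host
        brokerProps.put("zookeeper.connect", "server-2:2181");       // ZooKeeper (Server-2), placeholder host
        brokerProps.put("log.dirs", "/tmp/kafka-logs");              // where messages are persisted

        // Client-side settings (Servers 3-5): modern clients talk to brokers directly.
        Properties clientProps = new Properties();
        clientProps.put("bootstrap.servers", "server-1:9092");       // placeholder host

        brokerProps.forEach((k, v) -> System.out.println("broker: " + k + "=" + v));
        clientProps.forEach((k, v) -> System.out.println("client: " + k + "=" + v));
    }
}
```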
Briefly, the order in which the whole system runs is:
1. Start the ZooKeeper server (the Kafka distribution ships a zookeeper-server-start script for this).
2. Start the Kafka server (broker) with kafka-server-start.
3. When the producer has data to publish, it first finds the broker through ZooKeeper and then stores the data on that broker.
4. When the consumer wants to consume data, it first finds the corresponding broker through ZooKeeper and then consumes from it (steps 3 and 4 are sketched in code after this list).
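Steps 3 and 4 amount to the following round trip, reusing the placeholder broker address and topic name from the sketches above; it assumes steps 1 and 2 (ZooKeeper and the broker) have already completed.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class RoundTripSketch {
    public static void main(String[] args) throws Exception {
        // Step 3: produce one record and wait for the broker to store it.
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");             // placeholder
        p.put("key.serializer", StringSerializer.class.getName());
        p.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("user-events", "hello kafka")).get();
        }

        // Step 4: consume it back.
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");             // placeholder
        c.put("group.id", "smoke-test");                           // placeholder
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", StringDeserializer.class.getName());
        c.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singleton("user-events"));
            // A single poll may return nothing while the group rebalances; a sketch only.
            for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(5))) {
                System.out.println("consumed: " + r.value());
            }
        }
    }
}
```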