Introduction to Kafka and Deployment of a Pseudo-Cluster
Kafka is a popular message-queue middleware. It is commonly used for website activity tracking (page views, traffic, clicks, etc.) and for log collection, feeding the collected logs into a big data storage engine for offline analysis.
All content in this article was collected from the web, so its accuracy should be verified independently. If you find any mistakes, please point them out.

Concepts
In Kafka, the message queue involves three roles:
- producer: the producer, responsible for generating log data.
- broker: the storage node, responsible for storing data in the partitions under each topic, distributed evenly across partitions.
- consumer: the consumer, responsible for reading data from the broker.
Producer
The producer in Kafka generates data and sends it to a broker for storage. To maintain socket connections with the partitions on the brokers, the mapping between producers and partition brokers must be kept in ZooKeeper. Data under the same topic is sent to different partitions in a load-balanced manner.
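As a rough illustration of this load balancing, the following Python sketch (a hypothetical helper, not the actual Kafka client API) shows one common strategy: hash the message key to a stable partition when a key is present, and fall back to round-robin otherwise.

```python
import zlib
from itertools import count

# Round-robin counter for messages without a key.
_round_robin = count()

def choose_partition(key, num_partitions):
    """Pick a partition: stable hash for keyed messages, round-robin otherwise."""
    if key is not None:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return next(_round_robin) % num_partitions
```

Keyed messages always land in the same partition (preserving per-key order), while unkeyed messages are spread evenly.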
Broker
The broker is the storage node in Kafka. Data is organized by topic and distributed across partitions in a load-balanced manner. A topic consists of multiple partitions, and each partition can be configured with a number of replicas. A partition has one leader and several followers; the producer communicates directly with the leader. After the leader receives a message, the followers synchronize it, and only after all followers have synchronized the message does it become consumable.
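The "consumable only after all followers have synchronized" rule can be sketched as a toy model (a simplified illustration, not actual broker code; the class and method names are invented):

```python
# Toy model of a partition with one leader log and N followers:
# a message is consumable only once every follower has acknowledged it.

class Partition:
    def __init__(self, num_followers):
        self.log = []            # messages in the leader's log
        self.acks = []           # follower ack count per offset
        self.num_followers = num_followers

    def append(self, message):
        """Leader appends a message; returns its offset."""
        self.log.append(message)
        self.acks.append(0)
        return len(self.log) - 1

    def follower_ack(self, offset):
        """A follower reports it has replicated the message at `offset`."""
        self.acks[offset] += 1

    def consumable(self):
        """Prefix of the log replicated by all followers."""
        n = 0
        while n < len(self.log) and self.acks[n] >= self.num_followers:
            n += 1
        return self.log[:n]
```

With two followers, a message appended to the leader stays invisible to consumers until both followers have acknowledged it.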
Within the broker, metadata such as the mapping between topics and partitions, the mapping between partitions and producers, and leader election among a partition's replicas are all coordinated through ZooKeeper.
Consumer
A consumer in Kafka usually exists as part of a group. A group contains multiple consumers, each group subscribes to one topic, and each partition in the topic can be consumed by only one consumer in the group. As a result, if there are more consumers than partitions, some consumers receive no data; if there are fewer, a single consumer consumes data from multiple partitions at once.
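The partition-to-consumer rule can be sketched as a simple round-robin assignment (a simplified model; Kafka's real partition assignors are more elaborate):

```python
# Assign each partition to exactly one consumer in the group.
# Surplus consumers end up with no partitions; with fewer consumers,
# each consumer takes several partitions.

def assign_partitions(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```

For 3 partitions and 4 consumers, one consumer sits idle; for 4 partitions and 2 consumers, each consumer reads from two partitions.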
Kafka only guarantees that messages within a single partition are consumed in order; ordering across multiple partitions is not guaranteed.
To ensure reliable consumption, Kafka offers several delivery semantics:
- 1. at most once: the offset is saved before the data is processed, so if processing fails, the data cannot be obtained again.
- 2. at least once: the offset is saved after the data is processed, so if an error occurs before the commit, the same data may be fetched again next time.
- 3. exactly once: to be studied.
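The difference between the first two semantics is simply where the offset commit happens relative to processing. A toy simulation (hypothetical, not real client code) of a consumer that crashes mid-batch:

```python
def consume(messages, start, commit_first, crash_at=None):
    """Return (processed, committed_offset) for one consumer run.

    commit_first=True  -> at most once  (commit, then process)
    commit_first=False -> at least once (process, then commit)
    """
    processed, offset = [], start
    for i in range(start, len(messages)):
        if commit_first:
            offset = i + 1             # commit before processing
        if i == crash_at:
            return processed, offset   # simulated crash
        processed.append(messages[i])
        if not commit_first:
            offset = i + 1             # commit after processing
    return processed, offset
```

With `commit_first=True`, a crash while handling message 1 leaves the committed offset at 2, so that message is lost; with `commit_first=False`, the offset stays at 1, so the message is re-read on restart and may be processed twice.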
In Kafka, the offset is maintained by the consumer (this can also be delegated to ZooKeeper). This mechanism has two advantages:
- first, consumers pull data at their own pace, avoiding the pressure of data being pushed faster than they can handle;
- second, the number of records per fetch can be customized: you can read one record at a time or 100 records at a time.
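This consumer-driven pull with a configurable batch size can be sketched as follows (a simplified model, not the real fetch protocol):

```python
def fetch(log, offset, max_records):
    """Pull at most max_records from the log, starting at the consumer's offset."""
    batch = log[offset : offset + max_records]
    return batch, offset + len(batch)   # new offset the consumer will save
```

The consumer, not the broker, decides both where to start (`offset`) and how much to take (`max_records`).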
Topic
All data operations in Kafka, such as message storage, reading, and consumption, are performed against a topic.
Partition
Each topic is composed of multiple partitions. Within each partition the data is ordered: messages are appended to the end of the partition in time sequence. A partition is stored as segment files of a fixed size; when the current segment is full, a new one is created, and Kafka periodically deletes segments that have expired.
This sequential file storage both takes advantage of fast linear disk access and reduces memory pressure.
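The append-only, fixed-size file layout can be modeled roughly as follows (a simplified sketch; real Kafka segments are on-disk files indexed by offset):

```python
class PartitionLog:
    """Toy partition: a list of fixed-size, append-only segments."""

    def __init__(self, segment_size):
        self.segment_size = segment_size
        self.segments = [[]]

    def append(self, message):
        # Roll to a fresh segment once the active one is full.
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])
        self.segments[-1].append(message)

    def expire_oldest(self):
        # Retention works on whole segments: drop the oldest file.
        if len(self.segments) > 1:
            self.segments.pop(0)
```

Writes only ever touch the tail of the last segment (sequential disk access), and expiry deletes whole old segments rather than rewriting files.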
Zookeeper
In Kafka, scheduling and resource allocation for many components depend on ZooKeeper.
For example:
- 1. Register brokers, saving each broker's IP address and port.
- 2. Register topics, managing each topic's partitions and their distribution across brokers.
- 3. Broker load balancing: dynamically distribute topics to brokers according to the topic distribution and the load on each broker.
- 4. Consumer registration: the messages in each partition are delivered to only one consumer (the exact role ZooKeeper plays here is unclear to me).
- 5. The mapping between consumers and partitions is stored in ZooKeeper.
- 6. Consumer load balancing: whenever a consumer joins or leaves, a rebalance across the consumers is triggered.
- 7. With the high-level consumer API, offset information is maintained in ZooKeeper; with the low-level API, the consumer maintains offsets itself.
Build a pseudo Cluster Environment
Deploying the pseudo-cluster environment, i.e. a single-node setup, is very simple: download the release package, decompress it, and run it directly.
Run the following command:
# Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties &
# Start Kafka
bin/kafka-server-start.sh config/server.properties &
If you want to test, you can start the test program:
# Start the producer test program
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# Start the consumer test program
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
The content entered on the producer interface can be seen directly on the consumer interface.