Kafka Introduction + pseudo-cluster deployment
Kafka is a very popular message-queuing middleware, commonly used for general message queuing, website activity tracking (page views, traffic, clicks, etc.), and log collection (feeding a big-data storage engine for offline analysis).
This content was collected from the web and its accuracy remains to be verified! If you find any mistakes, please point them out.

Concept Introduction
In Kafka, message queuing involves three roles:
- producer: generates log data and sends it to brokers.
- broker: a storage node; it organizes data by topic into partitions and distributes those partitions evenly across the cluster.
- consumer: reads data from the partitions on the brokers.
Producer
A producer in Kafka generates data and sends it to brokers for storage. Because the producer must maintain socket connections to the partitions on the brokers, the correspondence between producers and the brokers holding those partitions is maintained in ZooKeeper (ZK). Data under the same topic is sent to different partitions in a load-balanced manner.
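The load-balanced spread of a topic's data across partitions can be sketched as follows. This is a minimal illustration, not Kafka's actual partitioner: `choose_partition` and the round-robin counter are assumed names for this sketch.

```python
# Sketch only (not Kafka's real implementation): a producer spreading
# messages across a topic's partitions -- hash-based when a key is given,
# round-robin otherwise.

def choose_partition(key, num_partitions, counter):
    """Pick a partition: hash of the key if present, else round-robin."""
    if key is not None:
        return hash(key) % num_partitions
    return counter % num_partitions

partitions = {p: [] for p in range(3)}
for i, msg in enumerate(["m0", "m1", "m2", "m3", "m4", "m5"]):
    p = choose_partition(None, num_partitions=3, counter=i)
    partitions[p].append(msg)

print(partitions)  # {0: ['m0', 'm3'], 1: ['m1', 'm4'], 2: ['m2', 'm5']}
```

Keyless messages land evenly on all three partitions; a keyed message would always land on the same partition, which is what preserves per-key ordering.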
Broker
A broker is a storage node in Kafka; data is organized by topic and assigned to different partitions in a load-balanced manner. A topic consists of multiple partitions, and each partition can be configured with a number of replicas. A partition consists of one leader plus several followers: the producer communicates directly with the leader, and once the leader receives a message, the followers synchronize it. Only after all followers have synchronized the message does it become consumable.
The topic-to-partition mapping in a broker, the partition-to-producer mapping, leader election among a partition's replicas, and so on are all coordinated through ZK.
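The leader/follower rule above ("a message becomes consumable only after all followers have synchronized it") can be sketched as a toy model. The class and field names here are invented for illustration; real Kafka tracks this with a high watermark over the in-sync replica set.

```python
# Toy model (assumed names, not Kafka's real classes): a partition leader
# that accepts writes but only exposes a message to consumers after every
# follower has replicated it.

class Partition:
    def __init__(self, num_followers):
        self.log = []                                  # leader's log
        self.followers = [[] for _ in range(num_followers)]
        self.high_watermark = 0                        # messages below this are consumable

    def append(self, msg):
        self.log.append(msg)                           # producers talk to the leader only

    def replicate(self):
        for f in self.followers:                       # followers copy the leader's log
            f[:] = self.log
        # once all followers are in sync, the messages become consumable
        self.high_watermark = len(self.log)

    def consumable(self):
        return self.log[:self.high_watermark]

p = Partition(num_followers=2)
p.append("hello")
print(p.consumable())  # [] -- written to the leader, but not yet replicated
p.replicate()
print(p.consumable())  # ['hello']
```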
Consumer
Consumers in Kafka usually work in groups; a group contains multiple consumers and corresponds to a topic. Each partition of that topic can be consumed by only one consumer in the group: if there are more consumers than partitions, some consumers receive no data; if there are fewer, some consumers each consume several partitions.
Kafka only guarantees that the messages within a single partition are consumed in order; no ordering is guaranteed across partitions.
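The assignment rule above can be sketched with a simple round-robin assignor. This is an illustration only: Kafka's real assignors (range, round-robin, sticky) differ in detail, and `assign` is a name invented for this sketch.

```python
# Sketch of group assignment: within a group, each partition goes to exactly
# one consumer; surplus consumers sit idle, scarce consumers take several
# partitions. (Round-robin here; Kafka's actual assignors differ.)

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# More consumers than partitions: c2 gets nothing.
print(assign([0, 1], ["c0", "c1", "c2"]))   # {'c0': [0], 'c1': [1], 'c2': []}
# Fewer consumers than partitions: each takes several.
print(assign([0, 1, 2, 3], ["c0", "c1"]))   # {'c0': [0, 2], 'c1': [1, 3]}
```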
To control the reliability of consumption, Kafka provides three delivery semantics:
- 1 At most once: save the offset first, then process the data; if processing fails, that data can no longer be accessed.
- 2 At least once: process the data first, then save the offset; if saving fails, the same data may be fetched again next time.
- 3 Exactly once: details to be verified.
In Kafka, the offset is maintained by the consumer (in practice it can be stored in ZK). This mechanism has two benefits:
- One is that each consumer consumes at its own pace, avoiding pressure when data is produced faster than it is consumed;
- The other is that the amount of data fetched per request is configurable: a consumer can read 1 message at a time, or 100.
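The difference between the first two delivery semantics comes down to whether the offset is committed before or after processing, and what happens if a crash falls in between. A minimal sketch (invented function and parameter names) makes the trade-off concrete:

```python
# Sketch of delivery semantics as commit ordering: "at most once" commits
# the offset before processing; "at least once" processes before committing.
# A crash between the two steps shows the difference.

def consume(log, start, commit_first, crash_between_steps):
    """Return (processed messages, committed offset)."""
    processed, offset = [], start
    msg = log[offset]
    if commit_first:                          # at most once: commit, then process
        offset += 1
        if crash_between_steps:
            return processed, offset          # offset advanced -- message lost
        processed.append(msg)
    else:                                     # at least once: process, then commit
        processed.append(msg)
        if crash_between_steps:
            return processed, offset          # offset not saved -- redelivered later
        offset += 1
    return processed, offset

log = ["m0"]
# at most once + crash: the message is never processed
print(consume(log, 0, commit_first=True, crash_between_steps=True))   # ([], 1)
# at least once + crash: the message is processed but will be fetched again
print(consume(log, 0, commit_first=False, crash_between_steps=True))  # (['m0'], 0)
```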
Topic
A topic is the subject of data in Kafka; all operations (such as message storage, reading, and consumption) are performed per topic.
Partition
Each topic is composed of multiple partitions, and the data within a partition is guaranteed to be ordered: messages are appended to the tail of the partition in time order. On disk a partition is stored as fixed-size files (segments); when the current file is full, a new one is created, and Kafka periodically deletes files older than the configured retention period.
This continuous file storage exploits sequential disk access on the one hand, and reduces memory pressure on the other.
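The append-and-expire layout described above can be sketched with lists standing in for log files. The capacity constant and function names are made up for illustration; real Kafka segment sizes and retention are configurable.

```python
# Sketch of a partition's on-disk layout: messages are appended in order to
# fixed-size files, and expired files are dropped wholesale -- no in-place
# rewrites, which is what makes disk access sequential.

SEGMENT_CAPACITY = 3  # invented for the sketch; real segment sizes are configurable

def append(segments, msg):
    if not segments or len(segments[-1]) >= SEGMENT_CAPACITY:
        segments.append([])               # roll a new file when the last one is full
    segments[-1].append(msg)

def expire(segments, keep_last):
    return segments[-keep_last:]          # deleting whole old files is cheap

segments = []
for i in range(7):
    append(segments, f"m{i}")
print(segments)                           # [['m0','m1','m2'], ['m3','m4','m5'], ['m6']]
print(expire(segments, keep_last=2))      # [['m3','m4','m5'], ['m6']]
```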
Zookeeper
In Kafka, much of the coordination between nodes and the allocation of resources relies on ZooKeeper.
Such as:
- 1 Broker registration: each broker's IP and port are saved in ZK;
- 2 Topic registration: managing how a topic's partitions are distributed across brokers;
- 3 Broker load balancing: dynamically assigning topics to brokers, based on topic distribution and broker load;
- 4 Consumers: each partition's messages are delivered to only one consumer (the exact role ZK plays here is unclear);
- 5 The consumer-to-partition correspondence is stored in ZK;
- 6 Consumer load balancing: whenever consumers join or leave, a rebalance among the consumers is triggered;
- 7 Offsets: with the high-level consumer API, offset information is maintained in ZK.
Construction of pseudo-cluster environment
Deploying a pseudo-clustered environment, which is a single-node environment, is straightforward. Download the deployment file, unzip it, and run it directly.
Run the command as follows:
```shell
# Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties &
# Start Kafka
bin/kafka-server-start.sh config/server.properties &
```
If you want to test, you can start the test program:
```shell
# Start the console producer (test program)
./kafka-console-producer.sh --broker-list localhost:9092 --topic test
# Start the console consumer (test program)
./kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
```
The content entered in the producer interface can be seen directly in the consumer interface.