Kafka is distributed message-oriented middleware that can be roughly divided into three parts: producer, broker, and consumer. The producer generates messages and sends them to Kafka. A broker can be thought of simply as a machine in the Kafka cluster; it carries out the main functions of a message queue (receiving messages, persisting them, serving them to consumers, cleaning up old messages, and so on). The consumer fetches messages from the broker and processes them. Each broker has an ID that is set manually in its configuration file.
Messages in Kafka belong to topics, which can be thought of simply as groups. The messages in a topic are further divided into partitions. Partitions can be hard to understand; at least, I could not work out how the number of partitions is determined when I read the program. Two relevant settings are available in the broker configuration file server.properties: num.partitions and topic.partition.count.map. num.partitions is the default number of partitions (say N) that the broker uses for every topic; topic.partition.count.map sets the number of partitions (again N) for individual topics. Based on these settings, the broker creates N partitions, numbered [0, 1, ..., N-1], for the topic on this machine. The name of a partition can therefore be understood as having two parts, brokerid and partitionnum, where partitionnum is a number starting from 0.
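A minimal sketch of how the two settings could resolve into partition names. The comma-separated `topic:count` form of the map value and the `brokerid-partitionnum` rendering are assumptions made for illustration, and both function names are hypothetical rather than part of Kafka.

```python
def parse_partition_count_map(value):
    """Parse a 'topicA:3,topicB:5' style string into {topic: count}."""
    counts = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        topic, count = entry.rsplit(":", 1)
        counts[topic.strip()] = int(count)
    return counts


def partition_names(broker_id, topic, default_count, overrides):
    """Return the names of the partitions one broker creates for a topic."""
    # A per-topic override wins; otherwise fall back to the broker default.
    n = overrides.get(topic, default_count)
    return [f"{broker_id}-{p}" for p in range(n)]


overrides = parse_partition_count_map("orders:3, clicks:2")
print(partition_names(0, "orders", 1, overrides))   # ['0-0', '0-1', '0-2']
print(partition_names(0, "metrics", 1, overrides))  # ['0-0']
```

Here "orders" has an explicit count of 3, while "metrics" is not listed and so falls back to the default of 1.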
The broker organizes the message queues on its machine as follows. First, it creates a directory for each partition of each topic, named in the format topic-partition. Within that directory, the broker stores the messages of the topic-partition in segments, and each segment is named after its offset. The resulting directory structure is one topic-partition directory per partition, each containing its offset-named segment files.
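Because each segment is named after its offset, the segment holding a given offset can be located with a binary search over the segment names. The sketch below illustrates that idea; the 20-digit zero padding and the `.kafka` suffix are assumptions, and the helper names are hypothetical.

```python
import bisect


def partition_dir(topic, partition):
    """Directory name for one partition of a topic: topic-partition."""
    return f"{topic}-{partition}"


def segment_name(start_offset):
    # Assumption: the offset is zero-padded to 20 digits, with a '.kafka' suffix.
    return f"{start_offset:020d}.kafka"


def find_segment(start_offsets, offset):
    """Return the start offset of the segment that contains `offset`.

    `start_offsets` must be sorted in ascending order.
    """
    i = bisect.bisect_right(start_offsets, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first segment")
    return start_offsets[i]


starts = [0, 1048576, 2097152]
print(partition_dir("orders", 0))                   # orders-0
print(segment_name(find_segment(starts, 1500000)))  # 00000000000001048576.kafka
```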
Format of each message in Kafka:
Length (4 bytes)
Magicvalue (1 byte)
Attribute (1 byte)
CRC verification code (4 bytes)
Payload (message content)
Magicvalue is 1. Attribute indicates whether the message is compressed and, if so, which compression method is used. CRC is the CRC checksum of the payload. When a message is written to Kafka's persistent storage file, one field is added to it: the message length.
Both ByteBufferMessageSet and FileMessageSet store messages in the format described above.
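The field list above can be sketched as a pair of encode and decode helpers. This is an illustration only: big-endian byte order, CRC-32 as the checksum, and a length field that counts only the bytes after it are all assumptions, and the function names are hypothetical.

```python
import struct
import zlib

MAGIC = 1  # the text gives Magicvalue = 1


def encode_message(payload, attribute=0):
    """Serialize as length(4) | magic(1) | attribute(1) | crc(4) | payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    body = struct.pack(">BBI", MAGIC, attribute, crc) + payload
    # Assumption: the length counts the bytes that follow the length field.
    return struct.pack(">I", len(body)) + body


def decode_message(buf, pos=0):
    """Parse one message at `pos`; return (attribute, payload, next_pos)."""
    (length,) = struct.unpack_from(">I", buf, pos)
    magic, attribute, crc = struct.unpack_from(">BBI", buf, pos + 4)
    # 4 bytes of length plus 6 bytes of header precede the payload.
    payload = buf[pos + 10 : pos + 4 + length]
    if magic != MAGIC:
        raise ValueError("unexpected magic value")
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    return attribute, payload, pos + 4 + length


encoded = encode_message(b"hello")
print(len(encoded))             # 15
print(decode_message(encoded))  # (0, b'hello', 15)
```

Returning `next_pos` lets a caller walk a buffer that holds several messages back to back.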
Note that Kafka can compress messages. It compresses a group of messages as follows: first, each message is serialized to a byte array; the resulting bytes are then compressed, and the compressed result becomes a new message.
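A rough sketch of the group compression described above: each message is serialized, the concatenated bytes are compressed, and the compressed bytes become the payload of one wrapper message. The use of gzip, the attribute values 0 and 1, and all helper names are assumptions for illustration; the framing follows the field list given earlier.

```python
import gzip
import struct
import zlib

PLAIN, COMPRESSED = 0, 1  # assumed attribute values, for illustration only


def frame(payload, attribute=PLAIN):
    """length(4) | magic(1) | attribute(1) | crc(4) | payload, big-endian."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    body = struct.pack(">BBI", 1, attribute, crc) + payload
    return struct.pack(">I", len(body)) + body


def unframe(buf, pos=0):
    """Return (attribute, payload, next_pos) for the message at `pos`."""
    (length,) = struct.unpack_from(">I", buf, pos)
    _magic, attribute, _crc = struct.unpack_from(">BBI", buf, pos + 4)
    return attribute, buf[pos + 10 : pos + 4 + length], pos + 4 + length


def compress_group(payloads):
    """Serialize each message, compress the bytes, wrap them in one message."""
    serialized = b"".join(frame(p) for p in payloads)
    return frame(gzip.compress(serialized), COMPRESSED)


def expand(message):
    """Return the payloads carried by a plain or compressed message."""
    attribute, payload, _ = unframe(message)
    if attribute == PLAIN:
        return [payload]
    inner, pos, out = gzip.decompress(payload), 0, []
    while pos < len(inner):
        _, item, pos = unframe(inner, pos)
        out.append(item)
    return out


wrapper = compress_group([b"a", b"bb", b"ccc"])
print(expand(wrapper))  # [b'a', b'bb', b'ccc']
```

A consumer of such a wrapper checks the attribute first and only decompresses when it marks the payload as compressed.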
Note: the offset in FileMessageSet refers to the starting position of the MessageSet in the file, that is, a concrete position within the file.
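To illustrate an offset that is simply a position in the file, the sketch below appends records to a temporary file and uses the byte position at which each record starts as its offset. The framing is deliberately simplified to a 4-byte length prefix plus payload, and the helper names are hypothetical.

```python
import os
import struct
import tempfile


def append(f, payload):
    """Append one length-prefixed record; return its byte offset in the file."""
    f.seek(0, os.SEEK_END)
    offset = f.tell()
    f.write(struct.pack(">I", len(payload)) + payload)
    return offset


def read_at(f, offset):
    """Read the record that starts at byte position `offset`."""
    f.seek(offset)
    (length,) = struct.unpack(">I", f.read(4))
    return f.read(length)


with tempfile.TemporaryFile() as f:
    offsets = [append(f, p) for p in (b"a", b"bcd", b"ef")]
    print(offsets)                 # [0, 5, 12]
    print(read_at(f, offsets[1]))  # b'bcd'
```

Offsets therefore advance by the size of each record in bytes rather than by one per message.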