Address: http://blog.csdn.net/honglei915/article/details/37760631
Message format
A message consists of a fixed-length header and a variable-length byte-array payload. The header contains a version (the "magic" byte) and a CRC32 checksum.
/**
 * The format of an n-byte message is as follows:
 *
 * If the "magic" byte (version) is 0:
 *   1. a 1-byte "magic" marker
 *   2. a 4-byte CRC32 checksum
 *   3. n - 5 bytes of payload
 *
 * If the "magic" byte is 1:
 *   1. a 1-byte "magic" marker
 *   2. a 1-byte attributes field carrying metadata, for example whether the payload is compressed
 *   3. a 4-byte CRC32 checksum
 *   4. n - 6 bytes of payload
 */
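The two layouts above can be sketched as a small encoder/decoder. This is an illustrative Python sketch, not Kafka's actual code, and for simplicity the CRC here is assumed to cover only the payload:

```python
import struct
import zlib

def encode_message(payload: bytes, magic: int = 1, attributes: int = 0) -> bytes:
    """Encode a message body in the layout described above.

    magic 0: [magic:1][crc:4][payload]
    magic 1: [magic:1][attributes:1][crc:4][payload]
    """
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    if magic == 0:
        return struct.pack(">BI", magic, crc) + payload
    return struct.pack(">BBI", magic, attributes, crc) + payload

def decode_message(buf: bytes):
    """Return (magic, attributes, payload); raise if the CRC does not match."""
    magic = buf[0]
    if magic == 0:
        (crc,) = struct.unpack_from(">I", buf, 1)
        attributes, payload = 0, buf[5:]
    else:
        attributes = buf[1]
        (crc,) = struct.unpack_from(">I", buf, 2)
        payload = buf[6:]
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    return magic, attributes, payload
```

A magic-0 message of an n-byte payload is thus n + 5 bytes on the wire, and a magic-1 message is n + 6 bytes, matching the comment block above.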
Logs
Consider a topic named "my_topic" with two partitions. Its log is stored in two directories, my_topic_0 and my_topic_1, each holding the data files for one partition. A data file is a sequence of log entries; each entry is a 4-byte integer N giving the message length, followed by the N-byte message. Every message is identified by a 64-bit integer offset, which marks its starting position in the stream of messages sent to that partition. Each log file is named after the offset of the first message it contains, so the first file is named 00000000000.kafka, and the names of any two adjacent files differ by roughly S, where S is approximately the maximum log file size specified in the configuration.
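The length-prefixed entry layout can be walked with a few lines of Python. `iter_log_entries` is a hypothetical helper, and the positions it yields are byte positions within a single file:

```python
import struct

def iter_log_entries(data: bytes):
    """Yield (position, message_bytes) pairs from a segment laid out as
    repeated [4-byte big-endian length N][N-byte message] records.
    A truncated trailing record is silently skipped."""
    pos = 0
    while pos + 4 <= len(data):
        (n,) = struct.unpack_from(">I", data, pos)
        message = data[pos + 4 : pos + 4 + n]
        if len(message) < n:  # incomplete trailing record
            break
        yield pos, message
        pos += 4 + n

# Build a tiny two-entry segment to walk back:
segment = b"".join(struct.pack(">I", len(m)) + m for m in (b"a", b"bc"))
```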
The message format is maintained through a single unified interface, so messages can flow between producer, broker, and consumer without re-encoding. The format of a message stored on disk is as follows:
message length : 4 bytes (value: 1 + 4 + n)
"magic" value  : 1 byte
crc            : 4 bytes
payload        : n bytes
Write operation
Messages are continually appended to the end of the last log file, and a new file is rolled once the current one reaches a configured size. Writes are governed by two parameters: one sets how many messages may accumulate before the data must be flushed to disk, and the other sets the maximum time interval between flushes. Together they bound the durability guarantee: if the system crashes, at most that many messages, or that interval's worth of messages, are lost.
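The two flush parameters can be modeled by a small policy object. This is only a sketch; `max_messages` and `max_interval` are illustrative names, not Kafka's actual configuration keys:

```python
import time

class FlushPolicy:
    """Flush when either `max_messages` have accumulated since the last
    flush or `max_interval` seconds have elapsed, whichever comes first."""

    def __init__(self, max_messages: int, max_interval: float):
        self.max_messages = max_messages
        self.max_interval = max_interval
        self.pending = 0
        self.last_flush = time.monotonic()

    def record_append(self):
        self.pending += 1

    def should_flush(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        return (self.pending >= self.max_messages
                or now - self.last_flush >= self.max_interval)

    def mark_flushed(self, now=None):
        self.pending = 0
        self.last_flush = time.monotonic() if now is None else now
```

Whichever threshold trips first triggers the flush, which is exactly why a crash loses at most one threshold's worth of data.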
Read operations
A read request takes two parameters: a 64-bit offset and a maximum read size of S bytes. S is usually larger than a single message, but occasionally it is smaller; in that case the read retries, doubling the read size on each attempt, until a complete message has been fetched. A maximum message size can be configured so that the server rejects any message larger than that value, and the client can be given a retry limit so that it does not retry indefinitely while trying to read a complete message.
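The doubling retry can be sketched as follows. `read_at` stands in for the server fetch call and is a hypothetical function, not part of any real Kafka API:

```python
import struct

def fetch_complete_message(read_at, offset, initial_size, max_attempts=5):
    """Fetch via `read_at(offset, size)`, doubling `size` on each attempt
    until the returned buffer holds one complete length-prefixed message.
    Gives up after `max_attempts` attempts, mirroring the client-side
    retry limit described above."""
    size = initial_size
    for _ in range(max_attempts):
        buf = read_at(offset, size)
        if len(buf) >= 4:
            (n,) = struct.unpack_from(">I", buf, 0)
            if len(buf) >= 4 + n:
                return buf[4:4 + n]
        size *= 2  # message larger than the read size: retry with a bigger one
    raise IOError("no complete message after %d attempts" % max_attempts)
```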
To serve a read, the broker first locates the log file containing the data, computes the file-local offset from the requested offset (the requested offset is relative to the whole partition), and then reads from that position. This lookup is done with a binary search; Kafka keeps the offset range of every file in memory.
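The lookup amounts to a binary search over the base offsets of the log files, which in Python can lean on the standard `bisect` module. The `(base_offset, filename)` list below is an illustrative stand-in for the in-memory index:

```python
import bisect

def locate(segments, target_offset):
    """Given a sorted list of (base_offset, filename) pairs, one per log
    file and keyed by the offset of the file's first message, return the
    file containing `target_offset` along with the file-local offset."""
    bases = [base for base, _ in segments]
    i = bisect.bisect_right(bases, target_offset) - 1
    if i < 0:
        raise KeyError("offset is below the earliest retained log")
    base, name = segments[i]
    return name, target_offset - base

segments = [(0, "00000000000.kafka"), (1000, "00000001000.kafka")]
```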
The format of the result sent to the consumer is as follows:
MessageSetSend (fetch result)

total length : 4 bytes
error code   : 2 bytes
message 1    : x bytes
...
message n    : x bytes

MultiMessageSetSend (multiFetch result)

total length : 4 bytes
error code   : 2 bytes
messageSetSend 1
...
messageSetSend n
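As a rough illustration, the MessageSetSend layout could be serialized like this. The document does not pin down exactly which fields the 4-byte total length counts, so this sketch assumes it covers the error code plus the messages:

```python
import struct

def message_set_send(messages, error_code=0):
    """Serialize a fetch result: a 4-byte big-endian total length, a
    2-byte error code, then the raw message bytes back to back."""
    body = struct.pack(">H", error_code) + b"".join(messages)
    return struct.pack(">I", len(body)) + body
```

A MultiMessageSetSend would then simply concatenate several of these bodies after its own length and error-code header.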
Delete
The log manager supports pluggable deletion policies. The current policy deletes log files whose modification time is older than N days (delete by time); an alternative policy retains only the most recent N GB of data (delete by size). To keep deletion from blocking reads, the segment list is maintained copy-on-write: the binary search performed by a read actually runs against an immutable snapshot of the list, much like Java's CopyOnWriteArrayList.
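The copy-on-write idea can be shown in a few lines. `SegmentList` is a hypothetical class, not Kafka's implementation:

```python
import threading

class SegmentList:
    """Copy-on-write list of log segments: readers grab the current tuple
    reference as a snapshot without locking; deleters build a new tuple
    and swap it in under a lock, in the spirit of CopyOnWriteArrayList."""

    def __init__(self, segments):
        self._segments = tuple(segments)  # immutable snapshot
        self._lock = threading.Lock()

    def snapshot(self):
        return self._segments  # safe to search: never mutated in place

    def delete_where(self, predicate):
        with self._lock:
            self._segments = tuple(s for s in self._segments
                                   if not predicate(s))
```

A reader that took a snapshot before a deletion keeps searching the old list unperturbed; only readers arriving afterwards see the shrunken one.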
Reliability Assurance
A configurable parameter M caps how many messages may accumulate before they are forcibly flushed to disk. On startup, a log recovery process scans the newest log file and checks that every entry in it is valid. An entry is valid if the sum of its offset and size does not exceed the length of the file, and the CRC32 checksum of its payload matches the checksum stored in the entry. If an invalid entry is found at some offset, the contents from that offset through the end of the file are truncated.
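The recovery scan can be sketched on a simplified record layout of [4-byte length][4-byte CRC32][payload]; `recover` is a hypothetical helper that returns the longest valid prefix of a segment:

```python
import struct
import zlib

def recover(segment: bytes) -> bytes:
    """Scan records of the form [4-byte length n][4-byte crc32][n-byte
    payload] and return the valid prefix, truncating at the first record
    that is incomplete or fails its CRC check."""
    pos = 0
    while pos + 8 <= len(segment):
        n, crc = struct.unpack_from(">II", segment, pos)
        payload = segment[pos + 8 : pos + 8 + n]
        if len(payload) < n or zlib.crc32(payload) & 0xFFFFFFFF != crc:
            break  # corrupt or truncated: cut the file here
        pos += 8 + n
    return segment[:pos]
```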
Two crash scenarios must be handled: 1. some data blocks were never written; 2. blank or garbage data blocks were written. The second case arises because each file has an inode (in Unix-like file systems, the inode is the data structure describing a file system object such as a file, directory, device file, socket, or pipe, including its size), and the operating system does not guarantee the order in which the inode update and the data write reach disk: if the inode's size field has been updated but the data has not yet been written when the crash occurs, the file ends with a blank block. The CRC check detects such blocks and removes them. Blocks that were simply never written because of the crash are, of course, lost.