1. A topic contains multiple partitions, and messages are stored in those partitions. The offset can be seen as the ID of a message within its partition, and Kafka uses it to locate a specific message.
2. A partition is composed of one or more segments (fragments). A producer sends a message to a topic; after receiving it, the broker appends the message to the last segment of the partition. When that segment reaches a certain size, the broker creates a new segment.
3. The setting log.dirs=/opt/kafka_data in the server.properties file designates the storage directory for Kafka data.
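The append-and-roll behaviour described in points 1 and 2 can be sketched as a toy model. This is only an illustration, not Kafka's implementation: the Segment and Partition classes and the max_segment_bytes parameter are hypothetical names, and sizes are counted on the raw payload only. In a real broker the roll threshold is set by the log.segment.bytes setting.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int  # smallest offset stored in this segment
    messages: list = field(default_factory=list)
    size_bytes: int = 0


class Partition:
    """Toy partition: an ordered list of segments, append-only."""

    def __init__(self, max_segment_bytes: int):
        self.max_segment_bytes = max_segment_bytes  # hypothetical roll threshold
        self.next_offset = 0
        self.segments = [Segment(base_offset=0)]

    def append(self, payload: bytes) -> int:
        active = self.segments[-1]  # writes always go to the last segment
        # Roll to a new segment once the active one would exceed the limit.
        if active.messages and active.size_bytes + len(payload) > self.max_segment_bytes:
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        offset = self.next_offset
        active.messages.append((offset, payload))
        active.size_bytes += len(payload)
        self.next_offset += 1
        return offset


partition = Partition(max_segment_bytes=10)
offsets = [partition.append(b"data") for _ in range(5)]
bases = [segment.base_offset for segment in partition.segments]
```

Each new segment records the offset of its first message as its base offset, which is the value later used to name the segment's files.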
The hello-0 directory in the figure is partition 0 of the topic hello; each partition is one folder.
A partition contains many segments. Each segment is named after the smallest offset it contains and consists of an index file and a log file. The index file holds the index and the log file holds the data; the two files share the same name, which is that smallest offset.
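The naming rule can be illustrated with two small helpers. Both function names are hypothetical. The default width of 20 digits matches the zero-padded names Kafka writes on disk (for example 00000000000000001000.log); this article abbreviates them to 6 digits, which the width parameter reproduces.

```python
from typing import List, Tuple


def segment_file_names(base_offset: int, width: int = 20) -> Tuple[str, str]:
    """Return the (index, log) file names for a segment's smallest offset."""
    stem = str(base_offset).zfill(width)  # zero-pad so names sort by offset
    return stem + ".index", stem + ".log"


def base_offsets(file_names: List[str]) -> List[int]:
    """Recover the sorted base offsets from a partition directory listing."""
    return sorted(int(name.split(".")[0]) for name in file_names if name.endswith(".log"))


short_names = segment_file_names(1000, width=6)
listing = ["001000.index", "001000.log", "002000.index", "002000.log"]
found = base_offsets(listing)
```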
Suppose you are looking for the message with offset=1111 in hello-0, and the directory contains [001000.index, 001000.log] and [002000.index, 002000.log]. You can first determine that the message should be in 001000.log, because 1000 is the smallest offset in that file and the next segment starts at 2000. Kafka then reads the index file 001000.index into memory. The index file is a sparse index: instead of one entry per message, an entry is created only after a certain number of bytes of data, so the entries cover ranges of relative offsets such as 0~100 and 101~200. The relative offset of offset=1111 is 1111-1000=111, which falls in the 101~200 interval, so Kafka can jump to the approximate position of the message and scan from there, which is more efficient than reading the whole data file.
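The two-step lookup above can be sketched with binary search. This is a simplified model under stated assumptions, not Kafka's code: the function names are hypothetical, the sparse index is represented as a sorted list of (relative offset, byte position) pairs, and the byte positions are made up for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple


def find_segment(base_offsets: List[int], target: int) -> int:
    """Return the base offset of the segment that should hold target.

    base_offsets must be sorted in ascending order.
    """
    i = bisect_right(base_offsets, target) - 1  # last base offset <= target
    if i < 0:
        raise KeyError(f"offset {target} is before the first segment")
    return base_offsets[i]


def find_scan_start(sparse_index: List[Tuple[int, int]], relative_offset: int) -> int:
    """Return the byte position of the last index entry at or before relative_offset.

    Returns 0 (start of the log file) when no entry qualifies.
    """
    keys = [rel for rel, _ in sparse_index]
    i = bisect_right(keys, relative_offset) - 1
    return sparse_index[i][1] if i >= 0 else 0


# Values from the example: segments starting at 1000 and 2000, target 1111.
base = find_segment([1000, 2000], 1111)
relative = 1111 - base
# Hypothetical sparse index: entries at relative offsets 0, 101, 201.
index_entries = [(0, 0), (101, 4096), (201, 8192)]
position = find_scan_start(index_entries, relative)
```

From the returned position, a reader would scan the log file forward message by message until it reaches the target offset, so only a small part of the file is read.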