Kafka's two magic weapons for efficient message lookup

Source: Internet
Author: User

Segmenting data files

The first way Kafka speeds up lookups is to split the data file into segments. Suppose there are 100 messages with offsets 0 to 99, and the data file is divided into 5 segments: the first holds offsets 0-19, the second 20-39, and so on. Each segment is stored in its own data file, and that file is named after the smallest offset in the segment. To find the message with a given offset, a binary search over the segment names identifies which segment contains it.
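The segment lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name find_segment and the list of base offsets are hypothetical, and the sketch is not Kafka's own code.

```python
from bisect import bisect_right

def find_segment(base_offsets, target):
    """Return the base offset of the segment that should contain `target`.

    `base_offsets` holds the smallest offset of each segment, sorted
    ascending (hypothetical input; in practice it would come from the
    segment file names).
    """
    # bisect_right gives the insertion point to the right of any equal
    # value, so the entry just before it is the largest base <= target.
    i = bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} is below the first segment")
    return base_offsets[i]

# The 5 segments from the example: 0-19, 20-39, 40-59, 60-79, 80-99.
segments = [0, 20, 40, 60, 80]
print(find_segment(segments, 25))
```

With the five segments from the example, offset 25 resolves to the segment whose base offset is 20. Note that this step only picks the file; it does not say where inside the file the message sits.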

Indexing data files

Segmentation means a message with a given offset can be looked up in a smaller data file, but finding it still requires a sequential scan of that file. To speed up the search further, Kafka creates an index file for each data file segment. The index file has the same name as the data file, except that its extension is .index.

Each entry in the index file has two parts:

    • Relative offset: Because the data file is segmented, the starting offset of each data file is generally not 0. The relative offset is the difference between a message's offset and the smallest offset in the data file it belongs to. For example, if a segment's data file starts at offset 20, the message with offset 25 has a relative offset of 25 - 20 = 5 in the index file. Storing relative offsets reduces the space the index file occupies.
    • Position: The absolute byte position of the message in the data file. To read the message, open the file and move the file pointer to this position.
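To make the space saving concrete, here is a minimal sketch of how one index entry could be packed into bytes. The layout (a 4-byte relative offset followed by a 4-byte position, big-endian) and the helper names are assumptions chosen for illustration, not a statement of Kafka's on-disk format.

```python
import struct

# Assumed layout for this sketch: 4-byte relative offset + 4-byte position.
ENTRY = struct.Struct(">ii")

def encode_entry(base_offset, offset, position):
    """Pack one index entry, storing the offset relative to the segment base."""
    return ENTRY.pack(offset - base_offset, position)

def decode_entry(base_offset, raw):
    """Unpack one index entry and restore the absolute offset."""
    relative_offset, position = ENTRY.unpack(raw)
    return base_offset + relative_offset, position

# Segment starting at offset 20; message 25 sits at byte 1024 of the data file.
raw = encode_entry(20, 25, 1024)
print(len(raw), decode_entry(20, raw))
```

Because the stored value is small (5 rather than 25 here, and the gap grows as absolute offsets grow), a narrow fixed-width field is enough for the offset part of each entry.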

The index file does not index every message in the data file. Instead it is sparse: an index entry is created only once for every certain number of bytes of data. This keeps the index file small enough to be held in memory. The downside is that a message without its own index entry cannot be located in the data file in a single step. Kafka must jump to the nearest preceding indexed message and scan sequentially from there, but the range of that scan is small.
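The trade-off can be sketched with an in-memory "data file". Everything here is hypothetical and simplified: the record header (8-byte offset plus 4-byte payload size), the function names, and the choice to always index the first record are assumptions of this sketch, not Kafka's actual format or behavior.

```python
import struct
from bisect import bisect_right

# Hypothetical record header: 8-byte offset, 4-byte payload size.
HEADER = struct.Struct(">qi")

def write_segment(payloads, base_offset, interval_bytes):
    """Build an in-memory data file and a sparse index for it."""
    data = bytearray()
    index = []                    # (offset, position) pairs, ascending
    since_last = interval_bytes   # ensures the first record is indexed
    for i, payload in enumerate(payloads):
        offset = base_offset + i
        if since_last >= interval_bytes:
            # Enough bytes written since the last entry: index this record.
            index.append((offset, len(data)))
            since_last = 0
        record = HEADER.pack(offset, len(payload)) + payload
        data += record
        since_last += len(record)
    return bytes(data), index

def read_message(data, index, target):
    """Jump to the nearest indexed record at or before `target`, then scan."""
    i = bisect_right([offset for offset, _ in index], target) - 1
    if i < 0:
        raise KeyError(target)
    position = index[i][1]
    while position < len(data):
        offset, size = HEADER.unpack_from(data, position)
        body = position + HEADER.size
        if offset == target:
            return data[body:body + size]
        if offset > target:
            break
        position = body + size    # skip to the next record
    raise KeyError(target)

payloads = [f"msg-{i}".encode() for i in range(20)]
data, index = write_segment(payloads, base_offset=20, interval_bytes=64)
print(len(index), "index entries for", len(payloads), "messages")
print(read_message(data, index, 25))
```

A larger interval_bytes yields fewer index entries and a longer scan per lookup; a smaller one does the opposite.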

In Kafka, the index file is implemented by the OffsetIndex class (class diagram omitted).

The main methods are:

    • append: adds an (offset, position) pair to the index file; the offset is converted to a relative offset before it is stored.
    • lookup: uses binary search to find the largest indexed offset that is less than or equal to the given offset.
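The two methods can be illustrated with a small in-memory stand-in. The class below, SimpleOffsetIndex, is hypothetical and written only for this article; it mirrors the behavior described above rather than reproducing Kafka's source. Returning (base_offset, 0) when no entry qualifies is a choice made in this sketch so that a caller can always start scanning from the beginning of the segment.

```python
class SimpleOffsetIndex:
    """Hypothetical in-memory stand-in for one segment's index file."""

    def __init__(self, base_offset):
        self.base_offset = base_offset
        self.entries = []  # (relative_offset, position), ascending

    def append(self, offset, position):
        """Store the pair, converting the offset to a relative offset."""
        if self.entries and offset <= self.base_offset + self.entries[-1][0]:
            raise ValueError("offsets must be appended in increasing order")
        self.entries.append((offset - self.base_offset, position))

    def lookup(self, target):
        """Binary search for the largest indexed offset <= target."""
        relative = target - self.base_offset
        lo, hi = 0, len(self.entries) - 1
        found = -1
        while lo <= hi:
            mid = (lo + hi) // 2
            if self.entries[mid][0] <= relative:
                found = mid       # candidate; look for a larger one
                lo = mid + 1
            else:
                hi = mid - 1
        if found < 0:
            # No entry at or before target: start from the segment's beginning.
            return self.base_offset, 0
        relative_offset, position = self.entries[found]
        return self.base_offset + relative_offset, position

idx = SimpleOffsetIndex(base_offset=20)
idx.append(20, 0)
idx.append(24, 68)
idx.append(28, 136)
print(idx.lookup(25))
```

Looking up offset 25 returns the entry for offset 24, which is where a sequential scan of the data file would begin.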
