Kafka's two magic weapons for improving lookup efficiency

Last Update:2015-05-06 Source: Internet

Author: User

Segmenting data files

One way Kafka improves lookup efficiency is by splitting data files into segments. Suppose there are 100 messages with offsets 0 to 99, and the data file is divided into 5 segments: the first holds offsets 0-19, the second 20-39, and so on. Each segment is stored in its own data file, and that file is named after the smallest offset in the segment. To find the message with a given offset, a binary search over these smallest offsets identifies which segment contains it.
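The segment lookup can be sketched as a binary search over the sorted list of each segment's smallest offset. This is a minimal illustration, not Kafka's code; the function name `find_segment` and the in-memory list `base_offsets` are assumptions made for the example.

```python
from bisect import bisect_right


def find_segment(base_offsets, target_offset):
    """Return the smallest offset (i.e. the name) of the segment holding target_offset.

    base_offsets must be sorted ascending; each value is the smallest offset
    stored in one segment file. This sketch does not check the upper bound of
    the last segment.
    """
    # bisect_right gives the insertion point after any equal value, so the
    # segment we want is the one immediately before that point.
    i = bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} is before the first segment")
    return base_offsets[i]


# The five segments from the example: 0-19, 20-39, 40-59, 60-79, 80-99.
base_offsets = [0, 20, 40, 60, 80]
print(find_segment(base_offsets, 25))  # 20
```

With 5 segments the search needs at most 3 comparisons, and the cost grows only logarithmically as segments are added.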

Indexing data files

Segmenting the data files means a message with a given offset can be found in a smaller file, but locating it still requires a sequential scan of that file. To improve lookup efficiency further, Kafka creates an index file for each data file segment. The index file has the same name as the data file, except that its extension is .index.
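The naming rule can be illustrated with a small helper that derives both file names from a segment's smallest offset. The `.log` extension for the data file and the zero-padding to 20 digits are assumptions of this sketch, as is the helper name `segment_file_names`; the article itself only states that the two files share a name and that the index file ends in .index.

```python
from pathlib import Path


def segment_file_names(base_offset, width=20):
    """Return (data_file_name, index_file_name) for one segment.

    Assumption: names are the smallest offset, zero-padded to `width` digits,
    and the data file uses a ".log" extension.
    """
    stem = str(base_offset).zfill(width)
    data_file = Path(stem + ".log")
    # Same name as the data file; only the extension differs.
    index_file = data_file.with_suffix(".index")
    return data_file.name, index_file.name


data_name, index_name = segment_file_names(20)
print(data_name, index_name)
```

Zero-padding keeps the files in offset order when a directory listing is sorted by name.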

Each index entry records two values for a message:
Relative offset: Because the data is segmented, the offsets in a data file generally do not start at 0. The relative offset is the difference between the message's offset and the smallest offset in the data file it belongs to. For example, if a segment's offsets start at 20, the message with offset 25 has relative offset 25 - 20 = 5 in the index file. Storing relative offsets reduces the space the index file occupies.
Position: The absolute position of the message in the data file. Opening the file and moving the file pointer to this position is enough to read the corresponding message.
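To show why relative offsets save space, here is a sketch that packs one index entry into a fixed-size binary record. The layout of two 4-byte big-endian integers is an assumption made for illustration (the article does not state field sizes), and `encode_entry` and `decode_entry` are hypothetical helper names.

```python
import struct

# Assumed layout: 4-byte relative offset followed by 4-byte position.
ENTRY = struct.Struct(">ii")


def encode_entry(base_offset, offset, position):
    """Pack one index entry, storing the offset relative to the segment's smallest offset."""
    return ENTRY.pack(offset - base_offset, position)


def decode_entry(base_offset, raw):
    """Unpack one index entry and restore the absolute offset."""
    relative, position = ENTRY.unpack(raw)
    return base_offset + relative, position


raw = encode_entry(base_offset=20, offset=25, position=1024)
print(len(raw))               # 8
print(decode_entry(20, raw))  # (25, 1024)
```

Because the stored value is small relative to the segment's starting offset, it fits in fewer bytes than an absolute offset would need once offsets grow large.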

Rather than indexing every message in the data file, the index file is sparse: an index entry is created only once per certain number of bytes of data. This keeps the index file small enough to be held in memory. The drawback is that a message without an index entry cannot be located in one step; it requires a sequential scan starting from the nearest preceding indexed message, but the range of that scan is small.
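The trade-off can be sketched with a toy model in which a segment is a list of (offset, size) pairs laid out back to back. The names `build_sparse_index`, `locate`, and `interval_bytes` are assumptions for this example, and the in-memory list stands in for a real file.

```python
from bisect import bisect_right


def build_sparse_index(messages, interval_bytes):
    """Index one message roughly every interval_bytes of data.

    messages: list of (offset, size_in_bytes) in file order.
    Returns a list of (offset, position) pairs.
    """
    index = []
    position = 0
    since_last_entry = interval_bytes  # forces an entry for the first message
    for offset, size in messages:
        if since_last_entry >= interval_bytes:
            index.append((offset, position))
            since_last_entry = 0
        position += size
        since_last_entry += size
    return index


def locate(messages, index, target_offset):
    """Binary search the sparse index, then scan forward to the target."""
    indexed_offsets = [o for o, _ in index]
    i = bisect_right(indexed_offsets, target_offset) - 1
    if i < 0:
        raise KeyError(target_offset)
    start_offset, position = index[i]
    for offset, size in messages:
        if offset < start_offset:
            continue  # a real reader would seek to `position` instead
        if offset == target_offset:
            return position
        position += size
    raise KeyError(target_offset)


# Offsets 20..39, 100 bytes each, one index entry every 400 bytes.
messages = [(20 + n, 100) for n in range(20)]
index = build_sparse_index(messages, interval_bytes=400)
print(index)                         # [(20, 0), (24, 400), (28, 800), (32, 1200), (36, 1600)]
print(locate(messages, index, 27))   # 700
```

Here only 5 of the 20 messages are indexed, and finding offset 27 scans at most 3 messages past the indexed entry for offset 24.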

In Kafka, the index file is implemented by the OffsetIndex class. [Figure: OffsetIndex class diagram]

Its main methods are:

append: adds an offset and position pair to the index file; the offset is converted to a relative offset before it is stored.
lookup: uses binary search to find the largest indexed offset that is less than or equal to the given offset.
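The two methods can be sketched with a toy in-memory class. `SimpleOffsetIndex` is a hypothetical stand-in written for this example, not Kafka's OffsetIndex; returning the pair (smallest offset, 0) when no entry qualifies is a design choice of this sketch.

```python
class SimpleOffsetIndex:
    """Toy in-memory index for one segment; entries are (relative_offset, position)."""

    def __init__(self, base_offset):
        self.base_offset = base_offset
        self.entries = []  # kept in ascending order of relative offset

    def append(self, offset, position):
        """Store the offset relative to the segment's smallest offset."""
        relative = offset - self.base_offset
        if self.entries and relative <= self.entries[-1][0]:
            raise ValueError("offsets must be appended in increasing order")
        self.entries.append((relative, position))

    def lookup(self, target_offset):
        """Return (offset, position) of the largest indexed offset <= target_offset."""
        relative_target = target_offset - self.base_offset
        lo, hi = 0, len(self.entries) - 1
        best = None
        while lo <= hi:
            mid = (lo + hi) // 2
            if self.entries[mid][0] <= relative_target:
                best = self.entries[mid]  # candidate; look for a larger one
                lo = mid + 1
            else:
                hi = mid - 1
        if best is None:
            # Nothing indexed at or before the target: start at the beginning.
            return (self.base_offset, 0)
        return (self.base_offset + best[0], best[1])


idx = SimpleOffsetIndex(base_offset=20)
idx.append(20, 0)
idx.append(24, 400)
idx.append(28, 800)
print(idx.lookup(27))  # (24, 400)
```

A caller would then seek to the returned position in the data file and scan forward until it reaches the requested offset.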

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
